Re: LLM Experiments, Part 1: Corrections
From: Andrew Hyatt
Subject: Re: LLM Experiments, Part 1: Corrections
Date: Mon, 22 Jan 2024 16:21:26 -0400
User-agent: Gnus/5.13 (Gnus v5.13)
On 22 January 2024 21:57, Psionic K <psionik@positron.solutions> wrote:
>> I think things have to be synchronous here.
> Snapshot isolation is the best strategy for merging here. We don't
> know what user commands affected the region in question, so using
> undo states to merge might need to undo really arbitrary user
> commands. To snapshot isolate, basically you store a copy of the
> buffer text and hold two markers where that text was. You can
> merge the result if it arrives on time, and then diff the snapshot
> with the buffer text between the markers. If things are too
> different for a valid merge, you can give up and drop the results.
> These days various CRDT (conflict-free replicated data type)
> treatments have great insights into dealing with much worse
> problems of multiple asynchronous writers, and it's a good place
> to look. There is a crdt.el package for some inspiration.
This is a good tip, thank you.
> But definitely not synchronous.
I think the text changing out from under you is just one problem to
solve. The other is that once we start some LLM-powered workflow, the
user is free to do whatever they want for, say, ten seconds while it
runs. That requires more from both us and the user: we would need to
communicate to the user that something is awaiting their input, and
the user would then need to run a command to return to the experience
we want to put them in (in this case, an ediff session). It's a bit
weird, and I think a bit too complicated. I'm still leaning to the
synchronous side, but it's worth trying out an async solution and
seeing just how bad it is.
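For concreteness, the snapshot-isolation check described above could
be sketched like this. This is only a Python sketch of the idea
(`try_merge` and the 0.8 threshold are hypothetical); a real Emacs
implementation would hold markers and compare buffer text between
them.

```python
import difflib

def try_merge(snapshot, current_region, llm_result, threshold=0.8):
    """Decide what to do with an LLM result that arrives asynchronously.

    snapshot       -- the region's text when the call was made
    current_region -- the text now between the two markers
    llm_result     -- what the model produced for the snapshot
    """
    if current_region == snapshot:
        return llm_result  # nothing changed underneath us; apply directly
    similarity = difflib.SequenceMatcher(None, snapshot, current_region).ratio()
    if similarity < threshold:
        return None  # too different for a valid merge; give up, drop result
    # The region drifted, but not too far: surface both versions for an
    # ediff-style manual resolution instead of silently overwriting.
    return ("conflict", current_region, llm_result)
```

The `None` return is the "give up and drop the results" branch; what
counts as "too different" is a policy knob the client would tune.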
> As a package author, I would want to treat my LLM like a fancy
> process. I create it, I handle results. I have a merging strategy
> (this is mainly up to the client, not the library), but I don't
> care about the asynchronous details and I don't want to be tied to
> each call.
The llm library does work like this already: it has async methods
that take callbacks. This thread is about higher-level functionality.
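That "fancy process" shape, reduced to a sketch (all names here are
hypothetical; the real llm package is Emacs Lisp, and its async
functions take callbacks in a similar spirit):

```python
from concurrent.futures import ThreadPoolExecutor

class LlmProcess:
    """Hypothetical wrapper: create the 'process', hand in result and
    error handlers, and never touch the asynchronous details."""

    def __init__(self, backend):
        self._backend = backend  # callable: prompt -> response
        self._pool = ThreadPoolExecutor(max_workers=1)

    def call(self, prompt, on_result, on_error=lambda e: None):
        def run():
            try:
                on_result(self._backend(prompt))
            except Exception as e:
                on_error(e)
        self._pool.submit(run)

    def wait(self):
        # Block until all submitted calls have completed.
        self._pool.shutdown(wait=True)
```

The package author only writes `on_result`; whether the backend is a
subprocess, an HTTP request, or a local model is hidden behind it.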
>> Question 6
> A rock solid library that sticks to the domain model is best for
> ecosystem growth. When that doesn't happen, we get four or five
> 75%-finished packages, because every author is having to figure
> out and integrate their high-level features with so many backends.
> If you want to work on high-level things, build a client for your
> library and experience both sides.
Totally agree, and that's what I've started and will continue to
do with these demos, which inform the development of the llm-flows
layer I'm building.
> Every model will have some mostly static configuration, dynamic
> arguments that could change all the time but in practice change
> just a few times, and then the input data. The static
> configuration, if absolutely necessary, can be updated for one
> call via dynamic binding. The dynamic arguments should be
> abstracted into a "context" object that the model backend figures
> out how to translate into a valid API call. The input data is an
> arbitrary bundle of whatever that model type consumes as input.
> The library user will want to get a valid context of the dynamic
> arguments from the library, enabling them to make changes to it in
> subsequent calls, but they don't really want to touch it that
> much. As a package author, I would want to focus on integrating
> outputs and piping in inputs. I don't want to write a UI for
> tuning the model parameters. If the model can ask the user to
> make adjustments and just give me a record of their decision I can
> use later, that would be fantastic. I should be able to integrate
> more closely with backends I know about, but otherwise just call
> with the provided context and my inputs.
Agreed, such adjustments should be part of a common layer.
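The three-way split described above (static configuration, a dynamic
"context" object, input data) might look roughly like this. This is a
Python sketch under my own naming; `Context`, `refine`, and
`call_model` are all hypothetical, not part of the llm package.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Context:
    """The dynamic arguments a backend knows how to translate into a
    valid API call. Immutable; refinements produce a new context."""
    model: str
    temperature: float = 1.0
    max_tokens: int = 1024

    def refine(self, **changes):
        # Tweak a few arguments for one call without mutating the
        # original, so earlier contexts stay valid for later calls.
        return replace(self, **changes)

def call_model(context, input_data):
    # Static configuration (API keys, endpoints) would live in the
    # provider; a real backend would translate `context` into
    # provider-specific request parameters here.
    return {"context": context, "input": input_data}
```

The point of the frozen dataclass is exactly the "record of their
decision I can use later": a context is a value you can store, reuse,
and refine per call.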
> Providers offer multiple models. As a library user, it's
> inconvenient if I have to go through long incantations to get each
> context that represents the capability to make valid calls for the
> provider. I want to initialize once and then use an existing
> context to pull out the correct context based on the input or
> output type I need, and then make refinements that are specific to
> a call, such as changing quality or entropy etc. Input or output
> type and settings that tune the call are two different things.
> Settings are mostly provider-specific argument data that doesn't
> affect the validity of connecting one model to another. Input and
> output type affect which pipes can be connected to which other
> pipes. This distinction between input or output types and other
> arguments becomes important in composition. I should be able to
> connect any string-to-string model with any other model that
> handles strings, no matter what the other settings are.
I think we're on the same page here. Anything for quality tuning
should be generic, I hope: perhaps a knob for the quality-to-price
tradeoff that can be used for many things, including deciding how
much context to provide. The rest is already generic in the llm
package.
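The composition rule above — input/output types govern which pipes
connect, settings don't — can be sketched directly. These names
(`Model`, `connect`) are my own illustration, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Model:
    """Hypothetical model handle: in/out types govern which pipes can
    be connected; `settings` is provider data that never affects
    connection validity."""
    in_type: type
    out_type: type
    fn: Callable
    settings: dict = field(default_factory=dict)

def connect(a, b):
    """Compose two models. Legal whenever a's output type matches b's
    input type, no matter what either model's settings are."""
    if a.out_type is not b.in_type:
        raise TypeError(
            f"cannot pipe {a.out_type.__name__} into {b.in_type.__name__}")
    return Model(a.in_type, b.out_type, lambda x: b.fn(a.fn(x)))
```

So any string-to-string model composes with any other model that
consumes strings, while a string-to-embedding model refuses to feed a
string consumer, regardless of temperature, quality, or any other
setting.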
> Integrating these systems will be more like distributed streaming
> programming than feeding inputs to a GPU with tight
> synchronization and everything under our watch, although a local
> model might work that way inside its own box. We should treat
> them like unreliable external services. A call to the model is a
> command. When I send a command, I should store how to handle the
> reply, but I shouldn't couple myself to it with nested callbacks
> or async, which we fortunately don't have anyway. The call data
> just goes into a pile. If the reply shows up and it matches a
> call, we handle it. If things time out, we dead-letter and drop
> the record of making a call. This is a very good way to get
> around the limitations of the process as our main asynchronous
> primitive for now. It works for big distributed services, which
> by their very nature cannot lock each other or share memory. It
> will work for connecting many models to each other.
I'm not sure I understand this part. Yes, we can have a system that
stores callbacks in some hashmap or similar, and that's better than
tying them directly to a specific process. However, something must
always track when the process is done or has timed out, and that
something is the process itself. I'm not sure how the centralized
storage reduces the coupling to the process. But if I'm reading this
correctly, it seems like an argument for using state machines, with
the centralized storage acting as a driver for state changes, which
may be a good way to think about this.
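The "pile of calls" with reply matching and dead-lettering could be
sketched as follows; the `CallRegistry` name, the ten-second default,
and the explicit `now` parameter (to make expiry testable) are all my
own choices for illustration.

```python
import time
import uuid

class CallRegistry:
    """Commands go into a pile; replies are matched by call id.
    Calls whose deadline passes are dead-lettered and dropped."""

    def __init__(self, timeout=10.0):
        self.pending = {}       # call-id -> (deadline, reply handler)
        self.dead_letters = []  # ids of calls that timed out
        self.timeout = timeout

    def send(self, handler, now=None):
        # Record how to handle the reply; no callback nesting, no
        # coupling to whichever process services the call.
        call_id = uuid.uuid4().hex
        now = time.monotonic() if now is None else now
        self.pending[call_id] = (now + self.timeout, handler)
        return call_id

    def on_reply(self, call_id, reply):
        entry = self.pending.pop(call_id, None)
        if entry is not None:  # reply matches a live call: handle it
            _, handler = entry
            handler(reply)
        # else: reply for an expired or unknown call; ignore it

    def expire(self, now=None):
        now = time.monotonic() if now is None else now
        for call_id, (deadline, _) in list(self.pending.items()):
            if now > deadline:
                self.dead_letters.append(call_id)
                del self.pending[call_id]
```

Read as a state machine, each call moves through pending → handled or
pending → dead-lettered, and the registry is the driver of those
transitions; whatever watches the underlying process only has to call
`on_reply` and `expire`.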
Thank you for your thorough and thoughtful response!