
Re: LLM Experiments, Part 1: Corrections


From: Andrew Hyatt
Subject: Re: LLM Experiments, Part 1: Corrections
Date: Mon, 22 Jan 2024 16:21:26 -0400
User-agent: Gnus/5.13 (Gnus v5.13)


On 22 January 2024 21:57, Psionic K <psionik@positron.solutions> wrote:

> I think things have to be asynchronous here. Snapshot isolation is
> the best strategy for merging. We don't know what user commands
> affected the region in question, so using undo states to merge might
> mean undoing really arbitrary user commands. To snapshot isolate,
> you basically store a copy of the buffer text and hold two markers
> delimiting where that text was. If the result arrives on time, you
> can merge it by diffing the snapshot against the buffer text between
> the markers. If things are too different for a valid merge, you can
> give up and drop the results. These days various CRDT (conflict-free
> replicated data type) treatments have great insights into dealing
> with the much worse problem of multiple asynchronous writers, and
> they are a good place to look. There is a crdt.el package for some
> inspiration.

This is a good tip, thank you.
I'm not sure things can be fully asynchronous, though. The text changing out from under you is just one problem to solve; the other is that once we start some LLM-powered workflow, the user is free to do whatever they want for, say, ten seconds until the result arrives. Handling that asynchronously requires more from both us and the user: we would need to communicate to the user that something is awaiting their input, and the user would then need to run a command to get back to the experience we want to put them in (in this case, an ediff session). That's a bit weird, and I think a bit too complicated. I'm still leaning to the synchronous side, but it's worth trying out an async solution and seeing just how bad it is.
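
Concretely, I imagine the snapshot approach looking something like this (a rough sketch with made-up names, not actual llm-flows code):

  ;; Rough sketch of snapshot isolation for an LLM edit; every name
  ;; here is hypothetical.
  (defun my-llm-snapshot-region (beg end)
    "Snapshot BEG..END, returning (TEXT START-MARKER END-MARKER)."
    (list (buffer-substring-no-properties beg end)
          (copy-marker beg)
          (copy-marker end t)))

  (defun my-llm-apply-result (snapshot result)
    "Replace the snapshotted region with RESULT if it is unchanged.
  Return non-nil on success; nil means the user edited the region and
  we drop the result (a real merge would diff instead of giving up)."
    (pcase-let ((`(,text ,start ,end) snapshot))
      (when (equal text (buffer-substring-no-properties start end))
        (save-excursion
          (goto-char start)
          (delete-region start end)
          (insert result))
        t)))

The interesting part is what to do in the nil case; that's where the diff/merge, or the ediff session, would come in.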

> As a package author, I would want to treat my LLM like a fancy
> process. I create it, I handle results. I have a merging strategy
> (this is mainly up to the client, not the library), but I don't care
> about the asynchronous details and I don't want to be tied to each
> call.

The llm library already works like this: it has async methods that take callbacks. This thread is about the higher-level functionality on top of that.
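
For reference, a minimal async call looks roughly like this (from memory; check the llm docs for the exact current signatures):

  ;; Minimal async call with the llm package.  The key is a
  ;; placeholder; any llm provider works the same way.
  (require 'llm)
  (require 'llm-openai)

  (defvar my-provider (make-llm-openai :key "sk-..."))

  (llm-chat-async my-provider
                  (llm-make-simple-chat-prompt "Say hello in French")
                  (lambda (response) (message "LLM: %s" response))
                  (lambda (type msg) (message "LLM error (%s): %s" type msg)))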

> Question 6: A rock-solid library that sticks to the domain model is
> best for ecosystem growth. When that doesn't happen, we get four or
> five 75%-finished packages, because every author has to figure out
> and integrate their high-level features with so many backends. If
> you want to work on high-level things, build a client for your
> library and experience both sides.

Totally agree, and that's what I've started and will continue to do with these demos, which inform the development of the llm-flows layer I'm building.

> Every model will have some mostly static configuration, dynamic
> arguments that could change all the time but in practice change just
> a few times, and then the input data. The static configuration, if
> absolutely necessary, can be updated for one call via dynamic
> binding. The dynamic arguments should be abstracted into a "context"
> object that the model backend figures out how to translate into a
> valid API call. The input data is an arbitrary bundle of whatever
> that model type consumes as input. The library user will want to get
> a valid context of the dynamic arguments from the library, enabling
> them to make changes to it in subsequent calls, but they don't
> really want to touch it that much. As a package author, I would want
> to focus on integrating outputs and piping in inputs. I don't want
> to write a UI for tuning the model parameters. If the model can ask
> the user to make adjustments and just give me a record of their
> decision I can use later, that would be fantastic. I should be able
> to integrate more closely with backends I know about, but otherwise
> just call with the provided context and my inputs.

Agreed, such adjustments should be part of a common layer.
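
To make sure I follow, here's roughly the shape I picture for such a context (hypothetical names, not actual llm or llm-flows code):

  ;; Hypothetical sketch: a context bundles the provider with the
  ;; tunable call settings, separate from the per-call input.
  (require 'cl-lib)

  (cl-defstruct my-llm-context
    provider   ; static: the llm provider object
    settings)  ; dynamic: plist such as (:temperature 0.2 :max-tokens 256)

  (defun my-llm-context-with (context &rest settings)
    "Return a copy of CONTEXT with SETTINGS overriding its settings.
  New entries are prepended, so `plist-get' sees them first."
    (let ((new (copy-my-llm-context context)))
      (setf (my-llm-context-settings new)
            (append settings (my-llm-context-settings new)))
      new))

  ;; A package author then only pipes inputs and integrates outputs:
  ;; (my-llm-call (my-llm-context-with ctx :temperature 0.9) "input")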


> Providers offer multiple models. As a library user, it's
> inconvenient if I have to go through long incantations to get each
> context that represents the capability to make valid calls for the
> provider. I want to initialize once, use an existing context to pull
> out the correct context based on the input or output type I need,
> and then make refinements that are specific to a call, such as
> changing quality or entropy. Input or output type and settings that
> tune the call are two different things. Settings are mostly
> provider-specific argument data that doesn't affect the validity of
> connecting one model to another. Input and output type affect which
> pipes can be connected to which other pipes. This distinction
> between input or output types and other arguments becomes important
> in composition. I should be able to connect any string-to-string
> model with any other model that handles strings, no matter what the
> other settings are.

I think we're on the same page here. Anything for quality tuning should be generic, I hope - perhaps a knob for the quality-to-price tradeoff that can be applied to many things, including deciding how much context to provide. The rest is already generic in the llm package.
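
The pipe-connecting part could be as simple as checking declared types (again, everything here is invented for illustration):

  ;; Hypothetical: models declare input/output types; composition
  ;; checks the types, ignoring provider-specific settings.
  (require 'cl-lib)

  (cl-defstruct my-llm-model
    name
    input-type    ; e.g. 'string or 'image
    output-type   ; e.g. 'string
    call-fn)      ; function from input to output

  (defun my-llm-compose (m1 m2)
    "Return a model that pipes M1's output into M2, if the types match."
    (unless (eq (my-llm-model-output-type m1) (my-llm-model-input-type m2))
      (error "Cannot connect %s -> %s"
             (my-llm-model-name m1) (my-llm-model-name m2)))
    (make-my-llm-model
     :name (format "%s|%s" (my-llm-model-name m1) (my-llm-model-name m2))
     :input-type (my-llm-model-input-type m1)
     :output-type (my-llm-model-output-type m2)
     :call-fn (lambda (input)
                (funcall (my-llm-model-call-fn m2)
                         (funcall (my-llm-model-call-fn m1) input)))))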

> Integrating these systems will be more like distributed streaming
> programming than feeding inputs to a GPU with tight synchronization
> and everything under our watch, although a local model might work
> that way inside its own box. We should treat them like unreliable
> external services. A call to the model is a command. When I send a
> command, I should store how to handle the reply, but I shouldn't
> couple myself to it with nested callbacks or async/await, which we
> fortunately don't have anyway. The call data just goes into a pile.
> If the reply shows up and it matches a call, we handle it. If things
> time out, we dead-letter and drop the record of making the call.
> This is a very good way to get around the limitations of the process
> as our main asynchronous primitive for now. It works for big
> distributed services, which by their very nature cannot lock each
> other or share memory. It will work for connecting many models to
> each other.

I'm not sure I understand this part. Yes, we can have a system that stores callbacks in some hashmap or something, and that's better than tying them directly to a specific process. However, something still has to notice when the process is done or has timed out, and that something is the process itself, so I'm not sure how the centralized storage reduces the coupling to the process. But if I'm reading this correctly, it seems like an argument for using state machines, with the centralized storage acting as a driver for state changes, which may be a good way to think about this.
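
If I'm reading it right, the pile of calls would be something like this (entirely hypothetical, just to check my understanding):

  ;; Hypothetical pending-call registry: commands go into a pile keyed
  ;; by id, replies are matched against it, and a timer dead-letters
  ;; anything that has waited too long.
  (require 'cl-lib)

  (defvar my-llm-pending (make-hash-table :test #'equal)
    "Map of call id -> (HANDLER . EXPIRY-TIME).")

  (defvar my-llm--next-id 0)

  (defun my-llm-send (command handler &optional timeout)
    "Register HANDLER for COMMAND's reply; give up after TIMEOUT seconds."
    (let ((id (format "call-%d" (cl-incf my-llm--next-id))))
      (puthash id (cons handler (time-add nil (or timeout 30)))
               my-llm-pending)
      ;; ... actually issue COMMAND to the backend, tagged with ID ...
      id))

  (defun my-llm-handle-reply (id reply)
    "Dispatch REPLY to the handler registered for ID, if any."
    (pcase (gethash id my-llm-pending)
      (`(,handler . ,_)
       (remhash id my-llm-pending)
       (funcall handler reply))))

  (run-with-timer 5 5
                  (lambda ()
                    (let (expired)
                      (maphash (lambda (id entry)
                                 (when (time-less-p (cdr entry) nil)
                                   (push id expired)))
                               my-llm-pending)
                      (dolist (id expired)
                        (remhash id my-llm-pending)))))

Each entry in the table is effectively a "waiting" state, which is what makes me read your suggestion as a state machine with the table as the driver.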

Thank you for your thorough and thoughtful response!


