From: ELPA Syncer
Subject: [elpa] externals/llm 23616e6cf5 1/2: Upgrade Google Cloud Vertex to Gemini
Date: Thu, 14 Dec 2023 21:58:10 -0500 (EST)
branch: externals/llm
commit 23616e6cf597b2e5be6824b645751ca2b790ba3b
Author: Andrew Hyatt <ahyatt@gmail.com>
Commit: Andrew Hyatt <ahyatt@gmail.com>
Upgrade Google Cloud Vertex to Gemini
Also make it so llm-chat-async falls back to llm-chat-streaming, which is
useful for Gemini, which is streaming-only (via Vertex).
---
NEWS.org | 3 +
README.org | 7 ++-
llm-vertex.el | 197 +++++++++++++---------------------------------------------
llm.el | 10 ++-
4 files changed, 61 insertions(+), 156 deletions(-)
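As a quick illustration of the fallback described in the commit message, here is a minimal sketch (not part of the commit) of how a caller might use llm-chat-async with the now Gemini-only Vertex provider; the project number is a placeholder and gcloud authentication is assumed to be already set up.

#+begin_src emacs-lisp
;; Minimal sketch, not part of the commit.  "12345678901" is a placeholder
;; Google Cloud project number.
(require 'llm)
(require 'llm-vertex)

(let ((provider (make-llm-vertex :project "12345678901")))
  ;; llm-vertex now only implements `llm-chat-streaming', so this call goes
  ;; through the new default `llm-chat-async', which delegates to streaming.
  (llm-chat-async provider
                  (llm-make-simple-chat-prompt "Hello, Gemini")
                  (lambda (text) (message "Response: %s" text))
                  (lambda (_type msg) (message "Error: %s" msg))))
#+end_src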
diff --git a/NEWS.org b/NEWS.org
index 4d9ca60b51..e3f750691a 100644
--- a/NEWS.org
+++ b/NEWS.org
@@ -1,3 +1,6 @@
+* Version 0.7
+- Upgrade Google Cloud Vertex to Gemini - previous models are no longer available.
+- Provide default for ~llm-chat-async~ to fall back to streaming if not defined for a provider.
 * Version 0.6
 - Add provider =llm-llamacpp=.
 - Fix issue with Google Cloud Vertex not responding to messages with a system interaction.
diff --git a/README.org b/README.org
index d0d5accc57..df994e3fd6 100644
--- a/README.org
+++ b/README.org
@@ -24,7 +24,7 @@ You can set up with ~make-llm-openai~, with the following parameters:
 - ~:key~, the Open AI key that you get when you sign up to use Open AI's APIs. Remember to keep this private. This is non-optional.
 - ~:chat-model~: A model name from the [[https://platform.openai.com/docs/models/gpt-4][list of Open AI's model names.]] Keep in mind some of these are not available to everyone. This is optional, and will default to a reasonable 3.5 model.
 - ~:embedding-model~: A model name from [[https://platform.openai.com/docs/guides/embeddings/embedding-models][list of Open AI's embedding model names.]] This is optional, and will default to a reasonable model.
-** Vertex
+** Vertex (Gemini via Google Cloud)
 You can set up with ~make-llm-vertex~, with the following parameters:
 - ~:project~: Your project number from Google Cloud that has Vertex API enabled.
 - ~:chat-model~: A model name from the [[https://cloud.google.com/vertex-ai/docs/generative-ai/chat/chat-prompts#supported_model][list of Vertex's model names.]] This is optional, and will default to a reasonable model.
@@ -33,6 +33,11 @@ You can set up with ~make-llm-vertex~, with the following parameters:
 In addition to the provider, which you may want multiple of (for example, to charge against different projects), there are customizable variables:
 - ~llm-vertex-gcloud-binary~: The binary to use for generating the API key.
 - ~llm-vertex-gcloud-region~: The gcloud region to use. It's good to set this to a region near where you are for best latency. Defaults to "us-central1".
+
+If you haven't already, you must run the following command before using this:
+#+begin_src sh
+gcloud beta services identity create --service=aiplatform.googleapis.com --project=PROJECT_ID
+#+end_src
 ** Ollama
 [[https://ollama.ai/][Ollama]] is a way to run large language models locally. There are [[https://ollama.ai/library][many different models]] you can use with it. You set it up with the following parameters:
 - ~:scheme~: The scheme (http/https) for the connection to ollama. This default to "http".
diff --git a/llm-vertex.el b/llm-vertex.el
index 485bd2046d..a931900d6e 100644
--- a/llm-vertex.el
+++ b/llm-vertex.el
@@ -56,7 +56,7 @@ and there is no default. The maximum value possible here is 2049."
   :type 'integer
   :group 'llm-vertex)
 
-(defcustom llm-vertex-default-chat-model "chat-bison"
+(defcustom llm-vertex-default-chat-model "gemini-pro"
   "The default model to ask for.
 This should almost certainly be a chat model, other models are
 for more specialized uses."
@@ -111,10 +111,10 @@ KEY-GENTIME keeps track of when the key was generated, because the key must be r
 
 (defun llm-vertex--error-message (err-response)
   "Return a user-visible error message from ERR-RESPONSE."
-  (format "Problem calling GCloud Vertex AI: status: %s message: %s (%s)"
-          (assoc-default 'status (assoc-default 'error err-response))
-          (assoc-default 'message (assoc-default 'error err-response))
-          err-response))
+  (let ((err (assoc-default 'error (aref err-response 0))))
+    (format "Problem calling GCloud Vertex AI: status: %s message: %s"
+            (assoc-default 'code err)
+            (assoc-default 'message err))))
 
 (defun llm-vertex--handle-response (response extractor)
   "If RESPONSE is an error, throw it, else call EXTRACTOR."
@@ -144,34 +144,18 @@ KEY-GENTIME keeps track of when the key was generated, because the key must be r
                      :data `(("instances" . [(("content" . ,string))])))
    #'llm-vertex--embedding-extract-response))
 
-(defun llm-vertex--parameters-ui (prompt)
-  "Return a alist setting parameters, appropriate for the ui API.
-If nothing needs to be set, return nil."
-  (let ((param-struct-alist))
-    (when (llm-chat-prompt-temperature prompt)
-      (push `("temperature" . (("float_val" . ,(llm-chat-prompt-temperature prompt)))) param-struct-alist))
-    (when (llm-chat-prompt-max-tokens prompt)
-      (push `("maxOutputTokens" . (("int_val" . ,(llm-chat-prompt-max-tokens prompt)))) param-struct-alist))
-    ;; Wrap in the "parameters" and "struct_val" keys
-    (if param-struct-alist
-        `(("parameters" . (("struct_val" . ,param-struct-alist)))))))
-
 (defun llm-vertex--get-chat-response-streaming (response)
   "Return the actual response from the RESPONSE struct returned.
 This handles different kinds of models."
   (pcase (type-of response)
     ('vector (mapconcat #'llm-vertex--get-chat-response-streaming
                         response ""))
-    ('cons (let* ((outputs (assoc-default 'outputs response))
-                  (structVal-list (assoc-default 'structVal (aref outputs 0)))
-                  (candidates (assoc-default 'candidates structVal-list)))
-             (if candidates
-                 (let* ((listVal (assoc-default 'listVal candidates))
-                        (structVal (assoc-default 'structVal (aref listVal 0)))
-                        (content (assoc-default 'content structVal))
-                        (stringVal (aref (assoc-default 'stringVal content) 0)))
-                   stringVal)
-               (aref (assoc-default 'stringVal (assoc-default 'content structVal-list)) 0))))))
+    ('cons (let ((parts (assoc-default 'parts
+                                       (assoc-default 'content
+                                                      (aref (assoc-default 'candidates response) 0)))))
+             (if parts
+                 (assoc-default 'text (aref parts 0))
+               "")))))
 
 (defun llm-vertex--get-partial-chat-ui-repsonse (response)
   "Return the partial response from as much of RESPONSE as we can parse.
@@ -217,46 +201,26 @@ If there are no non-interaction parts, return nil."
     (when system-prompt
       (mapconcat #'identity (nreverse system-prompt) "\n"))))
 
-(defun llm-vertex--chat-request-streaming (prompt model)
+(defun llm-vertex--chat-request-streaming (prompt)
   "Return an alist with chat input for the streaming API.
-PROMPT contains the input to the call to the chat API. MODEL
-contains the model to use, which can change the request."
-  (let ((system-prompt (llm-vertex--collapsed-system-prompt prompt)))
+PROMPT contains the input to the call to the chat API."
+  (let ((system-prompt (llm-vertex--collapsed-system-prompt prompt)))
+    (when (and (= 1 (length (llm-chat-prompt-interactions prompt))) system-prompt)
+      (setf (llm-chat-prompt-interaction-content (car (llm-chat-prompt-interactions prompt)))
+            (concat system-prompt "\n" (llm-chat-prompt-interaction-content
+                                        (car (llm-chat-prompt-interactions prompt))))))
     (append
-     `(("inputs" . ((("struct_val" .
-                      ,(if (string-match-p "text-bison" model)
-                           (progn
-                             (unless (= 1 (length (llm-chat-prompt-interactions prompt)))
-                               (error "Vertex model 'text-bison' must contain only one interaction"))
-                             `(("prompt" . (("string_val" .
-                                             [,(format "'\"%s\"'"
-                                                       (concat system-prompt (when system-prompt "\n")
-                                                               (llm-chat-prompt-interaction-content
-                                                                (car (llm-chat-prompt-interactions prompt )))))])))))
-                         `(("messages" .
-                            (("list_val" .
-                              ,(mapcar (lambda (interaction)
-                                         `(("struct_val" . (("content" .
-                                                             (("string_val" .
-                                                               (,(format "'\"%s\"'"
-                                                                         (llm-chat-prompt-interaction-content
-                                                                          interaction))))))
-                                                            ("author" .
-                                                             (("string_val" .
-                                                               ,(format "'\"%s\"'"
-                                                                        (pcase (llm-chat-prompt-interaction-role interaction)
-                                                                          ('user "user")
-                                                                          ('system "system")
-                                                                          ('assistant "assistant"))))))))))
-                                       ;; Only append the system prompt if this is the first message of the conversation.
-                                       (if (and system-prompt (= (length (llm-chat-prompt-interactions prompt)) 1))
-                                           (cons (make-llm-chat-prompt-interaction
-                                                  :role 'user
-                                                  :content (concat system-prompt (llm-chat-prompt-interaction-content
-                                                                                  (car (llm-chat-prompt-interactions prompt)))))
-                                                 (cdr (llm-chat-prompt-interactions prompt)))
-                                         (llm-chat-prompt-interactions prompt)))))))))))))
-     (llm-vertex--parameters-ui prompt))))
+     `((contents
+        .
+        ,(mapcar (lambda (interaction)
+                   `((role . ,(pcase (llm-chat-prompt-interaction-role interaction)
+                                ('user "USER")
+                                ('assistant "ASSISTANT")))
+                     (parts .
+                            ((text . ,(llm-chat-prompt-interaction-content
+                                       interaction))))))
+                 (llm-chat-prompt-interactions prompt))))
+     (llm-vertex--chat-parameters prompt))))
 
 (defun llm-vertex--chat-parameters (prompt)
   "From PROMPT, create the parameters section.
@@ -269,110 +233,37 @@ nothing to add, in which case it is nil."
     (when (llm-chat-prompt-max-tokens prompt)
       (push `(maxOutputTokens . ,(llm-chat-prompt-max-tokens prompt)) params-alist))
     (when params-alist
-      `(parameters . ,params-alist))))
+      `((generation_config . ,params-alist)))))
 
-(defun llm-vertex--text-request (prompt)
-  "From PROMPT, create the data for the vertex text reequest.
-The text request can only have one interaction."
-  (unless (= 1 (length (llm-chat-prompt-interactions prompt)))
-    (error "Model text-bison can only have 1 prompt interaction"))
-  (let ((system-prompt (llm-vertex--collapsed-system-prompt prompt)))
-    (append
-     `((instances . [((prompt . ,(concat system-prompt
-                                         (when system-prompt "\n")
-                                         (llm-chat-prompt-interaction-content
-                                          (car (llm-chat-prompt-interactions prompt))))))]))
-     (let ((params (llm-vertex--chat-parameters (let ((p (copy-llm-chat-prompt prompt)))
-                                                  ;; For some reason vertex requires max-tokens
-                                                  (setf (llm-chat-prompt-max-tokens p)
-                                                        llm-vertex-default-max-output-tokens)
-                                                  p))))
-       (when params (list params))))))
-
-(defun llm-vertex--chat-request-v1 (prompt model)
-  "From PROMPT, create the data for the vertex chat request."
-  (if (string-match-p "text-bison" model)
-      (llm-vertex--text-request prompt)
-    (let ((prompt-alist))
-      (when (llm-chat-prompt-context prompt)
-        (push `("context" . ,(llm-chat-prompt-context prompt)) prompt-alist))
-      (when (llm-chat-prompt-examples prompt)
-        (push `("examples" . ,(apply #'vector
-                                     (mapcar (lambda (example)
-                                               `(("input" . (("content" . ,(car example))))
-                                                 ("output" . (("content" . ,(cdr example))))))
-                                             (llm-chat-prompt-examples prompt))))
-              prompt-alist))
-      (push `("messages" . ,(apply #'vector
-                                   (mapcar (lambda (interaction)
-                                             `(("author" . (pcase (llm-chat-prompt-interaction-role interaction)
-                                                             ('user "user")
-                                                             ('system (error "System role not supported"))
-                                                             ('assistant "assistant")))
-                                               ("content" . ,(llm-chat-prompt-interaction-content interaction))))
-                                           (llm-chat-prompt-interactions prompt))))
-            prompt-alist)
-      (append
-       `(("instances" . [,prompt-alist]))
-       (let ((params (llm-vertex--chat-parameters prompt)))
-         (when params (list params)))))))
-
-(defun llm-vertex--chat-url (provider streaming)
+(defun llm-vertex--chat-url (provider)
   "Return the correct url to use for PROVIDER.
 If STREAMING is non-nil, use the URL for the streaming API."
-  (format "https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s/publishers/google/models/%s:%s"
+  (format "https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s/publishers/google/models/%s:streamGenerateContent"
          llm-vertex-gcloud-region
          (llm-vertex-project provider)
          llm-vertex-gcloud-region
-         (llm-vertex-chat-model provider)
-         (if streaming "serverStreamingPredict" "predict")))
-
-(defun llm-vertex--chat-extract-response (response)
-  "Return the chat response contained in the server RESPONSE.
-This should handle the various kinds of responses that the
-different models can return."
-  (let* ((predictions (aref (assoc-default 'predictions response) 0))
-         (candidates (assoc-default 'candidates predictions)))
-    (if candidates
-        (assoc-default 'content (aref candidates 0))
-      (assoc-default 'content predictions))))
-
-(cl-defmethod llm-chat-async ((provider llm-vertex) prompt response-callback error-callback)
-  (llm-vertex-refresh-key provider)
-  (let ((buf (current-buffer)))
-    (llm-request-async (llm-vertex--chat-url provider nil)
-                       :headers `(("Authorization" . ,(format "Bearer %s" (llm-vertex-key provider))))
-                       :data (llm-vertex--chat-request-v1 prompt (llm-vertex-chat-model provider))
-                       :on-success (lambda (data)
-                                     (let ((response (llm-vertex--chat-extract-response data)))
-                                       (setf (llm-chat-prompt-interactions prompt)
-                                             (append (llm-chat-prompt-interactions prompt)
-                                                     (list (make-llm-chat-prompt-interaction :role 'assistant :content response))))
-                                       (llm-request-callback-in-buffer buf response-callback response)))
-                       :on-error (lambda (_ data)
-                                   (llm-request-callback-in-buffer buf error-callback 'error
-                                                                   (llm-vertex--error-message data))))))
+         (llm-vertex-chat-model provider)))
+
+;; API reference: https://cloud.google.com/vertex-ai/docs/generative-ai/multimodal/send-chat-prompts-gemini#gemini-chat-samples-drest
 (cl-defmethod llm-chat ((provider llm-vertex) prompt)
+  ;; Gemini just has a streaming response, but we can just call it synchronously.
   (llm-vertex-refresh-key provider)
-  (let ((response (llm-vertex--handle-response
-                   (llm-request-sync
-                    (llm-vertex--chat-url provider nil)
-                    :headers `(("Authorization" . ,(format "Bearer %s" (llm-vertex-key provider))))
-                    :data (llm-vertex--chat-request-v1 prompt (llm-vertex-chat-model provider)))
-                   #'llm-vertex--chat-extract-response)))
+  (let ((response (llm-vertex--get-chat-response-streaming
+                   (llm-request-sync (llm-vertex--chat-url provider)
+                                     :headers `(("Authorization" . ,(format "Bearer %s" (llm-vertex-key provider))))
+                                     :data (llm-vertex--chat-request-streaming prompt)))))
    (setf (llm-chat-prompt-interactions prompt)
          (append (llm-chat-prompt-interactions prompt)
                  (list (make-llm-chat-prompt-interaction :role 'assistant :content response))))
    response))
 
-;; API reference: https://cloud.google.com/vertex-ai/docs/generative-ai/learn/streaming
 (cl-defmethod llm-chat-streaming ((provider llm-vertex) prompt partial-callback response-callback error-callback)
   (llm-vertex-refresh-key provider)
   (let ((buf (current-buffer)))
-    (llm-request-async (llm-vertex--chat-url provider t)
+    (llm-request-async (llm-vertex--chat-url provider)
                        :headers `(("Authorization" . ,(format "Bearer %s" (llm-vertex-key provider))))
-                       :data (llm-vertex--chat-request-streaming prompt (llm-vertex-chat-model provider))
+                       :data (llm-vertex--chat-request-streaming prompt)
                        :on-partial (lambda (partial)
                                      (when-let ((response (llm-vertex--get-partial-chat-ui-repsonse partial)))
                                        (llm-request-callback-in-buffer buf partial-callback response)))
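With the default region and chat model, the single endpoint the provider now targets looks like the sketch below; the project number is a placeholder.

#+begin_src emacs-lisp
;; Illustrative only; "12345678901" is a placeholder project number.
(llm-vertex--chat-url (make-llm-vertex :project "12345678901"))
;; => "https://us-central1-aiplatform.googleapis.com/v1/projects/12345678901/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent"
#+end_src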
@@ -414,8 +305,8 @@ MODEL "
    (llm-request-sync (llm-vertex--count-token-url provider)
                      :headers `(("Authorization" . ,(format "Bearer %s" (llm-vertex-key provider))))
                      :data (llm-vertex--to-count-token-request
-                            (llm-vertex--chat-request-v1
-                             (llm-make-simple-chat-prompt string) (llm-vertex-chat-model provider))))
+                            (llm-vertex--chat-request-streaming
+                             (llm-make-simple-chat-prompt string))))
    #'llm-vertex--count-tokens-extract-response))
(provide 'llm-vertex)
diff --git a/llm.el b/llm.el
index f77cb60360..27decd722b 100644
--- a/llm.el
+++ b/llm.el
@@ -152,8 +152,14 @@ ERROR-CALLBACK receives the error response.
The prompt's interactions list will be updated to encode the
conversation so far."
-  (ignore provider prompt response-callback error-callback)
-  (signal 'not-implemented nil))
+  ;; By default, you can turn a streaming call into an async call, so we can
+  ;; fall back to streaming if async is not populated.
+  (llm-chat-streaming provider prompt
+                      ;; Do nothing on partial callback
+                      (lambda (_))
+                      (lambda (text)
+                        (funcall response-callback text))
+                      (lambda (err msg) (funcall error-callback err msg))))
 
 (cl-defgeneric llm-chat-streaming (provider prompt partial-callback response-callback error-callback)
   "Stream a response to PROMPT from PROVIDER.