
"model": "llama2",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
}'
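Advanced parameters can also be carried in the request, for example keep_alive. Its default is 5 minutes: if nothing happens for 5 minutes, the model is unloaded and its memory is released. If it is set to -1, the model stays loaded in memory.
Format of the returned response: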
{
"model": "llama2",
"created_at": "2023-11-09T21:07:55.186497Z",
"response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
"done": true,
"context": [1, 2, 3],
"total_duration": 4648158584,
"load_duration": 4071084,
"prompt_eval_count": 36,
"prompt_eval_duration": 439038000,
"eval_count": 180,
"eval_duration": 4196918000
}
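Two details of this response are easy to miss. With "format": "json", the response field is itself a JSON document stored as a string, so it needs a second parse. The duration fields are, according to the Ollama API documentation, expressed in nanoseconds. The following is a minimal sketch using an abbreviated copy of the sample above; the helper names are illustrative and not part of any Ollama client.

```python
import json

# Abbreviated copy of the sample response shown above.
sample = {
    "model": "llama2",
    "response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n}\n}\n",
    "done": True,
    "eval_count": 180,
    "eval_duration": 4196918000,
}


def parse_json_answer(response_dict):
    """Decode the model's answer, which is JSON stored inside a string."""
    return json.loads(response_dict["response"])


def tokens_per_second(response_dict):
    """Generation speed, assuming eval_duration is in nanoseconds."""
    return response_dict["eval_count"] / response_dict["eval_duration"] * 1e9


answer = parse_json_answer(sample)
print(answer["morning"]["color"])
print(f"{tokens_per_second(sample):.1f} tokens/s")
```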
To access the API from PowerShell, the format is:
(Invoke-WebRequest -method POST -Body '{"model":"llama2", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json
Accessing the API from Python:
import json
import requests

url_generate = "http://localhost:11434/api/generate"

def get_response(url, data):
    # POST the JSON payload and return only the generated text
    response = requests.post(url, json=data)
    response_dict = json.loads(response.text)
    response_content = response_dict["response"]
    return response_content

data = {
    "model": "gemma:7b",
    "prompt": "Why is the sky blue?",
    "stream": False
}
res = get_response(url_generate, data)
print(res)
The code above calls the endpoint from Python. Because it can be invoked directly from program code, it suits batch jobs that generate many results.
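To make the batch use case concrete, here is a minimal sketch that sends one request per prompt and collects the answers. It uses the standard library's urllib instead of requests so that it has no third-party dependency; run_batch, send_generate and the send parameter are hypothetical names introduced for illustration.

```python
import json
import urllib.request

URL_GENERATE = "http://localhost:11434/api/generate"


def send_generate(payload, url=URL_GENERATE):
    """POST one payload and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))


def run_batch(prompts, model="gemma:7b", send=send_generate):
    """Return a list of (prompt, generated text) pairs, one request per prompt."""
    results = []
    for prompt in prompts:
        payload = {"model": model, "prompt": prompt, "stream": False}
        results.append((prompt, send(payload)["response"]))
    return results
```

With a local server running, run_batch(["Why is the sky blue?", "Why is grass green?"]) would return both answers in order. The send parameter exists only so that the network call can be swapped out, for example in tests.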
Ordinary requests omit options entirely. options can set many parameters, such as temperature, whether to use the GPU, and the context length. Below is a request that includes options:
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?",
"stream": false,
"options": {
"num_keep": 5,
"seed": 42,
"num_predict": 100,
"top_k": 20,
"top_p": 0.9,
"tfs_z": 0.5,
"typical_p": 0.7,
"repeat_last_n": 33,
"temperature": 0.8,
"repeat_penalty": 1.2,
"presence_penalty": 1.5,
"frequency_penalty": 1.0,
"mirostat": 1,
"mirostat_tau": 0.8,
"mirostat_eta": 0.6,
"penalize_newline": true,
"stop": ["\n", "user:"],
"numa": false,
"num_ctx": 1024,
"num_batch": 2,
"num_gqa": 1,
"num_gpu": 1,
"main_gpu": 0,
"low_vram": false,
"f16_kv": true,
"vocab_only": false,
"use_mmap": true,
"use_mlock": false,
"rope_frequency_base": 1.1,
"rope_frequency_scale": 0.8,
"num_thread": 8
}
}'
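A request rarely needs every option listed above; usually only a few are overridden and the rest are left out. The sketch below builds such a payload in Python. build_generate_payload is a hypothetical helper, and placing keep_alive at the top level of the request (rather than inside options) follows the description of keep_alive as a request parameter.

```python
import json


def build_generate_payload(model, prompt, keep_alive=None, **options):
    """Build a /api/generate payload that carries only the options given."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        payload["options"] = dict(options)
    if keep_alive is not None:
        # keep_alive is a top-level request field, not a model option
        payload["keep_alive"] = keep_alive
    return payload


payload = build_generate_payload(
    "llama2",
    "Why is the sky blue?",
    keep_alive=-1,
    temperature=0.8,
    seed=42,
    num_ctx=1024,
)
print(json.dumps(payload, indent=2))
```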
Format
POST /api/chat
This is very similar to the generate-completion endpoint above.
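Parameters
model: (required) the model name
messages: the messages of the chat; this can be used to keep a chat memory
The message object has the following fields:
role: the role of the message, one of system, user or assistant
content: the content of the message
images (optional): a list of images to include in the message (for multimodal models such as llava)
Advanced parameters (optional):
format: the format in which to return the response. Currently the only accepted value is json
options: additional model parameters listed in the Modelfile documentation, such as temperature
template: the prompt template to use (overrides what is defined in the Modelfile)
stream: if false, the response is returned as a single response object rather than a stream of objects
keep_alive: controls how long the model stays loaded in memory after the request (default: 5m)
Send a chat request: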
curl http://localhost:11434/api/chat -d '{
"model": "llama2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
]
}'
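The difference from generate is that messages takes the place of prompt. prompt is followed directly by the text to send, whereas each message also carries a role; a message with the user role holds the question being asked.
Content of the returned response (the request above does not set stream to false, so this is the final object of the stream):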
{
"model": "llama2",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 4883583458,
"load_duration": 1334875,
"prompt_eval_count": 26,
"prompt_eval_duration": 342546000,
"eval_count": 282,
"eval_duration": 4535599000
}
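When stream is left at its default, the reply arrives as one JSON object per line, and only the last object has "done": true together with the statistics. The sketch below joins the fragments back into one text. It assumes, following the Ollama API documentation rather than anything shown here, that each intermediate object carries its piece of the reply in message.content; the sample lines are hand-written, not captured output.

```python
import json


def collect_stream(lines):
    """Join streamed message fragments; return (text, final_object)."""
    parts = []
    final = {}
    for line in lines:
        if not line.strip():
            continue  # skip keep-alive blank lines
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            final = chunk
            break
    return "".join(parts), final


# Hand-written sample lines in the assumed streaming shape.
sample_lines = [
    '{"model": "llama2", "message": {"role": "assistant", "content": "The"}, "done": false}',
    '{"model": "llama2", "message": {"role": "assistant", "content": " sky"}, "done": false}',
    '{"model": "llama2", "done": true, "eval_count": 282}',
]
text, stats = collect_stream(sample_lines)
print(text)
print(stats["eval_count"])
```

With the requests package, the lines could come from requests.post(url, json=data, stream=True).iter_lines().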
You can also send a request that includes the chat history:
curl http://localhost:11434/api/chat -d '{
"model": "llama2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
},
{
"role": "assistant",
"content": "due to rayleigh scattering."
},
{
"role": "user",
"content": "how is that different than mie scattering?"
}
]
}'
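Since the server keeps no memory between requests, the client has to resend the whole history each time. A minimal sketch of that bookkeeping follows. ChatSession and its send parameter are hypothetical, and the reply shape ({"message": {"role": "assistant", "content": ...}} for a non-streamed request) is an assumption taken from the Ollama API documentation. The canned_send function stands in for a real HTTP call so the example runs offline.

```python
class ChatSession:
    """Keeps the message list and resends it with every question."""

    def __init__(self, model, send):
        self.model = model
        self.send = send  # callable: payload dict -> reply dict
        self.messages = []

    def ask(self, content):
        self.messages.append({"role": "user", "content": content})
        reply = self.send(
            {"model": self.model, "messages": list(self.messages), "stream": False}
        )
        answer = reply["message"]  # assumed shape: {"role": ..., "content": ...}
        self.messages.append(answer)
        return answer["content"]


def canned_send(payload):
    """Offline stand-in for POST /api/chat; reports how much history it got."""
    count = len(payload["messages"])
    return {"message": {"role": "assistant", "content": f"received {count} message(s)"}}


session = ChatSession("llama2", canned_send)
print(session.ask("why is the sky blue?"))
print(session.ask("how is that different than mie scattering?"))
```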
Generating a chat completion from Python:
import json
import requests

url_chat = "http://localhost:11434/api/chat"
data = {
    "model": "llama2",
    "messages": [
        {
            "role": "user",
            "content": "why is the sky blue?"
        }
    ],
    "stream": False
}
response = requests.post(url_chat, json=data)
response_dict = json.loads(response.text)
print(response_dict)
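Format
POST /api/create
name: the name of the model to create
modelfile (optional): the contents of the Modelfile
stream (optional): if false, the response is returned as a single response object rather than a stream of objects
path (optional): the path to the Modelfile
The modelfile field holds the Modelfile contents directly, such as which model to build on and which settings to apply. A request to create a model: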
curl http://localhost:11434/api/create -d '{
"name": "mario",
"modelfile": "FROM llama2\nSYSTEM You are mario from Super Mario Bros."
}'
This creates a model based on llama2 and sets its system role. The response is not covered in detail here.
Creating a model with Python:
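import json
import requests

url_create = "http://localhost:11434/api/create"
data = {
    "name": "mario",
    "modelfile": "FROM llama2\nSYSTEM You are mario from Super Mario Bros.",
    # without this the endpoint streams several JSON objects and json.loads fails
    "stream": False
}
response = requests.post(url_create, json=data)
response_dict = json.loads(response.text)
print(response_dict)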
This Python code does the same thing as the curl request above.
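Writing the Modelfile text by hand with embedded \n quickly becomes error-prone. The sketch below assembles it from parts. build_modelfile is a hypothetical helper; FROM, PARAMETER and SYSTEM are Modelfile instructions, and the parameter shown is only an example.

```python
def build_modelfile(base_model, system_prompt=None, parameters=None):
    """Return Modelfile text: FROM line, optional PARAMETER lines, optional SYSTEM line."""
    lines = [f"FROM {base_model}"]
    for key, value in (parameters or {}).items():
        lines.append(f"PARAMETER {key} {value}")
    if system_prompt:
        lines.append(f"SYSTEM {system_prompt}")
    return "\n".join(lines)


modelfile = build_modelfile(
    "llama2",
    system_prompt="You are mario from Super Mario Bros.",
    parameters={"temperature": 0.8},
)
print(modelfile)
```

The resulting string can be used as the value of the modelfile field in the create request.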
Format
GET /api/tags
Lists all local models.
Listing the models with Python:
import json
import requests

url_list = "http://localhost:11434/api/tags"

def get_list(url):
    response = requests.get(url)
    response_dict = json.loads(response.text)
    names = [model["name"] for model in response_dict["models"]]
    # print the names of all models, numbered from 1
    for idx, name in enumerate(names, start=1):
        print(f"{idx}. {name}")
    return names

get_list(url_list)
Output:
1. codellama:13b
2. codellama:7b-code
3. gemma:2b
4. gemma:7b
5. gemma_7b:latest
6. gemma_sumary:latest
7. llama2:7b
8. llama2:latest
9. llava:7b
10. llava:v1.6
11. mistral:latest
12. mistrallite:latest
13. nomic-embed-text:latest
14. qwen:1.8b
15. qwen:4b
16. qwen:7b
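With this many local models, it can help to group the names by family, that is, the part before the colon. A small sketch working on a list of names like the one returned above; group_by_family is an illustrative helper, and treating a missing tag as latest is an assumption.

```python
from collections import defaultdict


def group_by_family(names):
    """Map each model family to the list of its tags, in input order."""
    groups = defaultdict(list)
    for name in names:
        family, _, tag = name.partition(":")
        groups[family].append(tag or "latest")
    return dict(groups)


names = ["codellama:13b", "codellama:7b-code", "gemma:2b", "gemma:7b", "llama2:latest"]
for family, tags in group_by_family(names).items():
    print(f"{family}: {', '.join(tags)}")
```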
Format
POST /api/show
Shows information about a model, including details, Modelfile, template, parameters, license and system prompt.
Parameters
name: the name of the model to show
Request
curl http://localhost:11434/api/show -d '{
"name": "llama2"
}'
Showing model information with Python:
import json
import requests

url_show_info = "http://localhost:11434/api/show"

def show_model_info(url, model_name):
    data = {
        "name": model_name
    }
    response = requests.post(url, json=data)
    response_dict = json.loads(response.text)
    print(response_dict)

show_model_info(url_show_info, "gemma:7b")
Result:
{'license': 'Gemma Terms of Use \n\nLast modified: February 21, 2024\n\nBy using, reproducing, modifying, distributing, performing or displaying any portion or element of Gemma, Model Derivatives including via any Hosted Service, (each as defined below) (collectively, the "Gemma Services") or otherwise accepting the terms of this Agreement, you agree to be bound by this Agreement.\n\nSection 1: DEFINITIONS\n1.1 Definitions\n(a) "Agreement" or "Gemma Terms of Use" means these terms and conditions that govern the use, reproduction, Distribution or modification of the Gemma Services and any terms and conditions incorporated by reference.\n\n(b) "Distribution" or "Distribute" means any transmission, publication, or other sharing of Gemma or Model Derivatives to a third party, including by providing or making Gemma or its functionality available as a hosted service via API, web access, or any other electronic or remote means ("Hosted Service").\n\n(c) "Gemma" means the set of machine learning language models, trained model weights and parameters identified at ai.google.dev/gemma, regardless of the source that you obtained it from.\n\n(d) "Google" means Google LLC.\n\n(e) "Model Derivatives" means all (i) modifications to Gemma, (ii) works based on Gemma, or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Gemma, to that model in order to cause that model to perform similarly to Gemma, including distillation methods that use intermediate data representations or methods based on the generation of synthetic data Outputs by Gemma for training that model. 
For clarity, Outputs are not deemed Model Derivatives.\n\n(f) "Output" means the information content output of Gemma or a Model Derivative that results from operating or otherwise using Gemma or the Model Derivative, including via a Hosted Service.\n\n1.2\nAs used in this Agreement, "including" means "including without limitation".\n\nSection 2: ELIGIBILITY AND USAGE\n2.1 Eligibility\nYou represent and warrant that you have the legal capacity to enter into this Agreement (including being of sufficient age of consent). If you are accessing or using any of the Gemma Services for or on behalf of a legal entity, (a) you are entering into this Agreement on behalf of yourself and that legal entity, (b) you represent and warrant that you have the authority to act on behalf of and bind that entity to this Agreement and (c) references to "you" or "your" in the remainder of this Agreement refers to both you (as an individual) and that entity.\n\n2.2 Use\nYou may use, reproduce, modify, Distribute, perform or display any of the Gemma Services only in accordance with the terms of this Agreement, and must not violate (or encourage or permit anyone else to violate) any term of this Agreement.\n\nSection 3: DISTRIBUTION AND RESTRICTIONS\n3.1 Distribution and Redistribution\nYou may reproduce or Distribute copies of Gemma or Model Derivatives if you meet all of the following conditions:\n\nYou must include the use restrictions referenced in Section 3.2 as an enforceable provision in any
.......
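Beyond the features above, you can also copy, delete and pull models. In addition, if you have an Ollama account, you can push models to the Ollama server.
Default storage location for Windows users: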
C:\Users\<username>\.ollama\models
To change the default storage location, set the OLLAMA_MODELS environment variable to the desired directory.
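A script that needs to know where the models live can apply the same rule: use OLLAMA_MODELS when it is set, otherwise fall back to the default under the user's home directory. A minimal sketch; resolve_models_dir is a hypothetical helper and only mirrors the rule described above, it does not query Ollama itself.

```python
import os
from pathlib import Path


def resolve_models_dir(environ=None):
    """Return the model directory: OLLAMA_MODELS if set, else ~/.ollama/models."""
    env = os.environ if environ is None else environ
    override = env.get("OLLAMA_MODELS")
    if override:
        return Path(override)
    return Path.home() / ".ollama" / "models"


print(resolve_models_dir())
```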
You may have gguf models downloaded from HuggingFace. These can be imported by creating a model from a Modelfile. Create a Modelfile:
FROM ./mistral-7b-v0.1.Q4_0.gguf
Create a new model from this Modelfile:
ollama create example -f Modelfile
example is the name of the new model; to use the model, simply refer to it by this name.
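When several gguf files need importing, the two steps can be scripted. The sketch below writes the one-line Modelfile and builds the ollama create command shown above as an argument list. write_modelfile and create_command are hypothetical helpers, and the command is only printed here, not executed.

```python
import tempfile
from pathlib import Path


def write_modelfile(gguf_path, directory):
    """Write a Modelfile containing a single FROM line; return its path."""
    path = Path(directory) / "Modelfile"
    path.write_text(f"FROM {gguf_path}\n", encoding="utf-8")
    return path


def create_command(model_name, modelfile_path):
    """Argument list equivalent to: ollama create <name> -f <Modelfile>."""
    return ["ollama", "create", model_name, "-f", str(modelfile_path)]


with tempfile.TemporaryDirectory() as workdir:
    modelfile = write_modelfile("./mistral-7b-v0.1.Q4_0.gguf", workdir)
    command = create_command("example", modelfile)
    print(modelfile.read_text(encoding="utf-8"), end="")
    print(" ".join(command))
```

To actually run it, the list can be passed to subprocess.run(command, check=True); make sure the path in the FROM line points at the real gguf file.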
When running a model normally, parameters are rarely set. When sending a request, however, they can be set through options, for example the number of context tokens:
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?",
"options": {
"num_ctx": 4096
}
}'
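The default is 2048; here it is changed to 4096. Other settings, such as whether to use the GPU, can be set through these parameters in the same way once the background service is running.
Ollama is compatible with the OpenAI API, so the openai package can call the Ollama backend service directly.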
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    # required but ignored
    api_key='ollama',
)
chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama2',
)
print(chat_completion)
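Result (the sample output below appears to come from a run that asked "Why is the sky blue?" rather than the test prompt above):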
ChatCompletion(id='chatcmpl-173', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='\nThe question " Why is the sky blue? " is a common one, and there are several reasons why the sky appears blue to our eyes. Here are some possible explanations:\n\n1. Rayleigh scattering: When sunlight enters Earth\'s atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen. These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths. This is known as Rayleigh scattering. As a result, the blue light is dispersed throughout the atmosphere, giving the sky its blue appearance.\n2. Mie scattering: In addition to Rayleigh scattering, there is also a phenomenon called Mie scattering, which occurs when light encounters much larger particles in the atmosphere, such as dust and water droplets. These particles can also scatter light, but they preferentially scatter longer (red) wavelengths, which can make the sky appear more red or orange during sunrise and sunset.\n3. Angel\'s breath: Another explanation for why the sky appears blue is due to a phenomenon called "angel\'s breath." This occurs when sunlight passes through a layer of cool air near the Earth\'s surface, causing the light to be scattered in all directions and take on a bluish hue.\n4. Optical properties of the atmosphere: The atmosphere has its own optical properties, which can affect how light is transmitted and scattered. For example, the atmosphere scatters shorter wavelengths (such as blue and violet) more than longer wavelengths (such as red and orange), which can contribute to the blue color of the sky.\n5. Perspective: The way we perceive the color of the sky can also be affected by perspective. From a distance, the sky may appear blue because our brains are wired to perceive blue as a color that is further away. 
This is known as the "Perspective Problem."\n\nIt\'s worth noting that the color of the sky can vary depending on the time of day, the amount of sunlight, and other environmental factors. For example, during sunrise and sunset, the sky may appear more red or orange due to the scattering of light by atmospheric particles.', role='assistant', function_call=None, tool_calls=None))], created=1710810193, model='llama2:7b', object='chat.completion', system_fingerprint='fp_ollama', usage=CompletionUsage(completion_tokens=498, prompt_tokens=34, total_tokens=532))
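The printed ChatCompletion object is hard to read; usually only the reply text and the token counts are needed. The sketch below pulls those out using the attribute names visible in the output above (choices, message.content, usage). To stay runnable without a server, it uses a stand-in object built with SimpleNamespace; first_reply and token_usage are illustrative helpers.

```python
from types import SimpleNamespace


def first_reply(completion):
    """Text of the first choice in a chat completion."""
    return completion.choices[0].message.content


def token_usage(completion):
    """Return (prompt_tokens, completion_tokens, total_tokens)."""
    usage = completion.usage
    return usage.prompt_tokens, usage.completion_tokens, usage.total_tokens


# Stand-in with the same attribute layout as the printed ChatCompletion.
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="This is a test.")
        )
    ],
    usage=SimpleNamespace(prompt_tokens=34, completion_tokens=498, total_tokens=532),
)
print(first_reply(fake_completion))
print(token_usage(fake_completion))
```

With a real client, the same helpers would be called on the object returned by client.chat.completions.create.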
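Finally, let's build a translation assistant. With so many large models trained on ample Chinese and Western-language corpora, having one act as a free translator should be no problem. I like looking for English resources online; sometimes they have no subtitles, and my own English is not good. If the model can handle subtitle translation well, this study of large models will have paid off. The Python code below accesses ollama and gives the model an identity as a translator: afterwards we hand it only English content and it outputs Chinese directly ("Translate the following into chinese and only show me the translated"). This is only a demo; subtitle extraction and reading should both be doable as well. The text translated in the demo below is the introduction from the grok web page; let's see how well it does.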
import requests
import json
text = """
We are releasing the base model weights and network architecture of Grok-1, our large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.
This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. This means that the model is not fine-tuned for any specific application, such as dialogue.
We are releasing the weights and the architecture under the Apache 2.0 license.
To get started with using the model, follow the instructions at github.com/xai-org/grok.
Model Details
Base model trained on a large amount of text data, not fine-tuned for any particular task.
314B parameter Mixture-of-Experts model with 25% of the weights active on a given token.
Trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023.
The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.
"""
#"Describe the bug. When selecting to use a self hosted ollama instance, there is no way to do 2 things:Set the server endpoint for the ollama instance. in my case I have a desktop machine with a good GPU and run ollama there, when coding on my laptop i want to use the ollama instance on my desktop, no matter what value is set for cody.autocomplete.advanced.serverEndpoint, cody will always attempt to use http://localhost:11434, so i cannot sepcify the ip of my desktop machine hosting ollama.Use a different model on ollama - no matter what value is set for cody.autocomplete.advanced.model, for example when llama-code-13b is selected, the vscode output tab for cody always says: █ CodyCompletionProvider:initialized: unstable-ollama/codellama:7b-code "
url_generate = "http://localhost:11434/api/generate"
data = {
    "model": "mistral:latest",
    "prompt": text,  # alternative test prompt: "Why is the sky blue?"
    # the system prompt assigns the translator role
    "system": "Translate the following into chinese and only show me the translated",
    "stream": False
}

def get_response(url, data):
    response = requests.post(url, json=data)
    response_dict = json.loads(response.text)
    response_content = response_dict["response"]
    return response_content

res = get_response(url_generate, data)
print(res)
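The subtitle use case mentioned earlier could build on the same call. The sketch below is one possible shape, not a tested pipeline: it splits SRT text into cues, passes only the text lines to a translate callable, and keeps the numbering and timing lines unchanged. parse_srt and translate_srt are hypothetical helpers, the sample subtitles are made up, and str.upper stands in for the model call so the example runs offline.

```python
import re


def parse_srt(srt_text):
    """Split SRT text into (index, timing, text) cues."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().splitlines()
        if len(lines) >= 3:
            cues.append((lines[0], lines[1], "\n".join(lines[2:])))
    return cues


def translate_srt(srt_text, translate):
    """Rebuild the SRT with each cue's text passed through translate()."""
    blocks = [
        f"{index}\n{timing}\n{translate(text)}"
        for index, timing, text in parse_srt(srt_text)
    ]
    return "\n\n".join(blocks) + "\n"


SAMPLE_SRT = """1
00:00:01,000 --> 00:00:02,500
Hello there.

2
00:00:03,000 --> 00:00:04,000
Welcome back.
"""

# str.upper is a placeholder for a function that sends the text to the model.
print(translate_srt(SAMPLE_SRT, str.upper), end="")
```

In real use, translate would wrap get_response with a payload whose prompt is the cue text and whose system field is the translation instruction used above.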
That is a rough demonstration; the details can be tuned later. That's all for today.
This article is reposted from the WeChat official account @峰哥Python筆記