"model": "llama2",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
}'

Advanced parameters can also be carried in the request, such as keep_alive: it defaults to 5 minutes, after which the model is unloaded from memory if there has been no activity. Setting it to -1 keeps the model loaded in memory permanently.
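For example, here is a minimal Python sketch of a generate request that keeps the model loaded indefinitely (the endpoint and fields match the examples in this article):

import requests

data = {
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": False,
    "keep_alive": -1  # -1 keeps the model in memory; the default is 5 minutes ("5m")
}
response = requests.post("http://localhost:11434/api/generate", json=data)
print(response.json()["response"])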

The format of the returned response:

{
  "model": "llama2",
  "created_at": "2023-11-09T21:07:55.186497Z",
  "response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
  "done": true,
  "context": [1, 2, 3],
  "total_duration": 4648158584,
  "load_duration": 4071084,
  "prompt_eval_count": 36,
  "prompt_eval_duration": 439038000,
  "eval_count": 180,
  "eval_duration": 4196918000
}

Accessing the API from PowerShell:

(Invoke-WebRequest -method POST -Body '{"model":"llama2", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json

Accessing the API from Python:

url_generate = "http://localhost:11434/api/generate"
def get_response(url, data):
response = requests.post(url, json=data)
response_dict = json.loads(response.text)
response_content = response_dict["response"]
return response_content

data = {
"model": "gemma:7b",
"prompt": "Why is the sky blue?",
"stream": False
}

res = get_response(url_generate,data)
print(res)

The code above accesses the endpoint from Python, so it can be called directly from program code, which suits batch operations that generate results.
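The examples above set "stream": false to get a single response object. When stream is left at its default of true, the endpoint returns one JSON object per line as tokens are generated; a minimal sketch of consuming that stream looks like this:

import requests
import json

url_generate = "http://localhost:11434/api/generate"
data = {
    "model": "gemma:7b",
    "prompt": "Why is the sky blue?",
    "stream": True
}

# Each line of the streaming response is a JSON object carrying a partial "response"
with requests.post(url_generate, json=data, stream=True) as response:
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()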

In an ordinary request, options is omitted. options can set many parameters, such as temperature, whether to use the GPU, and the context length; they are all configured here. Below is a request that includes options:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "num_keep": 5,
    "seed": 42,
    "num_predict": 100,
    "top_k": 20,
    "top_p": 0.9,
    "tfs_z": 0.5,
    "typical_p": 0.7,
    "repeat_last_n": 33,
    "temperature": 0.8,
    "repeat_penalty": 1.2,
    "presence_penalty": 1.5,
    "frequency_penalty": 1.0,
    "mirostat": 1,
    "mirostat_tau": 0.8,
    "mirostat_eta": 0.6,
    "penalize_newline": true,
    "stop": ["\n", "user:"],
    "numa": false,
    "num_ctx": 1024,
    "num_batch": 2,
    "num_gqa": 1,
    "num_gpu": 1,
    "main_gpu": 0,
    "low_vram": false,
    "f16_kv": true,
    "vocab_only": false,
    "use_mmap": true,
    "use_mlock": false,
    "rope_frequency_base": 1.1,
    "rope_frequency_scale": 0.8,
    "num_thread": 8
  }
}'

2. Generate a chat completion

Format

POST /api/chat

This is very similar to the generate-completion endpoint above.

Parameters

The message object has the following fields:

role: the role of the message; must be system, user, or assistant
content: the content of the message
images (optional): a list of base64-encoded images, for multimodal models such as llava

Advanced parameters (optional):

format: the format to return the response in (currently only json is accepted)
options: additional model parameters such as temperature, as listed in the options example above
stream: if false, the response is returned as a single object rather than a stream of objects
keep_alive: how long the model stays loaded in memory after the request (default 5m)

Send a chat request:

curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'

The difference from generate: messages corresponds to prompt. With generate, prompt is followed directly by the content you want to talk about, while each entry in messages also carries a role; a message with the user role holds the question being asked.

The content of the returned response:

{
  "model": "llama2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "done": true,
  "total_duration": 4883583458,
  "load_duration": 1334875,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 342546000,
  "eval_count": 282,
  "eval_duration": 4535599000
}

You can also send a request that includes chat history:

curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    },
    {
      "role": "assistant",
      "content": "due to rayleigh scattering."
    },
    {
      "role": "user",
      "content": "how is that different than mie scattering?"
    }
  ]
}'

Chat completion in Python:

url_chat =  "http://localhost:11434/api/chat"
data = {
"model": "llama2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
},
"stream": False
}
response = requests.post(url_chat, json=data)
response_dict = json.loads(response.text)
print(response_dict)

3. Create a model

Format

POST /api/create

Parameters

A request to create a model:

curl http://localhost:11434/api/create -d '{
  "name": "mario",
  "modelfile": "FROM llama2\nSYSTEM You are mario from Super Mario Bros."
}'

This creates a model based on llama2, with its persona set through the SYSTEM instruction. We won't dwell on the returned result.

Create a model with Python:

url_create =  "http://localhost:11434/api/create"
data = {
"name": "mario",
"modelfile": "FROM llama2\nSYSTEM You are mario from Super Mario Bros."
}
response = requests.post(url, json=data)
response_dict = json.loads(response.text)
print(response_dict)

This Python code provides the same functionality as the curl request above.

4. List models

Format

GET /api/tags

Lists all models available locally.

List models with Python:

url_list = "http://localhost:11434/api/tags"
def get_list(url):
response = requests.get(url)
response_dict = json.loads(response.text)
model_names = [model["name"] for model in response_dict["models"]]
names = []
# 打印所有模型的名稱
for name in model_names:
names.append(name)
for idx, name in enumerate(names, start=1):
print(f"{idx}. {name}")
return names
get_list(url_list)

Output:

1. codellama:13b
2. codellama:7b-code
3. gemma:2b
4. gemma:7b
5. gemma_7b:latest
6. gemma_sumary:latest
7. llama2:7b
8. llama2:latest
9. llava:7b
10. llava:v1.6
11. mistral:latest
12. mistrallite:latest
13. nomic-embed-text:latest
14. qwen:1.8b
15. qwen:4b
16. qwen:7b

5. Show model information

Format

POST /api/show

Shows information about a model, including details, the Modelfile, template, parameters, license, and system prompt.

Parameters

Request

curl http://localhost:11434/api/show -d '{
  "name": "llama2"
}'

Show model information with Python:

url_show_info  = "http://localhost:11434/api/show"
def show_model_info(url,model_name):
data = {
"name": model_name
}
response = requests.post(url, json=data)
response_dict = json.loads(response.text)
print(response_dict)
show_model_info(url_show_info,"gemma:7b")

The result:

{'license': 'Gemma Terms of Use \n\nLast modified: February 21, 2024\n\nBy using, reproducing, modifying, distributing, performing or displaying any portion or element of Gemma, Model Derivatives including via any Hosted Service, (each as defined below) (collectively, the "Gemma Services") or otherwise accepting the terms of this Agreement, you agree to be bound by this Agreement.\n\nSection 1: DEFINITIONS\n1.1 Definitions\n(a) "Agreement" or "Gemma Terms of Use" means these terms and conditions that govern the use, reproduction, Distribution or modification of the Gemma Services and any terms and conditions incorporated by reference.\n\n(b) "Distribution" or "Distribute" means any transmission, publication, or other sharing of Gemma or Model Derivatives to a third party, including by providing or making Gemma or its functionality available as a hosted service via API, web access, or any other electronic or remote means ("Hosted Service").\n\n(c) "Gemma" means the set of machine learning language models, trained model weights and parameters identified at ai.google.dev/gemma, regardless of the source that you obtained it from.\n\n(d) "Google" means Google LLC.\n\n(e) "Model Derivatives" means all (i) modifications to Gemma, (ii) works based on Gemma, or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Gemma, to that model in order to cause that model to perform similarly to Gemma, including distillation methods that use intermediate data representations or methods based on the generation of synthetic data Outputs by Gemma for training that model. For clarity, Outputs are not deemed Model Derivatives.\n\n(f) "Output" means the information content output of Gemma or a Model Derivative that results from operating or otherwise using Gemma or the Model Derivative, including via a Hosted Service.\n\n1.2\nAs used in this Agreement, "including" means "including without limitation".\n\nSection 2: ELIGIBILITY AND USAGE\n2.1 Eligibility\nYou represent and warrant that you have the legal capacity to enter into this Agreement (including being of sufficient age of consent). If you are accessing or using any of the Gemma Services for or on behalf of a legal entity, (a) you are entering into this Agreement on behalf of yourself and that legal entity, (b) you represent and warrant that you have the authority to act on behalf of and bind that entity to this Agreement and (c) references to "you" or "your" in the remainder of this Agreement refers to both you (as an individual) and that entity.\n\n2.2 Use\nYou may use, reproduce, modify, Distribute, perform or display any of the Gemma Services only in accordance with the terms of this Agreement, and must not violate (or encourage or permit anyone else to violate) any term of this Agreement.\n\nSection 3: DISTRIBUTION AND RESTRICTIONS\n3.1 Distribution and Redistribution\nYou may reproduce or Distribute copies of Gemma or Model Derivatives if you meet all of the following conditions:\n\nYou must include the use restrictions referenced in Section 3.2 as an enforceable provision in any 
.......

6. Other endpoints

Besides the features above, you can also copy models, delete models, and pull models; additionally, if you have an ollama account, you can push models to the ollama server.
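For reference, these endpoints follow the same request pattern; a sketch in Python (endpoint paths per the Ollama API: /api/copy, /api/delete, and /api/pull):

import requests

base_url = "http://localhost:11434"

# Copy an existing model under a new name
requests.post(f"{base_url}/api/copy", json={"source": "llama2", "destination": "llama2-backup"})

# Delete a model by name (note the DELETE method)
requests.delete(f"{base_url}/api/delete", json={"name": "llama2-backup"})

# Pull a model from the ollama library; stream=False returns a single status object
requests.post(f"{base_url}/api/pull", json={"name": "llama2", "stream": False})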

7. Ollama settings

1. Local model storage

The default storage location for Windows users:

C:\Users\<username>\.ollama\models

To change the default storage location, set the OLLAMA_MODELS environment variable to the desired path; models will then be stored there.

2. Importing GGUF models

You may have GGUF models downloaded from HuggingFace; a GGUF model can be imported by creating a model from a Modelfile. Create a Modelfile:

FROM ./mistral-7b-v0.1.Q4_0.gguf

Create a new model from this Modelfile:

ollama create example -f Modelfile

example is the name of the new model; to use it, simply call it by that name, as sketched below.
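For example, calling the imported model through the generate endpoint:

import requests

data = {
    "model": "example",  # the name given in `ollama create example -f Modelfile`
    "prompt": "Why is the sky blue?",
    "stream": False
}
response = requests.post("http://localhost:11434/api/generate", json=data)
print(response.json()["response"])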

3. Parameter settings

When running models normally, parameters are rarely adjusted. When sending a request, parameters can be set through options, for example the number of context tokens:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "options": {
    "num_ctx": 4096
  }
}'

The default num_ctx is 2048; here it is changed to 4096. Other settings, such as whether to use the GPU, can also be configured through these parameters once the background service is running.
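The same options can be passed from Python as well; for instance, a sketch reusing the request shape from earlier (the num_gpu value here is illustrative):

import requests

data = {
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": False,
    "options": {
        "num_ctx": 4096,  # context window in tokens (default 2048)
        "num_gpu": 1      # number of model layers to offload to the GPU; 0 runs CPU-only
    }
}
response = requests.post("http://localhost:11434/api/generate", json=data)
print(response.json()["response"])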

8. OpenAI compatibility

Ollama is compatible with the OpenAI interface, so the openai package can be used to directly call the backend service that ollama provides.

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    # required but ignored
    api_key='ollama',
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama2',
)

The result:

ChatCompletion(id='chatcmpl-173', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='\nThe question " Why is the sky blue? " is a common one, and there are several reasons why the sky appears blue to our eyes. Here are some possible explanations:\n\n1. Rayleigh scattering: When sunlight enters Earth\'s atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen. These molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths. This is known as Rayleigh scattering. As a result, the blue light is dispersed throughout the atmosphere, giving the sky its blue appearance.\n2. Mie scattering: In addition to Rayleigh scattering, there is also a phenomenon called Mie scattering, which occurs when light encounters much larger particles in the atmosphere, such as dust and water droplets. These particles can also scatter light, but they preferentially scatter longer (red) wavelengths, which can make the sky appear more red or orange during sunrise and sunset.\n3. Angel\'s breath: Another explanation for why the sky appears blue is due to a phenomenon called "angel\'s breath." This occurs when sunlight passes through a layer of cool air near the Earth\'s surface, causing the light to be scattered in all directions and take on a bluish hue.\n4. Optical properties of the atmosphere: The atmosphere has its own optical properties, which can affect how light is transmitted and scattered. For example, the atmosphere scatters shorter wavelengths (such as blue and violet) more than longer wavelengths (such as red and orange), which can contribute to the blue color of the sky.\n5. Perspective: The way we perceive the color of the sky can also be affected by perspective. From a distance, the sky may appear blue because our brains are wired to perceive blue as a color that is further away. This is known as the "Perspective Problem."\n\nIt\'s worth noting that the color of the sky can vary depending on the time of day, the amount of sunlight, and other environmental factors. For example, during sunrise and sunset, the sky may appear more red or orange due to the scattering of light by atmospheric particles.', role='assistant', function_call=None, tool_calls=None))], created=1710810193, model='llama2:7b', object='chat.completion', system_fingerprint='fp_ollama', usage=CompletionUsage(completion_tokens=498, prompt_tokens=34, total_tokens=532))
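To print just the reply text from that object:

# chat_completion is the object returned by client.chat.completions.create above
print(chat_completion.choices[0].message.content)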

9. Translation assistant

Finally, let's build a translation assistant. With so many large models trained on ample Chinese and English corpora, having one act as a free translator should be no problem. I like to look for English resources online; sometimes there are no subtitles, and my own English is poor, so if this large-model study yields a working subtitle translator, it will have been worth the effort. The Python code below accesses ollama and assigns it an identity so it plays the role of a translator ("Translate the following into chinese and only show me the translated"); after that it is given only English content and outputs Chinese directly. This is just a demo, but subtitle extraction and feeding the text through for translation should all be achievable. The demonstration below translates the introduction from the Grok web page, so we can see how well it translates.

import requests
import json

text = """
We are releasing the base model weights and network architecture of Grok-1, our large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.

This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. This means that the model is not fine-tuned for any specific application, such as dialogue.

We are releasing the weights and the architecture under the Apache 2.0 license.

To get started with using the model, follow the instructions at github.com/xai-org/grok.

Model Details
Base model trained on a large amount of text data, not fine-tuned for any particular task.
314B parameter Mixture-of-Experts model with 25% of the weights active on a given token.
Trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023.
The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.
"""

#"Describe the bug. When selecting to use a self hosted ollama instance, there is no way to do 2 things:Set the server endpoint for the ollama instance. in my case I have a desktop machine with a good GPU and run ollama there, when coding on my laptop i want to use the ollama instance on my desktop, no matter what value is set for cody.autocomplete.advanced.serverEndpoint, cody will always attempt to use http://localhost:11434, so i cannot sepcify the ip of my desktop machine hosting ollama.Use a different model on ollama - no matter what value is set for cody.autocomplete.advanced.model, for example when llama-code-13b is selected, the vscode output tab for cody always says: █ CodyCompletionProvider:initialized: unstable-ollama/codellama:7b-code "

url_generate = "http://localhost:11434/api/generate"

data = {
    "model": "mistral:latest",
    "prompt": f"{text}",  # "Why is the sky blue?",
    "system": "Translate the following into chinese and only show me the translated",
    "stream": False
}

def get_response(url, data):
    response = requests.post(url, json=data)
    response_dict = json.loads(response.text)
    response_content = response_dict["response"]
    return response_content

res = get_response(url_generate, data)
print(res)

That is a rough demonstration; the details can be adjusted later. That's the end of today's content.

This article is reposted from the WeChat public account @峰哥Python筆記.
