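This run took about 15 seconds, but it still did not find all the information we asked for. Next, let's try a different approach.

Method 2: Pydantic

In the code below, Pydantic is used to define data models that represent the structure of the competitive intelligence information. Pydantic is a data validation and parsing library for Python that lets you define simple or complex data structures using Python data types. In this case, we use two Pydantic models (Competitor and Company) to define the structure of the competitive intelligence data.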

import pandas as pd
from typing import Optional, Sequence
from langchain.llms import OpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel

# Load data from CSV
df = pd.read_csv("data.csv", sep=';')

# Pydantic models for competitive intelligence
class Competitor(BaseModel):
    company: str
    offering: str
    advantage: str
    products_and_services: str
    additional_details: str

class Company(BaseModel):
    """Identifying information about all competitive intelligence in a text."""
    company: Sequence[Competitor]

# Set up a Pydantic parser and prompt template
parser = PydanticOutputParser(pydantic_object=Company)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Function to process each row and extract information
def process_row(row):
    _input = prompt.format_prompt(query=row['INTEL'])
    model = OpenAI(temperature=0)
    output = model(_input.to_string())
    result = parser.parse(output)

    # Convert Pydantic result to a dictionary
    competitor_data = result.model_dump()

    # Flatten the nested structure for DataFrame creation
    flat_data = {'INTEL': [], 'company': [], 'offering': [], 'advantage': [],
                 'products_and_services': [], 'additional_details': []}

    for entry in competitor_data['company']:
        flat_data['INTEL'].append(row['INTEL'])
        flat_data['company'].append(entry['company'])
        flat_data['offering'].append(entry['offering'])
        flat_data['advantage'].append(entry['advantage'])
        flat_data['products_and_services'].append(entry['products_and_services'])
        flat_data['additional_details'].append(entry['additional_details'])

    # Create a DataFrame from the flattened data
    df_cake = pd.DataFrame(flat_data)

    return df_cake

# Apply the function to each row and concatenate the results
intel_df = pd.concat(df.apply(process_row, axis=1).tolist(), ignore_index=True)

# Display the resulting DataFrame
intel_df.head()

That was fast! And unlike create_extraction_chain, this time the details were found for every entry.
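The parse-then-flatten step inside process_row can be tried without calling a model. The sketch below is only an illustration under stated assumptions: the JSON string stands in for the model output, the company names and field values are made up, and the flatten helper is a hypothetical name, not part of LangChain or Pydantic. It uses plain dictionaries in place of the Pydantic models so that it runs with the standard library alone.

```python
import json

FIELDS = ["company", "offering", "advantage",
          "products_and_services", "additional_details"]

# Hypothetical stand-in for the text a model might return for one INTEL row.
# All values here are invented for illustration.
raw_output = json.dumps({
    "company": [
        {"company": "Example Sweets", "offering": "chocolate bars",
         "advantage": "lower price", "products_and_services": "bars, gift boxes",
         "additional_details": "ships nationwide"},
        {"company": "Sample Bakery", "offering": "brownies",
         "advantage": "distinctive flavor", "products_and_services": "baked goods",
         "additional_details": "two store locations"},
    ]
})

def flatten(intel_text, parsed):
    """Return one flat row (a dict) per competitor found in a single INTEL text."""
    rows = []
    for entry in parsed["company"]:
        row = {"INTEL": intel_text}
        for field in FIELDS:
            row[field] = entry[field]
        rows.append(row)
    return rows

rows = flatten("sample intel text", json.loads(raw_output))
for row in rows:
    print(row["company"], "->", row["advantage"])
```

Each competitor becomes its own row and repeats the source INTEL text, which is the same shape that process_row builds before handing the data to pandas.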

Part 1 summary:

We found PydanticOutputParser to be faster and more reliable. Each run takes about 1 second and 400 tokens, whereas create_extraction_chain takes about 2.5 seconds and 250 tokens per run.
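The figures above imply a trade-off: the Pydantic approach is quicker per run but spends more tokens. A short calculation makes the ratios explicit. The inputs are the approximate numbers quoted in the summary, so the results are only as precise as those figures.

```python
# Approximate per-run figures quoted in the summary.
pydantic_seconds, pydantic_tokens = 1.0, 400
extraction_seconds, extraction_tokens = 2.5, 250

# How many times faster the Pydantic approach is per run.
speedup = extraction_seconds / pydantic_seconds
# How many times more tokens the Pydantic approach uses per run.
token_ratio = pydantic_tokens / extraction_tokens

print(f"PydanticOutputParser: about {speedup:.1f}x faster per run")
print(f"PydanticOutputParser: about {token_ratio:.1f}x the tokens per run")
```

So the speed gain of roughly 2.5x comes at a cost of roughly 1.6x the tokens, which matters if token usage is billed.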

We have managed to extract some structured data from unstructured text! Part 2 focuses on analyzing this structured data with a LangChain Agent.

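Part 2: Analyzing the structured data with a LangChain Agent

What is a LangChain Agent?

In LangChain, an Agent is a system that uses a language model to choose a sequence of actions to take. In a Chain, the sequence of actions is hard-coded. An Agent instead uses a language model as a "reasoning engine" that decides which actions to take and in what order.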
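The difference in control flow can be shown with a toy example. This is a sketch, not LangChain code: the tools load_rows and count_rows are hypothetical, and choose_next_action is a rule-based stub standing in for the language model, so it only mimics where the decision happens.

```python
from typing import Callable, Dict, List, Optional

# Two toy "tools"; both names are hypothetical.
def load_rows(state: dict) -> dict:
    state["rows"] = ["row-a", "row-b", "row-c"]
    return state

def count_rows(state: dict) -> dict:
    state["count"] = len(state["rows"])
    return state

TOOLS: Dict[str, Callable[[dict], dict]] = {
    "load_rows": load_rows,
    "count_rows": count_rows,
}

def run_chain(state: dict) -> dict:
    """Chain: the order of steps is fixed in the code."""
    for step in (load_rows, count_rows):
        state = step(state)
    return state

def choose_next_action(state: dict) -> Optional[str]:
    """Stand-in for the language model: look at the state and pick a tool.

    A real agent would prompt an LLM here; this stub uses simple rules.
    """
    if "rows" not in state:
        return "load_rows"
    if "count" not in state:
        return "count_rows"
    return None  # nothing left to do

def run_agent(state: dict, max_steps: int = 5) -> dict:
    """Agent: the next step is chosen at run time, one decision per iteration."""
    trace: List[str] = []
    for _ in range(max_steps):
        action = choose_next_action(state)
        if action is None:
            break
        trace.append(action)
        state = TOOLS[action](state)
    state["trace"] = trace
    return state

chain_result = run_chain({})
agent_result = run_agent({})
print(chain_result["count"], agent_result["count"], agent_result["trace"])
```

Both paths reach the same answer here. The point is that run_chain cannot change its step order, while run_agent asks the decision function what to do next on every iteration.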

Now let's use the CSV Agent in LangChain to analyze our structured data:

Step 1: Create the Agent

First, load the necessary libraries:

from langchain.agents.agent_types import AgentType
from langchain_community.llms import OpenAI
from langchain_experimental.agents.agent_toolkits import create_csv_agent

Create the Agent:

agent = create_csv_agent(
    OpenAI(temperature=0),
    "data/intel.csv",
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)
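The agent reads data/intel.csv, but the article does not show how that file was written. Assuming it holds the rows extracted in Part 1, one way to produce it is sketched below with the standard library csv module. The save_rows helper and the sample row are hypothetical, and the sketch writes into a temporary directory so it does not overwrite any real file. With pandas, the equivalent would typically be a single call to intel_df.to_csv("data/intel.csv", index=False).

```python
import csv
import os
import tempfile

FIELDS = ["INTEL", "company", "offering", "advantage",
          "products_and_services", "additional_details"]

# Hypothetical row standing in for the data extracted in Part 1.
rows = [
    {"INTEL": "sample intel text", "company": "Example Co",
     "offering": "chocolate bars", "advantage": "lower price",
     "products_and_services": "bars, gift boxes",
     "additional_details": "ships nationwide"},
]

def save_rows(path, rows):
    """Write the flat rows to a CSV file with a header line."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

# Use a temporary directory here; a real run would target "data/intel.csv".
out_path = os.path.join(tempfile.mkdtemp(), "data", "intel.csv")
save_rows(out_path, rows)

# Read the file back to confirm the header and row survived the round trip.
with open(out_path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))
print(len(loaded), loaded[0]["company"])
```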

Now we can test our Agent with a few questions:

Step 2: Ask the Agent questions

When you ask a LangChain Agent a question, you can watch it reason through its actions.

Asking a general question

agent.run("What insights can I get from this data?")

‘This dataframe contains information about different companies and their products/services, as well as additional details and potential opportunities for improvement.’

Asking about competitor advantages

agent.run("What are 3 specific areas of focus that you can obtain through analyzing the advantages offered by the competition?")

Three specific areas of focus that can be obtained through analyzing the advantages offered by the competition are: streamlining production processes, incorporating unique and distinctive flavors, and using sustainable and high-quality ingredients.

Asking about the competitors' key themes

agent.run("What are some key themes that the competitors represented in the data are focusing on providing? Be specific with examples, and talk about the advantages of these")

‘The key themes that the competitors are focusing on providing are efficiency, unique flavors, and high-quality ingredients. For example, Coco candy co is using the 77Tyrbo Choco machine to coat their candy gummies, which streamlines the process and saves time. Cinnamon Bliss Bakery adds a secret touch of cinnamon in their chocolate brownies with the CinnaMagic ingredient, which adds a distinctive flavor. Choco Haven factory uses organic and locally sourced ingredients, including the EcoCocoa brand, to elevate the quality of their chocolates.’

References:

[1] https://github.com/ingridstevens/AI-projects/blob/main/unstructured_data/data.csv

[2] https://medium.com/@ingridwickstevens/extract-structured-data-from-unstructured-text-using-llms-71502addf52b

[3] https://medium.com/@ingridwickstevens/analyze-structured-data-extracted-from-unstructured-text-using-llm-agents-4ea4eaf3ae78

[4] https://github.com/ingridstevens/AI-projects/blob/main/unstructured_data/unstructured_extraction_chain.ipynb

[5] https://github.com/ingridstevens/AI-projects/blob/main/unstructured_data/unstructured_pydantic.ipynb

[6] https://github.com/ingridstevens/AI-projects/blob/main/unstructured_data/data.csv

This article is reposted from the WeChat official account @ArronAI
