Email Extraction#
Let's evaluate how well LLMs extract structured information from email text.
%pip install -U langchain langchain_benchmarks openai rapidfuzz
import os
# Get your API key from https://smith.langchain.com/settings
os.environ["LANGCHAIN_API_KEY"] = "sk-..."
os.environ["OPENAI_API_KEY"] = "sk-..."
from langchain_benchmarks import clone_public_dataset, registry
For this code to work, configure the LangSmith environment variables with your credentials.
task = registry["Email Extraction"]
task
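Name | Email Extraction |
Type | ExtractionTask |
Dataset ID | a1742786-bde5-4f51-a1d8-e148e5251ddb |
Description | A dataset of 42 real emails deduped from a spam folder, with semantic HTML tags removed, as well as a script for initial extraction and formatting of other emails from an arbitrary .mbox file like the one exported by Gmail. Some additional cleanup of the data was done by hand after the initial pass. See https://github.com/jacoblee93/oss-model-extraction-evals. |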
print(task.description)
A dataset of 42 real emails deduped from a spam folder, with semantic HTML tags removed, as well as a script for initial extraction and formatting of other emails from an arbitrary .mbox file like the one exported by Gmail.
Some additional cleanup of the data was done by hand after the initial pass.
See https://github.com/jacoblee93/oss-model-extraction-evals.
Clone the dataset associated with this task:
clone_public_dataset(task.dataset_id, dataset_name=task.name)
Dataset Email Extraction already exists. Skipping.
You can access the dataset at https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/309a2fce-ce68-43aa-befb-67f94d0c3570.
Schema#
Each extraction task defines its expected output schema as a Pydantic BaseModel, which we can use to get a JSON schema object.
import pprint
pprint.pprint(task.schema.schema())
{'definitions': {'ToneEnum': {'description': 'The tone of the email.',
'enum': ['positive', 'negative'],
'title': 'ToneEnum',
'type': 'string'}},
'description': 'Relevant information about an email.',
'properties': {'action_items': {'description': 'A list of action items '
'requested by the email',
'items': {'type': 'string'},
'title': 'Action Items',
'type': 'array'},
'sender': {'description': "The sender's name, if available",
'title': 'Sender',
'type': 'string'},
'sender_address': {'description': "The sender's address, if "
'available',
'title': 'Sender Address',
'type': 'string'},
'sender_phone_number': {'description': "The sender's phone "
'number, if available',
'title': 'Sender Phone Number',
'type': 'string'},
'tone': {'allOf': [{'$ref': '#/definitions/ToneEnum'}],
'description': 'The tone of the email.'},
'topic': {'description': 'High level description of what the '
'email is about',
'title': 'Topic',
'type': 'string'}},
'required': ['action_items', 'topic', 'tone'],
'title': 'Email',
'type': 'object'}
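To make the schema above concrete, here is a minimal stdlib sketch of an equivalent data structure. It is not the library's Pydantic model: the `Email` dataclass and the `parse_email` helper are hypothetical names, written only from the JSON schema printed above (three required fields, three optional ones, and a two-valued tone enum).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ToneEnum(str, Enum):
    """The tone of the email (mirrors the 'ToneEnum' definition in the schema)."""

    positive = "positive"
    negative = "negative"


@dataclass
class Email:
    """Stdlib stand-in for the task schema; hypothetical, not the library class."""

    action_items: List[str]
    topic: str
    tone: ToneEnum
    sender: Optional[str] = None
    sender_phone_number: Optional[str] = None
    sender_address: Optional[str] = None


def parse_email(payload: dict) -> Email:
    """Check the required keys and the tone enum, then build an Email."""
    missing = [k for k in ("action_items", "topic", "tone") if k not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return Email(
        action_items=list(payload["action_items"]),
        topic=payload["topic"],
        tone=ToneEnum(payload["tone"]),  # raises ValueError for an unknown tone
        sender=payload.get("sender"),
        sender_phone_number=payload.get("sender_phone_number"),
        sender_address=payload.get("sender_address"),
    )


email = parse_email(
    {"action_items": ["Buy an envelope"], "topic": "Request to send gold", "tone": "positive"}
)
print(email.tone.value, email.sender)
```

The optional fields default to `None` here because the schema lists only `action_items`, `topic`, and `tone` under `required`.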
Define an extraction chain#
Let's build an extraction chain that we can use to get structured information from the emails.
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser
llm = ChatOpenAI(model="gpt-3.5-turbo-16k", temperature=0).bind_functions(
functions=[task.schema],
function_call=task.schema.schema()["title"],
)
output_parser = JsonOutputFunctionsParser()
extraction_chain = task.instructions | llm | output_parser | (lambda x: {"output": x})
extraction_chain.invoke(
{
"input": "Hello Dear MR. I want you to send me gold to get rich."
" First buy an envelope. Then open it and put some gold inside. "
"Then close it and finally mail it to my address at 12345 My Gold Way."
" You can call me any time at 000-1212-1111."
}
)
{'output': {'sender': 'Unknown',
'sender_phone_number': '000-1212-1111',
'sender_address': '12345 My Gold Way',
'action_items': ['Buy an envelope',
'Put gold inside',
'Close the envelope',
"Mail it to sender's address"],
'topic': 'Request to send gold',
'tone': 'positive'}}
Now it's time to measure our chain's effectiveness!
Evaluate#
Let's evaluate the chain now.
from langsmith.client import Client
from langchain_benchmarks.extraction import get_eval_config
client = Client()
eval_llm = ChatOpenAI(model="gpt-4", model_kwargs={"seed": 42})
eval_config = get_eval_config(eval_llm)
test_run = client.run_on_dataset(
dataset_name=task.name,
llm_or_chain_factory=extraction_chain,
evaluation=eval_config,
verbose=True,
project_metadata={
"arch": "openai-functions",
},
)
View the evaluation results for project 'monthly-look-12' at:
https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/projects/p/177d564f-516d-4b65-bae0-37154b529470?eval=true
View all tests for Dataset Email Extraction at:
https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/309a2fce-ce68-43aa-befb-67f94d0c3570
[------------------------------------------------->] 42/42
Eval quantiles:
inputs.input \
count 42
unique 42
top --- \n|\n\nEvery business faces its set of cu...
freq 1
mean NaN
std NaN
min NaN
25% NaN
50% NaN
75% NaN
max NaN
outputs.output \
count 42
unique 42
top {'sender': 'EMC Financial', 'sender_address': ...
freq 1
mean NaN
std NaN
min NaN
25% NaN
50% NaN
75% NaN
max NaN
feedback.json_edit_distance feedback.score_string:accuracy error \
count 42.000000 42.000000 0
unique NaN NaN 0
top NaN NaN NaN
freq NaN NaN NaN
mean 0.566434 0.485714 NaN
std 0.178473 0.235374 NaN
min 0.190883 0.100000 NaN
25% 0.441978 0.300000 NaN
50% 0.581750 0.300000 NaN
75% 0.687949 0.700000 NaN
max 0.901852 0.900000 NaN
execution_time
count 42.000000
unique NaN
top NaN
freq NaN
mean 3.527634
std 0.518258
min 2.579424
25% 3.153659
50% 3.525745
75% 3.796416
max 5.144408
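As a rough intuition for the `feedback.json_edit_distance` column, the sketch below serializes both dicts with sorted keys and divides a plain Levenshtein distance by the length of the longer string. This is an assumption about the general idea, not the evaluator's actual code, which is not shown on this page; `levenshtein` and `json_edit_distance` are illustrative names. A score of 0 means the canonical strings are identical, so lower is better.

```python
import json


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def json_edit_distance(prediction: dict, reference: dict) -> float:
    """Normalized edit distance between canonical JSON strings, in [0, 1]."""
    pred = json.dumps(prediction, sort_keys=True)
    ref = json.dumps(reference, sort_keys=True)
    longest = max(len(pred), len(ref))
    return levenshtein(pred, ref) / longest if longest else 0.0


prediction = {"topic": "Request to send gold", "tone": "positive"}
reference = {"tone": "negative", "topic": "Request to send gold"}
print(json_edit_distance(prediction, reference))
```

Sorting the keys before comparing means that two outputs differing only in key order score 0.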
Compare to another LLM#
Let's compare against an Anthropic LLM.
from langchain.chat_models import ChatAnthropic
from langchain.output_parsers.xml import XMLOutputParser
from langchain.prompts import ChatPromptTemplate
# This is the schema the model will populate
xsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="Email">
<xs:complexType>
<xs:sequence>
<xs:element name="sender" type="xs:string" minOccurs="0"/>
<xs:element name="sender_phone_number" type="xs:string" minOccurs="0"/>
<xs:element name="sender_address" type="xs:string" minOccurs="0"/>
<xs:element name="action_items" type="ActionItemsType" minOccurs="1"/>
<xs:element name="topic" type="xs:string" minOccurs="1"/>
<xs:element name="tone" type="ToneEnumType" minOccurs="1"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="ActionItemsType">
<xs:sequence>
<xs:element name="item" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:simpleType name="ToneEnumType">
<xs:restriction base="xs:string">
<xs:enumeration value="positive"/>
<xs:enumeration value="negative"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>"""
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a data extraction bot. Always respond "
"only with XML of the following schema:\n{xsd}",
),
(
"user",
"Extract Email from the following Document:\n"
"<Document>\n{input}\n</Document>\n"
"RESPOND ONLY IN XML THEN STOP.",
),
]
).partial(xsd=xsd)
claude = ChatAnthropic(model="claude-2", temperature=1)
def convert_parsed_email(email_dict: dict) -> dict:
    """Convert the XML-parsed dictionary to a flattened dict."""
    if "Email" not in email_dict:
        return email_dict
    # Flatten the list of single-key tag dicts into one dict
    result = {k: v for item in email_dict["Email"] for k, v in item.items()}
    # Unwrap each {"item": ...} entry; `or []` covers a missing or empty action_items tag
    result["action_items"] = [
        item["item"] for item in (result.get("action_items") or [])
    ]
    return {"output": result}
claude_extraction_chain = prompt | claude | XMLOutputParser() | convert_parsed_email
result = claude_extraction_chain.invoke(
{
"input": "Hello Dear MR. I want you to send me gold to get rich."
" First buy an envelope. Then open it and put some gold inside. "
"Then close it and finally mail it to my address at 12345 My Gold Way."
" You can call me any time at 000-1212-1111."
}
)
result
{'output': {'sender': None,
'sender_phone_number': '000-1212-1111',
'sender_address': '12345 My Gold Way',
'action_items': ['buy an envelope',
'open it',
'put some gold inside',
'close it',
'mail it to my address'],
'topic': 'sending gold',
'tone': 'negative'}}
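The flattening step is easier to follow with the intermediate shape in view. The sketch below uses only the standard library: `element_to_dict` is a hypothetical helper that approximates the nested list-of-dicts structure the flattening function expects (an assumption inferred from that function, not the parser's documented output), and `convert_parsed_email` is copied from the chain above.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element: ET.Element) -> dict:
    """Leaf -> {tag: text or None}; parent -> {tag: [child dicts]} (assumed shape)."""
    children = list(element)
    if not children:
        return {element.tag: element.text}
    return {element.tag: [element_to_dict(child) for child in children]}


def convert_parsed_email(email_dict: dict) -> dict:
    """Convert the XML-parsed dictionary to a flattened dict."""
    if "Email" not in email_dict:
        return email_dict
    result = {k: v for item in email_dict["Email"] for k, v in item.items()}
    result["action_items"] = [
        item["item"] for item in (result.get("action_items") or [])
    ]
    return {"output": result}


xml_text = """<Email>
  <sender></sender>
  <sender_phone_number>000-1212-1111</sender_phone_number>
  <action_items>
    <item>buy an envelope</item>
    <item>mail it to my address</item>
  </action_items>
  <topic>sending gold</topic>
  <tone>negative</tone>
</Email>"""

nested = element_to_dict(ET.fromstring(xml_text))
flat = convert_parsed_email(nested)
print(nested["Email"][0])  # a single-key dict for the first tag
print(flat["output"]["action_items"])
```

An empty tag such as `<sender></sender>` has no text, which is why the flattened result can contain `None` values.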
claude_test_run = client.run_on_dataset(
dataset_name=task.name,
llm_or_chain_factory=claude_extraction_chain,
evaluation=eval_config,
verbose=True,
project_metadata={
"arch": "claude-xml",
},
)
View the evaluation results for project 'frosty-moon-4' at:
https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/projects/p/81d41017-bcda-450d-8991-9bf744c7ebb8?eval=true
View all tests for Dataset Email Extraction at:
https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/309a2fce-ce68-43aa-befb-67f94d0c3570
[--------------------------------------> ] 33/42
Chain failed for example 9a707fca-4ba7-4f7d-8912-b9fd71e9901e with inputs {'input': "---|---|---|--- \n \nBook with Fall Sale Extras Through November 21! Savings! OBC! Visa Gift Card\n+ More \n \n--- \n|\n\n| | | | | | | \n--- \n| | \n--- \n| | SHOP THE FALL CRUISE SALE \n--- \n| | \n--- \n \n**Celebrity Cruises** Celebrity Cruises receive **Exclusive Pricing** with\nup to **$450 BONUS Savings per Stateroom** based on double\noccupancyand even more for extra guests! Enjoy **Exclusive Tips**\non 2024 sailings, up to**$2150 Onboard Credit** , and up to a **$1700 Visa\nGiftCard** on Galapagos sailings or up to a **$650 Visa Gift Card** on\nother departures. **Drinks** and **Wi-Fi** are All Included, too! **See=\nThis Offer =E2=96=B8** \n \n| | \n--- \n \n**Viking** Enjoy your favorite Viking voyages with up to=C2=A0 **$1200\nShipboard Credit** from Online Vacation Center when you book by Nov 21!\nPlus, select sailings get **Airfare** , **Stateroom Upgrades** , **Special\nFares** =C2=A0and only **$25 Deposits** on the world's #1 Cruise Line for\nOceans, Rivers & Expeditions! Guided Tours, Wi-Fi, Select Beverages, Meals &\nMore Included. **SeeThis Offer =E2=96=B8** \n \n| | \n--- \n \n**Royal Caribbean** Sail Away on Royal Caribbean withup to **$1000 BONUS\nOnboard Credit** and **Specialty Dining** exclusively from Online Vacation\nCenter!=C2=A0Plus, up to **30% SAVINGS** on all Cruises, **Kids Sail =\nFree** on select sailings and up to **$500 Savings on Airfare** on select\nAlaska and Europe sailings. **SeeThis Offer =E2=96=B8** \n \n| | \n--- \n \n**Oceania Cruises** Choose Your Offer! Receive **Prepaid Gratuities** on\nselect sailings OR receive up to **$1000 Onboard Credit** on 30 Europe\nvoyages. Enjoy _simply_ MORE™ with **2 for 1** Cruise Fares, **Roundtrip\nAirfare** , Transfers & Taxes, **Unlimited Wi-Fi** , up to **$1600 Shore\nExcursion Credit** , Specialty Dining, Champagne, Wine, and more. 
Plus,\nreceive up to a **$1500 Visa Gift Card** from Online Vacation Center!\n**SeeThis Offer =E2=96=B8** \n \n| | \n--- \n \n**Regent Seven Seas Cruises** Book your luxury cruise on Regent Seven Seas\nby Nov 21 and receive up to **$2000** in **Exclusive Savings** per Suite on\nall sailings through June 2026! Plus, enjoy **Bonus Savings =** worth up to\n**30%** on select 2024 sailings when you book by Nov 12. Receive up to a\n**$1400 Visa Gift Card** from us, and enjoy Regent standard inclusions like\n**Business Class Airfare** on intercontinental flights and **Airfare** on\ndomestic flights, **Shore Excursions** , **Gratuities** and More. **See This\nOffer =E2=96=B8** \n \n| | \n--- \n \n**Azamara** Enjoy up to **$1500 Onboard Credit** , up to an=C2=A0 **$800\nVisa Gift Card** , **Stateroom Upgrades** and **20% Off Suites** onselect\nsailings, and More on Azamara during our Fall Sale! Plus up to a **$200\nBONUS Visa Gift Card** on our Exclusive Cruise Packages. Receive Azamara\nstandard inclusions like select **Beverages , **Gratuities** and More. **See\nThis Offer =E2=96=B8** \n \n| | \n--- \n \n**Norwegian Cruise Line** Enjoy up to **$1000 Onboard Credit** and\n**Gratuities** on 7+ night Balconies or higher during our Fall Sale! Plus\n**50% OFF** Cruise Fares and **Free at Sea:** Open Bar, Specialty Dining, =\nWi-Fi, Shore Excursion Credits and extra guests on select sailings. **See=\nThis Offer =E2=96=B8** \n \n| | \n--- \n \n**Luxury Hotels** Whether your personal definition ofluxury is an urban\noasis or an opulent villa, a wine-country cottage or a Caribbean hammock,\nOnline Vacation Center has the perfect accommodations for your next\nvacation. Book now for **Exclusive Offers** **Discounts** ,\n**Extra Nights** , **Resort Credits** , **Complimentary Amenities** and\nMore! 
**SeeThis Offer =E2=96=B8** \n \n| | \n--- \n \n**Enrichment Journeys** Book an **Enrichment Journey** on Celebrity Cruises\nfor up to **$2150 Onboard Credit** , up to **$450 Off** per stateroom and up\nto a **$650 Visa Gift Card** with **Exclusive Tips** on 2024 sailings +\n**Drinks** and **Wi-Fi** All Included. Journeys include **Airfare**\n, 4-star+ **Hotel** Stays, **Transfers** , **Taxes** , select **Meals**\nand More. **SeeThis Offer =E2=96=B8** \n \n| | \n--- \n \n**Princess Cruises** Enjoy up to **$1200 Onboard Credit** , up to **50% Off\nCruise Fares =** & **50% Off Deposits** during our Fall Sale! Choose =\nPrincess Plus to receive Included **Drinks, Crew Appreciation** & **Wi-Fi**\n_(over $950 in added value!)_ OR skip the frills for the lowest rate. **See\nThis Offer =E2=96=B8** \n \n| | \n--- \n \n**Holland America Line** Get more on your Holland America cruise with up to\n**$1450 Onboard Credit** and **Gratuities** on select sailings, exclusively\nfrom us! Plus, **Have It All** with **Wi-Fi, Beverages, Specialty Dining**\nand **Shore Excursions** or skip the frills for a lower cruise fare. For a\nlimited time, enjoy **BONUS Shore Excursion** & **Air Credits** , $99\nDeposits and **Kids Sail Free** on select 2024 sailings. **SeeThis Offer\n=E2=96=B8** \n \n| | \n--- \n| | \n--- \n| | \n--- \n|\n\n### Hours of Operation\n\n**Monday=E2=80=93Friday** 9 am=E2=80=936 pm ET **Saturday** 10 am=E2=80=934\npm ET **Sunday** Closed \n \n--- \n| | \n--- \n \n**Terms and Conditions** : New Bookings Only. Select Sailings Apply.\nRates, itinerary and any available amenities are by sail date and are\nsubject to change. **Repricing an existing reservation or requesting a\ncancel/rebook is not permitted for this promotion. This promotion is not\napplicable for reservations that used FCCs or utilized Lift & Shift program.\nCall to see what you qualify for (please note that any modifications may\nresult in a $100 per person change fee). 
Fall Sale**: Offer expires\n11/21/23. Airfare is included on select sailings from select gateways.\nAdditional gateways may be available for lowadd-ons. The identity of the air\ncarrier, which may include the carrier's code-share partner, will be\nassigned and disclosed at a later date. Purchases made onboard plane or in\nterminal not included. Onboard Credit isper stateroom on select sailings.\nPrices are per person, double occupancy.Prices and itineraries are based on\navailability and are subject to changewithout notice. Offer can be withdrawn\nat any time. All fares may be subject to fuel surcharges if imposed by\ncruise lines and airlines. Government taxes, air taxes, transfers, service\nfees and other ancillary charges are additional unless otherwise noted.\nAdditional terms, conditionsand restrictions apply; view individual offers\nfor more information. Online Vacation Center reserves the right to cancel\nthe Offer at any time, correct any errors, inaccuracies or omissions, and\nchange or update fares, fees and surcharges at any time without prior\nnotice. Online Vacation Center is a registered Seller of Travel with the\nStates of Florida (ST-32947), California (CST-2064227-40) and Washington (WA\nSOT 602250083). 110823CB \n \n| | \n--- \n \n* * *\n\nThis message was sent to address: jacob@gmail.com \n \nMore Travel Deals \\- Sign Up \\- Forward to Friend \\- Unsubscribe \\- Privacy \\-\nDisclaimers \n \n(C) 2023 Dunhill Vacations Inc. - 2307 W. Broward Blvd, Ste 402 - Fort\nLauderdale, FL 33312 \n \n--- \n\\----_NmP-64d90535a0e2740e-Part_1--\n\n"}
Error Type: ValueError, Message: Could not parse output: <Email>
<sender></sender>
<sender_phone_number></sender_phone_number>
<sender_address></sender_address>
<action_items>
<item>Book Celebrity Cruises by Nov 21 for exclusive pricing, bonuses, and gifts</item>
<item>Book Viking by Nov 21 for bonuses and special offers</item>
<item>Book Royal Caribbean by Nov 21 for onboard credits, dining, and savings</item>
<item>Book Oceania Cruises by Nov 21 for prepaid gratuities or onboard credits</item>
<item>Book Regent Seven Seas by Nov 21 for exclusive savings and gift cards</item>
<item>Book Azamara by Nov 21 for onboard credits, upgrades, and savings</item>
<item>Book Norwegian Cruise Line for discounts, amenities, and savings</item>
<item>Book luxury hotels for exclusive offers and discounts</item>
<item>Book an Enrichment Journey on Celebrity Cruises for bonuses and inclusions</item>
<item>Book Princess Cruises for discounts, amenities, and onboard credits</item>
<item>Book Holland America Line for bonuses,
[------------------------------------------------->] 42/42
Eval quantiles:
inputs.input \
count 42
unique 42
top --- \n|\n\nEvery business faces its set of cu...
freq 1
mean NaN
std NaN
min NaN
25% NaN
50% NaN
75% NaN
max NaN
outputs.output \
count 41
unique 41
top {'sender': 'Sam', 'sender_phone_number': '800....
freq 1
mean NaN
std NaN
min NaN
25% NaN
50% NaN
75% NaN
max NaN
feedback.json_edit_distance feedback.score_string:accuracy \
count 41.000000 41.000000
unique NaN NaN
top NaN NaN
freq NaN NaN
mean 0.382352 0.565854
std 0.164442 0.238338
min 0.107011 0.100000
25% 0.252252 0.300000
50% 0.375427 0.700000
75% 0.532982 0.700000
max 0.753704 1.000000
error execution_time
count 1 42.000000
unique 1 NaN
top Could not parse output: <Email>\n <sender></s... NaN
freq 1 NaN
mean NaN 9.082149
std NaN 2.192165
min NaN 6.203642
25% NaN 7.807354
50% NaN 8.497452
75% NaN 9.632442
max NaN 19.564479
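The one failed example above ends in the middle of an `<item>` tag, which suggests the model's XML was cut off before the closing tags (the page does not say why). A minimal sketch of a defensive parse, using only the standard library, is shown below; `safe_parse_xml` is a hypothetical helper and is not part of the chain evaluated here.

```python
import xml.etree.ElementTree as ET


def safe_parse_xml(text: str) -> dict:
    """Return a status dict instead of raising when the XML is malformed."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        # Truncated output lands here, e.g. an <item> that is never closed
        return {"ok": False, "error": f"Could not parse output: {exc}"}
    return {"ok": True, "root_tag": root.tag}


truncated = "<Email><action_items><item>Book Holland America Line for bonuses,"
complete = "<Email><topic>sending gold</topic></Email>"

print(safe_parse_xml(truncated)["ok"])
print(safe_parse_xml(complete))
```

Recording the failure this way would keep the example in the results with an explicit error rather than dropping it from the averages.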
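Inspect#
Here, we'll take a brief look at the underlying results.
A few things to note:
For this run, the Anthropic chain does better on average: its mean LLM-graded accuracy is 0.57 versus 0.49, and its mean JSON edit distance (lower is better) is 0.38 versus 0.57. One of its 42 examples failed to parse and is excluded from those averages.
Accuracy is low overall; getting the exact information right can be difficult.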
df = test_run.to_dataframe().join(claude_test_run.to_dataframe(), rsuffix="_claude")
df.head(5)
example id | inputs.input | outputs.output | reference | feedback.json_edit_distance | feedback.score_string:accuracy | error | execution_time | inputs.input_claude | outputs.output_claude | reference_claude | feedback.json_edit_distance_claude | feedback.score_string:accuracy_claude | error_claude | execution_time_claude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
61c40266-b994-49a2-8768-d54704cee079 | --- \n|\n\nEvery business faces its set of cu... | {'sender': 'EMC Financial', 'sender_address': ... | {'output': {'tone': 'positive', 'topic': 'Busi... | 0.562112 | 0.7 | None | 4.358837 | --- \n|\n\nEvery business faces its set of cu... | {'sender': 'Sam', 'sender_phone_number': '800.... | {'output': {'tone': 'positive', 'topic': 'Busi... | 0.301242 | 0.7 | None | 10.501042 |
2dcfadff-51dc-458c-8af0-f47a795d0c9b | Hello Jacob!\n\n \n\nHave you noticed these... | {'sender': 'Sam from EMC', 'action_items': ['Fil... | {'output': {'tone': 'positive', 'topic': 'Gree... | 0.505338 | 0.7 | None | 3.946547 | Hello Jacob!\n\n \n\nHave you noticed these... | {'sender': 'Sam from EMC', 'sender_phone_number'... | {'output': {'tone': 'positive', 'topic': 'Gree... | 0.113879 | 0.7 | None | 8.511848 |
a9c481ba-9ca5-408c-8c9c-f29127a70f7b | Hello!\n\n | \n--- \n \nWe've updated our... | {'sender': 'Crunchbase Team', 'action_items': ... | {'output': {'tone': 'positive', 'topic': 'Upda... | 0.245283 | 0.9 | None | 3.972396 | Hello!\n\n | \n--- \n \nWe've updated our... | {'sender': None, 'sender_phone_number': None, ... | {'output': {'tone': 'positive', 'topic': 'Upda... | 0.343434 | 0.7 | None | 9.739630 |
98358188-6e36-42ef-9298-83acf8d9dd12 | Please consider all the ways to give \nProtect red wolves... | {'sender': 'Tim Whalen', 'sender_address': 'Sa... | {'output': {'tone': 'positive', 'topic': 'Dona... | 0.280556 | 0.7 | None | 3.890567 | Please consider all the ways to give \nProtect red wolves... | {'sender': None, 'sender_phone_number': None, ... | {'output': {'tone': 'positive', 'topic': 'Dona... | 0.255556 | 0.3 | None | 9.640687 |
0f29e857-fc08-45dd-b1ea-dde1e00c4a62 | Some travelers plan ahead; others prefer to be spontaneous... | {'sender': 'Dunhill Vacations Inc.', 'sender_a... | {'output': {'tone': 'positive', 'topic': 'Trav... | 0.552463 | 0.7 | None | 4.252478 | Some travelers plan ahead; others prefer to be spontaneous... | {'sender': 'Dunhill Vacations Inc.', 'sender_p... | {'output': {'tone': 'positive', 'topic': 'Trav... | 0.584582 | 0.3 | None | 6.803259 |
(
df["feedback.json_edit_distance"].mean(),
df["feedback.json_edit_distance_claude"].mean(),
)
(0.5664337704936568, 0.382351925386955)
(
df["feedback.score_string:accuracy"].mean(),
df["feedback.score_string:accuracy_claude"].mean(),
)
(0.48571428571428565, 0.5658536585365853)
# Rows for which OAI > Claude by at least 30%, according to the LLM-based evaluator
oai_beats_claude = df[
(df["feedback.score_string:accuracy"] - df["feedback.score_string:accuracy_claude"])
>= 0.3
]
oai_beats_claude[["inputs.input", "outputs.output", "outputs.output_claude"]]
example id | inputs.input | outputs.output | outputs.output_claude |
---|---|---|---|
98358188-6e36-42ef-9298-83acf8d9dd12 | Please consider all the ways to give \nProtect red wolves... | {'sender': 'Tim Whalen', 'sender_address': 'Sa... | {'sender': None, 'sender_phone_number': None, ... |
0f29e857-fc08-45dd-b1ea-dde1e00c4a62 | Some travelers plan ahead; others prefer to be spontaneous... | {'sender': 'Dunhill Vacations Inc.', 'sender_a... | {'sender': 'Dunhill Vacations Inc.', 'sender_p... |
35414bbc-4d38-41ed-876f-2a6a067e66d5 | --- \n \n|\n\nWe passed the "Stop Dangerous" bill... | {'sender': 'Matt Haney', 'sender_address': '10... | {'sender': 'Matt Haney', 'sender_phone_number'... |
ff1b2ed6-26a7-4501-96aa-6e3e10eadc72 | --- \n|\n\n# We offer unique financing options... | {'sender': 'info@championadvance.com', 'sender... | {'sender': None, 'sender_phone_number': None, ... |
# Rows for which Claude > OAI by at least 50%, according to the LLM-based evaluator
claude_beats_oai = df[
(df["feedback.score_string:accuracy_claude"] - df["feedback.score_string:accuracy"])
>= 0.5
]
claude_beats_oai[["inputs.input", "outputs.output", "outputs.output_claude"]]
example id | inputs.input | outputs.output | outputs.output_claude |
---|---|---|---|
02cfdfc4-c3dc-47e6-ad44-8e437ebf2dce | ---|---|---|--- \n \n| \n--- \n **Limited Time...** | {'action_items': [], 'topic': 'Limited time offer'... | {'sender': 'Dunhill Vacations Inc.', 'sender_p... |
198dc232-8f98-484a-a65e-048cfb517282 | Hello Jacob,\n\n \n\nFor many small businesses... | {'sender': 'Sam from EMC', 'action_items': ['Kic... | {'sender': 'Sam from EMC', 'sender_phone_number'... |
c222957f-cc7e-46af-9cca-1270f3fa5621 | Hello Jacob,\n\n \n\nDid you know Fortune... | {'sender': 'Sam from EMC', 'action_items': ['qua... | {'sender': 'Sam from EMC', 'sender_phone_number'... |
119ef037-8744-4eb9-93df-64458278e4f8 | --- \n| | Prequalify Now \n--- \n \n \nHello... | {'sender': 'Sam from EMC', 'action_items': ['Che... | {'sender': 'Sam from EMC id:2023-09-19-20:17:53:... |
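The two threshold filters above can be reproduced without pandas. The sketch below applies the same comparison to the five accuracy pairs shown in the `df.head(5)` output earlier; `ids_where_first_leads` is a hypothetical helper, and rounding the difference is an added guard against floating-point noise that the original filters do not use.

```python
# (example id, OpenAI accuracy, Claude accuracy), copied from the df.head(5) output
rows = [
    ("61c40266-b994-49a2-8768-d54704cee079", 0.7, 0.7),
    ("2dcfadff-51dc-458c-8af0-f47a795d0c9b", 0.7, 0.7),
    ("a9c481ba-9ca5-408c-8c9c-f29127a70f7b", 0.9, 0.7),
    ("98358188-6e36-42ef-9298-83acf8d9dd12", 0.7, 0.3),
    ("0f29e857-fc08-45dd-b1ea-dde1e00c4a62", 0.7, 0.3),
]


def ids_where_first_leads(triples, threshold):
    """Return ids whose first score exceeds the second by at least `threshold`."""
    return [
        example_id
        for example_id, first, second in triples
        # Round to absorb float noise (0.7 - 0.4 is slightly below 0.3 in binary)
        if round(first - second, 6) >= threshold
    ]


oai_leads = ids_where_first_leads(rows, 0.3)
# Swap the score columns to ask the question from Claude's side
claude_leads = ids_where_first_leads([(i, c, o) for i, o, c in rows], 0.5)
print(oai_leads)
print(claude_leads)
```

With these five rows, the first filter selects the two examples that also appear in the table of OpenAI wins above, and the second selects none.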