Arize Phoenix: Up and Running
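Arize Phoenix is an open-source Python library built by Arize AI that helps AI engineers trace, evaluate, experiment with, and optimize AI applications during development.
This article covers everything needed to get Arize Phoenix running on macOS, from initial setup to running your first evaluation.
Why Arize Phoenix?
When evaluating observability and monitoring tools for AI applications, we explored several options to improve our workflow. Our key requirements were open-source software, self-hosted infrastructure, and compatibility with a variety of AI models. Arize Phoenix met all of them.
As someone new to Python, I especially appreciated its intuitive interface and user-friendly approach, which made implementation and adoption straightforward. I also got a lot of help from its Slack community.
Requirements
Python package manager
uv is the only tool you need to install.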
- Homebrew
brew install uv
- PyPI
pipx install uv
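For other installation options, see the official guide.
LLM API key
Arize Phoenix supports multiple LLM providers for evaluation. This article uses LiteLLMModel with OpenRouter (see the note at the end of this article). Get an API key here: https://openrouter.ai/settings/keys.
Installation
Create a project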
$ uv init my-project
Initialized project `my-project` at `/Users/james/tmp/my-project`
$ tree my-project/
my-project/
├── README.md
├── hello.py
└── pyproject.toml
1 directory, 3 files
Install Arize Phoenix
$ cd my-project
$ uv add arize-phoenix litellm arize-phoenix-otel openinference-instrumentation-litellm
uv installs the arize-phoenix package and its dependencies into a virtual environment in the my-project/.venv/ directory.
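To confirm that the four packages landed in the virtual environment, one option is to query their installed versions from a Python session started with uv run python. The following is a minimal sketch using the standard library's importlib.metadata; the helper name installed_versions is hypothetical and not part of uv or Phoenix.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as passed to `uv add` above.
PACKAGES = [
    "arize-phoenix",
    "litellm",
    "arize-phoenix-otel",
    "openinference-instrumentation-litellm",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Any package reported as not installed suggests that the uv add step did not complete in this project.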
Evaluation
Start a Python REPL
$ uv run python
Set the API key
import os
from getpass import getpass
os.environ["OPENROUTER_API_KEY"] = getpass("🔑 Enter your OpenRouter API key: ")
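If the key is already exported in your shell, prompting again is unnecessary. The sketch below wraps the same os.environ and getpass calls in a small helper that prompts only when the variable is missing; the function name ensure_api_key is hypothetical and chosen for illustration.

```python
import os
from getpass import getpass


def ensure_api_key(var_name: str = "OPENROUTER_API_KEY") -> str:
    """Return the key from the environment, prompting only if it is not set."""
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed value so the key does not appear on screen.
        key = getpass(f"Enter a value for {var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling ensure_api_key() once in the REPL before creating the model is enough; later calls return the stored value without prompting.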
Run the evaluation
import pandas as pd
from phoenix.evals import LiteLLMModel, QAEvaluator, run_evals
df = pd.DataFrame(
    [
        {
            "input": "Where is Oriental Pearl Tower located?",
            "reference": "The Oriental Pearl Tower[a] is a futurist TV tower in Lujiazui, Shanghai.",
            "output": "It's located in Lujiazui, Shanghai.",
        },
        {
            "input": "Is Oriental Pearl Tower the tallest in China?",
            "reference": "Built from 1991 to 1994, the tower was the tallest structure in China until the completion of nearby World Financial Center in 2007.",
            "output": "It's the tallest in China.",
        },
    ]
)
eval_model = LiteLLMModel(model="openrouter/openai/gpt-4o-mini")
qa_evaluator = QAEvaluator(eval_model)
qa_eval_df = run_evals(
    dataframe=df,
    evaluators=[qa_evaluator],
    provide_explanation=True,
)
run_evals returns a list of DataFrames, one per evaluator. Let's look at the results:
>>> print(qa_eval_df)
[       label  score                                        explanation
0    correct      1  To determine if the answer is correct, we firs...
1  incorrect      0  To determine whether the answer correctly addr...]
>>> qa_eval_df[0].loc[0, 'explanation']
'To determine if the answer is correct, we first need to analyze the question, the reference text, and the answer provided.\n\n1. **Identify the Question**: The question asks, "Where is Oriental Pearl Tower located?" This is a straightforward inquiry about the geographical location of the Oriental Pearl Tower.\n\n2. **Examine the Reference Text**: The reference text states, "The Oriental Pearl Tower is a futurist TV tower in Lujiazui, Shanghai." This sentence provides a clear location for the Oriental Pearl Tower, specifying that it is in Lujiazui, which is a district in Shanghai.\n\n3. **Analyze the Answer**: The answer given is, "It\'s located in Lujiazui, Shanghai." This response directly addresses the question by repeating the location mentioned in the reference text.\n\n4. **Compare the Answer to the Reference**: The answer matches the information provided in the reference text. It specifies both the district (Lujiazui) and the city (Shanghai), which fully answers the question about the location of the Oriental Pearl Tower.\n\n5. **Conclusion**: Since the answer accurately reflects the information found in the reference text and fully answers the question, we can conclude that the answer is correct.\n\nLABEL: correct'
>>> qa_eval_df[0].loc[1, 'explanation']
'To determine whether the answer correctly addresses the question, we need to analyze both the question and the reference text.\n\n1. **Understanding the Question**: The question asks if the Oriental Pearl Tower is the tallest structure in China. This requires a definitive answer regarding its height in comparison to other structures in China.\n\n2. **Analyzing the Reference Text**: The reference states that the Oriental Pearl Tower was the tallest structure in China until the completion of the World Financial Center in 2007. This indicates that the Oriental Pearl Tower is no longer the tallest structure in China, as it was surpassed by another building.\n\n3. **Evaluating the Answer**: The answer provided is "It\'s the tallest in China." This statement directly contradicts the information in the reference text, which clearly states that the Oriental Pearl Tower was the tallest until 2007, implying that it is not currently the tallest.\n\n4. **Conclusion**: Since the answer claims that the Oriental Pearl Tower is still the tallest in China, but the reference text indicates that it is not, the answer does not correctly answer the question.\n\nLABEL: incorrect'
Great! You have just completed your first evaluation.
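Because the result rows carry the same 0 and 1 index labels as the input rows, the evaluator output can be joined back onto the questions and summarized. The sketch below uses stand-in DataFrames that mirror the labels and scores printed above, so it runs without calling an LLM; it assumes the result index matches the input index, which is worth checking on your own data.

```python
import pandas as pd

# Stand-in for the input DataFrame (reference column omitted for brevity).
df = pd.DataFrame(
    {
        "input": [
            "Where is Oriental Pearl Tower located?",
            "Is Oriental Pearl Tower the tallest in China?",
        ],
        "output": [
            "It's located in Lujiazui, Shanghai.",
            "It's the tallest in China.",
        ],
    }
)

# Stand-in for the list returned by run_evals: one DataFrame per evaluator.
qa_eval_df = [
    pd.DataFrame({"label": ["correct", "incorrect"], "score": [1, 0]})
]

# Join the first evaluator's results onto the inputs by index.
combined = df.join(qa_eval_df[0])

# With 0/1 scores, the mean is the fraction of answers judged correct.
accuracy = combined["score"].mean()

print(combined[["input", "label", "score"]])
print(f"accuracy: {accuracy:.2f}")
```

In a real session, replace the stand-ins with the df and qa_eval_df objects created earlier. To read an explanation with its line breaks rendered, pass it to print() instead of letting the REPL echo it.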
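Visualization
Phoenix includes a web interface for visualizing evaluation results. Let's adapt our first evaluation so that its results show up in the UI.
Start the Phoenix server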
$ uv run phoenix serve
...omitted...
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:6006 (Press CTRL+C to quit)
Phoenix is now running. Open your web browser and navigate to http://127.0.0.1:6006 to access the web UI.
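If the page does not load, it can help to check from Python whether anything is accepting connections on port 6006. This is a minimal sketch using only the standard library socket module; the helper name is_listening is hypothetical, and it only tests that a TCP connection succeeds, not that the listener is Phoenix.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 6006, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

While uv run phoenix serve is running in another terminal, is_listening() should return True.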

Enable tracing
from phoenix.otel import register
from openinference.instrumentation.litellm import LiteLLMInstrumentor
tracer_provider = register(
    project_name="my-project",
    endpoint="http://localhost:6006/v1/traces",
)
LiteLLMInstrumentor().instrument(tracer_provider=tracer_provider)
In the same Python session, repeat the steps from the Evaluation section, then open http://127.0.0.1:6006 again. You will see a new project named my-project:

Click my-project to see two llm traces:

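Conclusion
In this article, you learned how to evaluate an LLM and view traces in Phoenix. Stay tuned for the next article, which will cover how to provide feedback on evaluations.
Note: OpenRouter provides access to AI models from multiple providers and platforms.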