
Plan-Execute-Verify: Cross-Model Implementation

This guide shows how to build a Plan-Execute-Verify (PEV) agent that works across multiple LLM providers. Unlike ReAct's single-agent loop, PEV manages three independent components (planner, executor, verifier), each of which can use a different provider.

Cost optimization: use expensive models where quality matters (planning, verification) and cheap models where speed matters (execution).

Flexibility: each component can independently use whichever provider suits it best:

  • Planner: use the best reasoning model (e.g., Claude Opus for complex planning)
  • Executor: use the fastest model (e.g., GPT-4o-mini for quick execution)
  • Verifier: use the best evaluation model (e.g., Gemini for a different perspective)

Reliability: fallback options for each component reduce single points of failure.
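In LangChain, per-component fallbacks take one extra call: `with_fallbacks()` retries the same request against a backup model if the primary call fails. A minimal sketch (the model pairing here is illustrative):

```python
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

# Primary planner with an automatic fallback: if the Anthropic call
# raises, the same request is retried against the OpenAI model.
planner_llm = ChatAnthropic(model="claude-opus-4-6").with_fallbacks(
    [ChatOpenAI(model="gpt-4-turbo-preview")]
)
```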

This guide covers two implementation approaches:

  1. LangChain/LangGraph (recommended): framework-managed provider abstraction using the latest 2026 patterns
  2. Manual abstraction (educational): build your own implementation to understand the internals

Approach 1: LangChain/LangGraph (Recommended)


LangChain provides excellent provider abstraction out of the box. As of 2026, LangGraph is the recommended framework for plan-and-execute agents in production.

  • LangGraph: now the standard for plan-and-execute workflows using StateGraph (see the sketch after this list)
  • Requirements: langchain-core >= 0.3 (Pydantic v2 support)
  • Official docs: the LangGraph plan-and-execute tutorial
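To make the StateGraph pattern concrete, here is a minimal plan-execute-verify loop expressed as a LangGraph graph. The node bodies are placeholders for the components built later in this guide, and the state keys are assumptions for illustration:

```python
from typing import TypedDict, List
from langgraph.graph import StateGraph, START, END

class PEVState(TypedDict):
    request: str
    plan: List[str]     # pending step descriptions
    results: List[str]  # completed step outputs
    verdict: str        # "retry" or "done"

def plan_node(state: PEVState) -> dict:
    # Placeholder: a real planner LLM call goes here.
    return {"plan": [f"do: {state['request']}"]}

def execute_node(state: PEVState) -> dict:
    # Placeholder: execute the next pending step with tools.
    return {"results": state["results"] + [f"executed {state['plan'][0]}"]}

def verify_node(state: PEVState) -> dict:
    # Placeholder: a real verifier LLM call goes here.
    return {"verdict": "done"}

graph = StateGraph(PEVState)
graph.add_node("plan", plan_node)
graph.add_node("execute", execute_node)
graph.add_node("verify", verify_node)
graph.add_edge(START, "plan")
graph.add_edge("plan", "execute")
graph.add_edge("execute", "verify")
# Loop back to the executor on "retry", otherwise finish.
graph.add_conditional_edges(
    "verify",
    lambda s: "execute" if s["verdict"] == "retry" else END,
)
app = graph.compile()
app.invoke({"request": "demo task", "plan": [], "results": [], "verdict": ""})
```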

LangChain uses a unified interface across all LLM providers:

```python
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.llms import Ollama

# All of these share the same interface!
llm_claude = ChatAnthropic(model="claude-sonnet-4-5-20250929")
llm_gpt = ChatOpenAI(model="gpt-4-turbo-preview")
llm_gemini = ChatGoogleGenerativeAI(model="gemini-1.5-pro")
llm_local = Ollama(model="mistral")

# The same method call works for all of them
response = llm_claude.invoke("Hello!")
response = llm_gpt.invoke("Hello!")
```
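Recent LangChain versions also ship `init_chat_model`, which selects the provider class from a single string. Assuming it is available in your installed version, switching providers becomes a one-line change:

```python
from langchain.chat_models import init_chat_model

# Same call site, different provider: only the model string changes.
llm = init_chat_model("anthropic:claude-sonnet-4-5-20250929", temperature=0)
# llm = init_chat_model("openai:gpt-4o-mini", temperature=0)
response = llm.invoke("Hello!")
```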
Install the dependencies (the version pin is quoted so the shell does not treat `>=` as a redirect):

```sh
pip install "langchain>=0.3.0" langchain-anthropic langchain-openai langchain-google-genai langchain-community
```
```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Dict, Any, Optional

class StepStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETE = "complete"
    FAILED = "failed"

@dataclass
class Step:
    step_id: str
    name: str
    description: str
    actions: List[str]
    acceptance_criteria: List[str]
    expected_outputs: List[str]
    status: StepStatus = StepStatus.PENDING
    retry_count: int = 0
    # Use a default factory so each Step gets its own list; a bare None
    # default would break feedback.append() during retries.
    feedback: List[str] = field(default_factory=list)

@dataclass
class Plan:
    goal: str
    steps: List[Step]
    success_criteria: List[str]

@dataclass
class VerificationResult:
    passed: bool
    action: str  # "pass", "retry", "replan"
    feedback: Optional[str] = None
    evidence: Optional[str] = None
```
```python
from langchain_core.messages import SystemMessage, HumanMessage
import json
import re

class PlannerComponent:
    """Creates structured plans using any LLM"""

    SYSTEM_PROMPT = """You are an expert planning specialist.
Create a detailed plan for the given task. Respond with ONLY valid JSON:
{
  "goal": "High-level objective",
  "steps": [
    {
      "step_id": "1",
      "name": "Brief step name",
      "description": "What to do in detail",
      "actions": ["action1", "action2"],
      "acceptance_criteria": ["Specific measurable criterion 1"],
      "expected_outputs": ["file1.md"]
    }
  ],
  "success_criteria": ["Overall success criterion"]
}
Make criteria SPECIFIC and MEASURABLE."""

    def __init__(self, llm):
        self.llm = llm

    def create_plan(self, user_request: str, context: Dict[str, Any]) -> Plan:
        messages = [
            SystemMessage(content=self.SYSTEM_PROMPT),
            HumanMessage(content=f"REQUEST: {user_request}\n\nCONTEXT: {json.dumps(context)}")
        ]
        response = self.llm.invoke(messages)
        plan_json = self._extract_json(response.content)
        steps = [Step(**s) for s in plan_json["steps"]]
        return Plan(
            goal=plan_json["goal"],
            steps=steps,
            success_criteria=plan_json["success_criteria"]
        )

    @staticmethod
    def _extract_json(text: str) -> Dict[str, Any]:
        """Pull the JSON object out of the response, ignoring any fences."""
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match is None:
            raise ValueError(f"No JSON object found in response: {text[:200]}")
        return json.loads(match.group(0))
```
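A quick usage sketch, assuming an Anthropic API key is configured in the environment (the request and context values are illustrative):

```python
planner = PlannerComponent(ChatAnthropic(model="claude-sonnet-4-5-20250929"))
plan = planner.create_plan(
    "Summarize the legal documents in the folder",
    context={"folder_path": "/project/legal_docs"},
)
print(plan.goal, [s.name for s in plan.steps])
```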
```python
from langchain_core.messages import ToolMessage

class ExecutorComponent:
    """Executes steps using tools with any LLM"""

    def __init__(self, llm, tools: List):
        self.llm = llm
        self.tools = tools
        self.tool_map = {t.name: t for t in tools}
        self.llm_with_tools = llm.bind_tools(tools)

    def execute_step(self, step: Step, context: Dict[str, Any]) -> Dict[str, Any]:
        messages = [
            SystemMessage(content="Execute the step thoroughly using tools."),
            HumanMessage(content=f"STEP: {step.name}\nCRITERIA: {step.acceptance_criteria}")
        ]
        artifacts = []
        actions_taken = []
        for i in range(10):  # Max 10 tool iterations
            response = self.llm_with_tools.invoke(messages)
            messages.append(response)
            if not response.tool_calls:
                return {
                    "summary": response.content,
                    "artifacts": artifacts,
                    "actions_taken": actions_taken
                }
            # Execute tools
            for tool_call in response.tool_calls:
                result = self.tool_map[tool_call["name"]].invoke(tool_call["args"])
                actions_taken.append({"tool": tool_call["name"], "result": str(result)[:200]})
                messages.append(ToolMessage(content=str(result), tool_call_id=tool_call["id"]))
        return {"summary": "Execution incomplete", "artifacts": artifacts, "actions_taken": actions_taken}
```
```python
class VerifierComponent:
    """Verifies step execution with any LLM"""

    SYSTEM_PROMPT = """You are a verification specialist.
Check if execution meets ALL acceptance criteria.
Respond with ONLY valid JSON:
{
  "passed": true/false,
  "action": "pass|retry|replan",
  "evidence": "Specific evidence",
  "feedback": "Actionable feedback if retry needed"
}"""

    def __init__(self, llm):
        self.llm = llm

    def verify(self, step: Step, result: Dict[str, Any]) -> VerificationResult:
        messages = [
            SystemMessage(content=self.SYSTEM_PROMPT),
            HumanMessage(content=f"STEP: {step.name}\nCRITERIA: {step.acceptance_criteria}\nRESULT: {result}")
        ]
        response = self.llm.invoke(messages)
        verification_json = json.loads(response.content)
        return VerificationResult(
            passed=verification_json["passed"],
            action=verification_json["action"],
            feedback=verification_json.get("feedback"),
            evidence=verification_json.get("evidence")
        )
```
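If the verifier model wraps its JSON in markdown fences, the bare `json.loads` above will raise. LangChain chat models also expose `with_structured_output`, which enforces a schema instead of relying on prompt discipline. A hedged alternative sketch using a Pydantic model:

```python
from typing import Optional
from pydantic import BaseModel
from langchain_anthropic import ChatAnthropic

class VerificationSchema(BaseModel):
    passed: bool
    action: str  # "pass", "retry", or "replan"
    evidence: Optional[str] = None
    feedback: Optional[str] = None

llm = ChatAnthropic(model="claude-haiku-4-5-20251001")
# The response is constrained to the schema; no manual JSON parsing needed.
structured_verifier = llm.with_structured_output(VerificationSchema)
verdict = structured_verifier.invoke("STEP: ...\nCRITERIA: ...\nRESULT: ...")
print(verdict.passed, verdict.action)
```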
```python
class PlanExecuteVerifyAgent:
    """Complete PEV agent with LangChain - works with ANY providers!"""

    def __init__(self, planner_llm, executor_llm, verifier_llm, tools: List):
        self.planner = PlannerComponent(planner_llm)
        self.executor = ExecutorComponent(executor_llm, tools)
        self.verifier = VerifierComponent(verifier_llm)

    def run(self, user_request: str, context: Dict[str, Any]) -> Optional[Plan]:
        # Phase 1: Planning
        plan = self.planner.create_plan(user_request, context)
        # Phase 2: Execute with verification
        for step in plan.steps:
            success = self._execute_step_with_verification(step, context)
            if not success:
                return None
        return plan

    def _execute_step_with_verification(self, step: Step, context: Dict[str, Any]) -> bool:
        max_retries = 3
        for attempt in range(1, max_retries + 1):
            result = self.executor.execute_step(step, context)
            verification = self.verifier.verify(step, result)
            if verification.passed:
                step.status = StepStatus.COMPLETE
                return True
            elif verification.action == "retry" and attempt < max_retries:
                step.feedback.append(verification.feedback)
                step.retry_count += 1
            else:
                step.status = StepStatus.FAILED
                return False
        return False
```
```python
# Use Claude for everything
planner = ChatAnthropic(model="claude-opus-4-6")
executor = ChatAnthropic(model="claude-sonnet-4-5-20250929")
verifier = ChatAnthropic(model="claude-haiku-4-5-20251001")

# Or use OpenAI:
from langchain_openai import ChatOpenAI
planner = ChatOpenAI(model="gpt-4-turbo-preview")
executor = ChatOpenAI(model="gpt-4-turbo-preview")
verifier = ChatOpenAI(model="gpt-3.5-turbo")

# Or mix providers to optimize cost:
planner = ChatAnthropic(model="claude-opus-4-6")             # Best reasoning
executor = ChatOpenAI(model="gpt-4o-mini")                   # Fast and cheap
verifier = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # Good evaluation

agent = PlanExecuteVerifyAgent(planner, executor, verifier, tools)
```

Approach 2: Manual Abstraction (Educational)

Build your own provider abstraction to understand how it works under the hood.

```mermaid
graph TB
    subgraph Logic["Agent Logic"]
        Controller[Controller]
        PlanLogic[Plan Logic]
        ExecLogic[Execute Logic]
        VerifyLogic[Verify Logic]
    end
    subgraph Adapter["LLM Adapter Layer"]
        AdapterInt[LLM Provider Interface]
    end
    subgraph Providers["Provider Implementations"]
        Claude[Claude Adapter]
        OpenAI[OpenAI Adapter]
        Gemini[Gemini Adapter]
        Local[Local Adapter]
    end
    Controller --> PlanLogic
    Controller --> ExecLogic
    Controller --> VerifyLogic
    PlanLogic --> AdapterInt
    ExecLogic --> AdapterInt
    VerifyLogic --> AdapterInt
    AdapterInt --> Claude
    AdapterInt --> OpenAI
    AdapterInt --> Gemini
    AdapterInt --> Local
```
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Dict, Any, Optional

@dataclass
class LLMRequest:
    messages: List[Dict[str, str]]
    tools: Optional[List[Dict]] = None
    temperature: float = 0.7
    max_tokens: int = 4000

@dataclass
class LLMResponse:
    content: str
    tool_calls: List[Dict]
    finish_reason: str

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, request: LLMRequest) -> LLMResponse:
        """Send request to LLM and get standardized response"""
        pass
```
```python
import anthropic

class ClaudeProvider(LLMProvider):
    def __init__(self, api_key: str, model: str = "claude-sonnet-4-5-20250929"):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = model

    def complete(self, request: LLMRequest) -> LLMResponse:
        response = self.client.messages.create(
            model=self.model,
            messages=request.messages,
            tools=request.tools or [],
            temperature=request.temperature,
            max_tokens=request.max_tokens
        )
        # Collect text blocks explicitly: the first content block may be a
        # tool_use, so indexing content[0].text would break on tool calls.
        text = "".join(b.text for b in response.content if b.type == "text")
        return LLMResponse(
            content=text,
            tool_calls=[{"name": b.name, "args": b.input} for b in response.content if b.type == "tool_use"],
            finish_reason=response.stop_reason
        )
```
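The usage example below also references `OpenAIProvider` and `GeminiProvider`, which this listing does not show. Here is a sketch of the OpenAI adapter following the same pattern (the Gemini adapter is analogous); it assumes `request.tools` is already in OpenAI's function-schema format:

```python
import json
from openai import OpenAI

class OpenAIProvider(LLMProvider):
    def __init__(self, api_key: str, model: str = "gpt-4o-mini"):
        self.client = OpenAI(api_key=api_key)
        self.model = model

    def complete(self, request: LLMRequest) -> LLMResponse:
        kwargs = dict(
            model=self.model,
            messages=request.messages,
            temperature=request.temperature,
            max_tokens=request.max_tokens,
        )
        if request.tools:  # only pass tools when present
            kwargs["tools"] = request.tools
        response = self.client.chat.completions.create(**kwargs)
        choice = response.choices[0]
        return LLMResponse(
            content=choice.message.content or "",
            # Normalize OpenAI tool calls into the shared {"name", "args"} shape.
            tool_calls=[
                {"name": tc.function.name, "args": json.loads(tc.function.arguments)}
                for tc in (choice.message.tool_calls or [])
            ],
            finish_reason=choice.finish_reason
        )
```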
```python
class AgnosticPEVAgent:
    """PEV agent that works with any LLM providers via manual abstraction"""

    def __init__(self, planner: LLMProvider, executor: LLMProvider, verifier: LLMProvider, tools: List):
        self.planner = planner
        self.executor = executor
        self.verifier = verifier
        self.tools = tools

    def run(self, user_request: str, context: str = "") -> Dict[str, Any]:
        # Phase 1: Planning
        plan = self._create_plan(user_request, context)
        # Phase 2: Execute steps with verification
        results = []
        for step in plan.steps:
            step_result = self._execute_and_verify_step(step)
            results.append(step_result)
            if not step_result["verified"]:
                if step_result["retry_count"] >= step.max_retries:
                    return {"success": False, "error": f"Step {step.id} failed"}
        return {"success": True, "result": results[-1]["output"], "plan": plan}

    # _create_plan and _execute_and_verify_step mirror the LangChain components
    # above, but issue requests through LLMProvider.complete() and parse the
    # standardized LLMResponse themselves (omitted here for brevity).
```
```python
import os

# Each component can use a different provider!
planner_provider = ClaudeProvider(api_key=os.getenv("ANTHROPIC_API_KEY"), model="claude-opus-4-6")
executor_provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o-mini")
verifier_provider = GeminiProvider(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-1.5-flash")

agent = AgnosticPEVAgent(
    planner=planner_provider,
    executor=executor_provider,
    verifier=verifier_provider,
    tools=tools
)
result = agent.run("Review legal documents")
```
| Provider | Best For | Cost | Speed | Reasoning Quality |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Planning | | | Excellent |
| Claude Sonnet 4.5 | Balanced workloads | | | Very good |
| Claude Haiku 4.5 | Execution/verification | | | |
| GPT-4 Turbo | General purpose | Medium-high | | Very good |
| GPT-4o-mini | Fast execution | | Very fast | |
| Gemini 1.5 Pro | Multimodal, long context | | | Very good |
| Gemini 1.5 Flash | Verification | | | |
| Local (Ollama) | Privacy, offline | Free | Varies | Varies |
A complete end-to-end example:

```python
import os
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

# Define tools
@tool
def read_file(path: str) -> str:
    """Read a file from disk."""
    with open(path, 'r') as f:
        return f.read()

@tool
def write_file(path: str, content: str) -> str:
    """Write content to a file."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, 'w') as f:
        f.write(content)
    return f"Success: Wrote to {path}"

tools = [read_file, write_file]

# Cost-optimized mixed-provider setup
planner_llm = ChatAnthropic(model="claude-opus-4-6", temperature=0)             # Best reasoning
executor_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)                   # Fast and cheap
verifier_llm = ChatAnthropic(model="claude-haiku-4-5-20251001", temperature=0)  # Cheap verification

# Create the agent
agent = PlanExecuteVerifyAgent(
    planner_llm=planner_llm,
    executor_llm=executor_llm,
    verifier_llm=verifier_llm,
    tools=tools
)

# Run it
result = agent.run(
    user_request="Review all legal documents and create comprehensive reports",
    context={"folder_path": "/project/legal_docs"}
)
```
| Scenario | Recommendation |
| --- | --- |
| Production systems | LangChain/LangGraph (Approach 1) |
| Learning the internals | Manual abstraction (Approach 2) |
| Cost optimization | Mixed providers (expensive planner, cheap executor) |
| Maximum control | Manual abstraction with custom logic |
| Rapid prototyping | LangChain with a single provider throughout |
| Provider-specific features | Hybrid (LangChain + custom adapters) |
```mermaid
graph LR
    subgraph Expensive["Expensive models ($$$)"]
        P1[Planner: Claude Opus]
        V1[Verifier: Claude Haiku]
    end
    subgraph Cheap["Cheap models ($)"]
        E1[Executor: GPT-4o-mini]
    end
    P1 --> E1
    E1 --> V1
    V1 --> E1
```

Strategy: use expensive models where quality matters (planning, final verification) and cheap models where speed matters (execution).
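To see why this split pays off, here is a back-of-the-envelope cost model. The per-token prices are illustrative placeholders, not real quotes; plug in your providers' current pricing:

```python
# Illustrative cost model for one PEV run. All prices are hypothetical
# placeholders (USD per 1M tokens), NOT actual provider pricing.
PRICE_PER_MTOK = {"planner": 15.00, "executor": 0.15, "verifier": 1.00}

def run_cost(plan_tokens: int, exec_tokens_per_step: int,
             verify_tokens_per_step: int, steps: int) -> float:
    """Total cost: one planning call, then an execute+verify pair per step."""
    cost = plan_tokens / 1e6 * PRICE_PER_MTOK["planner"]
    cost += steps * exec_tokens_per_step / 1e6 * PRICE_PER_MTOK["executor"]
    cost += steps * verify_tokens_per_step / 1e6 * PRICE_PER_MTOK["verifier"]
    return cost

# Execution dominates token volume, so routing it to the cheap model
# is where most of the savings come from.
print(f"${run_cost(5_000, 40_000, 3_000, steps=6):.2f}")
```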

Test the agent across provider combinations:

```python
import pytest
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI

@pytest.fixture
def providers():
    return {
        "claude": ChatAnthropic(model="claude-sonnet-4-5-20250929"),
        "gpt": ChatOpenAI(model="gpt-4-turbo-preview"),
        "gemini": ChatGoogleGenerativeAI(model="gemini-1.5-pro")
    }

@pytest.mark.parametrize("planner,executor,verifier", [
    ("claude", "claude", "claude"),
    ("claude", "gpt", "claude"),
    ("gpt", "gpt", "gpt"),
])
def test_pev_agent(providers, planner, executor, verifier):
    # Assumes test_tools is defined elsewhere in the test module.
    agent = PlanExecuteVerifyAgent(
        planner_llm=providers[planner],
        executor_llm=providers[executor],
        verifier_llm=providers[verifier],
        tools=test_tools
    )
    result = agent.run("Simple test task", {})
    assert result is not None
```
| Aspect | LangChain (Approach 1) | Manual (Approach 2) |
| --- | --- | --- |
| Setup time | Fast (~100 lines) | Slow (~500 lines of adapters) |
| Provider switching | One line of code | Update adapter classes |
| Tool calling | Unified bind_tools() | Parse each provider's format |
| Maintenance | LangChain handles updates | You maintain the adapters |
| Control | Some abstraction overhead | Full control |
| Learning value | Uses framework patterns | Understand the internals |
| Production ready | ✅ Yes (the 2026 standard) | ⚠️ Needs testing |

For production: use LangChain/LangGraph (Approach 1)

  • Built-in provider abstraction
  • Latest 2026 patterns with StateGraph
  • Rich ecosystem (memory, callbacks, observability)
  • Community support and regular updates

For learning: try the manual abstraction (Approach 2)

  • Understand the differences between providers
  • Learn how tool calling works across APIs
  • Build custom functionality when you need it

For cost optimization: mix providers strategically

  • An expensive model for planning (Claude Opus)
  • A cheap model for execution (GPT-4o-mini)
  • A fast model for verification (Claude Haiku, Gemini Flash)