How to Optimize a RAG Prompt with DSPy: An Example

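Related post: exploring compound example applications of DSPy, including question answering, sentiment classification, and RAG systems.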

https://blog.csdn.net/liliang199/article/details/155860692

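This post optimizes a RAG prompt with DSPy to illustrate how DSPy's prompt optimization works.

1 Defining the RAG system

1.1 Defining the LLM

First, set up the LLM. Here the model is ollama/gemma3n:e2b; the example code is as follows.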

    import dspy

    # 1. Configure the language model (a local model served through Ollama)
    lm = dspy.LM(model="ollama/gemma3n:e2b", api_base="http://localhost:11434")
    dspy.configure(lm=lm)

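1.2 Defining the retriever

Next, define the retriever. We build a mock knowledge base and find relevant documents by counting how many characters of the query occur in each document.

In a real project, this frequency-based matching logic should be replaced with a real vector database.

The example code is as follows.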

    import dspy

    # ===== 2. Build a mock knowledge base (replace with a real vector database in production) =====
    class SimpleRetriever:
        """A simple in-memory retriever that stands in for a vector database."""

        def __init__(self, documents):
            # documents format: [{"text": "...", "id": 1}, ...]
            self.documents = documents

        def retrieve(self, query, k=3):
            """Simple keyword-matching retrieval (use vector retrieval in real applications)."""
            # Chinese text has no spaces between words, so match character by character.
            query_chars = "".join(query.lower().split())
            scored_docs = []
            for doc in self.documents:
                text = doc["text"].lower()
                # Simple score: number of query characters that occur in the document
                score = sum(1 for char in query_chars if char in text)
                if score > 0:
                    scored_docs.append((score, doc["text"]))
            # Sort by score and return the top k
            scored_docs.sort(reverse=True, key=lambda x: x[0])
            return [text for _, text in scored_docs[:k]]

    # Sample knowledge base (your real document data goes here)
    knowledge_base = [
        {"id": 1, "text": "爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。"},
        {"id": 2, "text": "光电效应是指当光照射到金属表面时，会从金属中发射出电子的现象。这一发现对量子力学的发展至关重要。"},
        {"id": 3, "text": "阿尔伯特·爱因斯坦（1879-1955）是德裔理论物理学家，相对论的创始人，也是量子力学的重要奠基人之一。"},
        {"id": 4, "text": "诺贝尔物理学奖是根据阿尔弗雷德·诺贝尔的遗嘱设立的，旨在表彰在物理学领域做出杰出贡献的科学家。"},
        {"id": 5, "text": "1921年的诺贝尔物理学奖颁奖典礼于1922年举行，因为1921年没有候选人被认为符合获奖标准。"},
    ]

    # Initialize the retriever
    retriever = SimpleRetriever(knowledge_base)

    question = "什么是光电效应？"
    num_passages = 3
    contexts = retriever.retrieve(question, k=num_passages)
    print(contexts)

The test output is as follows.

['爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。', '光电效应是指当光照射到金属表面时，会从金属中发射出电子的现象。这一发现对量子力学的发展至关重要。', '阿尔伯特·爱因斯坦（1879-1955）是德裔理论物理学家，相对论的创始人，也是量子力学的重要奠基人之一。']
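The text above recommends swapping the keyword matcher for real vector retrieval. As a rough sketch of what that swap might look like, the hypothetical `VectorRetriever` below keeps the same `retrieve(query, k)` interface but ranks documents by cosine similarity between vectors. The character-bigram counts used as "embeddings" are an assumption made only to keep the example dependency-free; a real system would use a learned embedding model and a vector database.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Bag of character bigrams; a toy stand-in for a learned embedding."""
    text = "".join(text.lower().split())
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorRetriever:
    """Hypothetical drop-in replacement exposing the same retrieve(query, k) method."""

    def __init__(self, documents):
        self.texts = [doc["text"] for doc in documents]
        # Document vectors are computed once, at indexing time.
        self.vectors = [char_bigrams(text) for text in self.texts]

    def retrieve(self, query, k=3):
        query_vector = char_bigrams(query)
        scored = [(cosine(query_vector, vec), text)
                  for vec, text in zip(self.vectors, self.texts)]
        # Drop documents that share nothing with the query.
        scored = [(score, text) for score, text in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


docs = [
    {"id": 1, "text": "光电效应是指当光照射到金属表面时，会从金属中发射出电子的现象。"},
    {"id": 2, "text": "诺贝尔物理学奖旨在表彰杰出的科学家。"},
]
vector_retriever = VectorRetriever(docs)
top = vector_retriever.retrieve("什么是光电效应", k=1)
print(top)
```

Because the interface is unchanged, an object like this could be passed wherever `SimpleRetriever` is used without touching the rest of the pipeline.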

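1.3 Defining the RAG system

First, define the inputs and outputs of the RAG system with a DSPy signature.

That is, the inputs are the context for the question and the question itself; the output is a concise, accurate answer grounded in the context.

Second, define the RAG module on top of the DSPy signature, for example:

in the retrieval stage, how to fetch the relevant documents;

in the generation stage, how to generate an answer from the retrieved context.

Finally, return the result, which includes the retrieved contexts (contexts), the answer (answer), and the reasoning (reasoning).

The example code is as follows.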

    # ===== 3. Define the DSPy signature =====
    # The docstring and desc strings are used by DSPy when building the prompt,
    # so they are kept in Chinese, as in the original run.
    class GenerateAnswer(dspy.Signature):
        """基于给定上下文回答问题。"""

        context = dspy.InputField(desc="相关背景信息")
        question = dspy.InputField()
        answer = dspy.OutputField(desc="简洁、准确的答案，基于上下文")

    # ===== 4. Build the RAG module =====
    class RAG(dspy.Module):
        def __init__(self, retriever, num_passages=3):
            super().__init__()
            self.retriever = retriever
            self.num_passages = num_passages
            # ChainOfThought makes the model reason before it answers
            self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

        def forward(self, question):
            # 1. Retrieval stage: fetch the relevant documents
            contexts = self.retriever.retrieve(question, k=self.num_passages)
            context_str = "\n---\n".join(contexts)
            # 2. Generation stage: answer from the retrieved context
            prediction = self.generate_answer(
                context=context_str,
                question=question
            )
            # Return the full result
            return dspy.Prediction(
                contexts=contexts,
                answer=prediction.answer,
                reasoning=prediction.reasoning  # reasoning produced by ChainOfThought
            )

    # ===== 5. Initialize the RAG system =====
    rag_system = RAG(retriever, num_passages=2)

    user_question = "爱因斯坦在哪年获得诺贝尔奖？"
    result = rag_system(user_question)
    print(result)

The example output is as follows.

Prediction(
contexts=['爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。', '1921年的诺贝尔物理学奖颁奖典礼于1922年举行，因为1921年没有候选人被认为符合获奖标准。'],
answer='1921',
reasoning='根据上下文，爱因斯坦在1921年获得诺贝尔物理学奖。上下文还提到1921年没有候选人被认为符合获奖标准，但奖颁奖典礼在1922年举行。'
)
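To make the two stages more concrete, here is a minimal model-free sketch of how the retrieved passages and the signature's fields could be combined into one prompt. The `render_prompt` helper and its bracketed layout are hypothetical and are not the prompt format DSPy actually emits; only the `"\n---\n"` separator and the field names come from the code above.

```python
def render_prompt(instructions, inputs, output_name):
    """Assemble a plain-text prompt from an instruction and named input fields."""
    lines = [instructions, ""]
    for name, value in inputs.items():
        lines.append(f"[{name}]")
        lines.append(value)
        lines.append("")
    # The model is expected to continue after the output field marker.
    lines.append(f"[{output_name}]")
    return "\n".join(lines)


# Retrieval stage (stubbed with two passages from the knowledge base)
contexts = [
    "爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。",
    "1921年的诺贝尔物理学奖颁奖典礼于1922年举行，因为1921年没有候选人被认为符合获奖标准。",
]

# Generation stage input: passages joined the same way as in RAG.forward
prompt = render_prompt(
    instructions="基于给定上下文回答问题。",
    inputs={
        "context": "\n---\n".join(contexts),
        "question": "爱因斯坦在哪年获得诺贝尔奖？",
    },
    output_name="answer",
)
print(prompt)
```

The point of the sketch is only that the signature fixes which fields appear in the prompt, while the retriever decides what goes into `context`.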

2 Comparing RAG before and after optimization

2.1 RAG before DSPy optimization

Here we run the RAG system before optimization.

    # ===== 6. Test the RAG system =====
    def test_rag_system():
        """Run the RAG system on a few sample questions."""
        test_questions = [
            "爱因斯坦因什么获得诺贝尔奖？",
            "什么是光电效应？",
            "谁创立了相对论？"
        ]
        for question in test_questions:
            print(f"\n{'='*60}")
            print(f"问题: {question}")
            print(f"{'='*60}")
            # Get the answer
            result = rag_system(question)
            # Print the retrieved contexts
            print("检索到的上下文:")
            for i, ctx in enumerate(result.contexts, 1):
                print(f"{i}. {ctx[:100]}...")  # show only the first 100 characters
            # Print the reasoning, if any
            if hasattr(result, 'reasoning') and result.reasoning:
                print(f"\n模型推理: {result.reasoning}")
            # Print the final answer
            print(f"\n最终答案: {result.answer}")

    print("RAG问答系统启动...")
    # Test the baseline version
    test_rag_system()

The output is as follows.

RAG问答系统启动...
============================================================
问题: 爱因斯坦因什么获得诺贝尔奖？
============================================================
检索到的上下文:
1. 爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。...
2. 1921年的诺贝尔物理学奖颁奖典礼于1922年举行，因为1921年没有候选人被认为符合获奖标准。...
模型推理: The context states that Einstein received the Nobel Prize in Physics in 1921 for his contributions to theoretical physics, particularly for discovering the law of the photoelectric effect. The additional information mentions the award ceremony was held in 1922 because no candidate was deemed suitable for the award in 1921.
最终答案: Einstein won the Nobel Prize in Physics for discovering the law of the photoelectric effect.
============================================================
问题: 什么是光电效应？
============================================================
检索到的上下文:
1. 爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。...
2. 光电效应是指当光照射到金属表面时，会从金属中发射出电子的现象。这一发现对量子力学的发展至关重要。...
模型推理: 光电效应是指当光照射到金属表面时，会从金属中发射出电子的现象。爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。
最终答案: 光电效应是指当光照射到金属表面时，会从金属中发射出电子的现象。
============================================================
问题: 谁创立了相对论？
============================================================
检索到的上下文:
1. 阿尔伯特·爱因斯坦（1879-1955）是德裔理论物理学家，相对论的创始人，也是量子力学的重要奠基人之一。...
2. 爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。...
模型推理: The context states that Albert Einstein was the founder of the theory of relativity.
最终答案: Albert Einstein

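2.2 RAG after DSPy optimization

Here we optimize the RAG system's prompt with BootstrapFewShot.

That is, we prepare a small number of training examples and an evaluation metric, and use them to optimize the RAG system's prompt.

The training examples are as follows.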

    # Prepare the training examples
    trainset = [
        dspy.Example(
            question="爱因斯坦的诺贝尔奖贡献是什么？",
            contexts=[
                "爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。",
                "光电效应是指当光照射到金属表面时，会从金属中发射出电子的现象。"
            ],
            answer="发现光电效应定律"
        ).with_inputs('question'),

        dspy.Example(
            question="谁创立了相对论？",
            contexts=[
                "阿尔伯特·爱因斯坦（1879-1955）是德裔理论物理学家，相对论的创始人。",
                "爱因斯坦是相对论的创始人，也是量子力学的重要奠基人之一。"
            ],
            answer="阿尔伯特·爱因斯坦"
        ).with_inputs('question'),
    ]

The evaluation metric is as follows.

    # Define the evaluation metric
    def validate_answer(example, prediction, trace=None):
        # Simple check: does the predicted answer contain a keyword?
        correct_keywords = {
            "爱因斯坦的诺贝尔奖贡献是什么？": ["光电效应"],
            "谁创立了相对论？": ["爱因斯坦"]
        }

        question = example.question
        if question in correct_keywords:
            return any(keyword in prediction.answer for keyword in correct_keywords[question])
        return True

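Looking at the training examples and the metric, both steer the system toward answers that distill the question and its context into short, concise information.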
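To see what such a keyword metric rewards, the sketch below scores a couple of answers without calling a model. It is a variation on the metric above, not the author's code: the `keywords` field on each example and the `accuracy` helper are hypothetical, and the two predictions are illustrative (the second mirrors the English answer the unoptimized system gave earlier).

```python
from types import SimpleNamespace


def keyword_metric(example, prediction, trace=None):
    """Return True when any expected keyword occurs in the predicted answer."""
    return any(keyword in prediction.answer for keyword in example.keywords)


def accuracy(examples, predictions, metric):
    """Fraction of (example, prediction) pairs that the metric accepts."""
    if not examples:
        return 0.0
    passed = sum(bool(metric(ex, pred)) for ex, pred in zip(examples, predictions))
    return passed / len(examples)


# Hypothetical examples that carry their own keywords instead of a lookup table
examples = [
    SimpleNamespace(question="爱因斯坦的诺贝尔奖贡献是什么？", keywords=["光电效应"]),
    SimpleNamespace(question="谁创立了相对论？", keywords=["爱因斯坦"]),
]

# Illustrative predictions: one Chinese answer, one English answer
predictions = [
    SimpleNamespace(answer="发现光电效应定律"),
    SimpleNamespace(answer="Albert Einstein"),
]

score = accuracy(examples, predictions, keyword_metric)
print(score)  # 0.5: the English answer does not contain the Chinese keyword
```

One consequence worth noting is that a correct answer in English fails this metric, so demonstrations selected with it will tend to be short answers in Chinese.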

The complete optimization code is as follows.

    # ===== 7. Optimize the RAG system (optional: using BootstrapFewShot) =====
    def optimize_rag_system():
        """Optimize the RAG prompt with a handful of examples."""
        from dspy.teleprompt import BootstrapFewShot

        # Prepare the training examples
        trainset = [
            dspy.Example(
                question="爱因斯坦的诺贝尔奖贡献是什么？",
                contexts=[
                    "爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。",
                    "光电效应是指当光照射到金属表面时，会从金属中发射出电子的现象。"
                ],
                answer="发现光电效应定律"
            ).with_inputs('question'),
            dspy.Example(
                question="谁创立了相对论？",
                contexts=[
                    "阿尔伯特·爱因斯坦（1879-1955）是德裔理论物理学家，相对论的创始人。",
                    "爱因斯坦是相对论的创始人，也是量子力学的重要奠基人之一。"
                ],
                answer="阿尔伯特·爱因斯坦"
            ).with_inputs('question'),
        ]

        # Define the evaluation metric
        def validate_answer(example, prediction, trace=None):
            # Simple check: does the predicted answer contain a keyword?
            correct_keywords = {
                "爱因斯坦的诺贝尔奖贡献是什么？": ["光电效应"],
                "谁创立了相对论？": ["爱因斯坦"]
            }
            question = example.question
            if question in correct_keywords:
                return any(keyword in prediction.answer for keyword in correct_keywords[question])
            return True

        # Create the optimizer
        teleprompter = BootstrapFewShot(
            metric=validate_answer,
            max_bootstrapped_demos=2,
            max_labeled_demos=2
        )

        # Optimize the RAG system
        print("正在优化RAG系统...")
        optimized_rag = teleprompter.compile(RAG(retriever), trainset=trainset)
        return optimized_rag

    user_question = "爱因斯坦在哪年获得诺贝尔奖？"
    opt_reg = optimize_rag_system()
    result = opt_reg(user_question)
    print(result)

The example output is as follows.

正在优化RAG系统...
100%|██████████| 2/2 [00:53<00:00, 26.74s/it]
Bootstrapped 2 full traces after 1 examples for up to 1 rounds, amounting to 2 attempts.
Prediction(
contexts=['爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。', '1921年的诺贝尔物理学奖颁奖典礼于1922年举行，因为1921年没有候选人被认为符合获奖标准。', '阿尔伯特·爱因斯坦（1879-1955）是德裔理论物理学家，相对论的创始人，也是量子力学的重要奠基人之一。'],
answer='1921',
reasoning='The question asks for the year Albert Einstein received the Nobel Prize, based on the provided context. The context states that Einstein received the Nobel Prize in 1921 for his contributions to theoretical physics, particularly for discovering the photoelectric effect law. It also mentions that the award ceremony was held in 1922 because no candidates met the criteria in 1921. Therefore, the Nobel Prize was awarded in 1921.'
)
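The general idea behind bootstrapping few-shot demonstrations can be sketched in a few lines: run the program on the training questions, keep only the runs that the metric accepts, and reuse those runs as demonstrations in later prompts. The code below is a simplified illustration of that idea under stated assumptions, not DSPy's actual implementation; `bootstrap_demos`, `toy_program`, and `contains_gold_answer` are hypothetical names, and the canned answers stand in for model calls.

```python
from types import SimpleNamespace


def bootstrap_demos(program, trainset, metric, max_demos=2):
    """Collect demos from training runs whose predictions pass the metric."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = program(example.question)
        # Only runs accepted by the metric become demonstrations.
        if metric(example, prediction):
            demos.append({"question": example.question, "answer": prediction.answer})
    return demos


def toy_program(question):
    """Stand-in for the RAG module: returns canned answers instead of calling a model."""
    canned = {
        "爱因斯坦的诺贝尔奖贡献是什么？": "他是一位物理学家",
        "谁创立了相对论？": "阿尔伯特·爱因斯坦",
    }
    return SimpleNamespace(answer=canned.get(question, ""))


def contains_gold_answer(example, prediction, trace=None):
    """Accept a prediction that contains the labeled answer."""
    return example.answer in prediction.answer


trainset = [
    SimpleNamespace(question="爱因斯坦的诺贝尔奖贡献是什么？", answer="发现光电效应定律"),
    SimpleNamespace(question="谁创立了相对论？", answer="阿尔伯特·爱因斯坦"),
]

demos = bootstrap_demos(toy_program, trainset, contains_gold_answer)
print(demos)
```

In this toy run the first canned answer fails the metric and is discarded, so only the second question ends up as a demonstration.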

2.3 Testing the optimized RAG

Here we run all the test questions from test_rag_system() through the optimized RAG system opt_reg.

The test code is as follows.

    # ===== 8. Test the optimized RAG system =====
    def test_opt_rag_system():
        """Run the optimized RAG system on the same sample questions."""
        test_questions = [
            "爱因斯坦因什么获得诺贝尔奖？",
            "什么是光电效应？",
            "谁创立了相对论？"
        ]
        for question in test_questions:
            print(f"\n{'='*60}")
            print(f"问题: {question}")
            print(f"{'='*60}")
            # Get the answer
            result = opt_reg(question)
            # Print the retrieved contexts
            print("检索到的上下文:")
            for i, ctx in enumerate(result.contexts, 1):
                print(f"{i}. {ctx[:100]}...")  # show only the first 100 characters
            # Print the reasoning, if any
            if hasattr(result, 'reasoning') and result.reasoning:
                print(f"\n模型推理: {result.reasoning}")
            # Print the final answer
            print(f"\n最终答案: {result.answer}")

    print("opt RAG问答系统启动...")
    # Test the optimized version
    test_opt_rag_system()

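The output is shown below.

Compared with the answers before optimization, the optimized RAG system's answers are more focused and concise: for example, the first answer shrinks from a full English sentence to 光电效应定律, and all three answers are now in Chinese. Note that optimize_rag_system() builds RAG(retriever) with the default num_passages=3, so three passages are retrieved here instead of two.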

opt RAG问答系统启动...
============================================================
问题: 爱因斯坦因什么获得诺贝尔奖？
============================================================
检索到的上下文:
1. 爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。...
2. 1921年的诺贝尔物理学奖颁奖典礼于1922年举行，因为1921年没有候选人被认为符合获奖标准。...
3. 阿尔伯特·爱因斯坦（1879-1955）是德裔理论物理学家，相对论的创始人，也是量子力学的重要奠基人之一。...
模型推理: The question asks what Albert Einstein was awarded the Nobel Prize for, based on the provided context. The context explicitly states that he received the Nobel Prize in 1921 for his contributions to theoretical physics, particularly for discovering the law of the photoelectric effect.
最终答案: 光电效应定律
============================================================
问题: 什么是光电效应？
============================================================
检索到的上下文:
1. 爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。...
2. 光电效应是指当光照射到金属表面时，会从金属中发射出电子的现象。这一发现对量子力学的发展至关重要。...
3. 阿尔伯特·爱因斯坦（1879-1955）是德裔理论物理学家，相对论的创始人，也是量子力学的重要奠基人之一。...
模型推理: The question asks for a definition of the photoelectric effect, based on the provided context. The context defines the photoelectric effect as the phenomenon where electrons are emitted from a metal surface when light shines on it. It also states that this discovery is crucial for the development of quantum mechanics.
最终答案: 当光照射到金属表面时，会从金属中发射出电子的现象。
============================================================
问题: 谁创立了相对论？
============================================================
检索到的上下文:
1. 阿尔伯特·爱因斯坦（1879-1955）是德裔理论物理学家，相对论的创始人，也是量子力学的重要奠基人之一。...
2. 爱因斯坦在1921年因对理论物理的贡献，特别是发现光电效应定律而获得诺贝尔物理学奖。...
3. 光电效应是指当光照射到金属表面时，会从金属中发射出电子的现象。这一发现对量子力学的发展至关重要。...
模型推理: The question asks who founded the theory of relativity. The context states that Albert Einstein is a theoretical physicist and the founder of the theory of relativity.
最终答案: 阿尔伯特·爱因斯坦
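As a rough check on the claim that the optimized answers are shorter, the snippet below compares the lengths of the final answers copied from the two runs above. Character count is only a crude proxy, particularly since the first and third baseline answers are in English and their optimized counterparts are in Chinese, so treat this as an illustration rather than a measurement.

```python
# Final answers copied from the outputs above, in question order
before = [
    "Einstein won the Nobel Prize in Physics for discovering the law of the photoelectric effect.",
    "光电效应是指当光照射到金属表面时，会从金属中发射出电子的现象。",
    "Albert Einstein",
]
after = [
    "光电效应定律",
    "当光照射到金属表面时，会从金属中发射出电子的现象。",
    "阿尔伯特·爱因斯坦",
]

# Print the character length of each answer before and after optimization
for old, new in zip(before, after):
    print(f"{len(old)} -> {len(new)} characters")

# True for every question where the optimized answer is shorter
shorter = [len(new) < len(old) for old, new in zip(before, after)]
print(all(shorter))
```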