Making News Recommendations Explainable with Large Language Models

Original article: towardsdatascience.com/making-news-recommendations-explainable-with-large-language-models-74f119c7e036?source=collection_archive---------2-----------------------#2024-11-30

A prompt-based experiment to improve both the accuracy and the transparent reasoning of content personalization.

Alex Held

· Published in Towards Data Science · 7 min read · November 30, 2024

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/ff1026c0fe89bf7dceb16a6d47f1dc3f.png

Delivering relevant content to readers at the right time. Image by author.

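At DER SPIEGEL, we are continually exploring ways to improve how we recommend news articles to our readers. In our latest (offline) experiment, we investigated whether Large Language Models (LLMs) can effectively predict which articles a reader is likely to be interested in, based on their reading history.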

Our approach

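We conducted a study in which participants filled out a survey rating their interest in a range of news articles. This gave us ground-truth data on reader preferences. For each participant, we had two key pieces of information: their actual reading history (the articles they had read before taking the survey) and their survey ratings for a set of new articles. For more on this mixed-methods approach to the offline evaluation of news recommender systems, see: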

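## A Mixed-Methods Approach to Offline Evaluation of News Recommender Systems

Combining survey feedback with behavioral click data to optimize content personalization.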

towardsdatascience.com

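We then used the Anthropic API to access Claude 3.5 Sonnet, a state-of-the-art language model, as our recommendation engine. For each reader, we gave the model their reading history (news titles and article summaries) and asked it to predict how interested they would be in the articles from the survey. This is the prompt we used: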

    You are a news recommendation system. Based on the user's reading history,
    predict how likely they are to read new articles. Score each article from 0 to 1000,
    where 1000 means highest likelihood to read.

    Reading history (Previous articles read by the user):
    [List of previously read articles with titles and summaries]

    Please rate the following articles (provide a score 0-1000 for each):
    [List of candidate articles to rate]

    You must respond with a JSON object in this format:
    {
        "recommendations": [
            {
                "article_id": "article-id-here",
                "score": score
            }
        ]
    }
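As a rough illustration of how such a prompt could be assembled and its reply parsed, here is a minimal Python sketch. The function names `build_prompt` and `parse_scores`, the dictionary keys, and the way each article is rendered as a bullet line are assumptions made for this example; the article does not show how the lists were actually formatted. The API call itself is left out: the resulting string would be sent to the model as the user message.

```python
import json

# Hypothetical helper: the bullet layout of history and candidates is an
# assumption; only the surrounding instruction text comes from the article.
def build_prompt(history, candidates):
    history_text = "\n".join(
        f"- {a['title']}: {a['summary']}" for a in history
    )
    candidates_text = "\n".join(
        f"- [{a['article_id']}] {a['title']}: {a['summary']}" for a in candidates
    )
    parts = [
        "You are a news recommendation system. Based on the user's reading "
        "history, predict how likely they are to read new articles. "
        "Score each article from 0 to 1000, where 1000 means highest "
        "likelihood to read.",
        "Reading history (Previous articles read by the user):\n" + history_text,
        "Please rate the following articles (provide a score 0-1000 for each):\n"
        + candidates_text,
        "You must respond with a JSON object in this format:\n"
        '{"recommendations": [{"article_id": "article-id-here", "score": score}]}',
    ]
    return "\n\n".join(parts)


# Hypothetical helper: turn the model's JSON reply into {article_id: score}.
def parse_scores(response_text):
    data = json.loads(response_text)
    scores = {}
    for item in data["recommendations"]:
        score = float(item["score"])
        if not 0 <= score <= 1000:
            raise ValueError(f"score out of range for {item['article_id']}: {score}")
        scores[item["article_id"]] = score
    return scores


if __name__ == "__main__":
    history = [{"title": "Example title", "summary": "Example summary"}]
    candidates = [
        {"article_id": "a1", "title": "Candidate", "summary": "Candidate summary"}
    ]
    print(build_prompt(history, candidates))
    print(parse_scores('{"recommendations": [{"article_id": "a1", "score": 850}]}'))
```

Validating the score range at parse time is a design choice of this sketch: it surfaces malformed replies early instead of letting them distort the evaluation metrics.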

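With this approach, we can compare the actual survey ratings against the LLM's predicted scores. This comparison gives us an ideal dataset for evaluating how well a language model can predict reader interest.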

Results and key findings

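The findings were striking. To understand the performance, we look at two key metrics. The first is Precision@5: the LLM scored 56%, which means that when the system recommends its top 5 articles to a user (out of 15), on average (almost) 3 of them are among the articles that user rated highest in the survey. The distribution of these predictions is even more impressive: for 24% of users, the system correctly identified at least 4 of their 5 favorite articles, and for another 41% it correctly identified 3 of their top 5.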
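To make the metric concrete, here is a minimal sketch of Precision@5 for a single reader. The function name `precision_at_k` and the tie handling (Python's stable sort) are assumptions of this example, not necessarily how the study computed it. The numbers are the six articles from the sample reader shown later in this article for which both a predicted score and a survey value are listed; the remaining articles fall outside both top-5 lists, so leaving them out does not change the result here.

```python
def precision_at_k(predicted_scores, actual_ratings, k=5):
    """Share of the model's top-k articles that are also in the reader's top-k."""
    top_predicted = sorted(predicted_scores, key=predicted_scores.get, reverse=True)[:k]
    top_actual = set(sorted(actual_ratings, key=actual_ratings.get, reverse=True)[:k])
    hits = sum(1 for article_id in top_predicted if article_id in top_actual)
    return hits / k


# Article numbers, predicted scores, and survey values from the sample reader.
predicted = {1: 850, 2: 800, 3: 780, 4: 750, 5: 720, 6: 650}
actual = {1: 253.0, 2: 757.0, 3: 586.0, 4: 797.0, 5: 766.0, 6: 601.0}

print(precision_at_k(predicted, actual, k=5))  # 0.8, matching the sample output
```

Averaging this per-reader value over all participants would give an overall figure like the one reported above.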

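To put this into perspective: if we recommended articles at random, precision would be only 38.8% (see our previous Medium article for details). Even recommendations based on article popularity (recommending what most people read) reach only 42.1%, and our previous embedding-based approach achieved 45.4%.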

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/73b93682f37bf1eb4cdc35c9c5a9c262.png

Chart by author

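The chart below shows the uplift: while knowing anything about the user beats pure guessing (the random model), the LLM-based approach performs best. Even compared with our sophisticated embedding-based logic, the LLM achieves a marked improvement in prediction accuracy.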

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/944e6bf25ecad03090641f39d1448a3b.png

Chart by author
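One common way to express uplift is as the relative improvement in Precision@5 over the random baseline. The short sketch below applies that definition to the four precision values quoted above; whether the chart uses exactly this definition is an assumption, and the names `precision_at_5` and `relative_uplift` are illustrative.

```python
# Precision@5 values as reported in the text.
precision_at_5 = {
    "random": 0.388,
    "popularity": 0.421,
    "embeddings": 0.454,
    "llm": 0.56,
}


def relative_uplift(value, baseline):
    """Relative improvement of `value` over `baseline` (0.10 means +10%)."""
    return (value - baseline) / baseline


baseline = precision_at_5["random"]
uplift_pct = {
    name: round(relative_uplift(value, baseline) * 100, 1)
    for name, value in precision_at_5.items()
    if name != "random"
}
print(uplift_pct)  # {'popularity': 8.5, 'embeddings': 17.0, 'llm': 44.3}
```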

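As a second evaluation metric, we used the Spearman correlation. At 0.41, it is substantially higher than for our embedding-based approach (0.17). This suggests that the LLM is not only better at finding relevant articles, but also better at understanding how much more a reader is likely to prefer one article over another.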
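For readers unfamiliar with the metric, the Spearman correlation is the Pearson correlation of the two rank orderings, so it rewards getting the whole ordering right rather than only the top 5. Below is a dependency-free sketch; the helper names and the toy scores are hypothetical and not taken from the study, and in practice a library routine such as `scipy.stats.spearmanr` would typically be used instead.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: model scores vs. survey ratings for five articles.
predicted = [900, 700, 500, 300, 100]
actual = [800, 900, 400, 100, 300]
print(round(spearman(predicted, actual), 3))  # 0.8
```

In the toy data the model swaps the order of the top two and the bottom two articles, which is why the correlation is high but below 1.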

Beyond performance: the power of explainability

    User has 221 articles in reading history

    Top 5 Comparison:
    --------------------------------------------------------------------------------

    Top 5 Predicted by Claude:
    1. Wie ich mit 38 Jahren zum ersten Mal lernte, strukturiert zu arbeiten (Score: 850, Actual Value: 253.0)
    2. Warum wir den Umgang mit der Sonne neu lernen müssen (Score: 800, Actual Value: 757.0)
    3. Lohnt sich ein Speicher für Solarstrom vom Balkon? (Score: 780, Actual Value: 586.0)
    4. »Man muss sich fragen, ob dieser spezielle deutsche Weg wirklich intelligent ist« (Score: 750, Actual Value: 797.0)
    5. Wie Bayern versucht, sein Drogenproblem unsichtbar zu machen (Score: 720, Actual Value: 766.0)

    Actual Top 5 from Survey:
    4. »Man muss sich fragen, ob dieser spezielle deutsche Weg wirklich intelligent ist« (Value: 797.0, Predicted Score: 750)
    5. Wie Bayern versucht, sein Drogenproblem unsichtbar zu machen (Value: 766.0, Predicted Score: 720)
    2. Warum wir den Umgang mit der Sonne neu lernen müssen (Value: 757.0, Predicted Score: 800)
    6. Abitur als Lotterie? (Value: 601.0, Predicted Score: 650)
    3. Lohnt sich ein Speicher für Solarstrom vom Balkon? (Value: 586.0, Predicted Score: 780)

    Performance Metrics:
    --------------------------------------------------------------------------------
    Spearman Correlation: 0.673
    Precision@5: 0.800

    Claude's Analysis:
    --------------------------------------------------------------------------------
    1. Key patterns in user's reading history:
    - Strong interest in social and political issues
    - Regular engagement with lifestyle and personal development content
    - Interest in health and wellness topics
    - Frequent reading of articles about economic and financial matters
    - Attraction to critical analysis and commentary pieces
    - Interest in food and cooking content

    2. Scoring explanation:

    Highest scored articles:
    - "Wie ich mit 38 Jahren zum ersten Mal lernte, strukturiert zu arbeiten" (850) - Aligns with the user's interest in personal development and lifestyle content
    - "Warum wir den Umgang mit der Sonne neu lernen müssen" (800) - Matches interest in health and wellness topics
    - "Lohnt sich ein Speicher für Solarstrom vom Balkon?" (780) - Corresponds to interest in practical advice and economic considerations

    Medium scored articles:
    - "Man muss sich fragen, ob dieser spezielle deutsche Weg wirklich intelligent ist" (750) - Fits pattern of interest in political commentary
    - "Wie Bayern versucht, sein Drogenproblem unsichtbar zu machen" (720) - Matches interest in social issues and critical reporting
    - "Abitur als Lotterie?" (650) - Aligns with interest in educational and social topics

    Lower scored articles:
    - "Eine Brise Formel 1" (550) - Limited sports content in reading history
    - "Reizender Absatz" (450) - Less alignment with demonstrated interests
    - "Hier wird jetzt auf ganz, ganz hohem Niveau gemeckert" (400) - Style and topic less aligned with user preferences

    The scoring prioritizes articles that match the user's demonstrated interests in social issues, practical advice, and critical analysis while giving lower scores to sports and lighter content that appears less frequently in their reading history.