AI可可AI生活 - [For Everyone] AI's “Persona” and Its Pitfalls: Is It Lying to You? - EarsOnMe

Host

Episode description

Source: 小宇宙 (Xiaoyuzhou)

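00:00:37 Can you trust your AI butler? A security report from the future

00:04:40 AI “going crazy”? Scientists have found its “personality switch”

00:09:33 More important than the result: the process of “thinking it through”

00:14:09 AI's “dimensionality-reduction strike”: a simple way of living in a complex world

00:18:23 Could AI's “warm guy” persona be a trap?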

Papers covered in this episode:

[LG] Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition

[Gray Swan AI]

https://arxiv.org/abs/2507.20526

---

[CL] Persona Vectors: Monitoring and Controlling Character Traits in Language Models

[Anthropic Fellows Program & Constellation]

https://arxiv.org/abs/2507.21509

---

[LG] RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents

[Tencent]

https://arxiv.org/abs/2507.22844

---

[LG] Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces

[Brown University & Amazon Web Services]

https://arxiv.org/abs/2507.20853

---

[CL] Training language models to be warm and empathetic makes them less reliable and more sycophantic

[University of Oxford]

https://arxiv.org/abs/2507.21919

---

[CL] On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey

[Not explicitly stated, survey paper]

https://arxiv.org/abs/2507.20783
