Duration: 10 minutes
Plays: 154
Published: 3 weeks ago
https://xiaoyuzhoufm.com

The 14 papers in this episode are as follows:


[00:16] 🌱 Agent Learning via Early Experience


[00:50] 🧠 MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization


[01:42] 🧪 From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning


[02:19] 🎬 UniVideo: Unified Understanding, Generation, and Editing for Videos


[03:01] 🧠 When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs


[03:43] 🧠 Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning


[04:25] 🧠 MemMamba: Rethinking Memory Patterns in State Space Model


[05:17] 🛡 The Alignment Waltz: Jointly Training Agents to Collaborate for Safety


[05:53] 🎯 Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense


[06:40] 🧪 NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents


[07:17] 🪚 DeepPrune: Parallel Scaling without Inter-trace Redundancy


[07:54] 🚀 Training-Free Group Relative Policy Optimization


[08:24] 🪄 ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation


[08:55] 🤥 LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions





[Follow Us]


You can also find us on the following platform for more information beyond the podcast content.


Xiaohongshu: AI速递
