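2026.04.08 | Video-MME-v2 grills models with a brutally hard question bank; Claw-Eval safeguards trustworthy agents with end-to-end auditing
HuggingFace Daily AI Paper Digest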

12 min 112 2 weeks ago
Episode description
Source: Xiaoyuzhou
【Sponsor】
Listen to AI每周谈 (AI Weekly Talk) on your commute. Each week, it recaps the previous week's major AI news.
Link 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
【Contents】
The 15 papers in this episode:
[00:34] 🎯 Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
[01:19] 🔬 Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
[02:06] 🤖 Learning to Retrieve from Agent Trajectories
[02:53] 🧪 ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation
[03:42] 👗 Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
[04:31] ⏱ Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning
[05:23] 🧠 ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement
[06:03] 🔍 Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
[06:52] 🔍 How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
[07:33] 🚀 MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
[08:11] 🛠 DARE: Diffusion Large Language Models Alignment and Reinforcement Executor
[08:54] 🧠 In-Place Test-Time Training
[09:39] 🎬 Watch Before You Answer: Learning from Visually Grounded Post-Training
[10:13] 🔍 Demystifying When Pruning Works via Representation Hierarchies
[10:59] 🤖 Action Images: End-to-End Policy Learning via Multiview Video Generation
【Follow us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递