Host
Episode description
Source: 小宇宙
【Sponsor】
Listen to AI每周谈 on your commute. Each week, AI每周谈 recaps the previous week's major AI news.
Link 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
【Contents】
The 15 papers in this episode:
[00:27] 🛡 ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers
[01:20] 💻 Terminal Agents Suffice for Enterprise Automation
[02:03] 📊 MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome
[02:54] 🧠 ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?
[03:40] 🔬 Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification
[04:26] 📊 QuitoBench: A High-Quality Open Time Series Forecasting Benchmark
[05:12] 🧠 Reasoning Shift: How Context Silently Shortens LLM Reasoning
[05:59] 📊 HippoCamp: Benchmarking Contextual Agents on Personal Computers
[06:52] 🧠 PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
[07:34] ⚡ Universal YOCO for Efficient Depth Scaling
[08:12] 🔄 Brevity Constraints Reverse Performance Hierarchies in Language Models
[08:48] 🧠 GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation
[09:25] 📝 Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers
[10:11] 🚀 Embarrassingly Simple Self-Distillation Improves Code Generation
[10:54] 🤖 Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants
【Follow us】
You can also find us on the following platforms for more information beyond the podcast itself
小红书: AI速递