HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Express)
Skim the day's trending AI papers in 10 minutes

Host: 拨号上网
Publisher: Anonymous
Subscribers: 7495
Episodes: 289
Last updated: 1 week ago
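About the podcast
Ten minutes a day to catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast on Xiaoyuzhou and Apple Podcasts by searching for 【HuggingFace 每日AI论文速递】. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.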
Episodes

2025.05.30 | Inference-time scaling improves table reasoning; multimodal models' feedback on videos still needs work.

The 15 papers in this episode:
[00:22] 📊 Table-R1: Inference-Time Scaling for Table Reasoning
[01:02] 🤖 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
[01:45] 🧠 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
[02:25] 🧠 The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason
[03:11] 🤖 ZeroGUI: Automating Online GUI Learning at Zero Human Cost
[03:45] 🤔 VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
[04:39] 🧬 Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering
[05:15] 🤔 Are Reasoning Models More Prone to Hallucination?
[05:51] 🤖 cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning
[06:29] 🎨 D-AR: Diffusion via Autoregressive Models
[07:16] 📸 AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views
[07:53] 🛠 SWE-bench Goes Live! (SWE-bench-Live: a continuously updated issue-resolution benchmark)
[08:36] 💡 Multi-Domain Explainability of Preferences
[09:16] 🤖 UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
[10:01] 🗣 FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian

Follow us: you can also find us on the following platform for more than what is in the podcast.
Xiaohongshu: AI速递

11 min
59
1 week ago

2025.05.29 | Entropy mechanism improves model performance; token routing optimizes inference efficiency.

The 15 papers in this episode:
[00:22] 🤖 The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
[00:56] 🛣 R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
[01:40] 🧠 Skywork Open Reasoner 1 Technical Report
[02:20] 🔍 Sherlock: Self-Correcting Reasoning in Vision-Language Models
[02:55] 🤖 Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
[03:35] 🤖 SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
[04:25] 🚀 SageAttention2++: A More Efficient Implementation of SageAttention2
[05:12] 🧠 Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
[05:59] 🎬 Fostering Video Reasoning via Next-Event Prediction
[06:42] 💡 RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
[07:25] 🔬 DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
[08:16] 🖼 Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
[08:58] 🧩 Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
[09:38] 🚚 SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem
[10:26] 🌐 Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models

11 min
52
1 week ago

2025.05.28 | Multimodal agents show low success rates on scientific research tasks; logical reasoning models have significant limitations.

The 15 papers in this episode:
[00:23] 🧪 ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
[01:09] 🤔 MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
[01:51] 🖼 Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
[02:28] 🎨 OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
[03:06] 🎬 OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
[03:50] 🧠 SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
[04:32] 💡 Exploring the Latent Capacity of LLMs for One-Step Text Generation
[05:13] 🧠 VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization
[05:48] 🤔 Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
[06:29] 🤔 MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks
[07:09] 🤖 UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
[07:52] 🎬 Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
[08:28] 📹 MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
[09:16] 🧩 GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
[10:02] 🕵 Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

11 min
99+
1 week ago