2025.06.05 | Compact yet powerful vision models; multi-stage training improves reasoning

HuggingFace 每日AI论文速递

The 15 papers in this episode:
[00:21] 🤖 MiMo-VL Technical Report
[01:14] 💡 Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
[01:57] 🤖 AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment
[02:42] 🔄 CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark
[03:20] 🔬 A Controllable Examination for Long-Context Language Models
[04:14] ✍ SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models
[04:55] 🤔 MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
[05:37] 🔎 Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis
[06:17] 🌐 Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation
[07:04] 💡 IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
[07:49] 🎨 Image Editing As Programs with Diffusion Models
[08:27] 🎯 Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models
[09:04] 📊 VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
[09:48] 💡 Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
[10:28] 🎬 LayerFlow: A Unified Model for Layer-aware Video Generation
Follow us: you can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

11 min
99+
3 months ago

2025.06.04 | Reinforcement learning boosts LLM performance; UniWorld unifies visual understanding and generation.

HuggingFace 每日AI论文速递

The 15 papers in this episode:
[00:23] 💡 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
[01:09] 🖼 UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
[01:53] 🧪 CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs
[02:37] 🤖 VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments
[03:15] 🧠 SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
[04:01] 🧠 OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
[04:47] 🤖 Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
[05:24] 👀 MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
[06:10] 🤖 GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
[06:48] 🎬 Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
[07:27] 🧩 DINGO: Constrained Inference for Diffusion LLMs
[08:10] 🎬 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation
[08:47] 🤖 Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
[09:35] 🤖 Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
[10:21] 🖼 Native-Resolution Image Synthesis

11 min
99+
3 months ago

2025.06.03 | High-entropy tokens improve LLM reasoning; Reasoning Gym refines reinforcement learning environments.

HuggingFace 每日AI论文速递

The 15 papers in this episode:
[00:22] 🧠 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
[01:05] 🧠 REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
[01:46] 🤖 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
[02:31] 🚀 Taming LLMs by Scaling Learning Rates with Gradient Grouping
[03:19] 🧩 Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles
[04:06] 🎬 Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion Models
[04:43] 🤖 ARIA: Training Language Agents with Intention-Driven Reward Aggregation
[05:27] 🤖 LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks
[06:02] 🤖 ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding
[06:41] 🤖 Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
[07:15] 🚀 AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
[07:56] 🌍 EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models
[08:35] 🤔 SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
[09:14] 🤖 MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning
[09:48] 🤖 Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

10 min
99+
3 months ago

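[Month-End Special] May's Hottest AI Papers | Small language models excel at translation; a survey of how multimodal reasoning models have developed.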

HuggingFace 每日AI论文速递

The 10 papers in this episode:
[00:40] TOP1 (🔥209) | 🌐 Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model
[03:07] TOP2 (🔥172) | 🧠 Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
[05:19] TOP3 (🔥171) | 🤖 Qwen3 Technical Report
[07:49] TOP4 (🔥168) | 🚀 Absolute Zero: Reinforced Self-play Reasoning with Zero Data
[09:39] TOP5 (🔥141) | 💡 Seed1.5-VL Technical Report
[12:15] TOP6 (🔥140) | 🗜 Shifting AI Efficiency From Model-Centric to Data-Centric Compression
[14:08] TOP7 (🔥126) | 💡 Emerging Properties in Unified Multimodal Pretraining
[16:30] TOP8 (🔥121) | 🗣 MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
[19:21] TOP9 (🔥116) | 💡 Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
[21:49] TOP10 (🔥111) | 🔗 Chain-of-Model Learning for Language Model

24 min
99+
3 months ago

2025.06.02 | Prolonged RL improves reasoning; slow and fast thinking optimizes reasoning.

HuggingFace 每日AI论文速递

The 15 papers in this episode:
[00:23] 🧠 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
[01:01] 🧠 AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
[01:42] 🤔 Time Blindness: Why Video-Language Models Can't See What Humans Can?
[02:32] 🖼 Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
[03:13] 📊 Large Language Models for Data Synthesis
[03:59] 🖼 ViStoryBench: Comprehensive Benchmark Suite for Story Visualization
[04:39] 🧪 HardTests: Synthesizing High-Quality Test Cases for LLM Coding
[05:21] 🤖 Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
[05:59] 🤔 Vision Language Models are Biased
[06:41] 🦾 CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects
[07:20] 🚀 CLaSp: In-Context Layer Skip for Self-Speculative Decoding
[08:03] 📐 UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation
[08:44] 🤔 MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs
[09:28] ✍ EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering
[10:11] 🎧 Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models

11 min
99+
3 months ago

2025.05.30 | Inference-time scaling improves table reasoning; multimodal models' video feedback still needs work.

HuggingFace 每日AI论文速递

The 15 papers in this episode:
[00:22] 📊 Table-R1: Inference-Time Scaling for Table Reasoning
[01:02] 🤖 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
[01:45] 🧠 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
[02:25] 🧠 The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason
[03:11] 🤖 ZeroGUI: Automating Online GUI Learning at Zero Human Cost
[03:45] 🤔 VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
[04:39] 🧬 Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering
[05:15] 🤔 Are Reasoning Models More Prone to Hallucination?
[05:51] 🤖 cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning
[06:29] 🎨 D-AR: Diffusion via Autoregressive Models
[07:16] 📸 AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views
[07:53] 🛠 SWE-bench Goes Live! (SWE-bench-Live: a continuously updated issue-resolution benchmark)
[08:36] 💡 Multi-Domain Explainability of Preferences
[09:16] 🤖 UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
[10:01] 🗣 FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian

11 min
59
3 months ago

2025.05.29 | Entropy mechanism improves model performance; token routing improves reasoning efficiency.

HuggingFace 每日AI论文速递

The 15 papers in this episode:
[00:22] 🤖 The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
[00:56] 🛣 R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
[01:40] 🧠 Skywork Open Reasoner 1 Technical Report
[02:20] 🔍 Sherlock: Self-Correcting Reasoning in Vision-Language Models
[02:55] 🤖 Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
[03:35] 🤖 SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
[04:25] 🚀 SageAttention2++: A More Efficient Implementation of SageAttention2
[05:12] 🧠 Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
[05:59] 🎬 Fostering Video Reasoning via Next-Event Prediction
[06:42] 💡 RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
[07:25] 🔬 DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
[08:16] 🖼 Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
[08:58] 🧩 Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
[09:38] 🚚 SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem
[10:26] 🌐 Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models

11 min
52
3 months ago

2025.05.28 | Multimodal agents show low success rates on scientific tasks; logical reasoning models have significant limitations.

HuggingFace 每日AI论文速递

The 15 papers in this episode:
[00:23] 🧪 ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
[01:09] 🤔 MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
[01:51] 🖼 Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
[02:28] 🎨 OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
[03:06] 🎬 OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
[03:50] 🧠 SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
[04:32] 💡 Exploring the Latent Capacity of LLMs for One-Step Text Generation
[05:13] 🧠 VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization
[05:48] 🤔 Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
[06:29] 🤔 MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks
[07:09] 🤖 UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
[07:52] 🎬 Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
[08:28] 📹 MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
[09:16] 🧩 GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
[10:02] 🕵 Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

11 min
99+
3 months ago

2025.05.27 | Improving AI efficiency requires data compression; small models translate more efficiently.

HuggingFace 每日AI论文速递

The 15 papers in this episode:
[00:24] 🗜 Shifting AI Efficiency From Model-Centric to Data-Centric Compression
[01:05] 🌐 Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model
[02:00] 📊 BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
[02:40] 🖼 Alchemist: Turning Public Text-to-Image Data into Generative Gold
[03:18] 🧠 Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance
[03:59] 🧠 PATS: Process-Level Adaptive Thinking Mode Switching
[04:52] 🧠 ARM: Adaptive Reasoning Model
[05:37] 🧩 Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
[06:18] 🤖 B-score: Detecting biases in large language models using response history
[06:58] 🧠 Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
[07:39] 🛡 Lifelong Safety Alignment for Language Models
[08:14] 🧪 MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search
[09:00] 🗺 Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
[09:43] 🧮 Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers
[10:28] 🧠 Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models

11 min
99+
3 months ago

2025.05.26 | TabSTAR improves tabular classification performance; QwenLong-L1 improves long-context reasoning

HuggingFace 每日AI论文速递

The 15 papers in this episode:
[00:23] 📊 TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
[00:59] 🧠 QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
[01:43] 🤔 Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
[02:19] 🚀 Quartet: Native FP4 Training Can Be Optimal for Large Language Models
[03:01] 🤖 One RL to See Them All: Visual Triple Unified Reinforcement Learning
[03:36] 🤖 Distilling LLM Agent into Small Models with Retrieval and Code Tools
[04:21] 🤔 PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
[05:02] ♾ QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization
[05:46] 🧬 Scaling Image and Video Generation via Test-Time Evolutionary Search
[06:21] 🎬 Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
[07:06] 🤔 VeriThinker: Learning to Verify Makes Reasoning Model Efficient
[07:45] 🧪 MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback
[08:27] 🎧 AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
[09:10] 💻 FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow
[09:51] 🤥 Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection

11 min
69
3 months ago