Episode list: HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递) - EarsOnMe

2025.09.26 | SciReasoner as an eight-event all-rounder; MMR1 forges open-source multimodal reasoning from the fuzzy zone

The 15 papers in this episode:
[00:20] 🔬 SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
[01:00] 🧠 MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
[01:41] 📈 VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
[02:26] 🌳 Tree Search for LLM Agent Reinforcement Learning
[03:06] 🖼 Seedream 4.0: Toward Next-generation Multimodal Image Generation
[03:40] 🎯 Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
[04:29] 🤖 AutoIntent: AutoML for Text Classification
[05:10] ⚖ TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
[05:43] 🎢 CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
[06:30] 🖼 Does FLUX Already Know How to Perform Physically Plausible Image Composition?
[07:31] ✂ CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling
[08:26] 🧠 Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution
[09:12] 🎮 V-GameGym: Visual Game Generation for Code Large Language Models (a visual game generation benchmark for code LLMs)
[09:49] 🗣 Interactive Recommendation Agent with Active User Commands
[10:22] 🔍 BESPOKE: Benchmark for Search-Augmented Large Language Model Personalization via Diagnostic Feedback

[Follow us] You can also find us on the following platforms for more content beyond the podcast. Xiaohongshu: AI速递

2025.09.25 | Video models as zero-shot all-rounders; implicit chain-of-thought saves tokens and improves efficiency

The 10 papers in this episode:
[00:22] 🎥 Video models are zero-shot learners and reasoners
[01:09] 🧠 SIM-CoT: Supervised Implicit Chain-of-Thought (efficient reasoning via supervised implicit chain-of-thought)
[01:55] 🪶 EmbeddingGemma: Powerful and Lightweight Text Representations
[02:29] 🗣 Advancing Speech Understanding in Speech-Aware Language Models with GRPO (improving open-domain understanding in speech-aware large models)
[03:06] 🌍 LLMs4All: A Review on Large Language Models for Research and Applications in Academic Disciplines
[03:52] 🎬 EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
[04:29] 🌀 Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
[05:19] 🎬 PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
[05:58] 📄 Logics-Parsing Technical Report (RL-based end-to-end document parsing with large models)
[06:44] 🤖 On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub (an analysis of pull requests opened by AI agents)

7 minutes

2025.09.24 | Arabic OCR sets new benchmark scores; label-free RL raises scores

The 15 papers in this episode:
[00:24] 📜 Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
[00:58] 🚀 Reinforcement Learning on Pre-Training Data
[01:37] 👁 Do You Need Proprioceptive States in Visuomotor Policies?
[02:36] 🚀 MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
[03:24] 🎯 MAPO: Mixed Advantage Policy Optimization (addressing the advantage allocation problem in GRPO)
[04:06] 🚀 Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
[04:44] 🎯 VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction
[05:31] 🌌 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
[06:08] 🧩 What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
[06:41] 🗣 Large Language Models Discriminate Against Speakers of German Dialects
[07:32] 📊 OpenGVL - Benchmarking Visual Temporal Progress for Data Curation
[08:19] 🪄 HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis
[09:07] 🛠 CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching
[09:41] 🛰 Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications
[10:28] 🌍 VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction

2025.09.23 | As few as 78 demonstrations lift AI to 73.5%; mask-free video subject insertion beats Pika

The 15 papers in this episode:
[00:21] 🚀 LIMI: Less is More for Agency
[00:55] 🎬 OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models
[01:28] 🧩 OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System
[02:19] 🌐 Qwen3-Omni Technical Report (described in the episode as the first omni-modal large model without performance degradation)
[02:55] 🎬 TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs (an efficient off-policy reinforcement fine-tuning framework for video temporal grounding)
[03:28] 📐 GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
[04:15] 🎯 DiffusionNFT: Online Diffusion Reinforcement with Forward Process
[05:05] 🤖 ByteWrist: A Parallel Robotic Wrist Enabling Flexible and Anthropomorphic Motion for Confined Spaces
[05:42] 💬 EpiCache: Episodic KV Cache Management for Long Conversational Question Answering
[06:24] 🧠 SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
[07:01] 🧠 FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions
[08:05] 🎬 VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models
[08:53] 🧪 ARE: Scaling Up Agent Environments and Evaluations
[09:28] 🧩 QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
[10:17] 🔍 Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels

2025.09.22 | Directed graphs drive code generation; a dual-channel unified vision model

The 13 papers in this episode:
[00:25] 🗺 RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
[01:00] 🌉 MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
[01:42] 🧩 Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
[02:25] 🎯 BaseReward: A Strong Baseline for Multimodal Reward Model
[02:56] 🏠 SPATIALGEN: Layout-guided 3D Indoor Scene Generation
[03:46] 🧠 BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent (a brain-inspired reasoning model)
[04:30] 🎭 Lynx: Towards High-Fidelity Personalized Video Generation
[05:20] 🤖 A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
[05:54] 📹 RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes
[06:21] 🗣 Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems
[07:07] 🎬 Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents
[07:50] 🗣 WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers
[08:30] 🗣 Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue

9 minutes

[Weekend Special] Hottest AI papers, week 4 of September | OmniWorld builds a 4D data factory; WebWeaver lets AI write while it searches

The 5 papers in this episode:
[00:43] TOP1 (🔥95) | 🌍 OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
[02:51] TOP2 (🔥93) | 🔍 WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
[05:09] TOP3 (🔥91) | 🤖 Scaling Agents via Continual Pre-training
[07:33] TOP4 (🔥88) | 🖥 ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
[10:48] TOP5 (🔥79) | 🌊 FlowRL: Matching Reward Distributions for LLM Reasoning

13 minutes

2025.09.19 | Cross-platform GUI model tops leaderboards; FlowRL improves reasoning via distribution matching

The 15 papers in this episode:
[00:26] 🖥 ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
[01:01] 🌊 FlowRL: Matching Reward Distributions for LLM Reasoning
[01:57] 🧭 Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
[02:55] 🧬 Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
[03:34] 🎨 Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
[04:12] 🔍 FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
[04:56] 🤖 RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
[05:39] 🔮 AToken: A Unified Tokenizer for Vision
[06:10] 🌌 WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
[06:58] 🖼 MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
[07:54] 🎮 RecoWorld: Building Simulated Environments for Agentic Recommender Systems
[08:28] 🎯 Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
[09:03] 🔍 Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs
[09:51] 🩺 EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence
[10:34] 🛰 FSG-Net: Frequency-Spatial Synergistic Gated Network for High-Resolution Remote Sensing Change Detection

2025.09.18 | FP8 compression plus translation fine-tuning yields low-cost Arabic LLMs; 2B-8B small models clean data and take on GPT-4o

The 14 papers in this episode:
[00:19] 🐪 Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale
[00:56] 🚀 SAIL-VL2 Technical Report
[01:42] 🌐 PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era
[02:33] 🎓 GenExam: A Multidisciplinary Text-to-Image Exam (a benchmark for multidisciplinary text-to-image generation)
[03:25] 🧹 Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
[03:59] 🩺 MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework
[04:37] 🔍 MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
[05:22] 🎭 Wan-Animate: Unified Character Animation and Replacement with Holistic Replication
[05:59] 🧮 THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
[06:40] 🔍 Improving Context Fidelity via Native Retrieval-Augmented Reasoning
[07:20] 🌍 AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions
[08:13] 🎛 SteeringControl: Holistic Evaluation of Alignment Steering in LLMs
[08:48] ⚛ Quantum Variational Activation Functions Empower Kolmogorov-Arnold Networks
[09:37] 🚀 Hybrid Quantum-Classical Model for Image Classification

10 minutes

2025.09.17 | WebWeaver framework improves trustworthy long-form reports; agentic pre-training scales agent systems

The 11 papers in this episode:
[00:27] 🔍 WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
[01:08] 🤖 Scaling Agents via Continual Pre-training
[01:52] ⛵ WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
[02:36] 🧠 Towards General Agentic Intelligence via Environment Scaling
[03:09] 🔍 WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents
[03:59] 🧠 ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
[04:39] 🚀 Single-stream Policy Optimization (a group-free approach to reinforcement learning for LLMs)
[05:19] 🎮 Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation
[06:00] 🧩 3D Aware Region Prompted Vision Language Model
[06:36] 💡 EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
[07:07] ⚛ Exact Coset Sampling for Quantum Lattice Algorithms

8 minutes

2025.09.16 | OmniWorld builds a 4D data foundation; UI-S1 trains GUI agents with semi-online RL

The 14 papers in this episode:
[00:24] 🌍 OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
[01:12] 🤖 UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
[01:51] 🏠 InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
[02:27] 🖱 LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
[02:58] 📊 Locality in Image Diffusion Models Emerges from Data Statistics
[03:29] 🤔 Measuring Epistemic Humility in Multimodal Large Language Models
[03:57] 🤖 Nav-R1: Reasoning and Navigation in Embodied Scenes
[04:25] 🔍 Lost in Embeddings: Information Loss in Vision-Language Models
[04:54] 🌐 CognitiveSky: Scalable Sentiment and Narrative Analysis for Decentralized Social Media
[05:19] 🔍 Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models
[05:57] 🧠 EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI
[06:30] ⚖ Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting
[07:16] 🧠 PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
[07:52] 🔍 GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings

9 minutes

2025.09.15 | An upgraded dataset measures engagement; model size is not the long-horizon bottleneck

The 14 papers in this episode:
[00:25] 📚 IntrEx: A Dataset for Modeling Engagement in Educational Conversations
[01:02] 📏 The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
[01:54] 🧩 X-Part: high fidelity and structure coherent shape decomposition (3D shape decomposition)
[02:33] 🖼 InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
[03:04] 🔍 HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering
[03:50] 🎙 VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
[04:44] 🌸 FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies
[05:20] 🎨 Inpainting-Guided Policy Optimization for Diffusion Large Language Models
[05:58] 🤖 Virtual Agent Economies
[06:28] 📈 QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
[07:02] 🧪 MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
[07:41] 🎨 Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation
[08:31] 🦎 LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
[09:13] 🗞 CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in China

10 minutes