HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)
A 10-minute speed-read of trending AI papers

Album
Host: 拨号上网
Publisher: Anonymous (佚名)
Subscribers: 11,200
Episodes: 463
Last updated: 1 week ago
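About the podcast
Ten minutes a day to catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast on Xiaoyuzhou (小宇宙) and Apple Podcasts by searching for 【HuggingFace 每日AI论文速递】. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu (小红书).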
Episodes

2025.12.12 | RL sets a new record in 3D generation; AI wins Olympiad silver


The 15 papers in this episode:
[00:25] 🤖 Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation
[01:01] 🧠 Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving
[01:36] 🚀 T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground
[02:18] 🔍 OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification
[03:04] 🏆 Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
[04:06] 🎬 MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos
[04:46] 🔬 From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models
[05:22] 🧠 Thinking with Images via Self-Calling Agent
[06:08] 🧩 VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
[06:48] 🤖 Evaluating Gemini Robotics Policies in a Veo World Simulator
[07:30] 🚀 Stronger Normalization-Free Transformers
[08:05] 📊 The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality
[08:36] 🎬 Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task
[09:14] 🌀 MoRel: Long-Range Flicker-Free 4D Motion Modeling via Anchor Relay-based Bidirectional Blending with Hierarchical Densification
[09:50] 🤖 Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale
Follow us: you can also find us on the following platforms for more beyond the podcast.
Xiaohongshu (小红书): AI速递

11 min
51
1 week ago

2025.12.11 | StereoWorld turns monocular video into stereo in seconds; BiCo composes new concepts across domains


The 15 papers in this episode:
[00:22] 🎥 StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation
[00:59] 🎨 Composing Concepts from Images and Videos via Concept-prompt Binding
[01:43] 🧠 BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain
[02:20] 🎨 OmniPSD: Layered PSD Generation with Diffusion Transformer
[03:05] 🚀 InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
[03:47] ⚡ Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules
[04:31] 🚗 UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving
[05:06] 🧠 EtCon: Edit-then-Consolidate for Reliable Knowledge Editing
[05:56] 🤖 HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models
[06:46] 🔍 WonderZoom: Multi-Scale 3D World Generation
[07:23] 🤖 Learning Unmasking Policies for Diffusion Language Models
[07:53] 🔭 IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
[08:51] ⚡ Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS
[09:31] 🎬 VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory
[10:16] 🛡 Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
Follow us: you can also find us on the following platforms for more beyond the podcast.
Xiaohongshu (小红书): AI速递

11 min
47
1 week ago

2025.12.10 | Latent trajectories control motion; real-time splatting on WebGPU


The 15 papers in this episode:
[00:24] 🎬 Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
[00:55] 🚀 Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
[01:32] 🎬 Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality
[02:13] 🎬 OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory
[02:49] ⚡ ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
[03:45] 🤖 MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment
[04:47] 🚀 Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training
[05:18] 🌲 TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
[05:55] 🚀 From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs
[06:30] 📊 EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
[07:02] 🧩 Modular Neural Image Signal Processing
[07:33] 🧭 Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation
[08:16] 🤖 DeepCode: Open Agentic Coding
[08:48] 🎯 TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels
[09:30] 🎬 Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
Follow us: you can also find us on the following platforms for more beyond the podcast.
Xiaohongshu (小红书): AI速递

10 min
86
1 week ago
