HuggingFace Daily AI Paper Digest - 2025.12.04 | Qwen3-VL Multimodal Ultra-Long Context; PretrainZero Reinforcement Active Pretraining - EarsOnMe

Episode Description

Source: Xiaoyuzhou (小宇宙)

The 15 papers in this episode are:

[00:24] 🧠 Qwen3-VL Technical Report

[00:57] 🧠 PretrainZero: Reinforcement Active Pretraining

[01:36] 🎬 ViDiC: Video Difference Captioning

[02:24] 🧠 OneThinker: All-in-one Reasoning Model for Image and Video

[03:07] 🔄 Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation

[03:59] ⚙ Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

[04:46] 🤖 SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

[05:22] 🔧 Thinking with Programming Vision: Towards a Unified View for Thinking with Images

[06:01] 🔄 Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment

[06:51] 🎮 RELIC: Interactive Video World Model with Long-Horizon Memory

[07:34] 🍳 CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation

[08:26] 🧠 SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

[09:01] 📊 AlignBench: Benchmarking Fine-Grained Image-Text Alignment with Synthetic Image-Caption Pairs

[09:38] 🧠 SkillFactory: Self-Distillation For Learning Cognitive Behaviors

[10:20] 📱 UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递
