HuggingFace 每日AI论文速递 - 2025.10.15 | 像素级自监督ViT刷新生成基准；多智能体评测网文翻译新标尺 - EarsOnMe

主播

拨号上网 1 档播客

节目简介

来源：小宇宙

本期的 14 篇论文如下：

[00:20] 🖼 Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training（通过自监督预训练推进端到端像素空间生成建模）

[00:53] 📚 DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation（DITING：面向网络小说翻译评测的多智能体基准框架）

[01:41] 🌐 Scaling Language-Centric Omnimodal Representation Learning（以语言为中心的跨模态表征扩展学习）

[02:29] 🎯 Detect Anything via Next Point Prediction（通过下一点预测检测万物）

[03:02] ⚡ FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution（FlashVSR：迈向实时扩散式流媒体视频超分辨率）

[03:40] 🎯 Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models（时间对齐引导：扩散模型中的流形采样）

[04:16] 🧠 Dr.LLM: Dynamic Layer Routing in LLMs（Dr.LLM：大模型中的动态层级路由）

[05:03] 🎯 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model（空间强迫：面向视觉-语言-动作模型的隐式空间表征对齐）

[05:50] 🤖 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning（ERA：借助具身先验学习与在线强化学习将视觉-语言模型转化为具身智能体）

[06:35] 🤖 Robot Learning: A Tutorial（机器人学习教程：从强化学习到多任务通用模型）

[07:27] 🔄 SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models（SRUM：面向统一多模态模型的细粒度自奖励机制）

[08:01] 🧠 Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models（面向扩散大语言模型的边界引导策略优化：内存高效的强化学习）

[09:06] 🖼 UniFusion: Vision-Language Model as Unified Encoder in Image Generation（UniFusion：将视觉-语言模型统一作为图像生成的编码器）

[09:43] 🧠 Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks（记忆即行动：面向长程智能体任务的自主上下文策展）

【关注我们】

您还可以在以下平台找到我们，获得播客内容以外更多信息

小红书: AI速递

在小宇宙查看该单集文稿

2025.10.15 | 像素级自监督ViT刷新生成基准；多智能体评测网文翻译新标尺

加入我们的 Discord

扫描微信二维码

播放列表