This episode covers the following 5 papers:

[00:44] TOP1 (🔥149) | 💡 SmolVLM: Redefining small and efficient multimodal models
[03:07] TOP2 (🔥125) | 🎨 OmniSVG: A Unified Scalable Vector Graphics Generation Model
[05:57] TOP3 (🔥90) | 🎬 One-Minute Video Generation with Test-Time Training
[08:13] TOP4 (🔥85) | 🚀 Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
[10:19] TOP5 (🔥76) | 🧠 Kimi-VL Technical Report

[Follow us]
You can also find us on the following platforms for more content beyond the podcast.
Xiaohongshu: AI速递
This episode covers the following 14 papers:

[00:22] 🧠 Kimi-VL Technical Report
[01:05] 🎬 VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
[01:54] 🖼 MM-IFEngine: Towards Multimodal Instruction Following
[02:35] 🖼 VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
[03:15] 🤔 DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
[03:54] 🧩 HoloPart: Generative 3D Part Amodal Segmentation
[04:36] 🤖 C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing
[05:11] 🤖 MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations
[05:58] 🖼 Scaling Laws for Native Multimodal Models
[06:30] 🧠 SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
[07:16] 🖼 Towards Visual Text Grounding of Multimodal Large Language Model
[07:57] 🤖 MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection
[08:39] 🧭 Compass Control: Multi Object Orientation Control for Text-to-Image Generation
[09:22] 📍 TAPNext: Tracking Any Point (TAP) as Next Token Prediction
This episode covers the following 15 papers:

[00:25] 🎨 DDT: Decoupled Diffusion Transformer
[01:05] 🎬 GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
[01:49] 🔍 OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
[02:28] 🖼 A Unified Agentic Framework for Evaluating Conditional Image Generation
[03:11] 🤔 Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
[03:57] 🗣 FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
[04:34] 🧐 A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
[05:15] 🖼 OmniCaptioner: One Captioner to Rule Them All
[05:57] 🧩 Are We Done with Object-Centric Learning?
[06:35] 🤖 Self-Steering Language Models
[07:09] 🇷 RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts
[07:51] 🤖 Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
[08:30] 👂 DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
[09:05] 🤖 VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
[09:47] 🤖 WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
This episode covers the following 13 papers:

[00:22] 🎨 OmniSVG: A Unified Scalable Vector Graphics Generation Model
[01:02] 🧠 Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
[01:42] 🖼 An Empirical Study of GPT-4o Image Generation Capabilities
[02:22] 🚀 Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
[03:03] 🎨 Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
[03:46] 🧠 COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
[04:24] 🤔 Generative Evaluation of Complex Reasoning in Large Language Models
[05:14] 🎨 Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model
[05:53] 🎮 V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models
[06:32] 🧩 CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
[07:15] 🖼 HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
[07:57] 💡 Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence
[08:41] 🤖 Leanabell-Prover: Posttraining Scaling in Formal Reasoning
This episode covers the following 15 papers:

[00:21] 🎬 One-Minute Video Generation with Test-Time Training
[01:03] 💡 SmolVLM: Redefining small and efficient multimodal models
[01:39] 🖼 URECA: Unique Region Caption Anything
[02:17] 🧰 T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
[03:02] 🖼 Concept Lancet: Image Editing with Compositional Representation Transplant
[03:41] 🤔 Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
[04:26] 📰 LiveVQA: Live Visual Knowledge Seeking
[05:08] 🎨 Gaussian Mixture Flow Matching Models
[05:47] 💡 VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
[06:26] 🕵 Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
[07:17] 🧰 DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models
[07:54] ⚕ Clinical ModernBERT: An efficient and long context encoder for biomedical text
[08:28] 🐍 Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
[09:12] 🤖 BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation
[09:48] 🛡 JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
This episode covers the following 15 papers:

[00:23] 🛠 Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
[01:07] 🧠 Agentic Knowledgeable Self-awareness
[01:49] 🧮 MegaMath: Pushing the Limits of Open Math Corpora
[02:32] 🤖 SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
[03:20] 🖼 MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
[04:03] 🖼 VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
[04:42] 🔄 TransMamba: Flexibly Switching between Transformer and Mamba
[05:21] 🤖 APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
[05:59] 🧑 HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration
[06:39] 💡 Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
[07:20] 👂 EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling
[08:02] 🫁 MedSAM2: Segment Anything in 3D Medical Images and Videos
[08:47] ⚖ BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models
[09:35] 🚄 Slow-Fast Architecture for Video Multi-Modal Large Language Models
[10:14] 🎨 SPF-Portrait: Towards Pure Portrait Customization with Semantic Pollution-Free Fine-tuning
This episode covers the following 10 papers:

[00:42] TOP1 (🔥226) | 🤖 Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders
[03:07] TOP2 (🔥153) | 🧠 Transformers without Normalization
[04:59] TOP3 (🔥136) | 🎥 DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
[07:51] TOP4 (🔥135) | 🦢 RWKV-7 "Goose" with Expressive Dynamic State Evolution
[11:11] TOP5 (🔥130) | 🎥 ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
[13:27] TOP6 (🔥129) | 🇷 RuCCoD: Towards Automated ICD Coding in Russian
[15:41] TOP7 (🔥120) | 🤖 Qwen2.5-Omni Technical Report
[18:17] TOP8 (🔥114) | 🌐 Unified Reward Model for Multimodal Understanding and Generation
[20:30] TOP9 (🔥113) | 🤖 DAPO: An Open-Source LLM Reinforcement Learning System at Scale
[22:29] TOP10 (🔥112) | 🧠 I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
This episode covers the following 5 papers:

[00:40] TOP1 (🔥101) | 🧠 Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
[03:17] TOP2 (🔥83) | 🖼 TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
[05:27] TOP3 (🔥80) | 🎬 MoCha: Towards Movie-Grade Talking Character Synthesis
[07:49] TOP4 (🔥67) | 💡 AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation
[09:57] TOP5 (🔥65) | 🎬 Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
This episode covers the following 15 papers:

[00:19] 🧠 Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
[01:01] 🖼 Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
[01:41] 🖼 GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
[02:25] 🤖 Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
[03:08] 🗣 Scaling Analysis of Interleaved Speech-Text Language Models
[03:52] 🎬 SkyReels-A2: Compose Anything in Video Diffusion Transformers
[04:36] 🧊 ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers
[05:13] 📉 ZClip: Adaptive Spike Mitigation for LLM Pre-Training
[05:50] 🧠 Inference-Time Scaling for Generalist Reward Modeling
[06:32] 🗣 Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
[07:12] ⏱ Efficient Model Selection for Time Series Forecasting via LLMs
[07:55] 🤖 Scaling Laws in Scientific Discovery with AI and Robot Scientists
[08:35] 🧠 Instruction-Guided Autoregressive Neural Network Parameter Generation
[09:18] 🤖 GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
[10:01] 🧠 Interpreting Emergent Planning in Model-Free Reinforcement Learning
This episode covers the following 15 papers:

[00:23] 🎨 MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
[01:00] 🧠 Improved Visual-Spatial Reasoning via R1-Zero-Like Training
[01:45] 🎮 AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
[02:25] 🎬 VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
[03:03] 🎭 DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
[03:42] 🧐 Understanding R1-Zero-Like Training: A Critical Perspective
[04:28] 🎬 Towards Physically Plausible Video Generation via VLM Planning
[05:09] 🤖 PaperBench: Evaluating AI's Ability to Replicate AI Research
[05:49] 🤖 ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations
[06:31] 💡 ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
[07:11] 💃 Articulated Kinematics Distillation from Video Diffusion Models
[07:51] 🛡 Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
[08:32] 👁 DASH: Detection and Assessment of Systematic Hallucinations of VLMs
[09:11] 🖼 Boost Your Human Image Generation Model via Direct Preference Optimization
[09:47] 👁 LSNet: See Large, Focus Small
This episode covers the following 15 papers:

[00:21] 🎬 Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
[01:01] 🎬 Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1
[01:48] ⚖ JudgeLRM: Large Reasoning Models as a Judge
[02:30] 🤖 CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
[03:13] 💡 Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
[04:02] 🎥 GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
[04:48] 💻 Z1: Efficient Test-time Scaling with Code
[05:26] 🤖 Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
[06:08] 💃 MixerMDM: Learnable Composition of Human Motion Diffusion Models
[06:46] 🏢 Command A: An Enterprise-Ready Large Language Model
[07:31] 💡 Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
[08:09] 🎬 OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
[08:53] 🤯 Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
[09:40] 🖼 Scaling Language-Free Visual Representation Learning
[10:23] 🤔 When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
This episode covers the following 15 papers:

[00:22] 🖼 TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
[00:59] 🎬 MoCha: Towards Movie-Grade Talking Character Synthesis
[01:39] 🔍 What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
[02:16] 🤖 Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
[03:05] 🧠 RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
[03:48] 🧠 Effectively Controlling Reasoning Models through Thinking Intervention
[04:32] 💡 Query and Conquer: Execution-Guided SQL Generation
[05:15] ✍ SketchVideo: Sketch-based Video Generation and Editing
[06:04] 🚨 TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection
[06:57] 💡 Efficient Inference for Large Reasoning Models: A Survey
[07:40] 🤖 Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code
[08:29] 🧪 Expanding RL with Verifiable Rewards Across Diverse Domains
[09:11] ✨ Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data
[09:50] 🤖 TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization
[10:30] 🇰 KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language