The 17 papers in this episode:
[00:26] 🤖 AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
[01:15] 🌐 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
[01:55] 🌐 Training-free Regional Prompting for Diffusion Transformers
[02:36] 🌍 Survey of Cultural Awareness in Language Models: Text and Beyond
[03:15] 🤖 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
[03:52] 📊 DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
[04:29] 🎥 How Far is Video Generation from World Model: A Physical Law Perspective
[05:08] ⚡ Adaptive Caching for Faster Video Generation with Diffusion Transformers
[05:48] 🦖 DynaSaur: Large Language Agents Beyond Predefined Actions
[06:26] 🎥 GenXD: Generating Any 3D and 4D Scenes
[07:01] 📊 Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
[07:45] 📚 LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
[08:26] 🎥 PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
[09:08] ⚖ "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
[09:48] 🌌 Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
[10:36] 🎨 MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
[11:14] 🌍 Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks
[Follow us] You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递

The 17 papers in this episode:
[00:25] 🤖 OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
[01:07] ⚙ Constant Acceleration Flow
[01:53] 🍅 TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
[02:33] 🎨 Randomized Autoregressive Visual Generation
[03:10] 🧠 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
[03:50] 📚 Personalization of Large Language Models: A Survey
[04:29] 🖼 In-Context LoRA for Diffusion Transformers
[05:09] ⚡ SambaMixer: State of Health Prediction of Li-ion Batteries using Mamba State Space Models
[05:54] 🤖 Survey of User Interface Design and Interaction Techniques in Generative AI Applications
[06:32] 🧶 HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
[07:07] 🌐 M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
[07:44] 🌆 CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
[08:22] 🔄 GPT or BERT: why not both?
[09:02] 🎭 Face Anonymization Made Simple
[09:40] 📊 Zipfian Whitening
[10:19] 📚 WikiNER-fr-gold: A Gold-Standard NER Corpus
[10:53] 🧠 GRS-QA -- Graph Reasoning-Structured Question Answering Dataset

The 5 papers in this episode:
[00:41] TOP1 (🔥191) | 🧠 CLEAR: Character Unlearning in Textual and Visual Modalities
[02:58] TOP2 (🔥70) | 🤖 GPT-4o System Card
[04:50] TOP3 (🔥50) | 🔍 Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
[06:53] TOP4 (🔥49) | 🗣 CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
[08:44] TOP5 (🔥48) | 🚀 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

The 11 papers in this episode:
[00:27] 🔍 Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
[01:05] 🧠 What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
[01:43] 🔍 A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents
[02:23] 🔄 Constraint Back-translation Improves Complex Instruction Following of Large Language Models
[02:59] 📄 Language Models can Self-Lengthen to Generate Long Texts
[03:35] 📊 BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays
[04:17] 💾 BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments
[05:04] 🤖 Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks
[05:40] 🤖 SelfCodeAlign: Self-Alignment for Code Generation
[06:18] 🎥 DELTA: Dense Efficient Long-range 3D Tracking for any video
[06:57] 🎥 Learning Video Representations without Natural Videos

The 5 papers in this episode:
[00:29] 🗣 CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
[01:09] 🤖 A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
[01:50] 🔍 Stealing User Prompts from Mixture of Experts
[02:26] 🩺 AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels
[02:58] 🔄 TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

The 8 papers in this episode:
[00:33] 🧠 CLEAR: Character Unlearning in Textual and Visual Modalities
[01:10] 🤖 AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
[01:46] 🤖 SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization
[02:26] 🌐 OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
[03:13] 🧠 Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
[03:52] 🚀 ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
[04:31] 🤖 Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset
[05:17] 🤖 Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

The 17 papers in this episode:
[00:24] 🇵🇱 Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
[01:00] 🤖 AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
[01:39] 🤖 GPT-4o System Card
[02:21] 📄 Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
[03:08] 🤖 LongReward: Improving Long-context Large Language Models with AI Feedback
[03:43] 🎥 MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
[04:22] 🌟 DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
[05:10] 🧩 GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
[05:49] 📚 A Survey of Small Language Models
[06:23] 💾 COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
[06:58] ⚡ Fast Best-of-N Decoding via Speculative Rejection
[07:36] 🔍 Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
[08:25] 🎥 LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
[09:00] 🤖 Neural Fields in Robotics: A Survey
[09:40] 🗣 Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction
[10:15] 🩺 Language Models And A Second Opinion Use Case: The Pocket Professional
[10:55] 🤖 Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation

The 13 papers in this episode:
[00:25] 🚀 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
[01:14] 🗣 Continuous Speech Synthesis using per-token Latent Diffusion
[01:55] ⚡ Teach Multimodal LLMs to Comprehend Electrocardiographic Images
[02:39] 🌐 Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
[03:23] ⚡ FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
[03:56] 🎧 MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
[04:34] 🧠 Counting Ability of Large Language Models and Impact of Tokenization
[05:08] 🧠 Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
[05:46] 🤖 Reflection-Bench: probing AI intelligence with reflection
[06:23] 🤖 Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
[06:57] 🔍 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
[07:35] 🔍 Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
[08:15] 🤖 Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling

The 5 papers in this episode:
[00:44] TOP1 (🔥79) | ⚡ FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
[02:42] TOP2 (🔥60) | 🌳 SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
[04:19] TOP3 (🔥58) | 🚀 Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
[06:11] TOP4 (🔥55) | 🤖 CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
[08:28] TOP5 (🔥52) | 💼 UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

The 21 papers in this episode:
[00:26] 🚀 Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
[01:09] 🔄 LOGO -- Long cOntext aliGnment via efficient preference Optimization
[01:45] 🧠 Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch
[02:30] 🤔 Can Knowledge Editing Really Correct Hallucinations?
[03:17] 🎮 Unbounded: A Generative Infinite Game of Character Life Simulation
[04:02] 🎥 Framer: Interactive Frame Interpolation
[04:48] 📊 Distill Visual Chart Reasoning Ability from LLMs to MLLMs
[05:35] 📉 Why Does the Effective Context Length of LLMs Fall Short?
[06:14] 🔒 Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
[06:52] 🔧 Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
[07:27] 🌍 CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
[08:09] 📊 Should We Really Edit Language Models? On the Evaluation of Edited Language Models
[08:43] 🌐 ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
[09:20] 🌐 WAFFLE: Multi-Modal Model for Automated Front-End Development
[09:52] 📚 CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models
[10:30] 🔄 Stable Consistency Tuning: Understanding and Improving Consistency Models
[11:10] 🧮 Language Models are Symbolic Learners in Arithmetic
[12:00] 🐍 Taipan: Efficient and Expressive State Space Language Models with Selective Attention
[12:44] 🔄 Value Residual Learning For Alleviating Attention Concentration In Transformers
[13:23] 📚 Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits
[14:03] 🤖 Data Scaling Laws in Imitation Learning for Robotic Manipulation

The 10 papers in this episode:
[00:25] 🖼 MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
[01:09] 🌍 WorldSimBench: Towards Video Generation Models as World Simulators
[01:47] 🌊 Scaling Diffusion Language Models via Adaptation from Autoregressive Models
[02:20] 📱 Lightweight Neural App Control
[03:01] 🏠 ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
[03:47] 🖼 Scalable Ranked Preference Optimization for Text-to-Image Generation
[04:23] 🌆 DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes
[05:05] 🩺 MedINST: Meta Dataset of Biomedical Instructions
[05:52] 🌍 M-RewardBench: Evaluating Reward Models in Multilingual Settings
[06:27] 📊 TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts

The 8 papers in this episode:
[00:27] 🔍 PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
[01:09] 🌟 SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
[01:48] 🤖 Aligning Large Language Models via Self-Steering Optimization
[02:30] 🇯🇵 JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
[03:11] 🧬 EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search
[03:53] 🧠 MiniPLM: Knowledge Distillation for Pre-Training Language Models
[04:30] 🔍 Mitigating Object Hallucination via Concentric Causal Attention
[05:19] 🧠 Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes