Episode List: HuggingFace 每日AI论文速递 - EarsOnMe - curated podcasts, a match at first listen

2025.02.28 | Self-correction improves mathematical reasoning; reinforcement learning optimizes medical reasoning.

HuggingFace 每日AI论文速递 (HuggingFace Daily AI Papers Briefing)

This episode covers the following 19 papers:
[00:23] 🧠 Self-rewarding correction for mathematical reasoning
[01:03] 🧠 MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
[01:53] 🧠 R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
[02:34] 🧬 LongRoPE2: Near-Lossless LLM Context Window Scaling
[03:11] 🧠 FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
[04:02] 🤖 CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale
[04:48] 🚀 Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
[05:33] 🧩 UniTok: A Unified Tokenizer for Visual Generation and Understanding
[06:12] 🚀 NeoBERT: A Next-Generation BERT
[06:47] 🌀 FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
[07:30] 🛠 SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning
[08:07] 🤖 Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
[08:45] 🎨 Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
[09:30] 🎥 Mobius: Text to Seamless Looping Video Generation via Latent Shift
[10:08] 🛡 Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System
[10:49] 🤖 R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
[11:29] 🧠 On Relation-Specific Neurons in Large Language Models
[12:05] 🔄 Training Consistency Models with Variational Noise Coupling
[12:46] ⚡ Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling
【Follow us】You can also find us on the following platform for more information beyond the podcast. Xiaohongshu (小红书): AI速递

13 minutes · 99+ · 11 months ago

2025.02.27 | Kanana boosts Korean-English bilingual efficiency; GHOST 2.0 achieves high-fidelity head transfer.

HuggingFace 每日AI论文速递

This episode covers the following 18 papers:
[00:23] 🌐 Kanana: Compute-efficient Bilingual Language Models
[00:54] 👤 GHOST 2.0: generative high-fidelity one shot transfer of heads
[01:43] 🎥 TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding
[02:21] 🤖 Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
[03:02] 🤖 Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
[03:47] 🌍 Language Models' Factuality Depends on the Language of Inquiry
[04:27] 🧠 Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
[05:11] 🤖 Towards an AI co-scientist
[05:52] 🇬🇷 Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance
[06:38] 🤖 VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
[07:12] 📏 Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
[07:52] 📚 Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs
[08:35] 🛡 AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
[09:23] 🧠 BIG-Bench Extra Hard
[10:07] 🔍 CritiQ: Mining Data Quality Criteria from Human Preferences
[10:44] 🔬 MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
[11:28] 📄 PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
[12:08] 🧠 DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps

13 minutes · 99+ · 11 months ago

2025.02.26 | OmniAlign-V improves multimodal model alignment; SpargeAttn accelerates attention computation

HuggingFace 每日AI论文速递

This episode covers the following 14 papers:
[00:23] 🤖 OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
[01:06] ⚡ SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
[01:53] 🖼 KV-Edit: Training-Free Image Editing for Precise Background Preservation
[02:32] 🌈 ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
[03:08] 🤖 SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
[03:51] 📊 Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
[04:30] 🧠 Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models
[05:11] 🔄 K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
[05:51] 🌐 WebGames: Challenging General-Purpose Web-Browsing AI Agents
[06:29] 🧠 Introducing Visual Perception Token into Multimodal Large Language Model
[07:07] 🎰 The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
[07:47] 🧠 AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
[08:26] 🔍 LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models
[09:07] 🧠 Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI

10 minutes · 99+ · 11 months ago

2025.02.25 | Innovations in long-context optimization; efficient and general visual diffusion.

HuggingFace 每日AI论文速递

This episode covers the following 20 papers:
[00:27] 📖 Thus Spake Long-Context Large Language Model
[01:09] 🌈 DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
[01:48] 🚀 Slamming: Training a Speech Language Model on One GPU in a Day
[02:32] 🎥 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
[03:11] 🎧 Audio-FLAN: A Preliminary Release
[03:43] 🧠 CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
[04:28] 🎨 GCC: Generative Color Constancy via Diffusing a Color Checker
[05:11] 📊 Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
[05:57] 🚀 Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
[06:38] 🧠 Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
[07:23] 🎥 RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers
[08:01] 📱 Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration
[08:45] ⏳ Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties
[09:31] 🤖 Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
[10:02] 🔄 Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
[10:43] 📝 Can Community Notes Replace Professional Fact-Checkers?
[11:24] 📈 Forecasting Open-Weight AI Model Growth on Hugging Face
[12:08] 🔑 Beyond Release: Access Considerations for Generative AI Systems
[12:49] 🌐 TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
[13:30] 💃 X-Dancer: Expressive Music to Human Dance Video Generation

14 minutes · 99+ · 11 months ago

2025.02.24 | Efficient academic survey generation; the key role of punctuation

HuggingFace 每日AI论文速递

This episode covers the following 20 papers:
[00:23] 📚 SurveyX: Academic Survey Automation via Large Language Models
[01:10] 🔍 LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
[01:50] 🚗 MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
[02:28] 🧬 Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model
[03:12] 🎨 PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data
[03:55] 🔗 VLM$^2$-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
[04:42] 📌 SIFT: Grounding LLM Reasoning in Contexts via Stickers
[05:27] 🧠 LightThinker: Thinking Step-by-Step Compression
[05:59] 🗂 StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following
[06:48] 🛡 Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models
[07:40] 📚 KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
[08:30] 🧬 ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
[09:11] 🧠 MoBA: Mixture of Block Attention for Long-Context LLMs
[09:49] 🤖 InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
[10:37] 🧠 The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
[11:20] 📚 Evaluating Multimodal Generative AI with Korean Educational Standards
[11:54] ⚠ Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
[12:29] ⚡ One-step Diffusion Models with $f$-Divergence Distribution Matching
[13:09] 🧠 Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence
[13:52] 🧠 MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models

15 minutes · 99+ · 11 months ago

2025.02.21 | A new framework for evaluating AI agents; LLM performance differs markedly across disciplines.

HuggingFace 每日AI论文速递

This episode covers the following 20 papers:
[00:26] 🧠 MLGym: A New Framework and Benchmark for Advancing AI Research Agents
[01:18] 📚 SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
[02:04] 🌐 SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
[02:52] 🧠 How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
[03:49] 🚀 S*: Test Time Scaling for Code Generation
[04:35] ⏳ Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information
[05:28] 📄 LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
[06:17] 🧠 Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
[07:13] 🖥 PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
[08:07] 🧠 S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
[09:01] 🧠 Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning
[09:55] 🎥 Dynamic Concepts Personalization from Single Videos
[10:38] 🖼 Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
[11:23] 🌍 NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization
[12:13] 🧠 AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
[13:06] 🌍 How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild
[13:52] 🌍 Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
[14:55] 🌐 RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers
[15:54] 🧠 Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
[16:41] 🤖 LLM-based User Profile Management for Recommender System

18 minutes · 99+ · 11 months ago

2025.02.20 | Improving visual perception; strengthening autonomous driving safety.

HuggingFace 每日AI论文速递

This episode covers the following 20 papers:
[00:24] 🌐 Qwen2.5-VL Technical Report
[01:10] 🚗 RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
[01:50] 🎶 SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
[02:38] 🧠 MoM: Linear Sequence Modeling with Mixture-of-Memories
[03:15] 🌐 Craw4LLM: Efficient Web Crawling for LLM Pretraining
[04:05] 🧠 LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
[04:45] 🤔 Small Models Struggle to Learn from Strong Reasoners
[05:27] ⚙ Autellix: An Efficient Serving Engine for LLM Agents as General Programs
[06:08] 🌍 Presumed Cultural Identity: How Names Shape LLM Responses
[06:53] 🚨 Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region
[07:38] 🩺 SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?
[08:21] 🧠 Thinking Preference Optimization
[08:59] 🧠 Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering
[09:40] 🧠 AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
[10:21] 🧬 NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
[11:02] 🧩 ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation
[11:44] 🧠 Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
[12:33] 🌍 GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking
[13:19] 🤖 InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
[14:06] 🔊 Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective

15 minutes · 99+ · 11 months ago

2025.02.19 | Data-efficient speech processing; innovation in embedding-space compression.

HuggingFace 每日AI论文速递

This episode covers the following 20 papers:
[00:25] 🎙 Soundwave: Less is More for Speech-Text Alignment in LLMs
[01:05] 🔍 Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
[01:48] 🌊 Continuous Diffusion Model for Language Modeling
[02:30] 🎥 Phantom: Subject-consistent video generation via cross-modal alignment
[03:12] 🧠 Rethinking Diverse Human Preference Learning through Principal Component Analysis
[04:00] 🤖 SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
[04:36] 🛡 SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
[05:25] 🐍 Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation
[06:08] 📚 You Do Not Fully Utilize Transformer's Representation Capacity
[06:50] 🤖 Magma: A Foundation Model for Multimodal AI Agents
[07:23] 💹 FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading
[08:08] 📄 RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
[08:49] 🧠 PAFT: Prompt-Agnostic Fine-Tuning
[09:27] 🛠 OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
[10:13] 📊 Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
[11:00] 🔄 MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
[11:37] 🩺 HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
[12:12] 🧠 HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
[12:51] 🌍 Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
[13:32] 🧠 Atom of Thoughts for Markov LLM Test-Time Scaling

14 minutes · 99+ · 11 months ago

2025.02.18 | Sparse attention boosts efficiency; optimizing robot getting-up policies.

HuggingFace 每日AI论文速递

This episode covers the following 29 papers:
[00:23] ⚡ Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
[01:10] 🤖 Learning Getting-Up Policies for Real-World Humanoid Robots
[01:55] 🧠 ReLearn: Unlearning via Learning for Large Language Models
[02:35] 💻 SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
[03:21] 🌐 HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
[03:58] 🧠 How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
[04:33] 🤖 SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
[05:12] 🔧 Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
[05:55] 🧠 I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
[06:38] 🔧 SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL
[07:25] 🧠 CRANE: Reasoning with constrained LLM generation
[08:07] 🧠 Intuitive physics understanding emerges from self-supervised pretraining on natural videos
[08:46] 🐦 Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest
[09:22] 🧠 Dyve: Thinking Fast and Slow for Dynamic Process Verification
[10:06] 🧠 PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
[10:53] 🤖 System Message Generation for User Preferences using Open-Source Models
[11:38] 🎥 video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
[12:33] 🧠 Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity
[13:11] 🤖 Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
[13:52] 🤖 MagicArticulate: Make Your 3D Models Articulation-Ready
[14:37] 🤖 Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
[15:21] 🧠 One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
[16:03] 🤖 Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model
[16:40] 🚀 Better Embeddings with Coupled Adam
[17:18] 🧐 Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
[17:56] 🧪 Towards Data-Efficient Pretraining for Atomic Property Prediction
[18:46] 🌀 The Mirage of Model Editing: Revisiting Evaluation in the Wild
[19:31] 🧮 Large Language Models and Mathematical Reasoning Failures
[20:11] 📊 Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance

21 minutes · 99+ · 11 months ago

2025.02.17 | RAS accelerates diffusion transformers; video generation quality improves

HuggingFace 每日AI论文速递

This episode covers the following 21 papers:
[00:22] 🌐 Region-Adaptive Sampling for Diffusion Transformers
[01:05] 🎥 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
[01:48] 🌊 Large Language Diffusion Models
[02:31] 🧠 ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
[03:15] 🌟 MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
[03:58] 🖼 Precise Parameter Localization for Textual Generation in Diffusion Models
[04:40] 🧠 Diverse Inference and Verification for Advanced Reasoning
[05:22] 🧬 DarwinLM: Evolutionary Structured Pruning of Large Language Models
[06:02] 📈 AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting
[06:40] 🖼 ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
[07:23] 🤖 We Can't Understand AI Using our Existing Vocabulary
[08:03] 📊 FoNE: Precise Single-Token Number Embeddings via Fourier Features
[08:53] 🌍 Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages
[09:41] 🔓 Jailbreaking to Jailbreak
[10:23] 🤖 STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning
[11:05] 📊 Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
[11:41] ⚡ MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers
[12:26] 🚗 V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models
[13:06] 🎵 CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
[13:49] 🧩 Cluster and Predict Latents Patches for Improved Masked Image Modeling
[14:31] 🧬 Agentic End-to-End De Novo Protein Design for Tailored Dynamics Using a Language Diffusion Model

15 minutes · 99+ · 11 months ago