
Episode List: HuggingFace 每日AI论文速递 - EarsOnMe - Curated Podcasts

2025.02.14 | Context scaled to 3 million tokens on a single GPU; a memory-efficient text-encoder strategy.

HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

The 18 papers in this episode:

[00:21] 🚀 InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
[01:07] 🖼 Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
[01:49] 🧠 An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging
[02:31] 📚 SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
[03:14] 🐕 Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights
[03:56] 🌐 Exploring the Potential of Encoder-free Architectures in 3D LMMs
[04:39] 🎭 CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
[05:26] 🌐 TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models
[06:09] 🤖 EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
[07:00] 🌪 Typhoon T1: An Open Thai Reasoning Model
[07:54] 🤖 Logical Reasoning in Large Language Models: A Survey
[08:36] 🧠 MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
[09:23] 🧠 CoT-Valve: Length-Compressible Chain-of-Thought Tuning
[10:11] 🤖 SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models
[10:52] 🌐 mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
[11:36] 🦜 The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
[12:18] 🤖 DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
[13:00] 🔍 3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection

[Follow us] You can also find us on the following platforms for more beyond the podcast. Xiaohongshu: AI速递

14 min · 99+ · 11 months ago

2025.02.13 | A multilingual evaluation suite fills a gap; a dense-text image dataset challenges generative models.

The 20 papers in this episode:

[00:23] 🌍 BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
[01:08] 📄 TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation
[01:48] 🎥 Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
[02:36] 🎥 CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
[03:16] 🖥 WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
[04:06] ⚡ LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid
[04:45] 🧠 TransMLA: Multi-head Latent Attention Is All You Need
[05:31] 💼 Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance
[06:23] 📏 Distillation Scaling Laws
[07:02] 🚀 Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
[07:52] 🌍 SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation
[08:25] 🧠 LLM Pretraining with Continuous Concepts
[09:09] 🎭 Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
[09:52] 🔍 NoLiMa: Long-Context Evaluation Beyond Literal Matching
[10:39] 🧠 Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing
[11:15] 📚 Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey
[11:58] 🎥 Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
[12:43] 🔄 DPO-Shift: Shifting the Distribution of Direct Preference Optimization
[13:28] 🧠 LLM Modules: Knowledge Transfer from a Large to a Small Model using Enhanced Cross-Attention
[14:15] 🛡 MetaSC: Test-Time Safety Specification Optimization for Language Models

15 min · 99+ · 11 months ago

2025.02.12 | Reinforcement learning lifts competitive programming; code input-output prediction sharpens reasoning models.

The 21 papers in this episode:

[00:25] 🧠 Competitive Programming with Large Reasoning Models
[01:03] 🧠 CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
[01:47] 🎥 Magic 1-For-1: Generating One Minute Video Clips within One Minute
[02:27] 🧠 Teaching Language Models to Critique via Reinforcement Learning
[03:09] 💼 Expect the Unexpected: FailSafe Long Context QA for Finance
[03:49] 🌍 Scaling Pre-training to One Hundred Billion Data for Vision Language Models
[04:24] 🧠 LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!
[05:07] 📈 Enhancing Financial Time-Series Forecasting with Retrieval-Augmented Large Language Models
[05:50] 📄 Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents
[06:34] 🛠 Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training
[07:15] 🛠 CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing
[08:10] 🎥 Enhance-A-Video: Better Generated Video for Free
[08:49] 🌍 NatureLM: Deciphering the Language of Nature for Scientific Discovery
[09:34] 🦎 Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
[10:22] 🎥 VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
[11:01] 📹 CoS: Chain-of-Shot Prompting for Long Video Understanding
[11:42] 🧩 Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
[12:28] 🎤 FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
[13:09] 🕵 Auditing Prompt Caching in Language Model APIs
[13:49] 💎 Gemstones: A Model Suite for Multi-Faceted Scaling Laws
[14:32] 🧠 Skill Expansion and Composition in Parameter Space

15 min · 99+ · 11 months ago

2025.02.11 | LLMs generate multilingual detoxification data; reinforcement learning improves math-reasoning efficiency.

The 21 papers in this episode:

[00:25] 🤖 SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators
[01:10] 🧠 Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
[01:55] 🤔 Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
[02:38] ⚡ Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding
[03:19] 🚀 Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
[03:57] 🤖 Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
[04:38] 🧠 ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
[05:28] 🌐 EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
[06:11] 🧠 LM2: Large Memory Models
[06:57] 🧠 The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
[07:50] 🪆 Matryoshka Quantization
[08:35] 🎥 Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
[09:22] 🎥 History-Guided Video Diffusion
[10:12] 🎥 CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers
[10:59] ⚡ APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
[11:38] ⏱ Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile
[12:21] 🤖 MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents
[13:03] 🚀 Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
[13:47] 🧠 The Curse of Depth in Large Language Models
[14:24] 🎨 DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization
[15:14] 🎨 Dual Caption Preference Optimization for Diffusion Models

16 min · 99+ · 11 months ago

2025.02.10 | Video processing performance improves; video generation gets markedly faster.

The 21 papers in this episode:

[00:22] 🎥 VideoRoPE: What Makes for Good Video Rotary Position Embedding?
[01:07] 🎥 Fast Video Generation with Sliding Tile Attention
[01:54] 🎥 Goku: Flow Based Video Generative Foundation Models
[02:35] 🌍 AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting
[03:19] 🔢 QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
[03:57] 🛡 DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
[04:40] 🧠 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
[05:28] 🎯 Agency Is Frame-Dependent
[06:04] 🎥 FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
[06:46] 📊 Linear Correlation in LM's Compositional Generalization and Hallucination
[07:32] 🧠 Generating Symbolic World Models via Test-time Scaling of Large Language Models
[08:09] 📱 On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices
[08:51] ⚡ CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
[09:32] 🧩 Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
[10:20] 🔄 Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models
[11:06] 🧠 CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
[11:50] 🧩 No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
[12:39] 🌓 YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment
[13:20] 🌐 QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
[14:02] 🧠 ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
[14:48] 🤖 MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf

16 min · 99+ · 11 months ago

2025.02.07 | Feature flow improves model interpretability; UltraIF strengthens instruction following.

The 21 papers in this episode:

[00:24] 🔄 Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
[01:03] 🤖 UltraIF: Advancing Instruction Following from the Wild
[01:40] 🎥 DynVFX: Augmenting Real Videos with Dynamic Content
[02:16] 🌐 Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
[02:51] 🏃 MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
[03:31] 🤖 Great Models Think Alike and this Undermines AI Oversight
[04:07] 📚 MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion
[04:47] 🏆 Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
[05:25] 🤖 ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
[06:07] 🎙 Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
[06:51] 🎥 MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
[07:38] 📊 ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution
[08:18] 🧠 BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation
[09:01] 🔄 Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
[09:45] 🌀 Weak-to-Strong Diffusion with Reflection
[10:26] 🤖 PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback
[11:04] 🔧 Enhancing Code Generation for Low-Resource Languages: No Silver Bullet
[11:48] 🔓 Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
[12:22] 🤖 PILAF: Optimal Human Preference Sampling for Reward Modeling
[13:05] 🎥 Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
[13:47] 🤖 Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression

14 min · 99+ · 11 months ago

2025.02.06 | Data-centric optimization boosts model performance; a simulated market reproduces complex behavior.

The 10 papers in this episode:

[00:26] 🤖 SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
[01:08] 🌐 TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets
[01:45] 🧠 Demystifying Long Chain-of-Thought Reasoning in LLMs
[02:23] 🧠 LIMO: Less is More for Reasoning
[03:15] 🧠 Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
[04:04] 🧠 A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
[04:47] 🔓 Jailbreaking with Universal Multi-Prompts
[05:25] 🎨 LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer
[06:27] 🧠 Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
[07:09] 🧠 On Teacher Hacking in Language Model Distillation

8 min · 99+ · 11 months ago

2025.02.05 | Inverse bridge matching distillation speeds things up; VideoJAM improves motion coherence.

The 9 papers in this episode:

[00:25] ⚡ Inverse Bridge Matching Distillation
[01:02] 🎥 VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
[01:44] 🤖 ACECODER: Acing Coder RL via Automated Test-Case Synthesis
[02:25] 🧠 QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
[03:09] 📉 Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
[03:56] 🧠 Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
[04:46] 🖼 Generating Multi-Image Synthetic Data for Text-to-Image Customization
[05:31] 🤔 Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?
[06:13] 🎯 Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations

7 min · 99+ · 11 months ago

2025.02.04 | Direct alignment algorithms improve; OmniHuman refines human animation.

The 20 papers in this episode:

[00:26] 🤔 The Differences Between Direct Alignment Algorithms are a Blur
[01:07] 🤖 OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
[01:48] 💡 Process Reinforcement through Implicit Rewards
[02:36] ⚖ Preference Leakage: A Contamination Problem in LLM-as-a-judge
[03:14] 🛡 SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
[04:02] 🚀 FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
[04:50] 🌍 AIN: The Arabic INclusive Large Multimodal Model
[05:39] 🧠 DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
[06:30] 🤔 MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
[07:19] 🛡 Almost Surely Safe Alignment of Large Language Models at Inference-Time
[08:04] 🤔 ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
[08:49] 🤔 The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
[09:38] 🎮 Improving Transformer World Models for Data-Efficient RL
[10:22] 💡 Improved Training Technique for Latent Consistency Models
[11:07] 🧠 Scaling Embedding Layers in Language Models
[11:42] 🎨 SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
[12:24] 🤔 PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
[13:08] 🧠 Lifelong Sequential Knowledge Editing without Model Degradation
[13:46] 🔬 Current Pathology Foundation Models are unrobust to Medical Center Differences
[14:37] 🫀 A Study on the Performance of U-Net Modifications in Retroperitoneal Tumor Segmentation

15 min · 99+ · 1 year ago

2025.02.03 | Test-time scaling boosts reasoning; reward-guided decoding cuts compute.

The 9 papers in this episode:

[00:26] 🧠 s1: Simple test-time scaling
[01:18] ⚡ Reward-Guided Speculative Decoding for Efficient LLM Reasoning
[02:00] 🧠 Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models
[02:41] 🛡 Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
[03:28] 🌍 DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
[04:13] 🧠 Trading Inference-Time Compute for Adversarial Robustness
[04:54] 🧠 INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation
[05:30] 📰 Unraveling the Capabilities of Language Models in News Summarization
[06:09] 🎥 Fast Encoder-Based 3D from Casual Videos via Point Track Processing

7 min · 92 · 1 year ago

[Month-End Special] January's Hottest AI Papers | DeepSeek-R1 boosts LLM reasoning via reinforcement learning; breakthroughs in long-context processing

The 10 papers in this episode:

[00:40] TOP1 (🔥281) | 🧠 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
[03:13] TOP2 (🔥271) | ⚡ MiniMax-01: Scaling Foundation Models with Lightning Attention
[05:36] TOP3 (🔥249) | 🧠 rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
[08:13] TOP4 (🔥103) | 🧠 Evolving Deeper LLM Thinking
[10:28] TOP5 (🔥99) | 📚 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
[12:51] TOP6 (🔥90) | 🚀 REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
[15:15] TOP7 (🔥90) | 🧠 Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
[17:14] TOP8 (🔥89) | 📊 The Lessons of Developing Process Reward Models in Mathematical Reasoning
[19:33] TOP9 (🔥88) | 🤔 Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
[21:35] TOP10 (🔥87) | 🧠 The GAN is dead; long live the GAN! A Modern GAN Baseline

24 min · 99+ · 1 year ago