The 5 papers in this episode:
[00:35] TOP1(🔥258) | ⚡ MiniMax-01: Scaling Foundation Models with Lightning Attention
[02:52] TOP2(🔥77) | 📊 The Lessons of Developing Process Reward Models in Mathematical Reasoning
[05:06] TOP3(🔥66) | 🧠 Tensor Product Attention Is All You Need
[06:49] TOP4(🔥64) | 🧠 Enabling Scalable Oversight via Self-Evolving Critic
[08:58] TOP5(🔥61) | 🎥 VideoRAG: Retrieval-Augmented Generation over Video Corpus
[Follow Us] Find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递
The 12 papers in this episode:
[00:26] 🧠 OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
[01:06] 🔍 Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
[01:37] 🩺 Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators
[02:09] 🎨 SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces
[02:48] 🤖 FAST: Efficient Action Tokenization for Vision-Language-Action Models
[03:23] 🔍 Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
[04:01] 🧠 Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
[04:35] 🧹 The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models
[05:15] 🤖 RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
[05:54] 🎨 AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation
[06:36] 🎨 CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation
[07:18] 🎥 Do generative video models learn physical principles from watching videos?
The 9 papers in this episode:
[00:25] 📊 MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents
[01:06] 🏙 CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities
[01:49] 🎥 RepVideo: Rethinking Cross-Layer Representation for Video Generation
[02:30] 📚 Towards Best Practices for Open Datasets for LLM Training
[03:11] 🎵 XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework
[03:46] 🔒 Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography
[04:23] 🔍 Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
[05:03] 🎨 Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
[05:39] 🎥 Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
The 15 papers in this episode:
[00:23] ⚡ MiniMax-01: Scaling Foundation Models with Lightning Attention
[01:04] 🖼 Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
[01:44] 🎨 MangaNinja: Line Art Colorization with Precise Reference Following
[02:21] 🧬 A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following
[02:57] 🎥 Diffusion Adversarial Post-Training for One-Step Video Generation
[03:35] 🎲 PokerBench: Training Large Language Models to become Professional Poker Players
[04:11] 🎨 FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
[04:52] 🎨 Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
[05:30] 🔍 Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
[06:07] 🔍 Enhancing Automated Interpretability with Output-Centric Feature Descriptions
[06:49] 📚 OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training
[07:27] 📹 Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding
[08:04] 🤔 HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
[08:43] 🤖 Potential and Perils of Large Language Models as Judges of Unstructured Textual Data
[09:23] 🚫 AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
The 11 papers in this episode:
[00:24] 📊 The Lessons of Developing Process Reward Models in Mathematical Reasoning
[01:10] 🧠 Tensor Product Attention Is All You Need
[01:53] 🤖 Transformer²: Self-adaptive LLMs
[02:34] 🎥 VideoAuteur: Towards Long Narrative Video Generation
[03:22] 🌐 WebWalker: Benchmarking LLMs in Web Traversal
[04:08] 🩺 O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning
[04:50] 🗣 MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
[05:41] 🔧 SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
[06:25] 🩺 BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
[07:15] 🧪 ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
[07:51] 🌐 UnCommon Objects in 3D
The 10 papers in this episode:
[00:24] 🤖 OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
[01:02] 🎥 VideoRAG: Retrieval-Augmented Generation over Video Corpus
[01:38] 🎥 OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
[02:26] 🧠 LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
[03:01] 🧠 Enabling Scalable Oversight via Self-Evolving Critic
[03:34] 🎥 ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
[04:09] 🎥 Multi-subject Open-set Personalization in Video Generation
[04:47] 🔍 ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
[05:23] 🤖 Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
[06:00] 🦠 Infecting Generative AI With Viruses
The 5 papers in this episode:
[00:39] TOP1(🔥173) | 🧠 rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
[03:03] TOP2(🔥71) | 🚀 REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
[05:17] TOP3(🔥63) | 🧠 Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
[07:35] TOP4(🔥57) | 🔬 Agent Laboratory: Using LLM Agents as Research Assistants
[09:41] TOP5(🔥52) | 🌍 Cosmos World Foundation Model Platform for Physical AI
The 7 papers in this episode:
[00:23] 🧠 The GAN is dead; long live the GAN! A Modern GAN Baseline
[01:02] 🎥 An Empirical Study of Autoregressive Pre-training from Videos
[01:49] 🚗 Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
[02:32] 🔍 On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis
[03:14] 🌍 Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
[03:50] 📜 Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models
[04:26] 🔒 Entropy-Guided Attention for Private LLMs
The 11 papers in this episode:
[00:25] 🧠 rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
[01:06] 🧠 URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
[01:45] 🧠 Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
[02:25] 🔬 Agent Laboratory: Using LLM Agents as Research Assistants
[03:02] 🔬 LLM4SR: A Survey on Large Language Models for Scientific Research
[03:44] 🔍 GeAR: Generation Augmented Retrieval
[04:22] 🤖 InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
[05:02] 🐦 Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation
[05:41] 🖼 SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
[06:17] 🧠 DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization
[06:55] 🌳 EpiCoder: Encompassing Diversity and Complexity in Code Generation
The 11 papers in this episode:
[00:24] 🚀 REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
[01:00] 🎥 MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
[01:40] 🔍 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[02:21] 🌍 Cosmos World Foundation Model Platform for Physical AI
[03:01] 🔍 LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
[03:40] 🎥 Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
[04:22] 🎥 MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
[05:05] 📊 PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides
[05:42] 🎭 MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control
[06:17] 🎥 Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers
[06:52] 🐬 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
The 16 papers in this episode:
[00:24] 🎥 STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
[01:06] 🧮 BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
[01:44] 🤖 Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
[02:19] 🧠 Personalized Graph-Based Retrieval for Large Language Models
[02:54] 🧠 Test-time Computing: from System-1 Thinking to System-2 Thinking
[03:34] 🦠 METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring
[04:13] 🎥 GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking
[04:48] 🎥 Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
[05:27] 🎥 TransPixar: Advancing Text-to-Video Generation with Transparency
[06:06] 🎥 Ingredients: Blending Custom Photos with Video Diffusion Transformers
[06:45] 🔍 DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
[07:24] 🛡 Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
[08:04] 🔍 ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
[08:43] 🔍 Scaling Laws for Floating Point Quantization Training
[09:19] 🎤 Samba-ASR: State-of-the-Art Speech Recognition Leveraging Structured State-Space Models
[09:59] 🎨 AutoPresent: Designing Structured Visuals from Scratch
The 8 papers in this episode:
[00:24] 🤖 EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
[00:58] 🤖 VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
[01:33] 🤔 Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
[02:11] 🤖 SDPO: Segment-Level Direct Preference Optimization for Social Agents
[02:51] 🎨 VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
[03:31] 🧬 Graph Generative Pre-trained Transformer
[04:04] 🌍 LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models
[04:44] 🔬 BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery