This episode covers the following 8 papers:

[00:28] ⚡ SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
[01:10] 📹 VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
[01:51] 🎮 VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
[02:33] 🎯 SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
[03:10] 🌐 Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
[03:52] 🔄 When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
[04:34] 🎨 Stylecodes: Encoding Stylistic Information For Image Generation
[05:11] 🩺 ORID: Organ-Regional Information Driven Framework for Radiology Report Generation

This episode covers the following 7 papers:

[00:33] ⚡ Continuous Speculative Decoding for Autoregressive Image Generation
[01:14] 📚 RedPajama: an Open Dataset for Training Large Language Models
[01:58] 🤖 Soft Robotic Dynamic In-Hand Pen Spinning
[02:39] 🚀 ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
[03:13] 🔒 Building Trust: Foundations of Security, Safety and Transparency in AI
[03:46] 🔍 SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning
[04:24] 📊 Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages

This episode covers the following 16 papers:

[00:25] 📱 BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
[01:06] 🌍 Generative World Explorer
[01:43] 🔍 Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
[02:24] 🎥 AnimateAnything: Consistent and Controllable Animation for Video Generation
[03:08] 🧠 Top-nσ: Not All Logits Are You Need
[03:55] 🧠 Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts
[04:40] ⚡ SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
[05:19] 📚 Drowning in Documents: Consequences of Scaling Reranker Inference
[06:00] 🩺 Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering
[06:37] 📱 SlimLM: An Efficient Small Language Model for On-Device Document Assistance
[07:19] 🎥 VeGaS: Video Gaussian Splatting
[07:50] 🔄 Adaptive Decoding via Latent Preference Optimization
[08:27] 🎥 StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
[09:11] 🇩🇪 LLäMmlein: Compact and Competitive German-Only Language Models from Scratch
[09:43] 👕 FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on
[10:18] 📜 Evaluating the role of 'Constitutions' for learning from AI feedback

This episode covers the following 6 papers:

[00:28] 🧠 LLaVA-o1: Let Vision Language Models Reason Step-by-Step
[01:14] 🎨 Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
[01:51] 🌐 GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation
[02:25] 🌅 The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
[03:00] 📖 Number it: Temporal Grounding Videos like Flipping Manga
[03:45] 🌍 Xmodel-1.5: An 1B-scale Multilingual LLM

This episode covers the following 5 papers:

[00:44] TOP1 (🔥54) | 🖼 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
[02:31] TOP2 (🔥44) | 🤖 Large Language Models Can Self-Improve in Long-context Reasoning
[04:15] TOP3 (🔥43) | 🌐 LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
[06:12] TOP4 (🔥42) | 🎨 OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
[08:01] TOP5 (🔥42) | 📚 M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework

This episode covers the following 7 papers:

[00:27] ✨ MagicQuill: An Intelligent Interactive Image Editing System
[01:15] 🌐 LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
[01:50] 💾 Cut Your Losses in Large-Vocabulary Language Models
[02:22] 🏥 ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?
[03:02] 🤖 Hermes: A Large Language Model Framework on the Journey to Autonomous Networks
[03:36] 🎥 Sharingan: Extract User Action Sequence from Desktop Recordings
[04:21] 🤔 Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples

This episode covers the following 7 papers:

[00:26] 🤖 Large Language Models Can Self-Improve in Long-context Reasoning
[01:09] 🎥 EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
[01:58] 🔍 Direct Preference Optimization Using Sparse Feature-Level Constraints
[02:37] 🇫🇷 CamemBERT 2.0: A Smarter French Language Model Aged to Perfection
[03:18] 🧠 Can sparse autoencoders be used to decompose and interpret steering vectors?
[03:58] 🎵 PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation
[04:39] 🎥 Motion Control for Enhanced Complex Action Video Generation

This episode covers the following 6 papers:

[00:28] 🔍 SAMPart3D: Segment Any Part in 3D Objects
[01:06] 🌐 JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
[01:42] 🤔 Stronger Models are NOT Stronger Teachers for Instruction Tuning
[02:21] 🌐 Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings
[03:02] 📚 BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions
[03:55] 🔍 Hardware and Software Platform Inference

This episode covers the following 14 papers:

[00:23] 🖼 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
[01:05] 🎨 OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
[01:49] 📚 Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
[02:27] 📚 M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
[03:04] 🖼 Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
[03:42] 🧠 IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
[04:33] 🦎 GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
[05:11] 🌐 Watermark Anything with Localized Messages
[05:50] 🧠 Counterfactual Generation from Language Models
[06:22] 🤖 KMM: Key Frame Mask Mamba for Extended Motion Generation
[06:56] 🎲 Game-theoretic LLM: Agent Workflow for Negotiation Games
[07:35] 📊 Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
[08:15] 🧠 NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
[08:54] 🧠 Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction

This episode covers the following 6 papers:

[00:30] ⚖ Balancing Pipeline Parallelism with Vocabulary Parallelism
[01:15] 🎮 StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
[01:56] 🔄 DELIFT: Data Efficient Language model Instruction Fine Tuning
[02:29] 🧪 Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
[03:06] 🧠 LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation
[03:46] 💻 Improving the detection of technical debt in Java source code with an enriched dataset

This episode covers the following 5 papers:

[00:38] TOP1 (🔥73) | 🔧 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
[02:40] TOP2 (🔥53) | 🎥 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
[04:22] TOP3 (🔥52) | 📄 HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
[06:44] TOP4 (🔥47) | ⚡ BitNet a4.8: 4-bit Activations for 1-bit LLMs
[08:25] TOP5 (🔥45) | 🤖 AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

This episode covers the following 14 papers:

[00:25] 🔧 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
[01:03] 🎥 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
[01:46] ⚡ BitNet a4.8: 4-bit Activations for 1-bit LLMs
[02:25] 🎥 DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
[03:04] 🤖 Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
[03:39] 🧠 Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model
[04:21] 🎥 TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
[05:05] 🤖 DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
[05:40] 🧵 Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
[06:22] 👀 GazeGen: Gaze-Driven User Interaction for Visual Content Generation
[07:03] 🌐 RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval
[07:49] 🎥 SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
[08:29] 🎥 VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
[09:03] ⚡ SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

[Follow us]
You can also find us on the platform below for more content beyond the podcast.
Xiaohongshu: AI速递