2024.11.22 Daily AI Papers | Mixed preference optimization improves reasoning; a new approach to multimodal autoregressive pre-training

The 14 papers in this episode:
[00:26] 🧠 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
[01:12] 🌐 Multimodal Autoregressive Pre-training of Large Vision Encoders
[01:55] 🧠 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
[02:40] 🧠 Hymba: A Hybrid-head Architecture for Small Language Models
[03:22] 🚀 Ultra-Sparse Memory Network
[03:58] 📚 OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
[04:47] 🧠 Natural Language Reinforcement Learning
[05:26] 🧠 Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
[06:08] 🤖 Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
[06:46] 🌊 Stable Flow: Vital Layers for Training-Free Image Editing
[07:25] 🌐 UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages
[08:03] 🚗 MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
[08:44] 🧠 Patience Is The Key to Large Language Model Reasoning
[09:18] 🌐 Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation

[Follow us] You can also find us on the following platform for more than the podcast itself covers. Xiaohongshu: AI速递

10 minutes
99+
8 months ago

2024.11.21 Daily AI Papers | 4-bit attention delivers significant speedups; a comprehensive benchmark for video generation

The 8 papers in this episode:
[00:28] ⚡ SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
[01:10] 📹 VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
[01:51] 🎮 VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
[02:33] 🎯 SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
[03:10] 🌐 Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
[03:52] 🔄 When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
[04:34] 🎨 Stylecodes: Encoding Stylistic Information For Image Generation
[05:11] 🩺 ORID: Organ-Regional Information Driven Framework for Radiology Report Generation

6 minutes
78
8 months ago

2024.11.19 Daily AI Papers | Efficient deployment on mobile devices; virtual exploration for embodied AI

The 16 papers in this episode:
[00:25] 📱 BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
[01:06] 🌍 Generative World Explorer
[01:43] 🔍 Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
[02:24] 🎥 AnimateAnything: Consistent and Controllable Animation for Video Generation
[03:08] 🧠 Top-$nσ$: Not All Logits Are You Need
[03:55] 🧠 Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts
[04:40] ⚡ SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
[05:19] 📚 Drowning in Documents: Consequences of Scaling Reranker Inference
[06:00] 🩺 Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering
[06:37] 📱 SlimLM: An Efficient Small Language Model for On-Device Document Assistance
[07:19] 🎥 VeGaS: Video Gaussian Splatting
[07:50] 🔄 Adaptive Decoding via Latent Preference Optimization
[08:27] 🎥 StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
[09:11] 🇩🇪 LLäMmlein: Compact and Competitive German-Only Language Models from Scratch
[09:43] 👕 FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on
[10:18] 📜 Evaluating the role of 'Constitutions' for learning from AI feedback

11 minutes
99+
8 months ago

2024.11.12 Daily AI Papers | Seamless object insertion; generalist editing models improve precision

The 14 papers in this episode:
[00:23] 🖼 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
[01:05] 🎨 OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
[01:49] 📚 Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
[02:27] 📚 M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
[03:04] 🖼 Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
[03:42] 🧠 IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
[04:33] 🦎 GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
[05:11] 🌐 Watermark Anything with Localized Messages
[05:50] 🧠 Counterfactual Generation from Language Models
[06:22] 🤖 KMM: Key Frame Mask Mamba for Extended Motion Generation
[06:56] 🎲 Game-theoretic LLM: Agent Workflow for Negotiation Games
[07:35] 📊 Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
[08:15] 🧠 NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
[08:54] 🧠 Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction

10 minutes
95
8 months ago
