2024.10.11 Daily AI Papers | Mathematical code improves reasoning, prefix quantization speeds up models

The 21 papers in this episode:
[00:25] 🧮 MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
[01:09] 🚀 PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs
[01:59] 🤖 MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
[02:33] 🎨 DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models
[03:03] 🔄 Benchmarking Agentic Workflow Generation
[03:44] 🤖 Agent S: An Open Agentic Framework that Uses Computers Like a Human
[04:23] 🔄 Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
[04:55] 🤖 Intriguing Properties of Large Language and Vision Models
[05:35] 🎥 Progressive Autoregressive Video Diffusion Models
[06:26] 🌲 Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
[07:10] 🌐 Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
[07:50] 🤖 GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
[08:36] 🧩 SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe
[09:15] 🔄 Emergent properties with repeated examples
[09:57] 🤖 Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
[10:40] 🎲 Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
[11:14] 🌐 Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
[11:58] 🧬 LPZero: Language Model Zero-cost Proxy Search from Zero
[12:41] 🌐 MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
[13:15] 🔍 Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations
[13:51] 🖼 DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
[Follow us] You can also find us on the following platform for more information beyond the podcast: Xiaohongshu: AI速递

14 minutes
99+
9 months ago

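[Month-End Special] September's Hottest AI Papers | Reinforcement learning improves language models, code intelligence models perform strongly.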

The 10 papers in this episode:
[00:40] TOP1 (🔥129) | 🤖 Training Language Models to Self-Correct via Reinforcement Learning
[02:41] TOP2 (🔥121) | 🚀 Qwen2.5-Coder Technical Report
[04:44] TOP3 (🔥96) | 🌐 Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
[06:30] TOP4 (🔥95) | 🖼 Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing
[08:23] TOP5 (🔥86) | 🧠 Attention Heads of Large Language Models: A Survey
[10:17] TOP6 (🔥85) | 🎥 Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
[11:56] TOP7 (🔥81) | 🌐 OmniGen: Unified Image Generation
[13:51] TOP8 (🔥81) | 🧠 Emu3: Next-Token Prediction is All You Need
[15:45] TOP9 (🔥78) | 📄 General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
[17:59] TOP10 (🔥77) | 🧠 OLMoE: Open Mixture-of-Experts Language Models

20 minutes
80
9 months ago

2024.10.10 Daily AI Papers | LLMs perform differently in economic games, personalized visual instructions improve AI interaction.

The 43 papers in this episode:
[00:23] 🤖 GLEE: A Unified Framework and Benchmark for Language-based Economic Environments
[01:09] 👤 Personalized Visual Instruction Tuning
[01:48] 🌍 Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
[02:35] 🖼 IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
[03:17] 🔍 Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
[03:54] 🌐 Aria: An Open Multimodal Native Mixture-of-Experts Model
[04:29] 🌐 Pixtral 12B
[05:09] 🎥 Pyramidal Flow Matching for Efficient Video Generative Modeling
[05:49] 🔗 Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
[06:29] 🎥 MM-Ego: Towards Building Egocentric Multimodal LLMs
[07:07] 🔄 One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
[07:51] 📖 Story-Adapter: A Training-free Iterative Framework for Long Story Visualization
[08:33] 🚀 Self-Boosting Large Language Models with Synthetic Preference Data
[09:13] 🚀 Falcon Mamba: The First Competitive Attention-free 7B Language Model
[09:53] 🎨 TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
[10:24] ⏳ Temporal Reasoning Transfer from Text to Video
[10:54] 🎥 TRACE: Temporal Grounding Video LLM via Causal Event Modeling
[11:30] 📊 Data Selection via Optimal Control for Language Models
[12:07] 🤖 Response Tuning: Aligning Large Language Models without Instruction
[12:49] 🤖 CursorCore: Assist Programming through Aligning Anything
[13:36] 🎥 ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler
[14:16] 🗣 Mixed-Session Conversation with Egocentric Memory
[14:57] 🎮 ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
[15:41] 🔓 AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
[16:26] 🎥 T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
[17:00] 📖 Collective Critics for Creative Story Generation
[17:36] 🎵 Diversity-Rewarded CFG Distillation
[18:16] 🧠 Retrieval-Augmented Decision Transformer: External Memory for In-context RL
[18:57] 🎙 F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
[19:32] 🎹 FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance
[20:20] 🧠 Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
[21:01] 🧬 Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
[21:38] 🎥 BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way
[22:21] 🚨 Multimodal Situational Safety
[22:56] 💥 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders
[23:38] 🛠 Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach
[24:18] 🌐 Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control
[24:55] 🤖 TinyEmo: Scaling down Emotional Reasoning via Metric Projection
[25:29] 🧠 MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
[26:08] 🎭 TextToon: Real-Time Text Toonify Head Avatar from Single Video
[26:49] 🤖 Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA
[27:28] 📊 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
[28:03] 🧠 Does Spatial Cognition Emerge in Frontier Models?

29 minutes
90
9 months ago

2024.10.09 Daily AI Papers | Evaluating long-context generation, instruction diversity affects generalization

The 9 papers in this episode:
[00:28] 📚 LongGenBench: Long-context Generation Benchmark
[01:11] 🌐 Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization
[01:50] 📊 RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
[02:35] 🌟 A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
[03:25] 🎥 Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
[04:00] 🎨 ControlAR: Controllable Image Generation with Autoregressive Models
[04:45] 🔍 Hyper-multi-step: The Truth Behind Difficult Long-context Tasks
[05:21] 🤖 MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
[06:03] 📊 EBES: Easy Benchmarking for Event Sequences

7 minutes
88
9 months ago

2024.10.08 Daily AI Papers | Differential Transformer optimizes attention, LLM hallucination study reveals error patterns.

The 21 papers in this episode:
[00:26] 🔍 Differential Transformer
[01:04] 🧠 LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
[01:50] 📹 VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide
[02:28] 📈 FAN: Fourier Analysis Networks
[03:05] 🏥 Named Clinical Entity Recognition Benchmark
[03:37] 🔬 ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
[04:19] 🎶 UniMuMo: Unified Text, Music and Motion Generation
[04:55] 🔍 TLDR: Token-Level Detective Reward Model for Large Vision Language Models
[05:35] 🎵 Presto! Distilling Steps and Layers for Accelerating Music Generation
[06:08] 🖥 Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
[06:49] 🖼 OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction
[07:29] 🌀 MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
[08:09] 🧠 LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
[08:50] 📊 MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
[09:39] 📊 GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
[10:34] 🤖 Autonomous Character-Scene Interaction Synthesis from Text Instruction
[11:12] 🧩 TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles
[12:00] 🤖 Grounding Language in Multi-Perspective Referential Communication
[12:48] 🎯 SePPO: Semi-Policy Preference Optimization for Diffusion Alignment
[13:25] 🧩 What Matters for Model Merging at Scale?
[14:02] 📊 SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification

15 minutes
99+
9 months ago

2024.10.07 Daily AI Papers | New energy-saving algorithm for efficient language models, vision-language model reasoning still needs improvement.

The 12 papers in this episode:
[00:25] ⚡ Addition is All You Need for Energy-efficient Language Models
[01:03] 🧠 NL-Eye: Abductive NLI for Images
[01:40] 🔍 Selective Attention Improves Transformer
[02:17] ⚡ Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
[02:48] 🤖 Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise
[03:27] 🩺 A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond
[04:12] 🎨 RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models
[04:59] 🧠 Erasing Conceptual Knowledge from Language Models
[05:37] 📈 MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction
[06:16] 🤖 CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction
[06:54] 🌳 NRGBoost: Energy-Based Generative Boosted Trees
[07:37] 🤖 GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs

8 minutes
93
9 months ago

2024.10.04 Daily AI Papers | Caption type affects model performance, a breakthrough in long video generation.

The 19 papers in this episode:
[00:24] 🔄 Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
[01:04] 🎥 Loong: Generating Minute-level Long Videos with Autoregressive Language Models
[01:39] 🎥 Video Instruction Tuning With Synthetic Data
[02:18] 🧐 LLaVA-Critic: Learning to Evaluate Multimodal Models
[02:56] 🔍 Contrastive Localized Language-Image Pre-Training
[03:31] 🌱 VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
[04:07] 🌟 Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
[04:51] 🔗 Large Language Models as Markov Chains
[05:26] 🧠 CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
[06:03] 🔄 Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
[06:51] 🔄 Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
[07:36] ⚡ SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
[08:14] 🌐 MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis
[08:54] 📚 L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?
[09:38] 🩺 MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
[10:24] 🎥 Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos
[11:01] 🗣 Distilling an End-to-End Voice Assistant Without Instruction Training Data
[11:46] ♟ Learning the Latent Rules of a Game from Data: A Chess Story
[12:29] 🎵 Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

13 minutes
92
9 months ago

2024.10.03 Daily AI Papers | Hierarchical debugging improves code accuracy, multimodal models improve image tasks.

The 20 papers in this episode:
[00:23] 🐞 From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
[01:08] 📄 LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks
[01:48] 📊 Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
[02:27] 🖼 ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
[03:08] 🧠 RATIONALYST: Pre-training Process-Supervision for Improving Reasoning
[03:45] 🧠 Not All LLM Reasoners Are Created Equal
[04:18] 📊 Quantifying Generalization Complexity for Large Language Models
[04:59] 🔍 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection
[05:45] 🔄 HelpSteer2-Preference: Complementing Ratings with Preferences
[06:25] 🗣 MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
[07:03] 🤖 Closed-loop Long-horizon Robotic Planning via Equilibrium Sequence Modeling
[07:40] 🌐 EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis
[08:22] 📄 FactAlign: Long-form Factuality Alignment of Large Language Models
[08:57] 📹 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
[09:37] 🌍 BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation
[10:13] 🔊 SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
[10:53] 🔄 HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration
[11:35] 🔍 Selective Aggregation for Low-Rank Adaptation in Federated Learning
[12:14] 📚 Old Optimizer, New Norm: An Anthology
[12:49] 📱 InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

13 minutes
79
9 months ago

2024.10.02 Daily AI Papers | Cross-capability task performance is limited, efficient model deployment on edge devices

The 13 papers in this episode:
[00:26] 🔗 Law of the Weakest Link: Cross Capabilities of Large Language Models
[01:05] 🌐 TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices
[01:46] 🌍 Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect
[02:22] 🎥 One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
[02:59] 🌐 Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation
[03:46] 🎨 Illustrious: an Open Advanced Illustration Model
[04:22] 🚗 SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs
[05:00] 📸 Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
[05:47] 🎨 ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
[06:22] 🎥 Visual Context Window Extension: A New Perspective for Long Video Understanding
[07:05] 🤖 Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models
[07:46] 🎥 DressRecon: Freeform 4D Human Reconstruction from Monocular Video
[08:32] 🤖 What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study

9 minutes
78
9 months ago

2024.10.01 Daily AI Papers | Multimodal models improve image understanding, length-control method makes generation more precise.

The 11 papers in this episode:
[00:26] 🌐 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
[01:04] 📏 Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
[01:41] 🗣 DiaSynth -- Synthetic Dialogue Generation Framework
[02:22] 📊 Hyper-Connections
[02:57] 🤖 UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
[03:35] 🔍 Cottention: Linear Transformers With Cosine Attention
[04:10] 🤖 Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers
[04:49] 🏋 Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code
[05:29] 🖼 Image Copy Detection for Diffusion Models
[06:09] 🧠 Can Models Learn Skill Composition from Examples?
[06:43] 🎧 IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding

7 minutes
74
10 months ago

2024.09.30 Daily AI Papers | Emu3 simplifies multimodal design, MIO improves video understanding performance.

The 9 papers in this episode:
[00:24] 🧠 Emu3: Next-Token Prediction is All You Need
[00:53] 🌐 MIO: A Foundation Model on Multimodal Tokens
[01:26] 🔍 VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
[02:21] 🎥 PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
[03:05] 🔄 Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
[03:46] 📄 MinerU: An Open-Source Solution for Precise Document Content Extraction
[04:24] 🤖 MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making
[05:01] 🤖 A Survey on the Honesty of Large Language Models
[05:45] 📊 LML: Language Model Learning a Dataset for Data-Augmented Prediction

6 minutes
61
10 months ago