The 43 papers covered in this episode:
[00:23] 🤖 GLEE: A Unified Framework and Benchmark for Language-based Economic Environments
[01:09] 👤 Personalized Visual Instruction Tuning
[01:48] 🌍 Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
[02:35] 🖼 IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
[03:17] 🔍 Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
[03:54] 🌐 Aria: An Open Multimodal Native Mixture-of-Experts Model
[04:29] 🌐 Pixtral 12B
[05:09] 🎥 Pyramidal Flow Matching for Efficient Video Generative Modeling
[05:49] 🔗 Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
[06:29] 🎥 MM-Ego: Towards Building Egocentric Multimodal LLMs
[07:07] 🔄 One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
[07:51] 📖 Story-Adapter: A Training-free Iterative Framework for Long Story Visualization
[08:33] 🚀 Self-Boosting Large Language Models with Synthetic Preference Data
[09:13] 🚀 Falcon Mamba: The First Competitive Attention-free 7B Language Model
[09:53] 🎨 TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
[10:24] ⏳ Temporal Reasoning Transfer from Text to Video
[10:54] 🎥 TRACE: Temporal Grounding Video LLM via Causal Event Modeling
[11:30] 📊 Data Selection via Optimal Control for Language Models
[12:07] 🤖 Response Tuning: Aligning Large Language Models without Instruction
[12:49] 🤖 CursorCore: Assist Programming through Aligning Anything
[13:36] 🎥 ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler
[14:16] 🗣 Mixed-Session Conversation with Egocentric Memory
[14:57] 🎮 ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
[15:41] 🔓 AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
[16:26] 🎥 T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
[17:00] 📖 Collective Critics for Creative Story Generation
[17:36] 🎵 Diversity-Rewarded CFG Distillation
[18:16] 🧠 Retrieval-Augmented Decision Transformer: External Memory for In-context RL
[18:57] 🎙 F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
[19:32] 🎹 FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance
[20:20] 🧠 Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
[21:01] 🧬 Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
[21:38] 🎥 BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way
[22:21] 🚨 Multimodal Situational Safety
[22:56] 💥 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders
[23:38] 🛠 Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach
[24:18] 🌐 Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control
[24:55] 🤖 TinyEmo: Scaling down Emotional Reasoning via Metric Projection
[25:29] 🧠 MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
[26:08] 🎭 TextToon: Real-Time Text Toonify Head Avatar from Single Video
[26:49] 🤖 Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA
[27:28] 📊 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
[28:03] 🧠 Does Spatial Cognition Emerge in Frontier Models?

[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递