The 12 papers in this episode are:
[00:27] 🌐 LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
[01:10] 🧩 MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
[01:49] 🎭 EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
[02:35] 🌸 Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
[03:15] ⚡ Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
[03:58] 🖼 Pixel-Space Post-Training of Latent Diffusion Models
[04:36] 🔍 Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling
[05:17] 🎭 Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
[05:55] 🧠 Instruction Following without Instruction Tuning
[06:30] 📊 The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends
[07:07] 🤖 Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction
[07:43] ⚽ Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study

【Follow Us】
You can also find us on the following platform for more beyond the podcast:
Xiaohongshu: AI速递
