HuggingFace Daily AI Paper Digest - 2024.12.16 Daily AI Papers | A New Breakthrough in Video Understanding, AI Explores 3D Environments - EarsOnMe

Duration: 11 minutes

Published: 8 months ago

Host: 拨号上网

Description

The 14 papers covered in this episode:

[00:23] 🎥 Apollo: An Exploration of Video Understanding in Large Multimodal Models

[01:11] 🌍 GenEx: Generating an Explorable World

[01:50] 🌐 SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

[02:37] 🩺 BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

[03:21] 🤖 Large Action Models: From Inception to Implementation

[04:09] 🎥 InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

[04:56] 🌟 FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

[05:42] 🎯 ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation

[06:21] 🔥 FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing

[07:09] 🎵 Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

[07:56] 🎨 FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers

[08:44] 📊 SCBench: A KV Cache-Centric Analysis of Long-Context Methods

[09:27] 🧠 SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs

[10:05] 🩺 Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images

[Follow Us]

You can also find us on the following platform for more beyond the podcast:

Xiaohongshu: AI速递
