This episode covers the following 14 papers:
[00:23] 🎥 Apollo: An Exploration of Video Understanding in Large Multimodal Models
[01:11] 🌍 GenEx: Generating an Explorable World
[01:50] 🌐 SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
[02:37] 🩺 BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
[03:21] 🤖 Large Action Models: From Inception to Implementation
[04:09] 🎥 InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
[04:56] 🌟 FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
[05:42] 🎯 ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
[06:21] 🔥 FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
[07:09] 🎵 Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
[07:56] 🎨 FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
[08:44] 📊 SCBench: A KV Cache-Centric Analysis of Long-Context Methods
[09:27] 🧠 SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs
[10:05] 🩺 Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递
