HuggingFace Daily AI Paper Digest - 2024.08.23 Daily AI Papers | Large Language Models Improve Text Generation Quality, Sapiens Models Improve Vision Task Performance - EarsOnMe

Duration: 12 minutes

Plays: 105

Published: 1 year ago

Description

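Hello everyone, and welcome to "Hugging Face Daily AI Paper Digest". Today is August 23, 2024, and we will take you on a quick tour of today's 19 trending AI papers, covering controllable text generation for large language models, multimodal understanding and generation, high-fidelity text-to-video synthesis, and other frontier areas. Now, let's dive straight into the papers.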

[00:27] 📚 Controllable Text Generation for Large Language Models: A Survey

[01:00] 🧠 Sapiens: Foundation for Human Vision Models

[01:36] 🌐 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

[02:12] 🎥 xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

[02:45] 🎥 DreamCinema: Cinematic Transfer with Free Camera and 3D Character

[03:19] 🖼 Scalable Autoregressive Image Generation with Mamba

[03:54] 🤖 Hermes 3 Technical Report

[04:33] 🚀 Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

[05:10] 🎥 Real-Time Video Generation with Pyramid Attention Broadcast

[05:50] 🌲 Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

[06:30] 🌉 SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs

[07:14] 💼 Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

[07:49] 📷 SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models

[08:26] 🇻🇳 Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese

[08:56] 🎥 Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound

[09:24] 🎥 Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

[10:05] 🧐 ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

[10:46] 🌟 Subsurface Scattering for 3D Gaussian Splatting

[11:20] 🇷🇺 The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design

[Follow Us]

You can also find us on the following platform for more than what the podcast covers.

Xiaohongshu: AI速递
