Episode list: HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递) - EarsOnMe | Discover and listen to popular podcasts from Xiaoyuzhou (小宇宙)

[Weekend Special] Hottest AI Papers of Week 1, October | Emu3 performs strongly; the Law of the Weakest Link constrains LLMs.

The 5 papers in this episode:
[00:47] TOP1(🔥73) | 🧠 Emu3: Next-Token Prediction is All You Need
[02:42] TOP2(🔥48) | 🔗 Law of the Weakest Link: Cross Capabilities of Large Language Models
[04:26] TOP3(🔥45) | 🌐 MIO: A Foundation Model on Multimodal Tokens
[06:26] TOP4(🔥44) | 🌐 Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
[08:27] TOP5(🔥43) | 🧠 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu (RED): AI速递

10 min

94

6 months ago

2024.10.04 Daily AI Papers | Caption type affects model performance; a breakthrough in long video generation.

The 19 papers in this episode:
[00:24] 🔄 Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
[01:04] 🎥 Loong: Generating Minute-level Long Videos with Autoregressive Language Models
[01:39] 🎥 Video Instruction Tuning With Synthetic Data
[02:18] 🧐 LLaVA-Critic: Learning to Evaluate Multimodal Models
[02:56] 🔍 Contrastive Localized Language-Image Pre-Training
[03:31] 🌱 VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
[04:07] 🌟 Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
[04:51] 🔗 Large Language Models as Markov Chains
[05:26] 🧠 CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
[06:03] 🔄 Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
[06:51] 🔄 Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
[07:36] ⚡ SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
[08:14] 🌐 MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis
[08:54] 📚 L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?
[09:38] 🩺 MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
[10:24] 🎥 Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos
[11:01] 🗣 Distilling an End-to-End Voice Assistant Without Instruction Training Data
[11:46] ♟ Learning the Latent Rules of a Game from Data: A Chess Story
[12:29] 🎵 Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

13 min

92

6 months ago

2024.10.03 Daily AI Papers | Hierarchical debugging improves code correctness; a multimodal model for text-rich image tasks.

The 20 papers in this episode:
[00:23] 🐞 From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
[01:08] 📄 LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks
[01:48] 📊 Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
[02:27] 🖼 ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
[03:08] 🧠 RATIONALYST: Pre-training Process-Supervision for Improving Reasoning
[03:45] 🧠 Not All LLM Reasoners Are Created Equal
[04:18] 📊 Quantifying Generalization Complexity for Large Language Models
[04:59] 🔍 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection
[05:45] 🔄 HelpSteer2-Preference: Complementing Ratings with Preferences
[06:25] 🗣 MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
[07:03] 🤖 Closed-loop Long-horizon Robotic Planning via Equilibrium Sequence Modeling
[07:40] 🌐 EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis
[08:22] 📄 FactAlign: Long-form Factuality Alignment of Large Language Models
[08:57] 📹 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
[09:37] 🌍 BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation
[10:13] 🔊 SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
[10:53] 🔄 HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration
[11:35] 🔍 Selective Aggregation for Low-Rank Adaptation in Federated Learning
[12:14] 📚 Old Optimizer, New Norm: An Anthology
[12:49] 📱 InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

13 min

79

6 months ago

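2024.10.02 Daily AI Papers | Performance on cross-capability tasks is limited by the weakest capability; efficient model deployment on edge devices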

The 13 papers in this episode:
[00:26] 🔗 Law of the Weakest Link: Cross Capabilities of Large Language Models
[01:05] 🌐 TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices
[01:46] 🌍 Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect
[02:22] 🎥 One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
[02:59] 🌐 Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation
[03:46] 🎨 Illustrious: an Open Advanced Illustration Model
[04:22] 🚗 SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs
[05:00] 📸 Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
[05:47] 🎨 ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
[06:22] 🎥 Visual Context Window Extension: A New Perspective for Long Video Understanding
[07:05] 🤖 Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models
[07:46] 🎥 DressRecon: Freeform 4D Human Reconstruction from Monocular Video
[08:32] 🤖 What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study

9 min

78

6 months ago

2024.10.01 Daily AI Papers | Multimodal models improve image understanding; a length-control method makes generation more precise.

The 11 papers in this episode:
[00:26] 🌐 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
[01:04] 📏 Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
[01:41] 🗣 DiaSynth -- Synthetic Dialogue Generation Framework
[02:22] 📊 Hyper-Connections
[02:57] 🤖 UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
[03:35] 🔍 Cottention: Linear Transformers With Cosine Attention
[04:10] 🤖 Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers
[04:49] 🏋 Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code
[05:29] 🖼 Image Copy Detection for Diffusion Models
[06:09] 🧠 Can Models Learn Skill Composition from Examples?
[06:43] 🎧 IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding

7 min

74

6 months ago

2024.09.30 Daily AI Papers | Emu3 simplifies multimodal design; MIO improves video understanding.

The 9 papers in this episode:
[00:24] 🧠 Emu3: Next-Token Prediction is All You Need
[00:53] 🌐 MIO: A Foundation Model on Multimodal Tokens
[01:26] 🔍 VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
[02:21] 🎥 PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
[03:05] 🔄 Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
[03:46] 📄 MinerU: An Open-Source Solution for Precise Document Content Extraction
[04:24] 🤖 MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making
[05:01] 🤖 A Survey on the Honesty of Large Language Models
[05:45] 📊 LML: Language Model Learning a Dataset for Data-Augmented Prediction

6 min

61

6 months ago

[Weekend Special] Hottest AI Papers of Week 5, September | Open-weight multimodal models; tuning-free personalized image generation.

The 5 papers in this episode:
[00:42] TOP1(🔥70) | 🌐 Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
[02:56] TOP2(🔥64) | 🖼 Imagine yourself: Tuning-Free Personalized Image Generation
[05:08] TOP3(🔥48) | 🤖 Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
[07:15] TOP4(🔥44) | 😂 YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models
[09:07] TOP5(🔥39) | 🤖 RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning

11 min

82

6 months ago

2024.09.27 Daily AI Papers | Improved 3D awareness; reduced computational overhead.

The 12 papers in this episode:
[00:27] 🌐 LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
[01:10] 🧩 MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
[01:49] 🎭 EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
[02:35] 🌸 Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
[03:15] ⚡ Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
[03:58] 🖼 Pixel-Space Post-Training of Latent Diffusion Models
[04:36] 🔍 Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling
[05:17] 🎭 Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
[05:55] 🧠 Instruction Following without Instruction Tuning
[06:30] 📊 The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends
[07:07] 🤖 Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction
[07:43] ⚽ Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study

8 min

87

6 months ago

2024.09.26 Daily AI Papers | Improving pre-training data quality; open multimodal model innovation

The 10 papers in this episode:
[00:32] 🤖 Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
[01:13] 🌐 Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
[01:51] 🩺 Boosting Healthcare LLMs Through Retrieved Context
[02:31] 📊 AIM 2024 Sparse Neural Rendering Challenge: Dataset and Benchmark
[03:12] 🎸 Synchronize Dual Hands for Physics-Based Dexterous Guitar Playing
[03:52] 🎭 DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion
[04:28] 🚁 Game4Loc: A UAV Geo-Localization Benchmark from Game Data
[05:13] 🌐 Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors
[05:53] 🎥 TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans
[06:32] 🤖 HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

7 min

73

6 months ago

2024.09.25 Daily AI Papers | HelloBench evaluates long text generation; toward the future of universal omni-language models

The 17 papers in this episode:
[00:28] 📄 HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
[01:14] 🌐 Making Text Embedders Few-Shot Learners
[01:51] 🌐 OmniBench: Towards The Future of Universal Omni-Language Models
[02:29] 🔄 Present and Future Generalization of Synthetic Image Detectors
[03:08] 🎥 MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
[03:43] 🔄 MonoFormer: One Transformer for Both Diffusion and Autoregression
[04:16] 🌍 EuroLLM: Multilingual Language Models for Europe
[04:53] 🖼 MaskBit: Embedding-free Image Generation via Bit Tokens
[05:33] 👁 Seeing Faces in Things: A Model and Dataset for Pareidolia
[06:19] 🤖 Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation
[06:57] 🎨 Improvements to SDXL in NovelAI Diffusion V3
[07:41] 🔄 Reward-Robust RLHF in LLMs
[08:16] 🤖 DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
[08:54] 🇮🇹 SLIMER-IT: Zero-Shot NER on Italian Language
[09:33] 📈 Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
[10:17] 🛡 RRM: Robust Reward Model Training Mitigates Reward Hacking
[10:50] 📊 Tabular Data Generation using Binary Diffusion

11 min

73

6 months ago

2024.09.24 Daily AI Papers | Language guidance improves robot failure recovery; AI doctors show potential in medicine

The 14 papers in this episode:
[00:25] 🤖 RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning
[01:00] 🩺 A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
[01:34] 🧙 PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
[02:16] 👻 Phantom of Latent for Large Language and Vision Models
[02:55] 🩺 Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
[03:31] 🪞 Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections
[04:11] 🌟 MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors
[04:51] 🩺 An adapted large language model facilitates multiple medical tasks in diabetes care
[05:30] 🎭 MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting
[06:08] 🤖 Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking
[06:54] 🗣 Zero-shot Cross-lingual Voice Transfer for TTS
[07:28] 🌐 SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending
[08:07] 🎵 Self-Supervised Audio-Visual Soundscape Stylization
[08:42] 📊 A Case Study of Web App Coding with OpenAI Reasoning Models

9 min

84

7 months ago

2024.09.23 Daily AI Papers | Tuning-free personalized image generation; evaluating multimodal satire comprehension

The 11 papers in this episode:
[00:26] 🎨 Imagine yourself: Tuning-Free Personalized Image Generation
[01:02] 😂 YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models
[01:40] 🌍 Prithvi WxC: Foundation Model for Weather and Climate
[02:15] 🎵 MuCodec: Ultra Low-Bitrate Music Codec
[02:51] 🌈 Colorful Diffuse Intrinsic Image Decomposition in the Wild
[03:29] 🎥 Portrait Video Editing Empowered by Multimodal Generative Priors
[04:01] 🎥 Temporally Aligned Audio for Video with Autoregression
[04:38] 📱 V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
[05:21] 📚 Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
[05:57] 🛡 Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments
[06:34] 🎻 Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts

7 min

77

7 months ago