Episode List: HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递) - EarsOnMe

2024.07.24 Daily AI Papers | Interpretable Medical Agents, Video Generation Benchmarks, Virtual Try-On

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 24, 2024, and we'll take you on a quick tour of today's 11 trending AI papers, covering cutting-edge areas such as interpretable medical agents, video generation benchmarks, and virtual try-on technology. Let's dive right in.
[00:27] 🔗 CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
[00:54] 🔍 KAN or MLP: A Fairer Comparison
[01:20] 🎥 T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
[02:00] 👕 OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person
[02:35] 🎬 MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
[03:08] 🤝 F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
[03:44] 🌐 INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
[04:24] 🎥 SIGMA: Sinkhorn-Guided Masked Video Modeling
[05:00] 🏁 A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data
[05:31] 🤖 Cross Anything: General Quadruped Robot Navigation through Complex Terrains
[06:00] 🛡 PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing
[Follow Us] You can also find us on the following platform for more information beyond the podcast: Xiaohongshu: AI速递

6 min

2024.07.23 Daily AI Papers | Large Language Models, Multimodal Processing, 3D World Generation

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 23, 2024, and we'll take you on a quick tour of today's 20 trending AI papers, covering cutting-edge areas such as large language models, multimodal processing, and 3D world generation. Let's dive right in.
[00:24] 📚 Knowledge Mechanisms in Large Language Models: A Survey and Perspective
[00:55] 🔍 NNsight and NDIF: Democratizing Access to Foundation Model Internals
[01:41] 📊 POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation
[02:15] 🎥 SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
[02:40] 📺 LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding
[03:14] 🎮 VideoGameBunny: Towards vision assistants for video games
[03:49] 🌐 BoostMVSNeRFs: Boosting MVS-based NeRFs to Generalizable View Synthesis in Large-scale Scenes
[04:29] 🌐 AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
[05:04] 🌐 HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions
[05:36] 📚 BOND: Aligning LLMs with Best-of-N Distillation
[06:10] 📊 MIBench: Evaluating Multimodal Large Language Models over Multiple Images
[06:41] 🎶 MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation
[07:19] 🔧 Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
[07:56] 🎭 Temporal Residual Jacobians For Rig-free Motion Transfer
[08:28] 📉 Consent in Crisis: The Rapid Decline of the AI Data Commons
[08:53] 🎨 Artist: Aesthetically Controllable Text-Driven Stylization without Training
[09:26] 🎥 Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
[09:56] 🎥 Local All-Pair Correspondence for Point Tracking
[10:24] 🔥 ThermalNeRF: Thermal Radiance Fields
[10:55] 🤖 GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization

11 min

2024.07.22 Daily AI Papers | Vision-Language Models, Long-Context LLM Inference, Text-to-3D Generation

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 22, 2024, and we'll take you on a quick tour of today's 15 trending AI papers, covering cutting-edge areas such as vision-language models, long-context LLM inference, and text-to-3D generation. Let's get started!
[00:25] 🧠 EVLM: An Efficient Vision-Language Model for Visual Understanding
[00:55] 📚 ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
[01:32] ⚡ LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
[02:05] 🤖 The Vision of Autonomic Computing: Can LLMs Make It a Reality?
[02:35] 🔊 Stable Audio Open
[03:07] 📄 VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
[03:39] 📄 Visual Text Generation in the Wild
[04:10] 🚀 Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
[04:44] 🔬 SciCode: A Research Coding Benchmark Curated by Scientists
[05:16] 🚀 Fast Matrix Multiplications for Lookup Table-Quantized LLMs
[05:51] 🌐 PlacidDreamer: Advancing Harmony in Text-to-3D Generation
[06:28] 🔄 Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle
[06:59] 🎵 Efficient Audio Captioning with Encoder-Level Knowledge Distillation
[07:27] 📚 Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition
[08:03] 🌐 SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization

[Weekend Special] Hottest AI Papers of the Third Week of July (2024.07.15~07.19)

Hello everyone, and welcome to the weekend special of the Hugging Face Daily AI Paper Digest. Every Sunday, we bring you summaries of the week's most popular AI papers on Hugging Face. This week covers July 15 to July 19, 2024. This episode features five selected papers: the Qwen2 technical report, the use of large language models for spreadsheet processing, a comprehensive study of ternary, quantized, and FP16 language models, a human-like episodic memory mechanism for infinite-context LLMs, and finally a red-teaming method for LLM agents. Let's get into the details.
[00:45] TOP1(🔥140) | 📊 Qwen2 Technical Report
[02:55] TOP2(🔥102) | 📊 SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
[04:50] TOP3(🔥59) | 📚 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
[06:36] TOP4(🔥48) | 🧠 Human-like Episodic Memory for Infinite Context LLMs
[08:22] TOP5(🔥42) | 🔍 AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

10 min

2024.07.19 Daily AI Papers | Scaling Laws for Large Language Models, Trustworthiness of Multimodal Models, Retrieval-Enhanced Machine Learning

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 19, 2024, and we'll take you on a quick tour of today's 14 trending AI papers, covering cutting-edge topics such as scaling laws for large language models, the trustworthiness of multimodal models, and retrieval-enhanced machine learning. Let's dive right in!
[00:28] 📚 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
[01:00] 📚 Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
[01:46] 🌆 Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion
[02:19] 📊 Understanding Reference Policies in Direct Preference Optimization
[02:50] 📊 Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
[03:23] 📏 Scaling Granite Code Models to 128K Context
[03:56] 📹 Shape of Motion: 4D Reconstruction from a Single Video
[04:26] 🔧 CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization
[04:53] 📚 Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation
[05:23] 🧠 BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
[05:54] 📊 PM-LLM-Benchmark: Evaluating Large Language Models on Process Mining Tasks
[06:35] 📊 Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation
[07:12] 📚 Retrieval-Enhanced Machine Learning: Synthesis and Opportunities
[07:48] 📄 A Comparative Study on Automatic Coding of Medical Letters with Explainability

8 min

2024.07.18 Daily AI Papers | Comprehensive Studies of Language Models, Multimodal Model Evaluation, Video Processing

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 18, 2024, and we'll take you on a quick tour of today's 13 trending AI papers, covering cutting-edge areas such as comprehensive studies of language models, multimodal model evaluation, and video processing techniques. Let's dive right in!
[00:25] 📚 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
[00:56] 🔍 AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
[01:36] 📊 LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
[02:12] 🌐 E5-V: Universal Embeddings with Multimodal Large Language Models
[02:43] 🔍 Patch-Level Training for Large Language Models
[03:17] 🤖 Case2Code: Learning Inductive Reasoning with Synthetic Data
[03:53] 👗 IMAGDressing-v1: Customizable Virtual Dressing
[04:31] 🎥 VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
[05:08] 🐠 Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
[05:48] 🎵 Audio Conditioning for Music Generation via Discrete Bottleneck Features
[06:23] 📷 Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections
[07:02] 🚫 The Art of Saying No: Contextual Noncompliance in Language Models
[07:41] 🚀 GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression

8 min

2024.07.17 Daily AI Papers | Reasoning in Large Language Models, Evaluation Tools for Multimodal Models, 3D Model Animation

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 17, 2024, and we'll take you on a quick tour of today's 18 trending AI papers, covering cutting-edge topics such as the reasoning abilities of large language models, evaluation tools for multimodal models, and 3D model animation. Let's dive right in!
[00:26] 📚 NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
[01:07] 🎥 Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
[01:41] 🎤 Qwen2-Audio Technical Report
[02:14] 🤖 Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning
[02:50] 📈 Scaling Diffusion Transformers to 16 Billion Parameters
[03:24] 🌐 DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
[03:59] 📊 VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
[04:37] ⚡ Efficient Training with Denoised Neural Weights
[05:16] 🎥 Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
[05:50] 📊 From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
[06:29] 📚 YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus
[07:05] 📊 Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
[07:44] 🔄 FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
[08:27] 🌐 OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
[09:06] 🔬 Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development
[09:36] 🔍 Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
[10:12] 🤖 Grasping Diverse Objects with Simulated Humanoids
[10:42] 🔍 Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models

11 min

2024.07.16 Daily AI Papers | Privacy Risks in Large Language Models, Innovations in Video Processing

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 16, 2024, and we'll take you on a quick tour of today's 13 trending AI papers. This episode spans cutting-edge areas ranging from privacy risks in large language models to innovations in video processing and the testing of multilingual models. Let's dive right in!
[00:26] 📊 Qwen2 Technical Report
[01:10] 🔒 Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
[01:50] 📊 The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
[02:34] 🔍 Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
[03:09] 🤖 GRUtopia: Dream General Robots in a City at Scale
[03:46] 🎥 Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
[04:22] 🤖 Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion
[04:55] 🔄 SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
[05:34] 📹 Video Occupancy Models
[06:11] 🎥 Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
[06:51] 🌟 DataDream: Few-shot Guided Dataset Generation
[07:29] 📚 MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models
[08:09] 🔬 LAB-Bench: Measuring Capabilities of Language Models for Biology Research

2024.07.15 Daily AI Papers | Applications of Large Language Models, Model Update Strategies, Multimodal Question-Answering Datasets

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 15, 2024, and we'll take you on a quick tour of today's 14 trending AI papers, covering cutting-edge topics such as applications of large language models, model update strategies, and multimodal question-answering datasets. Let's dive right in!
[00:26] 📊 SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
[01:10] 🧠 Human-like Episodic Memory for Infinite Context LLMs
[01:45] 🔄 MUSCLE: A Model Update Strategy for Compatible LLM Evolution
[02:20] 📱 H2O-Danube3 Technical Report
[02:56] 🎲 GAVEL: Generating Games Via Evolution and Language Models
[03:41] 🎨 Transformer Layers as Painters
[04:10] 📊 New Desiderata for Direct Preference Optimization
[04:47] 📚 Characterizing Prompt Compression Methods for Long Context Inference
[05:21] 📊 Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
[05:53] 🎨 StyleSplat: 3D Object Style Transfer with Gaussian Splatting
[06:28] 🎥 TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
[07:10] 🛡 Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
[07:44] 🔧 Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
[08:20] 📚 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

2024.07.12 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to bring you up to speed on the day's trending AI papers on HuggingFace. Today's 15 papers are:
📊 Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On
📊 MAVIS: Mathematical Visual Instruction Tuning
📹 Video Diffusion Alignment via Reward Gradients
🔍 MambaVision: A Hybrid Mamba-Transformer Vision Backbone
📊 GTA: A Benchmark for General Tool Agents
📊 The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
🌐 DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
🎥 Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
🌲 Gradient Boosting Reinforcement Learning
📉 Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
📖 SEED-Story: Multimodal Long Story Generation with Large Language Model
📹 Generalizable Implicit Motion Modeling for Video Frame Interpolation
📊 OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
🎤 Autoregressive Speech Synthesis without Vector Quantization
🌍 WildGaussians: 3D Gaussian Splatting in the Wild

10 min

2024.07.11 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to bring you up to speed on the day's trending AI papers on HuggingFace. Today's 14 papers are:
🌐 PaliGemma: A versatile 3B VLM for transfer
🌐 LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
🚀 Inference Performance Optimization for Large Language Models on CPUs
🌐 Controlling Space and Time with Diffusion Models
🎥🔊 Video-to-Audio Generation with Hidden Alignment
🎥 VEnhancer: Generative Space-Time Enhancement for Video Generation
📊 On Leakage of Code Generation Evaluation Datasets
🔍 Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
🤖 This&That: Language-Gesture Controlled Video Generation for Robot Planning
🌌 CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging
🎥 Still-Moving: Customized Video Generation without Customized Video Data
📊 An accurate detection is not all you need to combat label noise in web-noisy datasets
🤖 BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark
👥 CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation