Episode list: HuggingFace Daily AI Paper Digest (每日AI论文速递) - EarsOnMe | Discover and listen to popular podcasts from Xiaoyuzhou

2024.07.15 Daily AI Papers | Applications of large language models, model update strategies, multimodal question-answering datasets

Hello everyone, and welcome to the "Hugging Face Daily AI Paper Digest". Today is July 15, 2024, and we will take you on a quick tour of today's 14 trending AI papers, covering applications of large language models, model update strategies, multimodal question-answering datasets, and other frontier topics. Let's dive in!

[00:26] 📊 SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
[01:10] 🧠 Human-like Episodic Memory for Infinite Context LLMs
[01:45] 🔄 MUSCLE: A Model Update Strategy for Compatible LLM Evolution
[02:20] 📱 H2O-Danube3 Technical Report
[02:56] 🎲 GAVEL: Generating Games Via Evolution and Language Models
[03:41] 🎨 Transformer Layers as Painters
[04:10] 📊 New Desiderata for Direct Preference Optimization
[04:47] 📚 Characterizing Prompt Compression Methods for Long Context Inference
[05:21] 📊 Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
[05:53] 🎨 StyleSplat: 3D Object Style Transfer with Gaussian Splatting
[06:28] 🎥 TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
[07:10] 🛡 Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
[07:44] 🔧 Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
[08:20] 📚 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

Follow us: you can also find us on the following platform for more information beyond the podcast.
Xiaohongshu: AI速递

9 minutes

60

9 months ago

2024.07.12 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Today's 15 papers are:

📊 Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On
📊 MAVIS: Mathematical Visual Instruction Tuning
📹 Video Diffusion Alignment via Reward Gradients
🔍 MambaVision: A Hybrid Mamba-Transformer Vision Backbone
📊 GTA: A Benchmark for General Tool Agents
📊 The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
🌐 DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
🎥 Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
🌲 Gradient Boosting Reinforcement Learning
📉 Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
📖 SEED-Story: Multimodal Long Story Generation with Large Language Model
📹 Generalizable Implicit Motion Modeling for Video Frame Interpolation
📊 OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
🎤 Autoregressive Speech Synthesis without Vector Quantization
🌍 WildGaussians: 3D Gaussian Splatting in the Wild

Follow us for more information.
Xiaohongshu: AI速递

10 minutes

99

9 months ago

2024.07.11 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Today's 14 papers are:

🌐 PaliGemma: A versatile 3B VLM for transfer
🌐 LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
🚀 Inference Performance Optimization for Large Language Models on CPUs
🌐 Controlling Space and Time with Diffusion Models
🎥🔊 Video-to-Audio Generation with Hidden Alignment
🎥 VEnhancer: Generative Space-Time Enhancement for Video Generation
📊 On Leakage of Code Generation Evaluation Datasets
🔍 Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
🤖 This&That: Language-Gesture Controlled Video Generation for Robot Planning
🌌 CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging
🎥 Still-Moving: Customized Video Generation without Customized Video Data
📊 An accurate detection is not all you need to combat label noise in web-noisy datasets
🤖 BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark
👥 CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

Follow us for more information.
Xiaohongshu: AI速递

9 minutes

19

9 months ago

2024.07.10 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Today's 16 papers are:

👓 Vision language models are blind
📹 Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
🌐 Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
👤 RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
📚 AgentInstruct: Toward Generative Teaching with Agentic Flows
📚 Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities
📹 MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
🌐 Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
🔍 Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
📚 Knowledge Composition using Task Vectors with Learned Anisotropic Scaling
📚 TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
⚡ BM25S: Orders of magnitude faster lexical search via eager sparse scoring
🎥 VIMI: Grounding Video Generation through Multi-modal Instruction
🔄 From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
📚 How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions
📈 LETS-C: Leveraging Language Embedding for Time Series Classification

11 minutes

15

9 months ago

2024.07.09 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Today's 17 papers are:

📊 MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
🌐 LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
🎥 Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
📚 Associative Recurrent Memory Transformer
🌐 ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
📚 Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction
🎥 Compositional Video Generation as Flow Equalization
📊 PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
🚀 InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
🛠️ Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images
🖼️ UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
📚 Training Task Experts through Retrieval Based Distillation
👁️‍🗨️ Multi-Object Hallucination in Vision-Language Models
🔍 Understanding Visual Feature Reliance through the Lens of Complexity
🎨 PartCraft: Crafting Creative Objects by Parts
📚 LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking
🔍 ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

11 minutes

17

9 months ago

2024.07.08 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Today's 15 papers are:

🌐 Unveiling Encoder-Free Vision-Language Models
🗣️ FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
🧠 AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
📚 Learning to (Learn at Test Time): RNNs with Expressive Hidden States
📊 ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
📈 RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
🗣️ Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
🧠 DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
🛡️ Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
📊 On scalable oversight with weak LLMs judging strong LLMs
🎥 Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
📊 HEMM: Holistic Evaluation of Multimodal Foundation Models
🤝 LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
📷 CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images
🔍 Granular Privacy Control for Geolocation with Vision Language Models

10 minutes

30

9 months ago

2024.07.05 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Today's 3 papers are:

🔄 Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
🔍 Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
📊 Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

2 minutes

99+

9 months ago

