Episode list: HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest) - EarsOnMe | Discover and listen to popular podcasts from Xiaoyuzhou

2024.07.19 Daily AI Papers | Scaling laws of large language models, trustworthiness of multimodal models, retrieval-enhanced machine learning

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 19, 2024. We'll take you on a quick tour of today's 14 trending AI papers, covering frontier topics such as scaling laws for large language models, trustworthiness of multimodal models, and retrieval-enhanced machine learning. Now, let's dive right into the papers!
[00:28] 📚 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
[01:00] 📚 Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
[01:46] 🌆 Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion
[02:19] 📊 Understanding Reference Policies in Direct Preference Optimization
[02:50] 📊 Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
[03:23] 📏 Scaling Granite Code Models to 128K Context
[03:56] 📹 Shape of Motion: 4D Reconstruction from a Single Video
[04:26] 🔧 CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization
[04:53] 📚 Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation
[05:23] 🧠 BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
[05:54] 📊 PM-LLM-Benchmark: Evaluating Large Language Models on Process Mining Tasks
[06:35] 📊 Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation
[07:12] 📚 Retrieval-Enhanced Machine Learning: Synthesis and Opportunities
[07:48] 📄 A Comparative Study on Automatic Coding of Medical Letters with Explainability
[Follow us] You can also find us on the following platform for more content beyond the podcast: Xiaohongshu: AI速递

8 min

87

10 months ago

2024.07.18 Daily AI Papers | Comprehensive studies of language models, multimodal model evaluation, and video processing techniques

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 18, 2024. We'll take you on a quick tour of today's 13 trending AI papers, spanning frontier areas such as comprehensive studies of language models, multimodal model evaluation, and video processing techniques. Now, let's dive right into the papers!
[00:25] 📚 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
[00:56] 🔍 AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
[01:36] 📊 LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
[02:12] 🌐 E5-V: Universal Embeddings with Multimodal Large Language Models
[02:43] 🔍 Patch-Level Training for Large Language Models
[03:17] 🤖 Case2Code: Learning Inductive Reasoning with Synthetic Data
[03:53] 👗 IMAGDressing-v1: Customizable Virtual Dressing
[04:31] 🎥 VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
[05:08] 🐠 Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
[05:48] 🎵 Audio Conditioning for Music Generation via Discrete Bottleneck Features
[06:23] 📷 Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections
[07:02] 🚫 The Art of Saying No: Contextual Noncompliance in Language Models
[07:41] 🚀 GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression
[Follow us] You can also find us on the following platform for more content beyond the podcast: Xiaohongshu: AI速递

8 min

63

10 months ago

2024.07.17 Daily AI Papers | Reasoning capabilities of large language models, evaluation tools for multimodal models, 3D model animation

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 17, 2024. We'll take you on a quick tour of today's 18 trending AI papers, covering frontier topics such as the reasoning capabilities of large language models, evaluation tools for multimodal models, and 3D model animation. Now, let's dive right into the papers!
[00:26] 📚 NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
[01:07] 🎥 Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
[01:41] 🎤 Qwen2-Audio Technical Report
[02:14] 🤖 Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning
[02:50] 📈 Scaling Diffusion Transformers to 16 Billion Parameters
[03:24] 🌐 DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
[03:59] 📊 VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
[04:37] ⚡ Efficient Training with Denoised Neural Weights
[05:16] 🎥 Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
[05:50] 📊 From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
[06:29] 📚 YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus
[07:05] 📊 Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
[07:44] 🔄 FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
[08:27] 🌐 OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
[09:06] 🔬 Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development
[09:36] 🔍 Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
[10:12] 🤖 Grasping Diverse Objects with Simulated Humanoids
[10:42] 🔍 Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models
[Follow us] You can also find us on the following platform for more content beyond the podcast: Xiaohongshu: AI速递

11 min

50

10 months ago

2024.07.16 Daily AI Papers | Privacy risks of large language models, innovations in video processing

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 16, 2024. We'll take you on a quick tour of today's 13 trending AI papers. This episode ranges from privacy risks in large language models to innovations in video processing, along with testing of multilingual models and other frontier areas. Now, let's dive right into the papers!
[00:26] 📊 Qwen2 Technical Report
[01:10] 🔒 Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
[01:50] 📊 The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
[02:34] 🔍 Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
[03:09] 🤖 GRUtopia: Dream General Robots in a City at Scale
[03:46] 🎥 Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
[04:22] 🤖 Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion
[04:55] 🔄 SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
[05:34] 📹 Video Occupancy Models
[06:11] 🎥 Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
[06:51] 🌟 DataDream: Few-shot Guided Dataset Generation
[07:29] 📚 MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models
[08:09] 🔬 LAB-Bench: Measuring Capabilities of Language Models for Biology Research
[Follow us] You can also find us on the following platform for more content beyond the podcast: Xiaohongshu: AI速递

9 min

69

10 months ago

2024.07.15 Daily AI Papers | Applications of large language models, model update strategies, multimodal question-answering datasets

Hello everyone, and welcome to the Hugging Face Daily AI Paper Digest. Today is July 15, 2024. We'll take you on a quick tour of today's 14 trending AI papers, covering frontier topics such as applications of large language models, model update strategies, and multimodal question-answering datasets. Now, let's dive right into the papers!
[00:26] 📊 SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
[01:10] 🧠 Human-like Episodic Memory for Infinite Context LLMs
[01:45] 🔄 MUSCLE: A Model Update Strategy for Compatible LLM Evolution
[02:20] 📱 H2O-Danube3 Technical Report
[02:56] 🎲 GAVEL: Generating Games Via Evolution and Language Models
[03:41] 🎨 Transformer Layers as Painters
[04:10] 📊 New Desiderata for Direct Preference Optimization
[04:47] 📚 Characterizing Prompt Compression Methods for Long Context Inference
[05:21] 📊 Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
[05:53] 🎨 StyleSplat: 3D Object Style Transfer with Gaussian Splatting
[06:28] 🎥 TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
[07:10] 🛡 Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
[07:44] 🔧 Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
[08:20] 📚 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
[Follow us] You can also find us on the following platform for more content beyond the podcast: Xiaohongshu: AI速递

9 min

60

10 months ago

2024.07.12 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to quickly catch up on the day's trending AI papers on Hugging Face. Today's 15 papers are:
📊 Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On
📊 MAVIS: Mathematical Visual Instruction Tuning
📹 Video Diffusion Alignment via Reward Gradients
🔍 MambaVision: A Hybrid Mamba-Transformer Vision Backbone
📊 GTA: A Benchmark for General Tool Agents
📊 The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
🌐 DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
🎥 Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
🌲 Gradient Boosting Reinforcement Learning
📉 Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
📖 SEED-Story: Multimodal Long Story Generation with Large Language Model
📹 Generalizable Implicit Motion Modeling for Video Frame Interpolation
📊 OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
🎤 Autoregressive Speech Synthesis without Vector Quantization
🌍 WildGaussians: 3D Gaussian Splatting in the Wild
[Follow us for more] Xiaohongshu: AI速递

10 min

99

10 months ago

2024.07.11 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to quickly catch up on the day's trending AI papers on Hugging Face. Today's 14 papers are:
🌐 PaliGemma: A versatile 3B VLM for transfer
🌐 LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
🚀 Inference Performance Optimization for Large Language Models on CPUs
🌐 Controlling Space and Time with Diffusion Models
🎥🔊 Video-to-Audio Generation with Hidden Alignment
🎥 VEnhancer: Generative Space-Time Enhancement for Video Generation
📊 On Leakage of Code Generation Evaluation Datasets
🔍 Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
🤖 This&That: Language-Gesture Controlled Video Generation for Robot Planning
🌌 CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging
🎥 Still-Moving: Customized Video Generation without Customized Video Data
📊 An accurate detection is not all you need to combat label noise in web-noisy datasets
🤖 BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark
👥 CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation
[Follow us for more] Xiaohongshu: AI速递

9 min

19

10 months ago

2024.07.10 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to quickly catch up on the day's trending AI papers on Hugging Face. Today's 16 papers are:
👓 Vision language models are blind
📹 Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
🌐 Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
👤 RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
📚 AgentInstruct: Toward Generative Teaching with Agentic Flows
📚 Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities
📹 MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
🌐 Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
🔍 Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
📚 Knowledge Composition using Task Vectors with Learned Anisotropic Scaling
📚 TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
⚡ BM25S: Orders of magnitude faster lexical search via eager sparse scoring
🎥 VIMI: Grounding Video Generation through Multi-modal Instruction
🔄 From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
📚 How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions
📈 LETS-C: Leveraging Language Embedding for Time Series Classification

11 min

15

10 months ago

2024.07.09 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to quickly catch up on the day's trending AI papers on Hugging Face. Today's 17 papers are:
📊 MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
🌐 LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
🎥 Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
📚 Associative Recurrent Memory Transformer
🌐 ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
📚 Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction
🎥 Compositional Video Generation as Flow Equalization
📊 PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
🚀 InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
🛠️ Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images
🖼️ UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
📚 Training Task Experts through Retrieval Based Distillation
👁️‍🗨️ Multi-Object Hallucination in Vision-Language Models
🔍 Understanding Visual Feature Reliance through the Lens of Complexity
🎨 PartCraft: Crafting Creative Objects by Parts
📚 LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking
🔍 ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

11 min

17

10 months ago

2024.07.08 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to quickly catch up on the day's trending AI papers on Hugging Face. Today's 15 papers are:
🌐 Unveiling Encoder-Free Vision-Language Models
🗣️ FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
🧠 AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
📚 Learning to (Learn at Test Time): RNNs with Expressive Hidden States
📊 ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
📈 RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
🗣️ Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
🧠 DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
🛡️ Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
📊 On scalable oversight with weak LLMs judging strong LLMs
🎥 Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
📊 HEMM: Holistic Evaluation of Multimodal Foundation Models
🤝 LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
📷 CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images
🔍 Granular Privacy Control for Geolocation with Vision Language Models

10 min

30

10 months ago

2024.07.05 Daily AI Papers

Hugging Face Daily AI Paper Digest: 10 minutes a day to quickly catch up on the day's trending AI papers on Hugging Face. Today's 3 papers are:
🔄 Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
🔍 Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
📊 Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

2 min

99+

10 months ago
