Episode list: HuggingFace Daily AI Paper Express - EarsOnMe | Discover and listen to popular podcasts from Xiaoyuzhou

2024.11.15 Daily AI Papers | Efficient image editing, 3D mesh generation

The 7 papers in this episode:
[00:27] ✨ MagicQuill: An Intelligent Interactive Image Editing System
[01:15] 🌐 LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
[01:50] 💾 Cut Your Losses in Large-Vocabulary Language Models
[02:22] 🏥 ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?
[03:02] 🤖 Hermes: A Large Language Model Framework on the Journey to Autonomous Networks
[03:36] 🎥 Sharingan: Extract User Action Sequence from Desktop Recordings
[04:21] 🤔 Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples
[Follow us] You can also find us on the following platform for more information beyond the podcast: Xiaohongshu: AI速递

5 min

85

5 months ago

2024.11.14 Daily AI Papers | LLMs show significant self-improvement, EgoVid-5M dataset breaks new ground.

The 7 papers in this episode:
[00:26] 🤖 Large Language Models Can Self-Improve in Long-context Reasoning
[01:09] 🎥 EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
[01:58] 🔍 Direct Preference Optimization Using Sparse Feature-Level Constraints
[02:37] 🇫🇷 CamemBERT 2.0: A Smarter French Language Model Aged to Perfection
[03:18] 🧠 Can sparse autoencoders be used to decompose and interpret steering vectors?
[03:58] 🎵 PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation
[04:39] 🎥 Motion Control for Enhanced Complex Action Video Generation

5 min

99+

5 months ago

2024.11.13 Daily AI Papers | A new framework for 3D object segmentation, a multimodal understanding and generation model

The 6 papers in this episode:
[00:28] 🔍 SAMPart3D: Segment Any Part in 3D Objects
[01:06] 🌐 JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
[01:42] 🤔 Stronger Models are NOT Stronger Teachers for Instruction Tuning
[02:21] 🌐 Wavelet Latent Diffusion (WaLa): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings
[03:02] 📚 BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions
[03:55] 🔍 Hardware and Software Platform Inference

4 min

99+

5 months ago

2024.11.12 Daily AI Papers | Seamless object insertion, generalist editing models improve accuracy

The 14 papers in this episode:
[00:23] 🖼 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
[01:05] 🎨 OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
[01:49] 📚 Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
[02:27] 📚 M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
[03:04] 🖼 Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
[03:42] 🧠 IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
[04:33] 🦎 GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
[05:11] 🌐 Watermark Anything with Localized Messages
[05:50] 🧠 Counterfactual Generation from Language Models
[06:22] 🤖 KMM: Key Frame Mask Mamba for Extended Motion Generation
[06:56] 🎲 Game-theoretic LLM: Agent Workflow for Negotiation Games
[07:35] 📊 Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
[08:15] 🧠 NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
[08:54] 🧠 Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction

10 min

95

5 months ago

2024.11.11 Daily AI Papers | Higher training throughput, lower memory usage

The 6 papers in this episode:
[00:30] ⚖ Balancing Pipeline Parallelism with Vocabulary Parallelism
[01:15] 🎮 StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
[01:56] 🔄 DELIFT: Data Efficient Language model Instruction Fine Tuning
[02:29] 🧪 Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
[03:06] 🧠 LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation
[03:46] 💻 Improving the detection of technical debt in Java source code with an enriched dataset

4 min

99+

5 months ago

[Weekend Special] Hottest AI papers of the 2nd week of November | OpenCoder accelerates code AI research, ReCapture improves video generation quality.

The 5 papers in this episode:
[00:38] TOP1(🔥73) | 🔧 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
[02:40] TOP2(🔥53) | 🎥 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
[04:22] TOP3(🔥52) | 📄 HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
[06:44] TOP4(🔥47) | ⚡ BitNet a4.8: 4-bit Activations for 1-bit LLMs
[08:25] TOP5(🔥45) | 🤖 AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

10 min

99+

5 months ago

2024.11.08 Daily AI Papers | OpenCoder improves code generation, ReCapture optimizes video trajectories

The 14 papers in this episode:
[00:25] 🔧 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
[01:03] 🎥 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
[01:46] ⚡ BitNet a4.8: 4-bit Activations for 1-bit LLMs
[02:25] 🎥 DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
[03:04] 🤖 Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
[03:39] 🧠 Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model
[04:21] 🎥 TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
[05:05] 🤖 DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
[05:40] 🧵 Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
[06:22] 👀 GazeGen: Gaze-Driven User Interaction for Visual Content Generation
[07:03] 🌐 RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval
[07:49] 🎥 SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
[08:29] 🎥 VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
[09:03] ⚡ SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

10 min

97

5 months ago

2024.11.07 Daily AI Papers | Data contamination affects model evaluation, structured reasoning boosts LLM performance

The 4 papers in this episode:
[00:28] 🔍 Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
[01:07] 🤖 Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
[01:53] 🧠 Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
[02:28] 🔄 Self-Consistency Preference Optimization

3 min

82

5 months ago

2024.11.06 Daily AI Papers | HTML improves RAG performance, a molecular graph assistant optimizes multimodal tasks

The 11 papers in this episode:
[00:30] 📄 HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
[01:12] 🧬 LLaMo: Large Language Model-based Molecular Graph Assistant
[01:52] 🤖 DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
[02:28] 🤖 Sample-Efficient Alignment for LLMs
[03:01] 🚦 Controlling Language and Diffusion Models by Transporting Activations
[03:49] 🌟 DreamPolish: Domain Score Distillation With Progressive Geometry Generation
[04:32] 🦓 Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
[05:12] 👕 GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
[05:46] 🔍 Correlation of Object Detection Performance with Visual Saliency and Depth Estimation
[06:28] 🔄 Adaptive Length Image Tokenization via Recurrent Allocation
[07:01] 🧠 Inference Optimal VLMs Need Only One Visual Token but Larger Models

8 min

99

5 months ago

2024.11.05 Daily AI Papers | AndroidLab improves agent performance, WebRL optimizes web task performance.

The 17 papers in this episode:
[00:26] 🤖 AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
[01:15] 🌐 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
[01:55] 🌐 Training-free Regional Prompting for Diffusion Transformers
[02:36] 🌍 Survey of Cultural Awareness in Language Models: Text and Beyond
[03:15] 🤖 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
[03:52] 📊 DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
[04:29] 🎥 How Far is Video Generation from World Model: A Physical Law Perspective
[05:08] ⚡ Adaptive Caching for Faster Video Generation with Diffusion Transformers
[05:48] 🦖 DynaSaur: Large Language Agents Beyond Predefined Actions
[06:26] 🎥 GenXD: Generating Any 3D and 4D Scenes
[07:01] 📊 Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
[07:45] 📚 LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
[08:26] 🎥 PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
[09:08] ⚖ "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
[09:48] 🌌 Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
[10:36] 🎨 MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
[11:14] 🌍 Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks

12 min

92

5 months ago

2024.11.04 Daily AI Papers | OS-ATLAS improves GUI agent performance, CAF optimizes generative model efficiency.

The 17 papers in this episode:
[00:25] 🤖 OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
[01:07] ⚙ Constant Acceleration Flow
[01:53] 🍅 TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
[02:33] 🎨 Randomized Autoregressive Visual Generation
[03:10] 🧠 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
[03:50] 📚 Personalization of Large Language Models: A Survey
[04:29] 🖼 In-Context LoRA for Diffusion Transformers
[05:09] ⚡ SambaMixer: State of Health Prediction of Li-ion Batteries using Mamba State Space Models
[05:54] 🤖 Survey of User Interface Design and Interaction Techniques in Generative AI Applications
[06:32] 🧶 HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
[07:07] 🌐 M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
[07:44] 🌆 CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
[08:22] 🔄 GPT or BERT: why not both?
[09:02] 🎭 Face Anonymization Made Simple
[09:40] 📊 Zipfian Whitening
[10:19] 📚 WikiNER-fr-gold: A Gold-Standard NER Corpus
[10:53] 🧠 GRS-QA -- Graph Reasoning-Structured Question Answering Dataset

11 min

87

5 months ago

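[Weekend Special] Hottest AI papers of the 1st week of November | CLEAR, a new benchmark for multimodal unlearning; the GPT-4o System Card explained.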

The 5 papers in this episode:
[00:41] TOP1(🔥191) | 🧠 CLEAR: Character Unlearning in Textual and Visual Modalities
[02:58] TOP2(🔥70) | 🤖 GPT-4o System Card
[04:50] TOP3(🔥50) | 🔍 Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
[06:53] TOP4(🔥49) | 🗣 CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
[08:44] TOP5(🔥48) | 🚀 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

11 min

99+

5 months ago