2024.11.12 Daily AI Papers | Seamless object insertion, generalist editing models improve precision

This episode covers the following 14 papers:
[00:23] 🖼 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
[01:05] 🎨 OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
[01:49] 📚 Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
[02:27] 📚 M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
[03:04] 🖼 Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
[03:42] 🧠 IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
[04:33] 🦎 GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
[05:11] 🌐 Watermark Anything with Localized Messages
[05:50] 🧠 Counterfactual Generation from Language Models
[06:22] 🤖 KMM: Key Frame Mask Mamba for Extended Motion Generation
[06:56] 🎲 Game-theoretic LLM: Agent Workflow for Negotiation Games
[07:35] 📊 Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
[08:15] 🧠 NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
[08:54] 🧠 Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction

Follow us: you can also find us on the following platforms for more than what the podcast covers.
Xiaohongshu: AI速递

10 min
95
5 months ago

2024.11.08 Daily AI Papers | OpenCoder improves code generation, ReCapture optimizes video trajectories

This episode covers the following 14 papers:
[00:25] 🔧 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
[01:03] 🎥 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
[01:46] ⚡ BitNet a4.8: 4-bit Activations for 1-bit LLMs
[02:25] 🎥 DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
[03:04] 🤖 Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
[03:39] 🧠 Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model
[04:21] 🎥 TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
[05:05] 🤖 DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
[05:40] 🧵 Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
[06:22] 👀 GazeGen: Gaze-Driven User Interaction for Visual Content Generation
[07:03] 🌐 RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval
[07:49] 🎥 SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
[08:29] 🎥 VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
[09:03] ⚡ SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

10 min
97
5 months ago

2024.11.06 Daily AI Papers | HTML improves RAG performance, molecular graph assistant optimizes multimodal tasks

This episode covers the following 11 papers:
[00:30] 📄 HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
[01:12] 🧬 LLaMo: Large Language Model-based Molecular Graph Assistant
[01:52] 🤖 DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
[02:28] 🤖 Sample-Efficient Alignment for LLMs
[03:01] 🚦 Controlling Language and Diffusion Models by Transporting Activations
[03:49] 🌟 DreamPolish: Domain Score Distillation With Progressive Geometry Generation
[04:32] 🦓 Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
[05:12] 👕 GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
[05:46] 🔍 Correlation of Object Detection Performance with Visual Saliency and Depth Estimation
[06:28] 🔄 Adaptive Length Image Tokenization via Recurrent Allocation
[07:01] 🧠 Inference Optimal VLMs Need Only One Visual Token but Larger Models

8 min
99
5 months ago

2024.11.05 Daily AI Papers | AndroidLab improves agent performance, WebRL optimizes web task performance

This episode covers the following 17 papers:
[00:26] 🤖 AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
[01:15] 🌐 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
[01:55] 🌐 Training-free Regional Prompting for Diffusion Transformers
[02:36] 🌍 Survey of Cultural Awareness in Language Models: Text and Beyond
[03:15] 🤖 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
[03:52] 📊 DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
[04:29] 🎥 How Far is Video Generation from World Model: A Physical Law Perspective
[05:08] ⚡ Adaptive Caching for Faster Video Generation with Diffusion Transformers
[05:48] 🦖 DynaSaur: Large Language Agents Beyond Predefined Actions
[06:26] 🎥 GenXD: Generating Any 3D and 4D Scenes
[07:01] 📊 Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
[07:45] 📚 LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
[08:26] 🎥 PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
[09:08] ⚖ "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
[09:48] 🌌 Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
[10:36] 🎨 MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
[11:14] 🌍 Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks

12 min
92
5 months ago

2024.11.04 Daily AI Papers | OS-ATLAS improves GUI agent performance, CAF optimizes generative model efficiency

This episode covers the following 17 papers:
[00:25] 🤖 OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
[01:07] ⚙ Constant Acceleration Flow
[01:53] 🍅 TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
[02:33] 🎨 Randomized Autoregressive Visual Generation
[03:10] 🧠 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
[03:50] 📚 Personalization of Large Language Models: A Survey
[04:29] 🖼 In-Context LoRA for Diffusion Transformers
[05:09] ⚡ SambaMixer: State of Health Prediction of Li-ion Batteries using Mamba State Space Models
[05:54] 🤖 Survey of User Interface Design and Interaction Techniques in Generative AI Applications
[06:32] 🧶 HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
[07:07] 🌐 M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
[07:44] 🌆 CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
[08:22] 🔄 GPT or BERT: why not both?
[09:02] 🎭 Face Anonymization Made Simple
[09:40] 📊 Zipfian Whitening
[10:19] 📚 WikiNER-fr-gold: A Gold-Standard NER Corpus
[10:53] 🧠 GRS-QA -- Graph Reasoning-Structured Question Answering Dataset

11 min
87
5 months ago