Episode list: HuggingFace Daily AI Papers Digest - EarsOnMe

2024.07.10 Daily AI Papers

Hugging Face Daily AI Papers Digest: in 10 minutes a day, get up to speed on the day's trending AI papers on HuggingFace. Today's 16 papers:

👓 Vision language models are blind
📹 Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
🌐 Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
👤 RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
📚 AgentInstruct: Toward Generative Teaching with Agentic Flows
📚 Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities
📹 MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
🌐 Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
🔍 Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
📚 Knowledge Composition using Task Vectors with Learned Anisotropic Scaling
📚 TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
⚡ BM25S: Orders of magnitude faster lexical search via eager sparse scoring
🎥 VIMI: Grounding Video Generation through Multi-modal Instruction
🔄 From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
📚 How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions
📈 LETS-C: Leveraging Language Embedding for Time Series Classification

11 min

28

1 year ago

2024.07.09 Daily AI Papers

HuggingFace Daily AI Papers Digest

Hugging Face Daily AI Papers Digest: in 10 minutes a day, get up to speed on the day's trending AI papers on HuggingFace. Today's 17 papers:

📊 MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
🌐 LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
🎥 Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
📚 Associative Recurrent Memory Transformer
🌐 ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
📚 Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction
🎥 Compositional Video Generation as Flow Equalization
📊 PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
🚀 InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
🛠️ Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images
🖼️ UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
📚 Training Task Experts through Retrieval Based Distillation
👁️‍🗨️ Multi-Object Hallucination in Vision-Language Models
🔍 Understanding Visual Feature Reliance through the Lens of Complexity
🎨 PartCraft: Crafting Creative Objects by Parts
📚 LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking
🔍 ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

11 min

31

1 year ago

2024.07.08 Daily AI Papers

HuggingFace Daily AI Papers Digest

Hugging Face Daily AI Papers Digest: in 10 minutes a day, get up to speed on the day's trending AI papers on HuggingFace. Today's 15 papers:

🌐 Unveiling Encoder-Free Vision-Language Models
🗣️ FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
🧠 AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
📚 Learning to (Learn at Test Time): RNNs with Expressive Hidden States
📊 ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
📈 RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
🗣️ Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
🧠 DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
🛡️ Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
📊 On scalable oversight with weak LLMs judging strong LLMs
🎥 Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
📊 HEMM: Holistic Evaluation of Multimodal Foundation Models
🤝 LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
📷 CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images
🔍 Granular Privacy Control for Geolocation with Vision Language Models

10 min

56

1 year ago

2024.07.05 Daily AI Papers

HuggingFace Daily AI Papers Digest

Hugging Face Daily AI Papers Digest: in 10 minutes a day, get up to speed on the day's trending AI papers on HuggingFace. Today's 3 papers:

🔄 Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
🔍 Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
📊 Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

2 min

99+

1 year ago
