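Hello everyone, and welcome to 'Hugging Face Daily AI Paper Digest'. Today is July 22, 2024, and we will take you on a quick tour of today's 15 trending AI papers, covering vision-language models, long-context LLM inference, text-to-3D generation, and other frontier areas. Let's get started!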
[00:25] 🧠 EVLM: An Efficient Vision-Language Model for Visual Understanding
[00:55] 📚 ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
[01:32] ⚡ LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
[02:05] 🤖 The Vision of Autonomic Computing: Can LLMs Make It a Reality?
[02:35] 🔊 Stable Audio Open
[03:07] 📄 VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
[03:39] 📄 Visual Text Generation in the Wild
[04:10] 🚀 Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
[04:44] 🔬 SciCode: A Research Coding Benchmark Curated by Scientists
[05:16] 🚀 Fast Matrix Multiplications for Lookup Table-Quantized LLMs
[05:51] 🌐 PlacidDreamer: Advancing Harmony in Text-to-3D Generation
[06:28] 🔄 Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle
[06:59] 🎵 Efficient Audio Captioning with Encoder-Level Knowledge Distillation
[07:27] 📚 Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition
[08:03] 🌐 SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization

[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递
