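Hello everyone, and welcome to "Hugging Face 每日AI论文速递" (Hugging Face Daily AI Paper Digest). Today is August 7, 2024, and we will take you on a quick tour of 12 trending AI papers covering vision-language model evaluation, image processing, multimodal datasets, and other frontier areas. Let's jump straight into today's paper digest.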
[00:24] 📊 MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
[01:03] 🌐 LLaVA-OneVision: Easy Visual Task Transfer
[01:46] 🎨 An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion
[02:26] 🖼 IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
[03:13] 🩺 MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
[03:50] 🧠 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
[04:21] 🧠 CoverBench: A Challenging Benchmark for Complex Claim Verification
[05:02] 🔍 Diffusion Models as Data Mining Tools
[05:42] 🎭 ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
[06:31] 📊 StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation
[07:11] ⚡ AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation
[07:48] 🔗 Synthesizing Text-to-SQL Data from Weak and Strong LLMs

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (小红书): AI速递
