评分
暂无评分
0人评价
5星
0%
4星
0%
3星
0%
2星
0%
1星
0%
AI智能总结...
AI/summary > _
AI 正在思考中...
本集内容尚未生成 AI 总结
简介...
https://xiaoyuzhoufm.com

本期的 31 篇论文如下:

[00:23] 📊 MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures(MixEval-X:从现实世界数据混合中进行任意到任意评估)

[01:02] 🎥 Movie Gen: A Cast of Media Foundation Models(电影生成:媒体基础模型集合)

[01:35] 📱 MobA: A Two-Level Agent System for Efficient Mobile Task Automation(MobA:一种高效移动任务自动化的两级代理系统)

[02:18] 🌐 Harnessing Webpage UIs for Text-Rich Visual Understanding(利用网页UI进行丰富的视觉理解)

[02:59] 🔄 Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation(雅努斯:解耦视觉编码以实现统一的多模态理解和生成)

[03:29] 🩺 MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models(多功能多模态RAG系统在医学视觉语言模型中的应用)

[04:04] 📊 A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models(大规模模型后训练中Delta参数编辑的统一视角)

[04:46] 🔄 PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment(PopAlign:多样化对比模式以实现更全面的模型对齐)

[05:23] 🔍 BenTo: Benchmark Task Reduction with In-Context Transferability(BenTo: 基于上下文迁移性的基准任务缩减)

[06:03] 🎥 DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control(DreamVideo-2:零样本主题驱动视频定制与精确运动控制)

[06:49] 🧠 MoH: Multi-Head Attention as Mixture-of-Head Attention(MoH:多头部注意力机制作为混合头部注意力机制)

[07:28] 🎥 VidPanos: Generative Panoramic Videos from Casual Panning Videos(VidPanos:从随意拍摄的平移视频生成全景视频)

[08:03] 📉 FlatQuant: Flatness Matters for LLM Quantization(FlatQuant:扁平化对LLM量化的重要性)

[08:44] 🔄 Retrospective Learning from Interactions(从交互中回顾学习)

[09:22] 🔄 Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation(向前失败:利用合成数据和检索增强改进ASR的生成错误校正)

[10:06] 🖼 Can MLLMs Understand the Deep Implication Behind Chinese Images?(多模态大语言模型能否理解中文图像的深层含义?)

[10:43] 📱 MedMobile: A mobile-sized language model with expert-level clinical capabilities(MedMobile:具备专家级临床能力的移动端语言模型)

[11:22] 🌍 WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines(世界美食:多语言多文化视觉问答的大规模基准)

[12:04] 🤖 Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant(记住、检索与生成:理解无限视觉概念作为个性化助手)

[12:48] 🔄 LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning(LoLDU:通过下三角-对角-上三角分解实现低秩适应的参数高效微调)

[13:29] 🔒 AERO: Softmax-Only LLMs for Efficient Private Inference(AERO:仅使用Softmax的LLM实现高效隐私推断)

[14:12] 🌐 $γ-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models(γ-MoD:探索多模态大语言模型的深度混合适应)

[14:45] 🌐 Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats(长序列大重建模型:广覆盖高斯点云)

[15:24] 🎶 MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization(MuVi:视频到音乐生成与语义对齐及节奏同步)

[16:05] 🔒 Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems(大型语言模型是否具备政治正确性?分析AI系统中的伦理偏见与越狱漏洞)

[16:48] 📚 SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation(基于模式教学和检索增强生成的数学应用题解决方法)

[17:27] 🗺 Roadmap towards Superhuman Speech Understanding using Large Language Models(基于大型语言模型的超人类语音理解路线图)

[18:05] 🔄 Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment(面向无指导的AR视觉生成的条件对比对齐)

[18:47] 🤖 TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration(TransAgent:异构代理协作迁移视觉语言基础模型)

[19:25] 🔬 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models(开放材料2024(OMat24)无机材料数据集与模型)

[20:05] 📚 Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key(最小调整解锁LLM长输出:高质量数据的关键)

【关注我们】

您还可以在以下平台找到我们,获得播客内容以外更多信息

小红书: AI速递

主播...
拨号上网
评价...

空空如也

小宇宙热门评论...

暂无小宇宙热门评论

EarsOnMe

加入我们的 Discord

与播客爱好者一起交流

立即加入

播放列表

自动播放下一个

播放列表还是空的

去找些喜欢的节目添加进来吧