The 11 papers in this episode are:
[00:28] 🚀 Qwen2.5-Coder Technical Report
[01:06] 🌍 Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
[01:47] 🎯 LLMs + Persona-Plug = Personalized LLMs
[02:32] 🔍 To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
[03:12] 🌐 GRIN: GRadient-INformed MoE
[03:50] 📚 Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey
[04:30] 🎙 Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
[05:19] 🎵 Towards Diverse and Efficient Audio Captioning via Diffusion Models
[06:02] 📚 A Controlled Study on Long Context Extension and Generalization in LLMs
[06:42] 🌌 Vista3D: Unravel the 3D Darkside of a Single Image
[07:18] 🎧 SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer

[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (小红书): AI速递
