
Duration: 7 minutes
Plays: 102
Published: 2 weeks ago
Description
The 15 papers in this episode are:
[00:24] 🚀 InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
[00:52] 🧠 Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation
[01:19] 🎨 MV-RAG: Retrieval Augmented Multiview Diffusion
[01:45] 🧠 T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
[02:10] 🤔 Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling
[02:41] 🚀 Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
[03:04] 🎨 PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs
[03:25] 🤔 UQ: Assessing Language Models on Unsolved Questions
[03:54] 📚 MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level Assessment
[04:25] 🗺 Explain Before You Answer: A Survey on Compositional Visual Reasoning
[04:47] 📊 ST-Raptor: LLM-Powered Semi-Structured Table Question Answering
[05:15] 🔍 SpotEdit: Evaluating Visually-Guided Image Editing Methods
[05:39] 📖 German4All - A Dataset and Model for Readability-Controlled Paraphrasing in German
[06:06] 📉 Limitations of Normalization in Attention Mechanism
[06:33] 🌐 MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting
[Follow Us]
You can also find us on the following platforms for more beyond the podcast:
Xiaohongshu (小红书): AI速递