The 15 papers in this episode are:
[00:22] 💡 Kuwain 1.5B: An Arabic SLM via Language Injection
[00:58] 🤖 TTRL: Test-Time Reinforcement Learning
[01:40] 🌍 The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
[02:23] 🖼 Describe Anything: Detailed Localized Image and Video Captioning
[03:00] 💡 Learning Adaptive Parallel Reasoning with Language Models
[03:34] 🖼 IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
[04:19] 📖 BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation
[05:10] 🚀 Efficient Pretraining Length Scaling
[05:49] 🩻 CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
[06:26] 🖼 Personalized Text-to-Image Generation with Auto-Regressive Models
[07:08] 🗣 LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
[07:47] 🎬 Vidi: Large Multimodal Models for Video Understanding and Editing
[08:27] 🖼 From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
[09:03] 🤖 LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
[09:44] 🤖 WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

【Follow Us】
You can also find us on the following platform for more information beyond the podcast itself:
Xiaohongshu: AI速递
