The 16 papers covered in this episode:
[00:25] 🌐 Baichuan-Omni Technical Report
[00:59] 🖼 Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
[01:41] 🔧 From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
[02:17] 🎨 EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
[02:53] 🧠 StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
[03:34] 📏 PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness
[04:11] 🌐 Semantic Score Distillation Sampling for Compositional Text-to-3D Generation
[04:47] 🧠 SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
[05:29] 🔄 Mechanistic Permutability: Match Features Across Layers
[06:07] 🤖 Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining
[06:45] ⚡ KV Prediction for Improved Time to First Token
[07:30] 🌐 ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion
[08:13] 🚨 MiRAGeNews: Multimodal Realistic AI-Generated News Detection
[08:52] 🤖 DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
[09:30] 📈 I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow
[10:12] 🧠 Mentor-KD: Making Small Language Models Better Multi-step Reasoners

[Follow Us]
You can also find us on the following platform for more beyond the podcast:
Xiaohongshu: AI速递
