Duration: 11 minutes
Plays: 88
Published: 9 months ago
https://xiaoyuzhoufm.com

The 15 papers in this episode are as follows:


[00:25] 🖼 CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing


[01:03] 🎭 Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models


[01:45] 🌍 World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning


[02:30] 🗺 Charting and Navigating Hugging Face's Model Atlas


[03:14] 🧠 GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing


[03:48] 🎨 CoRe^2: Collect, Reflect and Refine to Generate Better and Faster


[04:29] 🧠 Transformers without Normalization


[05:06] 🌐 GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding


[05:50] 🤖 New Trends for Modern Machine Translation with Large Reasoning Models


[06:32] 📝 Shifting Long-Context LLMs Research from Input to Output


[07:09] 🌐 VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search


[07:54] 🧠 DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation


[08:35] 🐱 Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark


[09:20] 🎥 Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k


[10:01] 🎥 Long Context Tuning for Video Generation





【Follow Us】


You can also find us on the following platforms for more information beyond the podcast itself


Xiaohongshu: AI速递
