The 17 papers in this episode are:
[00:28] 📄 HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
[01:14] 🌐 Making Text Embedders Few-Shot Learners
[01:51] 🌐 OmniBench: Towards The Future of Universal Omni-Language Models
[02:29] 🔄 Present and Future Generalization of Synthetic Image Detectors
[03:08] 🎥 MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
[03:43] 🔄 MonoFormer: One Transformer for Both Diffusion and Autoregression
[04:16] 🌍 EuroLLM: Multilingual Language Models for Europe
[04:53] 🖼 MaskBit: Embedding-free Image Generation via Bit Tokens
[05:33] 👁 Seeing Faces in Things: A Model and Dataset for Pareidolia
[06:19] 🤖 Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation
[06:57] 🎨 Improvements to SDXL in NovelAI Diffusion V3
[07:41] 🔄 Reward-Robust RLHF in LLMs
[08:16] 🤖 DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
[08:54] 🇮🇹 SLIMER-IT: Zero-Shot NER on Italian Language
[09:33] 📈 Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
[10:17] 🛡 RRM: Robust Reward Model Training Mitigates Reward Hacking
[10:50] 📊 Tabular Data Generation using Binary Diffusion
[Follow us]
You can also find us on the following platform for more information beyond the podcast content
Xiaohongshu: AI速递
The 14 papers in this episode are:
[00:25] 🤖 RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning
[01:00] 🩺 A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
[01:34] 🧙 PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
[02:16] 👻 Phantom of Latent for Large Language and Vision Models
[02:55] 🩺 Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
[03:31] 🪞 Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections
[04:11] 🌟 MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors
[04:51] 🩺 An adapted large language model facilitates multiple medical tasks in diabetes care
[05:30] 🎭 MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting
[06:08] 🤖 Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking
[06:54] 🗣 Zero-shot Cross-lingual Voice Transfer for TTS
[07:28] 🌐 SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending
[08:07] 🎵 Self-Supervised Audio-Visual Soundscape Stylization
[08:42] 📊 A Case Study of Web App Coding with OpenAI Reasoning Models
The 11 papers in this episode are:
[00:26] 🎨 Imagine yourself: Tuning-Free Personalized Image Generation
[01:02] 😂 YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models
[01:40] 🌍 Prithvi WxC: Foundation Model for Weather and Climate
[02:15] 🎵 MuCodec: Ultra Low-Bitrate Music Codec
[02:51] 🌈 Colorful Diffuse Intrinsic Image Decomposition in the Wild
[03:29] 🎥 Portrait Video Editing Empowered by Multimodal Generative Priors
[04:01] 🎥 Temporally Aligned Audio for Video with Autoregression
[04:38] 📱 V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
[05:21] 📚 Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
[05:57] 🛡 Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments
[06:34] 🎻 Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts
The 5 papers in this episode are:
[00:42] TOP1(🔥86) | 🚀 Qwen2.5-Coder Technical Report
[03:55] TOP2(🔥74) | 🤖 Training Language Models to Self-Correct via Reinforcement Learning
[05:47] TOP3(🔥67) | 🌐 OmniGen: Unified Image Generation
[07:51] TOP4(🔥54) | 🌍 Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
[10:30] TOP5(🔥51) | 🌐 NVLM: Open Frontier-Class Multimodal LLMs
The 15 papers in this episode are:
[00:24] 🤖 Training Language Models to Self-Correct via Reinforcement Learning
[01:03] 📚 InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
[01:40] 🔍 MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
[02:19] 🌐 Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
[02:55] 🎨 LVCD: Reference-based Lineart Video Colorization with Diffusion Models
[03:35] 🧠 B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests
[04:13] 📖 StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
[04:59] 🌐 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
[05:39] 🚀 Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
[06:18] 🤖 Language Models Learn to Mislead Humans via RLHF
[06:59] 🎨 FlexiTex: Enhancing Texture Generation with Visual Guidance
[07:36] 🎥 Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation
[08:13] 📚 MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions
[08:52] 🎙 CLAIR-A: Leveraging Large Language Models to Judge Audio Captions
[09:28] ⚡ 3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt
The 11 papers in this episode are:
[00:28] 🚀 Qwen2.5-Coder Technical Report
[01:06] 🌍 Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
[01:47] 🎯 LLMs + Persona-Plug = Personalized LLMs
[02:32] 🔍 To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
[03:12] 🌐 GRIN: GRadient-INformed MoE
[03:50] 📚 Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey
[04:30] 🎙 Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
[05:19] 🎵 Towards Diverse and Efficient Audio Captioning via Diffusion Models
[06:02] 📚 A Controlled Study on Long Context Extension and Generalization in LLMs
[06:42] 🌌 Vista3D: Unravel the 3D Darkside of a Single Image
[07:18] 🎧 SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer
The 15 papers in this episode are:
[00:26] 🌐 OmniGen: Unified Image Generation
[01:02] 🌐 NVLM: Open Frontier-Class Multimodal LLMs
[01:41] 🔍 Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
[02:15] 🌐 Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
[02:59] 🎥 OSV: One Step is Enough for High-Quality Image to Video Generation
[03:38] 🤖 On the limits of agency in agent-based models
[04:17] 🔍 Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
[04:52] 📊 A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B
[05:38] 🎵 EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer
[06:21] 🤖 Agile Continuous Jumping in Discontinuous Terrains
[07:01] 🌐 SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
[07:34] 📈 Single-Layer Learnable Activation for Implicit Neural Representation (SL$^{2}$A-INR)
[08:11] 📈 Implicit Neural Representations with Fourier Kolmogorov-Arnold Networks
[08:53] 🎵 PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing
[09:38] 🔍 Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
The 13 papers in this episode are:
[00:26] 🎵 Seed-Music: A Unified Framework for High Quality and Controlled Music Generation
[01:03] ⚡ RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
[01:46] 🌐 Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models
[02:35] 🔍 Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types
[03:20] 🔊 ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
[04:04] 📚 One missing piece in Vision and Language: A Survey on Comics Understanding
[04:42] 🌐 jina-embeddings-v3: Multilingual Embeddings With Task LoRA
[05:28] 🧠 On the Diagram of Thought
[06:10] 🔊 AudioBERT: Audio Knowledge Augmented Language Model
[06:40] 🔍 Policy Filtration in RLHF to Fine-Tune LLM for Code Generation
[07:20] 📊 Towards Predicting Temporal Changes in a Patient's Chest X-ray Images based on Electronic Health Records
[07:57] 🤖 Breaking reCAPTCHAv2
[08:27] 🐝 beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems
The 7 papers in this episode are:
[00:23] 💡 A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis
[00:56] 🖱 InstantDrag: Improving Interactivity in Drag-based Image Editing
[01:31] 🎥 Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos
[02:10] 🎨 DrawingSpinUp: 3D Animation from Single Character Drawings
[02:41] 🎧 Apollo: Band-sequence Modeling for High-Quality Audio Restoration
[03:21] 🖱 Click2Mask: Local Editing with Dynamic Mask Generation
[03:58] 🔍 Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection
The 5 papers in this episode are:
[00:41] TOP1(🔥65) | 📚 Towards a Unified View of Preference Learning for Large Language Models: A Survey
[02:48] TOP2(🔥50) | 🎭 PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
[05:15] TOP3(🔥44) | 🩺 MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications
[07:19] TOP4(🔥43) | 🗣 LLaMA-Omni: Seamless Speech Interaction with Large Language Models
[09:02] TOP5(🔥42) | 🌐 MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
The 9 papers in this episode are:
[00:27] 💻 Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
[01:03] 🤖 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
[01:37] 🖼 IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation
[02:13] 🖼 TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder
[02:55] 🧑 DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors
[03:41] 🔄 Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
[04:28] 🌐 FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
[05:03] 🔍 Can OOD Object Detectors Learn from Foundation Models?
[05:38] 🎥 PiTe: Pixel-Temporal Alignment for Large Video-Language Model
The 13 papers in this episode are:
[00:25] 🎭 PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
[01:05] 🩺 MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications
[01:56] 🧠 Agent Workflow Memory
[02:38] 🔄 Gated Slot Attention for Efficient Linear-Time Sequence Modeling
[03:19] 🧠 Self-Harmonized Chain of Thought
[03:54] 🌐 Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
[04:35] 🤖 MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis
[05:13] 📚 gsplat: An Open-Source Library for Gaussian Splatting
[05:51] 🔍 Can Large Language Models Unlock Novel Scientific Research Ideas?
[06:23] 🎵 VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
[07:10] 👤 Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering
[07:49] 🧬 ProteinBench: A Holistic Evaluation of Protein Foundation Models
[08:31] 🔍 Generative Hierarchical Materials Search