2025.04.25 | Open-source models surpass closed-source; new evaluation metrics improve generation quality.

This episode covers 15 papers:

[00:24] 🖼 Step1X-Edit: A Practical Framework for General Image Editing
[01:05] 🖼 RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
[01:48] 🤖 Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
[02:22] 🖼 Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
[03:02] 🧠 Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
[03:42] ⚖ QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
[04:19] 🖼 Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
[04:58] 🖼 Distilling semantically aware orders for autoregressive image generation
[05:38] 🗜 DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
[06:17] 🇪 IberBench: LLM Evaluation on Iberian Languages
[07:01] 🧠 Process Reward Models That Think
[07:46] 🎨 Boosting Generative Image Modeling via Joint Image-Feature Synthesis
[08:21] 🎬 ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
[09:02] 👗 3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models
[09:44] 📹 TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

[Follow us] You can also find us on the following platforms for more content beyond the podcast. 小红书: AI速递

Duration: 10 minutes

2025.04.24 | New benchmark for visual reasoning evaluation; high-fidelity face swapping.

This episode covers 14 papers:

[00:23] 👁 VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
[01:08] 🎭 DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning
[01:46] 🌐 Trillion 7B Technical Report
[02:30] 💡 Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
[03:11] 🧩 I-Con: A Unifying Framework for Representation Learning
[03:50] 🧩 Decoupled Global-Local Alignment for Improving Compositional Understanding
[04:30] 🎨 DreamO: A Unified Framework for Image Customization
[05:12] 💡 Tina: Tiny Reasoning Models via LoRA
[05:49] 🛡 A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
[06:30] 🧐 RePOPE: Impact of Annotation Errors on the POPE Benchmark
[07:06] 💡 Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading
[07:46] 🛠 CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
[08:29] ✅ Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA
[09:21] 🖼 Progressive Language-guided Visual Learning for Multi-Task Visual Grounding

Duration: 10 minutes

2025.04.23 | Improved Arabic-language performance; significant gains on reasoning tasks.

This episode covers 15 papers:

[00:22] 💡 Kuwain 1.5B: An Arabic SLM via Language Injection
[00:58] 🤖 TTRL: Test-Time Reinforcement Learning
[01:40] 🌍 The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
[02:23] 🖼 Describe Anything: Detailed Localized Image and Video Captioning
[03:00] 💡 Learning Adaptive Parallel Reasoning with Language Models
[03:34] 🖼 IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
[04:19] 📖 BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation
[05:10] 🚀 Efficient Pretraining Length Scaling
[05:49] 🩻 CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
[06:26] 🖼 Personalized Text-to-Image Generation with Auto-Regressive Models
[07:08] 🗣 LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
[07:47] 🎬 Vidi: Large Multimodal Models for Video Understanding and Editing
[08:27] 🖼 From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
[09:03] 🤖 LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
[09:44] 🤖 WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Duration: 10 minutes

2025.04.22 | LUFFY boosts reasoning performance; FlowReasoner improves system adaptability.

This episode covers 15 papers:

[00:25] 🧠 Learning to Reason under Off-Policy Guidance
[01:00] 🤖 FlowReasoner: Reinforcing Query-Level Meta-Agents
[01:40] 🦅 Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
[02:22] 🧰 ToolRL: Reward is All Tool Learning Needs
[03:07] 🌐 SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation
[03:39] 🎨 StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians
[04:18] 🤖 X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
[04:57] 🤖 UFO2: The Desktop AgentOS
[05:34] 🧑 LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs
[06:18] 👀 Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
[07:02] 🤖 InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
[07:42] 🕹 EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models
[08:23] 📱 LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark
[09:06] 🖼 LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping
[09:50] 🎵 DRAGON: Distributional Rewards Optimize Diffusion Generative Models

Duration: 10 minutes

2025.04.21 | Reinforcement learning does not unlock new reasoning abilities; MIG optimizes data selection for instruction tuning.

This episode covers 9 papers:

[00:22] 🤔 Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
[00:59] 🧠 MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
[01:41] 🤔 Could Thinking Multilingually Empower LLM Reasoning?
[02:25] 🏙 AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
[03:09] 🏠 HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
[03:52] 💡 NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes
[04:30] 🧠 It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
[05:07] 🏞 Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images
[05:51] 🧠 Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models

Duration: 6 minutes

2025.04.18 | CLIMB improves domain model performance; antidistillation sampling deters model theft.

This episode covers 15 papers:

[00:23] 🗂 CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
[01:03] 🧪 Antidistillation Sampling
[01:41] 🤝 A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
[02:26] 🎬 Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
[03:02] 🤖 Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
[03:43] 🧠 WORLDMEM: Long-term Consistent World Simulation with Memory
[04:27] 🎬 VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
[05:01] 🤖 NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
[05:43] 🎨 DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging
[06:20] 📊 ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering
[07:07] 🤖 Exploring Expert Failures Improves LLM Agent Tuning
[07:48] 🎨 InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework
[08:26] 📸 CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy
[09:06] 🎬 FocusedAD: Character-centric Movie Audio Description
[09:39] 🤔 Retrieval-Augmented Generation with Conflicting Evidence

Duration: 10 minutes

2025.04.17 | ColorBench tests VLMs' color understanding; BitNet improves compute efficiency.

This episode covers 11 papers:

[00:27] 🎨 ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
[01:09] 💡 BitNet b1.58 2B4T Technical Report
[01:50] 🎨 Cobra: Efficient Line Art COlorization with BRoAder References
[02:28] 🚀 AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference
[03:05] 🗣 SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
[03:51] 🧰 ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
[04:31] 🚀 REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
[05:09] 📹 Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting
[05:51] 🤖 Robust and Fine-Grained Detection of AI Generated Texts
[06:34] 🧠 Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
[07:18] 🖼 BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting

Duration: 8 minutes

2025.04.16 | Genius improves LLM reasoning; xVerify verifies reasoning models efficiently.

This episode covers 15 papers:

[00:22] 🧠 Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
[01:06] ✅ xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
[01:52] 🖼 Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
[02:37] ✅ Heimdall: test-time scaling on the generative verification
[03:23] 🎨 Seedream 3.0 Technical Report
[04:07] 📊 How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
[04:54] 🎮 TextArena (a collection of competitive text games for training and evaluating agentic behavior in LLMs)
[05:43] 🧠 The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
[06:22] 🤖 Efficient Process Reward Model Training via Active Learning
[07:01] 🚀 Efficient Generative Model Training via Embedded Representation Warmup
[07:43] 🎥 NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
[08:23] 🧠 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
[09:00] 🧮 DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
[09:43] 🚗 Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion
[10:25] 📹 PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild

Duration: 11 minutes

2025.04.15 | Multimodal model performance gains; accelerated inference on low-resource hardware.

This episode covers 15 papers:

[00:23] 🖼 InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
[01:03] 🏠 PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
[01:46] 🖼 FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
[02:26] 🤔 VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
[03:07] 🤖 Iterative Self-Training for Code Generation via Reinforced Re-Ranking
[03:51] 🎬 Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
[04:28] 🤖 AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
[05:13] 🧠 S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
[05:56] 🤔 Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
[06:42] 🤖 DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
[07:22] 🌍 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
[08:11] 🤖 Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
[08:56] 💡 TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
[09:40] 🧪 LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
[10:21] 🛡 EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety

Duration: 11 minutes

2025.04.14 | Cost-effective video generation; scaling autoregressive image generation.

This episode covers 13 papers:

[00:24] 🎬 Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
[01:00] 🖼 GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
[01:42] 🎮 MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
[02:25] 🖼 PixelFlow: Pixel-Space Generative Models with Flow
[03:05] 🤖 SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
[03:51] 🎨 FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
[04:30] 🎬 In-2-4D: Inbetweening from Two Single-View Images to 4D Generation
[05:05] 🤔 ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
[05:42] 🚀 Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs
[06:21] 🤔 Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models
[07:11] 🛡 SAEs $\textit{Can}$ Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
[07:52] 🤝 CoRAG: Collaborative Retrieval-Augmented Generation
[08:29] 🤝 InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Duration: 9 minutes