HuggingFace Daily AI Paper Express - Episode List

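2025.01.07 Daily AI Papers | STAR improves spatio-temporal consistency in video super-resolution; BoostStep strengthens the mathematical reasoning of large language models.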

HuggingFace Daily AI Paper Express

The 16 papers in this episode:
[00:24] 🎥 STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
[01:06] 🧮 BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
[01:44] 🤖 Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
[02:19] 🧠 Personalized Graph-Based Retrieval for Large Language Models
[02:54] 🧠 Test-time Computing: from System-1 Thinking to System-2 Thinking
[03:34] 🦠 METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring
[04:13] 🎥 GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking
[04:48] 🎥 Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
[05:27] 🎥 TransPixar: Advancing Text-to-Video Generation with Transparency
[06:06] 🎥 Ingredients: Blending Custom Photos with Video Diffusion Transformers
[06:45] 🔍 DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
[07:24] 🛡 Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
[08:04] 🔍 ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
[08:43] 🔍 Scaling Laws for Floating Point Quantization Training
[09:19] 🎤 Samba-ASR: State-of-the-art speech recognition leveraging structured state-space models
[09:59] 🎨 AutoPresent: Designing Structured Visuals from Scratch

[Follow us] You can also find us on the following platforms for more than the podcast itself.
Xiaohongshu: AI速递
View this episode's transcript on Xiaoyuzhou.

11 min
99+
1 year ago
2025.01.06 Daily AI Papers | EnerVerse improves robotic manipulation planning; VITA-1.5 optimizes real-time vision and speech interaction.

The 8 papers in this episode:
[00:24] 🤖 EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
[00:58] 🤖 VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
[01:33] 🤔 Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
[02:11] 🤖 SDPO: Segment-Level Direct Preference Optimization for Social Agents
[02:51] 🎨 VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
[03:31] 🧬 Graph Generative Pre-trained Transformer
[04:04] 🌍 LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models
[04:44] 🔬 BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery

5 min
99+
1 year ago
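[Month-End Special] December's Hottest AI Papers | Qwen2.5 improves large language model performance; Apollo makes video understanding more efficient.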

The 10 papers in this episode:
[00:31] TOP1 (🔥335) | 🤖 Qwen2.5 Technical Report
[02:44] TOP2 (🔥136) | 🎥 Apollo: An Exploration of Video Understanding in Large Multimodal Models
[05:01] TOP3 (🔥123) | 🚀 Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
[07:18] TOP4 (🔥121) | 🔄 PaliGemma 2: A Family of Versatile VLMs for Transfer
[09:38] TOP5 (🔥116) | 🚀 Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
[12:21] TOP6 (🔥108) | 🚀 SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
[14:42] TOP7 (🔥105) | 🔍 VisionZip: Longer is Better but Not Necessary in Vision Language Models
[16:51] TOP8 (🔥96) | 🧠 Phi-4 Technical Report
[18:55] TOP9 (🔥92) | 🎥 InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
[21:02] TOP10 (🔥91) | 🧠 Are Your LLMs Capable of Stable Reasoning?

23 min
99+
1 year ago
2025.01.03 Daily AI Papers | A multimodal textbook improves vision-language model performance; VideoAnydoor achieves high-fidelity video object insertion.

The 17 papers in this episode:
[00:24] 📚 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
[01:02] 🎥 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
[01:39] 🎥 VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
[02:13] 🏆 CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
[02:52] 🎨 Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
[03:29] 🤖 ProgCo: Program Helps Self-Correction of Large Language Models
[04:03] 🗺 MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
[04:41] 🤖 A3: Android Agent Arena for Mobile GUI Agents
[05:21] 🧪 Dynamic Scaling of Unit Tests for Code Reward Modeling
[05:57] 🛡 MLLM-as-a-Judge for Image Safety without Human Labeling
[06:40] 🎥 LTX-Video: Realtime Video Latent Diffusion
[07:15] 🗺 MapQaTor: A System for Efficient Annotation of Map Query Datasets
[07:51] 🔍 Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
[08:29] 🎥 SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
[09:13] 🤖 SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
[09:50] 🧠 Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
[10:27] 📊 Population Aware Diffusion for Time Series Generation

11 min
99+
1 year ago
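2024.12.31 Daily AI Papers | Explanatory instructions improve generalization across vision tasks; multimodal models improve generalization in medical imaging.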

The 10 papers in this episode:
[00:25] 🔍 Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization
[01:13] 🧠 On the Compositional Generalization of Multimodal LLMs for Medical Imaging
[02:02] ⚙ Efficiently Serving LLM Reasoning Programs with Certaindex
[02:44] 🎨 Edicho: Consistent Image Editing in the Wild
[03:22] 🎵 TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
[04:04] 🎥 Bringing Objects to Life: 4D generation from 3D objects
[04:47] 🧠 Facilitating large language model Russian adaptation with Learned Embedding Propagation
[05:25] 🤖 HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
[06:12] 🤖 Training Software Engineering Agents and Verifiers with SWE-Gym
[06:52] 🧠 OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System

7 min
99+
1 year ago
2024.12.30 Daily AI Papers | HuatuoGPT-o1 improves medical reasoning; Orient Anything accurately estimates object orientation.

The 8 papers in this episode:
[00:30] 🧠 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
[01:16] 🧭 Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
[02:03] 🔍 Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
[02:50] 🧬 The Superposition of Diffusion Models Using the Itô Density Estimator
[03:33] 🎨 From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
[04:16] 🛡 Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
[04:56] 📊 SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images
[05:47] 🎥 VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

6 min
99+
1 year ago
2024.12.25 Daily AI Papers | Improving 3D scene understanding and filling in missing depth information.

The 9 papers in this episode:
[00:26] 🧠 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
[01:11] 🖼 DepthLab: From Partial to Complete
[01:54] 📊 Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
[02:35] 🎥 DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
[03:26] 🤔 In Case You Missed It: ARC 'Challenge' Is Not That Challenging
[04:02] 🧠 ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
[04:41] 🧩 PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
[05:20] 🧠 SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
[06:02] 🧠 Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

7 min
99+
1 year ago
