HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Express) - Episode List

2024.12.10 Daily AI Papers | Identifying mathematical reasoning errors, evaluating memory in reinforcement learning.

The 9 papers in this episode:
[00:23] 🧮 ProcessBench: Identifying Process Errors in Mathematical Reasoning
[01:13] 🧠 Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
[01:58] 🧠 Training Large Language Models to Reason in a Continuous Latent Space
[02:38] 🌐 Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
[03:22] 🎥 Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
[04:09] 🎥 You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
[04:53] 🌍 Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space
[05:31] 🌐 Robust Multi-bit Text Watermark with LLM-based Paraphrasers
[06:15] 🤖 CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
[Follow us] You can also find us on the following platform for more information beyond the podcast: Xiaohongshu: AI速递

7 minutes
82
1 year ago
2024.12.09 Daily AI Papers | Boosting multimodal model performance, improving text-to-video generation quality.

The 11 papers in this episode:
[00:27] 🌐 Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
[00:58] 🎥 LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
[01:41] 🧠 MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
[02:24] 🤖 EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
[03:26] 🤖 Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
[04:10] 🚀 APOLLO: SGD-like Memory, AdamW-level Performance
[04:49] ⚡ SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
[05:26] 🎥 GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
[06:07] ⏱ Mind the Time: Temporally-Controlled Multi-Event Video Generation
[06:42] 🏠 2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction
[07:20] 🗣 DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling

8 minutes
99+
1 year ago
2024.12.06 Daily AI Papers | Visual compression improves efficiency, code-based monitoring makes robots more reliable.

The 23 papers in this episode:
[00:23] 🔍 VisionZip: Longer is Better but Not Necessary in Vision Language Models
[01:03] 🤖 Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
[01:43] 🖥 Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
[02:27] 🔊 A Noise is Worth Diffusion Guidance
[03:04] 📊 Evaluating Language Models as Synthetic Data Generators
[03:48] 🌐 Structured 3D Latents for Scalable and Versatile 3D Generation
[04:26] 🌐 MV-Adapter: Multi-view Consistent Image Generation Made Easy
[05:05] 🖼 Negative Token Merging: Image-based Adversarial Feature Guidance
[05:41] 🌐 Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
[06:18] 📈 Densing Law of LLMs
[06:59] 🌌 Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
[07:37] ⚽ Towards Universal Soccer Video Understanding
[08:15] 🎨 HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
[08:53] 👗 AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
[09:35] 🌍 Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
[10:11] 🌐 Personalized Multimodal Large Language Models: A Survey
[10:55] ⚡ ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality
[11:36] 🧠 MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities
[12:14] 🧠 Discriminative Fine-tuning of LVLMs
[12:48] 🧠 Monet: Mixture of Monosemantic Experts for Transformers
[13:24] 🌊 OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
[13:59] 🧠 KV Shifting Attention Enhances Language Modeling
[14:40] 🌍 Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement

15 minutes
99+
1 year ago
2024.12.05 Daily AI Papers | Improving text-to-image diffusion models, generating immersive 360-degree video.

The 15 papers in this episode:
[00:24] 🚀 SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
[01:06] 🎥 Imagine360: Immersive 360 Video Generation from Perspective Anchor
[01:40] 🚗 Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
[02:13] 🔄 PaliGemma 2: A Family of Versatile VLMs for Transfer
[02:52] 🌊 TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
[03:31] 🌐 VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models
[04:05] 🌐 NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
[04:49] 🎥 Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding
[05:34] 🔍 CleanDIFT: Diffusion Features without Noise
[06:11] 🎨 MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
[06:53] 🎥 One Shot, One Talk: Whole-body Talking Avatar from a Single Image
[07:33] 📹 Mimir: Improving Video Diffusion Models for Precise Text Understanding
[08:07] 🎨 NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
[08:47] 🧩 Weighted-Reward Preference Optimization for Implicit Model Fusion
[09:37] 🔍 Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning

10 minutes
99+
1 year ago
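2024.12.04 Daily AI Papers | Multi-shot video generation framework improves narrative coherence, critical-token identification strengthens LLM reasoning.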

The 15 papers in this episode:
[00:24] 🎥 VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation
[01:04] 🧠 Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
[01:45] 🔄 Free Process Rewards without Process Labels
[02:30] 🎧 AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
[03:04] 🤖 MALT: Improving Reasoning with Multi-Agent LLM Training
[03:45] 🎥 OmniCreator: Self-Supervised Unified Generation with Universal Editing
[04:23] 🌴 Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis
[05:08] 📚 OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
[05:51] 📊 Scaling Image Tokenizers with Grouped Spherical Quantization
[06:27] 🌐 LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
[07:09] ⚙ A dynamic parallel method for performance optimization on hybrid CPUs
[08:00] 🌐 MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation
[08:46] 🎥 Motion Prompting: Controlling Video Generation with Motion Trajectories
[09:27] 🎥 VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
[10:01] 🤖 Generating a Low-code Complete Workflow via Task Decomposition and RAG

11 minutes
99+
1 year ago
2024.12.03 Daily AI Papers | X-Prompt improves image generation, GATE OpenING evaluates interleaved image-text generation.

The 24 papers in this episode:
[00:23] 🖼 X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
[00:58] 📊 GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
[01:32] 🖼 Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
[02:09] 🎥 Open-Sora Plan: Open-Source Large Video Generation Model
[02:55] 🎥 TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
[03:37] 🤖 o1-Coder: an o1 Replication for Coding
[04:12] 🤖 SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
[04:49] 🎥 VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
[05:38] 🔍 TinyFusion: Diffusion Transformers Learned Shallow
[06:19] 🔍 VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
[06:52] 🎙 FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
[07:32] 🚀 Efficient Track Anything
[08:15] 🌊 Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
[08:50] 🎥 Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
[09:33] 📹 WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
[10:11] 🔍 VLSBench: Unveiling Visual Leakage in Multimodal Safety
[10:51] 🧠 VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
[11:41] 🎮 PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
[12:14] 🗣 Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input
[12:51] 🌍 INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
[13:28] 🎨 Art-Free Generative Models: Art Creation Without Graphic Art Knowledge
[14:02] 📈 A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models
[14:41] 🌐 World-consistent Video Diffusion with Explicit 3D Modeling
[15:22] 🔊 Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning

16 minutes
99+
1 year ago
[Month-End Special] November's Hottest AI Papers | OpenCoder rivals proprietary models, SDXL Turbo improves image-model interpretability.

The 10 papers in this episode:
[00:41] TOP1(🔥109) | 🔓 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
[02:35] TOP2(🔥75) | 🔍 Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
[04:35] TOP3(🔥72) | 🖼 ROICtrl: Boosting Instance Control for Visual Generation
[06:38] TOP4(🔥69) | 🎥 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
[08:21] TOP5(🔥68) | 🌐 LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
[10:13] TOP6(🔥67) | 🌍 Generative World Explorer
[12:39] TOP7(🔥64) | 📄 HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
[14:52] TOP8(🔥63) | ⚡ BitNet a4.8: 4-bit Activations for 1-bit LLMs
[16:41] TOP9(🔥62) | 🖼 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
[18:16] TOP10(🔥61) | 🧠 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

20 minutes
99+
1 year ago
2024.12.02 Daily AI Papers | HiAR-ICL improves performance on complex tasks, stronger domain adaptation for multimodal models.

The 14 papers in this episode:
[00:25] 🧠 Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS
[01:06] 🌐 On Domain-Specific Post-Training for Multimodal Large Language Models
[01:39] 🎥 Video Depth without Video Models
[02:10] 🧩 Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
[02:58] ⏱ Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
[03:39] 🎥 Trajectory Attention for Fine-grained Video Motion Control
[04:26] 🌐 FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
[05:07] 🌊 DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
[05:52] 📐 AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos
[06:30] 🎥 Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing
[07:07] 📹 AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
[07:52] 📰 LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification
[08:38] 🎥 Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
[09:09] 🔄 Reverse Thinking Makes LLMs Stronger Reasoners

10 minutes
93
1 year ago
2024.11.28 Daily AI Papers | Enhanced instance control, a breakthrough in 4D scene generation

The 21 papers in this episode:
[00:24] 🖼 ROICtrl: Boosting Instance Control for Visual Generation
[01:08] 🎥 CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
[01:55] 📚 Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment
[02:38] 🌐 MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation
[03:21] 🤖 Large Language Model-Brained GUI Agents: A Survey
[03:57] 🎨 DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
[04:35] ⚡ Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
[05:14] 🎥 Identity-Preserving Text-to-Video Generation by Frequency Decomposition
[05:47] 🚗 DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
[06:31] 🔺 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes
[07:10] 🎭 Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
[07:48] 🎛 Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
[08:26] 🦖 ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
[09:26] 🧍 UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
[10:06] 🧠 Optimizing Brain Tumor Segmentation with MedNeXt: BraTS 2024 SSA and Pediatrics
[10:43] ⏱ Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding
[11:27] 🎙 VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
[12:03] 🌟 Adaptive Blind All-in-One Image Restoration
[12:39] 🛡 Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
[13:18] 🎥 Video-Guided Foley Sound Generation with Multimodal Controls
[13:48] 📚 Training and Evaluating Language Models with Template-based Data Generation

14 minutes
85
1 year ago
