HuggingFace Daily AI Paper Digest - Episode List

2024.10.30 Daily AI Papers | Multimodal unlearning proves challenging; AutoKaggle boosts efficiency.

The 8 papers in this episode:
[00:33] 🧠 CLEAR: Character Unlearning in Textual and Visual Modalities
[01:10] 🤖 AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
[01:46] 🤖 SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization
[02:26] 🌐 OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
[03:13] 🧠 Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
[03:52] 🚀 ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
[04:31] 🤖 Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset
[05:17] 🤖 Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

6 min
81
1 year ago
2024.10.29 Daily AI Papers | Polish language model improves performance; innovation in heterogeneous agent systems.

The 17 papers in this episode:
[00:24] 🇵🇱 Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
[01:00] 🤖 AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
[01:39] 🤖 GPT-4o System Card
[02:21] 📄 Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
[03:08] 🤖 LongReward: Improving Long-context Large Language Models with AI Feedback
[03:43] 🎥 MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
[04:22] 🌟 DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
[05:10] 🧩 GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
[05:49] 📚 A Survey of Small Language Models
[06:23] 💾 COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
[06:58] ⚡ Fast Best-of-N Decoding via Speculative Rejection
[07:36] 🔍 Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
[08:25] 🎥 LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
[09:00] 🤖 Neural Fields in Robotics: A Survey
[09:40] 🗣 Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction
[10:15] 🩺 Language Models And A Second Opinion Use Case: The Pocket Professional
[10:55] 🤖 Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation

12 min
90
1 year ago
2024.10.28 Daily AI Papers | Visual-temporal prompting improves interaction; continuous diffusion models improve speech synthesis

The 13 papers in this episode:
[00:25] 🚀 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
[01:14] 🗣 Continuous Speech Synthesis using per-token Latent Diffusion
[01:55] ⚡ Teach Multimodal LLMs to Comprehend Electrocardiographic Images
[02:39] 🌐 Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
[03:23] ⚡ FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
[03:56] 🎧 MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
[04:34] 🧠 Counting Ability of Large Language Models and Impact of Tokenization
[05:08] 🧠 Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
[05:46] 🤖 Reflection-Bench: probing AI intelligence with reflection
[06:23] 🤖 Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
[06:57] 🔍 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
[07:35] 🔍 Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
[08:15] 🤖 Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling

9 min
92
1 year ago
2024.10.25 Daily AI Papers | Memory efficiency improves significantly; long-context alignment strengthened.

The 21 papers in this episode:
[00:26] 🚀 Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
[01:09] 🔄 LOGO -- Long cOntext aliGnment via efficient preference Optimization
[01:45] 🧠 Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch
[02:30] 🤔 Can Knowledge Editing Really Correct Hallucinations?
[03:17] 🎮 Unbounded: A Generative Infinite Game of Character Life Simulation
[04:02] 🎥 Framer: Interactive Frame Interpolation
[04:48] 📊 Distill Visual Chart Reasoning Ability from LLMs to MLLMs
[05:35] 📉 Why Does the Effective Context Length of LLMs Fall Short?
[06:14] 🔒 Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
[06:52] 🔧 Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
[07:27] 🌍 CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
[08:09] 📊 Should We Really Edit Language Models? On the Evaluation of Edited Language Models
[08:43] 🌐 ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
[09:20] 🌐 WAFFLE: Multi-Modal Model for Automated Front-End Development
[09:52] 📚 CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models
[10:30] 🔄 Stable Consistency Tuning: Understanding and Improving Consistency Models
[11:10] 🧮 Language Models are Symbolic Learners in Arithmetic
[12:00] 🐍 Taipan: Efficient and Expressive State Space Language Models with Selective Attention
[12:44] 🔄 Value Residual Learning For Alleviating Attention Concentration In Transformers
[13:23] 📚 Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits
[14:03] 🤖 Data Scaling Laws in Imitation Learning for Robotic Manipulation

15 min
80
1 year ago
2024.10.24 Daily AI Papers | Multi-image task optimization; evaluating video generation models

The 10 papers in this episode:
[00:25] 🖼 MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
[01:09] 🌍 WorldSimBench: Towards Video Generation Models as World Simulators
[01:47] 🌊 Scaling Diffusion Language Models via Adaptation from Autoregressive Models
[02:20] 📱 Lightweight Neural App Control
[03:01] 🏠 ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
[03:47] 🖼 Scalable Ranked Preference Optimization for Text-to-Image Generation
[04:23] 🌆 DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes
[05:05] 🩺 MedINST: Meta Dataset of Biomedical Instructions
[05:52] 🌍 M-RewardBench: Evaluating Reward Models in Multilingual Settings
[06:27] 📊 TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts

7 min
99+
1 year ago
2024.10.23 Daily AI Papers | Reducing visual redundancy improves efficiency; dynamic 3D reconstruction for specular scenes.

The 8 papers in this episode:
[00:27] 🔍 PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
[01:09] 🌟 SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
[01:48] 🤖 Aligning Large Language Models via Self-Steering Optimization
[02:30] 🇯🇵 JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
[03:11] 🧬 EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search
[03:53] 🧠 MiniPLM: Knowledge Distillation for Pre-Training Language Models
[04:30] 🔍 Mitigating Object Hallucination via Concentric Causal Attention
[05:19] 🧠 Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes

6 min
76
1 year ago
2024.10.22 Daily AI Papers | CompassJudger speeds up model evaluation; SAM2Long improves long-video segmentation accuracy.

The 21 papers in this episode:
[00:24] 🤖 CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
[01:11] 🌲 SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
[01:55] 🌐 PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
[02:37] 🤖 AutoTrain: No-code training for state-of-the-art models
[03:10] ⚡ FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
[03:56] 📊 Baichuan Alignment Technical Report
[04:39] 🌍 Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
[05:21] 🔍 RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
[06:05] 📚 Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception
[06:41] 🔍 Pre-training Distillation for Large Language Models: A Design Space Exploration
[07:16] 🔬 Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation
[07:55] 🔄 SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation
[08:31] 📚 Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement
[09:11] 🤖 Zero-shot Model-based Reinforcement Learning using Large Language Models
[09:53] 🗣 Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
[10:28] 🧠 CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy
[11:12] 🛠 Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers
[11:58] 🧠 Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training
[12:45] 🌍 Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
[13:25] 🗣 DM-Codec: Distilling Multimodal Representations for Speech Tokenization
[14:17] 🧠 In-context learning and Occam's razor

15 min
99+
1 year ago
2024.10.21 Daily AI Papers | Higher web navigation success rate; finer-grained image generation.

The 12 papers in this episode:
[00:27] 🌐 Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
[01:11] 👗 MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
[01:48] 💼 UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
[02:37] 🧠 NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
[03:12] 🧠 SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
[03:54] 📊 Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts
[04:25] 🌐 Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion
[05:08] 🎥 DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
[05:50] 🔄 A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement
[06:31] 🧬 DPLM-2: A Multimodal Diffusion Protein Language Model
[07:12] 📰 Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media
[07:56] 🧠 How Do Training Methods Influence the Utilization of Vision Models?

8 min
98
1 year ago
2024.10.18 Daily AI Papers | Standardizing AI evaluation; movie generation models take the lead.

The 31 papers in this episode:
[00:23] 📊 MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
[01:02] 🎥 Movie Gen: A Cast of Media Foundation Models
[01:35] 📱 MobA: A Two-Level Agent System for Efficient Mobile Task Automation
[02:18] 🌐 Harnessing Webpage UIs for Text-Rich Visual Understanding
[02:59] 🔄 Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
[03:29] 🩺 MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
[04:04] 📊 A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models
[04:46] 🔄 PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
[05:23] 🔍 BenTo: Benchmark Task Reduction with In-Context Transferability
[06:03] 🎥 DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
[06:49] 🧠 MoH: Multi-Head Attention as Mixture-of-Head Attention
[07:28] 🎥 VidPanos: Generative Panoramic Videos from Casual Panning Videos
[08:03] 📉 FlatQuant: Flatness Matters for LLM Quantization
[08:44] 🔄 Retrospective Learning from Interactions
[09:22] 🔄 Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
[10:06] 🖼 Can MLLMs Understand the Deep Implication Behind Chinese Images?
[10:43] 📱 MedMobile: A mobile-sized language model with expert-level clinical capabilities
[11:22] 🌍 WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
[12:04] 🤖 Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant
[12:48] 🔄 LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning
[13:29] 🔒 AERO: Softmax-Only LLMs for Efficient Private Inference
[14:12] 🌐 γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
[14:45] 🌐 Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats
[15:24] 🎶 MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
[16:05] 🔒 Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems
[16:48] 📚 SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation
[17:27] 🗺 Roadmap towards Superhuman Speech Understanding using Large Language Models
[18:05] 🔄 Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment
[18:47] 🤖 TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
[19:25] 🔬 Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models
[20:05] 📚 Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key

21 min
99+
1 year ago
2024.10.17 Daily AI Papers | Visual reasoning needs improvement; egocentric video understanding needs work

The 19 papers in this episode:
[00:28] 🧠 HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks
[01:15] 🎥 VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI
[01:50] 🧠 The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
[02:31] 🤖 Revealing the Barriers of Language Agents in Planning
[03:15] 📄 DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
[03:56] ⚙ Large Language Model Evaluation via Matrix Nuclear-Norm
[04:38] 🧬 Exploring Model Kinship for Merging Large Language Models
[05:15] 📊 ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
[05:50] ⚡ ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression
[06:31] 📄 Improving Long-Text Alignment for Text-to-Image Diffusion Models
[07:11] 🔄 Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
[07:55] 🛡 Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
[08:34] 🔍 Tracking Universal Features Through Fine-Tuning and Model Merging
[09:08] 🔄 Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL
[09:46] 🧠 Neural Metamorphosis
[10:25] 🌍 WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
[11:09] 🌐 OMCAT: Omni Context Aware Transformer
[11:44] ⏳ ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
[12:22] 📚 DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities

13 min
86
1 year ago
