Daily AI Briefing

2026-06-05 (content fetched 06/05 at 06:17)

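headroom: Compress Inputs, Save 60-95% of LLM Tokens

GitHub Trending

The chopratejas/headroom project compresses tool outputs, logs, files, and similar inputs before they reach the LLM; the project says this saves 60-95% of tokens without affecting answer quality. It is available as a library, a proxy, and an MCP server.

Why it matters: It directly lowers LLM costs, is open source and usable right away, and is highly practical.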

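NSA Uses Anthropic's Mythos Model for Cyberattacks

Hacker News

The NSA is reported to be using Anthropic's Mythos model for offensive cyber operations, prompting heated debate over military applications of AI.

Why it matters: It concerns a state agency's use of a frontier AI model and has major implications for industry security governance.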

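Microsoft Build Launches MAI-Thinking-1 and Several Other Models

Smol AI News

Microsoft released MAI-Thinking-1 (35B MoE, 256K context), which scores 97% on AIME 2025, ahead of Sonnet 4.6. It also introduced other MAI-series models and the Surface RTX Spark Dev Box.

Why it matters: A major Microsoft model release with solid performance numbers; its ecosystem impact is worth watching.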

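Anthropic Engineering Shares Its Cross-Product Containment Approach for Claude

Anthropic Engineering

Anthropic engineers describe in detail their experience building containment for claude.ai, Claude Code, and Cowork to address the potential risks that come with growing agent capabilities.

Why it matters: An in-depth account of engineering practice, directly relevant to AI safety and agent development.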

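NVIDIA Launches Nemotron 3.5 Multimodal Content Safety Model

Hugging Face Blog

NVIDIA released Nemotron 3.5 Content Safety, which supports customizable multimodal safety filtering and targets enterprise AI deployments worldwide.

Why it matters: Enterprise demand for AI safety is clear, and the model is open and customizable, which makes it useful to operations teams.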

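NousResearch Releases Hermes Agent: An Agent That Grows

GitHub Trending

NousResearch open-sourced Hermes Agent, positioned as "the agent that grows with you" and emphasizing continual learning and adaptability.

Why it matters: An open-source agent framework, well suited to developers exploring dynamic learning mechanisms.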

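DeepSeek Raises Funding, Meta Model Delayed, Gemma 4 12B Released

TLDR AI

DeepSeek is raising funding; Meta has delayed a model release; Google launched the open Gemma 4 12B model.

Why it matters: A quick industry roundup that reflects the temperature of capital and competition; good for a fast skim.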

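Cloudflare Acquires VoidZero

V2EX

Cloudflare has completed its acquisition of VoidZero. Details have yet to be disclosed, and the community is debating the strategic intent.

Why it matters: It shows cloud infrastructure vendors consolidating emerging technology; the effect on future products is worth following.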

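Anthropic Publishes Research Report on Coding Agents in Social Science

Anthropic Research

Anthropic published an economic research report exploring the use of coding agents in social science research.

Why it matters: It opens up applications for AI agents outside technical fields and offers ideas for interdisciplinary researchers.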

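AI-Generated Lawsuits Surge as US Courts Face New Challenges

MIT Tech Review AI

US federal judges handle large volumes of legal filings every day that people without lawyers produced with AI, putting new pressure on the court system and raising compliance questions.

Why it matters: It shows the concrete impact of AI misuse on the judicial system and offers a societal perspective.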

chopratejas/headroom

Python · ★ 12,312 · 🍴 804 · 📈 3,139 stars today

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

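Summary: Headroom compresses tool outputs, logs, files, and RAG chunks before they are sent to an LLM, cutting token counts by 60-95% with the same answers, according to the project. It offers three integration modes (library, proxy, and MCP server) and suits cases where the goal is lower API cost and more efficient inference.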
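To make the idea of compressing input before it reaches the model concrete, here is a minimal sketch. It is not Headroom's API or algorithm; `collapse_repeats` and `rough_tokens` are hypothetical helpers, and the whitespace word count is only a crude stand-in for a real tokenizer. It shows one simple way repetitive logs can shrink: collapsing runs of identical lines.

```python
from itertools import groupby


def collapse_repeats(text: str) -> str:
    """Collapse each run of identical consecutive lines into one line plus a count."""
    out = []
    for line, run in groupby(text.splitlines()):
        n = sum(1 for _ in run)
        out.append(line if n == 1 else f"{line}  [repeated {n}x]")
    return "\n".join(out)


def rough_tokens(text: str) -> int:
    """Crude token proxy: whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())


# A synthetic log: many identical health checks around a single failure.
log = "\n".join(
    ["GET /health 200"] * 50
    + ["POST /login 500 timeout"]
    + ["GET /health 200"] * 49
)

small = collapse_repeats(log)
saved = 1 - rough_tokens(small) / rough_tokens(log)
print(small)
print(f"rough reduction: {saved:.0%}")
```

The rare line (the failed login) survives intact while the repetitive lines collapse, which is the property that matters if answers are to stay the same. Real inputs are rarely this repetitive, so the reduction on this synthetic log says nothing about what to expect in practice.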

NousResearch/hermes-agent

Python · ★ 180,880 · 🍴 31,020 · 📈 1,951 stars today

The agent that grows with you

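Summary: Hermes Agent is an AI agent framework that grows with its user over time, focusing on personalization and adaptivity. It suits personal assistants or experimental applications that need long-term companionship, learn user preferences, and keep evolving the interaction experience.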

affaan-m/ECC

JavaScript · ★ 207,137 · 🍴 31,800 · 📈 1,736 stars today

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

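Summary: ECC is a performance optimization system for AI coding agents such as Claude Code and Cursor, providing skills, instincts, memory, security, and research-first development. It helps agents execute complex tasks more efficiently and reliably.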

PaddlePaddle/PaddleOCR

Python · ★ 79,811 · 🍴 10,597 · 📈 105 stars today

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

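Summary: PaddleOCR is a lightweight, powerful OCR toolkit that supports 100+ languages and turns any PDF or image document into structured data that AI can process directly. It suits document digitization, invoice recognition, LLM preprocessing, and similar scenarios.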

github/spec-kit

Python · ★ 108,531 · 🍴 9,593 · 📈 311 stars today

💫 Toolkit to help you get started with Spec-Driven Development

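Summary: Spec Kit is GitHub's official toolkit for helping teams get started with a Spec-Driven Development workflow. It suits API design and collaboration scenarios where contracts are defined first and interfaces are implemented afterward.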

NVIDIA/cosmos

Jupyter Notebook · ★ 8,955 · 🍴 578 · 📈 244 stars today

NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.

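Summary: NVIDIA Cosmos is an open world-model platform comprising pretrained models, datasets, and tools for building Physical AI such as robots, autonomous vehicles, and smart infrastructure. Developers can build on the platform to train and deploy embodied intelligence.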

lfnovo/open-notebook

TypeScript · ★ 24,922 · 🍴 2,907 · 📈 482 stars today

An Open Source implementation of Notebook LM with more flexibility and features

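Summary: Open Notebook is an open-source alternative to Notebook LM that offers more flexible document note-taking and question answering. It supports importing many kinds of documents and generating notes, summaries, and conversational retrieval, and suits research, study, and knowledge management.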

Open-LLM-VTuber/Open-LLM-VTuber

Python · ★ 9,543 · 🍴 1,144 · 📈 583 stars today

Talk to any LLM with hands-free voice interaction, voice interruption, and Live2D talking face running locally across platforms

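Summary: Open-LLM-VTuber is a cross-platform desktop application for hands-free voice interaction with any LLM, with voice interruption and a Live2D animated face. It suits live streaming, virtual streamers, desktop companions, and similar scenarios.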

jwasham/coding-interview-university

★ 349,674 · 🍴 83,224 · 📈 740 stars today

A complete computer science study plan to become a software engineer.

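Summary: A complete computer science study plan designed to help people starting from scratch or from a non-CS background prepare systematically for software engineer interviews. It covers core topics such as algorithms, data structures, and system design, with recommended resources.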

github/copilot-sdk

Java · ★ 8,940 · 🍴 1,209 · 📈 107 stars today

Multi-platform SDK for integrating GitHub Copilot Agent into apps and services

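Summary: GitHub Copilot SDK is a multi-platform development kit that helps developers integrate GitHub Copilot Agent (an AI coding assistant) into their own apps or services. It suits IDE plugins, internal tools, and automated workflows.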

aquasecurity/trivy

Go · ★ 35,654 · 🍴 431 · 📈 255 stars today

Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more

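Summary: Trivy is an open-source security scanner that finds vulnerabilities, misconfigurations, exposed secrets, and SBOM information in container images, Kubernetes, code repositories, cloud environments, and more. It suits security and compliance pipelines in DevOps and cloud-native environments.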

openclaw/openclaw-windows-node

C# · ★ 1,302 · 🍴 166 · 📈 358 stars today

Windows companion suite for OpenClaw - System Tray app, Shared library, Node, and PowerToys Command Palette extension

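Summary: OpenClaw Windows Node is the Windows companion suite for OpenClaw, comprising a system tray app, a shared library, a Node, and a PowerToys Command Palette extension. It is mainly used to improve OpenClaw integration and day-to-day operation on Windows.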

reconurge/flowsint

TypeScript · ★ 5,268 · 🍴 637 · 📈 308 stars today

A modern platform for visual, flexible, and extensible graph-based investigations. For cybersecurity analysts and investigators.

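Summary: Flowsint is a modern visual investigation platform for cybersecurity analysts and investigators, supporting flexible, extensible graph-based analysis. It suits threat tracking, incident response, and mining complex relationships.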

mvanhorn/last30days-skill

Python · ★ 27,528 · 🍴 2,342 · 📈 173 stars today

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary

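Summary: An AI agent skill that automatically researches the past 30 days of discussion on a given topic across Reddit, X, YouTube, Hacker News, Polymarket, and other platforms, then produces a grounded summary. It suits public-opinion monitoring, trend analysis, and quick research.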

Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg Game

👍 1

LLMs can appear cautious in risk decision-making tasks, yet cautious-looking outputs do not necessarily indicate alignment with human decision-making mechanisms. We investigate this distinction using the St. Petersburg game as a controlled testbed, a classical paradox in which the expected payoff is

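Summary: Using the St. Petersburg game as a controlled testbed, the study examines whether the apparent caution of LLM outputs in risk decisions reflects real alignment with human decision-making mechanisms, and finds that cautious-looking outputs do not necessarily mean mechanism-level alignment.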

ZipSplat: Fewer Gaussians, Better Splats

👍 10

Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forward pass, yet current approaches predict one Gaussian per input pixel, tying the representation budget to camera resolution rather than scene complexity. A flat wall and a richly textured ob

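Summary: Proposes ZipSplat, a feed-forward 3D Gaussian Splatting method that avoids predicting one Gaussian per input pixel, so the representation budget follows scene complexity rather than camera resolution.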

DAR: Deontic Reasoning with Agentic Harnesses

👍 3

Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key technical challenge for LLM-based deontic reasoning is that the relevant

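Summary: The DAR framework studies LLM-based deontic reasoning, that is, answering legal or policy questions by applying explicit rules to case facts, such as computing tax liability or deciding the outcome of an immigration appeal, and analyzes the technical challenges involved.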

Evaluating Large Language Models in Dynamic Clinical Decision-Making with Standardized Patient Cases

👍 3

Large language models (LLMs) are increasingly proposed as clinical agents, yet static, single-turn benchmarks cannot capture how a model dynamically delivers care across an encounter: gathering information, planning treatment, and adapting longitudinal management across successive patient states. Me

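Summary: Evaluates LLMs in dynamic clinical decision-making using standardized patient cases, covering information gathering, treatment planning, and longitudinal management, and argues that static single-turn benchmarks cannot capture real care delivery.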

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

👍 34

Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hac

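Summary: Studies reward hacking in rubric-based reinforcement learning, where the policy model exploits latent biases in the LLM judge and produces ineffective or unsafe training outcomes, and analyzes ways to detect it.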

STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

👍 3

Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold standard for TDA relies on causal interventions, observing how a model changes when data is added or removed, but repeated retraining is computationally challenging for Large Language Models (LLM

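Summary: Proposes STRIDE, which performs training data attribution via sparse recovery from subset perturbations, avoiding repeated retraining of large models while tracing model predictions back to their training data.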

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

👍 9

Scientific and engineering progress is fundamentally a long-horizon iterative process: proposing changes, running experiments, measuring outcomes, and continuously refining artifacts. Yet existing benchmarks for frontier models primarily evaluate either single-turn responses or short-horizon agent t

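Summary: The AutoLab benchmark evaluates frontier models on long-horizon autonomous research and engineering tasks, including proposing changes, running experiments, and iterative refinement, going beyond single-turn question answering.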

M^3Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks

👍 24

As multi-modal models advance towards long-form video understanding, memory emerges as a critical capability. Despite substantial efforts in developing video datasets and benchmarks, existing works primarily focus on perception and reasoning, without systematically evaluating memory: what models ret

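Summary: Proposes M^3Eval, which evaluates the memory of multi-modal models through cognitively grounded video tasks, filling the gap left by existing video benchmarks that do not systematically assess memory.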

MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation

👍 11

Lane-level maps are critical infrastructure for autonomous driving and lane-level navigation, yet constructing and maintaining standardized lane networks for hundreds of cities remains highly labor-intensive. Recent end-to-end vectorized mapping methods can predict lane geometry and topology directl

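Summary: MapAgent is an industrial-grade agentic framework for city-scale lane-level map generation, aimed at the heavy manual effort of building and maintaining lane networks across many cities.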

MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation

👍 1

Autoregressive mesh generation has gained attention by tokenizing meshes into sequences and training models in a language-modeling fashion. However, existing approaches suffer from two fundamental limitations: (i) low tokenization efficiency, which yields long token sequences and prevents scaling to

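Summary: MeshWeaver performs autoregressive mesh generation through sparse-voxel-guided surface weaving, addressing the low tokenization efficiency and overly long sequences of existing methods.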

KletterMix: Climbing Toward High-Quality German Pretraining Data

👍 10

High-quality pretraining data is a central ingredient in modern language models, but German-language resources remain far less developed than their English counterparts: they are often smaller, less carefully curated, weakly documented, and rarely validated through controlled training experiments. W

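Summary: KletterMix aims to raise the quality of German pretraining data, noting that German resources lag far behind English ones in size, curation, and documentation, and validates its data through controlled training experiments.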

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

👍 24

Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiable Rewards (RLVR) on Chain-of-Thoughts (CoTs). However, since long CoTs naturally contain trial and errors and mainstream RLVR approaches choose outcome-correct CoT trajectories for memoriza

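Summary: ThoughtFold folds reasoning chains via introspective preference learning, improving how large reasoning models handle the trial and error in long chains of thought and addressing trajectories that reach correct outcomes through inefficient processes.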

WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts

👍 8

Existing benchmarks for MLLM-generated web artifacts assess interaction through local evidence and miss the requirement-induced states and transitions that determine whether a page works. We introduce WebRISE, which compiles task requirements into Interaction Contract Graphs (ICGs) of observable sta

Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

👍 1

Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Existing efficient reasoning methods control thinking length by shortening, early-stopping, or compressing traces, leaving ho

Large Language Models Hack Rewards, and Society

👍 1

Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs) to learn from rewards. We observe that societal regulations are structurally similar to reward functions. They define measurable outcomes, thresholds, and exceptions, while often leaving i

When Graph Tokens Sink: A Mechanistic Analysis of Graph Language Models

👍 1

Graph Language Models (GLMs) have become a promising direction for adapting Large Language Models (LLMs) to graph learning tasks. By transforming graph topology and node information into graph tokens, GLMs allow LLMs to jointly process structured graph inputs and textual instructions. Yet, it remain

Training-Free Multi-Concept LoRA Composition with Prompt-Aware Weighting

👍 0

Low-Rank Adaptation (LoRA) successfully enables personalization in text-to-image generation by adapting pre-trained diffusion models to specific visual concepts and styles. However, extending such models to multi-concept customization remains challenging. Naively combining multiple LoRA weights or t

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

👍 28

Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous egocentric streams, often using evidence outside the current view. Existing benchmarks either evaluate offline over full videos or target events rather than spatial structure. We introduce

Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions

👍 3

How can a population of agents self-orchestrate and self-adapt into stronger collective intelligence without centralized control? Inspired by Friedrich Hayek's economic theory of decentralized coordination in markets, we study this question through an agent economy in which agents compete via auctio

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

👍 11

On-Policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more selective training paradigms. Recent OPD methods increasingly focus on selecting which trajectories to learn from, which tokens are most informative, and which supervision signals are most re

SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory

👍 1

AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genuinely useful, such systems must move beyond short-term video comprehension and address memory gaps that humans experience for practical, personal, or social purposes over longitudinal egocent

Semi-Supervised Noise Adaptation: Transferring Knowledge from Noise Domain

👍 1

Transfer learning aims to facilitate the learning of a target domain by transferring knowledge from a source domain. The source domain typically contains semantically meaningful samples (e.g., images) to facilitate effective knowledge transfer. However, a recent study observes that the noise domai

Do Text Edits Generalize to Visual Generation? Benchmarking Cross-Modal Knowledge Editing in UMMs

👍 0

Unified multimodal models (UMMs) have emerged as a promising paradigm for general-purpose multimodal intelligence. As they are deployed in real-world applications, effectively updating internal knowledge becomes critical. While knowledge editing has matured for text-only models, it remains unclear w

PaintBench: Deterministic Evaluation of Precise Visual Editing

👍 2

While current multimodal models are proficient at open-ended visual editing, executing precise single-answer edits remains an important obstacle. To probe this challenge, we introduce PaintBench, a dynamically scalable benchmark targeting 20 fundamental precise visual editing operations across four

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

👍 2

Humans can effortlessly perceive spatial layouts, form cognitive representations, reason about spatial relations, and translate such reasoning into actions in everyday 3D environments. Although recent vision-language models (VLMs) have shown promising performance on observation-conditioned spatial p

Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging

👍 4

Weight-space model merging is usually formulated as an algebraic operation on checkpoints, yet at LLM scale the limiting resource is often the set of expert weights that must be read. We introduce MergePipe, a budget-aware execution layer that casts LLM merging as an expert access-set problem: given

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

👍 19

LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodologies remain largely centered on static, isolated, and short-horizon benchmarks that fail to capture the dynamic complexity of real-world production workflows. As

Context as Topology: Why Your Agent's Memory Forgets, and How Structure Escapes It

@elpresidank · 116 followers · 2.9M views · 543 likes · 35 reposts

Most AI agent memory is built on embeddings. And there's now a proof that this entire class of system is going to forget what you stored in it — and confidently make up things you never stored at all.

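Summary: Argues that embedding-based agent memory forgets structurally: it not only loses what was stored but also confidently fabricates things that were never stored. Explains the failure from a topological perspective.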

Range and Depth on Demand

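@1salman · 363 followers · 2.0M views · 682 likes · 45 reposts

Everyone keeps asking whether AI favors specialists or generalists. I think that is the wrong question. AI does not pick a side. It changes the tradeoff. The old world forced a choice. You could go

Summary: Argues that AI favors neither generalists nor specialists but changes the tradeoff between them. The old world forced a choice; AI removes that constraint. Explores a new model of capability for the AI era.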

How to build a 4-agent team, that ships a feature while you sleep (Exact Setup Inside)

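@zodchiii · 20.0K followers · 743.3K views · 509 likes · 55 reposts

Four AI agents can ship a feature while you sleep. Most people never wire them up. They fire a reviewer here, a test generator there, by hand, one at a time, each forgetting what the last one did.

Summary: Shares the exact setup for a team of four AI agents that ships a feature while you sleep. Most people still trigger agents by hand, one at a time, with each agent forgetting what the last one did.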

30 Obsidian Workflows, Plugins, and Setups That Most Users Don't Know

@eng_khairallah1 · 61.9K followers · 693.5K views · 511 likes · 71 reposts

Obsidian has 2,700+ community plugins. Over 100 of them are AI-related. Save this :) And the CEO of Obsidian personally published official Claude Skills for the platform - 12,900+ GitHub stars in

Summary: Lists 30 lesser-known Obsidian workflows, plugins, and setups. Notes that over 100 of Obsidian's 2,700+ community plugins are AI-related and that Obsidian's CEO published official Claude Skills, which drew 12,900+ GitHub stars.

How to master Dynamic Workflows in Claude Code: 6 patterns and 14 steps Anthropic engineers actually

@0xCodez · 3.3K followers · 637.2K views · 510 likes · 59 reposts

Most Claude Code users still write their workflows by hand. They chain prompts, copy outputs, paste them into the next prompt, fix what went wrong, repeat. 9 out of 10 builders haven’t tried Dynamic

Summary: Shows how to master dynamic workflows in Claude Code through 6 patterns and 14 steps. Claims that 9 out of 10 builders still chain prompts by hand and have not tried dynamic workflows.

What an Enterprise Context Layer Actually Is

@prukalpa · 23.1K followers · 583.2K views · 506 likes · 80 reposts

A field guide to what it is, what it is not, and where it fits in your AI architecture. I have had some version of the same conversation with a CIO almost every day this year. Their team has read

Summary: A field guide to the "context layer" in enterprise AI architecture: what it is, what it is not, and where it fits. It grew out of near-daily conversations with CIOs.

I Searched the Whole Claude Skills Ecosystem - These Are the Ones That Matter [Full GitHub Links]

@polydao · 18.1K followers · 559.5K views · 505 likes · 55 reposts

Most people are still using Claude like a smarter chatbot. That is not the game anymore. You’re competing against people who treat Claude like an operating system. While you’re typing one-off

Summary: Surveys the whole Claude Skills ecosystem and picks out the skills that are worth using, with full GitHub links. Criticizes most people for still using Claude as a smarter chatbot.

hacking pewdiepie's AI agent harness using an evil cocomelon website (then helping protect it)

@theonejvo · 22.1K followers · 504.3K views · 861 likes · 1 repost

Over the past year, @pewdiepie has been turning into one of the most visible champions of private, self-hosted computing, and it has been a genuine pleasure to watch. What began in late 2025 as an

Summary: Demonstrates an attack on PewDiePie's AI agent harness using a malicious Cocomelon website, then helps harden it. Shows both the weak points of private, self-hosted computing and ways to protect it.

Generative UI Is the New Frontend

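@Saboo_Shubham_ · 116.2K followers · 263.3K views · 517 likes · 74 reposts

The frontend used to be a fixed thing. Designers drew it. Engineers built it. Users got what shipped. That's over. The interfaces shipping in 2026 are drawn partly by the agent itself, in real time,

Summary: Argues that generative UI is replacing the fixed frontend: interfaces shipping in 2026 are drawn partly by the agent in real time, ending the era in which designers drew, engineers built, and users passively received what shipped.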

Claude Code + NotebookLM + Obsidian: Research Monster That Gets Smarter Every Time You Use It

@monokern · 1.2K followers · 263.1K views · 505 likes · 72 reposts

Most people treat research as a manual task. You open 10 tabs. You watch videos. You read articles. You take notes somewhere. An hour later you have a pile of information you're not sure what to do

Summary: Combines Claude Code, NotebookLM, and Obsidian into a research workflow: instead of opening 10 tabs by hand, the system gets smarter with each use and automates the intake and organization of information.

Stop building Foxconn factories for your agents

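@garrytan · 853.3K followers · 180.6K views · 503 likes · 43 reposts

In January I got back into coding and I built Garry's List. Over five hundred thousand lines of Rails and the tests to police it. I was proud of it. I shouldn't have been. The thing worth being proud

Summary: Reflects on having written over 500,000 lines of Rails code and concludes that the sheer volume of hand-built code is not what deserves pride. Criticizes building "Foxconn factory"-style mass-production pipelines for agents.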

The Agentic Economy Is Here

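@base · 1.3M followers · 97.3K views · 519 likes · 74 reposts

TL;DR: Agents are becoming the internet’s newest paying customers, and the economy serving them is moving fast. On Base, agents already use wallets and stablecoins to pay for inference, live search,

Summary: Agents are becoming the internet's newest paying customers. On Base, agents already use wallets and stablecoins to pay for inference, live search, and more; the agentic economy is arriving.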

🥇Top AI Papers of the Week

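@dair_ai · 124.6K followers · 84.0K views · 504 likes · 83 reposts

1. SkillOpt Microsoft Research treats a compact natural-language skill document as the trainable state of a frozen agent, then learns that document through rollouts, reflection, and bounded edits

Summary: This week's top AI papers. Highlights Microsoft Research's SkillOpt, which treats a natural-language skill document as the trainable state of a frozen agent and learns it through rollouts, reflection, and bounded edits.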

My Agent Stack For Automating My Personal Life

@nicbstme · 23.7K followers · 84.0K views · 530 likes · 35 reposts

My agent manages my emails, SMS, Whatsapp, Telegram and pretty much everything to automate my personal life. People keep asking me how I use agents in real life. I mean the actual boring things that

Summary: Shares an agent stack for automating personal life, managing email, SMS, WhatsApp, Telegram, and more. Focuses on real, everyday automation rather than flashy demos.

State of Memory in Agent Harness

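@mem0ai · 17.6K followers · 82.8K views · 520 likes · 60 reposts

Agent harnesses are where AI software actually runs. Cursor, Devin, Claude Code, Codex: these environments handle context, orchestrate tools, coordinate agents, and increasingly, manage memory. The

Summary: Analyzes the state of memory management in mainstream agent harnesses (Cursor, Devin, Claude Code, Codex). These environments handle context, orchestrate tools, and coordinate agents, and memory is becoming increasingly important.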

Robotics: The Next AI Frontier

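@ParadisLabs · 48.9K followers · 82.0K views · 501 likes · 60 reposts

AI's next frontier will be Robotics and Humanoids. The past decade has seen rapid AI adoption in the structured digital world. Those LLM breakthroughs now enable more general-purpose learning and more

Summary: Robotics is AI's next frontier. Over the past decade AI was adopted rapidly in the structured digital world, and LLM breakthroughs now enable more general-purpose learning and more autonomous capability in the physical world.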

A harness for every task: dynamic workflows in Claude Code

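@trq212 · 263.1K followers · 75.7K views · 542 likes · 36 reposts

Last week, we released dynamic workflows in Claude Code. Claude can now write its own harness on the fly, custom-built for the task at hand. While the default Claude Code harness is built for coding,

Summary: Announces dynamic workflows in Claude Code: Claude can now write its own custom harness on the fly, so it is no longer limited to the default coding setup and each task gets a purpose-built harness.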

A Functional Taxonomy of World Models

@drfeifei · 738.0K followers · 72.2K views · 699 likes · 144 reposts

“The world is everything that is the case.” — Ludwig Wittgenstein, Tractatus Logico-Philosophicus, 1921 The world is not made of words. In an earlier essay, we argued that spatial intelligence is AI’s

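Summary: Proposes a functional taxonomy of world models. Quotes Wittgenstein to argue that the world is not made of words, and holds that spatial intelligence is AI's next major direction.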

How To Fix AI Slop (Using Hermes)

@EXM7777 · 115.1K followers · 70.1K views · 520 likes · 47 reposts

There's a reason some people seem to be constantly shipping the best software, writing incredible content, or generating insane images... They adopted the eval loop, while you... You've tried better

Summary: Argues that the key to eliminating AI "slop" is adopting an eval loop rather than just writing better prompts. Shares a method for continuous, iterative improvement using the Hermes framework.

Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs

We talk with the VendingBench authors on evaling Claudes from Haiku to Mythos, and how they build leading, and lasting, frontier evals from scratch.

Summary: Latent Space talks with Lukas Petersson and Axel Backlund of Andon Labs about VendingBench, which evaluates Claude models from Haiku to Mythos, and about their approach to building frontier evals.

EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios

Summary: EVA-Bench Data 2.0 has been released, covering 3 domains with 121 tools and 213 scenarios, for evaluating the tool-use ability of AI agents.

How Endava is redesigning software delivery around AI agents

Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.

Summary: Endava uses AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.

How courts are coping with a flood of AI-generated lawsuits

Most days in her chambers, Judge Maritza Braswell, a federal magistrate judge in Colorado, sifts through stacks of documents written by people without a lawyer. Many of them can’t afford to hire a lawyer, and others have cases too weak or too small to interest one. She reads each one carefully, mind

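Summary: US courts face a flood of AI-generated legal filings. Maritza Braswell, a federal magistrate judge in Colorado, reads stacks of documents from people without lawyers, many of whom cannot afford one, which adds to the courts' review burden.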

Dreaming: Better memory for a more helpful ChatGPT

ChatGPT introduces a new memory system to better remember preferences, keeping context fresh and relevant across conversations.

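Summary: OpenAI introduced a new memory system for ChatGPT, called Dreaming, that remembers user preferences more effectively and keeps context fresh and relevant across conversations.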

[AINews] Reve 2 and Ideogram 4: Layouts in Imagegen

a quiet day.

Summary: Latest AI news: Reve 2 and Ideogram 4 were released with stronger layout control in image generation; the day was otherwise quiet.

Biodefense in the Intelligence Age

An action plan for AI-powered biological resilience

Summary: OpenAI published an action plan describing how AI can strengthen biodefense and build resilience against biological threats.

🔬Scaling Past Informal AI - Carina Hong, Axiom Math

Verified Generation and Compounding Intelligence

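Summary: Carina Hong of Axiom Math discusses verified generation and compounding intelligence, with the aim of scaling AI from informal reasoning to verifiable reasoning.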

Introducing new capabilities to GPT-Rosalind

GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.

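Summary: OpenAI added new capabilities to GPT-Rosalind, improving its biological reasoning, medicinal chemistry, genomics analysis, and experimental workflow abilities for life sciences research.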

Direct Preference Optimization Beyond Chatbots

Summary: The article explores applications of Direct Preference Optimization (DPO) beyond chatbots and its potential in other AI settings.

How Wasmer used Codex to build a Node.js runtime for the edge

See how Wasmer used Codex with GPT-5.5 to build a Node.js runtime for the edge, accelerating development 10x to 20x and shipping in weeks instead of months.

Summary: Wasmer used Codex with GPT-5.5 to build a Node.js runtime for the edge, speeding development 10x to 20x and cutting delivery time from months to weeks.

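『君の公益』 server migration complete

It should be fine now. muyuan.do Don't waste your time here; go do what you should be doing and love the people you should love. 119 posts, 118 participants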

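Three months into AI comic dramas, I just wrapped an S+ overseas werewolf series; here are some takeaways

Hello, fellow L站 members. I have been working in AI comic dramas for three months now, mostly on overseas projects. This month I finished an overseas project that is a collaboration between Yuewen's overseas arm and Douyin; it is currently in review, so I want to share what I have learned about making live-action-style overseas content. Character positioning and blocking control. Emotion references. Tool usage. Visual and camera references. Finally, when you make live-action-style content, I suggest walking through the visuals against your prompts so that you have a detailed picture in your head and can tell whether a shot matches your intent. Editing and music also really matter; those who are able can look up Suno AI music tutorials on Douyin. I have spent the past few days on an advertising project of this kind. Feel free to ask me questions; I generally do not log in to L站 during work hours, but I usually reply after I get home around 9 p.m. And if you are a beginner looking to enter this field now, my advice is to first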

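AI image-to-video public-benefit site launches an LTX-based digital human model

The l0veyou public-benefit site has launched a digital human model that generates 10 seconds per run with very realistic results; before long I will extend it to 20 seconds per video, which will be better. Select an audio file here (uploaded audio must be longer than two seconds, or video generation will fail), then click to upload it to the model, click the plus sign in the lower right to upload a reference image, and enter a prompt such as: "She is talking." One more reminder: AI image generation is temporarily unavailable. I spent all day today trying to fix it without success, so it may be tomorrow at the earliest. I don't know whether I will get any sleep tonight either, because tomorrow I also need to list new models: gemini3.1pro (0.1 LDC per message) and 3D model generation (the parts of a generated 3D model can be separated directly rather than forming one solid block, and the results are very good). 14 posts, 9 participants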

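[GPT Subscription Quota Breakdown] Estimating quota from 9 GPT Pro 20x accounts

At 8:30 this morning Altman reset all accounts. I currently manage 9 Pro 20x accounts, which is just enough to estimate the GPT subscription quota.

Accounts:

Usage data as of the screenshot:

Model    Input          Output        Cache read       Total tokens
GPT5.5   62,178,157     1,759,623     245,985,792      309,923,572
GPT5.4   284,049,752    12,110,928    1,555,673,600    1,851,834,280

Average cache hit rate: 83.2%

Cost data as of the screenshot: GPT5.4: $1,313

Combined weekly-limit usage across the 9 accounts as of the screenshot: 7% + 9% + 8% + 8% + 8% + 7% + 11% + 10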
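The token figures in this post can be sanity-checked with a few lines of Python. The sketch below only re-adds the numbers quoted above. The post does not say how its 83.2% average cache hit rate was calculated, so `cache_share` (cache reads divided by input plus cache reads) is an assumed definition and is not expected to reproduce that figure.

```python
# Token counts copied from the post; the dict layout and key names are illustrative.
usage = {
    "GPT5.5": {"input": 62_178_157, "output": 1_759_623,
               "cache_read": 245_985_792, "total": 309_923_572},
    "GPT5.4": {"input": 284_049_752, "output": 12_110_928,
               "cache_read": 1_555_673_600, "total": 1_851_834_280},
}


def total_matches(row: dict) -> bool:
    """Check that input + output + cache reads equals the stated total."""
    return row["input"] + row["output"] + row["cache_read"] == row["total"]


def cache_share(row: dict) -> float:
    """Assumed definition: share of prompt-side tokens served from cache."""
    return row["cache_read"] / (row["input"] + row["cache_read"])


for model, row in usage.items():
    print(model, "total ok:", total_matches(row), f"cache share: {cache_share(row):.1%}")
```

Both stated totals add up exactly. Under this definition the per-model shares land near, but not exactly on, the post's average, so the author likely used a different denominator or weighting.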

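Ministry of Education: Return admission letters to a single page

Reposted from Xinhua News Agency. Reporters learned from the Ministry of Education that 12.90 million people registered for this year's national college entrance exam (gaokao). On university admissions publicity, the ministry stressed that false advertising and improper promises are strictly prohibited, as is any form of hype around "top gaokao scorers", "high-scoring candidates", or "admission rates". It is pushing universities to return admission letters to "a single page" and to firmly curb extravagant admission letters, freshman gift boxes, and similar unhealthy trends. Source: Xinhua News Agency WeChat official account, compiled from the People's Daily app (reporter: Wu Dan). Does anyone here know what this signifies? 86 posts, 38 participants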

My server is smoking from the load

I figure I need to buy a new server. It is completely over the top. 137 posts, 128 participants

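Ramp report: To cut costs, many US companies buy DeepSeek's official API directly

The June 2026 report from Ramp, a US corporate spend management platform, shows Chinese AI company DeepSeek topping its list of popular software. Although US officials had previously been highly wary of Chinese large models, actual transaction data shows the opposite picture. Ramp analyzed credit card spending records from more than 50,000 businesses on its platform and found that many US companies are not deploying the open-source models locally but are paying directly for DeepSeek's official hosted API service. This means that data from a large number of US companies is being sent directly to, and stored on, servers located in China. The actual flow of money contrasts sharply with the wariness in the US when DeepSeek R1 was first released more than a year earlier. At that time, out of concern over leaks and security, large US companies and government agencies widely restricted the use of Chinese models. However,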

Ask HN: High school student – is learning programming still worthwhile?

As a high school student, I’m trying to figure out what major I’m interested in. About half a year ago, I thought EECS was a great major for some STEM students like me, because I see many of the world's most influential entrepreneurs, such as Elon Musk and Jensen Huang, have built companies aro

Show HN: Cost.dev (YC W21) – making agents cost-aware and cheaper to call

We launched Infracost on HN five years ago (https://news.ycombinator.com/item?id=26064588) where our CLI generated cost estimates for infra-as-code, e.g. "this Terraform PR adds $400/mo". The idea was to shift cloud costs (FinOps) left, so engineers get visibility of co