Daily AI Briefing

2026-06-08 (content retrieved 06/08 17:22)

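AI Agent Skill Researches Topics Across Sources and Generates Summaries

GitHub Trending

An AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and other platforms, then synthesizes a grounded summary. Suited to information gathering and report generation.

Why it's recommended: A highly actionable open-source agent tool that can be used directly for information aggregation and analysis, improving research efficiency.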

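ChatGPT Launches New "Dreaming" Memory System

OpenAI News

OpenAI has introduced a new memory system for ChatGPT that better remembers user preferences across conversations and keeps context continuous.

Why it's recommended: A significant product-level update that improves core memory capability, with a notable effect on user experience.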

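Anthropic Sales Team Rebuilds Its Workflow with Claude Code

Claude Blog

An Anthropic sales representative describes how they used Claude Code to automate team workflows and improve GTM (go-to-market) engineering efficiency.

Why it's recommended: A hands-on case from a frontline team, highly relevant to practitioners who want to improve their work with coding agents.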

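17 Hermes Agent Prompts That "Run While You Sleep"

X creator (AttentionVC)

Shares 17 copy-and-paste prompts that let Hermes Agent carry out tasks unattended, covering scenarios such as content generation and data cleanup.

Why it's recommended: A highly actionable prompt collection that users who want to automate routine tasks can use directly.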

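Anthropic Makes Claude a "Chemist"

Anthropic Research

Anthropic has published research showing Claude's capabilities in chemistry, including molecular analysis and experiment planning, extending what AI can do in scientific research.

Why it's recommended: Frontier research that shows new possibilities for LLMs in specialized scientific fields.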

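Meta AI Support Agent Flaw Leads to Stolen Instagram Accounts

MIT Tech Review AI

Attackers reportedly exploited Meta's AI customer support agent: by simply asking, they got it to link victims' Instagram accounts to email addresses they controlled, and so took over the accounts.

Why it's recommended: Exposes a serious security flaw in a deployed AI agent and is a warning that the industry needs stronger safeguards.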

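Harness Engineering: What Every AI Engineer Needs to Know in 2026

X post (AttentionVC)

The article cites a case in which, in February 2026, a small OpenAI team shipped 1 million lines of production code written by AI agents, while the humans only designed the system the agents worked in.

Why it's recommended: An illustrative industry case that shows the prospects for AI programming at scale.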

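OneDrive Data Will Soon Get Expiration Dates

Hacker News

Microsoft OneDrive has announced that stored data will be given expiration dates. Data past its limit may be cleaned up automatically or have access restricted, which affects the data management strategy of many users.

Why it's recommended: Directly affects how users manage data; readers should adjust their backup strategy soon.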

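Unstable Model Capability Is the Biggest Obstacle to AI Replacing Programmers

V2EX

V2EX users argue that although AI agents can write code, their output quality is erratic, so they cannot be trusted to complete production-grade tasks on their own. This is seen as the main bottleneck to replacing programmers.

Why it's recommended: Reflects a real pain point in the community and is a useful reference for understanding where AI code generation stands in practice.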

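Using a Copyright Template to Bypass GPT's Ethical Restrictions and Get Scraper Code

LinuxDo

A LinuxDo user shares their experience: by modifying an "already authorized" template, they got GPT to write scraper code it had originally refused to produce, prompting discussion of AI safety boundaries.

Why it's recommended: Shows one way AI safety mechanisms may be bypassed, a cautionary example for developers who want to use AI safely.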

mvanhorn/last30days-skill

Python · ★ 32,560 · 🍴 2,687 · 📈 1,111 stars today

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary

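Summary: A skill module that gives AI agents cross-platform research ability. It automatically pulls recent content on any topic from Reddit, X, YouTube, HN, Polymarket, and the web, and generates a grounded summary. Suited to real-time sentiment analysis, market research, and content aggregation.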

opencv/opencv

C++ · ★ 88,282 · 🍴 56,608 · 📈 65 stars today

Open Source Computer Vision Library

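Summary: An open-source computer vision library offering hundreds of image processing and machine learning algorithms, with support for C++, Python, Java, and other languages. Widely used for face recognition, object detection, image segmentation, and video analysis; a standard tool in computer vision.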

Leonxlnx/taste-skill

Shell · ★ 37,741 · 🍴 2,695 · 📈 1,103 stars today

Taste-Skill - gives your AI good taste. stops the AI from generating boring, generic slop

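Summary: A skill that gives AI a sense of taste, using specific mechanisms to stop the model from generating boring, boilerplate content. Suited to writing assistance, creative generation, dialogue systems, and other scenarios that need high-quality output.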

NousResearch/hermes-agent

Python · ★ 186,629 · 🍴 32,100 · 📈 1,112 stars today

The agent that grows with you

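Summary: An AI agent that grows with its user, designed around continual learning and adaptation. Through long-term interaction it gradually matches the user's preferences and needs. Suited to personal assistants, long-term companionship, and customized task execution.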

lfnovo/open-notebook

TypeScript · ★ 27,657 · 🍴 3,129 · 📈 554 stars today

An Open Source implementation of Notebook LM with more flexibility and features

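Summary: An open-source alternative to Notebook LM with more flexible extensibility, supporting multi-document interaction, knowledge integration, and intelligent Q&A. Suited to researchers, students, and knowledge workers who organize material, generate summaries, or build a personal knowledge base.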

yikart/AiToEarn

TypeScript · ★ 19,181 · 🍴 2,949 · 📈 183 stars today

Let's use AI to Earn!

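Summary: A project for monetizing AI, possibly covering automated content creation, data labeling, and content generation. Aimed at individuals or small teams who want to earn income with AI tools.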

aaif-goose/goose

Rust · ★ 47,769 · 🍴 5,033 · 📈 322 stars today

an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM

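Summary: An open-source, extensible AI agent that goes beyond code completion: it can install, execute, edit, and test, and works with any LLM. Aimed at developers for workflow automation, environment management, and software testing.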

Crosstalk-Solutions/project-nomad

TypeScript · ★ 29,913 · 🍴 2,963 · 📈 309 stars today

Project N.O.M.A.D, is a self-contained, offline survival computer packed with critical tools, knowledge, and AI to keep you informed and empowered—anytime, anywhere.

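Summary: A self-contained offline survival computer that bundles critical tools, knowledge, and AI for use without a network. Suited to outdoor expeditions, disaster response, or work in remote areas, providing information access and decision support.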

ggml-org/llama.cpp

C++ · ★ 115,515 · 🍴 19,331 · 📈 158 stars today

LLM inference in C/C++

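Summary: A high-performance LLM inference engine written in C/C++ that runs Llama and other models on CPUs and hybrid architectures. Suited to local deployment, resource-constrained devices, and low-latency inference; a popular choice for individual developers and edge computing.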

RyanCodrai/turbovec

Python · ★ 7,736 · 🍴 738 · 📈 1,554 stars today

A vector index built on TurboQuant, written in Rust with Python bindings

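Summary: A vector index library built on TurboQuant, written in Rust with Python bindings. Focused on efficient vector retrieval and storage, it suits approximate nearest-neighbor search in recommender systems, semantic search, and LLM knowledge bases.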

TapXWorld/ChinaTextbook

Roff · ★ 72,780 · 🍴 16,297 · 📈 350 stars today

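All PDF textbooks for primary, middle, and high school, plus university.

Summary: A collection of PDF textbooks for all subjects from Chinese primary school through university, covering different grades and editions. Serves students, teachers, and self-learners who want electronic access and offline study.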

openai/plugins

JavaScript · ★ 2,157 · 🍴 277 · 📈 262 stars today

OpenAI Plugins

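Summary: OpenAI's official plugin collection for extending models such as ChatGPT with capabilities like web search and code execution. Developers can use plugins to integrate external APIs and tools into AI conversations.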

refactoringhq/tolaria

TypeScript · ★ 13,130 · 🍴 924 · 📈 245 stars today

Desktop app to manage markdown knowledge bases

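Summary: A desktop app for managing Markdown knowledge bases, with note organization, search, and editing. Aimed at knowledge workers and developers who prefer local-first, plain-text formats for personal knowledge management or writing.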

HunxByts/GhostTrack

Python · ★ 13,886 · 🍴 1,850 · 📈 28 stars today

Useful tool to track location or mobile number

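Summary: A tool for tracking a mobile number or location, using public information or technical means to obtain a position. Commonly used for security testing or locating individuals; privacy and legal compliance need attention.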

microsoft/pg_durable

Rust · ★ 1,591 · 🍴 37 · 📈 316 stars today

PostgreSQL in-database durable execution

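Summary: An in-database durable execution engine for PostgreSQL that lets user-defined tasks run reliably and recover automatically. Suited to transactional workflows, scheduled jobs, and data pipelines.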

AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization

👍 22

Despite being a pivotal frontier, interactive world modeling remains underexplored in terms of the versatile controllability required by practical scenarios. To bridge this gap, we present AnchorWorld, a framework that advances egocentric simulation through enhanced interaction integrity and a flexi

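Summary: AnchorWorld proposes a framework that advances embodied egocentric world simulation through view-based evolution customization, making interactive world modeling more flexible.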

MMAE: A Massive Multitask Audio Editing Benchmark

👍 33

We introduce MMAE, a Massive Multitask Audio Editing benchmark, serving as the first comprehensive evaluation testbed designed for general-purpose instruction-based audio editing. Spurred by the shift toward intelligent creation, interactive editing has rapidly expanded from visual domains, pioneere

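Summary: MMAE is a massive multitask audio editing benchmark, presented as the first comprehensive evaluation testbed designed for general-purpose instruction-based audio editing.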

Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

👍 2

Despite advances in 3D scene understanding, existing 3D Large Multimodal Models operate in offline settings, requiring complete scene observations or predefined video clips. In this paper, we present an online 3D vision-language model that enables real-time spatial understanding from streaming video

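Summary: Stream3D-VLM proposes an online 3D vision-language model that uses incremental geometry priors for real-time spatial understanding, unlike offline methods that need complete scene observations.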

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

👍 50

Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. However, they struggle to function as off-the-shelf embedding models, leading to suboptimal performance on massive text embedding benchmarks. In this paper, we identify a potential cause underlyi

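Summary: The paper finds that an LLM's unembedding matrix can act as a feature lens for text embeddings, which helps improve the model's zero-shot performance as an embedding model.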

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

👍 12

Video understanding is being rapidly transformed by multimodal large language models (MLLMs), as research moves from short clips to long, multimodal, and knowledge-intensive video scenarios. These scenarios require models to handle sparse evidence, long-range dependencies, multimodal alignment, and

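Summary: The paper examines how well multimodal LLMs handle human-view video understanding, including sparse evidence and long-range dependencies in long videos.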

Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills

👍 1

LLM-driven software engineering agents have become a central testbed for real-world language-model capability, yet their training remains limited by the availability of high-quality SWE tasks. Existing synthetic data methods typically create tasks through fixed mutation or bug-injection procedures,

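Summary: Socratic-SWE proposes self-evolving coding agents that use trace-derived agent skills to generate high-quality software engineering tasks, addressing the bottleneck of synthetic training data.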

dots.tts Technical Report

👍 10

We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that models speech in a continuous latent space. Compared with existing continuous autoregressive models, our key innovations are threefold. First, we train an AudioVAE with multiple objectives to bui

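Summary: dots.tts is a 2B-parameter continuous autoregressive text-to-speech foundation model that models speech in a continuous latent space. Its innovations include training an AudioVAE with multiple objectives.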

SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations

👍 30

Evaluating LLM mediators remains challenging, as mediation unfolds as a real-time trajectory shaped by disputants' shifting emotions, intentions, and context. Existing testbeds rely on a few expert-authored domains, vary mainly strategic posture, and score every turn against every topic, introducing

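Summary: SoCRATES proposes an automated evaluation method for proactive LLM mediation across domains and socio-cognitive variations, addressing existing testbeds' reliance on a few expert-authored domains.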

LLM Explainability with Counterfactual Chains and Causal Graphs

👍 8

Causal graphs provide a high-level language for making mechanisms transparent. Recent work uses Large Language Models (LLMs) to recover causal graphs of external-world processes. Instead, in this paper, we use causal graphs to model LLM inference itself, providing stakeholders with a transparent vie

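Summary: The paper uses causal graphs to model LLM inference itself and combines them with counterfactual chains for explainability, giving stakeholders a transparent view of the mechanism.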

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

👍 10

While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thought. They often struggle to infer unobserved layouts, maintain cross-view consistency, and reason fr

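Summary: Thinking with Imagination uses world simulators to strengthen the spatial reasoning of vision-language models, so they can infer unobserved layouts and maintain cross-view consistency.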

SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

👍 15

Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term interactions. As these memories grow, they may reinforce one another, diverge across contexts, or directly conflict, making correct assistance depend on memory relations rather than isolated r

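Summary: The SubtleMemory benchmark tests how well long-horizon AI agents discriminate fine-grained relations between memories, which may reinforce, diverge from, or conflict with one another as they accumulate.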

When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents

👍 14

Existing benchmarks evaluate Tool-Integrated Reasoning (TIR) in LLMs on idealized ''happy paths'', largely overlooking real-world tool failures. We introduce ToolMaze, a benchmark for dynamic path discovery and error recovery in TIR agents. To separate systematic replanning from blind trial-and-erro

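Summary: The ToolMaze benchmark tests LLM agents on dynamic replanning and anomaly recovery when tools fail, unlike existing benchmarks that only evaluate idealized "happy paths".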
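
To make the idea of recovering from a tool failure concrete, here is a minimal sketch of ordered fallback with a failure trace. It is not ToolMaze or its evaluation method; `run_with_fallback`, `flaky_search`, and `cached_search` are hypothetical names invented for this illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]


def run_with_fallback(
    query: str, tools: Dict[str, Tool], order: List[str]
) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Try tools in the given order, recording each outcome in a trace."""
    trace: List[Tuple[str, str]] = []
    for name in order:
        try:
            result = tools[name](query)
        except RuntimeError as err:
            # Record the failure and move on to the next tool instead of retrying.
            trace.append((name, f"failed: {err}"))
            continue
        trace.append((name, "ok"))
        return result, trace
    return None, trace


def flaky_search(query: str) -> str:
    """Hypothetical stub tool that always fails, standing in for a broken API."""
    raise RuntimeError("timeout")


def cached_search(query: str) -> str:
    """Hypothetical stub fallback tool that always succeeds."""
    return f"cached result for {query}"
```

Calling `run_with_fallback("q", {"search": flaky_search, "cache": cached_search}, ["search", "cache"])` falls through to the cache after the first tool raises. The trace shows whether the agent moved on to a new tool or blindly repeated a failing one.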

WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark

👍 0

In real-world applications, models are expected to perform reliably across diverse settings. Yet, many existing multimodal benchmarks expand task types without capturing the visual diversity needed to handle open-ended visual inputs. We present WorldBench, a challenging and visually diverse reasonin

OpenSkill: Open-World Self-Evolution for LLM Agents

👍 6

Self-evolving agents requires adaptation after deployment, but existing approaches assume a usable learning loop, such as curated skills, successful trajectories, or verifier signals. Real open-world deployments may provide none of these, offering only a task prompt. In this work, we study open-worl

Regret Minimization with Adaptive Opponents in Repeated Games

👍 1

In this paper, we study regret minimization in repeated games with adaptive opponents who can respond based on histories of play. The standard metric of external regret in online learning is known to fail to capture such adaptivity. To account for players' counterfactual reasoning, we introduce {\tt

AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding

👍 10

Vision-Language-Action (VLA) models leverage the rich world knowledge of pretrained vision-language models (VLMs) to enable instruction-following robotic manipulation. However, the structural mismatch between VLM semantic spaces and embodied control policies often hinders the learning of precise per

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

👍 73

Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA -- costly at repository scale and brittle to evolv

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

👍 1

A situated query like "where is Lin Wei?" often encodes more than its literal content: the user may also want to know whether Lin Wei is free, in a good mood, or worth interrupting now. Standard tool-use agents answer the literal question and stop. AURA inserts an inference step between scene percep

Benchmark Everything Everywhere All at Once

👍 2

Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit measures of performance. However, their construction is labor-intensive and hard to reuse, raising concerns about sustainability and scalability. Moreover, existing benchmarks often quickly

MAOAM: Unified Object and Material Selection with Vision-Language Models

👍 9

Selection is a core operation in interactive image editing. To be practical, a user should be able to specify and disambiguate the desired selection region through either text or click-based interactions, and the system should support selecting not only objects but also other criteria, such as mater

HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems

👍 1

LLM agents are increasingly expected to operate across heterogeneous task regimes that require distinct execution paradigms. This challenges fixed agent systems and motivates system-level meta-adaptation beyond isolated component updates. While existing works have adapted external harness or trained

Parametric Social Identity Injection and Diversification in Public Opinion Simulation

👍 0

Large language models (LLMs) have recently been adopted as synthetic agents for public opinion simulation, offering a promising alternative to costly and slow human surveys. Despite their scalability, current LLM-based simulation methods fail to capture social diversity, producing flattened inter-gr

LLM Anonymization Against Agentic Re-Identification

👍 1

Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry downstream analytic value of the text. Existing defenses either remove explicit identifiers, perturb text

SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces

👍 0

Large language models are increasingly deployed as coding agents, shifting safety from individual responses to action sequences. Existing benchmarks, however, primarily assess whether models refuse unsafe prompts, leaving impacts on stateful workspaces largely unexamined. We present SABER, a benchma

The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models

👍 4

Large Language Models exhibit paradoxical fragility in fundamental arithmetic, implying a disconnect between internal computation and discrete output. By analyzing the residual stream geometry during multi-operand addition, we identify the Iso-Raw-Sum Trajectory (IRST), a geometric structure where r

SIA: Self Improving AI with Harness & Weight Updates

👍 1

Humans are the bottleneck in building and improving AI. Both the models and the agents that wrap them are written, tuned, and corrected by people. The long-horizon goal of an AI that can figure out how to improve itself remains open. Two largely disjoint research lines attack this bottleneck. The ha

Trust Region Q Adjoint Matching

👍 3

Off-policy reinforcement learning of pretrained flow policies remains challenging due to the instability of optimization arising from the multi-step sampling process. Recently, Q-learning with Adjoint Matching (QAM) addressed this issue by reformulating into a memoryless stochastic optimal control (

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

👍 3

Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion, however they produce natural-language critiques, not numerical vectors. Thus, the confli

Context as Topology: Why Your Agent's Memory Forgets, and How Structure Escapes It

@elpresidank · 116 followers · 2.9M views · 543 likes · 35 reposts

Most AI agent memory is built on embeddings. And there's now a proof that this entire class of system is going to forget what you stored in it — and confidently make up things you never stored at all.

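Summary: Cites a proof that embedding-based AI agent memory has a fundamental flaw: it forgets what was stored and confidently makes up things that were never stored. The author analyzes why this is unavoidable from a topological perspective.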

How To Become An AI Engineer in 2026 (Without a CS Degree)

@sairahul1 · 111.8K followers · 710.8K views · 509 likes · 97 reposts

How To Become An AI Engineer in 2026. Without a CS degree. Without a bootcamp. Without knowing what a transformer is today. Here's what nobody tells you: The companies hiring right now don't need

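Summary: A roadmap to becoming an AI engineer in 2026 without a CS degree or a bootcamp. The post claims companies are hiring people who can solve specific problems, not people who can explain how a transformer works.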

How to master Dynamic Workflows in Claude Code: 6 patterns and 14 steps Anthropic engineers actually

@0xCodez · 3.3K followers · 637.2K views · 510 likes · 59 reposts

Most Claude Code users still write their workflows by hand. They chain prompts, copy outputs, paste them into the next prompt, fix what went wrong, repeat. 9 out of 10 builders haven’t tried Dynamic

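Summary: Walks through dynamic workflows in Claude Code: 6 patterns and 14 steps that, per the title, Anthropic engineers actually use. Notes that 9 out of 10 builders still chain prompts by hand and have not tried dynamic workflows.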

What an Enterprise Context Layer Actually Is

@prukalpa · 23.1K followers · 583.2K views · 506 likes · 80 reposts

A field guide to what it is, what it is not, and where it fits in your AI architecture. I have had some version of the same conversation with a CIO almost every day this year. Their team has read

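Summary: A field guide to the enterprise context layer: what it is, what it is not, and where it fits in an AI architecture. The author says they have had some version of this conversation with a CIO almost every day this year.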

Harness Engineering: What Every AI Engineer Needs to Know in 2026

@sairahul1 · 111.8K followers · 546.4K views · 536 likes · 94 reposts

In February 2026, a small OpenAI team shipped 1 million lines of production code. They didn't write a single line by hand. The AI agents wrote it. The humans designed the system that made the agents

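Summary: Presents harness engineering as an essential skill for AI engineers in 2026, citing an OpenAI team whose AI agents wrote 1 million lines of production code while the humans designed the system.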

hacking pewdiepie's AI agent harness using an evil cocomelon website (then helping protect it)

@theonejvo · 22.1K followers · 504.3K views · 861 likes · 1 repost

Over the past year, @pewdiepie, has been turning into one of the most visible champions of private, self-hosted computing, and it has been a genuine pleasure to watch. What began in late 2025 as an

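Summary: The author hacked PewDiePie's AI agent harness using a malicious Cocomelon-themed website, then helped harden it. PewDiePie has become a prominent champion of private, self-hosted computing.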

Generative UI Is the New Frontend

@Saboo_Shubham_ · 116.2K followers · 263.3K views · 517 likes · 74 reposts

The frontend used to be a fixed thing. Designers drew it. Engineers built it. Users got what shipped. That's over. The interfaces shipping in 2026 are drawn partly by the agent itself, in real time,

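Summary: Argues that the fixed frontend, drawn by designers and built by engineers, is over. Interfaces shipping in 2026 are drawn partly by the agent itself in real time, making generative UI the new paradigm.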

How to get 100k YouTube subscribers in 3 hours (The Complete Guide)

@maubaron · 16.9K followers · 233.8K views · 506 likes · 19 reposts

Our YouTube channel has 125k subscribers and we've never made or uploaded a single video ourselves. This is a completely automated system. It is this very same strategy that made us the first app

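Summary: A fully automated YouTube channel reached 125k subscribers without the team making or uploading a single video themselves. The author says the same strategy "made us the first app" (truncated in the source).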

Stop building Foxconn factories for your agents

@garrytan · 853.3K followers · 180.6K views · 503 likes · 43 reposts

In January I got back into coding and I built Garry's List. Over five hundred thousand lines of Rails and the tests to police it. I was proud of it. I shouldn't have been. The thing worth being proud

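Summary: After building Garry's List, over 500,000 lines of Rails plus the tests to police it, the author reflects that builders should "stop building Foxconn factories" for their agents. The emphasis is on letting agents operate autonomously rather than reinventing the wheel.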

Building cloud agent infrastructure: what's different, and what we learned

@intuitiveml · 6.4K followers · 171.3K views · 524 likes · 70 reposts

Most agent frameworks today assume a desktop. One user, one machine, one process. The agent runs while the laptop is open, writes to a local filesystem, holds API keys in environment variables, and

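Summary: Lessons from building cloud agent infrastructure: most existing frameworks assume a local desktop (one user, one machine, one process), while cloud environments need a different architecture.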

A guide to /goal 🥅

@dkundel · 19.3K followers · 116.9K views · 523 likes · 40 reposts

We launched the goal mode (or /goal) as a way to help you have Codex drive towards a concrete outcome. When you set a goal Codex will continue to work until the goal is achieved, whether that takes

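Summary: A guide to Codex's /goal mode: once a goal is set, Codex keeps working until it is achieved, however long that takes. Suited to driving toward a concrete outcome.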

State of Memory in Agent Harness

@mem0ai · 17.6K followers · 82.8K views · 520 likes · 60 reposts

Agent harnesses are where AI software actually runs. Cursor, Devin, Claude Code, Codex: these environments handle context, orchestrate tools, coordinate agents, and increasingly, manage memory. The

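Summary: Surveys the state of memory in agent harnesses such as Cursor, Devin, Claude Code, and Codex. These environments handle context, orchestrate tools, and increasingly manage memory.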

A harness for every task: dynamic workflows in Claude Code

@trq212 · 263.1K followers · 75.7K views · 542 likes · 36 reposts

Last week, we released dynamic workflows in Claude Code. Claude can now write its own harness on the fly, custom-built for the task at hand. While the default Claude Code harness is built for coding,

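Summary: Announces dynamic workflows in Claude Code: Claude can now write its own harness on the fly, custom-built for the task at hand. The default harness is built for coding; the new feature extends it to other kinds of tasks.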

A Functional Taxonomy of World Models

@drfeifei · 738.0K followers · 72.2K views · 699 likes · 144 reposts

“The world is everything that is the case.” — Ludwig Wittgenstein, Tractatus Logico-Philosophicus, 1921 The world is not made of words. In an earlier essay, we argued that spatial intelligence is AI’s

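Summary: Proposes a functional taxonomy of world models. It opens with a Wittgenstein quote, stresses spatial intelligence as a gap in current AI, and builds a systematic classification framework.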

How to Build a Custom Agent Harness

@sydneyrunkle · 7.5K followers · 69.5K views · 511 likes · 74 reposts

Building useful agents is largely about customization: connecting your agent to the right context, data, and environment(s) for the task at hand. At its core, an agent is a model calling tools in a

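Summary: A tutorial on building a custom agent harness. The core is connecting the agent to the right context, data, and environment for the task; at heart, an agent is a model calling tools.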

some notes on getting into frontier ai labs

@itsreallyvivek · 3.6K followers · 65.8K views · 521 likes · 28 reposts

A few days ago I wrote that getting into a frontier AI lab mostly comes down to two things: proven research and trench engineering. The more I think about it, the less these feel like separate skills.

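Summary: Notes on getting into frontier AI labs: it mostly comes down to "proven research" and "trench engineering", which the author increasingly sees as a single skill.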

Harness Engineering: What Every AI Engineer Needs to Know in 2026

@sairahul1 · 111.8K followers · 546.4K views · 7-day impressions 546.4K

Summary: Presents harness engineering as an essential skill for AI engineers in 2026, citing an OpenAI team whose AI agents wrote 1 million lines of production code while the humans designed the system.

How to get 100k YouTube subscribers in 3 hours (The Complete Guide)

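@maubaron · 16.9K followers · 233.8K views · 7-day impressions 233.8K

Summary: A fully automated YouTube channel reached 125k subscribers without the team making or uploading a single video themselves. The author says the same strategy "made us the first app" (truncated in the source).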

Amazing Digital Dentures (a failed project)

Summary: The Hugging Face blog published a retrospective on a project called "Amazing Digital Dentures", which ultimately failed.

[AINews] not much happened today

a quiet day of RSI.

Summary: According to Latent Space's AINews, it was a quiet day in AI, with only RSI-related news.

How to Stop Shipping Low-Quality RL Environments (with Examples)

Your broken harness is actively making the model worse. Here's what I keep seeing after years of eyeballing trajectories, and what you need to fix.

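Summary: A Latent Space article on how to stop shipping low-quality reinforcement learning environments. It argues that a broken harness actively makes the model worse and describes what to fix.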

The Meta hack shows there’s more to AI security than Mythos

On June 5, 404 Media reported that attackers had been using Meta’s AI customer support agent to steal Instagram accounts. Their approach was simple: They asked the agent to link the accounts to email addresses that they controlled, and the agent complied. One attacker broke into the dormant Obama Wh

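Summary: MIT Tech Review, citing a June 5 report by 404 Media, says attackers used Meta's AI customer support agent to steal Instagram accounts. They asked the agent to link accounts to email addresses they controlled, and the agent complied.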
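
The failure described here, an agent changing an account's email simply because it was asked, is the kind of action usually gated behind proof of ownership. The following is a minimal sketch of such a gate, not a description of Meta's system; `ToolRequest`, `SENSITIVE_ACTIONS`, `authorize`, and `run_tool` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass

# Hypothetical set of tool names treated as sensitive; not taken from any real system.
SENSITIVE_ACTIONS = {"link_email", "reset_password"}


@dataclass
class ToolRequest:
    action: str               # tool the agent wants to call
    account_id: str           # account the tool would modify
    requester_verified: bool  # True only if ownership was proven out of band


def authorize(request: ToolRequest) -> bool:
    """Allow a tool call unless it is sensitive and the requester is unverified."""
    if request.action in SENSITIVE_ACTIONS and not request.requester_verified:
        return False
    return True


def run_tool(request: ToolRequest) -> str:
    """Run the (stubbed) tool only after the policy check passes."""
    if not authorize(request):
        return "denied: ownership verification required"
    return f"ok: {request.action} executed for {request.account_id}"
```

The design choice is that the check lives in code outside the model, so a persuasive prompt alone cannot flip `requester_verified`. How verification is actually established is left out of the sketch.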

not much happened today

**Anthropic's Mythos/Opus cycle** sparked mixed reactions with praise for **Claude Mythos**'s one-shot workflows and concerns over **Opus 4.8** benchmark regressions. **Opus 4.7** showed strong chemistry task performance, "making Claude a chemist." **Sakana AI** launched an **RSI Lab** focusing on r

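Summary: Smol AI News reports mixed reactions to Anthropic's Mythos/Opus cycle: Claude Mythos was praised for its one-shot workflows, while Opus 4.8 drew concerns over benchmark regressions. Opus 4.7 showed strong performance on chemistry tasks, and Sakana AI launched an RSI Lab.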

The Claude Cowork product guide

Summary: The Claude blog has published the Claude Cowork product guide.

Jun 5, 2026 · Science · Making Claude a chemist

Summary: Anthropic's research team published a report showing how Claude can be given chemist-level capabilities.

Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs

We talk with the VendingBench authors on evaling Claudes from Haiku to Mythos, and how they build leading, and lasting, frontier evals from scratch.

Summary: Latent Space talks with Lukas Petersson and Axel Backlund of Andon Labs about VendingBench, covering evaluation of Claude models from Haiku to Mythos and how to build frontier evals from scratch.

How Endava is redesigning software delivery around AI agents

Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.

Summary: OpenAI reports that Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture.

How courts are coping with a flood of AI-generated lawsuits

Most days in her chambers, Judge Maritza Braswell, a federal magistrate judge in Colorado, sifts through stacks of documents written by people without a lawyer. Many of them can’t afford to hire a lawyer, and others have cases too weak or too small to interest one. She reads each one carefully, mind

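Summary: MIT Tech Review reports that courts, including that of federal magistrate judge Maritza Braswell, are facing a flood of AI-generated legal filings, many submitted by litigants who cannot afford a lawyer.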

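It's back, so don't waste it: the check-in and star-upgrade site

Go see whether it feels smoother and more comfortable. However, the database used to be SQLite and has now been switched to PostgreSQL, so, ahem, some data was lost. I'm not sure whether the timer will reset for users going from two stars to three, lol. Special thanks: @ouyangqiqi. 107 posts · 102 participants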

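Finally made it in, thanks to direct entry with a 5-year-old GitHub account

I browse L站 every day while slacking off. I had written a dozen or so application essays (trying different styles, I even pretended to be a college student), each one heartfelt and handwritten with no AI polishing. At one point I wondered whether my account had been blacklisted. I ended up trying four email addresses and still didn't get through. I had given up, but then this perk came along and I registered right away. 114 posts · 103 participants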

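[西瓜狗 Public Benefit] Redemption has ended

This post uses the community's public-benefit promotion and meets the promotion requirements. I declare and abide by the following community requirements: My project is free to use, with no paid parts (disguised charges or sponsorship): Yes. My post carries the #公益推广 tag: Yes. My project is a personal project unrelated to any company or commercial entity: Yes. My project does not funnel users to QQ, TG, or other groups: Yes. My project does not funnel traffic to websites beyond what operation requires: Yes. My project does not promote for others or use AFF links: Yes. My project has no associated commercial project: Yes. My site has login and is integrated with LINUX DO Connect: Yes. Screenshots of the AI-generated or AI-polished parts of my project description have been posted: Yes. I promise the above choices are permanently valid and accept supervision from the community and fellow members: Yes. Absolute madness. Some people say this public-benefit project of mine, once this wave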

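[甘草铺 Public-Benefit Site] Notice: VIP subscription cancelled

Everyone, because of my limited personal bandwidth I can't guarantee an SLA, and some members have complained about this in the comments, so I'm suspending VIP subscriptions for now. To be clear, the VIP subscription only raises the RPM limit; it is not a top-up. Since I can't guarantee an SLA it has to go, but existing subscribers are not affected. Honestly, normal use doesn't need VIP at all; the regular group's RPM limit of 10 should be enough. https://shop.aini8.com The promotion is not because of the bug team, by the way. It has been running for several days; the reason is that we are doing an architecture upgrade, and the promotion was already in place before the upgrade. Please read the shop's product descriptions carefully before buying. 44 posts · 33 participants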

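My child has essentially been diagnosed with autism, my midlife crisis has hit, and life feels hopeless. Asking for advice on how to get a specialist appointment through 114; scalpers charge 2,800 per slot.

Sigh, it feels like the sky has fallen. I'm 36 and my midlife career crisis has just hit; I could lose my job at any time. My child is 1 year and 11 months old, with suspected severe autism. Typical symptoms: poor response to name, doesn't seek people out, doesn't play with other children, hand flapping, repetitive speech, emotional meltdowns at any moment, spinning, doesn't understand instructions, and can't manage gross or fine motor skills. Doesn't know how to play with any toy: only bites it, touches it a few times, and throws it away. Children younger than mine can already do everything, and next to peers the gap feels crushing. My wife was also laid off in March this year. A family that was thriving last year fell apart in the blink of an eye. I want to take my child to be seen, and everyone online recommends Guo Yanqing at Peking University Sixth Hospital. I tried for a week and couldn't get an appointment, and on a certain resale platform a slot costs 2,800, which I really can't afford. Does anyone have a good way to book? Please share, I'd be deeply grateful. Edit: thank you all for the comfort and help. I've now placed a 2,800 order on Xianyu for someone to grab a slot, but it isn't guaranteed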

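Fate really is a wonderful thing

I drop by L站 every day, and today I finally reached level 3. Over this month or so I've learned a great deal. I truly admire everyone's selfless sharing of experience and techniques, and I'm grateful for the answers to all kinds of questions. 33 posts · 31 participants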

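My first time being this happy because of vibe coding, and I really want to share the joy with everyone (it's late, I'm excited, and this got long; sorry if it's hard on your eyes QAQ)

I don't have a computing background. My undergraduate major was arts management (an arts major), but after graduating I ended up, by a twist of fate, working at an AI company. Before graduating I was really into AIRP. Around early 2025, I badly wanted a bot on mobile QQ that could chat with me anytime, and then I came across an astrbot video on Bilibili, and the gears of fate began to turn... Before that, I want to talk about my past two years. They were very ordinary and nobody really listens to me, but I'm so happy tonight that I have to write something down. When I was in my third and fourth years of university it was still the era of ERNIE Bot (文心一言) (2024), and domestic AI models were pretty dumb. I didn't like the jobs that matched my major and wasn't following AI much, but an AI-related job found me, so of course I had to take it. Here's how it went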

Show HN: Lathe – Use LLMs to learn a new domain, not skip past it

Hey HN! Lathe is an experiment in using LLMs to teach me something new, instead of doing the work for me. It generates a hands-on, source-backed tutorial for any technical topic you want to learn. Then you work through it yourself by reading and typing the code by hand (gasp) in a local UI built for

Show HN: I Derived a Pancake

After 25 years of making other people's pancake recipes - always yearning for more tang, more fluff, and more predictability - I decided to derive the pancake recipe from the chemistry. You mark checkboxes for what you have on hand (ricotta, sour cream, kefir, buttermilk, yogurt, cottage cheese,

01

Model Releases and Updates

Model Releases · 44 articles

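NVIDIA Releases Nemotron 3.5 Content Safety Model

Official · Hugging Face Blog

NVIDIA has released the Nemotron 3.5 content safety model, which offers customizable multimodal safety for global enterprise AI deployments. It is meant to help enterprises filter harmful content at inference time, supports multimodal input such as text and images, and targets compliance and content moderation needs across regions.

Safety · Multimodal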

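dots.tts Introduces a 2B-Parameter Speech Foundation Model

Official · HuggingFace Trending Papers

The dots.tts technical report presents a 2B-parameter continuous autoregressive text-to-speech foundation model. It models speech in a continuous latent space, and its innovations include training an AudioVAE with multiple objectives.

Text-to-Speech · Foundation Model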

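Hermes Agent Super App, and DeepSeek v4 Nears Opus 4.8

Expert Blog · Riley Brown (YouTube)

Hermes Agent has released a new kind of super app, and DeepSeek v4's performance is reported to be approaching that of Anthropic's Opus 4.8. This suggests open-source models are quickly catching up with closed-source leaders in reasoning and task completion.

LLM · Open Source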

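Claude Opus 4.8 Markedly Improves on Honesty

Expert Blog · Two Minute Papers

Two Minute Papers reports that Claude Opus 4.8 has improved substantially on honesty and is less prone than earlier models to generating false information, a notable step for LLM factuality and trustworthiness.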

LLM · Model Update

02

Product Releases and Updates

Product · 44 articles

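Claude Code Releases Dynamic Workflows

X KOL · X post (AttentionVC)

Claude Code has released dynamic workflows: Claude can now write its own harness on the fly, custom-built for the task at hand. The default harness is built for coding, and the new feature extends it to other tasks. A separate X post by @0xCodez claims that 9 out of 10 builders still chain prompts by hand.

Product Release · Workflow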

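ChatGPT and Codex Are Merging, and It Will Change Everything

Expert Blog · Riley Brown (YouTube)

Riley Brown reports that ChatGPT and Codex are being merged, which he expects to change how people interact with AI for coding and conversation. The merged product could combine the convenience of conversational AI with code execution.

LLM · Product Update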

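Claude Publishes the Cowork Product Guide

Official · Claude Blog

The Claude blog has published "The Claude Cowork Product Guide", which covers how to use Claude Cowork as a collaborative AI tool, its main features, and its use cases, with the aim of helping teams work with AI more effectively.

Product Guide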

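Open-Source Notebook LM Alternative Offers Flexible Extension

Open-Source Project · GitHub Trending

The open-source project open-notebook is an alternative to Notebook LM with more flexible extensibility, supporting multi-document interaction, knowledge integration, and intelligent Q&A. Suited to researchers, students, and knowledge workers who organize material, generate summaries, or build a personal knowledge base.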

Knowledge Management · Open-Source Alternative · AI Notes

03

Industry News

Industry · 44 articles

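Endava Redesigns Software Delivery Around AI Agents

Official · OpenAI News

OpenAI reports that Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture. The case shows how an enterprise can systematically rework its software engineering process with AI.

AI Agents · Enterprise Applications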

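Meta's Support AI Exploited to Steal Instagram Accounts

General News · MIT Tech Review AI

MIT Tech Review reports that attackers used Meta's AI customer support agent to steal Instagram accounts. They asked the agent to link accounts to email addresses they controlled, and the agent complied. This points to a serious security flaw in the AI support system.

Security · AI Vulnerability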

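Courts Face a Flood of AI-Generated Filings

General News · MIT Tech Review AI

MIT Tech Review reports that a federal magistrate judge has found courts facing a large volume of AI-generated legal filings, many submitted by litigants who cannot afford a lawyer. AI lowers the barrier to producing legal documents but raises problems of quality and authenticity.

Legal · AI Misuse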

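Anthropic's Mythos Workflows Praised, Opus 4.8 Benchmarks Regress

General News · Smol AI News

Smol AI News reports mixed reactions to Anthropic's Mythos/Opus cycle. Claude Mythos was praised for its one-shot workflows, while Opus 4.8 drew concerns over benchmark regressions. Opus 4.7 showed strong performance on chemistry tasks, and Sakana AI launched an RSI Lab.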

LLM · Benchmarks

04

Tips and Takes

Tips & Takes · 44 articles

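How to Become an AI Engineer in 2026 Without a CS Degree

X KOL · X post (AttentionVC)

On X, @sairahul1 shares a roadmap to becoming an AI engineer in 2026, stressing that no CS degree or bootcamp is needed. Companies are hiring people who can solve specific problems, not theorists who can explain how a transformer works. The key is building project experience that ships.

Tutorial · Career Advice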

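Claude Code Dynamic Workflows: 6 Patterns and 14 Steps Explained

X KOL · X post (AttentionVC)

On X, 0xCodez walks through dynamic workflows in Claude Code: 6 patterns and 14 steps that, per the post, Anthropic engineers use in practice. It notes that 9 out of 10 builders still chain prompts by hand and have not tried dynamic workflows, and recommends switching early to improve efficiency.

Workflow · Tutorial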

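Embedding-Based AI Memory Systems Have a Fundamental Flaw

X KOL · X post (AttentionVC)

In an X post surfaced via AttentionVC, @elpresidank argues that today's embedding-based AI agent memory has a fundamental flaw: it forgets what was stored and confidently makes up things that were never stored. The author analyzes why this is unavoidable from a topological perspective and urges developers to rethink memory architecture.

Opinion · Technical Discussion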

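One-Stop Roundup of Codex Skill Resources

X KOL · X creator (AttentionVC)

X creator AI_A has compiled Codex Skill resources from across the web, covering the Skills most worth installing, how to install them, and the repositories that host them. It is a practical guide for developers who want to get started quickly with Codex's extension capabilities.

Tutorial · Resource Roundup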