Daily AI Briefing

2026-06-04 (content fetched 06/04 06:58)

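OpenAI Adds New Capabilities to GPT-Rosalind to Strengthen Life Sciences Research

OpenAI News

GPT-Rosalind gains biological reasoning, medicinal chemistry, genomics analysis, and experimental workflow capabilities, aiming to accelerate life sciences research and lower the barrier to R&D.

Why it matters: OpenAI is bringing general-purpose model capabilities into a vertical life sciences product, which could change how drug discovery and genomics research are done. Worth watching.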

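Anthropic Shares Lessons on Running an AI-Native Engineering Organization

Claude Blog

Anthropic's engineering team published a blog post detailing how to build an engineering organization with AI at its core, covering tool selection, workflow design, and team collaboration patterns.

Why it matters: Directly useful for engineering managers and team leads, who can learn from the organizational practices of a frontline AI company.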

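Anthropic Describes Agent Controls Across Claude Products

Anthropic Engineering

Anthropic engineers explain how they build safety controls for agents across claude.ai, Claude Code, Cowork, and other products, limiting an agent's blast radius.

Why it matters: Agent safety is a hot topic right now, and Anthropic's engineering approach is highly instructive for developers building agent systems.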

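Headroom: Compress LLM Inputs to Use 60-95% Fewer Tokens

GitHub Trending

The open-source project Headroom compresses tool outputs, logs, files, and RAG chunks, cutting token usage by 60-95% while preserving answer quality. It also offers proxy and MCP server modes.

Why it matters: Practical and usable right away, it can significantly lower LLM costs. Recommended for every developer and company using LLM APIs.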
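Headroom's actual compression techniques and API are not described here. As a rough, hypothetical illustration of why repetitive tool output compresses so well, the sketch below collapses runs of identical consecutive log lines into a single line with a repeat count. `compress_log` and `approx_tokens` are invented names, not Headroom functions, and the word count is only a crude stand-in for a real tokenizer.

```python
def approx_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words, NOT a real model tokenizer
    return len(text.split())


def compress_log(text: str) -> str:
    """Collapse runs of identical consecutive lines into one line plus a count.

    Hypothetical helper for illustration only; not part of Headroom.
    """
    out = []
    prev = None
    run = 0

    def flush():
        # Emit the pending line, tagged with its repeat count if it repeated
        if prev is not None:
            out.append(prev if run == 1 else f"{prev}  [x{run}]")

    for line in text.splitlines():
        if line == prev:
            run += 1
            continue
        flush()
        prev, run = line, 1
    flush()
    return "\n".join(out)


log = "\n".join(["INFO start"] + ["WARN retrying connection"] * 50 + ["INFO done"])
print(compress_log(log))
print(approx_tokens(log), "->", approx_tokens(compress_log(log)))
```

Run-length collapsing is lossless for the line content itself; real tools presumably combine several such reductions, but that is an assumption.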

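Codex Free Community Relay Site Finishes Time-to-First-Token Optimization

LinuxDo

The free community Codex relay site has been tuned to squeeze out maximum performance; time to first token has improved substantially, and community feedback is positive.

Why it matters: A practical update for developers using Codex or similar AI coding tools, who can try the optimized service.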

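Google: Open-Source JPEG XL Image Coding Experiments and Their Evolution

Hacker News

A Google blog post looks back on the open-source experimentation behind the JPEG XL coding standard, showcasing community-driven exploration of future image coding technology.

Why it matters: Helps practitioners in image coding, storage, and transmission understand a next-generation standard and broaden their technical horizons.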

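Direct Preference Optimization Beyond Chat

Hugging Face Blog

The post explores extending Direct Preference Optimization (DPO) beyond chat to areas such as code generation and image understanding, including multimodal scenarios.

Why it matters: DPO researchers and model training engineers can gain new perspectives on broader applications.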
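The blog's exact setup is not given here, but every DPO variant builds on the same per-pair objective: the loss is $-\log\sigma\big(\beta[(\log\pi(y_w\mid x)-\log\pi_{\text{ref}}(y_w\mid x))-(\log\pi(y_l\mid x)-\log\pi_{\text{ref}}(y_l\mid x))]\big)$ for a preferred response $y_w$ and a rejected one $y_l$. The minimal sketch below computes it from summed log-probabilities; the function name and the `beta=0.1` default are illustrative choices, not taken from the post. Moving from chat to code or images changes how the preference pairs are built, not this formula.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each *_logp is the summed log-probability of a full response under the
    trainable policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to reference: margin 0, loss log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

When the policy raises the chosen response's likelihood relative to the reference, the margin grows and the loss falls below log 2.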

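The Economics of Social Scientists Using Coding Agents

Anthropic Research

Anthropic released economic research analyzing the effectiveness, costs, and workflow changes when social science researchers use coding agents.

Why it matters: An economic analysis of agents deployed in a professional domain, useful for researchers and product managers assessing return on investment.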

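Research on Stochastic KV Cache Eviction for Reasoning Models

HuggingFace Trending Papers

The paper proposes a value-aware stochastic KV cache eviction method that targets the memory bottleneck caused by reasoning models' long chains of thought, reducing resource use while maintaining accuracy.

Why it matters: Reasoning models are expensive to deploy, and this work offers a new angle on KV cache optimization. Worth following for AI systems researchers.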
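The paper's exact scoring and sampling rules are not given in this briefing. The sketch below only illustrates the two ideas in its title under stated assumptions: eviction is "value-aware" because each cached position is weighted by accumulated attention score times the L2 norm of its value vector (an assumed score), and it is "stochastic" because surviving positions are drawn by weighted sampling without replacement instead of a hard top-k. The protected recent window, all names, and the defaults are hypothetical.

```python
import random


def select_kv_to_keep(attn_scores, value_norms, budget, recent_window=4, seed=0):
    """Pick which KV-cache positions survive eviction (hypothetical scheme).

    - The most recent positions are always kept.
    - Remaining slots are filled by weighted sampling without replacement,
      with weight = accumulated attention score * value-vector L2 norm.
    Returns the kept positions in ascending order.
    """
    n = len(attn_scores)
    if budget >= n:
        return list(range(n))
    rng = random.Random(seed)
    protected = min(recent_window, budget)
    recent = set(range(n - protected, n))
    slots = budget - len(recent)
    keyed = []
    for i in range(n - protected):
        # Small epsilon keeps zero-weight positions eligible
        weight = attn_scores[i] * value_norms[i] + 1e-12
        # Efraimidis-Spirakis key: the top-k keys form a weighted sample w/o replacement
        keyed.append((rng.random() ** (1.0 / weight), i))
    keyed.sort(reverse=True)
    sampled = {i for _, i in keyed[:slots]}
    return sorted(recent | sampled)


scores = [1.0] + [0.0] * 9
norms = [100.0] + [1.0] * 9
print(select_kv_to_keep(scores, norms, budget=6))
```

Sampling rather than greedy top-k gives low-scoring positions a nonzero chance to survive, which is one plausible way to hedge against misjudged importance; whether the paper uses this rationale is not stated here.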

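A Functional Taxonomy of World Models

X post (AttentionVC)

Fei-Fei Li and co-authors published an article on a functional taxonomy of world models, laying out a framework for spatial intelligence and world models from both philosophical and engineering perspectives.

Why it matters: Provides a taxonomic reference for researchers in spatial intelligence and world models and broadens the theoretical picture.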

chopratejas/headroom

Python · ★ 9,524 · 🍴 629 · 📈 3,528 stars today

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

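Summary: Headroom is a library for compressing tool outputs, logs, files, and RAG chunks before they reach the LLM, cutting token counts by 60-95% while keeping answers unchanged. It can be used as a library, a proxy, or an MCP server, and suits developers who want to lower LLM call costs.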

affaan-m/ECC

JavaScript · ★ 205,639 · 🍴 31,573 · 📈 2,147 stars today

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

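Summary: ECC is an agent performance optimization system that provides skills, instincts, memory, security, and research-first development for coding assistants such as Claude Code, Codex, Opencode, and Cursor. Suited to developers who want to make AI coding agents more efficient and secure.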

aquasecurity/trivy

Go · ★ 35,376 · 🍴 413 · 📈 26 stars today

Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more

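Summary: Trivy is an open-source security scanner that finds vulnerabilities, misconfigurations, and secrets, and produces SBOMs, across containers, Kubernetes, code repositories, and cloud environments. Suited to DevOps and security teams for continuous integration and infrastructure security audits.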

NousResearch/hermes-agent

Python · ★ 179,015 · 🍴 30,668 · 📈 1,736 stars today

The agent that grows with you

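Summary: Hermes Agent is an adaptive agent that grows with its user, focusing on continual learning and personalized interaction. Suited to AI assistant scenarios that call for long-term companionship or gradual adaptation to personal habits.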

microsoft/markitdown

Python · ★ 142,793 · 🍴 9,756 · 📈 2,006 stars today

Python tool for converting files and office documents to Markdown.

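Summary: MarkItDown is Microsoft's open-source Python tool for converting many kinds of files, including Office documents, to Markdown. Suited to developers who need to structure document content for LLM processing or archiving.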

nesquena/hermes-webui

Python · ★ 13,076 · 🍴 1,590 · 📈 734 stars today

Hermes WebUI: The best way to use Hermes Agent from the web or from your phone!

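Summary: Hermes WebUI is a web and mobile interface for Hermes Agent that makes interaction convenient. Suited to users who run Hermes Agent remotely or from a phone without relying on the command line.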

D4Vinci/Scrapling

Python · ★ 60,175 · 🍴 5,801 · 📈 1,078 stars today

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

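Summary: Scrapling is an adaptive web scraping framework that covers everything from a single request to a full-scale crawl. It offers smart parsing and anti-bot evasion, and suits developers who need stable, large-scale collection of website data.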

opendataloader-project/opendataloader-pdf

Java · ★ 23,221 · 🍴 2,177 · 📈 573 stars today

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

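Summary: OpenDataLoader-PDF is an open-source PDF parser designed for AI-ready data; it automatically extracts PDF content and improves accessibility. Suited to teams that need to extract structured data from large volumes of PDFs for model training.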

odoo/odoo

Python · ★ 51,905 · 🍴 32,642 · 📈 29 stars today

Odoo. Open Source Apps To Grow Your Business.

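Summary: Odoo is an open-source suite of business applications covering CRM, e-commerce, accounting, inventory, HR, and more, which can be customized and deployed quickly. Suited to small and medium businesses and developers building an integrated business management system.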

Open-LLM-VTuber/Open-LLM-VTuber

Python · ★ 8,913 · 🍴 1,103 · 📈 702 stars today

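Talk to any LLM with hands-free voice interaction, voice interruption, and Live2D talking face running locally across platforms

Summary: Open-LLM-VTuber is a cross-platform open-source project for hands-free voice conversation with any LLM, with voice interruption and a Live2D avatar, all running locally. Suited to users who want to build a personal AI VTuber or voice assistant.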

jwasham/coding-interview-university

★ 348,969 · 🍴 83,153 · 📈 459 stars today

A complete computer science study plan to become a software engineer.

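Summary: Coding Interview University is a complete computer science self-study plan that helps learners systematically master core topics such as algorithms and data structures in order to pass interviews at top tech companies. Suited to developers preparing for software engineering interviews.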

lyogavin/airllm

Jupyter Notebook · ★ 18,858 · 🍴 2,068 · 📈 208 stars today

AirLLM 70B inference with single 4GB GPU

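Summary: AirLLM runs inference for 70B-parameter models on a single GPU with only 4GB of VRAM through efficient memory management and compute optimization. Suited to developers with limited hardware who want to run large models locally.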

supermemoryai/supermemory

TypeScript · ★ 25,133 · 🍴 2,206 · 📈 601 stars today

Memory engine and app that is extremely fast, scalable. The Memory API for the AI era.

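Summary: Supermemory is an extremely fast, scalable memory engine and app that provides a Memory API for the AI era. Suited to developers who need to add long-term memory to chatbots or AI assistants.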

HKUDS/Vibe-Trading

Python · ★ 9,861 · 🍴 1,998 · 📈 221 stars today

"Vibe-Trading: Your Personal Trading Agent"

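Summary: Vibe-Trading is a personal trading assistant that supports trading decisions with technical analysis and sentiment signals. Suited to investors who want AI assistance with stock or cryptocurrency trading.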

Value-Aware Stochastic KV Cache Eviction for Reasoning Models

👍 6

Reasoning models improve accuracy through extended chains of thought, but their long outputs create a memory and compute bottleneck. KV cache eviction methods reduce this cost by evicting unimportant key-value pairs from the cache, yet they often yield worse accuracy than selection-based sparse atte

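Summary: Reasoning models gain accuracy from long chains of thought, but long outputs create memory and compute bottlenecks. The value-aware stochastic KV cache eviction method shrinks the cache while retaining key information and is reported to outperform conventional methods.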

World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning

👍 21

World models and multimodal large language models (MLLMs) provide complementary capabilities for predicting future outcomes from static visual observations. World models can generate concrete visual rollouts of possible futures, while MLLMs can reason abstractly over questions, goals, and rules. How

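Summary: World models and multimodal LLMs are complementary: world models can generate concrete visual rollouts, while multimodal LLMs can reason abstractly. The work combines the two to improve prediction of future outcomes from static visual observations.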

Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

👍 10

Test-time scaling improves the reasoning performance of large language models but incurs substantial cost in both total computation and latency. Existing adaptive sampling methods partially mitigate this issue by dynamically deciding when to stop sampling, yet they typically rely on heuristic rules

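Summary: Test-time scaling improves LLM reasoning but adds computation and latency. The work proposes a small reinforcement-learning controller that guides adaptive sampling, dynamically deciding when to stop sampling to reduce the cost of test-time scaling.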
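The abstract notes that existing adaptive sampling methods decide when to stop using heuristic rules, which this paper replaces with a learned RL controller. For contrast, here is one such heuristic: draw answers until the leading answer's vote share reaches a fixed threshold. This is a hypothetical baseline, not the paper's controller; `sample_fn`, the thresholds, and the defaults are illustrative assumptions.

```python
from collections import Counter


def adaptive_majority_vote(sample_fn, max_samples=16, min_samples=3, stop_ratio=0.75):
    """Sample answers until the leading answer's vote share reaches stop_ratio.

    A fixed, hand-tuned stopping heuristic; returns (answer, samples_used).
    """
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_fn()] += 1
        answer, votes = counts.most_common(1)[0]
        if n >= min_samples and votes / n >= stop_ratio:
            return answer, n
    # Budget exhausted: plain majority (ties go to the first-seen answer)
    return counts.most_common(1)[0][0], max_samples


# A model that keeps answering "42" stops after min_samples draws
print(adaptive_majority_vote(iter(["42"] * 16).__next__))
```

The weakness an RL controller targets is visible here: `stop_ratio` and `min_samples` are fixed for every question, regardless of how hard it is.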

Conditional Hypothesis Generation for LLM-Based Text Analysis with Researcher-Specified Covariates

👍 4

A core goal of computational social science is to discover interpretable differences in how language varies across outcomes of interest, such as political affiliation or instructional quality. Recent LLM-based hypothesis generation methods describe such differences in natural language, but select fo

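Summary: A core goal of computational social science is to discover how language varies with outcome variables such as political affiliation. The work proposes conditional hypothesis generation, using LLMs to produce interpretable natural-language hypotheses conditioned on researcher-specified covariates.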

Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

👍 3

Real-time vision demands models that are accurate, efficient, and simple to deploy across diverse hardware. The YOLO family has become widely deployed for this reason, yet most YOLO detectors still rely on non-maximum suppression at inference, carry heavy detection heads due to Distribution Focal Lo

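Summary: Ultralytics released YOLO26, a family of unified real-time end-to-end vision models. The new models eliminate non-maximum suppression, simplifying deployment and delivering accurate, efficient object detection across diverse hardware.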

KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks

👍 26

Test-time scaling is a powerful approach to obtain better reasoning in large language models, but it becomes memory-bottlenecked during long-horizon decoding, as the KV-cache grows. KV-cache quantization can help improve this, but current methods are evaluated under prefill-like settings and errors

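Summary: Test-time scaling creates a KV cache memory bottleneck during long reasoning decodes. KVarN quantizes the KV cache with variance normalization, reducing the accumulation of quantization error and improving model performance on long reasoning tasks.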
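This briefing does not spell out KVarN's scheme. As a hedged illustration of what "variance-normalized" quantization can mean in general, the sketch standardizes each channel to zero mean and unit variance before mapping it onto a shared low-bit grid, so channels with very different magnitudes use the grid equally. The per-channel layout, the 4-bit width, and clipping at 3 standard deviations are assumptions, not the paper's settings.

```python
import statistics


def quantize_channel(values, bits=4, clip=3.0):
    """Standardize one KV channel, then map it onto a symmetric low-bit grid.

    Returns integer codes plus the (mean, std, step) needed to dequantize.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero on constant channels
    levels = 2 ** (bits - 1) - 1            # e.g. codes in [-7, 7] for 4 bits
    step = clip / levels                    # grid spacing in standard-deviation units
    codes = []
    for v in values:
        z = (v - mean) / std
        z = max(-clip, min(clip, z))        # clip outliers beyond +/- clip std
        codes.append(round(z / step))
    return codes, mean, std, step


def dequantize_channel(codes, mean, std, step):
    """Undo the normalization: code -> z-score -> original scale."""
    return [c * step * std + mean for c in codes]


vals = [0.1, -0.2, 0.05, 0.3, -0.1]
codes, mean, std, step = quantize_channel(vals)
print(codes, dequantize_channel(codes, mean, std, step))
```

For values inside the clip range, the reconstruction error is at most half a grid step times the channel's standard deviation.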

Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching

👍 12

Modern generative models possess a deep understanding of visual content, yet training them for image editing typically requires massive datasets of paired examples. This limits scalability, especially for video editing where collecting paired data is prohibitively expensive. We propose Bootstrap You

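Summary: Modern generative models understand visual content, but training them for image editing requires large amounts of paired data. The work proposes Bootstrap Your Generator, which uses flow matching to enable image editing without paired data and extends to video.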

PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

👍 8

We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision i

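Summary: PaddleOCR-VL-1.6 is an upgraded document parsing model built on the 0.9B-parameter PaddleOCR-VL-1.5 baseline, improving parsing performance through refinement of under-optimized regions and progressive post-training.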

NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

👍 13

As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment, where its actions dynamically update the simulator state and directly

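Summary: NVIDIA introduced OmniDreams, a real-time generative world model for closed-loop autonomous vehicle simulation. The model dynamically updates the simulated environment to evaluate driving policies in long-tail scenarios, improving safety.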

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

👍 14

The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learni

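Summary: The work argues that language models need a "sleep" mechanism that improves long-term performance through self-modification and memory consolidation, by analogy with how humans consolidate memories during sleep.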

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

👍 2

The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episode on bandwidth-limited edge hardware, where high

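Summary: For robot policies, the KV cache suits datacenters but not edge devices. AURA proposes an action-gated memory mechanism that handles long sequences within constant VRAM, suited to real-time robotics.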

Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues

👍 11

Personalization is a crucial capability of modern language agents. However, current research primarily positions personalized agents as passive responders to user preferences, limiting their ability to interact with users and provide suggestions or guidance proactively. To systematically evaluate su

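Summary: Most personalized agents today respond passively to user preferences. Ψ-Bench proposes an evaluation framework that tests whether language agents can proactively adapt to different personas and exert influence in persuasive dialogues.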

TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL

👍 16

Reinforcement learning (RL) for visual reasoning needs scalable, verifiable, and controllable training signals. Existing visual RL post-training trains on static curated datasets, with fixed image-question-answer samples bounded by their collection budget. In this work, we introduce TRON (Targeted,

WALL-WM: Carving World Action Modeling at the Event Joints

👍 0

WALL-WM is a World Action Model that shifts video-action learning from chunk-centric optimization to event-grounded Vision-Language-Action pretraining, using semantically coherent action events as the atomic unit of learning. Existing WAMs commonly initialize from multimodal or video foundation mode

Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

👍 1

Recent multimodal large language models have demonstrated strong reasoning ability, yet their reliability as automated evaluators remains limited by a critical weakness: when visual evidence conflicts with textual cues, MLLM judges tend to reward plausible narratives over perceptually correct answer

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

👍 20

Autonomous agents are increasingly expected to support end-to-end medical-AI research workflows, moving beyond isolated prediction tasks or short-form clinical question answering. However, existing medical agent benchmarks primarily evaluate final outputs, providing limited visibility into agent beh

A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL

👍 24

Reinforcement learning (RL) post-training improves large language models (LLMs) on individual domains such as mathematical reasoning, code generation, question answering, and creative writing (CW), but training on one domain often degrades performance on others. Existing explanations based on catast

Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging

👍 8

Instruction tuning aligns large language models, including multimodal ones, with diverse user intents, but scaling to heterogeneous mixtures is hindered by gradient interference and bandwidth-heavy synchronization. We ask whether these two bottlenecks can be addressed jointly by training parts of th

PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps

👍 8

Embodied visual navigation, where an agent perceives a complex environment and acts to reach a goal from raw sensory input, underpins a wide range of applications such as household service robotics, assistive robotics, and large-scale autonomous exploration. However, recent attempts to unify vision-

ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree

👍 3

Agent skills extend AI agents with reusable instructions, tools, scripts, references, and workflows, establishing a security boundary distinct from both model safety and traditional package-malware detection. ClawHub Security Signals is a sanitized dataset of 67,453 latest public OpenClaw skill vers

OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification

👍 7

On-Policy Distillation (OPD) trains a student model on its own generative trajectories under dense token-level feedback from a stronger teacher, mitigating both the off-policy distribution shift of Supervised Fine-Tuning (SFT) and the sparse credit assignment of Reinforcement Learning (RL). However,

OCC-RAG: Optimal Cognitive Core for Faithful Question Answering

👍 72

Recent progress in the development of language models has been defined by scale, with each generation absorbing more of the world's knowledge into its weights. However, many practical applications benefit more from robust reasoning than from extensive parametric knowledge. In this setting, task-spec

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

👍 4

Large Language Models have demonstrated remarkable progress in general-purpose capabilities and can achieve strong performance in specific domains through fine-tuning on domain-specific data. However, acquiring high-quality data for target domains remains a significant challenge. Existing data synth

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

👍 19

Mid-training has become an important stage in modern LLM development, using large-scale curated mixtures to strengthen capabilities before final post-training. Its data selection problem is distinct: the data are optimized under a pretraining-style objective at near-pretraining scale, but are curate

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

👍 8

Long chain-of-thought (CoT) traces are widely used as supervision for reasoning-oriented LLM SFT, yet answer-correct traces can still lead to markedly different fine-tuning outcomes. We study post-conclusion continuation in answer-correct long-CoT data: a continuation where the answer appears suffic

MERIT: Learning Disentangled Music Representations for Audio Similarity

👍 7

Current music similarity models typically compute a single, monolithic score, entangling distinct musical dimensions like melody, rhythm, and timbre. This limits user control and interpretability, making it impossible to execute nuanced queries. We introduce MERIT, a framework for learning disentang

From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain

👍 40

Identifying which brain regions represent a visual concept in the human brain is a central challenge in neuroscience. Existing approaches have localized coarse functional regions (e.g., faces, places) through activation maximization, identifying regions that activate strongly for a target concept re

HOW ONE $2,999 NVIDIA BOX MADE ME $22,000 IN A YEAR

@w1nklerr · 44.2K followers · 17.7M views · 1.4K likes · 161 reposts

Nobody told me about this for months. I'm telling you now so you don't lose the year I lost. Let me start with the number that made me angry. Last quarter my cloud GPU spend was sitting at $1,900 a

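Summary: A self-built $2,999 NVIDIA box replaced cloud GPUs, cutting cloud spend and earning back $22,000 in a year. A first-hand comparison of the costs of self-hosted hardware versus on-demand cloud.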

Context as Topology: Why Your Agent's Memory Forgets, and How Structure Escapes It

@elpresidank · 116 followers · 2.9M views · 543 likes · 35 reposts

Most AI agent memory is built on embeddings. And there's now a proof that this entire class of system is going to forget what you stored in it — and confidently make up things you never stored at all.

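Summary: Citing a proof that embedding-based AI agent memory systems inherently forget and hallucinate, the post proposes topological structure as an alternative that escapes the limits of current memory architectures.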

Range and Depth on Demand

@1salman · 363 followers · 2.0M views · 682 likes · 45 reposts

Everyone keeps asking whether AI favors specialists or generalists. I think that is the wrong question. AI does not pick a side. It changes the tradeoff. The old world forced a choice. You could go

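Summary: AI favors neither specialists nor generalists; instead it changes the cost tradeoff between them. The old world forced a choice, while in the AI era you can have both range and depth.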

How to build a 4-agent team, that ships a feature while you sleep (Exact Setup Inside)

@zodchiii · 20.0K followers · 743.3K views · 509 likes · 55 reposts

Four AI agents can ship a feature while you sleep. Most people never wire them up. They fire a reviewer here, a test generator there, by hand, one at a time, each forgetting what the last one did.

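Summary: A detailed setup of four coordinated AI sub-agents (developer, reviewer, tester, deployer) chained into a full pipeline that ships a feature while you sleep.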

30 Obsidian Workflows, Plugins, and Setups That Most Users Don't Know

@eng_khairallah1 · 61.9K followers · 693.5K views · 511 likes · 71 reposts

Obsidian has 2,700+ community plugins. Over 100 of them are AI-related. Save this :) And the CEO of Obsidian personally published official Claude Skills for the platform - 12,900+ GitHub stars in

Summary: 30 lesser-known Obsidian tricks, including 100+ AI plugins, the official Claude Skills integration (12,900+ GitHub stars), and recent workflow and plugin combinations.

What an Enterprise Context Layer Actually Is

@prukalpa · 23.1K followers · 583.2K views · 506 likes · 80 reposts

A field guide to what it is, what it is not, and where it fits in your AI architecture. I have had some version of the same conversation with a CIO almost every day this year. Their team has read

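Summary: A field guide to the enterprise context layer: what it is, how it differs from other architectural components, and where it sits in an AI architecture, addressing questions CIOs raise every day.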

I Searched the Whole Claude Skills Ecosystem - These Are the Ones That Matter [Full GitHub Links]

@polydao · 18.1K followers · 559.5K views · 505 likes · 55 reposts

Most people are still using Claude like a smarter chatbot That is not the game anymore You’re competing against people who treat Claude like an operating system > While you’re typing one-off

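Summary: A survey of the Claude Skills ecosystem that picks out the skills that actually matter, with GitHub links. Core point: don't use Claude as just a chatbot; use it like an operating system.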

hacking pewdiepie's AI agent harness using an evil cocomelon website (then helping protect it)

@theonejvo · 22.1K followers · 504.3K views · 861 likes · 1 repost

Over the past year, @pewdiepie, has been turning into one of the most visible champions of private, self-hosted computing, and it has been a genuine pleasure to watch. What began in late 2025 as an

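Summary: The author attacks PewDiePie's private AI agent harness through a malicious Cocomelon-themed website, then helps harden it, showing real-world attack and defense of a self-hosted agent.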

Claude Code + NotebookLM + Obsidian: Research Monster That Gets Smarter Every Time You Use It

@monokern · 1.2K followers · 263.1K views · 505 likes · 72 reposts

Most people treat research as a manual task. You open 10 tabs. You watch videos. You read articles. You take notes somewhere. An hour later you have a pile of information you're not sure what to do

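Summary: A research workflow combining Claude Code, NotebookLM, and Obsidian: Claude Code analyzes code and files, NotebookLM aggregates multimodal sources, and Obsidian holds the knowledge base, so the system gets smarter with use.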

Stop building Foxconn factories for your agents

@garrytan · 853.3K followers · 180.6K views · 503 likes · 43 reposts

In January I got back into coding and I built Garry's List. Over five hundred thousand lines of Rails and the tests to police it. I was proud of it. I shouldn't have been. The thing worth being proud

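Summary: Argues for no longer building "Foxconn-style" factories for AI agents. After writing over 500,000 lines of Rails with AI, the author reflects that what deserves pride is not the volume of code but what the product achieves, and stresses the new efficiency paradigm AI coding brings.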

The Agentic Economy Is Here

@base · 1.3M followers · 97.3K views · 519 likes · 74 reposts

TL;DR: Agents are becoming the internet’s newest paying customers, and the economy serving them is moving fast. On Base, agents already use wallets and stablecoins to pay for inference, live search,

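Summary: The agentic economy is arriving: AI agents are becoming a new class of paying customer, using wallets and stablecoins on Base to pay for inference, search, and other services. An ecosystem is taking shape.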

🥇Top AI Papers of the Week

@dair_ai · 124.6K followers · 84.0K views · 504 likes · 83 reposts

1. SkillOpt Microsoft Research treats a compact natural-language skill document as the trainable state of a frozen agent, then learns that document through rollouts, reflection, and bounded edits

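Summary: This week's top AI papers: Microsoft's SkillOpt (which treats a skill document as trainable state), world models, MCP protocol extensions, and other key developments.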

My Agent Stack For Automating My Personal Life

@nicbstme · 23.7K followers · 84.0K views · 530 likes · 35 reposts

My agent manages my emails, SMS, Whatsapp, Telegram and pretty much everything to automate my personal life. People keep asking me how I use agents in real life. I mean the actual boring things that

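Summary: Automating personal life with AI agents: managing email, SMS, WhatsApp, Telegram, and every other message stream, with a practical tool stack and lessons learned.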

State of Memory in Agent Harness

@mem0ai · 17.6K followers · 82.8K views · 520 likes · 60 reposts

Agent harnesses are where AI software actually runs. Cursor, Devin, Claude Code, Codex: these environments handle context, orchestrate tools, coordinate agents, and increasingly, manage memory. The

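Summary: A survey of memory management in mainstream agent harnesses (Cursor, Devin, Claude Code, Codex), analyzing how different context management strategies affect developer productivity.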

Robotics: The Next AI Frontier

@ParadisLabs · 48.9K followers · 82.0K views · 501 likes · 60 reposts

AI's next frontier will be Robotics and Humanoids. The past decade has seen rapid AI adoption in the structured digital world. Those LLM breakthroughs now enable more general-purpose learning and more

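Summary: Robots and humanoids are AI's next frontier: LLM breakthroughs make it possible to combine general-purpose learning with physical-world action, moving beyond the limits of the digital world.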

How to build your own agent harness???

@mfpiccolo · 7.4K followers · 81.9K views · 607 likes · 56 reposts

Most agent teams don't build a harness. They adopt one. LangChain, LangGraph, OpenAI Agents SDK, Anthropic SDK, CrewAI, AutoGen, the loop, the tools, the memory, and the orchestration are picked off

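Summary: How to build your own agent harness: instead of relying on off-the-shelf options such as LangChain or CrewAI, build a complete harness from scratch, covering the loop, tools, memory, and orchestration, so it fits the task better.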

A harness for every task: dynamic workflows in Claude Code

@trq212 · 263.1K followers · 75.7K views · 542 likes · 36 reposts

Last week, we released dynamic workflows in Claude Code. Claude can now write its own harness on the fly, custom-built for the task at hand. While the default Claude Code harness is built for coding,

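Summary: Claude Code adds dynamic workflows: Claude can write its own harness on demand instead of using a fixed template, adapting the execution structure to each type of task.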

A Functional Taxonomy of World Models

@drfeifei · 738.0K followers · 72.2K views · 699 likes · 144 reposts

“The world is everything that is the case.” — Ludwig Wittgenstein, Tractatus Logico-Philosophicus, 1921 The world is not made of words. In an earlier essay, we argued that spatial intelligence is AI’s

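Summary: Proposes a functional taxonomy of world models, arguing that spatial intelligence is not "describing pictures" but building a causal understanding of the physical world, framed by analogy to Wittgenstein's philosophy.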

How To Fix AI Slop (Using Hermes)

@EXM7777 · 115.1K followers · 70.1K views · 520 likes · 47 reposts

There's a reason some people seem to be constantly shipping the best software, writing incredible content, or generating insane images... They adopted the eval loop, while you... You've tried better

Summary: Uses the Hermes toolchain to build an eval loop that fixes AI "slop": continuous feedback and iteration matter more than one-shot prompting.

🔬Scaling Past Informal AI - Carina Hong, Axiom Math

Verified Generation and Compounding Intelligence

Summary: Carina Hong of Axiom Math discusses how to scale beyond informal AI, focusing on verified generation and compounding intelligence.

Introducing new capabilities to GPT-Rosalind

GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.

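Summary: OpenAI introduces new GPT-Rosalind features that strengthen its biological reasoning, medicinal chemistry, genomics analysis, and experimental workflow capabilities for life sciences research.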

Direct Preference Optimization Beyond Chatbots

Summary: The article explores applications of Direct Preference Optimization (DPO) beyond chatbots, extending its potential in AI training.

A blueprint for democratic governance of frontier AI

OpenAI outlines a blueprint for U.S. governance of frontier AI, proposing a federal framework for safety, resilience, and national security.

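Summary: OpenAI publishes a blueprint for democratic governance of frontier AI, proposing a U.S. federal framework for safety, resilience, and national security.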

OpenAI public policy agenda

OpenAI outlines its public policy agenda for AI, including safety, youth protection, workforce transition, and global standards to ensure AI benefits society.

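Summary: OpenAI outlines its public policy agenda, covering safety, youth protection, workforce transition, and global standards to ensure AI benefits society.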

Running an AI-native engineering org

Summary: The article shares how to run an AI-native engineering organization, covering best practices for culture, process, and tooling.

Lessons from building Claude Code: How we use skills

Summary: Lessons from building Claude Code on how skills are used to strengthen code generation.

Adding MCP Tools to Reachy Mini

Summary: Adds MCP (Model Context Protocol) tools to the Reachy Mini robot, extending its interaction and task execution capabilities.

GitHub's plan for Agents — Kyle Daigle, GitHub

GitHub pioneered the modern AI coding era with Copilot, and the resulting explosion in agentic coding has led to notable strains on the most popular developer platform in the world. Here's the plan.

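Summary: GitHub executive Kyle Daigle presents the platform's plan for handling the explosion of agentic coding that followed Copilot and the strain it has put on the platform.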

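[picpi Pipi Craft Station] Revived! Shared sub2api, self-sufficient. A bit like Bing-lao's, yet different!

This post uses the community's public-benefit promotion channel and meets the promotion requirements. I declare that I follow the community's requirements below: My project is free to use, with no paid (or disguised paid or sponsorship) components: yes. My post is tagged "public-benefit promotion": yes. My project is a personal project, unrelated to any company or commercial entity: yes. My project does not funnel users to QQ, TG, or other groups: yes. My project does not funnel traffic to websites beyond what operation requires: yes. My project does not promote others or use affiliate links: yes. My project has no associated commercial project: yes. My site requires login and is integrated with LINUX DO Connect: yes. Screenshots of any AI-generated or AI-polished parts of the project introduction in my post have been posted: yes. I promise the above selections are permanently valid and accept oversight from the community and fellow members: yes. The following is the project introduction; AI-generated and polished content has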

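Unbelievable: the Codex app deleted itself

I temporarily gave it elevated permissions and asked it to figure out what was making my computer lag. After investigating, it told me to restart, and after the restart it was simply gone... 18 posts - 18 participants. Read the full topic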

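[Pi] A Few Recommended Basic Extensions

Yesterday I saw a fellow member's post, "[π] On perfecting pi: tinkering with the package ecosystem," and thought it was really good. Having just woken up, on a whim I'm sharing my own recent experience tinkering with pi extensions. First, for anyone who wandered in by mistake (just kidding): Pi is a lightweight harness. Compared with tools like claude code / codex / opencode, it is bare-bones, practically an unfurnished shell; the author deliberately leaves out many features, which keeps its system prompt token overhead very small. Although agents like Pi can be applied beyond coding, I currently use it mainly for coding. The plugins recommended in this post are all basic ones, and in coding scenarios they can all be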

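European Route Servers Compared: Which European Route Is Best?

Hello MJJs, welcome to the second installment of our popular-server comparisons. In this series we directly compare popular plans in popular regions so that MJJs can see the differences between machines at a glance. These tests certainly can't capture a machine's full condition, but the results offer a glimpse of some machines' weaknesses and strengths, and we hope they give confused MJJs a relatively intuitive picture. This installment covers European route servers in four regions (Germany, the Netherlands, the UK, and Russia), 15 servers in total, from providers including AKKOCloud, SaltyFish, BandwagonHost (BWH), CloudSilk, Misaka, Geelinx, NOSLA, V.PS, Nube, and MoeCloud. When this section is first published it will not include a full evaluation (because for many machines the experience is not test-da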

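A Couple's Travel Memory Map (Map + Timeline + Memories)

Map of Intl is a travel-journal app for two. It started as an attempt to solve a very simple problem: after a trip, the photos keep piling up, but the memories get harder and harder to organize. So I tried using AI and vibe coding to build a mini-program that records a shared life. It combines a map, a timeline, travel plans, and memory curation, linking the cities you've visited together, the routes you've walked, the photos you've taken, and important anniversaries into one complete story. As records accumulate, the footprints on the map keep growing and the timeline keeps getting longer. Later I realized it records more than trips: it records the life two people have shared. I hope that years from now, opening it again, you can still see the scenery and moments you went through together. Editable routes. Viewable travel diaries. Integrated with the Amap API. 3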

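Claude, you were heartless first

You were heartless first, so don't blame me for being unjust. I'm your loyal fan, and this is how you repay me? You've thoroughly succeeded in disappointing me. Even someone as clean as me gets rejected. Since you love sending verification codes so much, I'll write a script to make you send them nationwide, nonstop. 31 posts - 28 participants. Read the full topic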

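Automated Daily A-Share Closing Report with Codex

If there's something you especially want to track, state it more clearly in the prompt; the data Codex gets is quite complete. I just glance at it every day now. I'm too busy and have been fully in cash lately, so I haven't paid much attention. The prompt can be modified as needed; I made some minor styling tweaks that are hard to summarize. (I haven't counted tokens. I'm on the Pro 20x plan and haven't looked closely at how many tokens a run consumes, so be careful in case your quota can't keep up.) GitHub link: GitHub - ningzaichun/ai_stock_daily_report: codex ai stock daily report · GitHub. Take a look if you're interested; the full prompt and usage examples are all there. 49 posts - 28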

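君の公益: Come Back, Genius Programmers~

GPT supply restored. Please try not to DM me; just leave a comment if you have a problem. There are too many messages for me to reply to. Do not ask for quota or request an unban for any reason. If it doesn't work, the account pool is under maintenance. Don't go to Xingchen's after-sales group and ask my support staff why 君の公益 isn't working; no offense meant, they would just be overwhelmed. I will not create any QQ/TG groups for 君の公益, and I don't like seeing people create groups in my name. Only I have the final say on 君の公益; I don't like others explaining things on my behalf, so please avoid that. The daily check-in quota has been adjusted to $52 per day. I've arranged for fellow members to help audit redistribution; one batch has already been banned, and further checks will be made at irregular intervals. 341 posts - 294 participants. Read the full topic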

Launch HN: Hyper (YC P26) – Company brain to power agentic development

Hey HN, we’re Shalin & Kanyes, best friends who've been hacking together for 10+yrs, and now founders of Hyper (https://heyhyper.ai/). Hyper is a shared “company brain” that plugs into information flowing inside a company to make AI agents and automations better and ultimatel

ESP32-S31

231 points · 125 comments