Daily AI Briefing

2026-06-11 (content retrieved 06/11 20:53)

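Anthropic releases Claude Fable 5 and Mythos 5

Smol AI News

Anthropic has released two new models: Fable 5 (general availability) and Mythos 5 (restricted access). Sensitive queries can fall back to Opus 4.8. The release marks another major step in model capability. (Reported by multiple outlets.)

Why it matters: This is the most important model release of the day and bears directly on where the AI capability ceiling sits, so practitioners should follow it.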

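Apple releases Container, a lightweight Linux container tool for Mac

GitHub Trending

Container is Apple's official open-source project for creating and running Linux containers in lightweight virtual machines on a Mac. It is written in Swift and optimized for Apple silicon.

Why it matters: An official container tool from Apple that Mac developers can pick up and use right away.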

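GitHub project agent-skills: production-grade skills for AI coding agents

GitHub Trending

The open-source project agent-skills offers a collection of production-grade engineering skills for AI coding agents, aimed at helping developers get better code out of their agents.

Why it matters: A practical skills library for agent development that can directly raise a coding agent's productivity and is worth borrowing from.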

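DeepMind warns of risks when millions of AI agents interact

MIT Tech Review AI

Google DeepMind is funding research into the potential dangers that arise when millions of AI agents interact with one another. The article cites Rohin Shah, who directs the company's AGI safety and alignment research, on the risks of large-scale autonomous interaction.

Why it matters: A forward-looking warning from a leading research lab, and essential context for AI agent safety.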

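Claude Code v2.1.114: fix for crash on agent tool-permission requests

Claude Code Changelog

Claude Code v2.1.114 mainly fixes a crash that occurred when an agent-team teammate requested tool permissions.

Why it matters: A practical update for Claude Code users that fixes a crash in a key collaboration scenario.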

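Advanced PyTorch profiling: from nn.Linear to a fused MLP

Hugging Face Blog

A technical tutorial on PyTorch profiling that shows how to fuse standard nn.Linear layers into a more efficient MLP kernel.

Why it matters: A hands-on PyTorch optimization tutorial that directly helps speed up model training; aimed at deep learning engineers.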

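DeepMind releases DiffusionGemma: up to 4x faster text generation

DeepMind Blog

DeepMind has introduced DiffusionGemma, a new diffusion language model that generates text up to 4x faster than conventional autoregressive models.

Why it matters: A large jump in text generation speed with clear implications for LLM inference efficiency.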

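OpenAI backs Europe's work on a trustworthy AI ecosystem

OpenAI News

OpenAI has announced support for the EU Code of Practice on AI content transparency and is advancing provenance standards and tools to help the public recognize AI-generated content.

Why it matters: A policy development on AI content regulation and transparency, useful for following global AI governance trends.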

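Paper: fine-tuning multimodal LLMs with Art-based Reinforcement Training

HuggingFace Trending Papers

A new paper proposes ART (Art-based Reinforcement Training), a parameter-efficient fine-tuning method for multimodal large language models that uses reinforcement training to improve model performance.

Why it matters: A new direction in multimodal fine-tuning that may inspire researchers, though it is of limited practical value to working engineers.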

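BYD's "Flash" fast-charging network is coming to Canada

Hacker News

BYD is bringing its "Flash" charging technology, which it claims can fully charge a car in 5 minutes, to Canada and plans to build a charging network there.

Why it matters: A significant commercial step for EV fast charging that could shift the competitive landscape in new-energy vehicles.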

apple/container

Swift · ★ 30,916 · 🍴 864 · 📈 2,419 stars today

A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift, and optimized for Apple silicon.

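Summary: Apple's open-source Swift tool for creating and running Linux containers in lightweight virtual machines on a Mac. Optimized for Apple silicon, it suits developers who want to run Linux workloads safely and efficiently on macOS.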

addyosmani/agent-skills

Shell · ★ 53,594 · 🍴 5,856 · 📈 3,275 stars today

Production-grade engineering skills for AI coding agents.

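Summary: A collection of engineering skills for AI coding agents with an emphasis on production quality. It helps agents handle coding conventions, debugging, and optimization, and is meant to make code generation and review more reliable.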

maziyarpanahi/openmed

Python · ★ 2,541 · 🍴 256 · 📈 427 stars today

open-source healthcare ai

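Summary: An open-source healthcare AI project building AI tooling for the medical domain, including diagnostic assistance and medical record analysis, that institutions and researchers can customize.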

phuryn/pm-skills

★ 15,902 · 🍴 1,675 · 📈 1,944 stars today

PM Skills Marketplace: 100+ agentic skills, commands, and plugins — from discovery to strategy, execution, launch, and growth.

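Summary: A skills marketplace for product managers with 100+ agentic skills, commands, and plugins covering discovery, strategy, execution, launch, and growth. Aimed at PMs who want to speed up product workflows with AI agents.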

NVIDIA/SkillSpector

Python · ★ 2,298 · 🍴 195 · 📈 308 stars today

Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.

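Summary: NVIDIA's open-source security scanner for AI agent skills. It detects vulnerabilities, malicious patterns, and security risks, and suits enterprises auditing third-party skills before deployment.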

soxoj/maigret

Python · ★ 32,370 · 🍴 2,376 · 📈 665 stars today

🕵️‍♂️ Collect a dossier on a person by username from 3000+ sites

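Summary: An open-source tool that compiles a dossier on a person from a username across 3000+ sites. It is used for social engineering investigations, background checks, and digital footprint analysis, with broad site coverage and result export.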

x1xhlol/system-prompts-and-models-of-ai-tools

★ 139,737 · 🍴 34,608 · 📈 369 stars today

FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI, VSCode Agent, Warp.dev, Windsurf, Xcode, Z.ai Code, Dia & v0. (And other Open Sourced) System Prompts

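Summary: A large collection of system prompts and model information for AI tools such as Cursor, Claude Code, and Devin. It lets developers study, compare, and reproduce the underlying behavior of different AI assistants.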

refactoringhq/tolaria

TypeScript · ★ 15,230 · 🍴 1,051 · 📈 604 stars today

Desktop app to manage markdown knowledge bases

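Summary: A desktop app for managing Markdown knowledge bases, with local storage and fast search. It suits personal notes and document writing, with simple folder-style organization and tagging.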

obra/superpowers

Shell · ★ 224,400 · 🍴 19,936 · 📈 1,323 stars today

An agentic skills framework & software development methodology that works.

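Summary: An agentic skills framework and software development methodology intended to make AI agents collaborate and execute more systematically. It targets reusable agent capabilities and guidance for real projects.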

restic/restic

Go · ★ 34,019 · 🍴 1,780 · 📈 33 stars today

Fast, secure, efficient backup program

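Summary: A fast, secure, and efficient backup program with encryption, incremental backups, and multiple storage backends (local, cloud, and others). It suits regular backups for individuals and businesses, with fast restores.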

msitarzewski/agency-agents

Shell · ★ 111,033 · 🍴 18,201 · 📈 1,434 stars today

A complete AI agency at your fingertips - From frontend wizards to Reddit community ninjas, from whimsy injectors to reality checkers. Each agent is a specialized expert with personality, processes, and proven deliverables.

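Summary: An all-in-one collection of AI agents covering roles such as frontend development, social media operations, and creative work. Each agent has a specialty and a personality and can be used to automate marketing, content creation, and similar tasks.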

masterking32/MasterDnsVPN

Go · ★ 5,487 · 🍴 509 · 📈 510 stars today

Advanced DNS tunneling VPN for censorship bypass, optimized beyond DNSTT and SlipStream with low-overhead ARQ, resolver load balancing, high packet-loss stability and speed.

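Summary: An advanced DNS tunneling VPN built for bypassing network censorship. It uses low-overhead ARQ and resolver load balancing to stay stable and fast under high packet loss, and the project describes it as optimized beyond DNSTT and SlipStream.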

chatwoot/chatwoot

Ruby · ★ 30,144 · 🍴 7,491 · 📈 31 stars today

Open-source live-chat, email support, omni-channel desk. An alternative to Intercom, Zendesk, Salesforce Service Cloud etc. 🔥💬

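Summary: An open-source omni-channel customer support platform with live chat, email, and social media integration, positioned as an alternative to Intercom and Zendesk. It suits small and mid-sized teams that want an efficient, customizable support system.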

kenn-io/agentsview

Go · ★ 1,444 · 🍴 166 · 📈 98 stars today

Local-first session intelligence and analytics for coding agents, supporting Claude Code, Codex, and more than 20 other agents. Also: 100x faster replacement for ccusage!

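Summary: A local-first session analytics tool for coding agents that supports Claude Code, Codex, and more than 20 other agents. It is billed as a 100x faster replacement for ccusage and helps developers understand agent behavior and efficiency.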

alchaincyf/zhangxuefeng-skill

★ 7,865 · 🍴 2,410 · 📈 94 stars today

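张雪峰.skill (Zhang Xuefeng.skill): Zhang Xuefeng's cognitive operating system. A practical thinking framework for gaokao college applications, postgraduate entrance exams, and career planning. Generated by 女娲.skill (Nuwa.skill).

Summary: An agent skill built on Zhang Xuefeng's way of thinking, focused on practical frameworks for gaokao applications, postgraduate exams, and career planning. It was generated automatically by 女娲.skill and is meant as a reference for students and parents making education decisions.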

TapXWorld/ChinaTextbook

Roff · ★ 73,857 · 🍴 16,528 · 📈 345 stars today

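All primary, middle, and high school and university textbooks in PDF.

Summary: A collection of PDF textbooks for every stage of Chinese education, from primary school through university. It gives students, teachers, and self-learners convenient access to textbooks across the main subjects.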

hexo-ai/sia

Python · ★ 1,043 · 🍴 146 · 📈 177 stars today

SIA is a Self Improving AI framework to autonomously improve the performance of any AI system (Model / Agent) on a benchmark task.

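Summary: A self-improving AI framework that autonomously improves the performance of any AI system (model or agent) on a benchmark task. It is aimed at continuous optimization with less manual tuning.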

mattermost/mattermost

TypeScript · ★ 37,126 · 🍴 8,689 · 📈 26 stars today

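Mattermost is an open source platform for secure collaboration across the entire software development lifecycle.

Summary: An open-source secure collaboration platform designed for the software development lifecycle. It provides messaging, file sharing, and CI/CD integrations, and serves as a self-hosted alternative to Slack for enterprise teams.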

bannedbook/fanqiang

Kotlin · ★ 46,556 · 🍴 7,990 · 📈 342 stars today

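Circumvention ("fanqiang") and unrestricted internet access

Summary: A collection of circumvention resources, with tools, tutorials, and configurations for getting around network blocking. Intended as a reference for users seeking unrestricted internet access.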

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

👍 1

There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional fine-tuning-specific raw tokens to an LLM input. However, both require mo

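Summary: The paper proposes ART (Art-based Reinforcement Training), a parameter-efficient fine-tuning method for multimodal large language models. The abstract positions it against the two main PEFT techniques, LoRA and soft prompting.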

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

👍 16

Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce malicious code. Meanwhile, Grammar-Constrained Decoding (GCD) has been widely adopted to improve the reliability of LLM-generated code by enforcing syntactic validity. In this

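Summary: The study finds that grammar-constrained decoding (GCD) can be exploited to jailbreak large language models into generating malicious code, exposing a security risk in a technique adopted to make generated code more reliable.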

Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation

👍 1

Remaining Useful Life (RUL) prediction is essential for industrial predictive maintenance, yet many learning-based approaches rely on extensive feature engineering or large labeled datasets to train task-specific sequence models. In this work, we introduce a lightweight learning approach, in which w

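Summary: The paper proposes a lightweight learning approach that uses embeddings from time-series foundation models to predict remaining useful life (RUL). It reduces reliance on feature engineering and large labeled datasets and targets industrial predictive maintenance.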

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

👍 25

Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, forcing models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue

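Summary: The paper proposes a "reason, then re-reason" approach that revisits the scene across views to improve spatial reasoning from egocentric video. Unlike single-turn inference, it resolves geometric ambiguity with verifiable evidence.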

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

👍 49

General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the clean Docker workspace, patch, and prediction contract required for scoring. We introduce Claw-SWE-

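Summary: Claw-SWE-Bench is a benchmark for evaluating general-purpose agents such as OpenClaw on coding tasks. It addresses the fact that standard SWE-bench cannot directly score a generic agent's coding ability.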

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

👍 57

Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduc

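Summary: The paper studies how an AI agent can autonomously run the research loop of exploration, experimentation, and abstraction over long horizons using hypothesis-tree refinement, as a step toward generalist autonomous research.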

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

👍 5

Reinforcement Learning (RL) with verifiable environments has emerged as a powerful approach for enhancing the reasoning capabilities of Large Language Models (LLMs). While prior research demonstrates that scaling environment quantity improves RL performance, existing manual or individual constructio

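Summary: The paper treats verifiable environments as "LEGO bricks" and composes them recursively to improve the reasoning generalization of large language models. The approach scales up the number of environments to improve reinforcement learning performance.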

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

👍 55

Environments serve as interactive systems for large language model (LLM) based agents across diverse scenarios and play a crucial role in driving the continual evolution of model capabilities. Despite this importance, existing work lacks a systematic categorization and deep analysis. This paper syst

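Summary: This survey systematically categorizes environment engineering for LLM-based agents across four areas: environment modeling, synthesis, evaluation, and application. It fills a gap left by the lack of a systematic overview of the field.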

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

👍 14

Recent progress in foundation models has shifted toward agentic behavior involving multi-step reasoning and tool use. However, open-source efforts largely focus on text-dominant settings, leaving long-horizon multimodal tasks underexplored. This gap is evident in video tasks requiring sustained temp

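Summary: InternVideo3 turns foundation models into agents with multi-step reasoning and tool use, focusing on long-horizon multimodal tasks and addressing the gap open-source models have in video tasks.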

World Model Self-Distillation: Training World Models to Solve General Tasks

👍 4

Pretrained video generators are promising visual world models that exhibit emergent task-solving abilities; however, their reliance on detailed textual descriptions limits their direct use for planning and decision-making. Existing approaches either outsource this reasoning to language or vision-lan

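Summary: World model self-distillation trains video generators to solve general tasks on their own, without an external reasoning model, making world models more directly usable for planning and decision-making.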

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

👍 14

Reinforcement learning (RL) has become a key component in modern large language models, yet the rollout stage remains the key bottleneck in RL training pipelines. Although Multi-Token Prediction (MTP) offers a natural solution to accelerate rollouts through speculative decoding, many studies have ob

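Summary: The paper proposes accelerating reinforcement learning training, the rollout stage in particular, with multi-token prediction and rejection sampling, breaking through entropy bounds to make RL training of large language models more efficient.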
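
The abstract is cut off before it describes the paper's own method, so the following is only a minimal sketch of the standard speculative-sampling accept/reject rule that speculative decoding is usually built on, not the paper's algorithm. The distributions `p` (target model) and `q` (draft head) and the function name `speculative_step` are hypothetical, chosen for illustration.

```python
import random
from collections import Counter
from typing import Dict


def speculative_step(p: Dict[str, float], q: Dict[str, float], rng: random.Random) -> str:
    """One token of standard speculative sampling: draft from q, verify against p."""
    tokens = list(q)
    draft = rng.choices(tokens, weights=[q[t] for t in tokens])[0]
    # Accept the drafted token with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p.get(draft, 0.0) / q[draft]):
        return draft
    # On rejection, resample from the normalized residual max(p - q, 0).
    residual = {t: max(p[t] - q.get(t, 0.0), 0.0) for t in p}
    support = list(residual)
    return rng.choices(support, weights=[residual[t] for t in support])[0]


# Toy distributions (assumed values): the draft disagrees with the target.
p = {"a": 0.6, "b": 0.3, "c": 0.1}
q = {"a": 0.3, "b": 0.5, "c": 0.2}

rng = random.Random(0)
n = 20_000
counts = Counter(speculative_step(p, q, rng) for _ in range(n))
freqs = {t: counts[t] / n for t in p}
print(freqs)
```

The accepted and resampled tokens together follow the target distribution `p` exactly, which is why a cheap draft can speed up rollouts without changing what the policy samples.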

Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay

👍 1

Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. Our approach involves designing

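Summary: The paper proposes Lius, which uses continual instruction tuning to improve LLM translation for a low-resource language, Kupang Malay, mitigating the performance drop seen in low-resource settings.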

ICA Lens: Interpreting Language Models Without Training Another Dictionary

👍 14

Finding interpretable directions in language-model representations is critical for understanding and controlling model behavior. Sparse autoencoders (SAEs) have become the standard tool for this purpose, but using them as the default first lens often requires training, storing, and evaluating large

On the Limits of LLM-as-Judge for Scientific Novelty Assessment

👍 1

LLMs are increasingly used to generate and judge scientific ideas. This makes novelty evaluation a central problem. Full idea evaluation is difficult because it often requires judging a method, its feasibility, and its empirical promise. We therefore study a cleaner upstream object: the research que

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

👍 5

We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture toward general physical intelligence. Leveraging three automated dat

i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models

👍 2

Diffusion models have consistently driven progress in text-to-image generation. However, it is challenging to attribute recent progress to specific modeling and data choices: state-of-the-art open-weight models provide limited ablations, and do not disclose their training data and full training deta

DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch

👍 26

As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implementing complete software repositories from high-level specifications. However, training agents for such long-horizon soft

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

👍 41

Reinforcement learning with verifiable rewards (RLVR) has become standard for improving LLM reasoning. However, existing PPO-style trust-region mechanisms remain position-agnostic by enforcing uniform thresholds across all tokens independently. This pointwise treatment conflicts with autoregressive

τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems

👍 0

As recommender systems transition toward agentic, multi-turn conversational interfaces, evaluation paradigms have struggled to keep pace. Current benchmarks often rely on "LLM-as-a-judge" evaluations, which introduce subjectivity, high costs and inconsistency. We present τ-Rec, a benchmark for agent

FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching

👍 0

Brain Magnetic Resonance Imaging (MRI) plays a central role in studying neurological development, aging, and diseases. One key application is Brain Age Prediction (BAP), which estimates an individual's biological brain age from MRI data. Effective BAP models require large, diverse, and age-balanced

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

👍 46

Reward models are central to text-to-image post-training, but visual preference is subjective and better represented as a distribution over rubric scores than as a deterministic scalar. Existing scalar, score-token, and pairwise reward models over-compress uncertainty and fine-grained score differen

PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf

👍 1

Expert writing feedback from experienced researchers is critical for early-career scholars to improve their manuscripts, yet high-quality feedback often remains scarce because reviewing research papers is labor-intensive. Emerging AI-powered writing assistants largely focus on grammar fixes or simul

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

👍 3

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload derails the user's legitimate task, the resulting failure signal invites i

When Behavioral Safety Evaluation Fails: A Representation-Level Perspective

👍 1

Large Language Model (LLM) safety has often been evaluated at the behavior level, which provides limited evidence of internal robustness, as these evaluations target outputs rather than representation-level vulnerability under intervention. We formalize this discrepancy as the audit gap: the differe

In-Context Multiple Instance Learning

👍 0

Multiple Instance Learning (MIL) addresses problems where supervision is available at the level of bags of instances and has been successfully applied in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characte

Large Language Models Are Overconfident in Their Own Responses

👍 2

Prior work has shown that instruction-tuned large language models (LLMs) are less well calibrated than their base pre-trained counterparts. However, little is known about the frequently used chat template's effect on the calibration of conversational LLMs. In this work, we investigate the mechanisms

Distilling LLM Feedback for Lean Theorem Proving

👍 0

Post-training for reasoning models typically combines supervised fine-tuning with reinforcement learning from verifiable rewards, most commonly with GRPO. However, this algorithm suffers from sparse rewards, limited exploration, and mode collapse. Building upon recent works on self-distillation, we

Loops: What Every AI Engineer Needs to Know in 2026

@sairahul1 · 113.0K followers · 852.6K views · 600 likes · 79 reposts

Peter Steinberger, creator of OpenClaw, who now works with OpenAI. Yesterday he posted this: "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."

Summary: In 2026, AI engineers should move from prompting agents by hand to designing the loops that prompt them.

How To Become An AI Engineer in 2026 (Without a CS Degree)

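@sairahul1 · 113.0K followers · 710.8K views · 509 likes · 97 reposts

How To Become An AI Engineer in 2026. Without a CS degree. Without a bootcamp. Without knowing what a transformer is today. Here's what nobody tells you: The companies hiring right now don't need

Summary: A path to becoming an AI engineer in 2026 without a CS degree. The post argues that companies hiring now care less about traditional credentials than about practical ability, so the route does not depend on a bootcamp or on understanding transformers and instead centers on concrete skills and project experience.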

Harness Engineering: What Every AI Engineer Needs to Know in 2026

@sairahul1 · 113.0K followers · 546.4K views · 536 likes · 94 reposts

In February 2026, a small OpenAI team shipped 1 million lines of production code. They didn't write a single line by hand. The AI agents wrote it. The humans designed the system that made the agents

Summary: Introduces "harness engineering": in February 2026 a small OpenAI team shipped 1 million lines of production code written by AI agents, with no line written by hand. The core work is not writing code but designing the system that lets agents produce it effectively.

Everything Is Recorded Now

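@dhaber · 50.0K followers · 497.3K views · 500 likes · 57 reposts

One of the biggest ways that AI is transforming work (and also one of the most taboo subjects inside companies at the moment) is that most work discussions are being recorded now by default. This

Summary: AI is changing work in part because most work discussions are now recorded by default, a subject the author calls one of the most taboo inside companies.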

How to get 100k YouTube subscribers in 3 hours (The Complete Guide)

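@maubaron · 16.9K followers · 233.8K views · 506 likes · 19 reposts

Our YouTube channel has 125k subscribers and we've never made or uploaded a single video ourselves. This is a completely automated system. It is this very same strategy that made us the first app

Summary: Describes a fully automated YouTube channel system: the channel has 125k subscribers, and the authors say they have never made or uploaded a video themselves. The title promises 100k subscribers in 3 hours, and the excerpt, which is cut off, credits the same strategy with making them "the first app".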

The Untrainable

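@saranormous · 143.5K followers · 194.8K views · 614 likes · 40 reposts

The mid-2026 investor's version of AI psychosis is a despair that nothing is investable, that we should put all our money into Anthropic and Nvidia and go home. I have never felt it. I have been sure

Summary: Argues against the pessimistic mid-2026 investor view that nothing in AI is investable beyond Anthropic and Nvidia, and holds that non-consensus opportunities remain.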

Building cloud agent infrastructure: what's different, and what we learned

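@intuitiveml · 6.4K followers · 171.3K views · 524 likes · 70 reposts

Most agent frameworks today assume a desktop. One user, one machine, one process. The agent runs while the laptop is open, writes to a local filesystem, holds API keys in environment variables, and

Summary: Lessons from building cloud agent infrastructure. Most existing frameworks assume a desktop (one user, one machine, a local filesystem), while a multi-user, distributed cloud environment is very different. The post covers the differences and what the team learned in practice.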

A guide to /goal 🥅

@dkundel · 19.3K followers · 116.9K views · 523 likes · 40 reposts

We launched the goal mode (or /goal) as a way to help you have Codex drive towards a concrete outcome. When you set a goal Codex will continue to work until the goal is achieved, whether that takes

Summary: Introduces Codex's /goal mode: once a goal is set, the agent keeps working until the goal is achieved, however long that takes. It lets users drive the agent toward a concrete outcome instead of steering each step by hand.

My Week with Fable

@MatthewBerman · 121.3K followers · 108.0K views · 661 likes · 26 reposts

tl;dr I've been testing Fable (Mythos) for the past week and it feels unlike any other model I've used. It feels, and is priced, like a next-generation model. It also has some real quirks. The Good

Summary: A week of hands-on testing with Fable: it feels, and is priced, like a next-generation model, but it has some real quirks.

Kimi to Predict All 104 World Cup Matches: Germany May Be Underestimated

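@Kimi_Moonshot · 172.7K followers · 106.6K views · 500 likes · 61 reposts

Our predictions will probably be wrong. But the World Cup offers a rare, public, verifiable, and constantly evolving real-world setting. Through this initiative, we hope to place analysis,

Summary: Kimi announces it will predict all 104 World Cup matches. The team admits the predictions will probably be wrong but sees the World Cup as a rare public, verifiable real-world setting for applying analysis and reasoning to an evolving problem. It suggests Germany may be underestimated.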

Loop engineering: the 14-step roadmap from prompter to loop designer.

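@0xCodez · 5.3K followers · 97.8K views · 510 likes · 80 reposts

Most developers still prompt their coding agents by hand. They type, they wait, they read the diff, they type again. 9 out of 10 builders have never written a single loop that prompts the agent for

Summary: Lays out a 14-step roadmap from prompter to loop designer. The post claims that 9 out of 10 builders still prompt agents by hand and have never written a loop that prompts the agent for them, and it calls for a shift to loop engineering.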

Designing loops with Fable 5

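@RLanceMartin · 30.4K followers · 84.7K views · 660 likes · 50 reposts

Mythos-class models like Claude Fable 5 have changed the way many of us work at Anthropic. I want to share two tips for getting the most out of this class of models. Self-correction loops There’s been

Summary: An Anthropic staff member shares two tips for using Fable 5: self-correction loops, in which the agent finds and fixes its own errors, and feedback design. The post stresses designing around loops, not single prompts, for models this capable.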
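
The excerpt is truncated before it explains how a self-correction loop is built, so here is a minimal sketch of the general idea under stated assumptions: an agent proposes an attempt, a checker returns failure messages, and the failures are fed back until the checks pass or a round limit is hit. `toy_agent` and `toy_check` are hypothetical stand-ins so the sketch runs without any model API; a real loop would call a model and a test suite in their place.

```python
from typing import Callable, List, Tuple


def self_correction_loop(
    task: str,
    agent: Callable[[str, List[str]], str],
    check: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> Tuple[str, int, bool]:
    """Request an attempt, verify it, and feed failures back until checks pass."""
    feedback: List[str] = []
    attempt = ""
    for round_no in range(1, max_rounds + 1):
        attempt = agent(task, feedback)
        feedback = check(attempt)
        if not feedback:  # no failures reported: done
            return attempt, round_no, True
    return attempt, max_rounds, False


def toy_agent(task: str, feedback: List[str]) -> str:
    # Hypothetical agent: the first attempt is buggy; with feedback it returns a fix.
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_check(code: str) -> List[str]:
    # Hypothetical verifier: run the candidate code and report any failing case.
    namespace: dict = {}
    exec(code, namespace)
    result = namespace["add"](2, 3)
    return [] if result == 5 else [f"add(2, 3) returned {result}, expected 5"]


final_code, rounds, ok = self_correction_loop("write add(a, b)", toy_agent, toy_check)
print(ok, rounds)
```

The round limit matters in practice: without it, an agent that never satisfies the checker would loop forever.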

Principled Thinking and AI Need to Go Together

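@RayDalio · 2.2M followers · 72.6K views · 515 likes · 93 reposts

What is the best approach to being effectively intelligent now that human intelligence and artificial intelligence are merging? Because I have been building computerized investment decision-making

Summary: Ray Dalio discusses how principled thinking and AI fit together. Having long built principle-based computerized investment decision systems, he argues that the best way to combine human and artificial intelligence is to guide AI with clear principles and not to rely entirely on black-box models.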

some notes on getting into frontier ai labs

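@itsreallyvivek · 4.3K followers · 65.8K views · 521 likes · 28 reposts

A few days ago I wrote that getting into a frontier AI lab mostly comes down to two things: proven research and trench engineering. The more I think about it, the less these feel like separate skills.

Summary: Notes on getting into a frontier AI lab: proven research ability and solid engineering ability are both required, and the author increasingly sees them as one skill. A traditional CS background or interview grinding counts for little; the ability to solve real problems is what decides.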

How to Build an AI GTM Brain using Claude Code

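@nifinet · 10.2K followers · 60.9K views · 522 likes · 54 reposts

When a team says they want AI for growth, they usually mean a faster send. An agent that fires the same template at a longer list, day and night. That is the cheap half of the job, and it stopped

Summary: Criticizes the common approach to "AI for growth", which amounts to sending templated messages in bulk. The post proposes using Claude Code to build a GTM (go-to-market) brain that moves from automated sending to intelligent strategy execution.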

I Gave Claude David Ogilvy's Writing Rules And Built A Legendary AI Writing Coach

@dickiebush · 441.8K followers · 57.7K views · 519 likes · 45 reposts

Legendary marketer David Ogilvy generated over $864 million for his clients. He was a British advertiser known as "The Father of Advertising." And in 1982, Ogilvy sent this 1-page memo to his staff:

Summary: The author feeds David Ogilvy's one-page 1982 writing memo to Claude to build an AI writing coach. Ogilvy generated over $864 million for his clients, and the author argues his writing rules still apply.

Everything Is Recorded Now

@dhaber · 50.0K followers · 497.3K views · 7-day impressions 497.3K

The Untrainable

@saranormous · 143.5K followers · 194.8K views · 7-day impressions 194.8K

My Week with Fable

@MatthewBerman · 121.3K followers · 108.0K views · 7-day impressions 108.0K

Loops: What Every AI Engineer Needs to Know in 2026

@sairahul1 · 113.0K followers · 852.6K views · 7-day impressions 852.6K

Loop Engineering.

@addyosmani · 395.5K followers · 42.7K views · 7-day impressions 42.7K

Claude Fable 5 plays Factorio

Summary: Claude Fable 5 is shown playing Factorio, the automation and factory-building simulation game, handling resource management and production planning.

Google DeepMind is worried about what happens when millions of agents start to interact

Google DeepMind is funding research into the potential dangers of situations where millions of different AI agents interact with each other online. According to Rohin Shah, who directs the company’s AGI safety and alignment research, the mass-market arrival of agents that can carry out tasks without

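Summary: Google DeepMind is concerned about the risks of millions of AI agents interacting online and is funding research on the topic. The article cites Rohin Shah, who directs the company's AGI safety and alignment research, on the dangers of agents interacting at scale.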

Supporting Europe’s work in ensuring a trustworthy AI ecosystem

OpenAI supports the EU Code of Practice on AI content transparency, advancing provenance standards and tools to help people understand AI-generated content.

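Summary: OpenAI supports the EU Code of Practice on AI content transparency and is advancing provenance standards and tools to help users recognize AI-generated content.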

How an astrophysicist uses Codex to help simulate black holes

Discover how astrophysicist Chi-kwan Chan uses Codex to build black hole simulations, helping scientists study extreme physics and test Einstein’s theory of general relativity.

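Summary: Astrophysicist Chi-kwan Chan uses OpenAI Codex to build black hole simulations that help scientists study extreme physics and test Einstein's theory of general relativity.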

Access OpenAI models and Codex through your Oracle cloud commitment

Access OpenAI models and Codex through Oracle Cloud, using existing commitments to build and deploy AI with enterprise security and governance.

Summary: Customers can use their existing Oracle Cloud commitments to access OpenAI models and Codex, building and deploying AI with enterprise security and governance.

PRC-linked influence operations are targeting AI debates in the US

A new report from OpenAI details PRC-linked influence operations using AI to target U.S. tech debates, data center narratives, tariffs, and false claims about ChatGPT.

Summary: An OpenAI report says PRC-linked influence operations are using AI to target U.S. tech debates, data center narratives, tariffs, and false claims about ChatGPT.

Investing in multi-agent AI safety research

Google DeepMind and partners announce a $10M funding call for multi-agent safety research.

Summary: Google DeepMind and partners have announced a $10 million funding call for multi-agent AI safety research.

From data to decisions: how LSEG is scaling trusted AI

See how LSEG uses OpenAI to scale trusted AI across its global business, accelerating insights, shrinking release cycles, and empowering 4,000 employees.

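Summary: The London Stock Exchange Group (LSEG) uses OpenAI to scale trusted AI across its business, accelerating insights, shortening release cycles, and empowering 4,000 employees.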

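It's my birthday today. Could you all wish me a happy birthday?

As the title says, this is my first birthday on L站. In the few months since I joined, I have learned a lot of useful things from everyone here. I hope you are all happy every day and that the community keeps getting better. 97 posts · 95 participants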

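[Urgent notice from the 干草铺 free relay]

To everyone using 干草铺: please stop using YOLO mode immediately. I can't yet tell whether the problem is in my own program or on Altman's side, but members have already reported GPT replies that read as if the model had been hijacked. I did say I would poison the responses if account resellers kept abusing the service, but this time it really isn't me. Please drop YOLO for now; I will take the free relay offline for a while shortly. YOLO means the bypass mode. I'm out at the moment and don't have time to write up the details, so I hope this is clear enough. Don't hand full control to GPT: if it can be hijacked once, it can be hijacked again. At least for today, please stay off this mode. 30 posts · 28 participants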

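A side-by-side evaluation of Claude Fable 5, Opus 4.8, Minimax M3, the Xiaomi Mimo V2.5 series, Hy3, and the Qwen3.7 series on a real project requirement (new leader on the board!)

Because the list of tested models keeps growing, the table now drops some older models from the same vendors; their results are in the earlier evaluation posts. Project: this is a Unity C# project. The test is a requirements spec for a skin system; I have already built the prefabs, and the model has to write the code. The project and environment are exactly the same as in the previous two rounds: Round 1 … Previous round. Model sources: Claude series: official API. Mimo V2.5 series: official Token Plan. Hy3 Preview: official API. Qwen3.7 series: official API. Minimax M3: official API. Nex-N2-Pro: OpenRouter Free API. Nemotron 3 Ultr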

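So it was me who messed everything up

Continuing the discussion from "This time the data center messed everything up": Yes, we know: this time the data center messed everything up. Feedback from operations: the server is under some load, but not enough to go down; we'll look at how to scale it up. The world is a makeshift troupe; strike up the band and keep dancing~ And since 42 provides the servers for the hub, it was 42 who messed everything up. Sorry, everyone, I'm on my knees! 18 posts · 17 participants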

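This time the data center messed everything up

Really not passing the buck: the data center messed everything up, and it has been resolved through a support ticket. The server is under some load, but not enough to go down; we'll look at how to scale it up. The world is a makeshift troupe; strike up the band and keep dancing~ 434 posts · 426 participants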

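Is 33 GB of logs data a lot? No!

Found it: what is really large is not the local logs folder but the logs table in the database. Current state: the oneapi database totals about 35,476.81 MB ≈ 34.65 GB. Of that, the logs table takes 34,462.88 MB ≈ 33.65 GB, with about 28 million rows, 12.24 GB of data, and 21.42 GB of indexes. A 100 GB disk is just too small (sob); put a tiny bit of stuff on it and it's full. Suppose a server has a 2 TB disk: if you don't even use 1 TB of it, aren't you wasting the disk?! Answer me!!! So the root cause of the full disk is that the disk is too small!!! @ouyangqiqi 48 posts · 47 participants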
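
The figures in this post can be cross-checked with a few lines of arithmetic. This is a small sketch that assumes 1 GB = 1024 MB and takes the quoted numbers at face value; the variable names are illustrative only.

```python
MB_PER_GB = 1024  # assumption: binary units, as the post's conversions suggest

db_total_mb = 35_476.81    # whole oneapi database, as quoted
logs_table_mb = 34_462.88  # logs table alone, as quoted
logs_data_gb = 12.24       # row data of the logs table
logs_index_gb = 21.42      # indexes of the logs table
logs_rows = 28_000_000     # "about 28 million rows"

db_total_gb = db_total_mb / MB_PER_GB
logs_table_gb = logs_table_mb / MB_PER_GB
logs_share = logs_table_mb / db_total_mb                      # fraction of the database
index_share = logs_index_gb / (logs_data_gb + logs_index_gb)  # fraction of the table
bytes_per_row = logs_table_mb * 1024 * 1024 / logs_rows       # data plus index, per row

print(f"database total: {db_total_gb:.2f} GB")
print(f"logs table:     {logs_table_gb:.2f} GB ({logs_share:.1%} of the database)")
print(f"index share:    {index_share:.1%} of the logs table")
print(f"per row:        {bytes_per_row:.0f} bytes including indexes")
```

On these numbers the logs table is nearly the whole database, and its indexes are larger than the row data, so pruning old rows or dropping unused indexes would free far more space than cleaning up local log files.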

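Many people think GPT Pro is expensive. How about looking at it this way?

I used to think GPT Pro at 200 US dollars a month was expensive, and it hurt a little! Then something happened recently that changed my mind. I had been paying a personal trainer on a monthly plan, a bit over 2,600 a month. The trainer took me through the same movements every session, and after a year I had made little progress, so I recently cancelled. GPT costs about 1,300 a month and can teach me a great deal, at half the trainer's price, and I can do the trainer's sessions on my own. Seen that way, GPT is quite a good deal. I think the issue was that I had filed AI subscriptions under the "software services" mental account: Netflix and assorted SaaS cost a few dozen dollars, so 200 dollars looked outrageous. But if you benchmark it against the cost of intellect and labor, such as parents hiring a tutor, a gym-goer hiring a trainer, or a professional finding a mentor, a single session costs at least 200-400 RMB.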

πFS

827 points · 189 comments

Show HN: Extend UI – open-source UI kit for modern document apps

We're open-sourcing 14 components & examples today for PDF, DOCX, and XLSX viewers, plus bounding box citations, file upload, e-signature, and more. It's MIT licensed and fully customizable. Demo video here: https://share.extend.ai/kRmSGKRF When we started, we tried every

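Today's Theme

The headline in AI today is Anthropic's official release of Claude Fable 5. Official demos show the model clearing games, simulating celestial mechanics, and designing CAD models, a display of multimodal and autonomous-building ability, while the accompanying usage terms have drawn controversy across the industry. Google DeepMind released DiffusionGemma, with up to 4x faster text generation, and together with partners announced a $10 million funding call for multi-agent safety research. OpenAI focused on EU AI transparency policy and enterprise cloud deployment. Open source was active as well: NVIDIA open-sourced a security scanner for agent skills, and Cohere released North Mini Code, a code model for developers. On the methodology side, AI engineers are shifting from "prompt engineer" to "loop designer", with agent systems and continuous-improvement frameworks in focus.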

01

Model releases and updates

Model Releases · 44 items

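Anthropic releases Claude Fable 5

Official · Latent Space

Anthropic has officially released the long-awaited Claude Fable 5, a Mythos-class model that aims to balance safety and capability. In the official demos released alongside it, the model completes Pokémon FireRed using vision alone, plays Factorio, simulates the solar system and correctly predicts an eclipse, and designs 3D-printable CAD models. The accompanying usage policy is contentious and has prompted wide discussion in the community.

Model release · Multimodal · Policy controversy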

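DeepMind releases DiffusionGemma with up to 4x speedup

Official · DeepMind Blog

Google DeepMind has introduced DiffusionGemma, a diffusion-based approach that speeds up text generation by up to 4x. The model cuts latency substantially while maintaining output quality, which makes it a more efficient option for real-time applications and large-scale deployment and a direct challenge to the autoregressive generation paradigm.

Text generation · Speed optimization · Diffusion models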

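Cohere releases North Mini Code, a code model for developers

Official · Hugging Face Blog

Cohere has released North Mini Code, its first model built specifically for developers and optimized for code completion, review, and generation. It is meant to give developers more accurate and efficient code intelligence in IDEs and CI/CD pipelines, adding to the ecosystem of specialized code models.

Code models · Product release · Developer tools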

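InternVideo3 proposes an agentic multimodal reasoning framework

Official · HuggingFace Trending Papers

InternVideo3, from Shanghai AI Laboratory, turns foundation models into agents with multi-step reasoning and tool use, specializing in long-horizon multimodal video understanding. The work targets video reasoning and complex scene understanding, areas where open-source models are generally weak, and lays groundwork for video agent applications.

Multimodal · Video · Agents

02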

Product releases and updates

Product · 44 items

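OpenAI models now available through Oracle Cloud

Official · OpenAI News

OpenAI has announced that customers can use their existing Oracle Cloud commitments to access OpenAI models and Codex, building and deploying AI with enterprise security and governance. The move lowers the barrier for large enterprises adopting frontier AI.

Cloud services · Enterprise · Partnership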

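Kimi to predict all 104 World Cup matches

X · KOL · X posts (AttentionVC)

Kimi has announced that it will use its reasoning and analysis models to predict all 104 matches of the 2026 World Cup. The team admits the predictions will probably be wrong but stresses that the World Cup is a rare public, verifiable real-world test bed for AI reasoning on dynamic, complex problems. It singles out Germany as possibly underestimated.

Product release · Applications · Reasoning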

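Fully automated YouTube channel claims 100k subscribers in 3 hours

X · KOL · X posts (AttentionVC)

A KOL describes a fully automated YouTube channel system that has reached 125k subscribers without the authors ever making or uploading a video themselves. The post's title promises 100k subscribers in 3 hours, and the authors credit the same strategy with making them "the first app". It is an aggressive example of AI-driven automation in content creation and growth.

Tools · Tutorial · Automation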

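Open-source agent analytics tool AgentsView released

Open source · GitHub Trending

AgentsView is a local-first session analytics tool for coding agents that supports Claude Code, Codex, and more than 20 other agents. It claims to be 100x faster than the comparable tool ccusage and helps developers see agent behavior patterns and efficiency bottlenecks, improving observability in agent development.

AI agents · Analytics tools · Local-first

03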

Industry news

Industry · 44 items

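DeepMind and partners announce $10M funding call for multi-agent AI safety research

Official · DeepMind Blog

Google DeepMind and partners have announced a $10 million funding call for research into the safety risks that can arise when many AI agents interact at scale. The aim is to get ahead of the problems that could emerge when millions of agents operate online together.

Multi-agent · AI safety · Funding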

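OpenAI supports the EU Code of Practice on AI content transparency

Official · OpenAI News

OpenAI has announced support for the EU Code of Practice on AI content transparency and committed to advancing provenance standards and tools that help users recognize AI-generated content. The move follows the global regulatory trend, aims to build user trust in AI systems, and sets a workable transparency baseline for the industry.

Policy · Transparency · Compliance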

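OpenAI reports PRC-linked influence operations targeting AI debates

Official · OpenAI News

An OpenAI report says PRC-linked influence operations are using AI tools to target U.S. tech debates, data center narratives, tariff policy, and false claims about ChatGPT. The report has drawn further attention to AI-generated disinformation and cross-border information operations, and it underlines the urgency of platform security and content authentication.

AI safety · Disinformation · Geopolitics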

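What it takes to get into a frontier AI lab

X · KOL · X posts (AttentionVC)

A KOL shares observations on getting into a top AI lab: proven research ability and solid engineering ability are both required. A traditional CS background or interview grinding counts for little; the ability to solve complex real problems is what decides. The post offers clear career guidance for practitioners aiming at the AI frontier.

Careers · Opinion

04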

Tips and takes

Tips & Takes · 44 items

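Harness engineering: the new paradigm for AI engineers in 2026

X · KOL · X posts (AttentionVC)

The article introduces "harness engineering" and points to a small OpenAI team that shipped 1 million lines of production code written by AI agents, with no line written by hand. The core work is designing the system that lets AI produce effectively, not coding by hand. It argues that AI engineers should become "loop designers" who build closed-loop workflows in which agents drive themselves.

Workflow · Opinion · Methodology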

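A 14-step roadmap from prompt engineer to loop designer

X · KOL · X posts (AttentionVC)

A KOL lays out a 14-step roadmap for moving from prompting agents by hand to designing self-driving loops. The post claims that 9 out of 10 builders are still at the manual prompting stage and have never written a loop that keeps an agent iterating. The roadmap defines the skill set for the new paradigm and urges the community to start adopting loop engineering now.

Tutorial · Workflow · Prompting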

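PyTorch profiling: fused MLP optimization

Official · Hugging Face Blog

The Hugging Face blog has published the second part of its PyTorch profiling series, which starts from a standard nn.Linear module and shows how fusing a multilayer perceptron (MLP) reduces GPU kernel launch overhead. The article includes reproducible code and benchmark data to help deep learning engineers improve training and inference performance.

PyTorch · Performance optimization · Tutorial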

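An AI writing coach built on Ogilvy's writing rules

X · KOL · X posts (AttentionVC)

A KOL feeds David Ogilvy's one-page 1982 writing memo to Claude to build an AI writing coach. Ogilvy generated over $864 million for his clients. The experiment shows the potential of pairing a classic methodology with a strong language model and gives content creators an efficient way to improve their copy.

Prompting · Applications · Writing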