Daily AI Briefing

2026-06-11 (content retrieved 06/11 20:53)

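Anthropic Releases Claude Fable 5 and Mythos 5 Models

Smol AI News

Anthropic has released two new models: Fable 5 (general version) and Mythos 5 (restricted access). Sensitive queries can fall back to Opus 4.8. The release marks another major iteration in model capability. (Reported by multiple outlets.)

Anthropic · LLM · Claude

Why it matters: This is the most important model release of the day. It bears directly on where the ceiling of AI capability sits, so practitioners should pay attention.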

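Apple Releases Container, a Lightweight Linux Container Tool for Mac

GitHub Trending

Container is an official open-source project from Apple that creates and runs Linux containers on a Mac using lightweight virtual machines. It is written in Swift and optimized for Apple silicon.

Open source · Containers · macOS

Why it matters: An official container tool from Apple with clear practical value for Mac developers, and one they can start using right away.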

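GitHub Project agent-skills: Production-Grade Skills for AI Coding Agents

GitHub Trending

The open-source project agent-skills offers a collection of production-grade engineering skills for AI coding agents, aimed at helping developers build more capable agents.

AI coding · Agents · Open source

Why it matters: A practical skills library for agent development that can directly improve a coding agent's productivity, and one developers can borrow from.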

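DeepMind Warns of Risks When Millions of AI Agents Interact

MIT Tech Review AI

Google DeepMind is funding research into the potential dangers that arise when millions of AI agents interact with one another. AI safety lead Rohin Shah notes that large-scale autonomous interaction could trigger unpredictable chain reactions.

AI safety · Multi-agent · DeepMind

Why it matters: A forward-looking warning from a top research lab, and important context for understanding AI agent safety.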

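Claude Code v2.1.114 Update: Fix for Agent Tool-Permission Crash

Claude Code Changelog

Claude Code v2.1.114 has been released. Its main fix addresses a crash that occurred when an agent-team teammate requested tool permissions.

Claude · Update · Tools

Why it matters: A practical update for Claude Code users that fixes a crash in a key collaboration scenario.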

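Advanced PyTorch Profiling: From nn.Linear to a Fused MLP

Hugging Face Blog

A technical tutorial that digs into PyTorch profiling and shows how to fuse standard nn.Linear layers into a more efficient MLP kernel.

PyTorch · Performance optimization · Tutorial

Why it matters: A hands-on PyTorch optimization tutorial that directly helps speed up model training. Aimed at deep learning engineers.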

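DeepMind Releases DiffusionGemma: Up to 4x Faster Text Generation

DeepMind Blog

DeepMind has introduced DiffusionGemma, a new diffusion language model that generates text up to 4x faster than conventional autoregressive models.

LLM · Inference acceleration · Gemma

Why it matters: A major advance in text generation speed with clear implications for LLM inference efficiency.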

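OpenAI Backs the EU's Trusted AI Ecosystem

OpenAI News

OpenAI announced support for the EU code of practice on AI content transparency, and is promoting provenance standards and tools to help the public identify AI-generated content.

AI regulation · Transparency · OpenAI

Why it matters: A policy development on AI content regulation and transparency, and a useful reference point for global AI governance trends.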

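Paper Review: Fine-Tuning Multimodal LLMs with Art-based Reinforcement Training

HuggingFace Trending Papers

A new paper proposes ART (Art-based Reinforcement Training), a parameter-efficient fine-tuning method for multimodal large models that uses reinforcement learning to improve model performance.

Papers · Fine-tuning · Multimodal

Why it matters: A new approach to multimodal fine-tuning that may inspire researchers, though it offers limited immediate value for working engineers.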

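BYD to Bring Its Flash Charging Network to Canada

Hacker News

BYD is bringing its "Flash" charging technology, which the company claims can fully charge a vehicle in 5 minutes, to Canada and plans to build a charging network there.

Electric vehicles · Fast charging · BYD

Why it matters: A significant commercial step for EV fast charging that could affect the shape of the new-energy vehicle industry.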

apple/container

Swift · ★ 30,916 · 🍴 864 · 📈 2,419 stars today

A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift, and optimized for Apple silicon.

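Containers · macOS · Developer tools

Summary: Apple's open-source Swift tool for creating and running Linux containers on a Mac through lightweight virtual machines. It is optimized for Apple silicon and suits developers who want to run Linux workloads safely and efficiently on macOS.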

addyosmani/agent-skills

Shell · ★ 53,594 · 🍴 5,856 · 📈 3,275 stars today

Production-grade engineering skills for AI coding agents.

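AI agents · Code engineering · Production-grade

Summary: A collection of engineering skills for AI coding agents with an emphasis on production-grade quality. It helps agents follow coding conventions, debug, and optimize, and is aimed at making code generation and review more reliable.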

maziyarpanahi/openmed

Python · ★ 2,541 · 🍴 256 · 📈 427 stars today

open-source healthcare ai

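Healthcare AI · Open source · Diagnostic support

Summary: An open-source healthcare AI project for building intelligent solutions in the medical domain. It covers scenarios such as diagnostic support and medical record analysis, and offers customizable AI tools for healthcare organizations and researchers.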

phuryn/pm-skills

★ 15,902 · 🍴 1,675 · 📈 1,944 stars today

PM Skills Marketplace: 100+ agentic skills, commands, and plugins — from discovery to strategy, execution, launch, and growth.

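Product management · AI agents · Workflow

Summary: A skills marketplace for product managers with 100+ agentic skills, commands, and plugins spanning discovery, strategy, execution, launch, and growth. It suits PMs who want to speed up product workflows with AI agents.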

NVIDIA/SkillSpector

Python · ★ 2,298 · 🍴 195 · 📈 308 stars today

Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.

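Security scanning · AI agents · Vulnerability detection

Summary: NVIDIA's open-source security scanner for AI agent skills, which detects vulnerabilities, malicious patterns, and security risks. It is suited to auditing third-party skills before enterprise deployment.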

soxoj/maigret

Python · ★ 32,370 · 🍴 2,376 · 📈 665 stars today

🕵️‍♂️ Collect a dossier on a person by username from 3000+ sites

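Reconnaissance tools · Username search · Intelligence gathering

Summary: An open-source tool that builds a dossier on a person by searching for a username across 3000+ sites. It is suited to social engineering investigations, background checks, and digital footprint analysis, with broad site support and result export.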

x1xhlol/system-prompts-and-models-of-ai-tools

★ 139,737 · 🍴 34,608 · 📈 369 stars today

FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI, VSCode Agent, Warp.dev, Windsurf, Xcode, Z.ai Code, Dia & v0. (And other Open Sourced) System Prompts

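Prompts · AI tools · Reference

Summary: A collection of system prompts and model information for many AI tools (such as Cursor, Claude Code, and Devin). It helps developers understand, compare, and reproduce the underlying behavior of different AI assistants.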

refactoringhq/tolaria

TypeScript · ★ 15,230 · 🍴 1,051 · 📈 604 stars today

Desktop app to manage markdown knowledge bases

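Knowledge management · Markdown · Desktop app

Summary: A desktop app for managing Markdown knowledge bases, with local storage and fast search. It suits personal note-taking and document writing, with simple folder-style organization and tagging.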

obra/superpowers

Shell · ★ 224,400 · 🍴 19,936 · 📈 1,323 stars today

An agentic skills framework & software development methodology that works.

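AI agents · Development framework · Methodology

Summary: An agentic skills framework and software development methodology intended to systematically improve how AI agents collaborate and execute. It suits building reusable agent capabilities and guiding real project delivery.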

restic/restic

Go · ★ 34,019 · 🍴 1,780 · 📈 33 stars today

Fast, secure, efficient backup program

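Backup tools · Encryption · Incremental backup

Summary: A fast, secure, and efficient backup program that supports encryption, incremental backups, and multiple storage backends (local, cloud storage, and others). It suits regular backups for individuals and businesses, with fast restores.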

msitarzewski/agency-agents

Shell · ★ 111,033 · 🍴 18,201 · 📈 1,434 stars today

A complete AI agency at your fingertips - From frontend wizards to Reddit community ninjas, from whimsy injectors to reality checkers. Each agent is a specialized expert with personality, processes, and proven deliverables.

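AI agents · Automation · Marketing

Summary: An all-in-one collection of AI agents covering roles such as frontend development, social media operations, and creative generation. Each agent has specialized skills and a personality, and can be used for marketing automation, content creation, and similar scenarios.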

masterking32/MasterDnsVPN

Go · ★ 5,487 · 🍴 509 · 📈 510 stars today

Advanced DNS tunneling VPN for censorship bypass, optimized beyond DNSTT and SlipStream with low-overhead ARQ, resolver load balancing, high packet-loss stability and speed.

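VPN · Censorship circumvention · DNS tunneling

Summary: An advanced DNS-tunneling VPN designed to bypass network censorship. It uses low-overhead ARQ and resolver load balancing, is described as staying stable and fast under high packet loss, and claims to improve on DNSTT.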

chatwoot/chatwoot

Ruby · ★ 30,144 · 🍴 7,491 · 📈 31 stars today

Open-source live-chat, email support, omni-channel desk. An alternative to Intercom, Zendesk, Salesforce Service Cloud etc. 🔥💬

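Customer support · Open source · Omni-channel

Summary: An open-source omni-channel customer support platform with live chat, email, and social media integration, positioned as an alternative to Intercom and Zendesk. It suits small and mid-sized teams building an efficient, customizable support system.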

kenn-io/agentsview

Go · ★ 1,444 · 🍴 166 · 📈 98 stars today

Local-first session intelligence and analytics for coding agents, supporting Claude Code, Codex, and more than 20 other agents. Also: 100x faster replacement for ccusage!

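AI agents · Analytics tools · Local-first

Summary: A local-first session analysis and analytics tool for coding agents that supports Claude Code, Codex, and more than 20 other agents. It is billed as a replacement for ccusage that runs 100x faster, and helps developers understand agent behavior and efficiency.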

alchaincyf/zhangxuefeng-skill

★ 7,865 · 🍴 2,410 · 📈 94 stars today

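Zhang Xuefeng.skill (张雪峰.skill): Zhang Xuefeng's cognitive operating system. A practical thinking framework for gaokao (college entrance exam) applications, postgraduate entrance exams, and career planning. Generated by Nüwa.skill (女娲.skill).

Education planning · Agents · Gaokao

Summary: An agent skill built on Zhang Xuefeng's way of thinking, focused on practical frameworks for gaokao applications, postgraduate entrance exams, and career planning. It was generated automatically by Nüwa.skill and is intended as a reference for students and parents making education decisions.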

TapXWorld/ChinaTextbook

Roff · ★ 73,857 · 🍴 16,528 · 📈 345 stars today

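All primary, middle, and high school and university textbooks in PDF.

Textbooks · Education · PDF

Summary: A collection of PDF textbooks for every stage of Chinese education: primary school, middle school, high school, and university. It gives students, teachers, and self-learners a convenient way to obtain textbooks across the main subjects.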

hexo-ai/sia

Python · ★ 1,043 · 🍴 146 · 📈 177 stars today

SIA is a Self Improving AI framework to autonomously improve the performance of any AI system (Model / Agent) on a benchmark task.

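AI optimization · Self-improvement · Benchmarks

Summary: A self-improving AI framework that autonomously improves the performance of any AI system (model or agent) on a benchmark task. It suits continuous performance optimization and reduces manual tuning effort.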

mattermost/mattermost

TypeScript · ★ 37,126 · 🍴 8,689 · 📈 26 stars today

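Mattermost is an open source platform for secure collaboration across the entire software development lifecycle.

Collaboration platform · Open source · DevOps

Summary: An open-source secure collaboration platform designed for the software development lifecycle. It provides messaging, file sharing, and CI/CD integrations, serves as a self-hosted alternative to Slack, and suits enterprise teams.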

bannedbook/fanqiang

Kotlin · ★ 46,556 · 🍴 7,990 · 📈 342 stars today

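Fanqiang: circumventing internet censorship

Censorship circumvention · Open internet access · Tool collection

Summary: A collection of censorship-circumvention resources, with tools, tutorials, and configurations for getting around network blocking. Intended as a reference for users seeking unrestricted internet access.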

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

👍 1

06/10 17:30

There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional fine-tuning-specific raw tokens to an LLM input. However, both require mo

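LLM · Method

Summary: The study proposes ART (Art-based Reinforcement Training), a parameter-efficient fine-tuning method for multimodal large language models. ART combines LoRA and soft prompting, the two mainstream PEFT techniques, with the aim of improving multimodal model performance.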

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

👍 16

06/10 08:00

Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce malicious code. Meanwhile, Grammar-Constrained Decoding (GCD) has been widely adopted to improve the reliability of LLM-generated code by enforcing syntactic validity. In this

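Security · Code generation

Summary: The study finds that grammar-constrained decoding (GCD) can be exploited to jailbreak large language models into generating malicious code, exposing a security risk in a technique adopted to make generated code more reliable.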
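
For readers unfamiliar with grammar-constrained decoding, here is a minimal toy sketch, not the paper's setup. At each step the decoder discards every token the grammar does not allow and picks the highest-scoring token that remains. The tiny grammar, the token names, and `fake_scores` are all hypothetical stand-ins for a real parser and model.

```python
# Toy grammar-constrained greedy decoding (illustrative only).
# GRAMMAR maps a parser state to the tokens allowed next and the state each leads to.
GRAMMAR = {
    "start": {"[": "item"},
    "item": {"1": "sep", "2": "sep"},
    "sep": {",": "item", "]": "done"},
}

def fake_scores(prefix):
    """Hypothetical stand-in for model scores; favors ']' once 5 tokens exist."""
    scores = {"[": 0.1, "1": 0.5, "2": 0.4, ",": 0.6, "]": 0.2, "hello": 0.9}
    if len(prefix) >= 5:
        scores["]"] = 1.0
    return scores

def constrained_decode(max_steps=20):
    state, out = "start", []
    for _ in range(max_steps):
        if state == "done":
            break
        allowed = GRAMMAR[state]
        scores = fake_scores(out)
        # Mask: only tokens the grammar permits in this state are candidates.
        token = max(allowed, key=lambda t: scores[t])
        out.append(token)
        state = allowed[token]
    return "".join(out)

print(constrained_decode())  # prints [1,1,1]
```

The constraint, not the model's own preference, decides what is emitted: the unconstrained favorite `hello` never appears. That override is the general property the paper's title points at; the attack itself is not reproduced here.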

Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation

👍 1

06/10 08:00

Remaining Useful Life (RUL) prediction is essential for industrial predictive maintenance, yet many learning-based approaches rely on extensive feature engineering or large labeled datasets to train task-specific sequence models. In this work, we introduce a lightweight learning approach, in which w

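Time series · Industrial

Summary: The study proposes a lightweight learning approach that uses embeddings from time-series foundation models to predict remaining useful life (RUL), reducing dependence on feature engineering and large labeled datasets. It targets industrial predictive maintenance.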

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

👍 25

06/10 08:00

Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, forcing models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue

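Spatial reasoning · Method

Summary: The study proposes a "reason, then re-reason" approach that revisits the scene across views to improve spatial reasoning from egocentric video. Unlike single-turn inference, it resolves geometric ambiguity using verifiable evidence.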

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

👍 49

06/10 08:00

General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the clean Docker workspace, patch, and prediction contract required for scoring. We introduce Claw-SWE-

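Benchmark · Agents

Summary: Claw-SWE-Bench is a benchmark for evaluating general-purpose agents such as OpenClaw on coding tasks. It addresses the problem that standard SWE-bench cannot directly score a generic agent's coding ability.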

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

👍 57

06/10 08:00

Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduc

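Agents · Scientific research

Summary: The study examines how an AI agent can autonomously run the scientific loop of exploration, experimentation, and abstraction over long horizons using hypothesis-tree refinement, as a step toward generalist autonomous research.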

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

👍 5

06/10 08:00

Reinforcement Learning (RL) with verifiable environments has emerged as a powerful approach for enhancing the reasoning capabilities of Large Language Models (LLMs). While prior research demonstrates that scaling environment quantity improves RL performance, existing manual or individual constructio

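Reinforcement learning · Reasoning

Summary: The study treats verifiable environments as LEGO bricks and composes them recursively to improve the reasoning generalization of large language models. The approach scales the number of environments to improve reinforcement learning performance.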

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

👍 55

06/10 08:00

Environments serve as interactive systems for large language model (LLM) based agents across diverse scenarios and play a crucial role in driving the continual evolution of model capabilities. Despite this importance, existing work lacks a systematic categorization and deep analysis. This paper syst

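Agents · Survey

Summary: The survey systematically categorizes environment engineering for LLM-based agents across four areas: environment modeling, synthesis, evaluation, and application. It fills the gap left by the lack of a systematic treatment of the field.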

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

👍 14

06/10 08:00

Recent progress in foundation models has shifted toward agentic behavior involving multi-step reasoning and tool use. However, open-source efforts largely focus on text-dominant settings, leaving long-horizon multimodal tasks underexplored. This gap is evident in video tasks requiring sustained temp

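Multimodal · Video

Summary: InternVideo3 proposes agentifying foundation models so that they can perform multi-step reasoning and tool use. It focuses on long-horizon multimodal tasks and addresses the shortfall of open-source models on video tasks.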

World Model Self-Distillation: Training World Models to Solve General Tasks

👍 4

06/10 08:00

Pretrained video generators are promising visual world models that exhibit emergent task-solving abilities; however, their reliance on detailed textual descriptions limits their direct use for planning and decision-making. Existing approaches either outsource this reasoning to language or vision-lan

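World models · Method

Summary: World model self-distillation trains video generators to solve general tasks on their own, without an external reasoning model. The work makes world models more directly usable for planning and decision-making.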

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

👍 14

06/10 08:00

Reinforcement learning (RL) has become a key component in modern large language models, yet the rollout stage remains the key bottleneck in RL training pipelines. Although Multi-Token Prediction (MTP) offers a natural solution to accelerate rollouts through speculative decoding, many studies have ob

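Reinforcement learning · Efficiency

Summary: The study proposes accelerating reinforcement learning training, in particular the rollout stage, through multi-token prediction (MTP) and rejection sampling, breaking entropy bounds and improving RL training efficiency for large language models.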
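
The abstract is cut off, so the paper's own algorithm is not shown here. As background only, this is a minimal sketch of the standard speculative-sampling acceptance rule that speculative decoding relies on: a draft token x proposed with probability q(x) is accepted with probability min(1, p(x)/q(x)) under the target distribution p, and on rejection a token is drawn from the renormalized residual max(p − q, 0). The two distributions below are made up for illustration.

```python
import random

def speculative_accept(p, q, draft_token, rng):
    """Standard speculative-sampling step for one token.

    p: target-model probabilities, q: draft-model probabilities (dicts token -> prob).
    The returned token is distributed according to p.
    """
    accept_prob = min(1.0, p[draft_token] / q[draft_token])
    if rng.random() < accept_prob:
        return draft_token
    # Rejected: resample from the residual max(p - q, 0), renormalized.
    residual = {t: max(p[t] - q.get(t, 0.0), 0.0) for t in p}
    total = sum(residual.values())
    r = rng.random() * total
    acc = 0.0
    for token, weight in residual.items():
        acc += weight
        if r < acc:
            return token
    return token  # guard against floating-point round-off

p = {"a": 0.6, "b": 0.3, "c": 0.1}   # hypothetical target distribution
q = {"a": 0.2, "b": 0.7, "c": 0.1}   # hypothetical draft distribution
rng = random.Random(0)
n = 100_000
counts = {"a": 0, "b": 0, "c": 0}
for _ in range(n):
    draft = rng.choices(list(q), weights=list(q.values()))[0]
    counts[speculative_accept(p, q, draft, rng)] += 1
freqs = {t: c / n for t, c in counts.items()}
print(freqs)  # empirical frequencies should be close to p
```

Even though every draft comes from q, the accepted-or-resampled tokens follow p, which is why the speedup does not change the sampled policy.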

Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay

👍 1

06/10 08:00

Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. Our approach involves designing

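Translation · Low-resource

Summary: The study proposes Lius, which uses continual instruction tuning to improve LLM translation performance on a low-resource language, Kupang Malay, mitigating the performance degradation seen in low-resource settings.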

ICA Lens: Interpreting Language Models Without Training Another Dictionary

👍 14

06/10 08:00

Finding interpretable directions in language-model representations is critical for understanding and controlling model behavior. Sparse autoencoders (SAEs) have become the standard tool for this purpose, but using them as the default first lens often requires training, storing, and evaluating large

sparse autoencoders · independent component analysis · language-model representations

On the Limits of LLM-as-Judge for Scientific Novelty Assessment

👍 1

06/10 08:00

LLMs are increasingly used to generate and judge scientific ideas. This makes novelty evaluation a central problem. Full idea evaluation is difficult because it often requires judging a method, its feasibility, and its empirical promise. We therefore study a cleaner upstream object: the research que

research questions · large language models · novelty evaluation

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

👍 13

06/10 01:16

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for enhancing reasoning and agentic behavior in large language models. However, rollout-intensive policy optimization is often limited by insufficient reward contrast, arising when overly simple or complex prompts generate

reinforcement learning · verifiable rewards · policy optimization

ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics

👍 18

06/09 08:00

Combinatorics is central to Olympiad-level mathematical problem solving, requiring deep discrete reasoning, creative constructions, and rigorous structural insight. Recent evidence suggests that even today's strongest frontier models remain uneven on Olympiad combinatorics, revealing a gap in creati

combinatorics · large language models · mathematical reasoning

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

👍 5

06/09 08:00

We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture toward general physical intelligence. Leveraging three automated dat

Embodied Foundation Model · embodied cognition · task planning

i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models

👍 2

06/09 08:00

Diffusion models have consistently driven progress in text-to-image generation. However, it is challenging to attribute recent progress to specific modeling and data choices: state-of-the-art open-weight models provide limited ablations, and do not disclose their training data and full training deta

text-to-image diffusion models · controlled experiments · publicly available datasets

DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch

👍 26

06/09 08:00

As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implementing complete software repositories from high-level specifications. However, training agents for such long-horizon soft

LLM-based code agents · whole-repository generation · large-scale dataset

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

👍 41

06/09 08:00

Reinforcement learning with verifiable rewards (RLVR) has become standard for improving LLM reasoning. However, existing PPO-style trust-region mechanisms remain position-agnostic by enforcing uniform thresholds across all tokens independently. This pointwise treatment conflicts with autoregressive

PPO-style trust-region mechanisms · autoregressive generation · token-level masking
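
As a reference point for the "uniform thresholds" the abstract criticizes, here is a minimal sketch of the standard PPO clipped surrogate applied per token, with a single ε shared by every position. This is the familiar baseline, not the paper's proposed method, and the log-probabilities and advantages are made-up numbers.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean per-token PPO clipped surrogate with one epsilon for all positions."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # pi_new / pi_old for this token
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))  # same bounds at every token
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Three tokens whose probability went up, stayed flat, and went down.
logp_old = [math.log(0.5), math.log(0.5), math.log(0.5)]
logp_new = [math.log(0.75), math.log(0.5), math.log(0.25)]
advantages = [1.0, 1.0, -1.0]
objective = ppo_clip_objective(logp_new, logp_old, advantages)
print(objective)
```

The first and third tokens hit the clip bounds (ratios 1.5 and 0.5 against limits 1.2 and 0.8) regardless of where they sit in the sequence, which is the position-agnostic behavior the abstract describes.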

τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems

👍 0

06/08 08:00

As recommender systems transition toward agentic, multi-turn conversational interfaces, evaluation paradigms have struggled to keep pace. Current benchmarks often rely on "LLM-as-a-judge" evaluations, which introduce subjectivity, high costs and inconsistency. We present τ-Rec, a benchmark for agent

agentic recommender systems · LLM-as-a-judge · reward-based evaluation

FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching

👍 0

06/08 08:00

Brain Magnetic Resonance Imaging (MRI) plays a central role in studying neurological development, aging, and diseases. One key application is Brain Age Prediction (BAP), which estimates an individual's biological brain age from MRI data. Effective BAP models require large, diverse, and age-balanced

brain age prediction · generative data augmentation · latent diffusion models

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

👍 46

06/08 08:00

Reward models are central to text-to-image post-training, but visual preference is subjective and better represented as a distribution over rubric scores than as a deterministic scalar. Existing scalar, score-token, and pairwise reward models over-compress uncertainty and fine-grained score differen

reward models · visual preference · score distributions

PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf

👍 1

06/07 08:00

Expert writing feedback from experienced researchers is critical for early-career scholars to improve their manuscripts, yet high-quality feedback often remains scarce because reviewing research papers is labor-intensive. Emerging AI-powered writing assistants largely focus on grammar fixes or simul

AI-powered writing assistants · expert skill library · specialized agents

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

👍 3

06/06 08:00

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload derails the user's legitimate task, the resulting failure signal invites i

skill-poisoning attacks · attack success rate · YAML-header injections

When Behavioral Safety Evaluation Fails: A Representation-Level Perspective

👍 1

06/06 08:00

Large Language Model (LLM) safety has often been evaluated at the behavior level, which provides limited evidence of internal robustness, as these evaluations target outputs rather than representation-level vulnerability under intervention. We formalize this discrepancy as the audit gap: the differe

large language model · behavioral safety · representation-level robustness

In-Context Multiple Instance Learning

👍 0

06/05 01:50

Multiple Instance Learning (MIL) addresses problems where supervision is available at the level of bags of instances and has been successfully applied in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characte

multiple instance learning · Perceiver-style architecture · synthetic data

Large Language Models Are Overconfident in Their Own Responses

👍 2

06/02 08:00

Prior work has shown that instruction-tuned large language models (LLMs) are less well calibrated than their base pre-trained counterparts. However, little is known about the frequently used chat template's effect on the calibration of conversational LLMs. In this work, we investigate the mechanisms

instruction-tuned large language models · calibration · chat template
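
Calibration of the kind this abstract discusses is commonly measured with expected calibration error (ECE): predictions are bucketed by confidence, and the gap between average confidence and accuracy in each bucket is averaged, weighted by bucket size. The sketch below uses made-up data; the paper's own metric and binning are not given in the excerpt.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece

# An overconfident toy model: it reports 0.95 every time but is right half the time.
confs = [0.95, 0.95, 0.95, 0.95]
hits = [True, False, True, False]
ece = expected_calibration_error(confs, hits)
print(ece)
```

A well-calibrated model would score near 0; the toy model above lands at roughly 0.45 because its stated confidence exceeds its accuracy by that margin.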

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

👍 7

06/02 08:00

Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and scalar rewards mask diverse failure modes. We introduce EvoTrainer, an autonomous training framework that co-evolves LLM

autonomous training framework · co-evolution · empirical feedback

Distilling LLM Feedback for Lean Theorem Proving

👍 0

05/29 08:00

Post-training for reasoning models typically combines supervised fine-tuning with reinforcement learning from verifiable rewards, most commonly with GRPO. However, this algorithm suffers from sparse rewards, limited exploration, and mode collapse. Building upon recent works on self-distillation, we

supervised fine-tuning · reinforcement learning · verifiable rewards

Loops: What Every AI Engineer Needs to Know in 2026

@sairahul1 · 113.0K followers · 852.6K views · 600 likes · 79 reposts

06/09 17:26

Peter Steinberger, creator of OpenClaw, who now works with OpenAI. Yesterday he posted this: "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."

Opinion · Workflow

Summary: In 2026, AI engineers should shift from prompting agents by hand to designing loops that prompt them.

How To Become An AI Engineer in 2026 (Without a CS Degree)

@sairahul1 · 113.0K followers · 710.8K views · 509 likes · 97 reposts

06/05 16:10

How To Become An AI Engineer in 2026. Without a CS degree. Without a bootcamp. Without knowing what a transformer is today. Here's what nobody tells you: The companies hiring right now don't need

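Tutorial · Career development

Summary: Lays out a path to becoming an AI engineer in 2026 without a CS degree. It argues that companies hiring now care about hands-on ability more than traditional credentials: no bootcamp and no prior understanding of transformers required, with the focus on concrete skills and project experience.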

Harness Engineering: What Every AI Engineer Needs to Know in 2026

@sairahul1 · 113.0K followers · 546.4K views · 536 likes · 94 reposts

06/07 16:53

In February 2026, a small OpenAI team shipped 1 million lines of production code. They didn't write a single line by hand. The AI agents wrote it. The humans designed the system that made the agents

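Workflow · Opinion

Summary: Introduces the concept of harness engineering: in February 2026 a small OpenAI team shipped 1 million lines of production code written entirely by AI agents, with no line written by hand. The core work was not writing code but designing the system that made the agents productive.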

Everything Is Recorded Now

@dhaber · 50.0K followers · 497.3K views · 500 likes · 57 reposts

06/10 22:09

One of the biggest ways that AI is transforming work (and also one of the most taboo subjects inside companies at the moment) is that most work discussions are being recorded now by default. This

Opinion · Industry watch

Summary: One of the biggest ways AI is changing work is that most work discussions are now recorded by default.

How to get 100k YouTube subscribers in 3 hours (The Complete Guide)

@maubaron · 16.9K followers · 233.8K views · 506 likes · 19 reposts

06/06 03:22

Our YouTube channel has 125k subscribers and we've never made or uploaded a single video ourselves. This is a completely automated system. It is this very same strategy that made us the first app

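Tools · Tutorial

Summary: Describes a fully automated YouTube channel system: the channel has 125k subscribers, and the authors have never made or uploaded a video themselves. The post claims the approach can gain 100k subscribers in 3 hours, and the authors attribute their own app's rise to the same strategy.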

The Untrainable

@saranormous · 143.5K followers · 194.8K views · 614 likes · 40 reposts

06/10 08:49

The mid-2026 investor's version of AI psychosis is a despair that nothing is investable, that we should put all our money into Anthropic and Nvidia and go home. I have never felt it. I have been sure

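Opinion · Investing

Summary: Argues against the mid-2026 investor pessimism that nothing in AI is investable beyond Anthropic and Nvidia. The author holds that non-consensus opportunities exist.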

Building cloud agent infrastructure: what's different, and what we learned

@intuitiveml · 6.4K followers · 171.3K views · 524 likes · 70 reposts

06/05 08:55

Most agent frameworks today assume a desktop. One user, one machine, one process. The agent runs while the laptop is open, writes to a local filesystem, holds API keys in environment variables, and

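Tutorial · Infrastructure

Summary: Lessons from building cloud agent infrastructure. Most existing frameworks assume a desktop (one user, one machine, a local filesystem), whereas a multi-user, distributed cloud environment is entirely different. The post shares the differences and what the team learned in practice.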

A guide to /goal 🥅

@dkundel · 19.3K followers · 116.9K views · 523 likes · 40 reposts

06/05 05:39

We launched the goal mode (or /goal) as a way to help you have Codex drive towards a concrete outcome. When you set a goal Codex will continue to work until the goal is achieved, whether that takes

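Product launch · Workflow

Summary: Introduces Codex's /goal mode: once a goal is set, the agent keeps working until it is achieved, however long that takes. It lets users drive the agent toward a concrete outcome instead of steering it step by step.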

My Week with Fable

@MatthewBerman · 121.3K followers · 108.0K views · 661 likes · 26 reposts

06/10 01:05

tl;dr I've been testing Fable (Mythos) for the past week and it feels unlike any other model I've used. It feels, and is priced, like a next-generation model. It also has some real quirks. The Good

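Product experience · Model comparison

Summary: A week of testing Fable: it feels, and is priced, like a next-generation model, but it has some real quirks.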

Kimi to Predict All 104 World Cup Matches: Germany May Be Underestimated

@Kimi_Moonshot · 172.7K followers · 106.6K views · 500 likes · 61 reposts

06/09 19:38

Our predictions will probably be wrong. But the World Cup offers a rare, public, verifiable, and constantly evolving real-world setting. Through this initiative, we hope to place analysis,

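Product launch · Applications

Summary: Kimi announces that it will predict all 104 World Cup matches, admitting the predictions will probably be wrong while noting that the World Cup is a rare public, verifiable real-world setting. The aim is to apply analysis and reasoning to a constantly evolving problem. The post suggests Germany may be underestimated.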

Loop engineering: the 14-step roadmap from prompter to loop designer.

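@0xCodez · 5.3K followers · 97.8K views · 510 likes · 80 reposts

06/09 23:50

Most developers still prompt their coding agents by hand. They type, they wait, they read the diff, they type again. 9 out of 10 builders have never written a single loop that prompts the agent for

Tutorial · Workflow

Summary: Presents a 14-step roadmap from prompter to loop designer. It claims that 9 out of 10 builders still prompt their agents by hand and have never written a loop that prompts the agent for them, and calls for a shift to loop engineering.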

Designing loops with Fable 5

@RLanceMartin · 30.4K followers · 84.7K views · 660 likes · 50 reposts

06/10 01:21

Mythos-class models like Claude Fable 5 have changed the way many of us work at Anthropic. I want to share two tips for getting the most out of this class of models. Self-correction loops There’s been

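Prompting · Workflow

Summary: An Anthropic staff member shares two tips for using Fable 5: self-correction loops (the agent finds and fixes its own errors) and feedback design. The post stresses that models this capable should be used within loops, not single prompts.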
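
A self-correction loop of the kind described can be sketched as follows. This is an illustration under stated assumptions, not Anthropic's implementation: `fake_agent` is a hypothetical stand-in for a model call, and the check is a plain Python function whose error message is fed back into the next attempt.

```python
def fake_agent(task, feedback):
    """Hypothetical stand-in for a model call: fixes its answer once it sees feedback."""
    if feedback is None:
        return "def add(a, b): return a - b"   # first attempt contains a bug
    return "def add(a, b): return a + b"

def check(source):
    """Run the candidate; return None on success or an error message on failure."""
    namespace = {}
    exec(source, namespace)
    result = namespace["add"](2, 3)
    return None if result == 5 else f"add(2, 3) returned {result}, expected 5"

def self_correcting_loop(task, max_attempts=3):
    feedback, history = None, []
    for attempt in range(1, max_attempts + 1):
        candidate = fake_agent(task, feedback)
        feedback = check(candidate)        # verifiable signal, not a human review
        history.append((attempt, feedback))
        if feedback is None:
            return candidate, history
    raise RuntimeError(f"no passing candidate after {max_attempts} attempts")

solution, history = self_correcting_loop("write add(a, b)")
print(history)
```

The design choice worth noting is that the loop, not the person, carries the failure message back to the agent, and an attempt cap keeps a stuck agent from running forever.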

Principled Thinking and AI Need to Go Together

@RayDalio · 2.2M followers · 72.6K views · 515 likes · 93 reposts

06/11 01:15

What is the best approach to being effectively intelligent now that human intelligence and artificial intelligence are merging? Because I have been building computerized investment decision-making

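Opinion

Summary: Ray Dalio discusses how principled thinking and AI fit together. He has long been building principle-based computerized investment decision-making systems, and argues that the best way to merge human and artificial intelligence is to guide AI with clear principles instead of relying entirely on black-box models.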

some notes on getting into frontier ai labs

@itsreallyvivek · 4.3K followers · 65.8K views · 521 likes · 28 reposts

06/05 23:48

A few days ago I wrote that getting into a frontier AI lab mostly comes down to two things: proven research and trench engineering. The more I think about it, the less these feel like separate skills.

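Career development · Opinion

Summary: Notes on what it takes to get into a frontier AI lab: proven research ability and solid engineering ability, both of which are required. A traditional CS background or grinding interview problems matters little; the ability to solve real problems is decisive.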

How to Build an AI GTM Brain using Claude Code

@nifinet · 10.2K followers · 60.9K views · 522 likes · 54 reposts

06/10 01:21

When a team says they want AI for growth, they usually mean a faster send. An agent that fires the same template at a longer list, day and night. That is the cheap half of the job, and it stopped

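Tutorial · Workflow

Summary: Criticizes the common approach to "AI for growth", which amounts to sending templated messages in bulk. The post proposes building a genuinely intelligent GTM (go-to-market) brain with Claude Code, moving from automated sending to intelligent strategy execution.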

I Gave Claude David Ogilvy's Writing Rules And Built A Legendary AI Writing Coach

@dickiebush · 441.8K followers · 57.7K views · 519 likes · 45 reposts

06/05 20:35

Legendary marketer David Ogilvy generated over $864 million for his clients. He was a British advertiser known as "The Father of Advertising." And in 1982, Ogilvy sent this 1-page memo to his staff:

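Prompting · Applications

Summary: Feeds the 1-page writing memo that legendary adman David Ogilvy sent his staff in 1982 into Claude to build an AI writing coach. Ogilvy generated over $864 million for his clients, and the post holds that his writing rules still apply.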

Principled Thinking and AI Need to Go Together

@RayDalio · 2.2M followers · 72.6K views · 7-day impressions 72.6K

06/11 01:15

Everything Is Recorded Now

@dhaber · 50.0K followers · 497.3K views · 7-day impressions 497.3K

06/10 22:09

Opinion · Industry watch

Summary: One of the biggest ways AI is changing work is that most work discussions are now recorded by default.

[Tutorial] Automated style explorer: lie back and let it collect the images for you!

@MANISH1027512 · 37.1K followers · 84.5K views · 7-day impressions 84.5K

06/10 19:32

The Untrainable

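@saranormous · 143.5K followers · 194.8K views · 7-day impressions 194.8K

06/10 08:49

Opinion · Investing

Summary: Argues against the mid-2026 investor pessimism that nothing in AI is investable beyond Anthropic and Nvidia. The author holds that non-consensus opportunities exist.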

What Is Quip Network? A Primer

@quipnetwork · 141.1K followers · 32.8K views · 7-day impressions 32.8K

06/10 05:59

DOE Releases Finalized Fusion Science and Technology Roadmap to Accelerate Commercial Fusion Power

@ENERGY · 884.7K followers · 64.6K views · 7-day impressions 64.6K

06/10 03:01

How to Build an AI GTM Brain using Claude Code

@nifinet · 10.2K followers · 60.9K views · 7-day impressions 60.9K

06/10 01:21

Designing loops with Fable 5

@RLanceMartin · 30.4K followers · 84.7K views · 7-day impressions 84.7K

06/10 01:21

My Week with Fable

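@MatthewBerman · 121.3K followers · 108.0K views · 7-day impressions 108.0K

06/10 01:05

Product experience · Model comparison

Summary: A week of testing Fable: it feels, and is priced, like a next-generation model, but it has some real quirks.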

Fluid, natural voice translation with Gemini 3.5 Live Translate

@GoogleAIStudio · 176.3K followers · 32.2K views · 7-day impressions 32.2K

06/10 01:01

Loop engineering: the 14-step roadmap from prompter to loop designer.

@0xCodez · 5.3K followers · 97.8K views · 7-day impressions 97.8K

06/09 23:50

Kimi to Predict All 104 World Cup Matches: Germany May Be Underestimated

@Kimi_Moonshot · 172.7K followers · 106.6K views · 7-day impressions 106.6K

06/09 19:38

Loops: What Every AI Engineer Needs to Know in 2026

@sairahul1 · 113.0K followers · 852.6K views · 7-day impressions 852.6K

06/09 17:26

Opinion · Workflow

Summary: In 2026, AI engineers should shift from prompting agents by hand to designing loops that prompt them.

A summary of the 8 scenarios where I use Codex most often.

@xiaogaifun · 1.3K followers · 56.5K views · 7-day impressions 56.5K

06/09 16:49

Implications of Large-Scale Test-Time Compute

@polynoamial · 129.2K followers · 43.7K views · 7-day impressions 43.7K

06/09 12:57

Loop Engineering.

@addyosmani · 395.5K followers · 42.7K views · 7-day impressions 42.7K

06/09 07:30

Your Agent Harness Should Repair Itself

@akshay_pachaar · 276.5K followers · 38.0K views · 7-day impressions 38.0K

06/09 02:28

How We Improved the Startup Time of Our App by 50%

@Pumpfun · 661.9K followers · 66.0K views · 7-day impressions 66.0K

06/09 01:44

Translate any video with one sentence: I open-sourced the video translation tool I have used for half a year

@xiaohu · 108.5K followers · 76.8K views · 7-day impressions 76.8K

06/08 21:11

Hermes Agent NEW Super-App and DeepSeek v4 Catches Up To Opus 4.8?

06/07 04:10

ChatGPT And Codex Are Merging (This Changes Everything)

06/04 06:17

Claude Opus 4.8: Everything You Need to Know

06/01 06:27

Browsers Are Dead. Claude Just Replaced Them.

05/28 22:56

Every NEW Claude Feature Explained

05/24 06:29

Codex: Build Your Full AI Marketing Team (Agents + Skills)

05/18 23:31

OpenAI just released Codex Mobile

05/16 02:58

Codex Mobile App Released (Complete Setup Guide)

05/15 09:35

Vibe Coding for Beginners (Full Course 2026)

05/08 03:28

Codex is The NEW Best AI Coding Tool (Here's Why)

05/03 04:35

The Problem Solvers | Michael Truell at Cursor

06/11 00:00

Interview

Summary: In this interview, Cursor co-founder Michael Truell shares his startup journey and discusses how AI coding tools solve real programming problems.

Claude Fable 5 beats Pokémon FireRed only using vision

06/10 04:31

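AI gaming · Vision models

Summary: Relying on vision alone, Claude Fable 5 completes the game Pokémon FireRed, demonstrating visual understanding and decision-making in a complex game environment.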

Claude Fable 5 plays Factorio

06/10 01:18

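AI gaming

Summary: Claude Fable 5 demonstrates its ability to play the automation and factory-building game Factorio, including resource management and production planning.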

Claude Fable 5 simulates the solar system and predicts a solar eclipse

06/10 01:18

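Scientific simulation

Summary: Claude Fable 5 simulates the solar system and predicts a solar eclipse, demonstrating its capabilities in physical simulation and astronomical computation.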

Claude Fable 5 sets a fluid simulation to Beethoven

06/10 01:18

AI creativity

Summary: Claude Fable 5 sets a fluid simulation to Beethoven's music, producing a dynamic audiovisual piece that shows cross-modal creative ability.

Claude Fable 5 designs a 3D-printable model in a Claude-built CAD editor

06/10 01:18

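3D printing · Self-built tools

Summary: Working inside a CAD editor built by Claude itself, Claude Fable 5 designs a 3D-printable model, demonstrating both tool-building and design ability.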

Working Like a Lawyer with Claude

06/09 04:55

Reflecting on a year of Claude Code

06/09 00:31

How Anthropic uses Claude in GTM Engineering

06/06 01:32

The Problem Solvers | Anton Osika at Lovable

06/04 22:04

AI Agents as "Games Masters"? 🎮🔥

06/06 14:20

DeepMind’s New AI Found A Strange New Way To Think

06/05 23:50

Meet the AI "Co-Scientist" Changing Everything 🤖🧪 #ai

06/04 01:00

Claude Opus 4.8: Lying Machine No More?

06/03 21:49

A Second Nobel Prize for AlphaFold? 🧬🏆 #alphafold #deepmind #nobelprize #science #ai

06/02 15:13

What Happens After A 1,000,000x AI Compute Leap? | Jeff Dean

06/01 23:41

Einstein vs Feynman, Who Wins? 🧠🤔 #physics #ai #science #feynman #research

06/01 08:16

Google DeepMind CEO Loves Hard Questions 🙂

05/27 01:35

Demis Hassabis On What AI Will Do Next

05/26 01:49

DeepSeek’s New AI Is A Game Changer

05/22 08:47

Google DeepMind is worried about what happens when millions of agents start to interact

MIT Tech Review AI · 06/11 19:00

Google DeepMind is funding research into the potential dangers of situations where millions of different AI agents interact with each other online. According to Rohin Shah, who directs the company’s AGI safety and alignment research, the mass-market arrival of agents that can carry out tasks without

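AI safety · Multi-agent

Summary: Google DeepMind is concerned about the risks that arise when millions of AI agents interact online and is funding research into them. The article quotes Rohin Shah, who directs the company's AGI safety and alignment research, on the potential dangers of large-scale agent interaction.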

[AINews] Open Models, Model Labs vs Agent Labs, and What's Untrainable — Sarah Guo

Latent Space · 06/11 11:14

a quiet day lets us reflect on a great essay

Open models

Summary: On a quiet news day, this issue reflects on an essay covering open models, model labs versus agent labs, and what is untrainable.

Supporting Europe’s work in ensuring a trustworthy AI ecosystem

OpenAI News · 06/11 08:00

OpenAI supports the EU Code of Practice on AI content transparency, advancing provenance standards and tools to help people understand AI-generated content.

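Policy · Transparency

Summary: OpenAI supports the EU Code of Practice on AI content transparency and is advancing provenance standards and tools to help people identify AI-generated content.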

How an astrophysicist uses Codex to help simulate black holes

OpenAI News · 06/11 08:00

Discover how astrophysicist Chi-kwan Chan uses Codex to build black hole simulations, helping scientists study extreme physics and test Einstein’s theory of general relativity.

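Scientific research · Code generation

Summary: Astrophysicist Chi-kwan Chan uses OpenAI Codex to build black hole simulations, helping scientists study extreme physics and test Einstein's theory of general relativity.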

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

Hugging Face Blog · 06/11 08:00

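PyTorch · Performance optimization

Summary: Part 2 of a PyTorch profiling series, showing how to start from nn.Linear and arrive at a fused MLP (multilayer perceptron) for better performance.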

Access OpenAI models and Codex through your Oracle cloud commitment

OpenAI News · 06/11 04:00

Access OpenAI models and Codex through Oracle Cloud, using existing commitments to build and deploy AI with enterprise security and governance.

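Cloud services · Enterprise applications

Summary: Customers can use their existing Oracle Cloud commitments to access OpenAI models and Codex, building and deploying AI with enterprise security and governance.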

DiffusionGemma: 4x faster text generation

DeepMind Blog · 06/11 00:24

Text generation · Speed optimization

Summary: DeepMind releases DiffusionGemma, which it says generates text 4x faster.

PRC-linked influence operations are targeting AI debates in the US

OpenAI News · 06/10 20:00

A new report from OpenAI details PRC-linked influence operations using AI to target U.S. tech debates, data center narratives, tariffs, and false claims about ChatGPT.

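AI safety · Disinformation

Summary: An OpenAI report says PRC-linked influence operations are using AI to target U.S. tech debates, data center narratives, tariffs, and false claims about ChatGPT.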

Investing in multi-agent AI safety research

DeepMind Blog · 06/10 18:21

Google DeepMind and partners announce a $10M funding call for multi-agent safety research.

Multi-agent · AI safety · Funding

Summary: Google DeepMind and partners announce a 10 million USD funding call for multi-agent AI safety research.

[AINews] Anthropic Claude Fable 5 — Mythos but Safe, with Controversial Terms

Latent Space · 06/10 11:50

The much anticipated launch of the Mythos-class model was marred by some controversial usage policies

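Model release · Policy controversy

Summary: Anthropic launched the much-anticipated Mythos-class model Claude Fable 5, but controversial usage policies accompanied the launch and prompted debate.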

From data to decisions: how LSEG is scaling trusted AI

OpenAI News · 06/10 08:00

See how LSEG uses OpenAI to scale trusted AI across its global business, accelerating insights, shrinking release cycles, and empowering 4,000 employees.

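Enterprise applications · Finance

Summary: London Stock Exchange Group (LSEG) uses OpenAI to scale trusted AI across its business, accelerating insights, shrinking release cycles, and empowering 4,000 employees.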

The evolution of agentic surfaces: building with Claude Managed Agents

Claude Blog · 06/10 08:00

Agent development · Guide

Summary: The article traces how agentic surfaces have evolved and explains how to build with Claude Managed Agents.

Claude Fable 5 🚀, Gemini 3.5 Live Translate 📱, scaling test time compute 📈

TLDR AI · 06/10 08:00

Model release · Translation · Tech trends

Summary: This issue of TLDR AI covers the Claude Fable 5 launch, Gemini 3.5 Live Translate, and the trend toward scaling test-time compute.

Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

Hugging Face Blog · 06/10 03:38

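Speech recognition · Multilingual

Summary: The team benchmarks frontier automatic speech recognition (ASR) models on code-switched (mixed-language) speech to assess whether voice agents can handle bilingual customers.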

Introducing North Mini Code: Cohere’s First Model For Developers

Hugging Face Blog · 06/09 23:56

Code model · Product launch

Summary: Cohere releases North Mini Code, its first model for developers, optimized for code-related tasks.

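After more than a decade as an IT engineer, what decision do you least regret?

06/11 14:24

9 replies · Programmer node

Hong Kong-version 17 on iOS 27: steps to enable the new AI

06/11 14:00

8 replies · Apple node

Everyone, check the DeepSeek open platform now for unexpected API keys!

06/11 12:54

33 replies · Programmer node

ChatGPT Plus price doubles in the Turkey region

06/11 11:05

13 replies · Programmer node

The MBP M1 Pro holds up well; is it stronger than the M4?

06/11 10:40

40 replies · Apple node

tg has been updated: fixes the broken chat layout on macOS 27

06/11 08:45

12 replies · Apple node

Siri AI uses Gemini; whose model will it use in mainland China?

06/10 22:26

7 replies · Apple node

Enabling full Apple Intelligence on a China-market Mac (macOS 27)

06/10 22:06

28 replies · Apple node

HomePod is sending Thread/HomeKit IPv6 RAs; how can I work around it?

06/10 20:29

8 replies · Apple node

Question: the new Siri on iOS 27 is stuck in the queue...

06/10 18:35

11 replies · Apple node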

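It's my birthday today; could you all wish me a happy birthday?

06/11 18:40

As the title says: this is my first birthday on L站. In the few months since I joined, I have learned many useful things from everyone here. May you all be happy every day, and may the community keep getting better. 97 posts - 95 participants

[Urgent notice from the 干草铺 free community station]

06/11 18:27

To all friends of 干草铺: stop using YOLO mode immediately. I don't yet know whether the problem is in my own program or on Altman's side, but members have already reported GPT replies that look as if the model has been possessed. I did say I would poison the well if account resellers kept abusing the station, but this time it really isn't me. Please give up YOLO for now; I will take the free station offline for a while shortly. YOLO means the bypass mode. I'm out at the moment and can't write up the details, so I hope everyone understands. Don't hand full control to GPT: if it can be possessed once, it can happen again. At the very least, drop this mode for today. 30 posts - 28 participants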

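A head-to-head evaluation of Claude Fable 5, Opus 4.8, Minimax M3, the Xiaomi Mimo V2.5 series, Hy3, and the Qwen3.7 series on a real project requirement (new leader at the top!)

06/11 17:22

Because the number of tested models keeps growing, the table drops some older models from the same vendors; their scores are in my earlier evaluation posts. Project: this is a Unity C# project. The test is a skin-system requirements spec; I had already built the prefabs, and the models had to write the code. The project and environment are identical to the previous two rounds: Round 1 … Previous round. Model sources: Claude series: official API; Mimo V2.5 series: official Token Plan; Hy3 Preview: official API; Qwen3.7 series: official API; Minimax M3: official API; Nex-N2-Pro: OpenRouter Free API; Nemotron 3 Ultr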

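So it was me who messed everything up

06/11 17:06

Continuing the discussion from "This time the data center messed everything up": Yes, we know: this time the data center messed everything up. Operations feedback: the servers were under some load, but not enough to go down; we'll look at how to scale up. The world is a makeshift troupe, so keep the music playing and keep dancing. And since 42 provides the servers to the hub, it was 42 who messed everything up. Sorry, I'm on my knees to everyone! 18 posts - 17 participants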

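While DeepSeek is still hiring product managers and engineers to build its coding tool, Mimo had 5 people vibe-code MIMOCODE from an open-source project in 14 days

06/11 16:57

Could a company this large take a bit more care? They even seem proud of it, and the marketing claims unlimited context. ==================== Additional notes 45 posts - 29 participants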

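I love you! Refund! We can never repay Saint Liang's kindness: DeepSeek actually issued refunds

06/11 16:17

Keep the money, why refund it? Do I look like I need this little bit? (secretly delighted) A cache system failure? At your level, even when the cache blows up, I'd say it blew up beautifully. 18 posts - 17 participants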

The Team loophole works again

06/11 15:23

Go, go, go!!! 86 posts - 57 participants

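This time the data center messed everything up

06/11 15:12

Honestly not passing the buck: the data center messed everything up, and it has been resolved via a support ticket. The servers were under some load, but not enough to go down; we'll look at how to scale up. The world is a makeshift troupe, so keep the music playing and keep dancing. 434 posts - 426 participants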

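Is 33 GB of logs data a lot? Not at all!

06/11 12:04

Found it: the big item isn't the local logs folder but the logs table in the database. Current state: the oneapi database totals about 35,476.81 MB ≈ 34.65 GB, of which the logs table accounts for 34,462.88 MB ≈ 33.65 GB (about 28 million rows; data size 12.24 GB; index size 21.42 GB). A 100 GB disk is just too small, sob; it fills up with barely anything on it. Suppose a server has a 2 TB disk and never uses even 1 TB: isn't that wasting disk?!!! Answer me!!! So the root cause of the full disk: the disk is too small!!! @ouyangqiqi 48 posts - 47 participants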

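Many people find GPT Pro expensive; why not try thinking about it this way?

06/11 11:47

I used to think GPT Pro at 200 USD a month was expensive, and it hurt a bit! But something happened recently that changed my mind. I had been paying a personal trainer on a monthly plan, 2,600-plus a month; he took me through the same movements every session and I made no progress in a year, so I recently cancelled. GPT costs about 1,300 a month, can teach me a lot, and is half the trainer's price, while I can do the trainer's routines on my own. Seen that way, GPT is a pretty good deal. I think I used to file AI subscriptions under the "software services" mental account: Netflix and various SaaS products cost a few tens of dollars, so 200 dollars looked outrageous. But if you benchmark it against the cost of intellect and labor, such as parents hiring a tutor, gym-goers hiring a trainer, or professionals finding a mentor, a single session costs at least 200-400 RMB.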

Lines of Code Got a Better Publicist

06/11 20:26

22 points · 5 comments

BYD is bringing its 5-min 'Flash' electric car charging to Canada

06/11 19:41

42 points · 11 comments

Human migration has surged since 2000 – these maps reveal where people are going

06/11 19:26

29 points · 19 comments

Web Browsers on Video Game Consoles

06/11 16:47

68 points · 38 comments

Sweet Jeebus, macOS 27 Golden Gate Removes the Dumb Icons from Menu Items

06/11 15:35

145 points · 52 comments

Pokémon Go Scans Trained the Navigation Tech for Military Drones

06/11 14:42

418 points · 179 comments

AI agent runs amok in Fedora and elsewhere

06/11 08:10

455 points · 207 comments

Sequoyah’s syllabary created a written language for the Cherokee

06/11 06:07

https://en.wikipedia.org/wiki/Cherokee_syllabary#Unicode

πFS

06/11 02:54

827 points · 189 comments

GeoLibre 1.0

06/11 01:39

268 points · 22 comments

How JPL keeps the 13-year-old Curiosity rover doing science

06/11 01:30

246 points · 73 comments

Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

06/11 00:42

495 points · 433 comments

Show HN: Extend UI – open-source UI kit for modern document apps

06/11 00:09

We're open-sourcing 14 components & examples today for PDF, DOCX, and XLSX viewers, plus bounding box citations, file upload, e-signature, and more. It's MIT licensed and fully customizable. Demo video here: https://share.extend.ai/kRmSGKRF When we started, we tried every

I'm Eric Ries, author of "The Lean Startup" and new book "Incorruptible" – AMA

06/10 22:47

Hey gang, you may remember me from such books as _The Lean Startup_ and _The Startup Way_. It's been fifteen years since I wrote The Lean Startup, and in that time I've seen some things. In both big companies and tiny startups, NGOs and governments, in almost every industry you can name.

PgDog is funded and coming to a database near you

06/10 22:02

493 points · 232 comments

Building an HTML-first site doubled our users overnight

06/10 20:45

1178 points · 529 comments

Vacuum-Form Signage

06/10 10:48

76 points · 11 comments

Who's the smartest corvid?

06/10 01:37

129 points · 112 comments

Anthropic requires 30 day data retention for Fable and Mythos

06/10 01:23

https://www.theverge.com/report/947575/microsoft-claude-fabl...

Build a Basic AI Agent from Scratch: Long Task Planning

06/09 22:29

57 points · 17 comments

Symbolicating a minified stack trace by hand: why source maps can't do it alone

06/09 22:14

4 points · 0 comments

CSS: Unavoidable Bad Parts

06/09 19:30

99 points · 56 comments

Linux latency measurements and compositor tuning

06/09 17:50

72 points · 12 comments

L'Affaire Siloxane

06/09 13:21

247 points · 41 comments

Starfish by Peter Watts (1999)

06/09 09:45

81 points · 24 comments

Klondike Solitaire game for curses in 5k of C

06/09 03:08

87 points · 15 comments

The Life and Works of Raoul Bott

06/09 01:33

11 points · 1 comments

World Capitals Voronoi

06/08 23:20

109 points · 55 comments

Making a Shading Language for My Offline Renderer

06/08 21:03

27 points · 2 comments

Reverse engineering the Creative Katana soundbar to control it from Linux

06/07 19:20

97 points · 5 comments

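Today's themes

The headline in AI today is Anthropic's official release of Claude Fable 5. Official demos show it beating games, simulating celestial mechanics, and doing CAD design, spanning multimodal and autonomous-building tasks, while the accompanying controversial usage terms have sparked heated industry debate. Meanwhile, Google DeepMind launched DiffusionGemma, which generates text 4x faster, and together with partners opened a 10 million USD funding call for multi-agent safety research; OpenAI focused on EU AI transparency policy and enterprise cloud deployment. The open-source community was equally active: NVIDIA open-sourced an agent security scanning tool, and Cohere released North Mini Code, a code model for developers. On methodology, AI engineers are shifting from "prompt engineer" to "loop designer", with agent systems and continuous-optimization frameworks in focus.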

Model releases/updates

Model Releases · 44 articles

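Anthropic releases the Claude Fable 5 model

Official · Latent Space

Anthropic has officially launched the much-anticipated Claude Fable 5, a Mythos-class model positioned as balancing safety and capability. In the official demos released alongside it, the model beats Pokémon FireRed using vision alone, plays Factorio, simulates the solar system and predicts a solar eclipse, and designs a 3D-printable CAD model, showing strong multimodal and autonomous-building ability. The accompanying controversial usage policies have also prompted wide discussion in the community.

Model release · Multimodal · Policy controversy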

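DeepMind releases DiffusionGemma with a 4x speedup

Official · DeepMind Blog

Google DeepMind has introduced DiffusionGemma, a diffusion-based language model that it says generates text 4x faster. According to this digest, the model cuts latency substantially while maintaining generation quality, offering a more efficient option for real-time applications and large-scale deployment and directly challenging the autoregressive generation paradigm.

Text generation · Speed optimization · Diffusion models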

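Cohere releases North Mini Code, a code model for developers

Official · Hugging Face Blog

Cohere has launched North Mini Code, its first model built specifically for developers, optimized for tasks such as code completion, review, and generation. The model aims to give developers more accurate and efficient code intelligence in IDEs and CI/CD pipelines, adding to the ecosystem of specialized code models.

Code model · Product launch · Developer tools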

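InternVideo3 proposes an agentic multimodal reasoning framework

Official · HuggingFace Trending Papers

InternVideo3, from Shanghai AI Laboratory, turns a foundation model into an agent with multi-step reasoning and tool use, specializing in long-horizon multimodal video understanding. The work reports significant gains in video reasoning and complex scene parsing, areas where open-source models are generally weak, and lays groundwork for video agent applications.

Multimodal · Video · Agents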

Product releases/updates

Product · 44 articles

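OpenAI models now available through Oracle Cloud

Official · OpenAI News

OpenAI announced that customers can use their existing Oracle Cloud commitments to access OpenAI models and Codex, building and deploying AI with enterprise security and governance. According to this digest, the move lowers the barrier for large enterprises adopting frontier AI and particularly suits financial institutions and government bodies with strict data residency and privacy requirements.

Cloud services · Enterprise applications · Partnership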

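Kimi announces it will predict all 104 World Cup matches

X·KOL · X posts (AttentionVC)

Kimi announced it will use its reasoning and analysis models to predict all 104 matches of the 2026 World Cup. The team admits the predictions will most likely be wrong, but stresses that the World Cup is a rare public, verifiable real-world test bed and that the aim is to advance AI reasoning on dynamic, complex problems. It specifically notes that Germany may be underrated.

Product launch · Applications · Reasoning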

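Fully automated YouTube channel claims 100,000 new subscribers in 3 hours

X·KOL · X posts (AttentionVC)

A KOL shared a fully automated system for running a YouTube channel, which has reached 125,000 subscribers without the owner ever making or uploading a video personally. The system is claimed to add 100,000 subscribers within 3 hours and is now being promoted as the first application built on it, an aggressive example of AI-driven content creation and growth automation.

Tools · Tutorial · Automation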

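Open-source agent analytics tool AgentsView released

Open-source project · GitHub Trending

The open-source project AgentsView has been released: a local-first tool for analyzing and reporting statistics on coding-agent sessions, supporting more than 20 agents including Claude Code and Codex. It claims to be 100x faster than the comparable tool ccusage, and helps developers see agent behavior patterns and efficiency bottlenecks, improving observability in agent development.

AI agents · Analytics tools · Local-first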

Industry news

Industry · 44 articles

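DeepMind and partners open a $10M funding call for multi-agent AI safety research

Official · DeepMind Blog

Google DeepMind announced that, together with partners, it is launching a 10 million USD funding call focused on the safety risks that arise when multi-agent AI systems interact at scale. Rohin Shah, who directs the company's AGI safety and alignment research, has flagged the dangers of millions of agents interacting online; the call aims to get ahead of failure modes such as loss of control, competition, and deceptive behavior.

Multi-agent · AI safety · Funding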

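OpenAI supports the EU Code of Practice on AI content transparency

Official · OpenAI News

OpenAI announced its support for the EU Code of Practice on AI content transparency and committed to advancing provenance standards and tools that help people identify AI-generated content more clearly. The move follows the global regulatory trend, aims to build user trust in AI systems, and sets an actionable transparency baseline for the industry.

Policy · Transparency · Compliance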

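OpenAI reports PRC-linked influence operations targeting AI debates

Official · OpenAI News

An OpenAI report says PRC-linked influence operations are using AI tools to target U.S. tech debates, data center narratives, and tariff policy, and to spread false claims about ChatGPT. The report has drawn further attention to AI-generated disinformation and cross-border information warfare, underlining the urgency of platform security and content authentication.

AI safety · Disinformation · Geopolitics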

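What it takes to get into a frontier AI lab

X·KOL · X posts (AttentionVC)

A KOL shared observations on getting into top AI labs: proven research ability and solid engineering ability are both required. A traditional CS background or interview-problem grinding counts for little; the deciding factor is the ability to actually solve complex problems. The post offers career guidance for practitioners aiming at frontier AI work.

Career development · Opinion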

Tips and takes

Tips & Takes · 44 articles

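「精馏工程」 (literally "distillation engineering"): the new paradigm for AI engineers in 2026

X·KOL · X posts (AttentionVC)

The post introduces the concept of 「精馏工程」, citing a small OpenAI team that reportedly produced 1 million lines of production code with AI agents while humans wrote none. The core idea is to design systems in which AI produces efficiently rather than to code by hand: AI engineers should become "loop designers" who build closed-loop workflows that let agents drive themselves, changing how productive software development can be.

Workflow · Opinion · Methodology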

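A 14-step roadmap: from prompt engineer to loop designer

X·KOL · X posts (AttentionVC)

A KOL lays out a 14-step roadmap for moving from hand-prompting agents to designing self-driving loops. The post claims that 90% of developers are still at the manual prompting stage and have never written a loop that lets an agent iterate continuously. The roadmap defines the skill set for the new paradigm step by step and urges the community to adopt loop engineering now.

Tutorial · Workflow · Prompts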
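
The shift from hand-prompting to a loop that lets an agent iterate on its own can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the roadmap author's method: `run_loop`, `toy_propose`, and `toy_check` are hypothetical names, and the "agent" is a stand-in function rather than a real model call.

```python
from typing import Callable, Optional, Tuple


def run_loop(
    propose: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    task: str,
    max_iters: int = 5,
) -> Tuple[Optional[str], int]:
    """Alternate propose -> check until the check passes or the budget runs out.

    `propose` receives the task plus the previous failure message (or None).
    `check` returns None on success, or a failure message to feed back.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_iters + 1):
        candidate = propose(task, feedback)
        feedback = check(candidate)  # None means the candidate passed
        if feedback is None:
            return candidate, attempt
    return None, max_iters  # budget exhausted without a passing candidate


# Hypothetical stand-in for an agent: wrong on the first try,
# corrected once it receives feedback from the check.
def toy_propose(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b): return a - b"
    return "def add(a, b): return a + b"


# Hypothetical automated check: run the candidate and test its behavior.
def toy_check(candidate: str) -> Optional[str]:
    scope: dict = {}
    exec(candidate, scope)
    if scope["add"](2, 3) == 5:
        return None
    return "add(2, 3) should equal 5"


result, attempts = run_loop(toy_propose, toy_check, "write add(a, b)")
print(attempts, result)
```

The design choice worth noting is that the human writes the check and the iteration budget once; the feedback string, not a person, is what drives each retry.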

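PyTorch profiling: optimizing with a fused MLP

Official · Hugging Face Blog

The Hugging Face blog published part 2 of its PyTorch profiling series, explaining how to start from the standard nn.Linear module and fuse a multilayer perceptron (MLP) to reduce GPU kernel launch overhead. According to this digest, the article includes reproducible code and benchmark data to help deep learning engineers improve training and inference performance.

PyTorch · Performance optimization · Tutorial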

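An AI writing coach loaded with Ogilvy's writing rules

X·KOL · X posts (AttentionVC)

A KOL fed legendary adman David Ogilvy's one-page 1982 writing memo into Claude to create an AI writing coach. The post credits Ogilvy's rules with generating 864 million USD in client revenue, and presents the exercise as an example of pairing a classic methodology with a strong language model to help content creators improve their copy.

Prompts · Applications · Writing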