What Changed Since May

It’s incredible how far “vibe coding” has come this year. Back in May I was amazed when a prompt produced code that worked out of the box; now that’s my expectation. I can ship code at a speed that seems unreal, and I’ve burned a lot of tokens since then. Time for an update.

It’s funny how these agents work. There was this argument a few weeks ago that one needs to write code in order to feel bad architecture, and that using agents creates a disconnect - I couldn’t disagree more. When you spend enough time with agents, you know exactly how long something should take, and when codex comes back and hasn’t solved it in one shot, I already get suspicious.

The amount of software I can create is now mostly limited by inference time and hard thinking. And let’s be honest - most software does not require hard thinking. Most apps shove data from one form to another, maybe store it somewhere, and then show it to the user in some way or another. The simplest form is text, so by default, whatever I wanna build starts as a CLI. Agents can call it directly and verify the output - closing the loop.
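To make the “closing the loop” idea concrete, here is a minimal sketch of what such an agent-friendly CLI could look like - everything here is hypothetical, not a tool from this post. The point is just that it takes text on stdin and prints a deterministic result, so an agent can pipe input through the binary and verify the output itself:

```go
package main

// Hypothetical sketch of an agent-friendly CLI: text in, deterministic
// text out, so an agent can run it directly and check the result.

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// report is the testable core: a pure function over the input text.
func report(input string) string {
	trimmed := strings.TrimRight(input, "\n")
	lines := strings.Split(trimmed, "\n")
	words := 0
	for _, l := range lines {
		words += len(strings.Fields(l))
	}
	return fmt.Sprintf("lines=%d words=%d", len(lines), words)
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(report(string(data)))
}
```

Because the output is plain text with a fixed shape, the agent can run `echo "foo bar" | ./tool` and assert on the result without any UI automation.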

The Model Shift

The real unlock for building like a factory was GPT 5. It took me a few weeks after the release to see it - for codex to catch up on features that Claude Code had, and a bit of time to learn and understand the differences - but then I started trusting the model more and more. These days I don’t read much code anymore. I watch the stream and sometimes look at key parts, but I gotta be honest - most code I don’t read. I do know where which components are, how things are structured, and how the overall system is designed, and that’s usually all that’s needed.

The important decisions these days are language/ecosystem and dependencies. My go-to languages are TypeScript for web stuff, Go for CLIs, and Swift if it needs macOS APIs or has UI. Go wasn’t something I gave even the slightest thought a few months ago, but eventually I played around and found that agents are really great at writing it, and its simple type system makes linting fast.

Folks building Mac or iOS stuff: You don’t need Xcode much anymore. I don’t even use xcodeproj files. Swift’s build infra is good enough for most things these days. codex knows how to run iOS apps and how to deal with the Simulator. No special stuff or MCPs needed.

codex vs Opus

I’m writing this post while codex crunches through a huge, multi-hour refactor and un-slops older crimes of Opus 4.0. People on Twitter often ask me what the big difference between Opus and codex is, and why it even matters when the benchmarks are so close. IMO it’s getting harder and harder to trust benchmarks - you need to try both to really understand. Whatever OpenAI did in post-training, codex has been trained to read LOTS of code before starting.

Sometimes it just silently reads files for 10, 15 minutes before starting to write any code. On the one hand that’s annoying; on the other hand it’s amazing, because it greatly increases the chance that it fixes the right thing. Opus, on the other hand, is much more eager - great for smaller edits, not so good for larger features or refactors. It often doesn’t read the whole file or misses parts, and then delivers inefficient outcomes or misses something. I noticed that even though codex sometimes takes 4x longer than Opus for comparable tasks, I’m often faster overall because I don’t have to go back and fix the fix - something that felt quite normal when I was still using Claude Code.

codex also allowed me to unlearn lots of charades that were necessary with Claude Code. Instead of “plan mode”, I simply start a conversation with the model, ask a question, let it google, explore code, create a plan together, and when I’m happy with what I see, I write “build” or “write plan to docs/.md and build this”. Plan mode feels like a hack that was necessary for older generations of models that were not great at adhering to prompts, so we had to take away their edit tools. There’s a highly misunderstood tweet of mine still circulating that showed me most people don’t get that plan mode is not magic.

Oracle

The step from GPT 5/5.1 to 5.2 was massive. I built oracle 🧿 about a month ago - it’s a CLI that lets the agent run GPT 5 Pro, uploading files plus a prompt, and manages sessions so answers can be retrieved later. I did this because many times when agents were stuck, I asked them to write everything into a markdown file and then did the query myself, and that felt like a repetitive waste of time - and an opportunity to close the loop. The instructions are in my global AGENTS.MD file, and the model sometimes triggered oracle by itself when it got stuck. I used this multiple times per day. It was a massive unlock. Pro is insanely good at doing a speedrun across ~50 websites and then thinking really hard about them, and in almost every case it nailed the response. Sometimes it’s fast and takes 10 minutes, but I had runs that took more than an hour.

Now that GPT 5.2 is out, I have far fewer situations where I need it. I do use Pro myself sometimes for research, but the cases where I asked the model to “ask the oracle” went from multiple times per day to a few times per week. I’m not mad about this - building oracle was super fun and I learned lots about browser automation, Windows and finally took my time to look into skills, after dismissing that idea for quite some time. What it does show is how much better 5.2 got for many real-life coding tasks. It one-shots almost anything I throw at it.

Another massive win is the knowledge cutoff date. GPT 5.2 goes until the end of August, whereas Opus is stuck in mid-March - a gap of about five months. That’s significant when you wanna use the latest available tools.

A Concrete Example: VibeTunnel

To give you another example of how far models have come: one of my early intense projects was VibeTunnel, a terminal multiplexer so you can code on the go. I poured pretty much all my time into this earlier this year, and after two months it was so good that I caught myself coding from my phone while out with friends… and decided that this is something I should stop, more for mental health than anything. Back then I tried to rewrite a core part of the multiplexer away from TypeScript, and the older models consistently failed me. I tried Rust, Go… god forbid, even zig. Of course I could have finished the refactor, but it would have required lots of manual work, so I never got around to completing it before I put the project to rest. Last week I dusted it off and gave codex a two-sentence prompt to convert the whole forwarding system to zig, and it ran for over 5 hours and multiple compactions and delivered a working conversion in one shot.

Why did I even dust it off, you ask? My current focus is Clawdis, an AI assistant that has full access to everything on all my computers: messages, emails, home automation, cameras, lights, music - heck, it can even control the temperature of my bed. Ofc it also has its own voice, a CLI to tweet, and its own clawd.bot.

Clawd can see and control my screen and sometimes makes snarky remarks, but I also wanted to give him the ability to check on my agents, and getting a character stream is just far more efficient than looking at images… if this will work out, we’ll see!

My Workflow

I know… you came here to learn how to build faster, and I’m just writing a marketing pitch for OpenAI. I hope Anthropic is cooking Opus 5 and the tides turn again. Competition is good! At the same time, I love Opus as a general-purpose model. My AI agent wouldn’t be half as fun running on GPT 5. Opus has something special that makes it a delight to work with. I use it for most of my computer-automation tasks, and ofc it powers Clawd🦞.

I haven’t changed my workflow all that much from my last take at it in October.

I usually work on multiple projects at the same time - depending on complexity, between 3 and 8. The context switching can be tiresome; I can really only do that when I’m working at home, in silence and concentrated. It’s a lot of mental models to shuffle. Luckily most software is boring. Creating a CLI to check up on your food delivery doesn’t need a lot of thinking. Usually my focus is on one big project, with satellite projects that chug along. When you do enough agentic engineering, you develop a feeling for what’s gonna be easy and where the model will likely struggle, so often I just put in a prompt, codex chugs along for 30 minutes, and I have what I need. Sometimes it takes a little fiddling or creativity, but often things are straightforward.

I extensively use the queueing feature of codex - as I get a new idea, I add it to the pipeline. I see many folks experimenting with various systems of multi-agent orchestration, emails or automatic task management - so far I don’t see much need for this; usually I’m the bottleneck. My approach to building software is very iterative. I build something, play with it, see how it “feels”, and then get new ideas to refine it. Rarely do I have a complete picture of what I want in my head. Sure, I have a rough idea, but often that drastically changes as I explore the problem domain. So systems that take the complete idea as input and then deliver output wouldn’t work well for me. I need to play with it, touch it, feel it, see it - that’s how I evolve it.

I basically never revert or use checkpointing. If something isn’t how I like it, I ask the model to change it. codex sometimes then resets a file, but often it simply reverts or modifies the edits; it’s very rare that I have to roll back completely, and instead we just travel in a different direction. Building software is like walking up a mountain. You don’t go straight up - you circle around it and take turns, sometimes you get off the path and have to walk a bit back. It’s imperfect, but eventually you get to where you need to be.

I simply commit to main. Sometimes codex decides that it’s too messy and automatically creates a worktree and then merges changes back, but that’s rare and I only prompt for it in exceptional cases. I find the added cognitive load of having to think about different states in my projects unnecessary and prefer to evolve things linearly. Bigger tasks I keep for moments when I’m distracted - for example, while writing this, I’m running refactors on 4 projects that will each take around 1-2h to complete. Ofc I could do that in a worktree, but that would just cause lots of merge conflicts and suboptimal refactors. Caveat: I usually work alone; if you work in a bigger team, that workflow obv won’t fly.

I’ve already mentioned my way of planning a feature. I cross-reference projects all the time, especially if I know that I already solved something somewhere else: I ask codex to look in …/project-folder, and that’s usually enough for it to infer from context where to look. This is extremely useful to save on prompts. I can just write “look at …/vibetunnel and do the same for Sparkle changelogs”, because it’s already solved there, and with a 99% guarantee it’ll correctly copy things over and adapt them to the new project. That’s how I scaffold new projects as well.

I’ve seen plenty of systems for folks wanting to refer to past sessions. Another thing I never need or use. I maintain docs for subsystems and features in a docs folder in each project, and use a script + some instructions in my global AGENTS file to force the model to read docs on certain topics. This pays off more the larger the project is, so I don’t use it everywhere, but it is of great help to keep docs up-to-date and engineer a better context for my tasks.
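The actual docs script isn’t shown in the post, so here is my guess at a minimal version of what it might do: walk the project’s docs folder and print each file with its first heading, giving the model a cheap index it can use to decide which docs to actually read. All of this is a hypothetical reimplementation:

```go
package main

// Hypothetical sketch of a docs-index script: print "path: heading"
// for every markdown doc so an agent can pick what to read for a task.

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// firstHeading returns the first markdown heading of a doc, used as
// its one-line summary in the index.
func firstHeading(content string) string {
	for _, line := range strings.Split(content, "\n") {
		if strings.HasPrefix(line, "#") {
			return strings.TrimSpace(strings.TrimLeft(line, "# "))
		}
	}
	return "(no heading)"
}

func main() {
	// Walk docs/ and emit an index line per markdown file.
	filepath.WalkDir("docs", func(path string, d os.DirEntry, err error) error {
		if err != nil || d.IsDir() || !strings.HasSuffix(path, ".md") {
			return nil
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return nil
		}
		fmt.Printf("%s: %s\n", path, firstHeading(string(data)))
		return nil
	})
}
```

An instruction in the global AGENTS file along the lines of “run this before touching subsystem X” is then enough to steer the model toward the right docs.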

Apropos context: I used to be really diligent about restarting a session for new tasks. With GPT 5.2 this is no longer needed. Performance is extremely good even when the context is fuller, and often that helps with speed, since the model works faster when it has already loaded plenty of files. Obviously that only works well when you serialize your tasks or keep the changes far enough apart that two sessions don’t touch each other much. codex has no system events for “this file changed”, unlike Claude Code, so you need to be more careful - on the flip side, codex is just FAR better at context management; I feel I get 5x more done in one codex session than with Claude. This is more than just the objectively larger context size - there are other things at work. My guess is that codex internally thinks in a really condensed way to save tokens, whereas Opus is very wordy. Sometimes the model messes up and its internal thinking stream leaks to the user, so I’ve seen this quite a few times. Really, codex has a way with words I find strangely entertaining.

Prompts. I used to write long, elaborate prompts with voice dictation. With codex, my prompts have gotten much shorter, I often type again, and many times I add images, especially when iterating on UI (or on text output with CLIs). If you show the model what’s wrong, just a few words are enough to make it do what you want. Yes, I’m that person who drags in a clipped image of some UI component with “fix padding” or “redesign” - many times that either solves my issue or gets me reasonably far. I used to refer to markdown files, but with my docs:list script that’s no longer necessary.

Markdowns. Many times I write “write docs to docs/.md” and simply let the model pick a filename. The more obviously you design the structure around what the model is trained on, the easier your work will be. After all, I don’t design codebases to be easy for me to navigate; I engineer them so agents can work in them efficiently. Fighting the model is often a waste of time and tokens.

Tooling & Infrastructure

What’s still hard? Picking the right dependencies and frameworks to settle on is something I invest quite some time in. Is this well-maintained? How about peer dependencies? Is it popular, i.e. will the model have enough world knowledge so agents have an easy time? Equally, system design. Will we communicate via WebSockets? HTML? What do I put into the server and what into the client? How and which data flows from where to where? Often these are things that are a bit harder to explain to a model, and where research and thinking pay off.

Since I manage lots of projects, I often let an agent simply run in my projects folder, and when I figure out a new pattern, I ask it to “find all my recent Go projects and implement this change there too + update changelog”. Each of my projects gets a raised patch version in that file, and when I revisit one, some improvements are already waiting for me to test.

Ofc I automate everything. There’s a skill to register domains and change DNS. Another to write good frontends. There’s a note in my AGENTS file about my tailscale network, so I can just say “go to my mac studio and update xxx”.

Apropos multiple Macs: I usually work on two. My MacBook Pro on the big screen, and a Jump Desktop session to my Mac Studio on another screen. Some projects are cooking there, some here. Sometimes I edit different parts of the same project on each machine and sync via git. Simpler than worktrees, because drift on main is easy to reconcile. It has the added benefit that anything that needs UI or browser automation I can move to my Studio, where it won’t annoy me with popups. (Yes, Playwright has headless mode, but there are enough situations where that won’t work.)

Another benefit is that tasks keep running there, so whenever I travel, the remote machine becomes my main workstation and tasks simply keep running even if I close my Mac. I did experiment with real async agents like codex or Cursor web in the past, but I missed the steerability, and ultimately the work ends up as a pull request, which again adds complexity to my setup. I much prefer the simplicity of the terminal.

I used to play with slash commands, but never found them all that useful. Skills replaced some of it, and for the rest I keep writing “commit/push”, because it takes the same time as /commit and always works.

In the past I often took dedicated days to refactor and clean up projects; I do this much more ad hoc now. Whenever prompts start taking too long or I see something ugly flying by in the code stream, I deal with it right away.

I tried Linear and other issue trackers, but nothing stuck. Important ideas I try right away, and everything else I’ll either remember or it wasn’t important. Of course I have public bug trackers for folks that use my open-source code, but when I find a bug myself, I’ll immediately prompt it - much faster than writing it down and later having to switch context back to it.

Whatever you build, start with the model and a CLI first. I had this idea of a Chrome extension to summarize YouTube vids in my head for a long time. Last week I started working on summarize, a CLI that converts anything to markdown and then feeds that to a model for summarization. First I got the core right, and once that worked great, I built the whole extension in a day. I’m quite in love with it. Runs on local, free or paid models. Transcribes video or audio locally. Talks to a local daemon, so it’s super fast. Give it a go!
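The core-first pipeline - normalize any input to markdown, then hand it to a model - can be sketched like this. Both helpers are naive stand-ins I made up; summarize’s real conversion and model call aren’t shown in the post:

```go
package main

// Hypothetical sketch of a summarize-style pipeline: input -> markdown
// -> prompt for a model. toMarkdown is a naive tag-stripper; a real
// implementation would preserve headings, links, and code blocks.

import (
	"fmt"
	"regexp"
	"strings"
)

var tagRe = regexp.MustCompile(`<[^>]+>`)

// toMarkdown crudely reduces HTML to plain text by replacing tags
// with newlines and trimming the result.
func toMarkdown(html string) string {
	return strings.TrimSpace(tagRe.ReplaceAllString(html, "\n"))
}

// buildPrompt wraps the markdown in a summarization instruction;
// the model call itself is left out of this sketch.
func buildPrompt(md string) string {
	return "Summarize the following document in a few bullet points:\n\n" + md
}

func main() {
	md := toMarkdown("<h1>Hello</h1><p>World</p>")
	fmt.Println(buildPrompt(md))
}
```

Getting this text-in/text-out core right first is what makes the later Chrome-extension layer a thin wrapper rather than the hard part.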

My go-to model is gpt-5.2-codex on high. Again, KISS. There’s very little benefit to xhigh other than it being far slower, and I don’t wanna spend time thinking about different modes or “ultrathink”. So pretty much everything runs on high. GPT 5.2 and codex are close enough that changing models makes no sense, so I just use that.

My Config

This is my ~/.codex/config.toml:

model = "gpt-5.2-codex"
model_reasoning_effort = "high"
tool_output_token_limit = 25000
# Leave room for native compaction near the 272–273k context window.
# Formula: 273000 - (tool_output_token_limit + 15000)
# With tool_output_token_limit=25000 ⇒ 273000 - (25000 + 15000) = 233000
model_auto_compact_token_limit = 233000
[features]
ghost_commit = false
unified_exec = true
apply_patch_freeform = true
web_search_request = true
skills = true
shell_snapshot = true

[projects."/Users/steipete/Projects"]
trust_level = "trusted"

This allows the model to read more in one go; the defaults are a bit small and can limit what it sees. It fails silently, which is a pain and something they’ll eventually fix. Also, web search is still not on by default? unified_exec replaced tmux and my old runner script; the rest is neat too. And don’t be scared of compaction: ever since OpenAI switched to their new /compact endpoint, it works well enough that tasks can run across many compactions and still get finished. It makes things slower, but often acts like a review - the model will find bugs when it looks at the code again.

That’s it, for now. I plan on writing more again and have quite a backlog on ideas in my head, just having too much fun building things. If you wanna hear more ramblings and ideas how to build in this new world, follow me on Twitter.

ref

https://steipete.me/posts/2025/shipping-at-inference-speed
