GPT-5.3 absolutely cooked on Terminal-Bench. 77.3% vs 65.4%, not even close. If you're running agents that need to hammer out CLI commands all day, this is the one. Speed is legit too — 25% faster and burns half the tokens for the same work.
Opus 4.6?? Near-human on OSWorld. 72.7% when humans score ~72%. GPT-5.3 is at 64.7% there. For anything where the model needs to actually think its way through a messy problem on a computer, Opus pulls ahead.
The 1M context thing is lowkey the biggest deal nobody's paying attention to. Opus scores 76% on needle-in-a-haystack at 1M tokens. That's not "we accept 1M and forget 90% of it" — it's actually tracking stuff across the whole input.
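For anyone unfamiliar, a needle-in-a-haystack eval embeds one specific fact ("the needle") at some depth inside a huge block of filler text, then asks the model to retrieve it. Here's a minimal sketch of that setup, assuming the usual structure of these harnesses; the function names and the naive substring scoring are mine, not from any published benchmark.

```python
def build_haystack(needle: str, filler: str, total_chars: int, depth: float) -> str:
    """Embed the needle sentence at a relative depth (0.0-1.0) inside filler text."""
    # Repeat the filler until we have enough characters, then trim.
    body = (filler * (total_chars // len(filler) + 1))[:total_chars]
    pos = int(total_chars * depth)
    return body[:pos] + " " + needle + " " + body[pos:]


def score_retrieval(model_answer: str, expected_fact: str) -> bool:
    """Naive pass/fail: did the model's answer contain the expected fact?"""
    return expected_fact.lower() in model_answer.lower()


# Example: bury a fact halfway into ~1000 chars of filler.
needle = "The secret launch code is 4217."
haystack = build_haystack(needle, "lorem ipsum dolor sit amet ", 1000, depth=0.5)
prompt = haystack + "\n\nWhat is the secret launch code?"
# You'd send `prompt` to the model and check its reply:
print(score_retrieval("I believe the code is 4217", "4217"))
```

Real harnesses sweep both context length (up to 1M tokens) and needle depth, which is why a 76% score at 1M means the model is genuinely tracking content across the whole window rather than just the beginning and end.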
Every.to ran independent tests and found something that matches what I've been feeling too: Opus is better when you throw a vague problem at it and let it cook. GPT-5.3 is better when you write a tight spec and want precise execution.
Opus explores, Codex executes. Pick based on how you work.
The timing is hilarious though. Same day, same hour basically. The AI wars are getting spicy.
I know most devs swear by Claude, Cursor, or Gemini at this point, and honestly I get it. I've been a Claude Code user for months. But I gave GPT-5.3 Codex a proper shot today and... it's actually really good?
Like the terminal stuff is noticeably faster. I threw a mid-size Node project at it (auth system, REST API, Postgres migrations) and it just went. Didn't overthink, didn't ask 15 clarifying questions, just shipped working code. That 25% speed boost they advertised feels real in practice.
I've seen some tweets claiming GPT Codex 5.3 is superior to Claude 4.6 but haven't given it a try yet. I couldn't believe what I read: they're claiming Codex 5.3 is better at backend work than Claude :O