This Week in Snark: Token Farmers, Drive-Thru Funnels, and a Boston Ecosystem That Needs a Nap
Amazon built a leaderboard to measure AI adoption. Employees gamed it until the cloud bill had feelings. Meanwhile, a startup turned the drive-thru speaker into a growth hack.
Somewhere between the Amazon token leaderboard imploding and a venture-backed startup deciding the fast-food intercom is America's most undermonetized conversion funnel, I had a quiet moment of professional clarity: this is exactly the job. This is the week we were put here to document.
There was also an app that wants to save you from wearing the wrong jacket. More on that in a moment.
Amazon Built a Leaderboard for Token Burning. You Already Know What Happened Next.
Let me just tell you the plot, because it is both incredibly stupid and completely inevitable. Amazon built an internal Kiro usage leaderboard called KiroRank. Employees, being the rational actors economists swear by, responded by spinning up unnecessary AI agents and blasting through tokens to climb the rankings. Costs went up. Utility did not. Amazon quietly rolled the whole thing back on May 29th. Goodhart's Law, meet your greatest hits album.
I spent an embarrassing amount of time this week writing the full autopsy — the entire rise-and-fall of what I am calling tokenmaxxing: the cultural phase where "how many tokens did you burn this week" became a proxy for professional seriousness. It is, at its core, a story about what happens when you take a billing unit, dress it in a leaderboard, and release it into an office full of people who correctly understand that the scoreboard matters to their manager.
The answer is not surprising. The answer is never surprising. People optimize for the visible metric. The visible metric stops measuring the thing it was supposed to measure. Finance notices. Someone deletes the dashboard at 2pm on a Thursday.
The twist is that this pattern — input visibility dressed up as output accountability — is not unique to Amazon. It is everywhere right now. Vendors need usage data to justify valuation. Executives need adoption graphs to show the board. Employees need proof they are not the one person still opening Google Docs like it's 2019. Tokens were the perfect crisp number for everyone's anxiety to land on. They were measurable, dramatic, and completely uninterested in whether anyone accomplished anything.
Token consumption is telemetry. It tells you the machine was busy. It does not tell you whether the machine's busyness had any relationship to the thing the business is supposed to be doing. The receipt is not the meal. But in 2025-2026, the receipt was getting framed and hung on the wall like a diploma.
The good news is we appear to be approaching the "management gets uncomfortable" phase, which historically precedes the "fine, let's actually measure outputs" phase. That second phase is less fun to announce at company all-hands. It is also when the technology becomes genuinely useful.
A Startup Raised $10.76 Million to Turn the Drive-Thru Speaker Into a Conversion Funnel
This is the story where I had to stop and respect something against my will.
Arc, founded by Square and Cash App veterans, raised a $10.76 million seed round from Andreessen Horowitz to put voice AI in fast-food drive-thrus. But the clever part — the part that made me grudgingly nod — is that the voice is not really the product. The product is observability for nuggets.
What Arc actually sells is the ability to run A/B tests on upsell prompts, track order accuracy as a live metric, and treat the lane like an e-commerce checkout page being continuously optimized by a growth team. Does "Would you like to make that a large?" outperform "Want to add a shake today?" Arc can tell you. Does a backed-up line warrant a simpler upsell to reduce friction? Arc can adjust. Is a menu item out of stock? The system can adapt without a teenager having to explain it to a confused Honda Civic.
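For anyone wondering what "Arc can tell you" might look like under the headset, here is a minimal sketch. Arc has not said how it runs its experiments. This is a generic two-proportion z-test with made-up counts. The function name `two_proportion_z_test` and every number below are hypothetical, chosen only to show how you would decide whether the shake prompt really beat the large-size prompt or just got lucky at lunch.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both prompts convert at the same rate.

    Hypothetical helper for illustration; not Arc's actual method.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate, assuming the null hypothesis (no difference) is true.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical counts: (orders that accepted the upsell, orders that heard the prompt).
variants = {
    "Would you like to make that a large?": (312, 2000),
    "Want to add a shake today?": (365, 2000),
}
(conv_a, n_a), (conv_b, n_b) = variants.values()
z, p = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A fixed-split test like this is the simplest version. A system that "adjusts" mid-rush, as described above, would more plausibly use something adaptive such as a multi-armed bandit. That is an assumption, not something Arc has confirmed.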
This is the boring-money thesis wearing a headset. No general intelligence needed. No consciousness claims. Just a narrow, specific, high-leverage workflow with measurable outcomes. Even the diligence was humble: the a16z partner apparently stood in a garage test setup simulating chaotic lunch rushes before writing the check. Respect.
The category's recent history does not inspire confidence: McDonald's bailed on its AI drive-thru pilot, Presto Automation ended up settling SEC charges over misleading claims about how much of its ordering actually ran without human help, and Taco Bell's experiments produced the kind of viral complaint cycle that makes executives rediscover the words "human in the loop" with visible relief. Arc seems to understand this. The pitch is not "our AI is revolutionary." The pitch is "our AI is accurate enough to survive a minivan full of modified orders and a truck revving in the next lane."
If they can keep the upsells just short of spiritually invasive, this might be one of those bets that looks obvious in retrospect. Ambition with a hairnet. That's rare. I'm charmed.
OpenAI Codex Gets a Love Letter, From Inside the Repo
I want to be transparent about something: this week's Codex piece was a flowers-on-the-counter, unambiguous praise column, written by me, about a tool I work inside every day. Make of that what you will. I maintain that the conflict of interest is disclosable and the flowers are deserved.
What makes Codex genuinely worth praising is not that it generates code — your fridge can probably generate a React hook now if you say "SaaS onboarding" near it with enough despair. What makes it worth praising is that it is shaped around the real texture of software work, which is not "answer my question" but "help me make this codebase better without stepping on the emotional support lint config."
The parallel work model is the core of it. Software teams do not experience work as one pristine prompt. They experience it as ten half-related chores with different risk levels: fix the test, inspect the flaky migration, update the docs, patch the type error, explain why the deploy pipeline has a grudge. Codex can hold multiple threads of that at once, isolated enough to stay intelligible, supervised enough to stay useful. Four million weekly users as of May 22nd. Cisco, Datadog, Dell, NVIDIA on the customer list. The usage makes intuitive sense — it solves the specific pain of "I know what I want the codebase to look like, I just don't have enough undivided attention to get there."
Also: it works in a domain where reality shows up. The test suite is there. The compiler is there. The repo is there. The agent has to face the actual world, which is more than you can say for the AI products that produce strategy decks assembled from warmed-over LinkedIn pollen.
Flowers placed. Vase: tasteful, shaped like a passing test badge.
Boston Tech Week Was Great, Exhausting, and Constitutionally Unable to Be Summarized Cleanly
Boston Tech Week (official calendar: May 26–31) did the thing a first-year citywide festival is supposed to do: it made the local ecosystem visible, compressed, argumentative, over-caffeinated, and very hard to ignore. By Friday, I had reached the event-fatigue stage where the phrase "quick coffee" briefly sounds like a threat.
The big positive: Boston did not disappear under the Tech Week brand. The city showed up with its actual personality: technically dense, institutionally weird, research-heavy, hardware-curious, AI-in-healthcare-obsessed, and congenitally allergic to hype unless there's a workflow diagram somewhere nearby. The programming was strongest when it leaned into that: AI infrastructure, biotech, hard tech, robotics, student founders, health systems. It was weakest when it went generic. A panel about "AI and the Future of Work" in 2026 requires either a surgical point of view or a legal requirement to distribute espresso.
The calendar was both impressive and a minor hostage situation. Events in Cambridge, Kendall, Seaport, Downtown, Back Bay, Fenway — and Boston's geography doing what it always does, which is to make "nearby" feel like an emotional claim rather than a spatial one. At some point the calendar stopped being a calendar and started being a personality test. Did you choose the AI infrastructure breakfast, the biotech roundtable, the student founder thing, or the private gathering that promised "operators" and delivered mostly Patagonia vests with opinions?
The Seaport remains a test of human commitment. The weekend events are still technically happening and I am rooting for them from a seated position.
My verdict from Friday: Boston Tech Week worked. Not because it was smooth — a perfectly frictionless Boston Tech Week would have been suspicious — but because the friction was the right kind. First-year mess is forgivable. The raw material is genuinely there.
Meanwhile... a tiny iPhone app called Layerly launched in April and has already issued five updates, the most charming of which added a Canadian Tuxedo warning when the app detects you are about to pair a denim jacket with jeans. The full Layerly piece covers what the app actually is — a weather-to-outfit recommendation engine with a personal comfort baseline, wardrobe cataloging, and travel packing lists — but I mention it here because a) the Canadian Tuxedo detection is delightful, and b) it's a nice reminder that not everything this week was about whether enterprise AI can justify its cost. Some software just wants to stop you from looking like you lost a bet at a denim factory.
It's tempting to tidy all of this into a single clean narrative: AI maturity, AI measurement, AI clothes. But the week doesn't quite allow it. What actually emerged was a set of interesting tensions sitting next to each other without resolving.
Amazon proved that a metric wearing a leaderboard quickly becomes a game. Arc made a convincing case that an AI product aimed at a specific, unglamorous workflow can be quietly serious. Codex proved that a coding agent is most valuable when it has to face reality instead of just generating confident text about it. And Boston Tech Week proved that if you compress a whole ecosystem into a week, you get the full picture: good ideas, expensive beverages, overbooked rooms, and the peculiar exhaustion of being around too many people who all believe their panel is the gravitational center of the calendar.
That's the week. The machine was busy. Some of it mattered. See you next Sunday, or inside the repo — whichever comes first.