Stop Letting Context Kill Your AI Ops (free tier)
Most teams think their AI problem is prompt quality.
This week reminded me the real problem is usually operational design. We found an $85 cost spike that looked like a model issue at first. It wasn't. The actual culprit was long-running context, 150+ messages deep, getting reprocessed over and over. Same tools, same goals, very different economics.
That matters because a lot of operators are quietly building fragile AI systems. They work in demos, then get slower, pricier, and harder to manage the moment the workflow gets real.
The takeaway for this week is simple: if you want AI to be useful in a business, design for bounded context, clear handoffs, and repeatable roles.
The playbook: treat AI like an operating system, not a chat window
Here are three practical changes that made a difference in our org this week.
1. Put hard boundaries on context
If an agent keeps carrying the whole conversation forever, every new turn reprocesses everything that came before it. Costs creep up, and accuracy usually gets worse before you notice.
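To see why a long thread gets expensive, here is a rough back-of-the-envelope sketch. It assumes every turn resends the full history and that each message averages 200 tokens. Both are illustrative assumptions, not measurements from our spike, and the function names are made up for this example.

```python
def total_input_tokens(num_messages: int, tokens_per_message: int) -> int:
    """Total input tokens processed when every turn resends the full history."""
    # Turn k resends k messages, so the total is 1 + 2 + ... + n messages.
    return tokens_per_message * num_messages * (num_messages + 1) // 2


def total_with_resets(num_messages: int, tokens_per_message: int, reset_every: int) -> int:
    """Same workload, but the session starts fresh every `reset_every` messages."""
    full_sessions, remainder = divmod(num_messages, reset_every)
    return (
        full_sessions * total_input_tokens(reset_every, tokens_per_message)
        + total_input_tokens(remainder, tokens_per_message)
    )


if __name__ == "__main__":
    one_thread = total_input_tokens(150, 200)            # 2,265,000 tokens
    bounded = total_with_resets(150, 200, reset_every=25)  # 390,000 tokens
    print(one_thread, bounded, round(one_thread / bounded, 1))
```

Under those assumptions, one 150-message thread processes about 5.8 times as many input tokens as the same work split into 25-message sessions. The sketch ignores any summary you carry into each fresh session, so the real gap will be somewhat smaller.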
What worked for us:
- Start fresh sessions after major topic switches
- Keep automations pinned to explicit models instead of vague defaults
- Move recurring work into cron jobs or dedicated flows, not one giant ongoing thread
- Save key decisions to files so the system can reload the right context instead of dragging old noise forward
The big shift is this: memory should be written, not implied.
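One way "written memory" could look in practice is a small decisions file that a fresh session reads at startup. This is a minimal sketch, not our actual setup: the file layout and the `save_decision` and `load_context` helpers are hypothetical.

```python
import json
from pathlib import Path


def save_decision(path: Path, topic: str, decision: str) -> None:
    """Append one decision to a JSON file so it survives a session reset."""
    decisions = json.loads(path.read_text()) if path.exists() else []
    decisions.append({"topic": topic, "decision": decision})
    path.write_text(json.dumps(decisions, indent=2))


def load_context(path: Path, topic: str) -> str:
    """Build a short preamble for a fresh session from decisions on one topic."""
    if not path.exists():
        return ""
    decisions = json.loads(path.read_text())
    # Only decisions for the requested topic are reloaded; everything else stays on disk.
    lines = [f"- {d['decision']}" for d in decisions if d["topic"] == topic]
    return "\n".join(lines)
```

A new session would call `load_context(path, "cron")` and start from a handful of bullet points instead of 150 messages of history.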
2. Separate planning from execution
We spent time this week pressure-testing delegation patterns across the org. The lesson was not "more agents = better." It was "clear authority beats clever architecture."
When a workflow gets muddy, use a simple stack:
1. Planner defines the mission
2. Executor handles the build or task loop
3. Reviewer checks the output
If those roles blur together, you get thrash. If they stay clean, you get throughput.
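The three-role stack can be sketched as a tiny pipeline in which each role is a separate function with a narrow input. This is an illustration under assumptions: `run_pipeline`, `Result`, and the stub roles are hypothetical names, and in a real system each callable would wrap an agent or model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    mission: str
    output: str
    approved: bool


def run_pipeline(
    request: str,
    planner: Callable[[str], str],
    executor: Callable[[str], str],
    reviewer: Callable[[str, str], bool],
) -> Result:
    """Run one request through three roles that never share a step."""
    mission = planner(request)            # Planner only defines the mission.
    output = executor(mission)            # Executor sees the mission, not the raw request.
    approved = reviewer(mission, output)  # Reviewer judges the output against the mission.
    return Result(mission, output, approved)


if __name__ == "__main__":
    result = run_pipeline(
        "clean up the cron schedule",
        planner=lambda req: f"Mission: {req}",
        executor=lambda mission: f"Done. {mission}",
        reviewer=lambda mission, output: mission in output,
    )
    print(result)
```

The point of the shape is that authority is visible in the signature: nobody but the planner reads the request, and nobody but the reviewer decides whether the work passes.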
3. Optimize model choice by job type
Not every task deserves the same model.
This week we reallocated cron jobs based on the kind of work being done:
- GPT-5.4 for higher-value operational tasks
- Gemma for lightweight content jobs where free/local made sense
- Haiku for low-reasoning maintenance tasks like backups and memory cleanup
That's the real cost move. Don't ask "what's the best model?" Ask "what is the cheapest model that can do this reliably?"
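Here is a minimal sketch of "cheapest model that can do this reliably" as an explicit routing table. The cheapest-first ordering and the reliability sets are assumptions for illustration; check current pricing and your own evaluation results before copying them. The identifiers (`pick_model`, `RELIABLE_FOR`, the job type names) are hypothetical.

```python
# Ordered cheapest-first. This ordering is an assumption, not a price quote.
MODELS_CHEAPEST_FIRST = ["gemma", "haiku", "gpt-5.4"]

# Which models you have judged reliable for each job type (hypothetical values).
RELIABLE_FOR = {
    "content": {"gemma", "haiku", "gpt-5.4"},
    "maintenance": {"haiku", "gpt-5.4"},
    "operations": {"gpt-5.4"},
}


def pick_model(job_type: str) -> str:
    """Return the cheapest model marked reliable for this job type."""
    if job_type not in RELIABLE_FOR:
        # Fail loudly instead of falling back to a vague default model.
        raise ValueError(f"No model pinned for job type: {job_type!r}")
    for model in MODELS_CHEAPEST_FIRST:
        if model in RELIABLE_FOR[job_type]:
            return model
    raise ValueError(f"No reliable model listed for job type: {job_type!r}")
```

Raising on an unknown job type is deliberate: a silent fallback is exactly the kind of default that hides unnecessary spend.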
This Week in the Org
A few concrete moves from inside the org:
- Re-pinned cron jobs to explicit models after discovering context and fallback confusion was driving unnecessary spend
- Built a live-sync API layer for Mission Control so multiple machines can pull from one source of truth
- Expanded the shared Obsidian vault workflow so all agents checkpoint into the same operating memory
- Audited and reorganized cron jobs, including the AI Operative newsletter cadence and tier split
- Tested and then reverted a more complex foreman delegation model after deciding the simpler Jarvis → Ralph → Cody structure fit the workflow better
That last point is worth underlining. Reverting a fancy system is progress if the simpler one is actually more usable.
AI News This Week
A few stories worth watching through an operator lens:
- Poke makes using AI agents as easy as sending a text (TechCrunch)
Consumer-style UX for agents keeps getting simpler, which means expectations for business tooling will rise too.
- Anthropic expands its compute deal with Google and Broadcom (TechCrunch)
Infrastructure demand is still exploding, a reminder that the AI layer you see depends on a very expensive layer you don't.
- Meta unveils Muse Spark (Ars Technica)
Big labs are still racing to own distribution and relevance, which gives operators more model choice but also more stack volatility.
- Anthropic limits access to its new cybersecurity model, Mythos (Ars Technica)
Dual-use risk is starting to shape product access, not just model capability.
- Show HN: CSS Studio, design by hand, code by agent (Hacker News)
The most interesting pattern is hybrid control: humans set direction, agents accelerate execution.
Bottom line
If your AI stack feels expensive, messy, or weirdly inconsistent, don't start with better prompting.
Start by asking:
- Where is context growing without limits?
- Which handoffs are ambiguous?
- Which tasks are over-modeled?
- What should be written to memory instead of held in chat?
That is usually where the real gains are.
Want advanced tactics and case studies? Subscribe to the premium tier.