Everyone's Building AI Farms. Here's What They're Missing.
Published on March 05, 2026
r/LocalLLaMA is full of Mac Studio builds running 70B parameter models through Ollama. People are posting their setups like it's 2013 Bitcoin mining — except the hash rate is tokens per second.
The software side is moving just as fast. Kimi K2 has free API access through OpenRouter and NVIDIA. Mistral has a free tier. Google is handing out Gemini credits. If you're smart about routing, you can run a small fleet of agents without paying for inference at all.
Both groups are arriving at the same place from different directions: functionally unlimited compute.
So what do you do with it?
I've been running AI agents as actual work infrastructure for a few months now. Morning briefings, evening recaps, Reddit monitoring, content drafting, overnight coding loops. Not experiments — real daily workflows.
But compute was never the constraint.
Unlimited inference is like unlimited lumber. You still need blueprints, a foundation, and someone who knows what house they're building. The bottlenecks are everywhere else.
Orchestration Is Getting Better. Not Solved.
Ten agents running simultaneously sounds impressive until you realize they need to not step on each other.
There are people building real multi-agent systems: feedback loops where agents review each other's work, orchestration layers that spin up sub-agents for specific tasks and report back. The tooling is getting surprisingly good. I can describe a product evolution in enough detail now that an agent could run with it for hours.
But "could" is doing a lot of work in that sentence.
I had an automated review loop running overnight a few weeks ago. Left it going, went to bed. Woke up to find it had been running for hours, building in the wrong direction. Had to scrap the whole thing and start over. The agent was working hard. It was just working hard on the wrong thing.
That's the state of autonomous orchestration right now. The ceiling is high and rising fast, but the failure modes are still "confidently wrong for eight hours straight." You need either tight guardrails or a willingness to throw away work.
One well-directed agent with good memory and clear instructions still outperforms five agents running in parallel with vague goals. My setup runs one primary agent through Telegram — he has context about my projects, my schedule, my preferences. He's not the frontier model, but he knows what I need, and that matters more than raw capability.
You don't need a farm. You need a foreman.
The 90% Problem
Let's say orchestration gets solved tomorrow. Agents can coordinate, stay on track, build autonomously through the night.
There's still a gap.
An agent can build three features overnight: write the code, set up the tests, wire the components together. But it can't really test them the way a human would (though this is also changing fast). It can't click through the flows and feel whether the UX is right, or catch the subtle wrongness of a feature that technically works but doesn't make sense.
This is the "90% of the work" bottleneck. The agent does most of it, but the last 10% — the verification, the judgment calls, the "does this actually feel right" — still needs a person.
Browser-capable models are already chipping away at this. Computer use, Operator, tools that can navigate UIs and click through workflows. They're not seamless yet, but the trajectory is obvious. This gap is closing, and it's one of the ones I'd bet on closing fast.
Compute Doesn't Solve the Moat Problem
Even with unlimited agents running around the clock, you still can't get Ahrefs' search data. You can't replicate an incumbent's user base. You can't download someone else's distribution.
Compute gives you speed. It doesn't give you access.
I can spin up agents to write content, build features, analyze markets. But the data moats (the proprietary datasets, the network effects, the things that take years of users to accumulate) don't bend to more tokens per second.
So what happens? Does cheap compute erode moats by letting small teams move faster? Or does it widen the gap because incumbents can deploy the same agents on top of their existing advantages?
I don't know. But it's the question I think about most.
What 24/7 Agents Mean for Builders
If every solo founder can run agents overnight (coding, writing, researching, testing), there might be a limited window where this is an advantage.
Right now, most people aren't doing it. The tooling is new, the workflows are unproven, the learning curve is real. If you figure it out early, you get a multiplier that your competitors don't have.
But that window closes. The tools get easier, more people adopt them, and the advantage shifts from "I have agents" to "I have better agents" to "everyone has agents and it's table stakes."
For marketing, for product development, for anything that compounds over time — starting now probably matters more than starting perfectly.
Coordination Is Less of a Problem Than I Expected
Agents are already better at coordinating than humans in some ways.
They use git and can work in separate worktrees without stepping on each other. You can give them concrete communication rules and they actually follow them — and you have full visibility into what each agent is doing at all times. No status meetings required.
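The worktree trick above is just stock git. A minimal sketch of the isolation, assuming a generic repo (the branch and path names are illustrative):

```python
import subprocess

def add_agent_worktree(repo: str, branch: str, path: str) -> None:
    """Give one agent its own checkout on its own branch.

    Each agent edits files in its own worktree, so parallel agents
    never clobber each other's working directory. Their work merges
    back later through normal git review, like any feature branch.
    """
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", branch, path],
        check=True,
    )
```

Each agent commits on its own branch; you merge or discard its work the same way you would a human teammate's.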
Compare that to a team of humans who need standups, Slack threads, Jira tickets, and still manage to duplicate work or block each other. Agents have a lower coordination overhead than people for well-defined tasks.
The hard part isn't coordination itself — it's defining the work clearly enough that coordination is possible. Which brings us back to orchestration.
The Real Cost Equation
Honest math on my setup:
Claude Max: $200/month. The heavy thinking. Complex reasoning, writing, architecture decisions. The subscription sounds expensive until you calculate what it replaces.
Free tiers (Kimi K2, Mistral, Gemini): $0. Useful for structured tasks like classification and monitoring. Kimi K2 through OpenRouter or NVIDIA costs nothing for moderate use. I run cron jobs on a Railway VPS for $5-20/month. The inference is free.
Mac Studio M4 Ultra: $4,000+. Runs large models locally with no ongoing inference costs. But the models you run locally are still a step behind frontier APIs for complex tasks. The gap is closing, but it's there. The latest Qwen models run well in as little as 8-12GB of memory, so capable local models are now within reach on much cheaper hardware.
When does local make sense? High-volume inference where model quality matters less — embedding generation, bulk classification, that kind of thing. If you're running thousands of calls a day on repetitive tasks, the hardware pays for itself.
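The break-even math is simple enough to sketch. Illustrative numbers only: the $4,000 and $200/month figures from above, and a guessed power bill.

```python
def breakeven_months(hardware_cost: float, monthly_api_cost: float,
                     monthly_running_cost: float = 0.0) -> float:
    """Months until buying hardware beats paying for hosted inference."""
    monthly_saving = monthly_api_cost - monthly_running_cost
    if monthly_saving <= 0:
        return float("inf")  # local never pays off at these rates
    return hardware_cost / monthly_saving

# A $4,000 Mac Studio vs. a $200/month subscription, ~$25/month in power:
# breakeven_months(4000, 200, 25) -> ~22.9 months
```

Two years to break even, assuming your workload would otherwise max out the subscription every month. Lighter workloads push the horizon out further.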
For one to five agents doing thoughtful work? API access is cheaper, better, and requires zero hardware maintenance.
The $4,000 Mac Studio crowd and the $5/month Railway VPS crowd might end up in similar places productivity-wise. The difference is one of them has $3,800 left over.
What Actually Matters
If I had to rank what matters for a productive agent setup, compute would be near the bottom.
Memory. Without persistent memory, every session starts from zero — you're not building, you're rebuilding. I use a combination of daily notes, a knowledge graph, and structured memory files. Not elegant, but my agent knows what I'm working on without me re-explaining it every morning.
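A sketch of the pattern rather than my actual files: concatenate long-lived memory plus today's note into the context the agent starts each session with. The filenames are illustrative.

```python
from datetime import date
from pathlib import Path

def build_context(memory_dir: str) -> str:
    """Assemble the agent's starting context from memory files.

    MEMORY.md holds long-lived facts (projects, preferences);
    notes/YYYY-MM-DD.md holds what happened recently. Missing
    files are simply skipped.
    """
    root = Path(memory_dir)
    sources = [
        root / "MEMORY.md",
        root / "notes" / f"{date.today():%Y-%m-%d}.md",
    ]
    parts = [p.read_text() for p in sources if p.exists()]
    return "\n\n---\n\n".join(parts)
```

Nothing clever, but it means every session opens already knowing what you're working on.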
Cron. Agents that only work when you talk to them are assistants. Agents that work on a schedule are infrastructure. Morning briefings run automatically, evening recaps happen whether I remember to ask or not, and Reddit monitoring checks every few hours. The compound value comes from consistency, not speed.
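The scheduling layer is plain cron handing the agent a standing prompt. A sketch of the shape, with job names and times as examples (the `jobs.py` dispatch script is hypothetical):

```python
# Crontab entries on the VPS look roughly like (illustrative):
#   0 7 * * *    python jobs.py morning_briefing
#   0 22 * * *   python jobs.py evening_recap
#   0 */3 * * *  python jobs.py reddit_monitor

JOBS = {
    "morning_briefing": "Summarize overnight activity, today's calendar, open items.",
    "evening_recap": "Recap what moved today and which loops are still open.",
    "reddit_monitor": "Check tracked subreddits for new relevant threads.",
}

def job_prompt(name: str) -> str:
    """Look up the standing prompt cron hands to the agent for a job."""
    if name not in JOBS:
        raise KeyError(f"unknown job: {name}")
    # The real jobs.py would pass this prompt to the agent here.
    return JOBS[name]
```

The mechanism matters less than the discipline: each job runs the same well-specified instruction on a schedule, whether or not you remember to ask.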
Communication channels. I run everything through Telegram — check in from my phone, send quick tasks, get alerts. The channel matters more than the model. If your agent lives in a terminal you have to SSH into, you won't use it.
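The Telegram side needs nothing beyond the official Bot API's `sendMessage` method. A minimal sketch with the standard library only (token and chat id are placeholders):

```python
import json
import urllib.request

API = "https://api.telegram.org"  # official Bot API host

def build_send_message(token: str, chat_id: int, text: str) -> tuple[str, bytes]:
    """Build the sendMessage request as (url, JSON body)."""
    url = f"{API}/bot{token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode()
    return url, body

def notify(token: str, chat_id: int, text: str) -> None:
    """Push an alert to your phone; raises on HTTP errors."""
    url, body = build_send_message(token, chat_id, text)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req).close()
```

That one function is the whole "alerts on my phone" channel; everything else is what you choose to send through it.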
Compound learning. Every night, my system reviews the day — what worked, what didn't, what to adjust. Those lessons feed into tomorrow. After a few weeks, I noticed the difference. Suggestions got sharper. Briefings got more relevant. Slow, imperfect, but real.
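The review loop is just an append-only lessons file fed back into tomorrow's context. A sketch with a stubbed agent call (the `summarize` parameter stands in for whatever model you use to distill the day):

```python
from datetime import date
from pathlib import Path

def record_lessons(lessons_path: str, day_log: str,
                   summarize=lambda text: text) -> str:
    """Append today's lessons to a running file the agent rereads daily.

    `summarize` is a stand-in for an agent call that distills the day's
    log into what worked, what didn't, and what to adjust.
    """
    lesson = summarize(day_log)
    entry = f"## {date.today():%Y-%m-%d}\n{lesson}\n\n"
    path = Path(lessons_path)
    existing = path.read_text() if path.exists() else ""
    path.write_text(existing + entry)
    return entry
```

Tomorrow's session loads the lessons file alongside the rest of memory, which is where the compounding comes from.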
None of this requires a Mac Studio. All of it requires thought.
Where I Am
I studied engineering in school. Went through Techstars, worked at early-stage startups, did growth and product management at Unito, then quit to build SEOTakeoff, among other things. My whole career has been building things and figuring out how to get them in front of people.
Now I'm trying to figure out how agents fit into that.
Hardware: MacBook Air. Not a powerhouse. Runs my agent framework and that's about it.
Compute: Claude Max for the heavy thinking. Kimi K2 (free) on a Railway VPS for cron jobs and automation. I looked at running local models. The quality wasn't there for what I needed (though it's changing quickly).
What's working: Morning briefings have been worth every dollar. I wake up to a summary of what happened overnight, what's on my calendar, what needs attention. Evening recaps help me close loops. Reddit monitoring has surfaced leads I would have missed. Overnight coding loops, when they work, compress days of work into hours.
What's not: Overnight processes still need guardrails. That automated loop that built the wrong thing overnight? Real setback. My task decomposition is getting better but I still catch myself giving vague instructions and getting vague results back.
What I'd tell someone starting today: Don't buy hardware. Get a $20/month subscription and a $5/month VPS and get started. Spend the first week figuring out what you actually want automated. Build memory systems before you build anything else. Accept that the hard part isn't getting the AI to run. It's figuring out what to point it at.
The gold rush is real, the Mac Studios are cool, and the free tiers are useful.
But compute was never the problem. Knowing what to compute — and building the systems to make that computation useful — that's the actual work.
I don't have it figured out, and I'm still learning what works. But the bottleneck is not the machine — it's everything around it.
The people who figure out the "everything around it" part first? They're the ones who'll actually build something with all this cheap compute.
Everyone else just has a very expensive space heater.