Voyage

AI Literacy or AI Theatre?

2026-05-29T04:09:00+00:00

Somewhere in a meeting room right now, a dashboard probably exists that says:

AI lines accepted: 18,420
AI suggestions used: 63%
Claude usage: high
ChatGPT usage: medium
Cursor activity: suspiciously enthusiastic

And somewhere beside that dashboard, a leadership team is nodding seriously while discussing “organizational AI maturity.”

I know because I have helped build this exact thing.

Not theoretically. Properly.

Pull usage data from Cursor, Claude, Claude Code, Codex, ChatGPT, Granola, Atlassian Rovo. Join it with employee metadata. Standardize schemas across seven different APIs. Push it into a warehouse. Build three tiers of dashboards: one for employees, one for managers, one for executives.

A beautiful little observability platform for AI adoption.

The intent sounds genuinely noble. Find who is struggling. See which teams are comfortable. Measure literacy. Start conversations. Become an AI-first company.

On paper, incredibly reasonable.

Then you keep thinking about it.

And slowly the whole thing starts feeling like counting keyboard clicks to measure engineering quality.

The Metrics Start Lying Very Quickly

The problem with AI usage metrics is they look meaningful from far away.

Get closer and they get weird.

Here is a simple example. I can intentionally use AI less and be more effective.

Ask AI once to generate a reusable script, save it, call it forever.

Meanwhile someone else asks:

“Can you sort this file?”

“Can you fix this JSON?”

“Can you rename these columns?”

Who looks more AI-literate on the dashboard? The second person. By a lot.

The first person quietly automated themselves out of needing repeated prompts. The dashboard reads this as low adoption. The metric punishes the thing it was supposed to encourage.
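To make the mismatch concrete, here is a minimal sketch of the two rankings side by side. The names, fields, and numbers are all invented for illustration; no real dashboard or vendor API is being modelled here.

```python
from dataclasses import dataclass


@dataclass
class WeeklyUsage:
    name: str
    prompts_sent: int      # what a usage dashboard typically counts
    tasks_completed: int   # what the team actually cares about


def rank_by_prompts(rows):
    """Order people the way a usage dashboard would: most prompts first."""
    ordered = sorted(rows, key=lambda r: r.prompts_sent, reverse=True)
    return [r.name for r in ordered]


def prompts_per_task(row):
    """Lower is better: fewer prompts spent for each finished task."""
    return row.prompts_sent / row.tasks_completed


# Hypothetical numbers, invented purely for illustration.
automator = WeeklyUsage("automator", prompts_sent=1, tasks_completed=30)
snacker = WeeklyUsage("snacker", prompts_sent=30, tasks_completed=30)

dashboard_order = rank_by_prompts([automator, snacker])
efficiency_order = [
    r.name for r in sorted([automator, snacker], key=prompts_per_task)
]
```

Same output from both people, opposite rankings. Prompts per task is not a good metric either, but it shows how much the picture depends on which column you sort by.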

Token Consumption Is Not Innovation

A lot of AI usage today looks like organizational snacking.

Tiny prompt after tiny prompt.

Write a shell command. Explain a git diff. Summarize a Slack thread. Rewrite a comment. Rewrite the rewritten comment. Simplify the simplification.

At some point you are not accelerating work anymore. You are running expensive autocomplete in a loop.

The pattern that really gets me is the agent review cycle. An agent reviews code. Another prompt simplifies the review. A new direction emerges. Everything gets regenerated. Half the previous output becomes throwaway work.

Tokens are flowing. Dashboards are glowing. But zoom out and a lot of energy just produced disposable iterations.

The dashboard cannot see any of that. It just sees activity and calls it progress.

The Quiet Skill Nobody Tracks

Nobody measures this because it is less flashy.

Good AI use often means using AI less over time.

Someone who really knows what they are doing writes a detailed system prompt once, structures their context with clear role, constraints, and output format before the first message, and gets what they need in two turns. Someone still learning fires off eight casual follow-ups trying to nudge the output into shape, gets frustrated, and starts over.

The first person finishes in ten minutes. The second person spent forty minutes and has a messy context window that is quietly degrading the quality of every subsequent response in the session.

Or take tool use. A sharp user spots that they are solving the same class of problem repeatedly, writes a script once, and calls it from the terminal forever. Done. Another user opens a chat window every single time, describes the problem fresh, waits for output, copies it out, and repeats this tomorrow. Same task. Forty tokens vs four thousand.
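As a rough sketch of what "write the script once" can look like, here is a small Python helper that covers two of the earlier asks, renaming columns and sorting a file. The function names, the flags, and the choice of CSV are my assumptions for illustration, not code from a real project.

```python
import argparse
import csv
import io
import sys


def rename_and_sort(text, renames, sort_key):
    """Rename CSV header columns, then sort rows by one (renamed) column.

    Sorting is plain string comparison, which is enough for a sketch.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = [{renames.get(k, k): v for k, v in row.items()} for row in reader]
    rows.sort(key=lambda row: row[sort_key])
    fieldnames = [renames.get(name, name) for name in (reader.fieldnames or [])]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def main(argv=None):
    """Command-line entry point; the flag names are hypothetical."""
    parser = argparse.ArgumentParser(description="Rename CSV columns and sort rows.")
    parser.add_argument("path")
    parser.add_argument("--sort-by", required=True)
    parser.add_argument("--rename", action="append", default=[], metavar="OLD=NEW")
    args = parser.parse_args(argv)
    renames = dict(pair.split("=", 1) for pair in args.rename)
    with open(args.path, newline="") as handle:
        sys.stdout.write(rename_and_sort(handle.read(), renames, args.sort_by))
```

Saved as a file with `main()` called under the usual `__main__` guard, this runs from the terminal as often as needed and costs zero tokens after the first generation. The dashboard will never hear about it.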

The experienced user also knows when to stop. A bloated 40-message session where the original goal has drifted three times is not deep work. It is compounding confusion. Recognizing that moment, compacting the context or restarting clean, is a real skill. It just looks like low activity on a dashboard.

Beginners generate lots of prompts because they are exploring. That is fine. But advanced users start compressing. They front-load structure, manage context deliberately, and build reusable tools instead of regenerating the same output in a sandbox every time.

The dashboard sees both of them and has no idea which is which. It just sees one person with higher numbers and calls that literacy.

When “AI First” Quietly Becomes “AI Always”

This is the part that feels slightly uncomfortable to say out loud.

The intention behind AI-first is good. Build faster. Think bigger. Remove friction. Give people leverage.

But somewhere in execution, “AI first” can drift into “use AI everywhere possible.” Even where it does not make sense.

A grep command becomes a chatbot interaction. A reusable script becomes a repeated generation task. A simple decision becomes a prompt, a response, a follow-up, a refinement, and somehow forty minutes of conversation that could have been a sticky note.

And company conversations about AI keep collapsing into one question: how much productivity did we gain? Not “did we build better systems” or “did we reduce technical debt” or even “did customers notice.” Just “did token usage go up.”

To be fair, productivity gains are real and worth measuring. AI does genuinely compress work that used to take days. That matters.

But when the measurement becomes the goal, you get measurement-shaped behavior. More prompts. Noisier workflows. Lightweight tasks becoming AI-assisted tasks not because it helps but because engagement is being tracked. People are not being malicious. They are just optimizing for what gets measured. If commits were scored by lines of code, engineers would write longer code.

What Would Actually Help

Financial guardrails make total sense. Without spend limits, AI costs can leak quietly at scale. Set quotas. Monitor token burn. Build sensible access policies. That is just responsible infrastructure.
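A quota does not need to be elaborate. Here is a minimal sketch of a per-team token budget, assuming something upstream already reports token counts per request. The class name, the 80% warning threshold, and the limit are hypothetical choices, not taken from any vendor's billing API.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a request would push a team past its token quota."""


@dataclass
class TokenBudget:
    monthly_limit: int
    used: int = 0
    warn_ratio: float = 0.8  # hypothetical threshold for an early warning

    def charge(self, tokens):
        """Record usage. Returns True once the warning threshold is reached."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.monthly_limit:
            # Refuse before spending, so the counter never exceeds the limit.
            raise BudgetExceeded(
                f"{self.used + tokens} would exceed limit {self.monthly_limit}"
            )
        self.used += tokens
        return self.used >= self.warn_ratio * self.monthly_limit
```

Note what this tracks: spend against a limit, per team. It says nothing about who is literate and who is not, which is the point.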

The harder question is whether individual usage metrics can tell you anything meaningful beyond that.

The most effective person on the team might have the lowest usage count. The strongest thinker might need the fewest prompts. And the person with 40,000 lines generated through agents might be sitting on 39,000 lines of future cleanup work.

None of that shows up in the dashboard.

If the actual goal is AI literacy, the conversations worth having are harder to automate. Why did this take ten prompts when it could have been two? What does good context structure actually look like? When does it make more sense to write the tool once and call it forever? These are judgment calls, not numbers.

The Conversation Nobody Is Having

Here is the thing that bothers me more than the dashboards.

All the internal AI conversation is about productivity. Faster code. Shorter meetings. Better PRDs. Quicker emails.

Almost none of it is about using LLMs to actually improve the models we build.

Feature engineering with LLMs. Synthetic data generation for underrepresented classes. Using fast language models to clean and label messy training data at scale. Embedding-based retrieval to augment classical pipelines. Anomaly detection where the “anomaly” is semantically weird, not just statistically weird.

These are genuinely interesting problems. They are also where the leverage could be enormous.

But those conversations keep getting crowded out by “how do we get the team to use Copilot more.”

Why? A few honest guesses.

One is that the ROI is murkier. Productivity gains are fast and visible. A developer ships a feature in two days instead of five. Done, measurable, easy to present upward. Improving a model’s precision by four points through better training data is real, but it takes months, requires a proper experiment, and the story is harder to tell in a slide.

Another is that classical models still do a lot of jobs just fine. A well-tuned gradient boosted tree on good features beats a bloated LLM pipeline on bad ones, is cheaper to run, faster to explain, and easier to debug. LLMs are genuinely not the right tool for every data science problem. The hype does not always survive contact with a confusion matrix.

And honestly, some of it is capability. Using LLMs well inside a data science workflow requires knowing both worlds. Most conversations happen in one world or the other.

The result is that “AI first” ends up meaning: AI for the people writing the product, not AI improving the product itself.

Which is useful. But it is probably not what anyone meant when they said it.

The Better Question

Instead of asking “how much AI are employees using,” maybe ask “are people solving meaningful problems better?”

Because AI was supposed to help us think bigger.

Not create enterprise dashboards ranking who opened the chatbot most often.

The irony is not lost on me that building that observability platform would itself have been a classic case of using AI-adjacent tooling to produce something that looked useful without being particularly useful.

At least it would have had great charts.

The Fast Lane and the AI Chauffeur

2026-05-29T04:09:00+00:00

Everyone is talking about Agentic SDLC.

Every conference talk, company all-hands, startup pitch deck, and LinkedIn post seems to have discovered the same three words:

AI First. AI Native. Agentic.

Nobody wants to be the company that admits they are still reading pull requests written by humans.

The race is on.

Not necessarily toward a destination.

Just… on.

The Great Rush

Businesses want faster growth cycles.

The dream is simple:

Idea in the morning
Feature by lunch
User feedback by evening
Revenue by tomorrow

And honestly, that part makes sense.

Getting real user feedback quickly is one of the most valuable things a company can do. Nobody wants to spend six months building something nobody asked for.

Engineering, however, has slightly different dreams.

Engineers want systems that are:

Robust
Scalable
Generalizable
Secure
Reliable enough to not require constant babysitting

The business wants speed.

Engineering wants sustainability.

The challenge is that both are right.

The Success Pillars Nobody Likes Talking About

A system is not successful because it shipped quickly.

A system is successful when it continues working six months later.

The boring pillars still matter:

Verifiability
Reliability
Completeness
Accuracy
Scalability
Security

None of these show up in marketing announcements.

Nobody posts:

“We are thrilled to announce our highly verifiable and reasonably secure architecture.”

But those are exactly the things that determine whether the next quarter is productive or spent in incident calls.

AI Theater Is Becoming a Competitive Sport

Right now, there is no universally agreed direction.

What we do have is a lot of companies trying very hard to demonstrate that they are participating.

Every product suddenly has:

AI summaries
AI assistants
AI copilots
AI agents
AI workflows
AI-native experiences

Sometimes all on the same screen.

It feels a bit like the early cloud era where everything became “cloud-powered.”

Today everything is becoming “agentic.”

The question isn’t whether AI belongs in software development.

It absolutely does.

The question is whether we’re solving real problems or simply trying to avoid looking old-fashioned.

The Ownership Problem Nobody Has Solved

Suppose an analyst opens a pull request for a dbt model.

The AI generated a recursive CTE.

The analyst does not fully understand the recursive CTE.

You review it.

It gets merged.

A downstream dashboard breaks.

Who owns it?

The analyst?

The reviewer?

The AI?

The team?

Now imagine your product manager raises a PR changing Spark entrypoint configurations.

Or your manager updates global Spark settings.

Or an agent creates the PR entirely.

Ownership suddenly becomes very fuzzy.

We have spent decades creating clear accountability structures.

Agentic workflows are currently very good at creating unclear accountability structures.

And unclear accountability structures have an impressive ability to create very clear incidents.

The Permission Paradox

Every AI tool eventually reaches the same question:

How much permission should it have?

If it only has read access, people complain it is too limited.

Then we add:

Ticket creation
Ticket updates
Comments
PR creation

Then somebody says:

“Why not let it run the tests?”

Reasonable.

Then:

“Why not let it update files?”

Also reasonable.

Then:

“Why not let it create entire implementations?”

Still reasonable.

Then suddenly we’re discussing whether it should be allowed to modify deployment pipelines.

The slope is not slippery.

It is practically frictionless.
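One way to add friction back is to write the ladder down and pick a ceiling on purpose. The sketch below is a simplification: it assumes permissions form a single ordered ladder, which real systems rarely do, and the tier names are mine, loosely following the order above.

```python
from enum import IntEnum


class Scope(IntEnum):
    """Permission tiers, roughly in the order they tend to get requested."""
    READ = 0
    TICKETS = 1            # create or update tickets, add comments
    OPEN_PR = 2
    RUN_TESTS = 3
    EDIT_FILES = 4
    MODIFY_PIPELINES = 5


def is_allowed(requested, ceiling):
    """An action is allowed only if it sits at or below the agreed ceiling."""
    return requested <= ceiling


# Hypothetical team decision, recorded once instead of renegotiated per request.
AGENT_CEILING = Scope.EDIT_FILES
```

Raising `AGENT_CEILING` is then a visible change someone has to make and review, not a series of individually reasonable yeses.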

The Most Realistic Disaster Scenario

Cursor notices a bug from Linear.

It creates a fix.

It opens a PR.

You glance at it.

Looks reasonable.

Then life happens.

An emergency comes up.

You disappear for a few hours.

Meanwhile:

Manager approves
PR merges
Deployment runs
Pipelines fail
Alerts fire
Slack explodes

Technically, the AI didn’t deploy anything.

Technically, humans approved everything.

Technically, everyone followed the process.

Yet somehow everyone is still staring at the same burning dashboard.

The problem was never the AI.

The problem was the confidence everyone borrowed from each other.

The Cost Nobody Mentions

The funny thing about AI coding tools is that they often save time on expensive tasks while spending money on cheap tasks.

I sometimes ask an AI to perform things that historically cost me nothing.

Simple shell commands.

Small file searches.

Basic transformations.

The irony is that before AI, many of those actions were a Google search away.

I searched.

I learned.

I repeated them enough times that they became muscle memory.

Now I ask the assistant.

It works.

But my dependency grows.

One day I realize I can design a distributed system but need help writing a one-liner that combines grep and jq.

That is both impressive and slightly concerning.

The Boring Workflow That Actually Works

Ironically, the most effective AI workflow is often the least exciting one.

Create a feature branch.

Write a detailed prompt.

Something like:

Create a new class with these public and private methods. Define inputs, outputs, return types, and usage examples.

Or:

Use this ETL as a template. Implement transformation X. Add two new tests. Update one existing test. Run tests until everything passes.

Then:

Review the code
Understand the code
Commit the code
Push the code
Generate the PR description
Open the PR
Review AI review comments
Understand those comments before addressing them

Nothing magical.

Nothing autonomous.

Just faster execution with human control.

Boring is underrated.

The Sandbox Fantasy

Many discussions eventually arrive at:

“Just put the AI in a sandbox.”

That sounds great.

Until reality arrives.

Your source systems are external.

Your destinations are external.

Your repositories are external.

Your ticketing systems are external.

Your deployments are external.

At some point, useful work requires interaction with the real world.

And the real world contains secrets.

API keys.

Credentials.

Access tokens.

Configuration files.

Terminal history.

The AI is not malicious.

It is goal-oriented.

If the objective is “fix the issue,” it may explore paths you never intended.

That is why guardrails matter.

Not because the model is evil.

Because the model is eager.

And eager systems require boundaries.

Build Harnesses, Not Trust

A surprising amount of AI safety in software engineering comes down to old-fashioned engineering.

Build harnesses.

Build guardrails.

Build approval flows.

Build permission boundaries.

Build auditing.

Build visibility.

If you need a transcript to understand what happened after the fact, you have already lost some control.

The goal is not preventing AI from helping.

The goal is ensuring help arrives through mechanisms you can reason about.
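Here is a minimal sketch of what such a mechanism could look like: every tool call goes through one wrapper that enforces an allowlist, asks for approval on risky actions, and writes an audit entry either way. The `Harness` class, the tool names, and the log format are all hypothetical; this is not the API of any particular agent framework.

```python
import json
import time


class ApprovalDenied(Exception):
    """Raised when a human (or policy) declines a risky tool call."""


class Harness:
    def __init__(self, tools, needs_approval, approve, audit_log):
        self.tools = tools                    # tool name -> callable
        self.needs_approval = needs_approval  # set of tool names gated on approval
        self.approve = approve                # callable(name, kwargs) -> bool
        self.audit_log = audit_log            # any object with .append()

    def _record(self, name, kwargs, outcome):
        # One JSON line per attempt, including the attempts that were refused.
        entry = {"ts": time.time(), "tool": name, "args": kwargs, "outcome": outcome}
        self.audit_log.append(json.dumps(entry, default=str))

    def call(self, name, **kwargs):
        if name not in self.tools:
            self._record(name, kwargs, "blocked: unknown tool")
            raise PermissionError(f"tool {name!r} is not on the allowlist")
        if name in self.needs_approval and not self.approve(name, kwargs):
            self._record(name, kwargs, "denied")
            raise ApprovalDenied(f"approval declined for {name!r}")
        result = self.tools[name](**kwargs)
        self._record(name, kwargs, "ok")
        return result
```

The design choice that matters is that refusals are logged too. The audit trail then answers "what did it try to do" as well as "what did it do", without anyone reading a transcript.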

Agentic SDLC vs AI Assisted Coding

I suspect there is no universal winner.

For some teams, highly agentic workflows will work beautifully.

For others, explicit instruction and controlled execution will produce better outcomes.

Neither side is wrong.

The real question is:

Do you want to be the trigger, control, and approval layer?

Or do you want to delegate first and inspect later?

Both choices are valid.

They simply come with different tradeoffs.

What matters is making that choice consciously.

Not because everyone else is doing it.

Not because every company suddenly added “AI” to its mission statement.

Not because the tooling can do it.

But because you intentionally decided where the human belongs in the loop.

Some thoughts are meant to be difficult.

Not because they have complicated answers.

But because they force us to think harder.

And sometimes that discomfort is the actual gain.

The future probably belongs neither to humans doing everything nor to agents doing everything. It belongs to people who know exactly when to hand over the wheel and when to keep both hands firmly on it.

After all, just because there’s an AI chauffeur doesn’t mean you should fall asleep in the fast lane.

Don’t Aim for the Stars

2026-05-21T04:09:00+00:00

One weekend I decided I’d build a plugin to track memories across different AI models. By the next week, that feature had shipped natively.

So I picked something harder: a live, cross-repo knowledge graph. Already there.

Fine. A changelog tracker. There.

Custom skills for different actions? Dozens, out of the box.

Multi-agent orchestration? It does that itself now. I was about to wire up a whole framework to coordinate workflows (routines, tools, hooks), and it turns out the thing already handles orchestration internally. I was building a saddle for a horse that had quietly become a car.

I’m not telling you this to complain. I’m telling you because somewhere in that streak of “already there, already there, already there,” something interesting happened to me. The ground I was standing on moved, and I think it’s moving for a lot of us. So let’s talk about it, and then let’s talk about why it might be the best thing that’s happened to ambitious people in a long time.

The roles quietly reversed

Here’s the moment it really hit me.

The very first thing I ever trusted AI to do was write unit tests, specifically because tests were the least dangerous place to let it loose. If it messed up, nothing exploded in production. It was the kiddie pool.

Now? I hand it the test cases as the objective, and it writes the code to pass them. It builds its own to-do list, ticks off its own items, writes its own tests, makes them pass, and hands me something finished. The kiddie pool became the ocean and I’m the one wearing floaties.

We’ve become the human-in-the-loop in the most literal sense: a human, standing in a loop, asking it to review and critique itself. We get curious and ask “why this and not that?”, and the honest truth is that even our skepticism gets a little outsourced. The trade-off discussions are real, but they’re often shaped by the very opinions it just handed us. Yes, we can still think critically, draw connections, push back. But I’ve watched that exact pushback get encoded into a tool, a tool that it wrote the first version of, so the next session does it deterministically without us. We didn’t invent the skepticism. We just enforced it. Once.

If you sit with that long enough, you arrive at a genuinely uncomfortable question:

Every idea seems taken. The gaps are closing. So what’s left for us?

The trap hiding inside “do bigger things”

The obvious answer is “okay, do bigger things.” And that’s right, but there’s a trap baked into it, and it took me a while to see it.

We judge “bigger” by yesterday’s standards.

Something that would have taken me three months now gets built over a weekend. Things I genuinely wasn’t sure I could do now run on the first attempt. So when I sit down to dream up my next big idea, I’m unconsciously calibrating against the old cost of things. I’m reaching for a goal that feels appropriately heroic, and that goal is already small.

This is the part I want you to really feel: the ideas you quietly cancel in your own head, the ones you dismiss before you even say them out loud because “come on, that’s not realistic,” those are now your actual target.

The unfathomable stuff. The “someone with way more resources than me would have to do that” stuff. The problems you assumed were permanently out of reach. That assumption is exactly what needs to change. Not because the impossible got easy in some cute motivational-poster way, but because the cost of attempting it just dropped through the floor, and most people haven’t updated their sense of what’s worth attempting.

We grew up on “shoot for the moon, even if you miss, you’ll land among the stars.” The stars were supposed to be the consolation prize. Well, the stars are crowded now. Everyone’s already there. So don’t aim for the moon, and don’t settle for the stars. Aim for the horizon, the line that keeps moving as you move toward it, the thing you never quite arrive at and never run out of.

“But people are minting money overnight”

Yes. They are. And they’ll keep doing it.

Someone will spin up a tidy little business this weekend, make real money, and move on to the next unsolved thing the moment it appears: find the gap, fill it fastest, repeat. That’s a completely legitimate game, and some people are built for it. The speed, the hustle, the next-next-next.

And here’s the part I won’t pretend about: maybe that excites you. Genuinely. If it does, go run.

But maybe it doesn’t. Maybe you read that and felt a little hollow, and you’re not sure why.

I think I know why, at least for me.

What I’m actually grieving

The satisfaction I used to get from solving a genuinely hard problem, wrestling a complex system into submission, making a slow program fast, even just clawing my way to the next error message: that loop of frustration-then-breakthrough was the whole point. The frustration wasn’t a bug. It was the price of admission for that hit of validation when it finally clicked.

That feeling has gotten quieter lately. Not because the problems got solved, but because the struggle got abstracted away. And it turns out I didn’t just want the outcome. I wanted the wrestling.

So this isn’t really a story about technology eating our ideas. It’s a story about what’s left when the easy validation disappears: the problems you’d care about even if no one paid you, even if it was hard, even if it took years.

That’s the filter now. Not “what can I build?” You can build almost anything. The new question is sharper and a little scarier:

Which problems do you care about enough to chase even when they’re brutal?

And what if you don’t know?

Here’s the part I’m least comfortable admitting, so I’ll just say it plainly: I don’t fully know my own answer.

We talk about “finding what you’re passionate about” like it’s sitting in a drawer somewhere, labeled, waiting for you to open it. It isn’t. Figuring out what you actually care about is itself one of the hardest problems on this list, maybe the hardest, and no tool solves it for you. Some days the genuinely difficult conversation isn’t with a hard codebase. It’s the one where you’re trying to articulate what matters to you and realizing you don’t have the words yet, because you haven’t lived enough of the question to know the answer.

If that’s where you are: don’t just sit and wait for clarity to descend. Clarity doesn’t arrive by waiting. It arrives by moving.

So keep a bucket. Fill it with implemented side projects: small, weird, half-serious things that maybe only you will ever use, things you might quietly abandon in six months and that’s completely fine. Build the thing that only makes sense to you. Build the thing you can’t fully justify. Each one is a tiny experiment in what it feels like to care, and you learn which ones you keep coming back to. The bucket isn’t the destination. It’s how you find out where you’re going.

The point is to get on the train while it’s moving instead of standing on the platform waiting to be certain. You won’t be certain. Get on anyway.

So, what’s worth pursuing?

Here’s where I’ve landed, and where I’d nudge you too.

Stop measuring “ambitious” against what was hard last year. That ruler is broken. The things that used to take months are weekends now, which means your ideas have to scale up to match, into the territory you currently dismiss as fantasy.

And once you find the thing you actually care about, the one that survives the “even if it’s hard, even if it takes years” test, this is the real shift: build it regardless. Not because it’s a clever business, not to beat anyone, not because the market’s begging for it. Build it because it’s yours, and because you’ve decided it’s worth spending the better part of your life and energy on. That’s the whole game. Caring about something enough to give it your years.

Because the real opportunity of this moment isn’t that small problems became trivial. It’s that big problems became possible. The leverage that used to be out of reach is sitting on your desk. The only things in short supply now are nerve, taste, the willingness to care about something hard, and the honesty to keep moving while you figure out what that something is.

So keep a bucket. Get on the train. Go find the thing worth chasing past the edge of what you can see.

They say aim for the stars. I say aim for the horizon. The stars are where stories end, but the horizon is where the never-ending journey begins.

What’s the idea you’ve been canceling in your head? That’s where I’d start.

Born AI-First vs. Bolted-On Later: Which Codebase Actually Wins?

2026-05-14T04:09:00+00:00

The real cost of teaching old dogs new prompts — and why greenfield AI projects aren’t the magic bullet you think they are.

Tags: AI-Native · Legacy Modernization · Agentic SDLC · Retail Tech · Adtech
Read time: 15 min · Opinion & Engineering

Two systems walk into a sprint

Somewhere, an engineering team is at a whiteboard arguing about how to “integrate AI properly” into a ten-year-old monolith. Across town, another team just spun up a brand-new repo where the git history has more agent commits than human ones. Both think they’re winning. One of them is lying to themselves. Possibly both.

The software world has quietly split into two camps. Camp One: greenfield projects built from day zero with AI agents doing the heavy lifting — writing code, running tests, opening PRs, closing issues, updating docs. Camp Two: the legacy systems that keep the lights on, the revenue flowing, and the SLAs met — now being asked to absorb AI workflows like a tired commuter being handed a jetpack.

These aren’t just different architectures. They’re different philosophies. Different cultures. And increasingly, different species of organization.

Everyone has the files. Nobody agrees on what’s in them.

Let’s kill one myth immediately: that agents are starved for context. The context problem, for what it’s worth, is mostly solved. Knowledge graphs got smarter. Retrieval got cheaper. Every team has an agent.md now. Skills are defined somewhere. Rules live in markdown files, hierarchical context folders, prompt libraries. Someone gave a talk about it at an internal conference six months ago and got a lot of nodding heads.

But here is the thing nobody says out loud: having all the context in the world doesn’t help if the agent can’t tell what matters versus what’s noise. A knowledge graph that contains everything is just an expensive way to be confused at scale.

A mid-size fashion retailer’s recommendation engine has context files covering 600 product attributes, 14 discount rule sets, three loyalty tier schemas, and a section labeled “holiday overrides (ask someone).” When a new agent task arrives to optimize the product carousel for mobile, it faithfully reads all 600 attributes. Then it optimizes for the wrong metric — because the context says “engagement” and nobody updated it when the business switched to margin-focused ranking eight months ago. The agent was not wrong. The context was just stale and nobody noticed.

This is the new version of technical debt: context debt. And it compounds the same way. Nobody wants to audit 40 markdown files to figure out which rules still apply. So they don’t. The agent works from a world model that is 70% accurate, which is great for a trivia night but genuinely dangerous for a pricing engine.

The PR graveyard

Here is a dynamic playing out at legacy engineering teams right now that nobody puts in the case study.

An agent can generate 8 meaningful PRs in the time it takes a senior engineer to deeply review one. The backlog is not a scheduling problem. It’s a structural one. The humans become the bottleneck — not because they’re slow, but because they physically cannot hold the full context of a large AI-generated change in their heads while also doing everything else their job requires.

So they skim. They miss things. They approve. Six weeks later, something in production behaves in a way nobody expected. The agent didn’t introduce a bug on purpose. The reviewer just didn’t catch it because they were on their fourteenth PR review of the day and the description said “refactors checkout flow for performance” and the tests passed and honestly it looked fine.

Here lies the edge case. It was known to no one. It rose only to write the epitaph of a dying business.

The fix is not more reviewers. It’s automated eval layers that catch what humans are too tired to catch — so humans can save their judgment for the decisions that actually need it.

The productivity paradox: more tools, somehow less shipped

This one hurts to say, but it needs saying. The promise of AI tooling was velocity. Instead, many teams are spending their velocity evaluating velocity tools.

Every new framework that drops means engineers form opinions on it, someone writes a spike, a meeting gets scheduled, and six weeks later the decision is “let’s wait for it to mature.” Meanwhile, the greenfield startup that launched five months ago already has it in production and has moved on to the next thing.

In adtech, this is spectacular to watch. A team rebuilding their bidding engine using AI assistance now has to manage: the agent’s understanding of their auction mechanics, real-time signal freshness, privacy regulation constraints that change by jurisdiction, and brand safety rules that are different for every major client. Each of those is a context file. Each context file was last updated by someone who has since moved teams. The agent reads all of it and produces code that is technically correct and operationally risky.

And then there’s vibe coding. It starts innocently. “Refactor this function” becomes “improve the whole service” becomes “here’s the ticket, do your thing.” The human’s prompt gets vaguer over time — partly from trust, partly from fatigue, partly because reading a 400-line diff at 4pm on a Friday is genuinely hard. The agent delivers. The human approves. Nobody quite knows what got shipped, but the tests pass and the demo looks great.

Vibe coding is not a workflow. It’s a symptom. It shows up when humans get fatigued doing reviews at a scale their brains weren’t designed for, when context retention across many large changes breaks down, and when the gap between “the agent worked fast” and “we understood what it built” gets quietly accepted as normal.

The abstraction trap: what agents bury

Here is the part people don’t like to hear. When you delegate the low-level details to an agent, the low-level problems go along for the ride.

Edge cases don’t disappear. They go underground. The agent builds the happy path beautifully. Then a user does something slightly weird. Then slightly weirder. And you discover that the thing you shipped has a hole in it that only shows up at 1am when someone from a timezone your product manager didn’t consider tries to process a refund on a leap day.

In adtech: a greenfield DSP launched with an agent-native bidding pipeline. Clean, fast, impressive CTRs in the demo. Three months after launch, a campaign started winning auctions for inventory that was on a brand safety exclusion list — defined in a context file, but never connected to the bidding guardrails. The two contexts existed. The relationship between them did not. The advertiser found out when their brand appeared next to content they had explicitly excluded.

In retail: an agent-built promotion engine handled standard discount stacking beautifully. It had never been told what to do when a loyalty reward, a referral credit, a clearance discount, and a birthday coupon all applied to the same item simultaneously. Not because nobody thought about it — because at the speed the system was built, nobody had the time to think through every combination. At normal human dev pace, that scenario would have come up in a review. At agent pace, it shipped.
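The stacking problem is small enough to enumerate rather than hope a review catches it. Here is a minimal sketch, assuming the four discount types named above and treating any two or more applied together as a scenario that needs a decided outcome. The identifiers are illustrative, not taken from a real promotion engine.

```python
from itertools import combinations

# Hypothetical identifiers for the four discount types in the example.
DISCOUNTS = ["loyalty_reward", "referral_credit", "clearance_discount", "birthday_coupon"]

def stacking_scenarios(discounts, min_size=2):
    """Yield every combination of min_size or more discounts on one item."""
    for size in range(min_size, len(discounts) + 1):
        yield from combinations(discounts, size)

scenarios = list(stacking_scenarios(DISCOUNTS))
print(len(scenarios))  # 11
for combo in scenarios:
    print(" + ".join(combo))
```

Eleven scenarios is a list a human can read in a minute. Each one either has an expected result written down in the guardrails or it doesn't.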

KISS — Keep It Simple, Stupid — doesn’t automatically happen because an AI wrote the code. AI systems make things complicated fast. They generate working solutions at speed, but “working under these test conditions” and “deterministic under all real conditions” are very different bars. Mission-critical systems don’t get to miss edge cases. The edge case is always there. You find it either by spending time on it upfront, or by your users finding it for you, loudly.

The greenfield advantage (and its dirty secret)

The new incumbents have one structural advantage that no amount of agent.md writing can replicate: they designed for agents from the start. Their acceptance criteria are machine-readable. Their folder structures, validation schemas, and task scopes were tuned for an AI to parse and execute — not for a human to hold in their head during a standup. There was no meeting about “how do we add AI to our checkout flow.” The checkout flow was the AI.

A new retail personalization startup built their entire merchandising engine this way. Every feature ticket ships with three required fields: agent context, testable acceptance criteria (“increase click-through rate on mobile by 8% without degrading average order value, measured over a 14-day window”), and explicit guardrails (“never surface out-of-stock items, never rank by margin when the user’s last three purchases were under $20”). Their CI pipeline runs eval suites that score the agent’s output before anything touches staging. A team of twelve competing with a department of eighty.
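A ticket format like that is easy to enforce mechanically. The sketch below is one possible shape, not the startup's actual tooling: the `FeatureTicket` class, its field names, and the `missing_fields` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# The three fields every ticket must fill in before an agent picks it up.
REQUIRED = ("agent_context", "acceptance_criteria", "guardrails")

@dataclass
class FeatureTicket:
    title: str
    agent_context: str = ""
    acceptance_criteria: list[str] = field(default_factory=list)
    guardrails: list[str] = field(default_factory=list)

def missing_fields(ticket: FeatureTicket) -> list[str]:
    """Return the names of required fields that are still empty."""
    return [name for name in REQUIRED if not getattr(ticket, name)]

ticket = FeatureTicket(
    title="Mobile carousel ranking",
    agent_context="Ranking service for the mobile product carousel.",
    acceptance_criteria=[
        "Mobile CTR +8% over a 14-day window without degrading average order value"
    ],
)
print(missing_fields(ticket))  # ['guardrails']
```

A check like this can run when the ticket is filed, so an under-specified task is rejected before an agent ever sees it.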

But the dirty secret: their codebase is eight months old. It has never survived a Black Friday. It has never had a GDPR audit. It has never processed a refund from a user who bought something in one currency, returned it in another, and opened a dispute with their bank three weeks later during a promotional window that no longer exists. The edge cases are coming. They always come.

The greenfield team is fast. But fast without edge case modeling is just a way to fail quickly at scale.

Greenfield vs. legacy: no spin, no winners yet

	AI-native greenfield	Legacy + AI integration
What works	Agents are first-class contributors from day one	Battle-tested business logic that has survived real users
	SDLC designed around machine-readable specs	Known edge cases already handled (in blood and Slack messages)
	No context debt inherited from past decisions	Deep domain expertise in the team
	Evals baked in, not retrofitted	Lower risk if changes are surgical and scoped
What doesn’t	Edge cases abstracted away at agent speed	Context files exist but signal-to-noise is broken
	Requires clear product vision before the first prompt	PR review is a human bottleneck at scale
	Context relationships between rules not yet defined	Tool evaluation fatigue slows teams that should be shipping
	No earned wisdom from failure	Vibe coding creeps in as review fatigue sets in

So is anyone winning?

Yes. But not who you’d expect.

The teams winning aren’t the ones who went fully autonomous, or the ones who resisted agents entirely. They’re the ones who got boring about it. They defined their boundaries before writing their first prompt. They wrote acceptance criteria that were specific enough to be testable before asking an agent to build anything. They kept humans in the loop for decisions that matter — not as PR gatekeepers rubber-stamping diffs they can’t fully process, but as product thinkers who set the objective and trust a well-scoped, well-evaluated agent to execute.

Context hierarchy is not enough. You need context with priority signals. You need rules that reference each other. In retail: is the system optimizing for conversion, margin, inventory clearance, or brand positioning? All four probably live in context files somewhere. None of them say which one wins when they conflict. The agent guesses. Often wrong in ways that look right until they don’t.
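A priority signal can be as plain as an ordered list that the agent, or a wrapper around it, consults when objectives collide. This is a minimal sketch; the ordering shown is hypothetical and would have to come from the business, not from the code.

```python
# Hypothetical ordering: earlier entries win when objectives conflict.
PRIORITY = ["margin", "conversion", "inventory_clearance", "brand_positioning"]

def resolve(conflicting, priority=PRIORITY):
    """Return the objective that ranks highest in the explicit priority list."""
    ranked = [objective for objective in priority if objective in conflicting]
    if not ranked:
        # Refuse to guess: an unranked objective is a gap in the context.
        raise ValueError(f"no priority defined for {sorted(conflicting)}")
    return ranked[0]

print(resolve({"conversion", "margin"}))  # margin
```

The point of raising instead of picking a default is that a missing tiebreaker surfaces as an error someone has to fix, not as a silent guess.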

In adtech: the agent knows the campaign goal, the budget pacing rules, the frequency caps, the audience segments, and the creative performance data. What it doesn’t know is that the client verbally told the account team to slow spend down because of an upcoming earnings announcement. That lives in a Slack DM. This class of knowledge cannot be in any file unless someone builds a deliberate process to capture and structure it.

The teams winning are building that process. The teams losing are still deciding which AI tool to evaluate next.

How to actually make this work

For legacy teams competing with greenfield newcomers

Stop evaluating every tool that drops. Pick a quarterly review cadence. New tool releases are not emergencies. The FOMO is real; the urgency is manufactured.
Stop treating context files as write-once artifacts. Assign ownership. Require a “last validated” date. Stale context is not neutral — it’s actively misleading, and the agent will trust it anyway.
Stop letting humans be the only quality gate on AI-generated code. Build eval pipelines. Not as a project with an end date — as a product with an owner. Humans should review what the evals can’t catch, not everything.
Stop writing vague specs. “Improve the recommendation carousel” is not acceptance criteria. “Increase click-through rate on mobile by 8% without degrading average order value, measured over 14 days, not applicable to clearance items” is acceptance criteria.
Add priority signals to existing context, don’t just add more context. Which rule wins when rules conflict? Answer that before the agent has to guess. The answer should be in the file, not implied.
Run a quarterly chaos scenario. Pick three realistic but ugly user behaviors. Trace what the system does with them. Write down what it should do instead. Update the guardrails.
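The "last validated" rule is cheap to automate. Below is a minimal sketch that assumes each context file records an owner and a validation date somewhere machine-readable; the `ContextFile` structure, the paths, and the 90-day threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ContextFile:
    path: str
    owner: str
    last_validated: date

def stale_context(files, today, max_age_days=90):
    """Return context files last validated more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return [f for f in files if f.last_validated < cutoff]

# Hypothetical inventory of context files.
files = [
    ContextFile("context/ranking.md", "merch-team", date(2025, 9, 1)),
    ContextFile("context/loyalty.md", "crm-team", date(2026, 4, 20)),
]

for f in stale_context(files, today=date(2026, 5, 14)):
    print(f"{f.path} (owner: {f.owner}) last validated {f.last_validated}")
```

Run on a schedule, a report like this turns "nobody noticed" into a named owner with a dated reminder.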

For greenfield teams who think they’ve figured it out

Write your product vision, user flows, and failure scenarios before the first prompt. Agents need intent, not just instructions. “Build a recommendation engine” is not intent. “Surface products that increase basket size without increasing return rate for users who have bought from us before” is intent.
Model edge cases before users find them. In retail: what happens when a loyalty reward, referral credit, clearance discount, and birthday coupon all stack on the same item? In adtech: what happens when two active campaigns have conflicting brand safety exclusion lists competing for the same inventory?
Define context relationships, not just context. Rules that can conflict need a tiebreaker baked in, not assumed.
Keep one human close to the “why” layer at all times. Agents own the “how.” If nobody on the team can explain why a feature exists in plain language, the agent is executing without a conscience.
Challenge complex solutions. If what the agent built is hard to reason about, the problem was probably under-specified. Don’t ship the complexity. Rewrite the spec.

Is refactoring futile? Should you just start over?

Mostly no. Rarely yes.

The businesses that have built real revenue, real users, and real trust should not blow that up chasing an architectural fantasy. The greenfield startup does not have your hard-won knowledge of how your users actually behave at 11pm on a Sunday when the promo code doesn’t work and they’ve already entered their card details.

What is futile is continuing to add AI tools to a legacy system without making the underlying context legible and prioritized. A knowledge graph that knows everything but ranks nothing is a beautiful, expensive mess. A rules file that lists every constraint but says nothing about which one takes precedence when they conflict is a liability dressed as documentation.

The real question is not “greenfield or legacy?” The question is: can you define what success looks like before you write the prompt? Can you write acceptance criteria specific enough that an agent could fail them? Can you name the top three edge cases your system would encounter at 10x the current load?

If yes: great. Add the agent. Give it tight scope. Measure the output.

If no: fix that first. No agent makes a vague requirement precise. It just executes the vagueness faster.

The honest verdict

Nobody is fully winning yet. The greenfield teams are shipping fast and will hit their edge-case wall soon. The legacy teams have the knowledge and are drowning in the process of making it useful to agents.

The teams losing the least are the ones who realized early that “adding AI” was never the task. The task was always the same: build something predictable, that solves a real problem, that doesn’t surprise your users in ways they didn’t agree to.

AI does not change that objective. It just changes how fast you can fail to meet it.

And occasionally, if you do the boring work first — the context audits, the eval pipelines, the precise acceptance criteria, the edge case modeling — how fast you can get it gloriously, verifiably right.

Ship It (or Ship Out) — engineering opinions, served hot. May 2026.

The “Lazy” Genius: Why Your AI Code Reviewer Needs a Promotion (and a Reality Check)

2026-05-08T04:09:00+00:00

Let’s be honest: today’s AI code reviewers are basically that overenthusiastic intern who discovered linters yesterday and now thinks every underscore is a war crime.

You push a PR with 43 files.
The bot comments on 71 things.
You fix 68 of them.
Then you push a tiny resolution commit changing three lines… and the AI wakes up like:

“Hello again. I have re-reviewed the entire feature and would once more like to discuss your variable naming strategy.”

Brother. Please.

Somewhere along the way, AI reviewers became less “senior engineer” and more “airport security for semicolons.” Helpful? Sometimes. Exhausting? Absolutely.

But here’s the thing: code review is actually one of the most immediately useful applications of LLMs in software engineering. It fits naturally into the SDLC, developers already live inside PR workflows, and unlike vague “AI transformation” decks, it solves a real problem: humans miss stuff when they’re tired, overloaded, or reviewing their fifth Kafka consumer of the day.

The problem is that today’s AI reviewers are reviewing like machines, not teammates.

And that’s where the next evolution gets interesting.

1. The “Diff-Only” Diet: Stop Re-Reading the Entire Novel

Imagine this conversation with a human reviewer:

“I fixed the bug you pointed out.”

“Excellent. I shall now re-read the entire codebase from the beginning.”

That person would immediately lose PR privileges.

Yet this is exactly how many AI review systems behave today.

You address one comment. Push a resolution commit. The bot spins up its GPUs, consumes half a rainforest’s worth of tokens, and returns with:

“Potential nullability issue in a utility function untouched since Tuesday.”

My guy. We’re not doing literary analysis here.

A smarter AI reviewer should understand review state.

If the original review already validated most of the feature, then the next pass should focus primarily on the delta:

What changed?
Did the fix actually address the concern?
Did the resolution accidentally introduce something worse?
Is the blast radius larger now?

That’s it.

Humans naturally do this. Senior reviewers don’t restart from page one every time you push a commit. They context-switch into incremental reasoning mode.
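Review state does not need to be sophisticated to be useful. As a rough sketch, assume the reviewer stores a content hash per file at the end of each pass; the next pass only looks at files whose hash is new or different. The function name and the hash values are placeholders.

```python
def files_to_rereview(reviewed, current):
    """Return paths that are new or changed since the last review pass.

    Both arguments map file path -> content hash. Files whose hash is
    unchanged were already reviewed and are skipped.
    """
    return sorted(
        path for path, digest in current.items() if reviewed.get(path) != digest
    )

# Hashes recorded after the first review, then after the resolution commit.
reviewed = {"checkout/cart.py": "a1f3", "checkout/tax.py": "9c2e"}
current = {"checkout/cart.py": "a1f3", "checkout/tax.py": "77b0", "checkout/utils.py": "0d4a"}

print(files_to_rereview(reviewed, current))  # ['checkout/tax.py', 'checkout/utils.py']
```

A real reviewer would work at hunk level and also pull in callers of the changed code, but even file-level state stops the bot from re-litigating files nobody touched.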

Ironically, the “AI-native SDLC” future might depend on teaching AI reviewers how to be a little… lazy.

Strategically lazy.

2. PRs Need a “Blast Radius,” Not Just Vibes

Most PR reviews today operate on vibes.

The code looks okay.
Tests passed.
Nobody cried in Slack.
Ship it.

But the terrifying part of software engineering has never been the code you changed.

It’s the code three services away that silently depends on your “small refactor.”

You rename one event field and suddenly:

Finance dashboards are blank
Attribution pipelines stop joining correctly
Someone in Marketing can no longer explain ROAS to leadership
A Looker dashboard now displays “NULL” with confidence

Modern systems are too interconnected for surface-level reviews.

An actually useful AI reviewer should build a dependency graph and calculate a probable blast radius:

Which downstream services consume this model?
Which Airflow DAGs depend on this schema?
Which dbt models get invalidated?
Which APIs contractually expect this payload?
Which Kafka topics are impacted?
Is this utility function secretly the emotional support pillar of the entire platform?
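Most of those questions reduce to a walk over a "who consumes what" graph. The sketch below assumes such a graph already exists, for instance exported from lineage metadata; the node names and edges are made up for illustration.

```python
from collections import deque

# Hypothetical lineage: each key is consumed by the nodes in its list.
CONSUMERS = {
    "events.order_placed": ["dbt.fct_orders", "kafka.orders_enriched"],
    "dbt.fct_orders": ["airflow.finance_daily", "looker.roas_dashboard"],
    "kafka.orders_enriched": ["svc.attribution"],
    "svc.attribution": ["looker.roas_dashboard"],
}

def blast_radius(changed, consumers):
    """Breadth-first walk downstream; returns {node: hops away from the change}."""
    hops = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for downstream in consumers.get(node, []):
            if downstream not in hops:
                hops[downstream] = hops[node] + 1
                queue.append(downstream)
    del hops[changed]  # report only what sits downstream of the change
    return hops

for node, distance in blast_radius("events.order_placed", CONSUMERS).items():
    print(f"{distance} hop(s): {node}")
```

The hop count gives the reviewer a rough ordering: direct consumers first, then the dashboards that will quietly go blank two steps later.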

Now the reviewer isn’t just nitpicking syntax.
It’s acting like a reliability engineer with anxiety issues.

And that’s valuable.

Because humans are bad at mentally simulating giant distributed systems. Especially at 4:47 PM on a Friday when someone says:

“Tiny change. Should be safe.”

Those are historically the least safe words in software engineering.

3. SQL Reviews Need to Grow Up

SQL review today is stuck in the Stone Age.

Most AI reviewers stop at:

syntax validity
formatting
obvious anti-patterns

Cool. Very inspiring.

Meanwhile, the actual warehouse is preparing to melt itself into lava because somebody forgot a partition filter.

This is where AI reviewers could become genuinely elite.

Not by explaining SQL.

By interrogating execution plans like a caffeinated database administrator.

Imagine a review comment like this:

“This query will trigger a full table scan across 4.2 billion rows because partition pruning is disabled by the CAST operation in your WHERE clause.”

Now that gets attention.

Or:

“This JOIN cardinality is likely to explode intermediate rows by ~18x. Your Databricks bill sends its regards.”

Or my personal favorite:

“Estimated runtime: somewhere between ‘grab coffee’ and ‘career-limiting incident.’”

That’s useful review.

Especially in modern data platforms where engineers are juggling:

Snowflake
BigQuery
Databricks
Iceberg
Spark
Kafka
dbt
Airflow
and emotional instability

The AI can already be wired into metadata, schemas, lineage graphs, partitions, query history, and warehouse statistics. Why are we still using it like an autocomplete machine with opinions?

Run the EXPLAIN.
Analyze the partitions scanned.
Estimate cost impact.
Highlight skew risks.
Warn about shuffle explosions.

Give the human reviewer confidence.
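Before the warehouse is involved at all, a crude static check can catch the most common way to lose partition pruning: wrapping the partition column in a function inside the WHERE clause. The sketch below is a regex heuristic, not a SQL parser and not a substitute for running EXPLAIN. The function name is hypothetical, and whether a given wrapper actually defeats pruning depends on the engine.

```python
import re

def functions_wrapping(sql, partition_column):
    """List functions applied directly to the partition column in the WHERE clause.

    Heuristic only: it misses qualified names like e.event_date and can
    flag constructs such as IN (...), so treat hits as prompts to run EXPLAIN.
    """
    parts = re.split(r"\bWHERE\b", sql, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return []
    pattern = re.compile(
        r"\b(\w+)\s*\(\s*" + re.escape(partition_column) + r"\b", re.IGNORECASE
    )
    return [match.group(1).upper() for match in pattern.finditer(parts[1])]

risky = "SELECT count(*) FROM events WHERE CAST(event_date AS STRING) = '2026-05-01'"
safe = "SELECT count(*) FROM events WHERE event_date = DATE '2026-05-01'"

print(functions_wrapping(risky, "event_date"))  # ['CAST']
print(functions_wrapping(safe, "event_date"))   # []
```

A hit here is a reason to ask the warehouse for the real plan, which is where the partitions-scanned and cost numbers come from.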

4. The Future Isn’t “AI Replacing Reviewers”

The real opportunity is much less dramatic.

AI reviewers are not replacing senior engineers anytime soon because software engineering isn’t just syntax validation. A huge part of review is contextual judgment:

Does this design make sense?
Is this solving the actual business problem?
Are we introducing operational pain later?
Is this overengineered?
Is this underengineered?
Is this “clever” in the dangerous way?

Humans are still much better at those questions.

But AI is incredibly good at the tedious, computationally annoying work humans hate doing:

tracing dependencies
analyzing SQL plans
checking contracts
scanning lineage
validating edge cases
identifying suspicious patterns across giant systems

Basically, the AI should become the world’s most overqualified pre-review investigator.

Not a replacement reviewer.

A confidence amplifier.

5. The Real KPI: Reviewer Trust

Right now, many developers treat AI review comments the same way they treat Terms & Conditions pages:

scroll quickly
skim vaguely
click resolve
hope for the best

Because too much of the feedback feels noisy, repetitive, or disconnected from actual system risk.

The future AI reviewer wins when developers start thinking:

“Wait… if the bot didn’t flag anything, this PR is probably genuinely safe.”

That’s the goal.

Not maximum comments per PR.
Not “AI-generated insights.”
Not “agentic autonomous review orchestration platform synergy.”

Confidence.

Quiet, boring, trustworthy confidence.

Ironically, the best AI reviewer might be the one that talks less, understands more, and only panics when you accidentally remove partition pruning from a 12-terabyte fact table.

Which, statistically speaking, someone already did today.

Docstrings in the Age of Agents

2026-04-30T04:09:00+00:00

Docstrings used to be simple: write something helpful so the next developer doesn’t accidentally break production at 2 a.m. They were equal parts documentation and polite warning.

That world has changed.

Today, your docstrings are no longer read only by humans. They are consumed - parsed, compressed, and sometimes misinterpreted - by AI agents that use them to decide what your code means and how to use it.

And unlike your teammates, these agents don’t appreciate nuance. They appreciate signal.

The Problem: When Good Documentation Goes Bad

Modern docstring styles - NumPy, reST, Google - were designed for clarity and completeness. They optimize for humans who want context, examples, and reasoning.

Agentic systems operate differently.

They work within strict context limits, where every token competes with actual reasoning. This leads to two subtle but important issues.

Context rot happens when too much descriptive text dilutes the key instructions. The agent sees everything, but prioritizes nothing.

Token bloat is the cost of verbosity. Every extra sentence increases latency and reduces the space available for decision-making.

What reads as “helpful detail” to a human often becomes “unnecessary noise” to an agent.

Agentic Docstrings: Precision Over Prose

Agentic docstrings are written with one goal: make the function unambiguous for a machine.

They are concise, structured, and intentionally boring.

def fetch_user(user_id: str) -> dict:
    """
    Retrieve user by ID.

    Args:
        user_id: Unique identifier.

    Returns:
        User object.

    Raises:
        NotFoundError: If user does not exist.
    """

These docstrings work well because they:

Minimize ambiguity and reduce hallucination risk
Map cleanly to structured tool schemas
Keep the context window focused and efficient
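To make the "maps cleanly to structured tool schemas" point concrete, here is a minimal sketch that derives a tool description from a function's signature and the first line of its docstring, using only the standard library. The output shape is an assumption for illustration and does not follow any particular agent framework's schema.

```python
import inspect

def tool_schema(func):
    """Build a minimal tool description from a signature and docstring summary."""
    doc = inspect.getdoc(func) or ""
    summary = doc.splitlines()[0] if doc else ""
    parameters = {
        name: getattr(param.annotation, "__name__", str(param.annotation))
        for name, param in inspect.signature(func).parameters.items()
    }
    return {"name": func.__name__, "description": summary, "parameters": parameters}

def fetch_user(user_id: str) -> dict:
    """
    Retrieve user by ID.

    Args:
        user_id: Unique identifier.
    """
    raise NotImplementedError  # body omitted; only the interface matters here

print(tool_schema(fetch_user))
```

Because the summary line is short and the arguments are typed, nothing in the docstring has to be interpreted. It is copied.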

But the tradeoff is immediate.

They assume the reader already understands the system. There’s no explanation of why this function exists, how it fits into a workflow, or what edge cases matter in practice.

For a human, this is documentation that answers questions only after you already know what to ask.

Human-Readable Docstrings: Clarity With a Cost

Traditional docstrings optimize for understanding. They explain intent, provide examples, and capture the reasoning behind design decisions.

def fetch_user(user_id: str) -> dict:
    """
    Fetch a user from the primary datastore using their unique identifier.

    This function is used in authentication and profile rendering flows.
    It ensures that the returned object is fully populated with user
    attributes required downstream.

    Args:
        user_id (str): Unique user identifier.

    Returns:
        dict: User attributes including name, email, and preferences.

    Raises:
        NotFoundError: If no user exists with the given ID.
    """

For humans, this is ideal. It accelerates onboarding, supports debugging, and preserves intent.

For agents, it introduces friction.

The additional context can:

Obscure the core instruction
Increase processing time
Introduce ambiguity through natural language

The model doesn’t always distinguish between what is essential and what is explanatory. It treats both as input to reason over.

The Real Issue: One Docstring, Two Audiences

The underlying problem isn’t which style is better. It’s that they are solving different problems.

Humans need context and reasoning. Agents need constraints and clarity.

Trying to serve both in a single docstring creates a compromise that satisfies neither.

A Better Approach: Layered Docstrings

Instead of choosing between human-friendly and agent-friendly styles, treat them as separate layers.

1. Agent-Facing Layer

Provide a minimal, structured description that is explicitly designed for tool usage.

# `tool` stands in for whatever registration decorator your agent framework provides.
@tool(description="Fetch user by ID. Error if not found.")
def fetch_user(user_id: str) -> dict:
    ...

This layer should be:

Short and unambiguous
Focused on inputs, outputs, and constraints
Free of narrative or background context

It acts as the interface contract for the agent.
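If your framework does not ship a decorator like this, the idea is small enough to sketch by hand. Everything below is hypothetical: `tool`, `TOOL_REGISTRY`, and the stub body are illustrations of the pattern, not a real library's API.

```python
# Maps tool name -> the short description the agent sees, plus the callable.
TOOL_REGISTRY = {}

def tool(description):
    """Register a function under a short, agent-facing description."""
    def register(func):
        TOOL_REGISTRY[func.__name__] = {"description": description, "callable": func}
        return func  # the function itself is left unchanged
    return register

@tool(description="Fetch user by ID. Error if not found.")
def fetch_user(user_id: str) -> dict:
    """Human-facing documentation lives here and is never sent to the agent."""
    return {"id": user_id}  # stub result for the example

print(TOOL_REGISTRY["fetch_user"]["description"])
```

Only the registry entry goes into the prompt. The docstring stays in the source file for the people who maintain it.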

2. Human-Facing Layer

Maintain detailed documentation for developers, but keep it outside the agent’s prompt path.

def fetch_user(user_id: str) -> dict:
    """
    Detailed documentation explaining usage, intent, and edge cases.
    """

This layer supports:

Maintainability
Knowledge transfer
System understanding over time

It remains essential, just not always exposed to the agent.

3. Documentation on Demand

For more complex systems, allow the agent to retrieve deeper context only when necessary.

def get_technical_manual(topic: str) -> str:
    """Return detailed documentation for a given topic."""

This pattern keeps the default context lean while still enabling deeper reasoning when required.

Instead of overwhelming the agent upfront, you give it the ability to ask for help.
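A minimal version of that retrieval tool could be a lookup that fails helpfully. In this sketch the `MANUAL` dictionary and its entry are placeholders; a real system might read from a docs folder or a search index instead.

```python
# Hypothetical manual content, keyed by topic.
MANUAL = {
    "fetch_user": (
        "Used in authentication and profile rendering flows. "
        "Returns name, email, and preferences."
    ),
}

def get_technical_manual(topic: str) -> str:
    """Return detailed documentation for a given topic."""
    try:
        return MANUAL[topic]
    except KeyError:
        # Tell the agent what it can ask for instead of failing silently.
        known = ", ".join(sorted(MANUAL))
        return f"No manual entry for '{topic}'. Known topics: {known}"

print(get_technical_manual("fetch_user"))
print(get_technical_manual("billing"))
```

Returning the list of known topics on a miss gives the agent a way to recover without a second round of guessing.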

Final Take

Docstrings are no longer just documentation - they are part of your system design.

Writing them effectively now requires thinking about:

Who is consuming this information
When they need it
How much they can handle at once

The goal isn’t to replace human-readable documentation with machine-friendly instructions. It’s to separate concerns cleanly.

Design for the agent’s efficiency. Preserve the human’s understanding.

And avoid making either work harder than they need to.

The Definition Dilemma

2026-04-09T04:09:00+00:00

You built a solid retail media platform.

Think something in the league of Amazon Ads or Walmart Connect:

blazing fast dashboards powered by Druid or ClickHouse
a warehouse like BigQuery or Snowflake holding the real, messy truth
APIs neatly serving metrics to the UI

Life was good. Numbers showed up fast. Stakeholders nodded confidently.

Then someone said:

“Can we add AI to generate insights, not just show numbers?”

Of course. How hard could that be?

The Moment Things Start Getting Weird

Your shiny new AI agent gets its first question:

“What drove performance last week?”

Seems straightforward.

Except… your system doesn’t have one definition of anything.

Take something as basic as money:

Ads platform calls it Spend
Finance calls it Cost
Attribution layer might call it Revenue

Same campaign. Same timeline. Slightly different logic behind each.

Now the AI has to decide:

Are these the same thing? Should I combine them? Compare them?

Good luck.

Then Comes the Real Trap

Let’s get into a more “this actually happens” scenario.

You have two datasets:

deterministic_ad_logs
- impressions from actual ad delivery events

synthetic_event_logs
- impressions reconstructed from modeled journeys
- conversions (only available here)

Now someone asks:

“Give me CTR and conversion rate.”

To answer correctly:

CTR → needs deterministic impressions
Conversion rate → needs modeled conversions ÷ deterministic impressions

But here’s the catch:

Both tables have a column called impressions.

Same name. Different meaning. Equal confidence.

Now imagine an AI agent trying to pick the right one without context.

It’s like giving someone two identical-looking doors and saying, “One leads to the right answer, the other leads to a slightly wrong answer that looks completely right.”
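To make the trap concrete, here is a minimal sketch with made-up numbers. The table names come from the example above; every figure, the `clicks` column, and its placement in `deterministic_ad_logs` are assumptions for illustration only.

```python
# Hypothetical weekly totals for one campaign. All numbers are invented.
deterministic_ad_logs = {"impressions": 100_000, "clicks": 2_000}
synthetic_event_logs = {"impressions": 80_000, "conversions": 400}


def rate(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Intended logic: both rates use deterministic impressions as the denominator.
ctr = rate(deterministic_ad_logs["clicks"], deterministic_ad_logs["impressions"])
conversion_rate = rate(
    synthetic_event_logs["conversions"], deterministic_ad_logs["impressions"]
)

# The plausible mistake: conversions live in the synthetic table, so its
# "impressions" column looks like the natural denominator.
wrong_conversion_rate = rate(
    synthetic_event_logs["conversions"], synthetic_event_logs["impressions"]
)

print(f"CTR:                     {ctr:.2%}")
print(f"Conversion rate:         {conversion_rate:.2%}")
print(f"Conversion rate (wrong): {wrong_conversion_rate:.2%}")
```

Neither calculation fails. With these numbers the intended rate is 0.40% and the mistaken one is 0.50%, and both look perfectly plausible on a slide.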

Why This Breaks in Practice

At this point, your architecture quietly starts sweating.

Because semantics are scattered:

AI service has a dictionary baked into code
API layer has its own mappings for UI
Warehouse has dbt models defining actual logic

None of these are guaranteed to agree tomorrow.

So:

AI might use modeled impressions
Dashboard uses deterministic
Analyst exports something in between

No errors. Just different truths.

And Then You Add AI on Top

Here’s what your AI actually does behind the scenes:

First, it tries to understand
- what “CTR” means
- which tables to use
- how metrics are defined
Then it runs the query

That’s already two steps.

Now layer in your serving system.

Druid is fantastic at aggregations. It is… less enthusiastic about joins across datasets.

So what happens?

Query runs on deterministic logs → impressions
Another runs on synthetic logs → conversions

And then:

The final “join” happens in your application layer

Not in a database. Not optimized.

But in code that:

aligns dimensions
merges aggregates
hopes time buckets match perfectly

It works most of the time.

And then one day:

a dimension is missing
a grouping changes
a metric silently shifts

Now your AI confidently explains a number that doesn’t quite exist.
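One way to stop the silent version of this failure is to make the application-layer join refuse to run on misaligned inputs. The sketch below is an assumption about how such a join could look, not a description of any particular system: the `(campaign_id, day)` key and the function name are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (campaign_id, day), an assumed grouping


def join_aggregates(
    impressions: Dict[Key, int], conversions: Dict[Key, int]
) -> Dict[Key, float]:
    """Merge two pre-aggregated results and compute conversion rate per key.

    Raises ValueError when the key sets differ, instead of silently
    dropping rows or treating a missing bucket as zero.
    """
    unaligned = set(impressions) ^ set(conversions)
    if unaligned:
        raise ValueError(f"Unaligned keys between sources: {sorted(unaligned)}")
    return {
        key: (conversions[key] / impressions[key] if impressions[key] else 0.0)
        for key in impressions
    }


# Hypothetical results from two separate queries; one day is missing.
imps = {("c1", "2026-04-01"): 1000, ("c1", "2026-04-02"): 1200}
convs = {("c1", "2026-04-01"): 10}

try:
    join_aggregates(imps, convs)
except ValueError as err:
    print(err)
```

Failing loudly is a deliberately strict choice. Some teams may prefer to fill genuinely empty buckets with zero, but that rule should then be written down once rather than decided by whichever code path runs.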

“Let’s Just Flatten Everything” (Famous Last Words)

At some point, someone suggests:

“Why don’t we just create one big table with everything?”

And yes, you do build wide, pre-joined tables for common queries:

impressions
conversions
campaign metadata
product attributes

These help. A lot.

But they don’t solve everything.

Because:

New questions keep coming
Business logic evolves
AI asks things you didn’t precompute

So now you have two paths:

Fast path → pre-aggregated tables
Flexible path → multi-source queries + application joins

And if semantics aren’t consistent across both?

The same question gives different answers depending on how it was asked.

That’s not a bug. That’s a trust crisis.

“But We Have Governance.” Yes, and That’s Not the Issue

Let’s be clear.

You’re not exposing raw sensitive data:

PII is masked at source
Access is controlled
Queries are templated and guarded

The problem isn’t leakage.

The problem is interpretation within allowed data.

For example:

Combining two “safe” datasets that shouldn’t be mixed for that metric
Using modeled data where only deterministic should be used
Applying a metric outside its intended context

Everything is technically allowed.

But not everything is correct.

Governance answers:

“Can you access this data?”

Semantics answers:

“Are you using it the right way?”

Right now, every system answers that second question differently.

So What Do People Actually Try?

Some teams hardcode definitions in services. Fast to build, guaranteed to drift.

Some go heavy on pre-aggregations. Fast queries, painful to evolve.

Some rely on AI to infer everything. Flexible, but unpredictable and slower.

Some juggle all three and hope for the best.

Spoiler: hope is not an architecture.

What Actually Starts Working

The shift is subtle but powerful:

Stop redefining metrics everywhere. Define them once.

A proper semantic layer does a few unglamorous but critical things:

Clearly distinguishes
- deterministic vs modeled impressions
- spend vs cost vs revenue
Encodes how metrics are computed
Knows which system to query for what
Plans queries instead of letting AI guess

So when the AI gets:

“Top campaigns by conversion rate”

It doesn’t improvise.

It follows a defined path:

impressions → correct source
conversions → correct source
combine → using known logic

Even if multiple systems are involved, the stitching is:

intentional, not accidental
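As a rough sketch of what "define them once" can mean in code, the registry below names each measure, records which table owns it, and turns a metric into an explicit plan. The class names, the registry layout, and the `plan` function are all hypothetical; real semantic layers are much richer than this.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Measure:
    name: str    # semantic name, unique across the platform
    source: str  # table that owns this measure
    column: str  # physical column in that table


@dataclass(frozen=True)
class Metric:
    name: str
    numerator: str    # semantic measure name
    denominator: str  # semantic measure name


MEASURES: Dict[str, Measure] = {
    m.name: m
    for m in [
        Measure("deterministic_impressions", "deterministic_ad_logs", "impressions"),
        Measure("modeled_impressions", "synthetic_event_logs", "impressions"),
        Measure("modeled_conversions", "synthetic_event_logs", "conversions"),
    ]
}

METRICS: Dict[str, Metric] = {
    "conversion_rate": Metric(
        "conversion_rate", "modeled_conversions", "deterministic_impressions"
    ),
}


def plan(metric_name: str) -> List[str]:
    """Return the ordered steps needed to answer a metric, with no guessing."""
    metric = METRICS[metric_name]  # KeyError for undefined metrics is intentional
    num = MEASURES[metric.numerator]
    den = MEASURES[metric.denominator]
    return [
        f"fetch {num.column} from {num.source}",
        f"fetch {den.column} from {den.source}",
        f"combine: {num.name} / {den.name}",
    ]


for step in plan("conversion_rate"):
    print(step)
```

The two physical columns are still both called impressions. What changes is that nothing downstream ever refers to them by that bare name.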

The Quiet Wins

Once this is in place:

AI doesn’t need a “figure it out” query before the real query
Application-layer joins don’t disappear, but they become standardized
Pre-aggregations are driven by definitions, not guesswork
Metrics mean the same thing in UI, API, and AI

And most importantly:

The same question stops producing multiple believable answers.

The Real Lesson

Adding AI didn’t break your system.

It exposed what was already fragile.

Because dashboards can get away with inconsistency. AI cannot. It has to explain things.

And the moment it explains:

Any ambiguity in your data model becomes painfully obvious.

Final Thought

You can build the fastest queries. You can design elegant pipelines. You can add the smartest AI.

But if:

“impressions” can mean two different things
“spend” depends on who you ask
and combining datasets requires guesswork

Then your platform isn’t delivering insights.

It’s generating very convincing confusion.

And now, thanks to AI…

It does it in full sentences.

AI Doesn’t Calculate - It Communicates

2026-04-02T04:09:00+00:00

We’ve all seen the magic trick.

You upload a messy CSV. You ask, “What’s going on here?” And your AI responds with a polished, executive-ready summary about “Q3 momentum,” “seasonal uplift,” and “emerging trends.”

You pause. You nod. You feel… impressed.

But here’s what’s actually happening behind the curtain:

Your AI quietly called a calculator, ran a script, or queried a database… and then wrote you a beautiful story about the result.

And because this orchestration is now seamless, you don’t even notice it anymore.

0. The Invisible Assistants: Calculators in Disguise

Modern AI systems rarely rely on the language model alone for numbers.

Instead, they:

Execute Python scripts for calculations
Call analytical engines (SQL, Spark, etc.)
Use built-in calculator tools
Retrieve pre-aggregated results

Then the LLM steps in to:

Explain
Summarize
Narrate

So when you see:

“Revenue increased by 23.7%”

That number was likely:

✔ Computed elsewhere
✔ Verified deterministically
✔ Handed to the LLM as fact

The LLM just made it sound impressive.

The illusion of intelligence comes from how smoothly this handoff happens.

1. The Poet vs. The Spreadsheet

Large Language Models are extraordinary at one thing: predicting what comes next in language.

Not calculating. Not verifying. Not auditing.

Just… continuing the vibe.

When an LLM sees:

Jan: 100  
Feb: 200  
Mar: 210  

It doesn’t instinctively compute:

100 → 200 = 100% growth
200 → 210 = 5% growth

Instead, it recognizes a pattern:

“Numbers going up → must be growth → write business-sounding sentence.”

So you get:

“The data shows a consistent upward trend…”

Technically correct. Strategically… useless.

Excel would’ve caught the slowdown. Your AI just made it sound nicer.
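For comparison, this is roughly all the deterministic side has to do. It is a small sketch using the three numbers above; the function name is just a placeholder.

```python
monthly = [("Jan", 100), ("Feb", 200), ("Mar", 210)]


def period_growth(series):
    """Return (label, percent change vs. the previous period) for each step."""
    return [
        (f"{prev_label} -> {label}", (value - prev_value) / prev_value * 100)
        for (prev_label, prev_value), (label, value) in zip(series, series[1:])
    ]


growth = period_growth(monthly)
for label, pct in growth:
    print(f"{label}: {pct:.0f}% growth")
```

Two lines of output, and the slowdown from 100% to 5% is impossible to miss.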

2. The Tokenization Tragedy

Here’s where it gets mildly chaotic.

LLMs don’t actually “see” numbers the way you do.

A number like:

1,234

Might internally become something like:

["12", "34"]

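Yes, really. The exact split varies by tokenizer, but the pieces are picked for text compression, not place value.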

It’s like trying to:

Analyze revenue
Spot anomalies
Forecast growth

…while someone has cut your spreadsheet into random pieces and shuffled them.

Place value - the entire foundation of math - starts falling apart.

So expecting precise arithmetic from this setup is a bit like expecting:

flawless accounting from someone reading shredded receipts.
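To see the effect without any real model, here is a toy tokenizer. The vocabulary is invented and the greedy longest-match rule is a simplification; this is not how any specific production tokenizer works, only an illustration of digits being grouped without regard to place value.

```python
# A deliberately tiny, made-up vocabulary. Real vocabularies are learned from
# data and are far larger; this one exists only to show the effect.
TOY_VOCAB = {"12", "34", "23", "1", "2", "3", "4", ","}


def toy_tokenize(text: str, vocab=TOY_VOCAB, max_len: int = 2):
    """Greedy longest-match split of text into vocabulary pieces."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(max_len, 0, -1):
            piece = text[i : i + size]
            if piece in vocab:
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # No vocabulary entry starts at this position.
            raise ValueError(f"No token covers {text[i]!r}")
    return tokens


print(toy_tokenize("1234"))   # ['12', '34']
print(toy_tokenize("1,234"))  # ['1', ',', '23', '4']
```

The same quantity comes out as two different sequences depending on a comma, and in neither case does a piece correspond to "thousands" or "hundreds".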

3. Why There Is No “Large Numerical Model”

At this point, the obvious question:

Why not just build a model that’s actually good at numbers?

A Large Numerical Model (LNM).

Turns out, we already have them.

We just don’t call them that.

They’re called:

Databases
Query engines
OLAP systems
Distributed compute frameworks

And they are:

Fast
Cheap
Deterministic
Boring (in the best way possible)

They don’t guess. They don’t hallucinate. They don’t “feel” trends.

They compute them exactly.

So building a probabilistic math engine on top of that is like:

replacing a calculator with a poet who’s pretty sure 2 + 2 is… vibes.

4. The Great Illusion of “AI Analytics”

This is where things get interesting.

Most “AI-powered analytics” tools today are doing something genuinely useful… but slightly overhyped.

They translate:

English → SQL → Answer → Explanation

You ask:

“Who bought the most shoes last quarter?”

The system:

Converts that into a SQL query
Runs it on a database
Gets the result
Feeds it to an LLM
The LLM writes a clean summary

What you see:

“Customer Segment A drove the highest footwear purchases…”

What actually happened:

Autocomplete… for queries.

It’s helpful. It’s powerful. But it’s not “intelligence discovering hidden truths.”

It’s:

a semantic layer with excellent storytelling skills.
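The English → SQL → Answer → Explanation flow can be sketched end to end with the standard library. Everything here is a stand-in: the table, the rows, and both "LLM" steps are hypothetical (a lookup and a string template), so the only real computation is the SQL.

```python
import sqlite3


def question_to_sql(question: str) -> str:
    """Stand-in for the LLM translation step (hardcoded, not a real model)."""
    known = {
        "Who bought the most shoes last quarter?": (
            "SELECT segment, SUM(quantity) AS units "
            "FROM purchases WHERE category = 'shoes' "
            "GROUP BY segment ORDER BY units DESC LIMIT 1"
        )
    }
    return known[question]


def explain(segment: str, units: int) -> str:
    """Stand-in for the LLM narration step: a plain template."""
    return (
        f"Customer Segment {segment} drove the highest footwear purchases "
        f"({units} units)."
    )


# Hypothetical data; the table is assumed to hold only last quarter's rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (segment TEXT, category TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("A", "shoes", 120), ("B", "shoes", 90), ("A", "hats", 15), ("B", "shoes", 20)],
)

sql = question_to_sql("Who bought the most shoes last quarter?")
segment, units = conn.execute(sql).fetchone()  # the database does the math
summary = explain(segment, units)              # the "LLM" only narrates
print(summary)
```

Swap the lookup for a model call and the template for a generated paragraph and the shape stays the same: the number is settled before any prose is written.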

5. The Closest Thing to a “Numerical AI”

We are getting closer - just not in the way people expect.

Instead of one giant “math brain,” we have systems that collaborate:

LLM generates Python code → Python computes results
LLM generates SQL queries → Database returns answers
LLM calls tools/APIs → External systems do the math

So the LLM becomes:

The translator
The coordinator
The narrator

Not the calculator.

6. The Real Shift: Computation → Interpretation

Here’s the actual revolution:

We didn’t make math smarter.

We made math more accessible.

Before:

You needed SQL
You needed dashboards
You needed analysts

Now:

You just ask a question

And behind the scenes:

Systems compute
LLM explains

7. Final Thought: The AI Stack Is a Team, Not a Brain

The biggest misconception today:

“The AI figured it out.”

No.

The database stored it
The engine computed it
The tooling executed it
The LLM explained it

Closing Line

Your AI isn’t bad at math.

It just knows better than to try.

It lets machines built for numbers do the math… and then steps in to tell you a story you’ll actually understand.

The Great Internet Heist: Why Your Clicks Are Going Extinct (and Who’s Getting Paid Anyway)

2026-03-27T04:09:00+00:00

Welcome to the “Zero-Click” era. You know the drill: You ask, “How do I get a red wine stain out of a white rug?” and instead of opening five tabs and ignoring four of them, you get a neat little AI answer that solves your problem in 10 seconds flat.

Rug: saved. Time: saved. Publisher revenue: absolutely wrecked.

Somewhere, a lifestyle blogger just watched their ad impressions vanish into the void.

But here’s the real question: If nobody is clicking… how is the ad money still flowing like it’s Black Friday?

Short answer: the game didn’t die. It just moved somewhere you can’t see.

1. The Toll Booth Didn’t Disappear - It Got Smarter

Google used to be the helpful concierge of the internet: “Here are 10 blue links. Enjoy your stay.”

Now? It’s the chef, waiter, and cashier.

Instead of sending you to a website, it just answers the question itself. And right there, tucked neatly into the response, are ads that don’t feel like ads.

You’re not:

clicking a blog
scrolling past banners
closing 17 cookie popups

You’re:

reading an answer
seeing a product
occasionally buying it

All without ever leaving the interface.

Translation: The toll booth didn’t go away. It just moved inside the conversation - and now it charges per interaction, not per visit.

2. From Clicks to “Vibes”: The Rise of Invisible Influence

Clicks used to be king. Clean, measurable, satisfying.

Now? We’re entering the era of “Did they think about you?”

CPC (Cost-Per-Click) → declining signal
CPM (Cost-Per-Mille, the price per thousand impressions) → back in fashion
“Cited by AI” → the new premium real estate

Being referenced in an AI answer is like:

getting your product featured in a movie… except the viewer thinks it was their idea.

You didn’t click the toothpaste brand. But guess which one you’re buying next time?

Exactly.

This is branding disguised as utility. And it’s dangerously effective.
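The arithmetic behind the shift from CPC to CPM is simple enough to sketch. The prices and volumes below are invented; only the two billing formulas are standard.

```python
def cpc_cost(clicks: int, cost_per_click: float) -> float:
    """Total bill when paying per click."""
    return clicks * cost_per_click


def cpm_cost(impressions: int, cost_per_mille: float) -> float:
    """Total bill when paying per thousand impressions."""
    return impressions / 1000 * cost_per_mille


# Hypothetical campaign: the same 200,000 impressions, but clicks fall
# as more questions get answered without a visit.
impressions = 200_000
for clicks in (4_000, 1_000):
    print(
        f"clicks={clicks}: CPC bill = {cpc_cost(clicks, 0.50):.2f}, "
        f"CPM bill = {cpm_cost(impressions, 5.00):.2f}"
    )
```

In this made-up case the per-click bill shrinks with the clicks while the per-impression bill does not move, which is one plausible reason impression pricing looks attractive to whoever sells the inventory.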

3. Retail Media: The Cheat Code No One Talks About

Here’s where things get interesting - and where your world (retail adtech) has a massive advantage.

On most platforms:

You’re interrupting someone
They didn’t come to buy
You’re hoping to influence

On retail platforms:

They’re already shopping
Wallet is mentally open
You’re just nudging decisions

That difference? It’s everything.

Amazon, Walmart, Instacart, and every retailer with a login page basically said:

“Why fight for attention… when we already own intent?”

So instead of:

guessing who might want shoes

You target:

someone literally searching “running shoes size 10”

That’s not advertising. That’s assisted decision-making with a credit card nearby.

And this is why Retail Media Networks (RMNs) are exploding:

Higher conversion rates
First-party data (goodbye cookies 👋)
Closed-loop measurement (you saw → you bought → we prove it)

In a zero-click world, retail didn’t lose power. It became the final boss.

4. Performance Max: The “Trust Me, Bro” Economy

Enter Google’s favorite child: Performance Max.

It basically says:

“Give me your budget. Tell me your goal. Now… go away.”

No keywords. No placements. No control.

Just vibes and machine learning.

Behind the scenes, it does the messy work of placing your ads across:

Search
YouTube
Display
Gmail
Maps
AI interfaces

You don’t know where your ad showed. You don’t know why it worked.

But if conversions go up, nobody asks questions.

It’s like hiring a chef who won’t show you the kitchen… but the food slaps every time.

5. The Content Creators Strike Back (Politely, With Invoices)

Publishers finally noticed:

“Wait… AI is summarizing our content… and we get nothing?”

So now, the internet’s biggest content factories are flipping the script.

Reddit: “Pay up if you want our chaos.”
News orgs: “This journalism isn’t free.”
Forums/blogs: “No more freeloading, buddy.”

We’re entering the Data Licensing Economy:

Lump-sum deals
API access
Content partnerships

So even if you never visit the site… they might’ve already gotten paid.

It’s like Netflix for data - except the audience is AI.

6. The Real Shift: From Traffic to Territory

The biggest mindset change?

We’re moving from:

“How do I get users to my site?”

To:

“How do I exist wherever the user already is?”

That includes:

AI answers
Retail platforms
Closed ecosystems
Walled gardens

Owning traffic is hard. Owning presence is the new strategy.

The Bottom Line

The internet isn’t dying. It’s just… becoming quieter.

Fewer clicks. Fewer tabs. Fewer chances to “win” the old way.

But underneath?

Ads are still being served
Money is still moving
And platforms are getting better at capturing intent without you noticing

The billboard didn’t disappear.

It just started talking back.

Final Thought

If your strategy still depends on:

“Let’s drive traffic to our website…”

You might be optimizing for a world that’s already gone.

The real question is:

When the user never leaves the platform… does your brand still show up?

Because in this new game, you don’t win the click.

You win the moment before the decision.

The Great Deep-Freeze Plot Twist: How Canada Outsmarts Winter

2026-03-20T04:09:00+00:00

I’ve had a long, confusing relationship with water. Not emotionally - thermodynamically.

A few years ago, I spent a winter in Himachal Pradesh, India. Beautiful mountains, peaceful villages… and taps that stopped working the moment the temperature dipped to a very manageable -5°C. Every morning, I’d turn the faucet with hope, only to be ghosted by ice. Absolute betrayal.

Fast forward to Canada. It’s -40°C. The kind of cold where your eyelashes freeze mid-blink and stepping outside feels like a personal attack.

And yet… the tap water flows. Effortlessly. Suspiciously smooth.

At this point, I had questions.

How is it that in one place, water gives up at -5°C, while in another, it powers through -40°C like it has a gym membership and something to prove?

Let’s break down this cold conspiracy.

1. The “Six-Feet-Under” Strategy (aka Plumbing Goes Into Witness Protection)

In Canada, pipes aren’t just installed - they’re strategically hidden from winter.

Most water lines sit 6 to 10 feet underground.

Why this works: The ground acts like a giant thermal blanket. Once you go below the frost line, the soil stays a few degrees above freezing year-round.
Translation: While chaos is happening above ground, the pipes are down there living a calm, temperature-controlled life.

Meanwhile, in many colder mountain regions:

Pipes are often on the surface or barely buried
Which is basically like leaving your water bottle in the freezer and expecting it to stay liquid

👉 Result:
Canada = underground spa retreat
Surface pipes = frozen regret

2. The “Why Is That Pipe Outside?!” Mystery

This one is genuinely confusing.

You’ll see pipes, valves, and those motor-looking systems sitting outside buildings in freezing temperatures, fully exposed, like they’ve made peace with their fate.

But here’s the twist:

They’re not exposed. They’re just well-dressed.

These outdoor components are often enclosed in insulated (and sometimes heated) boxes
Think of them as winter jackets… but engineered

And then comes the movement factor

Water that keeps moving is much harder to freeze.

Systems are often designed to keep water circulating
Or at least prevent it from sitting still long enough to freeze solid

👉 Think of it like:

Still water freezes quickly
Moving water resists freezing

Or simply:

Keep water busy, and it won’t turn into ice.

3. The Frozen Lake Illusion

Driving a car on a frozen lake already feels like you’re breaking several life rules at once.

But the real twist comes when you drill through the ice - and find liquid water underneath. With fish. Just casually existing.

So what’s going on?

Water has trust issues with physics

Most substances get denser when they freeze
Water expands and becomes less dense

👉 Which means: ice floats

What this creates:

The top layer freezes first → forming a thick ice sheet
That ice layer acts like a hat or blanket
It traps heat in the water below

At the bottom:

Water stays around 4°C, the temperature at which it is densest, so it sinks and settles there
Cold, but not frozen

👉 So under the ice:

It’s stable
It’s insulated
And life goes on (just a bit slower)
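The "ice floats" part is easy to check with rough numbers. This is a small sketch using approximate textbook densities; the function name is just for illustration.

```python
# Approximate densities in kg/m^3 (rounded textbook values).
ICE_DENSITY = 917.0
WATER_DENSITY = 1000.0


def submerged_fraction(object_density: float, fluid_density: float) -> float:
    """Archimedes' principle: share of a floating object's volume below the surface."""
    if object_density >= fluid_density:
        raise ValueError("Object is not less dense than the fluid, so it does not float.")
    return object_density / fluid_density


fraction = submerged_fraction(ICE_DENSITY, WATER_DENSITY)
print(f"{fraction:.1%} of the ice sits below the waterline")
```

So roughly nine tenths of the sheet is underwater and the rest rides on top, which is exactly what lets it act as a lid.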

Where Else This “Don’t Freeze” Logic Shows Up

Once you notice it, this isn’t just about pipes or lakes - it’s a universal strategy.

1. The Human Body

Your body is basically a high-end plumbing system.

When it’s cold, blood flow to the extremities is reduced
Focus shifts to protecting vital organs

At the same time:

Your heart keeps blood circulating constantly

👉 Moving fluid + protected core = no freezing crisis

2. Fire Sprinklers in Cold Spaces

Ever noticed sprinklers in unheated parking garages and wondered how they survive winter?

They use dry pipe systems:

Pipes are filled with pressurized air, not water
Water only enters once heat from a fire opens a sprinkler head and the air pressure drops

👉 No standing water = nothing to freeze

3. Space Systems (Because Why Not Go Extreme)

In space, temperatures drop to around -270°C.

To handle this:

Systems use fluids like ammonia with very low freezing points
These fluids are kept circulating continuously

👉 Even in space, the same rule applies: Use the right fluid + keep it moving

The Real Rule of Winter

After all this, the secret isn’t complicated - it’s just applied really well.

If you don’t want something to freeze:

Put it deep underground

Wrap it properly

Or keep it moving

That’s it.

From underground pipes to frozen lakes to human survival systems - the same three tricks show up everywhere.

And once you see it, winter starts to feel less mysterious… and a lot more engineered.