You built a solid retail media platform.
Think something in the league of Amazon Ads or Walmart Connect.
Life was good. Numbers showed up fast. Stakeholders nodded confidently.
Then someone said:
“Can we add AI to generate insights, not just show numbers?”
Of course. How hard could that be?
Your shiny new AI agent gets its first question:
“What drove performance last week?”
Seems straightforward.
Except… your system doesn’t have one definition of anything.
Take something as basic as money:
Same campaign. Same timeline. Slightly different logic behind each.
Now the AI has to decide:
Are these the same thing? Should I combine them? Compare them?
Good luck.
Let’s get into a more “this actually happens” scenario.
You have two datasets:
deterministic_ad_logs
synthetic_event_logs
Now someone asks:
“Give me CTR and conversion rate.”
To answer correctly:
But here’s the catch:
Both tables have a column called impressions.
Same name. Different meaning. Equal confidence.
Now imagine an AI agent trying to pick the right one without context.
It’s like giving someone two identical-looking doors and saying, “One leads to the right answer, the other leads to a slightly wrong answer that looks completely right.”
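To make the two doors concrete, here is a minimal sketch in Python. The table names come from the example above, but the rows, the numbers, and the presence of a `clicks` column in both tables are assumptions invented purely for illustration. The point is only that the same CTR formula runs cleanly against either table and returns a different, equally plausible value.

```python
# Hypothetical rows: the values and column meanings are assumptions,
# not real data from any platform.
deterministic_ad_logs = [
    {"campaign_id": "c1", "impressions": 1000, "clicks": 50},
]
synthetic_event_logs = [
    {"campaign_id": "c1", "impressions": 1250, "clicks": 50},
]


def ctr(rows, campaign_id):
    """Click-through rate for one campaign: clicks divided by impressions."""
    selected = [r for r in rows if r["campaign_id"] == campaign_id]
    impressions = sum(r["impressions"] for r in selected)
    clicks = sum(r["clicks"] for r in selected)
    # Return None rather than dividing by zero when nothing was served.
    return clicks / impressions if impressions else None


# Same formula, same column name, two different answers, no error raised.
ctr_from_deterministic = ctr(deterministic_ad_logs, "c1")
ctr_from_synthetic = ctr(synthetic_event_logs, "c1")
print(ctr_from_deterministic, ctr_from_synthetic)
```

Nothing in the code can tell which result is the "right" CTR. That knowledge lives outside the column name.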
At this point, your architecture quietly starts sweating.
Because semantics are scattered:
None of these are guaranteed to agree tomorrow.
So:
No errors. Just different truths.
Here’s what your AI actually does behind the scenes:
First, it tries to understand the question.
Then it runs the query.
That’s already two steps.
Now layer in your serving system.
Druid is fantastic at aggregations. It is… less enthusiastic about joins across datasets.
So what happens?
And then:
The final “join” happens in your application layer.
Not in a database. Not optimized.
But in code that:
It works most of the time.
And then one day:
Now your AI confidently explains a number that doesn’t quite exist.
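One way to picture that application-layer join is the sketch below. It assumes two per-campaign result sets that came back from separate queries; the function name `stitch`, the field names `spend` and `conversions`, and the sample rows are all hypothetical. Unlike a naive loop that quietly keeps only matching keys, this version also returns the campaigns that appeared on one side only, so a mismatch becomes visible instead of silently changing the total.

```python
def stitch(spend_rows, conversion_rows):
    """Join two per-campaign result sets in application code.

    Returns the joined rows plus the campaign ids found on only one side.
    """
    spend = {r["campaign_id"]: r["spend"] for r in spend_rows}
    conversions = {r["campaign_id"]: r["conversions"] for r in conversion_rows}

    # Inner join on campaign_id: only keys present in both results survive.
    matched = sorted(spend.keys() & conversions.keys())
    joined = [
        {"campaign_id": k, "spend": spend[k], "conversions": conversions[k]}
        for k in matched
    ]

    # Keys present in exactly one result are what a naive join drops silently.
    unmatched = sorted(spend.keys() ^ conversions.keys())
    return joined, unmatched


# Hypothetical query results from two separate datasets.
spend_rows = [
    {"campaign_id": "c1", "spend": 100.0},
    {"campaign_id": "c2", "spend": 40.0},
]
conversion_rows = [
    {"campaign_id": "c1", "conversions": 5},
    {"campaign_id": "c3", "conversions": 2},
]

joined, unmatched = stitch(spend_rows, conversion_rows)
print(joined)
print(unmatched)
```

Whether unmatched campaigns should raise an error, be filled with zeros, or be reported to the user is a design decision this sketch leaves open. What matters is that the decision is made once, on purpose.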
At some point, someone suggests:
“Why don’t we just create one big table with everything?”
And yes, you do build wide, pre-joined tables for common queries:
These help. A lot.
But they don’t solve everything.
Because:
So now you have two paths:
And if semantics aren’t consistent across both?
The same question gives different answers depending on how it was asked.
That’s not a bug. That’s a trust crisis.
Let’s be clear.
You’re not exposing raw sensitive data:
The problem isn’t leakage.
The problem is interpretation within allowed data.
For example:
Everything is technically allowed.
But not everything is correct.
Governance answers:
“Can you access this data?”
Semantics answers:
“Are you using it the right way?”
Right now, every system answers that second question differently.
Some teams hardcode definitions in services. Fast to build, guaranteed to drift.
Some go heavy on pre-aggregations. Fast queries, painful to evolve.
Some rely on AI to infer everything. Flexible, but unpredictable and slower.
Some juggle all three and hope for the best.
Spoiler: hope is not an architecture.
The shift is subtle but powerful:
Stop redefining metrics everywhere. Define them once.
A proper semantic layer does a few unglamorous but critical things:
Clearly distinguishes
So when the AI gets:
“Top campaigns by conversion rate”
It doesn’t improvise.
It follows a defined path:
Even if multiple systems are involved, the stitching is intentional, not accidental.
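What "define them once" could look like in code is sketched below. This is a toy registry, not a real semantic-layer product: the `Metric` class, the `METRICS` dictionary, the `compile_query` helper, and the choice of which table backs each metric are all assumptions made for illustration. The idea is that every caller, whether a dashboard or an AI agent, asks for a metric by name and gets the same source table and the same formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A single, shared definition of a ratio metric."""
    name: str
    source: str       # the one table this metric is allowed to read from
    numerator: str
    denominator: str


# Hypothetical definitions: the table-to-metric mapping is an assumption.
METRICS = {
    "ctr": Metric("ctr", "deterministic_ad_logs", "clicks", "impressions"),
    "conversion_rate": Metric(
        "conversion_rate", "deterministic_ad_logs", "conversions", "clicks"
    ),
}


def compile_query(metric_name: str, group_by: str = "campaign_id") -> str:
    """Turn a metric name into SQL using the registered definition only."""
    try:
        m = METRICS[metric_name]
    except KeyError:
        # Unknown metrics fail loudly instead of being improvised.
        raise ValueError(f"Unknown metric: {metric_name!r}") from None
    return (
        f"SELECT {group_by}, "
        f"SUM({m.numerator}) * 1.0 / NULLIF(SUM({m.denominator}), 0) AS {m.name} "
        f"FROM {m.source} GROUP BY {group_by}"
    )


print(compile_query("conversion_rate"))
```

With something like this in place, the agent's job shrinks from "guess which `impressions` column is meant" to "pick a metric name from a known list", and a request for an undefined metric is rejected rather than answered creatively.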
Once this is in place:
And most importantly:
The same question stops producing multiple believable answers.
Adding AI didn’t break your system.
It exposed what was already fragile.
Because dashboards can get away with inconsistency. AI cannot. It has to explain things.
And the moment it explains:
Any ambiguity in your data model becomes painfully obvious.
You can build the fastest queries. You can design elegant pipelines. You can add the smartest AI.
But if:
Then your platform isn’t delivering insights.
It’s generating very convincing confusion.
And now, thanks to AI…
It does it in full sentences.