Semantic layers are finally getting opinionated enough to be useful
A semantic layer isn't new; it's the "business translation" that turns raw data into actionable insights for real decisions. What's new is that AI makes it necessary: models and agents need consistent definitions just as much as analysts do.
Roughly 40% of Databricks users still don't use dbt, and every BI tool carries its own definition of "revenue." The result: dozens of dashboards, none of which line up.
AtScale, Stardog, Databricks Unity Catalog Metrics, and other semantic layers fix this by defining metrics once and making them usable in SQL, DAX, MDX, Python, and even AI agents.
Your dashboards and model training data should both use the same "revenue" metric.
The magic is not "no-code BI." It's no-drift semantics, which means that metrics have the same meaning for analysts, ML engineers, and LLMs.
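Here's a minimal sketch of "define once, render everywhere," assuming a hypothetical metric schema (none of these names come from any vendor's API): the definition lives in one object, and every consumer renders from it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    """One canonical metric definition (hypothetical schema)."""
    name: str
    table: str
    expression: str   # row-level expression, e.g. "amount - discount"
    aggregation: str  # "SUM", "AVG", ...

# Defined once; every consumer below renders from this single object.
REVENUE = Metric(
    name="revenue",
    table="sales.orders",
    expression="amount - discount",
    aggregation="SUM",
)

def to_sql(m: Metric, group_by: str) -> str:
    """Render the metric as warehouse SQL for dashboards or notebooks."""
    return (
        f"SELECT {group_by}, {m.aggregation}({m.expression}) AS {m.name} "
        f"FROM {m.table} GROUP BY {group_by}"
    )

def to_dax(m: Metric) -> str:
    """Render the same definition as a DAX-style measure string
    (only SUM is mapped in this sketch)."""
    fn = {"SUM": "SUMX"}[m.aggregation]
    return f"{m.name} := {fn}('{m.table}', {m.expression})"

print(to_sql(REVENUE, group_by="region"))
print(to_dax(REVENUE))
```

A real semantic layer does this translation (plus governance and caching) for you; the point is that "revenue" exists exactly once.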
The AtScale + Databricks "Semantic Lakehouse" model gets this right:
- No data movement
- Automatic aggregates (see the routing sketch after this list)
- Unified metric definitions
- Direct integration with Unity Catalog and Spark
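To make "automatic aggregates" concrete, here is a hedged sketch of the routing idea with made-up table names: answer a query from the smallest pre-built rollup that still covers the requested dimensions, and fall back to the base table otherwise.

```python
# Hypothetical aggregate navigation: map each rollup's grain (the set of
# dimensions it retains) to a pre-computed table. All names are invented.
AGGREGATES = {
    frozenset({"region"}): "sales.agg_revenue_by_region",
    frozenset({"region", "month"}): "sales.agg_revenue_by_region_month",
}
BASE_TABLE = "sales.orders"

def route(dimensions: set[str]) -> str:
    """Pick the smallest aggregate covering the requested dimensions."""
    candidates = [
        (len(grain), table)
        for grain, table in AGGREGATES.items()
        if dimensions <= grain  # the rollup keeps every requested dimension
    ]
    return min(candidates)[1] if candidates else BASE_TABLE

print(route({"region"}))           # sales.agg_revenue_by_region
print(route({"region", "month"}))  # sales.agg_revenue_by_region_month
print(route({"customer_id"}))      # sales.orders (no covering rollup)
```

AtScale builds and maintains those rollups on its own; the sketch only shows why queries get faster without anyone copying data out.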
This is less about analytics than about giving AI a stable, governed view of business truth.
My new TIL post, "Semantic Layer Solutions in Modern Data Architecture," has a clear walkthrough: what a semantic layer is, why 40% of Databricks users still don't use dbt, the vendors (AtScale, Stardog, Timbr), the integrations (Unity Catalog Metrics, Power BI), and the Databricks + AtScale partnership that makes "semantic lakehouse" more than a buzzword.
It's clear to me now that the semantic layer isn't just a BI issue; it's the missing link between data, metrics, and AI. Once "revenue" has the same meaning in SQL, DAX, MDX, and Python, you've built a base for both human and machine reasoning. That semantic consistency carries over to everything else, such as dashboards, copilots, and evaluations.
"AtScale's main selling point is that it stops data from moving..." It queries data in place within Databricks, creates and manages aggregates independently, accelerates performance through intelligent caching, and maintains a single source of truth without duplication. (AtScale x Databricks blog)
That "no movement" promise isn't about performance; it's about honesty and governance. You create drift as soon as you pull data into a BI cube. AtScale's Databricks integration closes that loop by bringing together technical lineage (through Unity Catalog) and business-facing semantics. It's not flashy, but it's the kind of base that AI architectures will need.
The next step is using the same layer to train and evaluate AI models. The same "revenue" metric that powers dashboards should also drive model features and evaluation metrics; that's how you stop AI from learning the business from one definition while executives use another.
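A small illustration with an invented schema: one canonical revenue expression feeds the dashboard, the feature pipeline, and the eval slicing, so a change to the definition propagates to all three at once.

```python
# Hedged sketch: one revenue definition, three consumers. The expression
# and table names are illustrative, not a real schema.
REVENUE_EXPR = "SUM(amount - discount)"
TABLE = "sales.orders"

def revenue_query(group_by: str) -> str:
    return (
        f"SELECT {group_by}, {REVENUE_EXPR} AS revenue "
        f"FROM {TABLE} GROUP BY {group_by}"
    )

dashboard_sql = revenue_query("region")      # what executives see
feature_sql = revenue_query("customer_id")   # what the model trains on
eval_sql = revenue_query("month")            # what the eval slices by

# Change REVENUE_EXPR once and every consumer moves together; there is
# no second place for a stale definition to hide.
```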
Pushback (questions to ask before you buy anything):
- "Single source of truth" is just a saying unless you hard-gate BI against the layer. You'll end up with "two truths and a hope" if teams can still point tools directly at warehouse tables.
- It sounds great that MDX, DAX, and LookML will all work the same way. However, in reality, edge-case functions and time-intelligence logic do not map one-to-one. Set aside time for testing equivalence.
- It's true that vendor lock-in happens. If your BI surface area is small, Unity Catalog Metrics and dbt/MetricFlow-style semantics might be "good enough." When you're "multi-surface and politically decentralized," AtScale is worth it.
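Here's the kind of equivalence test I mean, as a runnable pytest-style sketch; the fetch_* functions are placeholders standing in for a real semantic-layer client and a direct warehouse query.

```python
import math

# Placeholder clients: swap in a real semantic-layer query and a direct
# warehouse query. The canned numbers exist only to make the sketch run.
def fetch_layer_revenue(month: str) -> float:
    return {"2024-01": 1_200_000.0, "2024-02": 1_150_000.0}[month]

def fetch_direct_revenue(month: str) -> float:
    return {"2024-01": 1_200_000.0, "2024-02": 1_150_000.0}[month]

def test_revenue_equivalence():
    for month in ["2024-01", "2024-02"]:
        layer = fetch_layer_revenue(month)
        direct = fetch_direct_revenue(month)
        # Tolerate float noise, never definition drift.
        assert math.isclose(layer, direct, rel_tol=1e-9), (
            f"{month}: layer={layer}, direct={direct}"
        )

test_revenue_equivalence()
```

Run it per metric and per surface (DAX measure vs. layer SQL, LookML vs. layer SQL) before declaring the migration done.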
Refer to the full TIL for details on the vendor breakdown, query protocols, and implementation patterns.
The takeaway: semantic layers are becoming the interface layer between data, analytics, and AI, and they will determine how truth spreads through your systems.