What is Data as a Product, and why is it becoming a go-to data analytics pattern?
Both traditional business intelligence (BI) and Data as a Product aim to support self-service analytics, empowering users to explore data without relying entirely on technical teams. But there’s a fundamental difference in how these approaches enable users, adapt to changing needs, and manage relationships between data.
Let’s dig into what data products, or Data as a Product, are and why the approach is becoming so pervasive in organisations, even rivalling traditional approaches to Business Intelligence.
In this short article I start by delving into self-service and the differences between the traditional BI and Data as a Product approaches.
I then look at how data products stay organised without formal models.
Next, I describe how data products are delivered, through the lens of both a medallion architecture and a data mesh framework.
Finally, I walk through a short real-world example before concluding with why Data as a Product will become more and more pervasive in data ecosystems.
Self Service
The Foundation of Self-Service: Traditional BI vs. Data as a Product
Traditional BI: Self-Service, But with Limits
In the traditional BI world, self-service is powered by a combination of upstream data models and downstream reports. For example:
A data team builds a Customer Transactions Model that aggregates transaction data from retail banking accounts. It provides fields like average balance, transaction frequency, and spending categories.
This model feeds reports and dashboards, enabling relationship managers or product teams to explore insights without needing to write SQL or code.
While this approach supports self-service to some extent, it has limitations:
Fixed Scope: The model is purpose-built for specific use cases. If users need new fields (e.g., credit card usage or branch visit patterns), the pipeline and model must be modified.
Dependency on Technical Teams: Adding new dimensions or relationships often requires significant involvement from data engineers, slowing down the process.
Rigid Structure: Models are tightly coupled to specific domains, which can create silos and make cross-domain analysis difficult.
Data as a Product: Self-Service, With Flexibility
Data as a Product also supports self-service but does so with greater flexibility and user empowerment:
Instead of building rigid models tied to specific reports, data teams create modular data products—packaged datasets with clean, reusable data and built-in metadata.
For example, a Retail Banking Customer Data Product might include transactional data, account demographics, loan details, and credit card activity datasets, all documented and available for self-service.
What makes this different?
Modular Design: Data products are not tied to one use case or report. Relationship managers, product teams, and marketing analysts can query the same product for different purposes.
Iterative Updates: New fields (like branch visit data) can be added as features, seamlessly enhancing the product without disrupting existing functionality.
Cross-Domain Relationships: While not part of a formal model, relationships between data products are managed through metadata, lineage, and APIs, making it easy to combine datasets (e.g., linking the Retail Banking Customer Data Product with a Fraud Detection Data Product).
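As a minimal sketch of the "iterative updates" idea, the snippet below models a data product as rows plus metadata, using only the Python standard library. The DataProduct class, its field names, and the sample rows are hypothetical and not taken from any particular platform. The only point it illustrates is that adding a field is additive, so a consumer written against the earlier schema keeps getting the same answer.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical packaged dataset: rows plus descriptive metadata."""
    name: str
    owner: str
    schema: dict[str, str]                      # field name -> description
    rows: list[dict] = field(default_factory=list)
    version: int = 1

    def add_field(self, name: str, description: str, values: dict) -> None:
        """Add a new field, keyed by customer_id, without touching existing ones."""
        if name in self.schema:
            raise ValueError(f"field {name!r} already exists")
        self.schema[name] = description
        for row in self.rows:
            # Customers with no value get None rather than a missing column.
            row[name] = values.get(row["customer_id"])
        self.version += 1

    def select(self, *fields: str) -> list[dict]:
        """Return only the requested fields, as a consumer query would."""
        unknown = set(fields) - set(self.schema)
        if unknown:
            raise KeyError(f"unknown fields: {sorted(unknown)}")
        return [{f: row[f] for f in fields} for row in self.rows]


# Illustrative data only.
product = DataProduct(
    name="retail_banking_customer",
    owner="retail-banking",
    schema={"customer_id": "Unique customer key", "avg_balance": "Average balance"},
    rows=[
        {"customer_id": "C1", "avg_balance": 1200.0},
        {"customer_id": "C2", "avg_balance": 300.0},
    ],
)

before = product.select("customer_id", "avg_balance")
product.add_field("branch_visits", "Branch visits in the last 90 days", {"C1": 4})
after = product.select("customer_id", "avg_balance")   # existing query, unchanged result
```

In a real platform the same guarantee would more likely come from schema versioning and contract tests than from an in-memory class, but the contract is the same: new fields are added, and existing ones are not changed or removed.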
So, Why Is Data as a Product Better for Self-Service?
Both approaches aim to empower users, but Data as a Product is better equipped to meet the demands of modern data users for several reasons:
1. It’s Designed for Change
Traditional BI: Once a model or pipeline is built, it’s relatively static. Adding new fields or adapting to new requirements often means significant rework.
Data as a Product: Products are designed to evolve. Adding branch visit data or creating new relationships doesn’t require re-engineering the entire pipeline. Instead, features are layered on incrementally, ensuring users always have access to the latest data.
2. It’s Modular and Reusable
Traditional BI: Models are typically domain-specific. For example, a Customer Transactions Model might not integrate easily with a Loan Portfolio Model without additional engineering.
Data as a Product: Products are modular and can be combined as needed. For example, users can link the Retail Banking Customer Data Product with a Loan Portfolio Data Product through shared keys (e.g., customer IDs) and metadata.
3. It Breaks Down Silos
Traditional BI: Models are often siloed, making it difficult for teams to work collaboratively across domains.
Data as a Product: By design, products are interoperable. Clear documentation and governance ensure that teams can understand and connect datasets from different domains.
4. It’s User-Centric
Traditional BI: Self-service is largely defined by pre-built dashboards or specific queries allowed by the model.
Data as a Product: Users have the freedom to query, filter, and combine data however they need, without being restricted by rigid structures.
Of course, this does not mean that BI, in the sense of models and reports focused on business outcomes, will disappear. There will always be people who have neither specific data skills nor the aspiration to gain them, and for them ‘someone’ else may still create their BI, even BI through AI (for example as described here).
How Does Data Stay Organised Without Formal Models?
One of the key questions about Data as a Product is how to manage relationships between datasets without a formal, overarching data model. The answer lies in metadata and governance:
Shared Keys and Metadata: Each data product includes metadata that defines its schema, fields, and relationships. For example, a Retail Banking Customer Data Product might include customer IDs, which can link to transaction data in a Fraud Detection Data Product or loan data in a Loan Portfolio Data Product.
Lineage and Documentation: Lineage tools track how data flows across products, ensuring transparency and trust. Users can see how datasets are connected and how transformations were applied.
APIs for Querying and Integration: APIs make it easy to query or combine data across products, enabling cross-domain analysis without requiring a centralised model.
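The three mechanisms above can be sketched in a few lines of standard-library Python. Everything here is an illustrative assumption: the ProductMetadata record, the link helper, the upstream dataset names, and the sample rows are hypothetical. The sketch shows metadata declaring a shared key and lineage, and a join that relies on that declaration rather than on a central model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductMetadata:
    """Hypothetical metadata record published alongside a data product."""
    name: str
    key: str                         # shared key other products can link on
    fields: tuple[str, ...]
    upstream: tuple[str, ...] = ()   # lineage: names of source datasets


def link(left_meta, left_rows, right_meta, right_rows):
    """Inner-join two products on the shared key declared in their metadata."""
    if left_meta.key != right_meta.key:
        raise ValueError("products do not declare a shared key")
    key = left_meta.key
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left_rows:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Illustrative metadata and rows only.
customer_meta = ProductMetadata(
    name="retail_banking_customer",
    key="customer_id",
    fields=("customer_id", "segment"),
    upstream=("core_banking.accounts",),
)
fraud_meta = ProductMetadata(
    name="fraud_detection",
    key="customer_id",
    fields=("customer_id", "alert_count"),
    upstream=("payments.transactions",),
)

customers = [
    {"customer_id": "C1", "segment": "retail"},
    {"customer_id": "C2", "segment": "premier"},
]
fraud_alerts = [{"customer_id": "C2", "alert_count": 3}]

linked = link(customer_meta, customers, fraud_meta, fraud_alerts)
```

The join logic itself is ordinary. What matters is that the key and the lineage are published with each product, so a consumer can discover how to combine them without asking the owning team.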
Where are data products delivered?
Data as a product and the Medallion Architecture
In a medallion architecture, data products are typically delivered at the Gold Layer, the final stage where data is curated, enriched, and optimised for specific use cases. They could, however, also be delivered in the Silver Layer, depending on the methodology followed within the particular organisation.
Let’s break down the medallion architecture and understand how data products fit into this framework:
1. Bronze Layer:
What It Is: The raw data layer where data is ingested in its native format (e.g., JSON, CSV, Parquet).
Purpose: Acts as a "data lake" for storing all incoming data without transformation.
Data Product Role: Not directly used for data products but serves as the source for downstream transformations.
2. Silver Layer:
What It Is: The intermediate layer where data is cleaned, standardised, and structured.
Purpose: Data transformations like deduplication, validation, and joining datasets occur here.
Data Product Role: Some reusable, intermediate data products may emerge here, particularly if they serve cross-domain needs (e.g., a validated customer transaction dataset).
3. Gold Layer:
What It Is: The curated and enriched layer designed for specific analytics, reporting, and self-service use cases.
Purpose: Business-ready datasets are created for consumption by end-users, reports, dashboards, or advanced analytics, including relationships between datasets.
Data Product Role: Most data products are delivered here. They are designed with end-users in mind, featuring documentation, governance, and optimisations for reuse.
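To make the three layers concrete, here is a small sketch in plain Python. The sample CSV, the column names, and the to_silver and to_gold functions are hypothetical. A real implementation would normally run on a lakehouse or warehouse engine rather than in memory, and the cleaning rules shown are assumptions rather than a recommended standard.

```python
import csv
import io

# Bronze: raw records kept exactly as ingested (here, a CSV string).
BRONZE_CSV = """customer_id,amount,category
C1,20.50,groceries
C1,20.50,groceries
C2,,travel
C2,99.00,travel
c1,5.00,Groceries
"""


def to_silver(raw_csv: str) -> list[dict]:
    """Clean, standardise, validate, and deduplicate the bronze records."""
    seen = set()
    silver = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        if not row["amount"]:            # validation: drop rows with no amount
            continue
        record = (
            row["customer_id"].strip().upper(),   # standardise key casing
            float(row["amount"]),
            row["category"].strip().lower(),
        )
        # Assumption: exact repeats are ingestion artefacts, not real repeats.
        if record in seen:
            continue
        seen.add(record)
        silver.append(dict(zip(("customer_id", "amount", "category"), record)))
    return silver


def to_gold(silver: list[dict]) -> dict[str, dict]:
    """Aggregate to a business-ready view: spend and count per customer."""
    gold = {}
    for row in silver:
        summary = gold.setdefault(
            row["customer_id"], {"total_spend": 0.0, "transactions": 0}
        )
        summary["total_spend"] += row["amount"]
        summary["transactions"] += 1
    return gold


silver = to_silver(BRONZE_CSV)
gold = to_gold(silver)
```

Under the Gold Layer view described above, the output of to_gold is what gets documented and published as the data product. Under the Silver Layer view, the output of to_silver plays that role.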
There are differing perspectives on where data products should reside within the medallion architecture. One school of thought treats the Gold Layer as the space for outcome-focused models, such as a Customer Fraud model that combines entities from the Retail Banking Customer Data Product and the Fraud Detection Data Product. In this view, the data products themselves reside in the Silver Layer. Others see the Silver Layer as an interim step toward refined data products delivered in the Gold Layer, with an additional fourth layer, the Reporting Layer, reserved for outcome-focused models. Both approaches have merit, and the choice often comes down to the complexity of the data ecosystem and the trade-off between fewer and more layers.
Data mesh and Data as a product
Data mesh is a decentralised approach to data management that emphasises domain-oriented ownership. In a data mesh framework, each domain team is responsible for its own data products, ensuring they are discoverable, understandable, trustworthy, and interoperable. This promotes the idea that data is managed with the same rigour and quality as any other product, aligned with organisational goals and user needs. It shifts the perception of data from a mere byproduct to a valuable asset, central to business strategy and decision-making.
Sticking with the banking theme, in my example bank the Customer Transactions Data Product is delivered and managed by the Retail Banking domain team, whereas the Loan Portfolio Data Product is managed by the Lending Services domain team.
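Domain ownership and discoverability can be pictured as a shared catalog in which each domain team registers the products it owns. The sketch below is a hypothetical, in-memory illustration using only the Python standard library. The Catalog and CatalogEntry names, the descriptions, and the customer_id key are assumptions, and a real data mesh would typically rely on a dedicated catalog tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    product: str
    domain: str        # owning domain team
    description: str
    key: str           # shared key used to link with other products


class Catalog:
    """Hypothetical shared catalog; ownership stays with each domain."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        if entry.product in self._entries:
            raise ValueError(f"{entry.product!r} is already registered")
        self._entries[entry.product] = entry

    def owner_of(self, product: str) -> str:
        return self._entries[product].domain

    def search(self, term: str) -> list[str]:
        """Discoverability: products whose name or description mentions term."""
        term = term.lower()
        return sorted(
            e.product
            for e in self._entries.values()
            if term in e.product.lower() or term in e.description.lower()
        )


catalog = Catalog()
catalog.register(CatalogEntry(
    product="Customer Transactions Data Product",
    domain="Retail Banking",
    description="Account transactions per customer",
    key="customer_id",
))
catalog.register(CatalogEntry(
    product="Loan Portfolio Data Product",
    domain="Lending Services",
    description="Loans held per customer",
    key="customer_id",
))

owner = catalog.owner_of("Loan Portfolio Data Product")
matches = catalog.search("customer")
```

The catalog is shared, but each entry names the domain that is accountable for the product, which is the split between central discoverability and decentralised ownership that data mesh aims for.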
Real-World Example
Let’s bring it to life:
Traditional BI
Your organisation builds a Customer Transactions Model to feed marketing reports. The model includes average balance, transaction frequency, and top spending categories. Later, product teams want credit card activity data. The pipeline must be reworked to include this new dimension. When the branch network team asks for branch visit patterns, another overhaul is needed.
Data as a Product
The appropriate data team creates a modular Retail Banking Customer Data Product that includes transactional data, account demographics, and loan and credit card details. If a new requirement (like branch visit data) arises, it’s added as a feature to the product without disrupting existing functionality. Users from marketing, product, and branch network teams can all access the updated product, link it to other datasets, and query it as needed.
Note that I use the term “the appropriate data team” because it could be either a central data team in a centralised framework, or the retail banking domain data team in a decentralised (data mesh) framework.
Conclusion
Data as a Product - what is in it for you?
Switching to Data as a Product doesn’t just improve self-service—it transforms how organisations work with data:
Empowers Teams: Users have greater flexibility to explore data and build their own insights.
Adapts to Change: Adding new fields or relationships is seamless, keeping data products relevant.
Breaks Down Silos: Modular products with shared keys make cross-domain analysis simple and intuitive.
Builds Trust: Governance, metadata, and lineage ensure transparency and reliability.
Final Thoughts
Both traditional BI and Data as a Product support gaining insights from data and using it as a true asset, but the latter is designed for the dynamic, ever-changing needs of today’s data-driven organisations. By shifting from rigid models to flexible products, organisations can empower users, embrace change, and unlock new possibilities for collaboration and innovation.
This trend will likely accelerate as the workforce becomes more data savvy, with access to tools to build their own outcome-focused models and an attitude of ‘just give me access to the data’. Of course, this in turn raises the importance of proper governance, pragmatic guardrails, and cataloguing of whatever is built. But those are topics for another day.