This article details the key differences between data fabric and data mesh—their distinct design concepts, architectures, philosophies, and use cases. Understanding these approaches is essential if you’re looking to optimize your organization’s data strategy and harness the full potential of your data assets. To that end, we’ll break down their key characteristics, compare their strengths, and provide guidance on when to implement each.
When you finish reading, you’ll understand how data mesh and data fabric can transform your data management practices.
What is data mesh?
A data mesh is a decentralized approach to managing and integrating data. It distributes data ownership to domain-specific teams, such as human resources, sales, finance, and legal. This approach follows a principle called domain-oriented data ownership, which distributes data ownership to the teams closest to it.
Data mesh takes a federated approach to data governance. That means that although different domains own the data, there are universal, standardized governance policies that every domain must follow. Think of it like the United States. Federal laws govern the entire country and trade between states, but most governance is left up to individual states.
Is data mesh a technology or a methodology?
It’s more accurate to say that data mesh is a methodology rather than a technology. While implementing a data mesh may involve adopting new tools, it primarily represents a shift in data management processes, organizational structure, and culture. Data mesh is built around four core principles:
- Domain-oriented data ownership: Data ownership is decentralized, with specific business domains (marketing, finance, etc.) taking responsibility for the data they generate. Each domain is accountable for managing, sharing, and governing its data, ensuring that it meets the needs of data consumers across the organization. This structure encourages domain experts to curate and steward data, leading to higher quality and relevance.
- Data as a product: Each domain treats its data as a product, emphasizing usability, quality, and discoverability. Data should be designed to meet the needs of its users, with clear documentation, defined SLAs, and intuitive interfaces. Like any product, data should be continuously improved based on feedback, ensuring it remains valuable and accessible to a broad range of stakeholders, including cross-functional teams.
- Self-serve data access: Data mesh promotes a self-service approach, enabling teams to access, use, and analyze data without needing to rely on a centralized data team. This requires robust, scalable, and user-friendly infrastructure that makes it easy for users to explore and use data with minimal friction. Tools for data discovery, lineage, and transformation are essential to making this principle a reality.
- Federated data governance: Although data ownership is distributed across domains, there is still a need for overarching governance to ensure consistency and compliance across the organization. Federated governance establishes enterprise-wide standards and policies while allowing domains the flexibility to adapt to their specific needs. This balance helps maintain data quality and security without stifling innovation and agility at the domain level.
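To make the “data as a product” principle more tangible, here is a minimal sketch of what a data product contract might look like in code. All names and fields here are hypothetical illustrations, not a standard; real data meshes typically express such contracts in YAML alongside a schema registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """A minimal, illustrative data-product descriptor (hypothetical fields)."""
    name: str
    owner_domain: str          # the domain team accountable for the data
    schema: dict               # column name -> type, for discoverability
    sla_freshness_hours: int   # how stale the data is allowed to get
    documentation_url: str     # where consumers learn how to use it

    def meets_freshness_sla(self, hours_since_update: float) -> bool:
        """Check whether the product currently honors its freshness SLA."""
        return hours_since_update <= self.sla_freshness_hours


# A domain team publishing its data as a product:
orders = DataProduct(
    name="orders_daily",
    owner_domain="order-management",
    schema={"order_id": "string", "amount": "decimal", "placed_at": "timestamp"},
    sla_freshness_hours=24,
    documentation_url="https://wiki.example.com/data/orders_daily",
)
print(orders.meets_freshness_sla(6))  # True: within the 24-hour SLA
```

The point of such a contract is that every domain publishes the same kind of descriptor, so consumers can discover, trust, and compose data products without bespoke coordination.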
By focusing on these four principles, data mesh shifts the paradigm from centralized data management to a decentralized, domain-oriented approach to align data strategy more closely with business needs.
What is a business domain?
In the context of data mesh, a business domain is a specific function, operation, or area of responsibility within an organization. Every organization may define its domains as it sees fit, but domains are often organized by department (e.g. customer service) or function (e.g. order management).
Identifying suitable domains is critical for developing a data mesh because it is the basis for delegating responsibility for data.
One of the fundamental tenets of data mesh is that the teams within each domain understand their data best, so they are best positioned to manage it to suit their needs. If the domains are poorly chosen, it’s much harder to build an effective data mesh.
What is domain-driven data?
Domain-driven data is a data management concept inspired by a software development concept called domain-driven design (DDD), popularized by Eric Evans.
Domain-driven design and domain-driven data share a few commonalities, which include:
- A focus on understanding and modeling the business domain as part of the development process.
- The use of common language between technical and non-technical users to describe the domain.
- A recognition that large systems often contain multiple subdomains that have their own bounded context.
Where domain-driven data is distinct from domain-driven design is in its scope, artifacts, and stakeholders.
Domain-driven data focuses on data modeling, governance, and management, is implemented in data storage systems, and involves data engineers, data scientists, business analysts, and domain experts.
Why does data mesh need a cloud-native infrastructure?
A cloud-native infrastructure is ideal for data mesh but isn’t technically required. It’s possible to implement a data mesh using on-premise infrastructure, though it will require significantly more effort and expense.
The benefits of using a cloud-native infrastructure to implement a data mesh are numerous and compelling. These benefits include:
- Support for distributed architecture: A data mesh requires support for distributed workloads, and cloud platforms are much better suited for that than on-premise infrastructure.
- Self-service capabilities: Cloud services offer self-service interfaces with a wide array of AI/ML tools, APIs, data services, and more, which aligns with data mesh’s principle of domain-oriented decentralized data ownership.
- Standardization: Cloud-native infrastructures provide consistent deployment, governance, and communication frameworks, which helps enforce uniformity across various domains.
- Automation: Cloud-native technologies facilitate automation in deploying, scaling, and managing data products.
- Interoperability: Cloud services have built-in integrations, making connecting various data products and tools easier.
- Cost-effectiveness: Pay-as-you-go models in cloud computing can make it more economical to implement and scale a data mesh architecture.
Example of data mesh in action
Zalando, a European e-commerce company, built a data mesh to manage complex, distributed data. They organized their data around business domains such as Order Management, Product, and Customer.
And while there were company-wide standards for data security, interoperability, and quality, each team had autonomy in implementation. Based on these standards, each team created data products that other teams could consume from the self-serve infrastructure Zalando built. Dr. Alexander Borek, Director of Data Analytics at Zalando, details their approach on The Data Chief podcast.
To make this example more concrete, imagine Zalando’s data science team wanted to better predict customer churn.
Under the old paradigm, they had to collect data from different departments including interaction history, purchase history, engagement data, and browsing data. They’d then need to prepare and clean the data before they could even start building and deploying their churn prediction model.
With the data mesh, the data science team can access data products from each department for interaction, purchase history, engagement, and browsing data. And because each data product adheres to company-wide standards, there’s no need for a lengthy integration process. Zalando can now update and deploy churn prediction models much faster and more frequently to better inform customer retention activities.
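The churn-features workflow above can be sketched in a few lines. Everything here is hypothetical (the catalog, product names, and fields are invented for illustration); the point is that because every data product follows the same enterprise-wide contract, including a shared key, a consumer can assemble features without a bespoke integration step.

```python
# Hypothetical, simplified "catalog" of data products keyed by domain/product.
# In practice this would be a self-serve platform, not an in-memory dict.
catalog = {
    "customer/interactions": [{"customer_id": 1, "tickets": 3}],
    "orders/purchases":      [{"customer_id": 1, "orders": 12}],
    "marketing/engagement":  [{"customer_id": 1, "email_opens": 7}],
}


def build_churn_features(customer_id: int) -> dict:
    """Assemble one feature row by reading each domain's data product."""
    features: dict = {"customer_id": customer_id}
    for product_name, rows in catalog.items():
        for row in rows:
            if row["customer_id"] == customer_id:
                # Merge every non-key field; the shared key is guaranteed
                # by the enterprise-wide standard each product follows.
                features.update(
                    {k: v for k, v in row.items() if k != "customer_id"}
                )
    return features


print(build_churn_features(1))
# {'customer_id': 1, 'tickets': 3, 'orders': 12, 'email_opens': 7}
```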
What is data fabric?
Data fabric is also a relatively new, still-evolving data integration and management design concept. It uses advanced data technologies, like knowledge graphs and AI/ML on active metadata, to create scalable, augmented data integration pipelines that support different use cases on multiple data platforms.
Despite what many organizations may be led to believe, data fabric is not a single solution. Unfortunately, because most organizations can’t clearly define data fabric, they purchase technology that only solves specific problems. This exacerbates the issue data fabric aims to solve because it results in new, additional data silos.
Essentially, the data fabric processes active metadata from various systems and offers automated alerts and recommendations to improve data integration, enhancing the user experience. Some mature enterprises have achieved this effect without data fabric, but maintaining it relies on manual efforts, which is unsustainable.
Is data fabric a technology or a methodology?
Data fabric is not a single technology, or even a group of technologies. It’s a design approach to data management that is metadata-driven. This is in contrast to traditional data management paradigms which are driven by static data models and manual processes.
In fact, implementing data fabric can be done without removing or replacing existing systems. Instead, existing systems supply the data fabric with their metadata which is analyzed to provide insights and recommendations on data management. These recommendations automate data management decisions, such as what integration process to use and how and where to store data.
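As a toy illustration of this metadata-driven automation, consider a rule that analyzes active metadata (here, table access statistics) and suggests a data-management action. The thresholds, field names, and actions are all hypothetical; a real data fabric would derive such recommendations with AI/ML across many systems rather than hand-coded rules.

```python
def recommend(table_metadata: dict) -> str:
    """Suggest a data-management action from active metadata (toy rules)."""
    reads = table_metadata["reads_last_30d"]
    size_gb = table_metadata["size_gb"]
    if reads == 0 and size_gb > 100:
        return "archive"  # large and unused: move to cheaper cold storage
    if reads > 10_000:
        return "cache"    # hot table: serve it from a faster tier
    return "keep"         # no change recommended


print(recommend({"reads_last_30d": 0, "size_gb": 500}))      # archive
print(recommend({"reads_last_30d": 50_000, "size_gb": 20}))  # cache
```

Even in this simplified form, the shape of the idea is visible: metadata flows in from existing systems, and automated recommendations flow back out, without replacing those systems.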
What is a centralized data integration layer?
A centralized data integration layer is the hub that collects, processes, and combines data from various systems.
This layer is not unique to data fabric. It’s present in every data management architecture. In traditional architectures, the data warehouse serves as the centralized data integration layer. Some enterprises use data lakes as the centralized integration layer.
The key distinction in a data fabric is that its data integration layer is enhanced by active metadata, automation, and AI/ML.
Examples of data fabric solutions
It bears repeating that there is no single data fabric solution you can buy. Obtaining a data fabric requires assembling and combining different tools from different vendors. Not only that, you’ll need data engineers with the skills to make everything work.
That said, according to Gartner, there are several capabilities you’ll need to implement a data fabric. These capabilities include:
- Augmented data catalog
- Knowledge graph
- Data integration
- Data preparation and data delivery
- Metadata activation
- Recommendation engines
- Data and AI orchestration
Examples of vendors who provide solutions for implementing a data fabric include Denodo, IBM, TopQuadrant, Qlik, Informatica, and Cambridge Semantics.
Data mesh vs. data fabric at a glance
| Parameter | Data Mesh | Data Fabric |
| --- | --- | --- |
| Data Ownership Model | Decentralized: domain teams fully own and manage their data. | Centralized: a central IT or data team owns and manages the enterprise’s data. |
| Architecture Philosophy | Domain-driven: each domain treats data as a product. | Metadata-driven: data architecture is centralized and automated based on metadata. |
| Data Integration Approach | Distributed: each domain handles its own integration processes according to enterprise standards. | Unified: data integration pipelines are centralized across all systems. |
| Governance | Federated: global policies with local autonomy for domains. | Centralized: uniform governance applied across the organization. |
| Scalability Model | Independent: each domain scales independently. | Centralized scaling: can face bottlenecks due to central control. |
| Interoperability | Requires cross-domain consistency efforts. | Built-in interoperability across systems. |
| Data Discovery | Domain-specific catalogs: each domain maintains its own catalog. | Unified discovery: centralized view of all data sources. |
| Automation | Limited: more manual processes within domains. | High: metadata-driven automation for integration and management. |
| Data Quality Control | Distributed: each domain is responsible for its data quality. | Centralized: managed by a central team or automated processes. |
| Technology Focus | Leverages diverse technologies per domain’s needs. | Utilizes advanced technologies like knowledge graphs and ML/AI. |
When to use data mesh vs. data fabric
Both data mesh and data fabric share the common goal of easing access to data. And it’s worth noting that, in the right context, you don’t have to choose between one or the other—they can be complementary approaches.
However, it may make sense to choose one over the other. In its research report on data fabric and data mesh, Gartner notes two key parameters to base your decision on:
- Metadata maturity — Without the capability to collect and share metadata between tools, the promise of data fabric will be largely unfulfilled.
- Governance maturity — To enable a data mesh, an organization must be ready to apply a consistent governance approach across domains.
This chart from Gartner’s research report displays how to think about when to use data mesh vs. data fabric:
| Metadata Maturity | Governance Maturity | Decision Guidance |
| --- | --- | --- |
| Yes | Yes | Pursue fabric and use it as a base for building domain-centric data products. |
| Yes | No | Pursue fabric. Introduce an adaptive governance approach to contextualize business scenarios. |
| No | Yes | Pursue mesh. Capture and start using metadata. |
Source: Gartner Research: Gartner, Data and Analytics Essentials: Data Fabric and Data Mesh, Robert Thanaraj, Mark Beer, Ehtisham Zaidi, 19 April 2023
Note that while you can assess your data governance maturity with tools like the Advanced Analytics Capability Maturity Model, metadata maturity is more difficult to quantify. Gartner defines it as “the ability to collect, comprehend, and share metadata.”
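The Gartner guidance above reduces to a small lookup keyed on the two maturity parameters. The fourth combination (neither maturity) corresponds to the case discussed below: build toward maturity first. This is a sketch of the decision logic, not an official Gartner artifact.

```python
def fabric_or_mesh(metadata_mature: bool, governance_mature: bool) -> str:
    """Map the two Gartner maturity parameters to decision guidance."""
    if metadata_mature and governance_mature:
        return "Pursue fabric; use it as a base for domain-centric data products."
    if metadata_mature:
        return "Pursue fabric; introduce an adaptive governance approach."
    if governance_mature:
        return "Pursue mesh; start capturing and using metadata."
    return "Neither yet: build a logical data warehouse or lakehouse first."


print(fabric_or_mesh(True, False))
```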
Other parameters to consider when you’re deciding to use data fabric vs. data mesh are:
- Data complexity and domain expertise: The more diverse your data is across various business units, the more it benefits from the domain-specific knowledge in a data mesh.
- Organization structure and culture: Data mesh is a logical approach for an organization structured to facilitate autonomy and distributed decision-making, whereas data fabric is better for organizations that prefer centralized oversight.
- Scalability, compliance, and innovation needs: In a data fabric, innovation, compliance, and scale are achieved at the enterprise level. In a data mesh, individual domains can scale, innovate, and meet domain-specific compliance requirements at their own pace, so one domain may move faster than another.
- Existing resources: Data mesh maximizes your ability to leverage strong data expertise distributed across different domains. Data fabric is more aligned with organizations with centralized data expertise and advanced technologies like AI/ML for data integration and management.
When to avoid both data fabric and data mesh
It’s critical to note that both data fabric and data mesh are emerging approaches to data management. According to Gartner, most enterprises don’t yet have sufficient data governance or metadata maturity to implement either one effectively.
If you’re not ready for data mesh and data fabric, building a logical data warehouse or a data lakehouse is the next best path forward. During the implementation process, your organization can build toward the maturity required to implement data fabric or data mesh.
When to combine both data fabric and data mesh
The final scenario to explore is a combination of data mesh and data fabric, which is the best-case scenario. However, for most enterprises, the resources and expertise required to combine data mesh and data fabric effectively are currently out of reach.
While the two approaches emphasize different design concepts, their goals are the same: data fabric’s defining characteristic is its advanced, metadata-driven architecture, while data mesh is defined by its domain-oriented approach to data management. The reason they are rarely combined isn’t incompatibility; it’s the maturity and resources the combination demands.
By combining these approaches, organizations leverage the distributed domain expertise of specialized teams and the automation capabilities of a metadata-driven architecture. The data fabric serves as the foundation, where data policies are centrally governed, and the data mesh delivers domain-driven data products.
In essence, humans focus on applying their expertise to define valuable analytics use cases, while machines focus on optimizing the data infrastructure to support those use cases.
Assessing your data management strategy
As you’ve seen, both data mesh and data fabric offer unique advantages for managing vast amounts of data, depending on your organization’s structure, expertise, and technological maturity. Whether you’re drawn to the decentralized governance of data mesh or the metadata-driven automation of data fabric, choosing the right approach requires a thoughtful evaluation of your data needs and capabilities.
At Intellias, we bring the expertise needed to help you navigate this complex decision. With our data and analytics services, you can implement a robust, future-ready data management solution that fits your business goals.
Contact us today to transform your data strategy and unlock its full potential.