At Hitachi Vantara, we see data fabric as the technology/architectural construct that helps realize a cohesive view of data from disparate data mesh domains. Data fabric itself is realized as integration, orchestration, data virtualization and federation layers built on top of multiple, disjointed data repositories, data lakes or data marts to provide a unified view of all enterprise data. It is independent of the current physical implementation and agnostic to existing data environments and processes.
A data fabric supports both operational and analytic use cases delivered across multiple deployment and orchestration platforms. It also supports multiple data integration styles by using active metadata, knowledge graphs, semantics and machine learning.
The key foundation for the data fabric is a DataOps platform that brings DevOps practices to the data pipeline using a consolidated set of processes and tools. To a large extent, DataOps automates the end-to-end process of discovery, integration, storage, governance and self-service consumption of data, so that value is derived from the data in a cost-effective way.
In all, data fabric is the engine that powers the data mesh. Data fabric makes the data mesh better by automating key mesh concepts to create data products faster, in a globally governed way, while providing a seamless link between all data components and users.
Figure-3 below illustrates some of the salient features of the data fabric:
Summarizing data fabric and data mesh concepts
A data fabric is designed to provide an integrated view of all the data components and to make it easy to ingest, store, transform, access, manage and analyze that data. A data mesh is designed to provide a more decentralized way to ingest, store and manage data, and to make that data easier for users and applications to access.
In a data mesh, each service or team has its own data store and data model, and data is shared through APIs. In contrast, in a data fabric, data is managed centrally, and access is granted through a centralized layer. Data fabric focuses on a unified view of data and ease of access, while data mesh focuses on flexibility, scalability and ownership.
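The split described above can be sketched in a few lines of Python. This is a minimal illustration, not a product API: the names `DataProduct` and `FabricCatalog`, and the sample domains and records, are all hypothetical. Each domain keeps its own store and exposes it through a callable that stands in for its API, while a single catalog object plays the role of the centralized access layer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class DataProduct:
    """A data set owned and served by one domain team (data mesh side)."""
    domain: str
    name: str
    fetch: Callable[[], List[Record]]  # stands in for the domain's API


@dataclass
class FabricCatalog:
    """Hypothetical central access layer (data fabric side)."""
    _products: Dict[str, DataProduct] = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # Products are addressed as "<domain>.<name>".
        self._products[f"{product.domain}.{product.name}"] = product

    def list_products(self) -> List[str]:
        return sorted(self._products)

    def read(self, key: str) -> List[Record]:
        # Data stays with the owning domain; the catalog only routes the call.
        return self._products[key].fetch()


# Two domains keep their own stores and their own data models (sample data).
sales_store = [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 80.0}]
iot_store = [{"sensor": "s-1", "temp_c": 21.5}]

catalog = FabricCatalog()
catalog.register(DataProduct("sales", "orders", lambda: list(sales_store)))
catalog.register(DataProduct("iot", "readings", lambda: list(iot_store)))

products = catalog.list_products()
total_sales = sum(r["amount"] for r in catalog.read("sales.orders"))
```

A consumer only needs the catalog to discover and read both products, while ownership of each store stays with the team that produces the data.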
Figure-4 provides a visual representation of how data mesh and data fabric can work together to create a better data ecosystem:
Although this architecture is shown on Azure, it gives an idea of how the same approach can be implemented on other clouds, and on hybrid platforms that combine on-prem and cloud components.
Reasons to consider this data fabric plus data mesh architecture
The factors below have strengthened the case for a data mesh or a data fabric. The combined approach addresses each of these problems differently from the traditional approach:
· Increasing data volume
Traditional approach would mean ingesting data into different layers, performing ETL and then changing every component before any value can be derived.
Data mesh plus data fabric architecture means automated data ingestion policies and virtualization for users querying the data, which reduces redundant ETL and storage, plus a self-serve BI layer that makes data ready for reporting and analytics faster.
· Variety of data types and formats
Traditional approach would necessitate extensive standardization and manual processes to integrate the different data types and formats for analytics, increasing development and testing costs.
Data mesh plus data fabric architecture means modularizing the entire data lifecycle and treating each set of data as an entity owned by the team that generates it. This ensures teams provide the data in a form the analytics layer can consume as is, while only the mandatory summarizations are performed for combined analytics.
· Expanding IoT data and edge, core and third-party data streams
Traditional approach would mean more layers of data storage, which would increase cost without knowing whether any value can be derived from the data.
Data mesh plus data fabric architectures have data governance policies, and different storage types and retention periods for different types of data, for cost savings and automated compliance.
· Diversified analytics
Traditional approach would mean a fixed set of KPIs, forcing all the data on different platforms to comply with standards to satisfy those KPIs.
Data mesh plus data fabric architectures provide the advantage of accessing and processing the data as a single version of truth, close to the source, and provide different platforms to serve different levels of KPIs for different categories of users.
· Multiple data locations including on-prem and cloud
Traditional approach forces organizations to spend millions of dollars just to integrate the data before they can even tell whether it provides any value.
Data mesh plus data fabric architectures solve integration issues by implementing a framework to decide which data should reside where and how data sets need to be merged, as well as how virtualization can be used to first understand the combined value before making investments.
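The virtualization idea in the points above, querying data where it lives before investing in integration, can be illustrated with a small sketch. It uses two SQLite in-memory databases from the Python standard library as stand-ins for an on-prem store and a cloud store; a real data virtualization layer would be a dedicated engine, and the table names and sample rows here are assumptions made for illustration only.

```python
import sqlite3

# Two separate databases stand in for an on-prem store and a cloud store.
conn = sqlite3.connect(":memory:")                    # "on-prem" (schema: main)
conn.execute("ATTACH DATABASE ':memory:' AS cloud")   # "cloud" (schema: cloud)

# Sample on-prem data (hypothetical).
conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, 120.0), (2, 80.0), (1, 50.0)],
)

# Sample cloud data (hypothetical).
conn.execute("CREATE TABLE cloud.customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO cloud.customers VALUES (?, ?)",
    [(1, "EMEA"), (2, "APAC")],
)

# One query spans both stores; no combined table is built beforehand.
revenue_by_region = conn.execute(
    """
    SELECT c.region, SUM(o.amount)
    FROM main.orders AS o
    JOIN cloud.customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
conn.close()
```

The join answers a combined-value question (revenue by region) without first running an ETL job to merge the two stores, which is the kind of early check the text describes.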
Conclusion
Data fabric was listed among Gartner’s Top Strategic Technology Trends for 2022. Together with data mesh’s key principles, it can serve as the next beacon for organizations planning to embark on their journey towards data modernization.
Data mesh and data fabric are two related concepts that aim to address the complexity and scalability challenges that arise when building and maintaining large, distributed systems. Together, they offer a powerful approach to building data-driven systems that are more scalable, adaptable and resilient. By breaking down data silos and promoting autonomy, data mesh allows teams to move quickly and independently. And by providing a common foundation for data management, data fabric makes it easier for teams to share data and collaborate effectively.
However, despite the advantages, implementing data mesh and data fabric can be challenging. It requires a significant shift in organizational culture and practices, as well as adapting to newer technologies and infrastructure services. It also requires a deep understanding of the data domains and business requirements.
On that topic, Hitachi Vantara is here to help. We have extensive expertise in handling complex data projects and in designing and implementing approaches based on data fabric and data mesh. We are happy to help and guide customers with all their data needs.
Subramanian (Subbu) Venkatesan
Senior Architect – Technology & Solutions
Email: Subbu.Venkatesan@hitachivantara.com