Hitachi Application Reliability Centers

The benefits of the Data Lakehouse on Hybrid Cloud solutions


    Posted 08-18-2023 09:54

    In this blog post we focus on two subjects that, at first sight, may seem to have nothing to do with each other: the Data Lakehouse and hybrid cloud solutions. Let's look at some details of each and at how they make sense together.

    Let's start with a short summary of the Lakehouse

    In today's world, businesses are generating more data than ever before. This data can be used to gain valuable insights into customer behavior, optimize operations, and make better decisions. 

    However, storing and managing such huge volumes of data, in so many formats and data types, became a challenge for data warehouses, and at some point the concept of the Data Lake emerged to help. It did help, but companies are now balancing the use of the Data Warehouse against the use of the Data Lake and, in most cases, maintaining two different systems to answer two different questions.

    Data Lakehouse VS Data Lake

    It is not impossible to answer both questions with just one of these platforms, but doing so comes with technical challenges, which most of the time has led companies to run two data platforms that support each other.

    So, wouldn't it be great to have one data platform that could answer the full question: what happened and what will happen? For the business, that is what the Data Lakehouse is about: leveraging the advantages of both the Data Warehouse and the Data Lake in one platform that serves multiple questions.

    Evolution to a Data Lakehouse

    On one side we have the cost-effectiveness of data lakes when it comes to storing huge amounts of data; on the other, the flexibility of performing business analysis on the data warehouse. Of course, there are more differences, advantages, and disadvantages, but for the purposes of this article we will stick with these two.

    Looking at the advantages of the Data Lakehouse, we should focus on: 

    • Flexibility: Data Lakehouses can store all types of data, structured and unstructured. This makes them a good fit for a variety of use cases, such as business intelligence, machine learning, and artificial intelligence.
    • Scalability: Data Lakehouses are scalable to meet the needs of growing businesses. They can be easily expanded to store more data and handle more workloads.
    • Cost-effectiveness: Data Lakehouses can be a cost-effective solution for storing and managing large amounts of data. They can be much cheaper than traditional data warehouses.
    • Performance: Data Lakehouses can provide high performance for data analysis and machine learning tasks. This is because they use a variety of technologies to optimize data access and processing.
    • Manageability: Data Lakehouses are easier to manage than traditional data lakes. They use a variety of tools and techniques to make data more accessible and searchable.

    It is also important to mention that one of the premises of the Lakehouse is the separation of compute and storage.

    Lakehouse - Separation of Compute & Storage


    The diagram above shows the separation of compute and storage, which I can detail in another blog post if you want to know more; let me know in the comments. This premise of separating compute and storage is quite important for what we will discuss next.
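
    To make the idea of separating compute and storage a little more concrete, here is a minimal sketch in Python. The `ObjectStore` and `ComputeEngine` classes are hypothetical stand-ins invented for illustration, not part of any real Lakehouse product: the store holds the data, and the engines hold none, so engines can be added or removed without moving data.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical stand-in for the shared storage layer."""
    tables: dict = field(default_factory=dict)

    def write(self, table: str, rows: list) -> None:
        # Append rows to the named table, creating it if needed.
        self.tables.setdefault(table, []).extend(rows)

    def read(self, table: str) -> list:
        # Return a copy so engines never mutate stored data directly.
        return list(self.tables.get(table, []))


class ComputeEngine:
    """Hypothetical stateless engine: it keeps no data, only a reference to storage."""

    def __init__(self, name: str, store: ObjectStore) -> None:
        self.name = name
        self.store = store

    def total(self, table: str, column: str) -> float:
        return sum(row[column] for row in self.store.read(table))


store = ObjectStore()
store.write("sales", [{"amount": 10.0}, {"amount": 32.5}])

# Two independent engines read the same data; neither owns a copy of it.
bi_engine = ComputeEngine("bi", store)
ml_engine = ComputeEngine("ml", store)
print(bi_engine.total("sales", "amount"), ml_engine.total("sales", "amount"))
```

    In this sketch, scaling compute means creating or discarding `ComputeEngine` instances, while the storage layer stays where it is.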

    Now let's talk about Hybrid Cloud

    Let's consider hybrid cloud to be a combination of services running on-premises or co-located with services running in a cloud provider. There is a lot to say about hybrid clouds: how best to leverage them and where they make sense.

    Let's cover some of the advantages of Hybrid cloud:

    • Flexibility: Hybrid cloud solutions offer the flexibility to choose the right platform for different workloads. For example, you can store your most sensitive data on-premises, while storing your less sensitive data in the cloud.
    • Scalability: Hybrid cloud solutions can be scaled to meet the needs of your business. You can easily add or remove resources as needed.
    • Cost-effectiveness: Hybrid cloud solutions can be more cost-effective than traditional on-premises or cloud-only solutions. You can take advantage of the economies of scale offered by the cloud, while also keeping some of your most sensitive data on-premises.
    • Security: Hybrid cloud solutions can be more secure than traditional on-premises or cloud-only solutions. You can use a variety of security measures to protect your data, both on-premises and in the cloud.
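
    As an illustration of the flexibility point above, the sketch below shows one way a placement rule could be written. The sensitivity levels, dataset names, and the rule itself are assumptions made up for this example; a real policy would come from your own compliance requirements.

```python
from enum import Enum


class Sensitivity(Enum):
    # Hypothetical classification levels, for illustration only.
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


def choose_location(sensitivity: Sensitivity) -> str:
    """Return where a dataset is stored under an assumed, illustrative policy."""
    if sensitivity is Sensitivity.RESTRICTED:
        return "on-premises"
    return "cloud"


# Hypothetical datasets and their assumed classifications.
datasets = {
    "patient_records": Sensitivity.RESTRICTED,
    "web_clickstream": Sensitivity.INTERNAL,
    "product_catalog": Sensitivity.PUBLIC,
}

placement = {name: choose_location(level) for name, level in datasets.items()}
print(placement)
```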

    Knowing the advantages of hybrid cloud solutions, some companies are already leveraging them. Companies that are highly regulated, that want to keep security under tighter restrictions, that want to reduce the impact of shifting completely to the cloud, or that want tighter control of cloud costs can and should leverage hybrid cloud solutions.

    Hybrid Advantages

     

    The Data Lakehouse and Hybrid Cloud solutions

    We discussed earlier that one of the premises of the Data Lakehouse is the separation of storage and compute. Hybrid cloud solutions combine the benefits of on-premises and cloud computing, which gives organizations the flexibility to choose the right platform for their data and workloads. Because the Lakehouse separates storage from compute, data Lakehouses are well suited to hybrid cloud solutions: businesses can keep their storage on-premises or co-located, with security under tight restrictions, and use the cloud provider only to compute over the data for analysis.

    When using the data Lakehouse and Hybrid cloud together, we end up having the following advantages:

    • Cost savings: By combining the flexibility and scalability of hybrid cloud with the cost-effectiveness of data Lakehouses, organizations can save money on their data storage and management costs.
    • Increased agility: Hybrid cloud and data Lakehouse solutions can help organizations to be more agile and responsive to change. They can easily add or remove resources as needed, and they can easily move data between on-premises and cloud environments.
    • Improved security: By using a hybrid cloud and data Lakehouse solution, organizations can improve the security of their data. They can use a variety of security measures to protect their data, both on-premises and in the cloud.
    • Enhanced analytics: Hybrid cloud and data Lakehouse solutions can help organizations to improve the agility and flexibility of their data analysis and machine learning tasks. They can use a variety of technologies to optimize data access and processing, only paying for what they consume.
    • Simplified management: Hybrid cloud and data Lakehouse solutions can help organizations to simplify the management of their data. They can use a single platform to manage all of their data, regardless of where it will be analyzed.
    • Improved flexibility: Companies can still run some data workloads in the cloud and others on-premises or co-located, depending on infrastructure availability and scalability.

    But what could the challenges be? One of the questions we usually get is about latency: "Wouldn't that be a bottleneck?"

    Well, no. We have been working with partners that allow us to host co-located infrastructure that still meets the security and compliance requirements of customers with tighter restrictions, while sitting close to the infrastructure of the cloud providers. Combined with high-speed connectivity, this reduces latency to levels that are not significant.

    Low-latency - Co-located Partner Platform


    If well designed, a hybrid cloud solution could have latencies of 2-10 ms, which keeps latency from becoming a bottleneck.
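
    To put the 2-10 ms range in perspective, here is a back-of-the-envelope calculation. The number of sequential round trips per query is an assumed, illustrative figure, not a measurement; real engines batch and parallelize requests, so actual overhead depends on the workload.

```python
def network_overhead_ms(round_trips: int, rtt_ms: float) -> float:
    """Latency added by sequential storage round trips, in milliseconds."""
    return round_trips * rtt_ms


# Assumption for illustration: a query needs 50 sequential round trips to storage.
ASSUMED_ROUND_TRIPS = 50

best_case_ms = network_overhead_ms(ASSUMED_ROUND_TRIPS, 2.0)    # 2 ms per round trip
worst_case_ms = network_overhead_ms(ASSUMED_ROUND_TRIPS, 10.0)  # 10 ms per round trip
print(f"{best_case_ms:.0f} ms to {worst_case_ms:.0f} ms")  # prints "100 ms to 500 ms"
```

    Under that assumption, the network adds between a tenth and half of a second to a query, which is small for most analytical workloads.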

    Hitachi Open Architecture for Lakehouse Hybrid Cloud Solutions

    Hitachi has created a modular architecture for Lakehouse Hybrid Cloud Solutions in which companies can leverage the services they already use, simply plugging them in or making minor changes where required. This reduces the impact of the changes required while still delivering improvements that drive business outcomes.

    Hitachi Open Architecture for Lakehouse Hybrid Solutions


    The diagram above shows the components where open-source technologies can also be adopted and combined with proprietary solutions. Looking at it from bottom to top, we have ingestion layers through which all data, whether structured, semi-structured, or unstructured, can be loaded into the storage layer, making it possible to cover different use cases. The solution must use one of the data Lakehouse technologies and leverage it across all the layers above, up to the business outcomes. On the right side of the diagram, we have services that wrap the solution with Data Governance and Privacy to keep all data under tight control and compliant with regulations. All of it is in turn wrapped by concepts that are optional and recent, yet quite important for the reliability of your solutions:

    • Reliability Engineering (RE) is a discipline that applies engineering principles to ensure that solutions are reliable. It is a proactive approach to preventing failures, rather than simply reacting to them after they occur. It is important for keeping your data available to users.
    • Data Reliability Engineering (DRE) is a subset of RE that focuses on ensuring the reliability of data. This includes ensuring that data is accurate, complete, and consistent, as well as that it is available when it is needed. DRE uses a variety of techniques, such as data quality testing, data lineage, and data governance, to achieve these goals. It is important for making sure that your data is reliable and for giving users confidence that they can trust it.
    • FinOps is a discipline that combines financial management and engineering principles to optimize the cost of IT operations. FinOps uses a variety of techniques, such as cloud cost management, budgeting, and forecasting, to help organizations save money on their IT costs. It helps minimize your costs and keep them under control, even when using hybrid cloud solutions.
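
    To make the data quality testing mentioned under DRE a little more concrete, the sketch below checks a small batch of rows for completeness and duplicate keys. The function, column names, and sample rows are hypothetical and only illustrate the kind of check involved; they are not taken from any Hitachi tooling.

```python
def quality_report(rows: list, required: list, key: str) -> dict:
    """Count rows with missing required fields and duplicated key values."""
    missing = sum(
        1 for row in rows if any(row.get(column) is None for column in required)
    )
    keys = [row.get(key) for row in rows]
    duplicates = len(keys) - len(set(keys))
    return {
        "rows": len(rows),
        "missing_required": missing,
        "duplicate_keys": duplicates,
    }


# Hypothetical sample batch: one row lacks an amount, and order_id 2 appears twice.
sample_rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 5.0},
]

report = quality_report(sample_rows, required=["order_id", "amount"], key="order_id")
print(report)
```

    A check like this would typically run on each ingested batch, with the counts feeding alerts or dashboards.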

    Note that although this article focuses on hybrid cloud solutions, such solutions can also be deployed on-premises only, multi-cloud, or fully in the cloud.

    How are businesses using Hybrid cloud and data Lakehouse solutions?

    Here are some examples of how businesses are using hybrid cloud and data Lakehouse solutions:

    • A retail company is using a hybrid cloud and data Lakehouse solution to store and analyze data from its point-of-sale systems, customer loyalty programs, and website traffic. This data is used to improve customer segmentation, personalize marketing campaigns, and optimize inventory levels. While all data is kept in a co-located facility, all analyses are performed on a cloud provider, making use of cloud compute scalability.
    • A financial services company is using a hybrid cloud and data Lakehouse solution to store and analyze data from its trading systems, customer accounts, and fraud detection systems. This data is used to detect fraud, manage risk, and make investment decisions. This is a highly regulated sector, where some analyses are done in a cloud provider while others, involving highly confidential data, are performed on-premises.
    • A healthcare company is using a hybrid cloud and data Lakehouse solution to store and analyze data from its electronic health records, clinical trials, and research studies. This data is used to improve patient care, develop new treatments, and identify drug safety risks. This is a highly regulated sector, where all patient data is kept on-premises and obfuscated data is used in analyses executed in a cloud provider.
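
    As an example of what obfuscating data before it leaves the premises can look like, the sketch below replaces a patient identifier with a keyed hash and coarsens the age. This is only one possible technique, chosen here for illustration; the field names and the key are hypothetical, and keyed hashing is pseudonymization rather than full anonymization.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be kept on-premises and never shared.
SECRET_KEY = b"replace-with-a-key-kept-on-premises"


def pseudonymize(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token for the identifier using HMAC-SHA256."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def obfuscate(record: dict) -> dict:
    """Keep analytic fields, replace the identifier, and drop the name."""
    decade = record["age"] // 10 * 10
    return {
        "patient_token": pseudonymize(record["patient_id"]),
        "age_band": f"{decade}s",
        "diagnosis": record["diagnosis"],
    }


# Hypothetical record, for illustration only.
record = {"patient_id": "P-0001", "name": "Jane Doe", "age": 47, "diagnosis": "J45"}
cloud_record = obfuscate(record)
print(cloud_record["age_band"], "name" in cloud_record)  # prints "40s False"
```

    Because the token is stable for a given key, records for the same patient can still be joined in the cloud without exposing the original identifier.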

    These are just a few examples of how businesses are using hybrid cloud and data Lakehouse solutions to gain insights from their data and make better decisions. If you are looking for a way to improve your business with data while facing some of the concerns mentioned above, please give us a shout and we can discuss it.

    Summary

    Data Lakehouses and hybrid cloud are two modern data architectures that can be used together to provide organizations with several benefits. The Data Lakehouse combines the flexibility and scalability of data lakes with the performance and governance of data warehouses. Hybrid cloud uses a combination of on-premises and cloud-based resources.

    By combining the flexibility and scalability of hybrid cloud with the cost-effectiveness and flexibility of data Lakehouses, organizations can keep their data storage and management costs under tighter control while using the flexibility and scalability of cloud resources to compute their data analyses. Organizations become more agile and responsive to change, improve the security of their data, and enhance their analytics capabilities.

    Although many consider latency to be the main challenge of a hybrid cloud and data Lakehouse solution, it can be minimized by using co-located infrastructure and high-speed connectivity, reducing what could otherwise be a disadvantage of hybrid cloud analytics, or even of hybrid cloud solutions in general.

    Hitachi can work with you to customize and adopt our modular architecture for Lakehouse hybrid cloud solutions, which allows organizations to leverage their existing services and solutions, minimizing the changes involved in modernizing data platforms.

    Overall, the combination of data Lakehouses and hybrid cloud can provide organizations with several benefits. We are already seeing and supporting organizations that adopt this approach, and we expect to see many more in the coming years.

    Lakehouse on hybrid solutions

    References

    Exploring Lakehouse Architecture and Use Cases

    Data Lakehouses: Have You Built Yours?

    Hitachi Vantara Introduces Data Reliability Engineering Services to Optimize Data Ecosystems

    Gartner IT Infrastructure, Operations & Cloud Strategies Conference

    5 Examples of Cloud Data Lakehouse Management in Action

    Data Reliability Engineering Can Boost Data Pipelines

    Reach out to the author on LinkedIn 

    #ApplicationModernization
    #CloudAdvisory
    #CloudCostManagement
    #CloudMigration
    #CloudSecurity

    ------------------------------
    Miguel Gaspar
    Software Architecture Engineer Principal
    Hitachi Vantara
    ------------------------------