
DATA LAKE

Sirigiri Hari Krishna
Posted 12-22-2022 10:47

Large-scale distributed data processing as we know it today started with Google solving its storage requirements with the Google File System (GFS). The open-source community then created a similar solution, HDFS (Hadoop Distributed File System), which allowed us to form a cluster of computers and use their combined capacity to store our data. Next came the MapReduce framework, which allowed us to use the combined computing power of the cluster to process the enormous data volumes stored in HDFS.
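To make the MapReduce idea concrete, here is a minimal single-process sketch of the classic word-count job. It is not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and the list of strings stands in for the data blocks that HDFS would spread across the nodes of a cluster. It only illustrates the three stages (map, shuffle/group by key, reduce) that the framework would run in parallel.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split (block)."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all intermediate values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Hypothetical single-process stand-in: in Hadoop each split would be an
    # HDFS block on some node and the map/reduce calls would run in parallel.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    blocks = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(blocks))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Each map call depends only on its own block, and each reduce call only on the values for one key, so a real framework can schedule them on different machines. That independence is what lets MapReduce harness the combined computing power of the cluster.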

Before HDFS and MapReduce came into existence, we had data warehouses such as Teradata and Exadata. We built pipelines to collect data from many OLTP systems and bring it into the data warehouse. Then we processed all that data to extract business insights and used them to make the right business decisions. Hadoop offered the same capability: collect data and process it to extract business insights. So the advent of HDFS and MapReduce started challenging these data warehouses in three critical respects:

1. Ease of horizontal scalability
2. Capital cost
3. Volume and variety of data
For the rest of this article, see https://sharikrishna26.medium.com/data-lake-dd3e988ba70a

------------------------------
SIRIGIRI HARI KRISHNA
Senior Consultant 1
Hitachi Vantara
------------------------------
