General Discussion


DATA LAKE


    Posted 12-22-2022 10:47

    Distributed computing at scale took off when Google found a solution for its storage requirements in the Google File System (GFS). The open-source community created a similar system, HDFS, which allowed us to form a cluster of computers and use their combined capacity to store our data. Then came the MapReduce framework, which let us harness the combined computing power of that cluster to process the enormous data volumes we stored in HDFS.
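
    To make the model concrete, here is a minimal word-count sketch of the MapReduce idea in plain Python (an illustration only, not Hadoop's actual Java API): the map step emits (word, 1) pairs, a shuffle step groups them by key, and the reduce step sums the counts per word. The function names are hypothetical; in a real Hadoop job these phases run distributed across the cluster rather than in one process.

        from itertools import groupby
        from operator import itemgetter

        # Map phase: emit a (word, 1) pair for every word in every line.
        def map_phase(lines):
            for line in lines:
                for word in line.split():
                    yield (word, 1)

        # Shuffle: group the emitted pairs by key. Hadoop performs this
        # step across the network between the map and reduce stages.
        def shuffle(pairs):
            for word, group in groupby(sorted(pairs), key=itemgetter(0)):
                yield (word, [count for _, count in group])

        # Reduce phase: sum the counts for each word.
        def reduce_phase(grouped):
            for word, counts in grouped:
                yield (word, sum(counts))

        if __name__ == "__main__":
            lines = ["the quick brown fox", "the lazy dog", "the fox"]
            for word, count in reduce_phase(shuffle(map_phase(lines))):
                print(word, count)   # e.g. "the 3", "fox 2", ...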

    Before HDFS and MapReduce came into existence, we had data warehouses like Teradata and Exadata. We created pipelines to collect data from many OLTP systems and bring it into the data warehouse. Then we processed all that data to extract business insights and used them to make the right business decisions. Hadoop also offered to collect data and process it to extract business insights. So the advent of HDFS and MapReduce started challenging these data warehouses in three critical respects:

    1. Ease of horizontal scalability
    2. Capital cost
    3. Volume and variety of data

    ***************
    For further content, please see https://sharikrishna26.medium.com/data-lake-dd3e988ba70a



    ------------------------------
    SIRIGIRI HARI KRISHNA
    Senior Consultant 1
    Hitachi Vantara
    ------------------------------