Big data has become a household name over the last decade, and Hadoop is synonymous with it. If you had a big data problem, the answer was Hadoop. As we start the 2020s, let us take stock of how Hadoop has changed over the last decade and how to modernize it for the future.
Rumors of Hadoop's demise have circulated in the market for the last few years. In 2017, O'Reilly renamed its Strata Hadoop World conference to Strata Data to expand beyond Hadoop, signaling market fatigue with the platform. In 2018, Hortonworks joined forces with its arch-rival Cloudera to create a single company, which took many customers by surprise. People wondered if Hadoop's run was coming to an end. HPE's acquisition of MapR in August 2019 further fueled this "Hadoop is dead" narrative.
The ground reality is that enterprises have invested significantly in large Hadoop farms for their big data analytics. It's also estimated that, on average, between 60% and 80% of that data is “cold”, or infrequently accessed. Moreover, enterprises continue to generate more data as they digitalize new processes, products, and services. As a result, more data is produced each year than in all the previous years combined. As Hadoop has struggled to keep up with this exponential data growth, hyperscalers like AWS, GCP, and Azure have taken market share. In fact, the rapid commoditization of enterprise hardware infrastructure by the hyperscalers is the key factor behind the slowdown in the on-prem Hadoop market. All this growth is putting pressure on Hadoop's decade-old architecture, which wasn't designed for this scale and cost equation. The tight coupling of compute and storage, the 3x copy requirement for all data (HDFS keeps three replicas of each block by default), the inability to easily plug in cloud analytics technologies, and the general complexity of large-scale Hadoop are all slowing Hadoop's further adoption. Customers are evaluating public cloud implementations to keep cost and complexity down. However, they cannot switch or migrate their existing Hadoop applications immediately and therefore must optimize their existing Hadoop environments.
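To make the cost equation concrete, here is a minimal back-of-the-envelope sketch in Python. It compares the raw disk capacity needed when all data sits on 3x-replicated storage against a layout where the cold portion is moved to a cheaper tier. The 1 PB data size, the 1.5x overhead for the cold tier, and the function names are illustrative assumptions, not figures from any specific product or customer.

```python
def raw_capacity_tb(logical_tb, overhead):
    """Raw disk (TB) needed to hold logical_tb at a given storage overhead factor."""
    return logical_tb * overhead


def tiered_footprint_tb(logical_tb, cold_fraction, hot_overhead=3.0, cold_overhead=1.5):
    """Raw disk (TB) when hot data keeps 3x copies and cold data moves to another tier.

    cold_overhead=1.5 is an assumed overhead for the cold tier, chosen only
    for illustration; substitute the factor of the tier you actually use.
    """
    hot_tb = logical_tb * (1 - cold_fraction)
    cold_tb = logical_tb * cold_fraction
    return raw_capacity_tb(hot_tb, hot_overhead) + raw_capacity_tb(cold_tb, cold_overhead)


if __name__ == "__main__":
    logical_tb = 1000.0  # hypothetical: 1 PB of logical data
    baseline = raw_capacity_tb(logical_tb, 3.0)  # everything stored as three copies
    for cold_fraction in (0.6, 0.8):  # the 60-80% cold-data range cited above
        tiered = tiered_footprint_tb(logical_tb, cold_fraction)
        saved = 1 - tiered / baseline
        print(f"cold={cold_fraction:.0%}: {tiered:.0f} TB raw "
              f"vs {baseline:.0f} TB baseline ({saved:.0%} less)")
```

Under these assumed factors, tiering the cold data reduces the raw capacity requirement by roughly 30% to 40%. Real savings depend on the actual overhead and price per terabyte of each tier, which this sketch does not model.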
The time is right to modernize the Hadoop environment. Figure out how to cost-optimize Hadoop to create headroom for future scale. Get a handle on Hadoop's cost equation. Figure out what's in your various data lakes and how to catalog and govern them better. Integrate all the different data lakes into a single cohesive data fabric.
Hitachi Vantara is inviting customers to our Hadoop Modernization Roadshow, with the first stop in New York City on Wednesday, February 26, 2020. We plan to bring in our top big data instructors and experts to discuss modernization strategies and how to achieve millions of dollars in cost savings. The sessions will go beyond high-level approaches. Customers will learn about Hitachi's technology solutions that they can act on right away. They will understand the underlying cost economics, including TCO. They will also work through hands-on labs to see it all in action. Click here for more details on the event and to register now.