Lumada Data Optimizer: Hadoop Data Optimization

By Hubert Yoshida posted 03-02-2020 19:24


Hadoop (officially known as Apache Hadoop) is an open-source framework for storing data and running applications, offering huge storage capacity and processing power. In the last decade Hadoop was synonymous with Big Data. Hadoop could store Big Data in a distributed environment, allowing for parallel processing across hundreds and even thousands of commodity server nodes. It focused on horizontal scaling instead of vertical scaling and provided a valuable framework for search, log processing, and video and image analysis. A large part of its value came from the fact that Hadoop was low cost, scalable, flexible and fault tolerant – up to a point.


When you start fresh with new data, Hadoop can help unlock the value hidden in large volumes of data. However, keeping pace with the data growth expected in the coming decade will become complex and expensive as data volumes expand and more of that data ages. The Hadoop Distributed File System (HDFS) gives you clustered storage that federates many data nodes into a single pool in which compute and storage are co-located. As clusters fill with aged and inactive data, you must scale your storage capacity. Traditional storage scaling in Hadoop requires that you also scale compute. Having to add compute and storage together leaves resources unbalanced and underutilized, and it becomes very costly with today’s growing storage capacity demands.


Lumada Data Optimizer for Hadoop (Data Optimizer) is an intelligent data tiering solution that reduces operating costs and gives you seamless access to HDFS data for real-time analytics with Hitachi Content Platform (HCP). With Data Optimizer, you can scale storage and compute independently for greater flexibility and better resource utilization. Configuring volumes is simple, and you can use native Hadoop functionality, such as storage types, storage policies, and the Mover service, to automatically tier older, infrequently accessed data into HCP, a cost-effective and industry-leading object storage solution. This approach optimizes resource utilization by reserving Hadoop nodes for active data while storing less frequently accessed data on HCP. Unlike offloading data with S3A, which removes files from HDFS, Data Optimizer integrates with HDFS to free up capacity and ensure that your data always remains securely accessible through HDFS. By dynamically tiering data between HDFS and HCP, you maintain seamless access to all your data, all the time.
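To make the native Hadoop pieces mentioned above a little more concrete, here is a minimal Python sketch that only builds (and prints) the standard HDFS commands for assigning a storage policy and running the Mover. It is an illustration under assumptions, not the Data Optimizer configuration procedure: the directory /data/logs/2018 is hypothetical, and the choice of the built-in COLD policy (which places replicas on ARCHIVE storage) assumes that the HCP-backed volume is presented to HDFS as ARCHIVE storage, which this post does not state.

```python
import shlex
from typing import List


def set_policy_cmd(path: str, policy: str = "COLD") -> List[str]:
    """Build the HDFS command that assigns a storage policy to a path.

    COLD is a built-in HDFS policy that places all replicas on ARCHIVE
    storage. Assumption: the HCP-backed volume is tagged as ARCHIVE.
    """
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]


def mover_cmd(paths: List[str]) -> List[str]:
    """Build the HDFS Mover command that migrates existing blocks so they
    satisfy the storage policy of the given paths."""
    return ["hdfs", "mover", "-p", *paths]


if __name__ == "__main__":
    # Hypothetical directory holding older, infrequently accessed data.
    cold_dir = "/data/logs/2018"
    # Print the commands instead of running them; pass the lists to
    # subprocess.run(...) on a cluster node to execute them.
    print(shlex.join(set_policy_cmd(cold_dir)))
    print(shlex.join(mover_cmd([cold_dir])))
```

Setting a policy only affects where new blocks are placed; blocks already written stay where they are until the Mover runs, which is why both steps appear in the sketch.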


HCP provides a long-term cloud object storage platform that is massively scalable. It makes compliance and data recovery easy with data mobility, AES-256 encryption, policy-based tiering to public clouds, robust data protection, and up to fifteen 9s availability with low-cost erasure coding, replication, configurable redundancy, automatic repair, and versioning. HCP is a resilient and self-protecting storage platform that makes data recovery easy.


Lumada Data Optimizer automatically tiers data between HDFS and HCP to ensure that your data always remains accessible without having to alter data paths or application configurations. Data Optimizer integrates with Hadoop and operates as an HDFS volume to move HDFS data to and from HCP. Since the files never leave HDFS, capacity is freed with no disruption and data continues to be accessed seamlessly via HDFS. Hadoop maintains three copies of data for redundancy and availability, which consumes additional storage and compute resources. Tiering to HCP improves utilization and reduces costs by eliminating Hadoop’s triple replication of inactive data. Data moved to HCP doesn’t require additional protection and consumes up to 40% less capacity, which reserves valuable Hadoop compute and storage resources for your most active data.
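The capacity argument above can be roughed out with simple arithmetic. The Python sketch below is a back-of-the-envelope estimate, not a sizing tool: the function name, the example figures, and the inactive-data fraction are all hypothetical, and treating the "up to 40% less capacity" figure as a reduction relative to the triple-replicated HDFS footprint is an assumption about how that number should be read.

```python
def tiering_estimate(total_logical_tb: float,
                     inactive_fraction: float,
                     replication: int = 3,
                     hcp_saving: float = 0.40) -> dict:
    """Estimate raw capacity before and after tiering inactive data.

    replication: HDFS copies kept per block (three, per the text above).
    hcp_saving:  assumed best-case reduction on HCP relative to the
                 replicated HDFS footprint ("up to 40%").
    """
    inactive_logical_tb = total_logical_tb * inactive_fraction
    hdfs_raw_before = total_logical_tb * replication
    # Raw HDFS capacity released once inactive data is tiered out.
    hdfs_raw_freed = inactive_logical_tb * replication
    hdfs_raw_after = hdfs_raw_before - hdfs_raw_freed
    # Assumed HCP footprint for the same inactive data.
    hcp_raw_used = hdfs_raw_freed * (1.0 - hcp_saving)
    return {
        "hdfs_raw_before_tb": hdfs_raw_before,
        "hdfs_raw_freed_tb": hdfs_raw_freed,
        "hdfs_raw_after_tb": hdfs_raw_after,
        "hcp_raw_used_tb": hcp_raw_used,
    }


if __name__ == "__main__":
    # Hypothetical cluster: 100 TB of logical data, half of it inactive.
    for name, value in tiering_estimate(100.0, 0.5).items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the inactive half of the data stops consuming three HDFS copies, and the HDFS nodes keep only the replicated active data.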


Lumada Data Optimizer is part of the Lumada Data Services suite of products, which is designed to help you easily and securely connect data between data producers and data users without locking you into proprietary data stores or cloud silos. The role of Hadoop in Big Data and data analytics is changing as more analysis moves to the edge, the cloud, data lakes, and containers, all of which are being addressed by Lumada Data Services. However, Hadoop will continue to play an important role in the big data ecosystem. It has established itself as a core element of enterprise data strategy over the past years and continues to work well in conjunction with other emerging technologies. Lumada Data Optimizer for Hadoop helps protect the investment in Hadoop while optimizing it for today’s data-intensive, real-time requirements.


For more information about Lumada Data Optimizer see the following link
