Using HCP with Hadoop (whitepaper)

File uploaded by Jeff Lundberg (Employee) on Jun 10, 2013. Last modified by Michael Ratner on Oct 26, 2017.
Version 5

This document describes how to set up Apache Hadoop to use Hitachi Content Platform (HCP) as a source and/or target for its operations. It is left to the reader to decide whether it makes sense to run Hadoop against S3-compatible storage, as doing so gives up a prominent Hadoop feature: data locality (MapReduce jobs run on the same nodes where the data is stored, which avoids extensive network traffic).


However, if much of the data to be processed already resides in HCP, it may be more efficient to run MapReduce jobs that read directly from HCP than to first copy the data into HDFS. In addition, HCP provides data reliability and redundancy out of the box, which makes the threefold storage overhead of HDFS's default replication unnecessary.
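As a sketch of what such a setup might look like, the following core-site.xml fragment points Hadoop's s3a connector at an HCP namespace over its S3-compatible interface. The endpoint hostname and the credential values are placeholders, not real values; the exact set of properties and credential encoding depend on your Hadoop and HCP versions:

```xml
<!-- core-site.xml fragment (sketch) -->
<configuration>
  <!-- S3-compatible endpoint of the HCP tenant; hostname is hypothetical -->
  <property>
    <name>fs.s3a.endpoint</name>
    <value>https://tenant.hcp.example.com</value>
  </property>
  <!-- Credentials for the HCP data access account; placeholders -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>SECRET_KEY</value>
  </property>
  <!-- Path-style addressing is often needed with non-AWS S3 endpoints -->
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
</configuration>
```

With a configuration along these lines, jobs can address data in HCP via s3a:// URIs, for example reading input with a path such as s3a://namespace/input (where "namespace" stands in for the bucket-equivalent HCP namespace name).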