Object Storage


 object storage access from Hadoop

Alex Blachman posted 04-02-2021 10:52
Hall Michel

Typical tools include not only big data tools such as Hadoop, Spark, and Hive, but also deep learning frameworks (such as TensorFlow) and analytics tools (such as pandas). A data lake also needs tools for cataloging, messaging, and transforming data, so that data assets can be explored and repurposed.

Alex Blachman

Many thanks.

 

Schneider Edyth

I'm trying to learn more about system design and ML pipeline infra.

What are the pros and cons of using HDFS as opposed to Object Storage as a raw data source for an ML application? Are there any architecture concerns or benefits with either?

Also, does it make sense to have a database (relational or NoSQL) to store cleaned data that has been processed from the raw data source? I'm thinking of reusing the cleaned data for ML model retraining. Is this a reasonable use case, or will the volume of data grow too large for a database to handle?

 
