What do I do with all the data?

By Anonymous User posted 11-06-2019 14:05


The different types of data in an industrial or commercial enterprise and the storage technologies available

In the really bad old days when mainframes walked the Earth, data was stored in files in whatever structure the programmer chose. Custom code had to be written to get that data and use it. After several generations of struggles that involved access methods (ISAM, VSAM) and general-purpose databases (IMS), we got RDBMS and SQL, which ushered in the good old days, which are still with us.


But now that we are in the good new days, an unprecedented explosion in data volume and data types (like video, audio, geo-data and more) has led to a concurrent explosion in ways to store and access data. 


There are many options and we may well ask, “Where is the best place to store my data?” That is the question facing anyone implementing DataOps, improving data management or building new applications. This question is crucial because once stored, data tends to exert gravity and resists movement. The more data a repository holds and the more queries are written against it, the stronger that gravity becomes. On the other hand, when data is infrequently accessed, it makes sense to put it in a less expensive repository for longer-term retention, a practice referred to as data tiering.
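
As a rough illustration of data tiering, the sketch below picks a tier from how recently data was last accessed. The tier names and the 30-day and 365-day thresholds are assumptions made up for this example, not recommendations or features of any product.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for illustration.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=365)


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick a storage tier from how long ago the data was last read."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "hot"      # frequently used: keep on fast, costlier storage
    if age <= WARM_WINDOW:
        return "warm"     # occasionally used: cheaper storage is acceptable
    return "archive"      # rarely used: lowest-cost, long-term retention


now = datetime(2019, 11, 6)
print(choose_tier(datetime(2019, 11, 1), now))  # accessed 5 days earlier
print(choose_tier(datetime(2019, 3, 1), now))   # accessed about 8 months earlier
print(choose_tier(datetime(2015, 1, 1), now))   # accessed years earlier
```

A real policy would usually weigh more than age, for example retrieval cost and how many existing queries depend on the data staying where it is.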


Each of the new repositories has its own special sauce. When the fit is great, meaning that the needs of the application match the ability of the repository to provide data, applications become simpler and work better. If the fit is bad, it can be really inconvenient and unproductive. 


Here are some of the choices for repositories facing us in the good new days:


Object Storage: While the idea of object storage predated cloud implementations, it is in the cloud arena that usage of such storage has expanded. This type of storage allows any sort of file to be stored and accessed as needed. It has tremendous flexibility and perhaps the lowest cost of any storage. This sort of storage is dominating data lake implementations, replacing the use of Hadoop’s HDFS for this purpose. The flexibility is both a blessing and a curse: because the store imposes no standard structure on its contents, it is up to each application to interpret them. But it is perfect for storing large amounts of data with a simple structure.
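
To make the idea concrete, here is a toy in-memory stand-in for an object store, written as a sketch. The class name, method names and keys are hypothetical and do not correspond to any vendor's API. The point is only that objects are opaque bytes under a flat key, with optional metadata.

```python
class ObjectStore:
    """Toy object store: a flat map of key -> (bytes, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes, **metadata) -> None:
        # The store never inspects the bytes; any file type is accepted.
        self._objects[key] = (data, metadata)

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def metadata(self, key: str) -> dict:
        return self._objects[key][1]

    def list_keys(self, prefix: str = "") -> list:
        # "Folders" are only a naming convention on top of flat keys.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ObjectStore()
store.put("plant-a/video/cam1.mp4", b"\x00\x01", content_type="video/mp4")
store.put("plant-a/logs/2019-11-06.csv", b"ts,temp\n0,20.0\n", content_type="text/csv")
store.put("plant-b/logs/2019-11-06.csv", b"ts,temp\n0,19.2\n", content_type="text/csv")

print(store.list_keys("plant-a/"))
```

Nothing in the store knows that one object is video and another is CSV; that knowledge lives in the metadata and in the applications that read the objects.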


NoSQL Databases: NoSQL databases depart from the fixed table structures used in relational databases. There are many different types of NoSQL databases including key/value stores, wide column stores, and document databases. These databases are excellent at storing collections of information that have variable structures or schemas that are sparse or constantly changing. Some are hugely performant and others have trouble keeping up with large volumes of writes.
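
A minimal sketch of the document-database idea, using plain Python dictionaries in place of a real database. The asset names and fields are invented for illustration. Each document carries its own set of fields, and a query simply skips documents that lack the field being matched.

```python
# Each document has its own shape; there is no shared table schema.
documents = {
    "pump-17": {"type": "pump", "flow_rate_lpm": 120.5, "vendor": "Acme"},
    "cam-03": {"type": "camera", "resolution": "1080p"},
    "valve-9": {"type": "valve", "state": "open", "last_service": "2019-08-14"},
}


def find(collection: dict, **criteria) -> list:
    """Return ids of documents whose fields match every criterion."""
    return [
        doc_id
        for doc_id, doc in collection.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print(find(documents, type="pump"))
print(find(documents, state="open"))

# Adding a new field to one document requires no schema change elsewhere.
documents["pump-17"]["vibration_mm_s"] = 2.3
```

In a relational table, the same data would need either many mostly empty columns or a schema change each time a new kind of attribute appeared.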


Graph Databases: Graph databases store data not only in nodes and their properties, but also in the edges between the nodes. The result is an incredibly flexible way to store data that can be accessed using a variety of graph query languages and algorithms. Very complex queries can be represented simply, allowing answers to questions that cannot be easily answered in other systems. Semantic tagging is often added to graph databases to make them even more powerful.
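
The sketch below shows the node, property and edge model with a simple traversal. It uses plain Python rather than a graph query language, and the plant equipment and relation names are hypothetical. The question "what is downstream of this tank?" becomes a short walk along labeled edges.

```python
from collections import deque

# Nodes carry properties; edges are (source, relation, target) triples.
nodes = {
    "tank-1": {"kind": "tank"},
    "pump-17": {"kind": "pump"},
    "valve-9": {"kind": "valve"},
    "line-2": {"kind": "production line"},
    "cam-03": {"kind": "camera"},
}
edges = [
    ("tank-1", "feeds", "pump-17"),
    ("pump-17", "feeds", "valve-9"),
    ("valve-9", "feeds", "line-2"),
    ("pump-17", "monitored_by", "cam-03"),
]


def downstream(start: str, relation: str = "feeds") -> list:
    """Breadth-first walk along edges with the given relation label."""
    found, queue = [], deque([start])
    while queue:
        current = queue.popleft()
        for source, label, target in edges:
            if source == current and label == relation and target not in found:
                found.append(target)
                queue.append(target)
    return found


print(downstream("tank-1"))
print(downstream("pump-17", relation="monitored_by"))
```

Answering the same question in a relational database typically means a recursive query or repeated self-joins, which is where a graph model tends to be simpler.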


Special Purpose Databases: In an industrial context, data historians have long been used to capture all the time-series data in a plant or other setting so that it can later be used for analysis. Operational technology (OT) professionals are often familiar with these types of platforms, and collaborating with them offers an opportunity to gain insight into the data being captured in industrial environments.
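
As a simplified picture of what a historian does with time-series data, the sketch below appends timestamped samples for a single tag and retrieves a time range. It is an illustration only: the class is hypothetical, timestamps are plain seconds, and it assumes samples arrive in time order. Real historians also handle compression, many tags and out-of-order data.

```python
from bisect import bisect_left, bisect_right


class TagHistory:
    """Append-only series of (timestamp, value) samples for one sensor tag."""

    def __init__(self):
        self._times = []
        self._values = []

    def append(self, timestamp: float, value: float) -> None:
        # Assumption for this sketch: samples arrive in time order.
        if self._times and timestamp < self._times[-1]:
            raise ValueError("samples must arrive in time order")
        self._times.append(timestamp)
        self._values.append(value)

    def between(self, start: float, end: float) -> list:
        """Return samples with start <= timestamp <= end."""
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


temperature = TagHistory()
for t, v in [(0, 20.0), (60, 20.5), (120, 21.1), (180, 21.4)]:
    temperature.append(t, v)

print(temperature.between(60, 120))
```

Because the timestamps stay sorted, a range query is a binary search instead of a scan, which is one reason time-ordered storage suits this kind of analysis.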


Unusual Databases: The world of databases is a constant focus of innovation. Indelible databases (more often called immutable databases) don’t allow data to be changed, preserving the state at every stage of the life of a database. Distributed databases can support massive scale and replication.
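
One way to picture a database that never changes stored data is an append-only log, sketched below. The class and key names are hypothetical, and this is not how any particular product works. Every write adds a new entry, so earlier states remain readable.

```python
class AppendOnlyStore:
    """Key/value store where writes only append; nothing is overwritten."""

    def __init__(self):
        self._log = []  # ordered list of (key, value) entries

    def put(self, key: str, value) -> int:
        self._log.append((key, value))
        return len(self._log)  # version number after this write

    def get(self, key: str, as_of: int = None):
        """Return the value of key at a given version (default: latest)."""
        end = len(self._log) if as_of is None else as_of
        for entry_key, value in reversed(self._log[:end]):
            if entry_key == key:
                return value
        raise KeyError(key)


db = AppendOnlyStore()
v1 = db.put("setpoint", 70)
v2 = db.put("setpoint", 75)

print(db.get("setpoint"))            # latest value
print(db.get("setpoint", as_of=v1))  # value as it stood after the first write
```

The trade-off is storage growth: keeping every past state costs space, which ties back to the tiering question of where older data should live.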


Having all these choices means you can make your applications sing. But it also means you must choose carefully. Ultimately, this points to the need for a data fabric, such as Lumada Data Services with Lumada Edge Intelligence, that offers access to data wherever it is stored and supports multiple data store options, on premises and in the cloud.

To learn more, please read this e-book on why we need IT/OT convergence from Hitachi Vantara and visit our website.
1 comment



05-04-2022 14:35

Good Read