Sara Gardner

Big Content, Dark Data - Call it what you will but there's insight hidden in there

Blog Post created by Sara Gardner Employee on Nov 5, 2013

I watched a terrific webinar last night about a healthcare analytics solution delivered by Hitachi with our partners Tableau and Attivio. The solution features:

  • Hitachi Clinical Repository and Hitachi Content Platform (Object Store) to capture all the disparate types of information that surround patients: images, notes, medical records, etc.
  • Attivio's text mining and analytics to pull out the insight and derive metadata.
  • Tableau analytics tools and visualizations for the user experience.
  • And Hitachi Consulting's health practice tying it all together as a solution.


I really recommend watching the full recording, "Big Content: Driving Value from Healthcare Analytics".


I have to admit I'm pretty passionate in general about analytics on unstructured content, Big Content, or what folks like Gartner call Dark Data.


Dark Data refers to all the content and data we hoard but don't really leverage beyond the purpose it was originally created for. Often we don't even know what we have, hence the term Dark. I like to think of it as the difference between a cluttered garage (dark) and a beautifully organized one.

No prizes for guessing which one is easier to find things in. 

Sadly my garage falls into the Dark category, and I have lost count of the number of screwdrivers, vacuum cleaners, holiday decorations and spirit levels that have entered the chasm never to be seen again. But... thank heavens for the Container Store, which sells lots of lovely boxes and labeling devices and shelves and hooks and marvelous platforms to organize all my Dark Garage Data. Organized stuff takes up less space, is discoverable, and can help me answer all sorts of useful questions, like whether I really need to buy another vacuum cleaner and whether I have enough baubles for 2 trees this year...

Which is kind of analogous to taking unstructured content off a regular file system and organizing it in an object store like Hitachi Content Platform, with applications like Attivio automatically identifying key metadata or tags to 'label' it, so that you can derive new insight and answer questions you hadn't thought to ask before.

I think as an industry we have been trying to do this for years, but in the past the content was typically an extension of the structured data in the database or tied to a specific application, e.g. notes from a customer service call. So typically the content went on the file system, and the metadata linking it to the other records went in the database and application. But with over 80% of digitized content today being unstructured, bringing the structure to the content, rather than managing that metadata separately in a database, makes more sense.
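
To make the "bring the structure to the content" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is a made-up illustration: `ToyObjectStore` and `derive_tags` are hypothetical names, not the Hitachi Content Platform or Attivio APIs, and the keyword matching is only a crude stand-in for real text mining. The point is just that the tags travel with the object, so you can query the content directly instead of joining against a separate database.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One piece of content plus the metadata that lives alongside it."""
    key: str
    content: str
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store (not a real product API)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, content, metadata=None):
        # Store the content together with whatever metadata we already know.
        self._objects[key] = StoredObject(key, content, dict(metadata or {}))

    def get(self, key):
        return self._objects[key]

    def tag(self, key, **tags):
        # Attach derived metadata directly to the stored object.
        self._objects[key].metadata.update(tags)

    def find(self, **criteria):
        # Return the keys of objects whose metadata matches every criterion.
        return sorted(
            obj.key
            for obj in self._objects.values()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


def derive_tags(text):
    """Crude keyword tagger standing in for real text mining (an assumption)."""
    lowered = text.lower()
    tags = {}
    if "refund" in lowered or "complaint" in lowered:
        tags["topic"] = "complaint"
    if "delayed" in lowered:
        tags["issue"] = "delay"
    return tags


store = ToyObjectStore()
store.put("call-001.txt", "Customer asked for a refund after a delayed flight.",
          {"channel": "phone"})
store.put("mail-002.txt", "Thanks for the smooth check-in!", {"channel": "email"})

# "Label the boxes": derive tags from the content and keep them with the object.
for key in ("call-001.txt", "mail-002.txt"):
    store.tag(key, **derive_tags(store.get(key).content))

complaints = store.find(topic="complaint")
```

With the labels attached, a question nobody planned for when the call notes were saved (say, "which phone interactions mention delays?") becomes a simple `store.find(channel="phone", issue="delay")`.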

In a lot of ways the Object store is kind of like a Data Warehouse for Content. 

With structured data we typically had questions in mind when we built Data Warehouses, but then they often became sources for additional insight and apps as new data was added. I see the same thing with Object Stores and unstructured content. For example, Attivio showed an example of mining airline customer communication data, which comes in from a plethora of channels these days, most of which are unstructured, e.g. email, phone, tweets... At Hitachi we have customers in other businesses taking a similar approach to customer communications and recognizing in general that the first step towards insight with unstructured data is to get it corralled together in an object store.

BTW...  customer communications data is a good place to start no matter what industry you are in!

Let's talk ROI for a moment. ROI should be top of mind for Big Data application developers but, to be fair, not all Big Data applications have demonstrated quantifiable ROI.


Unstructured content is largely dark, and that represents a huge opportunity for value!

Though many early use cases for Big Data have tended to involve log analytics and social media sentiment analysis, many see Big Content or Dark Data as the bigger opportunity for quantifiable value. I remember Merv Adrian from Gartner sharing last year that something like 63% of their customers saw deriving new value from Dark Data as the best opportunity for Big Data ROI (ahead of social media analysis, for example). And there's no shortage of content to tuck into, much of it already stored on your company network, much of it dark today. IDC estimates that of the 2 zettabytes of unstructured content created in 2012, less than 0.5% is analyzed today, and yet 20% of that could yield valuable insights if it could be mined. Granted, not every byte is going to provide new insight, but it is a largely untapped opportunity regardless.
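
For a rough sense of scale, here is a back-of-the-envelope calculation using the IDC figures quoted above. It is only a sketch under two assumptions of mine: decimal units (1 zettabyte = 1,000 exabytes), and reading the 20% as a share of the full 2 zettabytes. Since the quoted figure is "less than 0.5%", the analyzed amount here is an upper bound.

```python
from fractions import Fraction

EB_PER_ZB = 1000  # assumption: decimal units, 1 ZB = 1,000 EB

total_eb = 2 * EB_PER_ZB              # 2 ZB of unstructured content created in 2012
analyzed_share = Fraction(5, 1000)    # "less than 0.5%" analyzed, so an upper bound
valuable_share = Fraction(20, 100)    # assumption: 20% of the total could yield insight

analyzed_eb = total_eb * analyzed_share   # at most this much is analyzed today
valuable_eb = total_eb * valuable_share   # potentially valuable if it were mined
untapped_eb = valuable_eb - analyzed_eb   # valuable content still sitting in the dark

print(f"Analyzed today (upper bound): {analyzed_eb} EB")
print(f"Potentially valuable:         {valuable_eb} EB")
print(f"Still untapped (at least):    {untapped_eb} EB")
```

Under those assumptions, the potentially valuable content outweighs what is actually analyzed by a factor of at least 40.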

Customer intelligence is a great case in point. With smart devices and Web 2.0, a significant proportion of customer interactions are now likely captured only in unstructured form. That means the trusty old CRM EDW is only giving you part of the picture if you aren't mining these additional channels. Check out the airline customer example in the webinar and you'll see what I mean.

I tried to draw out a representation below to illustrate the dark data opportunity to a colleague. No, it's not to scale, but the point is that we have picked the structured data over pretty well, so if we are looking for the next level of competitive differentiation, cost savings, or new revenue streams, it's time to look at the unstructured data for new insight. The grey piece of the pie in each case represents unanalyzed or dark data, and the yellow portion is where you have shone a light on it with analytics. Interestingly, the two pies are almost the converse of each other in their ratio of dark to analyzed data.



Hopefully some food for thought there.


Now go shine a light on some of your Dark Data :-)