Paul Lewis

Big Data: A Fishing Metaphor, Analogy, or Simile

Blog Post created by Paul Lewis on Aug 31, 2015



Among the worst possible public eating experiences is the seafood restaurant.  A place where patrons go to “enjoy” eating a variety of slimy, salty, bottom-dwelling creatures that, while oddly delicate, must be hacked and sawed to find the “best” parts because most of it is the “worst” parts.  While I never choose to attend those kinds of venues, on occasion they are thrust upon me, and I end up ordering the single landlubber chicken item with a starter of the second worst food imaginable, a salad. Yummy.


My decision is easy.  Eat or don’t eat. 


My dining party, however, jumps into heavy debate over the type and temperature of their entrée as if they are mulling over fine wine.  I’m pretty sure the distinctive taste difference between two fish ranges from “disgustingly awful” to “it’s better than starving”.  But I could be biased; duck is as close to fish as I get.


Just last week, as the table debate continued around me, I turned my attention to the menu, trying to make sense of the various seafood dishes and how they relate to anything that would interest me directly.  It dawned on me that my disinterest is in fact the most interesting part of the menu.  Not really my disinterest; rather, my ignorance of the content was the most fascinating part. I was not an expert.  I couldn’t really appreciate the intricacies of the offerings, and I didn’t understand the story.  I really didn’t have the subject matter expertise for the entire restaurant.


The menu just became a set of words, data points interpreted by others as appetizing, interpreted by me as loosely organized random items with associated prices.  There is a Big Data metaphor brewing here, or an analogy, or a simile…one of those; even the definitions seem too close to call:


  • Let’s start out simple: A food Subject Matter Expert (SME) defines context and value for a menu, just as a business SME defines business context and value for a report or document.  I would be no better at picking a fish than I would be at finding the obvious shipping problems of a widget to Winnipeg from the TRS weekly report.  However, predefinition of the TRS report is VERY valuable to the business operations folks, just as a static menu is VERY valuable to returning restaurant patrons.  It’s the “I know what I’m looking for” thought process.
    • Extending that concept to data, the analyst wants to see all the possible data available in order to find possible insights or relationships, not just a predefined report.  A pre-printed and leather-bound menu simply won’t create the hidden insight of an “It’s possible something is interesting but I don’t know what” thought process.


  • Since a business analyst would not be satisfied with a static fish menu (or static report), there would be an expectation that their unique menu had different characteristics:
    • An uncategorized list of fish, with no assumptions about the most important sequence of consumption or the level of “fishiness” each may have; just like having unsorted lists of data
    • All fish available in the lake, not just the ones the chef has pre-chosen for the menu; just like having all possible data available
    • Fish from more than just the lake. Include the fish in the tanks up front in the mock aquarium, and the child’s goldfish in the bowl upstairs; just like sourcing data from inside and outside of the organization
    • Not just the fish that was caught last night, but also the fish that you froze last week, the fish that is in the soup made this morning, and if possible fish that are being caught at this exact moment; just like accessing multiple points in time, including real time
    • And not just fish, but any living creature in the water.  Crustaceans, shellfish, molluscs and even edible weeds (if that is even possible); just like databases, files, social feeds, IoT…


  • One step back in the fishing value chain, we find ourselves in a lake, you know, where one would fish for fish:
    • In the traditional menu (and for the operations report), fisher-people would have a list of fish, a quota, a rod and bait.  They sail out, snag several fish to meet the requirement, throwing back what is not on the menu, and come back with their haul to sell to the restaurant.  The variety is limited, the velocity slow, and the volume small; however, this solves most business problems and has most people eating the halibut that night.
    • In the Big Data Menu, several fisher-people would have no list but would have several rods with several test lines and several forms of bait.  They sail out, using dozens of equipment combinations to catch as many fish of whatever variety they happen to come upon.  With no quota, they deliver whatever they can every fifteen minutes and sell it to the restaurant.  The variety, velocity and volume are largely undetermined; however, the catch could be used in helping determine future menu items, or fish replacements if the restaurant runs out of the status quo


  • And then you need to look at the lake itself (a bit more stretch in thinking on this one):
    • In an environment of controlled variety, velocity and volume, a restaurant could get away with a relatively small, pre-defined pond (feel free to call this a data warehouse), originally empty, then stocked, where over time the fish repopulate themselves in a nice little “ecosystem” of added value
    • That ecosystem would be less valuable for the Big Data menu, where scalability would be key: Significantly more variety would require a bigger pond, maybe even a lake (feel free to call this a data lake).  Repopulation would not create enough variety, so you would need to truck in different types of seafood from different oceans all over the world (creating a need for high throughput).  If you ran out of room in the current lake, you would lower the lock gates to the neighbouring lake to create a virtual larger lake, creating limitless scale-out potential


The traditional seafood menu is necessary, and will likely be used more often than not, but for the true sommelier of fish, one needs a Big Data menu.  The real question is how a seafood joint would blend those two menus together to provide value to both types of customers.


I still ate the chicken.


And since you found this blog so interesting…maybe you would want to learn more about Big Data here:


P.S. Cont’d from previous blog:  So this was the night before the “Big Wedding”, and our boredom got the best of us.  We picked up the laminated map that suggested we could make the trek in 45 minutes…the perfect distraction.  By the time we made it to the top of the knoll, we HAD to be at least halfway there, and if we had had a watch, and a real map, we would have known 30 minutes had passed and we were barely a quarter of the way through….