At HDS and just about every tech company across the globe, the role of Data Scientist has reached fame-like status. It has been called "the sexiest job of the 21st century," among other things.
Ashok Nirsoe is a Data Scientist on the Social Innovation and Global Industries team at HDS. He started his 15-year Data Science career at Liberty Media as an instrumental member of their BI team. While he learned the basics at Liberty Media, it was his time at Shell where he learned to apply them. With a BA in Software Engineering, Ashok was always interested in numbers and intrigued by the business challenges Data Scientists are tasked with solving on a day-to-day basis. With no one-size-fits-all model out there, there is always a challenge in solving business-specific use cases.
What Does a Data Scientist Do?
Many people understand that Data Scientists solve business problems by making sense of big data, but the question becomes: "What exactly do these Data Scientists do on a day-to-day basis?"
I tracked down Ashok Nirsoe to find out what fills a Data Scientist's day. Read on to get a sense of what keeps him occupied.
06:30AM – Alarm goes off, but his biological clock woke him up at 06:15AM – he's getting ready to get into the groove!
06:45AM – Of course the daily routine kicks in (Shower, brushing teeth, etc.)
07:30AM – The daily commute begins. Oh, the joys of living in Silicon Valley!
08:00AM – Arrives in the office. First things first: reads emails and catches up on the latest technology news (what's hot today and what's not), including news from back home in the Netherlands (Ashok is a Dutch native!)
08:30AM – Contemplates ideas to predict workload on VSP/G1000/G200 using performance data
09:00AM – Answers emails and joins daily calls with potential Hitachi Live Insight for IT Operations POCs, including demos, all while going through Spark documentation
11:00AM – Starts playing with newly deployed Hadoop cluster and primarily focuses on data ingestion and movement as part of Hitachi Live Insight for IT Operations
11:30AM – Clones the Hadoop cluster for a colleague and tries to raise awareness of Spark within the team
11:45AM – Enjoys lunch with colleagues in the Hicafe; eats a sandwich and the soup of the day while working on his tan in the California sunshine. Sadly, his natural tan has started to fade since the move from NL…
12:30PM – Cloning the Hadoop cluster is done and he hands it over to his colleague. She wants to try Spark… another win for the pro-Spark fan base
1:00PM – Samples statistics tools in the lab, primarily focusing on Logistic Regression. Runs a batch over 7 days of VSP input data
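For readers curious what a logistic-regression pass actually does, here is a miniature sketch in Python. This is purely illustrative — it is not Ashok's tooling, and the utilization values and labels are made up — but it shows the core idea: fit weights so a sigmoid separates "busy" from "idle" samples.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by gradient descent on log-loss (1-D inputs)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction error for this sample
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Hypothetical stand-in for performance data: label 1 when utilization > 0.5
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(xs, ys)

def predict(x):
    return sigmoid(w * x + b) > 0.5
```

In practice one would reach for a library rather than hand-rolled gradient descent, but the loop above is the whole algorithm.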
2:30PM – Dedicates some time to POC reporting and open issues. This includes general housekeeping, automation and providing answers on some analysis questions
3:00PM – Statistical batch finishes; analyzes the output to determine its accuracy
3:15PM – Not happy with the batch output, so tweaks parameters to optimize the results
3:30PM – Syncs up with colleague on power consumption use case using Hitachi sensors (temperature, humidity) attached to a Hitachi Unified Compute Platform (UCP) 19" rack with smart PDU (Power Distribution Unit)
3:45PM – Sets up data collection, agreeing on a 5-minute interval for now, using IPMI (Intelligent Platform Management Interface) data obtained from VMware and the smart PDU
4:00PM – Focuses on automated data collection. The statistical batch has finished, but he's still not happy with the output, as it's not usable in its current format. Maybe a different approach is required, such as PCA (Principal Component Analysis), CCA (Canonical Correlation Analysis), or KCCA (Kernel Canonical Correlation Analysis)? Needs to sync up with the team
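For those who haven't met PCA: it projects the data onto the directions of greatest variance, which can compress many correlated metrics into a few informative components. A minimal 2-D sketch in Python (illustrative only — the toy points below are invented, not VSP data):

```python
import math

def pca_first_component(points):
    """First principal component of 2-D data via the covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Matching eigenvector: (b, lam - a), then normalize
    if b == 0:
        return (1.0, 0.0) if a >= c else (0.0, 1.0)
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Two toy metrics that rise together: the component should point along y = x
pts = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]
vx, vy = pca_first_component(pts)
```

Real workloads have many more dimensions, where the same idea is applied via an eigendecomposition (or SVD) of the full covariance matrix.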
4:05PM – Discusses data pipelines in AWS (Amazon Web Services) with friends and ex-colleagues on Lync
5:00PM – Because parsing storage data is a pain, reviews options to make it faster without losing accuracy. A native API would be the best solution, but he knows a certain team member will not like the idea.
6:00PM – Time to focus on a high-priority POC. Next step: converts all data acquisition tools to PowerShell, as the customer wants to keep the number of third-party tools to a bare minimum. PowerShell to the rescue! Fights with PowerShell to extract tuples from JSON strings. End result: finds out that a BOM character was killing his regular expression
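That BOM gotcha is easy to reproduce. Here is a Python reconstruction of the failure mode (Ashok's actual scripts were PowerShell, and the sample JSON below is made up): a UTF-8 BOM at the start of the text silently defeats a regex anchored at the beginning of the string, and trips up the JSON parser too.

```python
import json
import re

# A BOM (U+FEFF) sneaks in at the front of a tool's text output
raw = "\ufeff" + '{"device": "VSP", "iops": 1200}'

# The anchored regex no longer matches: the invisible BOM is character zero
assert re.match(r"\{", raw) is None

# Fix: strip the BOM before parsing (or, when reading bytes,
# decode with the 'utf-8-sig' codec, which drops a leading BOM)
clean = raw.lstrip("\ufeff")
record = json.loads(clean)
```

The character is invisible in most editors, which is exactly why bugs like this eat an hour of anyone's evening.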
7:00PM – Tests PowerShell scripts in lab prior to sharing with team
7:30PM – Dinner time
8:30PM – Replies to emails and sends updates to colleagues in EMEA & APAC
9:00PM - Works on documentation
11:30PM – Signs off, another day gone by so quickly
Have any questions for Ashok Nirsoe? If so, send them my way!