ANSWER: Inconsistent and/or inaccurate data, plain and simple
Data consistency is one of the biggest benefits of enacting a data governance initiative. Only high-quality data can form the basis for sound business decisions and drive the company in the right direction. But be warned: automating data quality and accuracy checks to match the speed of business can breed misplaced trust when dealing with atypical or rare data assets – a phenomenon known as intelligent automation bias.
In today’s data-centric, transformation-driven business world, people at every level of the organization are bombarded with "information overload." Regardless of role or degree of control, everyone is urgently seeking greater control over, understanding of, and intelligence from their organization’s data. These topics were discussed with my good friend George Crump (Storage Switzerland) in a recent webinar (replay link) and in his subsequent blog post (blog link), but let me go a little deeper on this topic here.
One of the best solutions to this growing problem is to adopt an Intelligent Data Governance strategy, but in reality, many companies have been slow to do so. This could be due to a lack of knowledge of exactly what the strategy involves, the complexity of implementing a meaningful approach across the organization’s information estate, uncertainty about where to start, or doubts about how long it will take to see any benefit. It seems that too often Data Governance only becomes important when the proverbial “hand is caught in the cookie jar” – i.e., a violation of some kind has occurred, regulatory oversight is immediate, the associated fines are hefty, and reputation is at stake.
The truth: Intelligent Data Governance should be an integral part of your operation – not just to mitigate risk, but also to ensure that your data strategy and information flow are bound by the definitions of what it means for data to be complete and accurate within your workspace, your team, your department, the organization, and the industry you operate in.
The reality: There is an ever-widening distance between what is expected and the reality of effectively managing and controlling data. The driver behind this is partly the volume and complexity of data and the supporting technology infrastructure. The more nefarious problem is systemic – it is our automatic reliance on “intelligent systems” that creates an assumed-quality bias for data.
I am not advocating that we “rise up against the machines” and rid the data center of anything that claims to be “intelligence-based” – which happens to be almost everything today. Instead, I am proposing that we accept the degree to which data changes, and shifts in operating priorities, as the mandate to trigger data quality and control reviews. When these quality and control measures are not subjected to regular and unbiased checkups, you’ll begin to see the widening of the Data Governance Gap.
Figure 1: The Data Governance Gap
As shown in the above figure, that gap is the growing distance between what is expected of the person(s) responsible for the organization’s data strategy and the realities of daily data management and use. The scale is straightforward: data should first be measured on its referential value – that is, the likelihood that the data is going to be used and the degree to which it provides usefulness to the operation at hand. As you might imagine, referential value is dynamic and bound by the type of data in question, the activity being performed, the personal bias of the user, market disruptions, and much, much more.
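As a rough illustration – this is my own simplification, not a formula from the figure – referential value can be sketched as likelihood-of-use weighted by usefulness, modulated by a single context factor standing in for the dynamic influences listed above:

```python
def referential_value(p_use: float, usefulness: float, context_weight: float = 1.0) -> float:
    """Sketch of referential value: the likelihood the data will be used (0..1),
    weighted by how useful it is to the operation at hand (0..1).
    context_weight is a hypothetical modifier collapsing dynamic factors
    (data type, user bias, market disruption) into one number."""
    return max(0.0, min(1.0, p_use * usefulness * context_weight))

# A dataset almost certain to be queried (0.9) and highly useful (0.8):
print(round(referential_value(0.9, 0.8), 2))  # 0.72
```

The clamp to [0, 1] simply keeps the score on a bounded scale so assets remain comparable as context shifts.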
Regardless of what factors impact the referential value of data for you, the business at large fully expects that every data asset is understood, can be located, has a purpose and intent, and is bound to an authoritative role. That makes sense, especially with the onslaught of digital responsibility regulations hitting the market – the EU’s GDPR, the California Consumer Privacy Act, South Africa’s Protection of Personal Information Act, and more. In fact, 41 countries have their own variations of laws governing the protection of private and sensitive information, with their own unique fines and enforcement.
Figure 2: Closing the Data Governance Gap
Closing the data governance gap is not hard; in fact, the most challenging aspect is striking the right balance between automation and interactive oversight and control. Consider the following recommendations:
- Gain Visibility of the Data: As data is acquired, use dynamic data modeling automation to uncover the structure of the data, the file metadata, and any custom metadata. This is the first step to data awareness – not just of the data itself, but also of where it originated and who its custodian is. Use the automation to ensure data quality at the point where data is generated, in an intermediate staging area, or once it arrives at the central data hub. Of course, data quality is subjective, but in this case your Chief Data Officer (the person responsible for the organization’s data strategy and assets) would define the basic data elements that any file or object must contain. For example, all files must carry date/time stamps in a specific format, must be tagged when they contain sensitive data, must be stored in a specific format, etc.
- Understand the Data: Once centralized, further refinements can be made to the data to increase its referential value to the business. The key to success is centralization – either of the data assets themselves (the ideal situation) or of a collective index that captures the critical details of the data elements where they currently reside. Centralization affords the greatest degree of data awareness and valuation, making refinement, enrichment, blending, aggregation, cleansing, etc. easier to manage, but it is not strictly necessary with an efficient copy-data-management process (more on that at another time). The goal here is to deliver the most insightful data possible to the business in support of the tasks underway. Some degree of automation can be applied to this process, but the value that data provides will always fluctuate for any number of reasons beyond the common data demographics. Understanding the data allows the business to apply the appropriate degree of governance and control over it.
- Get Control of the Data: At this point, success is largely dependent upon data centralization. By placing data on a centralized data hub, organizations can use the value of the data and/or the risk it imposes on the organization as the guiding principles for how it should be managed and governed, how long it should be retained, the degree to which it can be mobilized, where it can reside, when it should be disposed of, and much more. Here is where the automation/interactive balance is imperative. Policies can be applied automatically with the right technology foundation, but they also require a commitment from staff to conduct regular reviews so the organization remains compliant with the specific regulations it operates under.
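To make the first recommendation concrete, here is a minimal sketch of an automated quality check at the point of ingestion. Everything here is hypothetical – the `AssetRecord` shape, the baseline rules (ISO-8601 timestamps, a sensitivity tag when content matches a sensitive pattern) stand in for whatever your Chief Data Officer actually defines:

```python
import re
from dataclasses import dataclass, field

# Hypothetical CDO-defined baseline: every asset must carry an ISO-8601
# timestamp, and must be tagged "sensitive" if its content looks sensitive.
ISO_TS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # one example sensitive-data pattern

@dataclass
class AssetRecord:
    name: str
    metadata: dict
    content_sample: str
    issues: list = field(default_factory=list)

def check_asset(asset: AssetRecord) -> AssetRecord:
    """Apply the baseline data-quality rules as the asset is acquired."""
    ts = asset.metadata.get("timestamp", "")
    if not ISO_TS.fullmatch(ts):
        asset.issues.append("timestamp missing or not ISO-8601")
    if SSN.search(asset.content_sample) and not asset.metadata.get("sensitive"):
        asset.issues.append("sensitive data present but not tagged")
    return asset

rec = check_asset(AssetRecord(
    name="orders.csv",
    metadata={"timestamp": "2024-01-15T09:30:00"},
    content_sample="customer,123-45-6789",
))
print(rec.issues)  # ['sensitive data present but not tagged']
```

The same check can run wherever the data first lands – at generation, in staging, or on arrival at the hub – because it depends only on the asset and the rule set, not on the location.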
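The second recommendation’s "collective index" alternative to physical centralization can be sketched as a simple catalog. The class and field names below are illustrative, not a real product API – the point is that each entry records where the asset resides, who its custodian is, and its referential value, so the business can locate and rank assets without moving them:

```python
from dataclasses import dataclass

@dataclass
class IndexEntry:
    asset_id: str
    location: str             # where the asset currently resides
    custodian: str            # authoritative role bound to the asset
    tags: set
    referential_value: float  # 0.0 (unlikely to be used) .. 1.0 (critical)

class DataCatalog:
    """A collective index of assets left in place (no physical centralization)."""
    def __init__(self):
        self._entries = {}

    def register(self, entry: IndexEntry):
        self._entries[entry.asset_id] = entry

    def find(self, tag: str, min_value: float = 0.0):
        """Locate matching assets, most valuable first."""
        hits = [e for e in self._entries.values()
                if tag in e.tags and e.referential_value >= min_value]
        return sorted(hits, key=lambda e: e.referential_value, reverse=True)

catalog = DataCatalog()
catalog.register(IndexEntry("a1", "s3://lake/orders", "sales-ops", {"pii", "orders"}, 0.9))
catalog.register(IndexEntry("a2", "nfs://archive/logs", "it-ops", {"logs"}, 0.2))
print([e.asset_id for e in catalog.find("pii")])  # ['a1']
```

Refinement, enrichment, and cleansing jobs can then be driven off the index, updating each entry’s referential value as the data’s usefulness to the business fluctuates.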
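Finally, the third recommendation – letting value and risk drive how data is managed – can be sketched as an automated policy mapping. The thresholds and retention periods below are placeholders for whatever your regulations and review cadence actually require; the automation applies the policy, while staff review the thresholds themselves:

```python
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    value: float  # referential value, 0..1
    risk: float   # regulatory/exposure risk, 0..1

@dataclass
class Policy:
    retention_days: int
    may_leave_region: bool
    review: str

def govern(asset: Asset) -> Policy:
    """Map value and risk to a lifecycle policy (thresholds are illustrative)."""
    if asset.risk >= 0.7:
        # High-risk data: retained for audit, locked to its region, reviewed often
        return Policy(retention_days=2555, may_leave_region=False, review="quarterly")
    if asset.value >= 0.5:
        # Valuable, lower-risk data: kept longer and free to move
        return Policy(retention_days=1095, may_leave_region=True, review="annual")
    # Low value, low risk: disposed of quickly
    return Policy(retention_days=90, may_leave_region=True, review="annual")

print(govern(Asset("customer_pii.parquet", value=0.8, risk=0.9)).retention_days)  # 2555
```

The automatic part is the `govern` mapping; the interactive part is the scheduled review of its thresholds against current regulation – the balance the section argues for.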