With so many ways to quickly incorporate artificial intelligence (AI) and machine learning (ML) into business operations, recent years have seen a “gold rush” as companies clamor to demonstrate their sparkliest AI/ML capabilities. But a new realization is beginning to creep in: the quality of the data feeding our models (and of the infrastructure supporting them) is just as important as the models themselves, and maybe more important.
Industry pioneer Andrew Ng even encouraged a shift towards “data-centric AI” - that is, focusing more on data quality and the integrity of the infrastructure - to help companies unlock the (still latent) value of their AI/ML investments (Radziwill 2022). In many cases, those investments still lack the returns that were hoped for just a couple of years ago, and they have left behind a data management ecosystem littered with partially implemented products, disappointing documentation, and staff who tend to move to their next opportunity in just 12 to 18 months on average.
Some experts claim that the solution is data contracts, which require data consumers and producers to agree on exactly what formats, semantics, and characteristics are accessible. And indeed, data contracts are part of the solution.
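As a minimal sketch of what such an agreement might record, a contract can be written down as a small declarative structure that both parties can read and check against. The field names, types, and the emissions example below are hypothetical, chosen only for illustration; they do not come from any particular tool or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One agreed-upon field: its name, type, meaning, and nullability."""
    name: str
    dtype: type
    meaning: str            # the agreed semantics, in plain language
    nullable: bool = False


# Hypothetical contract for an emissions feed (illustrative only).
EMISSIONS_CONTRACT = (
    FieldSpec("company_id", str, "Internal identifier of the reporting company"),
    FieldSpec("report_year", int, "Calendar year the figures refer to"),
    FieldSpec("co2_tonnes", float, "Reported CO2 emissions, metric tonnes"),
    FieldSpec("auditor", str, "Name of the third-party auditor", nullable=True),
)


def schema_mismatches(contract, offered_schema):
    """Compare what a producer offers (name -> type) with the contract.

    Returns a list of human-readable mismatches; an empty list means the
    offered schema satisfies the contract.
    """
    problems = []
    for spec in contract:
        if spec.name not in offered_schema:
            problems.append(f"missing field: {spec.name}")
        elif offered_schema[spec.name] is not spec.dtype:
            problems.append(
                f"{spec.name}: expected {spec.dtype.__name__}, "
                f"got {offered_schema[spec.name].__name__}"
            )
    return problems


# A producer's proposed schema that disagrees with the contract in two places.
offered = {"company_id": str, "report_year": str, "co2_tonnes": float}
print(schema_mismatches(EMISSIONS_CONTRACT, offered))
```

Keeping the meaning of each field next to its type is a deliberate choice here: format mismatches are easy to detect automatically, while semantic mismatches usually surface only when the agreed definition is written down somewhere both parties can see it.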
But the relationships between data consumers and producers aren’t always one to one, and in some cases, the data producers may even be completely inaccessible. In these cases, data consumers might have access to APIs, or just to data assets that have been prepared for download. Consider, for instance, environmental, social, and governance (ESG) data, the bulk of which might be sourced from vendors or from parts of your organization that are so far removed from you that you can’t track down anyone who knows about those sources.
How can you create a data contract when you can’t find a person or a party to contract with?
The answer is: implement a one-way contract... and make it visible and transparent. This is a well-established concept in supply chain management, where acceptance criteria are established to assess incoming shipments (often as they are received at loading docks), and supplier qualification criteria (and audits) are used to select and retain vendors. Supplier qualification asks questions like: Does this supplier have processes in place to ensure the quality of the products they produce? Do they follow them rigorously? How do we know?
As with so many things, change management and governance “make or break” the success of data contracts. Making these contracts visible and transparent will help people understand whether a particular data source will align with their intentions and meet their needs. Like other data contracts, one-way contracts can be used to drive or automate acceptance testing as data arrives in designated storage locations. Frameworks like the Python-based Great Expectations can help you implement data contracts more efficiently.
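To make the idea of automated acceptance testing concrete, here is a minimal sketch of a one-way contract applied to an incoming batch, in the spirit of inspecting a shipment at the loading dock. It uses plain Python rather than the Great Expectations API, and every name in it (the criteria, the field names, the 5% rejection threshold) is an assumption made for illustration.

```python
def _is_number(value):
    """True for int or float, but not bool (which Python treats as an int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Acceptance criteria defined by the consumer alone (hypothetical values).
ACCEPTANCE_CRITERIA = {
    "company_id": lambda v: isinstance(v, str) and v.strip() != "",
    "report_year": lambda v: isinstance(v, int)
    and not isinstance(v, bool)
    and 1990 <= v <= 2100,
    "co2_tonnes": lambda v: _is_number(v) and v >= 0,
}


def inspect_batch(records, criteria=ACCEPTANCE_CRITERIA, max_reject_rate=0.05):
    """Check every record against the criteria and decide on the whole batch.

    A record is rejected if any required field is missing or fails its check.
    The batch is accepted only if the share of rejected records stays at or
    below max_reject_rate. An empty batch has a reject rate of 0.0.
    """
    failures = []
    for index, record in enumerate(records):
        for field, is_acceptable in criteria.items():
            if field not in record or not is_acceptable(record[field]):
                failures.append((index, field))
    rejected_rows = {index for index, _ in failures}
    reject_rate = len(rejected_rows) / len(records) if records else 0.0
    return {
        "accepted": reject_rate <= max_reject_rate,
        "reject_rate": reject_rate,
        "failures": failures,
    }


# Example batch with three problem records (made-up data).
batch = [
    {"company_id": "ACME", "report_year": 2021, "co2_tonnes": 1250.5},
    {"company_id": "", "report_year": 2021, "co2_tonnes": 300.0},
    {"company_id": "GLOBEX", "report_year": 2021, "co2_tonnes": -4.0},
    {"company_id": "INITECH", "report_year": 2022},
]
result = inspect_batch(batch)
print(result["accepted"], result["reject_rate"], result["failures"])
```

Publishing the criteria and the resulting report alongside the data is what makes the contract visible: anyone considering the source can see what was checked and how often it failed, even though the producer never signed anything.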
Whatever situation you find yourself in, be sure to identify the most fundamental data quality standards for the data you collect and keep, and apply them diligently. Without quality controls like data contracts, you’re accepting a future that’s likely to involve a data swamp.
Ultranauts helps companies establish and continually improve data quality through efficient, effective data governance frameworks and other aspects of data quality management systems (DQMS), especially high impact data value audits. If you need to improve the quality of data or analytics in your organization, Ultranauts can also quickly help you identify opportunities for improvement to drive value, reduce costs, and increase revenue.
Additional Reading:
Radziwill, N. (2022, July 28). The Key to Unleashing The Value of AI Systems? It’s Not More Data. Ultranauts Blog. Available from https://info.ultranauts.co/blog/the-key-to-unleashing-the-value-of-ai-systems