Data quality can be described as “fitness for use or purpose for a given context or specific task at hand” (Mahanti, 2018). But since the same data can be used by many people for a multitude of purposes, maintaining data quality over the lifetime of a data asset can be a challenge. To ensure data quality, you have to do two things simultaneously, and do them well: guard against negative attributes (incorrect, inconsistent, or unreliable data) and promote positive attributes (e.g. unambiguous, contextual, understandable data).
The quality of every element of data can be important. Data Quality Dimensions (DQDs) are one tool that can help you draw out the requirements and characteristics of data that make it fit for use or purpose. The DQDs can be used at the file or table level, at the column or variable level, and at the row or observation level. The most frequently used DQDs are completeness, consistency, uniqueness, validity, accuracy, and timeliness.
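To make those levels concrete, here is a minimal sketch, assuming pandas and a hypothetical customer table, that measures a single dimension (completeness) at the table, column, and row level; the column names and values are purely illustrative.

    import pandas as pd

    # Hypothetical customer table; the columns and values are purely illustrative.
    df = pd.DataFrame({
        "customer_id": [101, 102, 103, 104],
        "email": ["a@x.com", None, "c@x.com", "d@x.com"],
        "signup_date": ["2021-01-05", "2021-02-11", None, "2021-03-02"],
    })

    # Table level: share of non-null cells across the whole table.
    table_completeness = df.notna().to_numpy().mean()

    # Column level: share of non-null values in each column.
    column_completeness = df.notna().mean()

    # Row level: flag observations that are missing any field.
    row_is_complete = df.notna().all(axis=1)

    print(f"table completeness: {table_completeness:.0%}")
    print(column_completeness)
    print(df[~row_is_complete])   # the rows that fail the completeness check

The same pattern applies to other dimensions: compute a metric at whichever level matters for the task at hand, rather than a single table-wide score.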
There's also a loose priority ordering of the DQDs. For example, completeness is often necessary before you can worry about consistency with other data sources; consistency is needed before you can effectively check for uniqueness, otherwise you might interpret duplicate values as individually significant; and so on down the line. And not every DQD will be important for evaluating the quality of a particular data element.
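As a rough illustration of that ordering, the sketch below (again assuming pandas, with hypothetical orders and CRM extracts and an arbitrary 95% threshold) runs completeness, consistency, and uniqueness checks in sequence, with the earlier results gating how the later ones are interpreted.

    import pandas as pd

    # Hypothetical extracts: an orders table and a CRM table used as a second source.
    orders = pd.DataFrame({
        "order_id": [1, 2, 3, 3],
        "customer_id": [101, 102, None, 104],
    })
    crm = pd.DataFrame({"customer_id": [101, 102, 104]})

    # 1. Completeness: how much of the key field is populated at all.
    completeness = orders["customer_id"].notna().mean()
    if completeness < 0.95:   # illustrative threshold
        print(f"Warning: only {completeness:.0%} of customer_id is populated; "
              "later checks may under-count problems.")

    # 2. Consistency: populated keys should also exist in the other source.
    populated = orders.dropna(subset=["customer_id"])
    consistency = populated["customer_id"].isin(crm["customer_id"]).mean()

    # 3. Uniqueness: only meaningful once completeness and consistency look sound,
    #    otherwise duplicates and gaps are easy to confuse.
    duplicate_order_ids = orders["order_id"].duplicated().sum()

    print(f"consistency: {consistency:.0%}, duplicate order_ids: {duplicate_order_ids}")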
These six aren’t the only DQDs that organizations can use, and not all data quality dimensions will be important to all organizations at all times. The Baldrige Excellence Framework (BEF), for example, only calls out accuracy, validity, integrity, reliability, and currency. (NIST, 2019) While some dimensions can only be assessed subjectively or from a relative perspective (accessibility, believability, interpretability, ease of manipulation), others can be evaluated objectively (accuracy, currency, volatility, precision).
Data quality should be evaluated objectively if at all possible. For example, completeness can be measured based on the characteristics of the database (e.g. schema completeness), and accuracy can be assessed by comparing stored values to known references. Some of the dimensions are also closely related to one another. For example, being able to trace the provenance of data can have an impact on its believability and trustworthiness. Data may be more interpretable if it is concise and has adequate coverage. Availability may be supported by redundancy, since if the data cannot be obtained from one source it may be sourced from another. (Radziwill, 2020)
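A minimal sketch of two such objective measurements, assuming pandas and an illustrative expected schema and reference table, might look like this:

    import pandas as pd

    # Hypothetical expected schema and stored table.
    expected_columns = {"customer_id", "email", "country", "signup_date"}
    stored = pd.DataFrame({
        "customer_id": [101, 102, 103],
        "email": ["a@x.com", "b@x.com", "c@x.com"],
        "country": ["US", "CA", "UK"],
    })

    # Schema completeness: share of expected columns actually present in the table.
    schema_completeness = len(expected_columns & set(stored.columns)) / len(expected_columns)

    # Accuracy: compare stored values to a trusted reference for the same keys.
    reference = pd.DataFrame({
        "customer_id": [101, 102, 103],
        "country": ["US", "CA", "GB"],   # the reference says GB, the table says UK
    })
    merged = stored.merge(reference, on="customer_id", suffixes=("", "_ref"))
    country_accuracy = (merged["country"] == merged["country_ref"]).mean()

    print(f"schema completeness: {schema_completeness:.0%}, "
          f"country accuracy: {country_accuracy:.0%}")

Subjective dimensions such as believability or interpretability still matter, but they are better handled through review and documentation than through a single computed score.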
Sound complicated? Yes, it can be. This is one of the many reasons why purchasing enterprise software products to “take care of” data quality can ultimately be disappointing. There’s no substitute for rigorous reflection on your data by someone who understands it deeply and is invested in its integrity.
Every single DQD can be instrumental in helping you evaluate data quality, but your analysts have to figure out which ones, at which times, and why.
The most important data quality dimension is, as a result, all of them.
Ultranauts helps companies establish and continually improve data quality through efficient, effective data governance frameworks and other aspects of quality management for data, specializing in high-impact data value audits. If you need to design quality into your data management practices, Ultranauts can quickly help you identify opportunities for improvement to drive value, reduce costs, and increase impact.
Additional Reading:
Mahanti, R. (2018). Data Quality: Dimensions, Measurement, Strategy, Management, and Governance. Quality Press, Milwaukee, WI. 498 pp.
National Institute of Standards and Technology (NIST). (2019). Baldrige Excellence Framework (Business/Nonprofit): Proven leadership and management practices for high performance. Available from https://www.nist.gov/baldrige/publications/baldrige-excellence-framework/businessnonprofit
Radziwill, N. (2020). Connected, Intelligent, Automated: The Definitive Guide to Digital Transformation and Quality 4.0. Quality Press. Available from Amazon.