Using codes for counting the missing (NULL) values


Counting the number of NULL records can be considered as a part of the data quality check. Since NULL stands for missing or unknown value, tallying the number of NULL data gives an overview of the experimental observations.


EXAMPLE DATABASE: Nhanes

DATA INPUT

Table nhanes in the database Nhanes.mdb is used as input.



QUERY count_missing

SELECT ethn, sex, 
       SUM(IIF(height IS NULL, 1, 0)) AS missing_height, 
       SUM(IIF(weight IS NULL, 1, 0)) AS missing_weight
FROM nhanes
GROUP BY ethn, sex
ORDER BY ethn, sex;

Query count_missing tabulates the number of all the missing data per ethnicity and sex.



DISCUSSION

The count_missing query works in a way that the missing heights and weights are coded first. The codes will be 1 for missing values and 0 for non-missing values. The query uses the IIF( ) function to implement the codes. It has three parameters: (<condition>, <return value if condition is true>, <return value if condition is false>).

SELECT ethn, sex, 
       IIF(height is null, 1, 0) AS missing_height, 
       IIF(weight is null, 1, 0) AS missing_weight
FROM nhanes;

In the query, IIF(height is null, 1, 0) statement means that, if the height is null, assign 1 as the value for missing_height, otherwise, assign 0 as the value. The same principle applies to the missing_weight column.



The next step is to add up all the missing heights and weights. Since the missing values are already coded, the SUM( ) function can be used to calculate the total number of missing height and weight values grouped by ethnicity and sex. The result is also sorted per ethnicity and sex in ascending order.

SELECT ethn, sex, 
       SUM(IIF(height IS NULL, 1, 0)) AS missing_height, 
       SUM(IIF(weight IS NULL, 1, 0)) AS missing_weight
FROM nhanes
GROUP BY ethn, sex
ORDER BY ethn, sex;


Last modified September 23, 2008 9:16 am