Creating new columns


Frequently the data stored in columns in a table need to be used to create new columns of derived data. This could be as simple as row-wise adding up the values in two columns and producing a third column that contains the sums. Such derived variates need not be stored in a database table, but they can be generated on-the-fly by a query.

In the following example the columns weight and height are used to calculate the body mass index as a new column. The example uses the terminology and definitions of the CDC for calculating the body mass index BMI.


EXAMPLE DATABASE: Nhanes

DATA INPUT

Table nhanes in the database Nhanes.mdb is used as input.



QUERY adult_bmi

SELECT rseqn, fseqn, ethn, sex, age, ageunit, weight, height,
       ROUND(weight/((height/100)*(height/100)),2) AS bmi
FROM nhanes
WHERE ageunit = "Y" AND age >= 20;

Query adult_bmi produces a table has all the columns from table nhanes as well as an added column bmi that contains the body mass index. The bmi values have been rounded to two decimal places, and calculated according to the following formula:


\mathrm{BMI} = \frac{\mathit{weight} \ \mathrm{(kg)}}{\mathit{height}^2 (\mathrm{m^2})}



DISCUSSION

A new column can be generated for each row of data, based on the data values of other columns in the row. The above example uses the weight and the height of each individual (=row) to calculate the corresponding body mass index according to the formula given above. The result of the calculation is given a new column name (AS bmi). More than one new column can be generated in a query and the expression can involve different column types, not only numeric data.

As the height is stored in units of cm, it needs to be converted into units of m, when used in the bmi formula. The formula can be simplified to: round(weight/(height*height*0.0001),2).

The query is only applied to adults of 20 years or above, which is achieved by selecting only rows that have an ageunit in years (ageunit = "Y") and where the age is equal or greater than 20 (age >= 20). The resulting number of records is slightly more than 50% of the original number of all records.

Note that some of the resulting records may have a missing value (null) for body mass index. This happens if either or both of weight or height for that record were missing (null). Subsequent uses of these data need to be aware of possible missing values for bmi.


Last modified September 23, 2008 7:24 am