|
Using an additional table
In the following example the numeric column bmi (body mass index) will be categorized into the classes underweight, normal, overweight, and obese, according to the terminology and definitions of the CDC for categorizing the body mass index of adults BMI.
EXAMPLE DATABASE: NhanesDATA INPUTQuery adult_bmi in the database Nhanes.mdb is used as input and contains the continuous values for body mass index.
QUERY categorize1SELECT adult_bmi.rseqn, adult_bmi.fseqn, adult_bmi.ethn,
adult_bmi.sex, adult_bmi.age, adult_bmi.ageunit,
adult_bmi.weight, adult_bmi.height, adult_bmi.bmi,
bmi_cat_cdc.status
FROM adult_bmi, bmi_cat_cdc
WHERE adult_bmi.bmi >= bmi_cat_cdc.from_bmi AND
adult_bmi.bmi < bmi_cat_cdc.to_bmi;
Query categorize1 produces a table that applies the categorization to the continuous bmi values and adds the column status.
DISCUSSIONThis query joins two table not on the usual equality of column values in the two tables, but on the result of arithmetic comparisons. Note that the categories as defined in the table bmi_cat_cdc have shared endpoints. This possible ambiguity in classification is avoided by using the inclusive comparison operator (>=) for the lower endpoint, and the exclusive comparison operator (<) for the upper endpoint in the query. Care must be taken that the categories are not overlapping, as this might result in some records appearing multiple times in different classes. Note that the number of records resulting from the query categorize1 are 61 less that from the query adult_bmi. This is due to those records having missing bmi values, which results in the records being excluded by this type of query.
SELECT adult_bmi.rseqn, adult_bmi.fseqn, adult_bmi.ethn,
adult_bmi.sex, adult_bmi.age, adult_bmi.ageunit,
adult_bmi.weight, adult_bmi.height, adult_bmi.bmi,
bmi_cat_iupui.status
FROM adult_bmi, bmi_cat_iupui
WHERE adult_bmi.sex = bmi_cat_iupui.sex AND
adult_bmi.bmi >= bmi_cat_iupui.from_bmi AND
adult_bmi.bmi < bmi_cat_iupui.to_bmi;
The advantage of using this solution for categorizing continuous variables is that both the additional table as well as the query are not very complex.
Last modified September 23, 2008 8:45 am
|