Creating a new variable in a data frame


The following example shows how to create a new variable in a data frame. Previously discussed R commands are in black; the new R commands relevant to this example are in blue.

DATA INPUT

As the data source, we use the sheet named data in the spreadsheet file nhanes.xls. The image below shows some of the records in this sheet. The first row of the sheet contains the column names, and there are no empty rows or columns interspersed in the data.



R OUTPUT

The image below shows the R console after the execution of the script, with R commands in red and R output in blue. The last two statements serve as a check that the new variable was created correctly.



R SCRIPT

library(RODBC);
connect = odbcConnectExcel("..\\spreadsheet\\nhanes.xls");
query = "SELECT * FROM [data$];"
data = sqlQuery(connect, query);
odbcClose(connect);

is.data.frame(data);
summary(data);

# create subset of data for adults (>= 20 years) with non-missing data
adultbmi = subset(data, ageunit=="Y" & age >= 20 & 
                        !is.na(ethn) & !is.na(sex) & 
                        !is.na(weight) & !is.na(height));

is.data.frame(adultbmi);
summary(adultbmi);

# create new variable bmi
adultbmi$bmi = adultbmi$weight / (adultbmi$height/100)^2;

is.data.frame(adultbmi);
summary(adultbmi);


DISCUSSION

adultbmi$weight / (adultbmi$height/100)^2

This expression uses the columns weight and height of the data frame adultbmi. It applies the formula for body mass index, which is weight in kilograms divided by the square of height in meters, and the script assigns the result to the new column bmi in adultbmi. Because the original height is in centimeters, it is divided by 100 to convert it to meters.
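
The calculation can be tried on a small scale without the spreadsheet. The sketch below uses a made-up data frame; the name people and its values are hypothetical and not taken from nhanes.xls, and weight is assumed to be in kilograms.

```r
# Hypothetical toy data; the column names mirror those used in the script.
people = data.frame(weight = c(70, 85, 54),      # assumed to be kilograms
                    height = c(175, 180, 160));  # centimeters

# Same expression as in the script: convert cm to m, then apply the formula.
people$bmi = people$weight / (people$height/100)^2;

print(round(people$bmi, 1));
# [1] 22.9 26.2 21.1
```

The division and the exponent are applied element by element, so each row gets its own bmi value and no explicit loop is needed.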


is.data.frame(adultbmi);

This command is not required, but serves as a check that adultbmi is still a data frame after the new variable has been added.


summary(adultbmi);

Creates summary statistics for the data frame. This command is not required, but serves as a check that the new variable bmi has been created correctly.
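
As a rough illustration of what these checks can reveal, the sketch below uses a made-up data frame with one missing weight (the name toy and its values are hypothetical, not taken from nhanes.xls). A missing input carries over into the new variable, which is one reason the script subsets to records with non-missing weight and height before computing bmi.

```r
# Hypothetical toy data with one missing weight.
toy = data.frame(weight = c(70, NA, 54),
                 height = c(175, 180, 160));

toy$bmi = toy$weight / (toy$height/100)^2;

is.data.frame(toy);      # still a data frame after adding the column
"bmi" %in% names(toy);   # the new column is present
summary(toy$bmi);        # the summary of a numeric column also counts NA's
sum(is.na(toy$bmi));     # number of records without a bmi value
```

Calling summary on a single column, as done here, keeps the output short when only the new variable is of interest.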


More details on parameters and options can be accessed from the R command line.

?is.data.frame
?summary


Last modified April 3, 2008 8:53 am