Plotting a histogram


The following example shows how to plot a histogram. Previously discussed R commands are in black; the new R commands relevant to this example are in blue.

DATA INPUT

As the data source, we use the sheet "data" from the spreadsheet file nhanes.xls. The image below shows some of the records in this sheet. The first row of the sheet contains the column names, and there are no empty rows or columns interspersed in the data.



R OUTPUT

The image below shows the R console after the execution of the script, with R commands in red and R output in blue. The graphical output appears in a separate window, from which it can be exported and saved in different formats.
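A plot can also be written to a file directly from a script, without exporting it from the plot window. The following is a minimal sketch using base R's pdf() device. png() and similar devices work the same way where they are available. The vector x and the file name bmi_histogram.pdf are hypothetical stand-ins.

```r
# Hypothetical BMI values standing in for adultbmi$bmi
x <- c(18.5, 21.3, 22.8, 24.1, 25.6, 27.0, 29.4, 31.2, 33.8, 36.5)

# Open a file-based graphics device; plots now go to this file, not a window
out_file <- file.path(tempdir(), "bmi_histogram.pdf")
pdf(out_file, width = 7, height = 5)   # page size in inches

hist(x, col = grey(0.9), border = grey(0.2), xlab = "BMI")

# Close the device so the file is completed on disk
dev.off()
```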



R SCRIPT


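# read the sheet "data" from nhanes.xls via ODBC
# (odbcConnectExcel is Windows-only; it uses the Microsoft Excel ODBC driver)
library(RODBC);
connect = odbcConnectExcel("..\\spreadsheet\\nhanes.xls");
query = "SELECT * FROM [data$];"
data = sqlQuery(connect, query);
odbcClose(connect);

# check that the result is a data frame and inspect its columns
is.data.frame(data);
summary(data);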

# create subset of data for adults (>= 20 years) with non-missing data
adultbmi = subset(data, ageunit=="Y" & age >= 20 & 
                        !is.na(ethn) & !is.na(sex) & 
                        !is.na(weight) & !is.na(height));

is.data.frame(adultbmi);
summary(adultbmi);

# create new variable bmi = weight / (height in m)^2; height is recorded in cm
adultbmi$bmi = adultbmi$weight / (adultbmi$height/100)^2;
 
is.data.frame(adultbmi);
summary(adultbmi);

# plot the histogram of bmi on the density scale
hist(adultbmi$bmi, breaks = 25, 
                   col = grey(0.9), 
                   border = grey(0.2), 
                   proba = TRUE, 
                   main = paste("Body Mass Index of", 
                                length(adultbmi$bmi), 
                                "adults\nfitted to normal distribution"),
                   xlab = "BMI");
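The title mentions a fitted normal distribution, but hist() by itself draws only the bars. One way to overlay a normal curve is base R's curve() with dnorm(), using the sample mean and standard deviation. This works because the histogram is on the density scale. The spreadsheet is not reproduced here, so a small hypothetical data frame with the script's column names stands in for data. Its values are made up, and the units (kg, cm) are assumptions consistent with the BMI formula.

```r
# Hypothetical records using the column names from the script (values made up)
data <- data.frame(
  ageunit = c("Y", "Y", "Y", "M", "Y", "Y", "Y", "Y"),
  age     = c(34, 52, 19, 10, 45, 67, 28, 41),
  ethn    = c(1, 2, 1, 3, NA, 2, 3, 1),
  sex     = c("M", "F", "F", "M", "M", "F", "M", "F"),
  weight  = c(82.0, 64.5, 58.0, 9.1, 90.2, 71.3, 77.8, NA),  # kg (assumed)
  height  = c(178, 162, 165, 74, 181, 158, 175, 170)         # cm (assumed)
)

# Same subset and BMI computation as in the script
adultbmi <- subset(data, ageunit == "Y" & age >= 20 &
                         !is.na(ethn) & !is.na(sex) &
                         !is.na(weight) & !is.na(height))
adultbmi$bmi <- adultbmi$weight / (adultbmi$height / 100)^2

hist(adultbmi$bmi, breaks = 25, col = grey(0.9), border = grey(0.2),
     probability = TRUE, xlab = "BMI",
     main = paste("Body Mass Index of", length(adultbmi$bmi),
                  "adults\nfitted to normal distribution"))

# Overlay a normal density with the sample mean and standard deviation
m <- mean(adultbmi$bmi)
s <- sd(adultbmi$bmi)
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)
```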


DISCUSSION

hist(variable, parameters);

This function takes a numeric vector (here, the column bmi of a data frame) and plots a histogram according to the specified parameters.
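In addition to drawing, hist() returns an object of class "histogram" that holds the breakpoints, counts and densities it computed. With plot = FALSE, it computes these values without drawing anything. The following is a short sketch with hypothetical values:

```r
x <- c(18.5, 21.3, 22.8, 24.1, 25.6, 27.0, 29.4, 31.2, 33.8, 36.5)  # hypothetical
h <- hist(x, plot = FALSE)

h$breaks   # bin boundaries
h$counts   # number of values in each bin
h$mids     # bin midpoints
```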


breaks = 25

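Suggest about 25 equal-width bins over the range of the variable (bmi) values. A single number is treated as a suggestion only: R chooses "pretty" breakpoints, so the actual number of bins may differ slightly.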
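The following sketch shows the difference between suggesting a bin count and fixing it exactly. Passing a vector of breakpoints instead of a single number gives exactly the requested bins. The simulated values are hypothetical.

```r
set.seed(1)
x <- rnorm(500, mean = 27, sd = 5)   # hypothetical BMI-like values

# A single number is only a suggestion; R rounds to "pretty" breakpoints
h1 <- hist(x, breaks = 25, plot = FALSE)
length(h1$counts)   # number of bins actually used, not necessarily 25

# A vector of 26 breakpoints fixes exactly 25 equal-width bins
edges <- seq(floor(min(x)), ceiling(max(x)), length.out = 26)
h2 <- hist(x, breaks = edges, plot = FALSE)
length(h2$counts)
```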


col = grey(0.9)

Fill the bars with grey level 0.9, a light grey (grey(0) is black and grey(1) is white).


border = grey(0.2)

Draw the bar borders with grey level 0.2, a dark grey.
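grey() (also spelled gray()) converts a level between 0 and 1 into an ordinary R colour string. Its output can therefore be inspected or reused wherever a colour is expected. The following is a brief sketch:

```r
# grey(level): 0 = black, 1 = white; returns a hexadecimal colour string
grey(0.9)              # light grey used for the bars
grey(0.2)              # dark grey used for the borders
col2rgb(grey(0.9))     # red, green and blue components on a 0-255 scale
```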


proba = TRUE

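Plot probability densities on the Y-axis instead of the absolute number of observed cases, so that the total area of the histogram is 1. The name proba is a partial match for the hist() argument probability; freq = FALSE has the same effect.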
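The density of each bar is its count divided by the number of values times the bin width. Multiplying each density by its bin width and summing therefore gives 1. A small check with hypothetical values:

```r
x <- c(18.5, 21.3, 22.8, 24.1, 25.6, 27.0, 29.4, 31.2, 33.8, 36.5)  # hypothetical
h <- hist(x, plot = FALSE)

# density = counts / (n * bin width), so the bar areas sum to 1
widths <- diff(h$breaks)
area <- sum(h$density * widths)
area
```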


main = "string"

Assign a string to the main title of the chart.


paste(string1, string2, ...)

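Concatenate two or more values into a single string, separated by a space by default (sep = " "). Numbers are converted to text automatically. This is used to build the main title from fixed text and a computed count.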
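The following sketch shows how paste() joins its arguments. The count n is hypothetical. paste0() is the variant that uses no separator.

```r
n <- 123   # hypothetical number of adults
title <- paste("Body Mass Index of", n, "adults")
title                 # "Body Mass Index of 123 adults"
paste0("BMI_", n)     # no separator: "BMI_123"
```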


\n

A newline character within a string. This is used to force a line break in the main title of the chart.
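The escape sequence \n is a single character inside the string. It only takes effect as a line break when the string is displayed, for example by cat() or in a plot title. A short sketch:

```r
title <- "Body Mass Index of 123 adults\nfitted to normal distribution"
nchar("a\nb")                # 3: "\n" counts as one character
cat(title, "\n")             # prints the title on two lines
strsplit(title, "\n")[[1]]   # the two lines as separate strings
```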


length(adultbmi$bmi)

Returns the number of elements in the column bmi, with missing values included. This is used to insert the number of analyzed records into the main title of the chart automatically.
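length() counts every element, including missing values (NA). In this script, the subset step drops records with missing weight or height, so every bmi value is present and length() equals the number of analyzed adults. For data that may still contain NA, sum(!is.na(x)) counts only the non-missing values. The vector below is hypothetical.

```r
bmi <- c(24.6, NA, 28.6, 25.4)   # hypothetical, with one missing value
length(bmi)            # counts every element, NA included
sum(!is.na(bmi))       # counts non-missing values only
```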


xlab = "BMI"

Assign a label for the X-axis.


More details on these parameters and options are available in R's built-in help, which can be opened from the R command line:

?hist
?paste
?length


Last modified April 3, 2008 8:54 am