An R Introduction to Statistics

Data Frame

A data frame is used for storing data tables. It is a list of vectors of equal length. For example, the following variable df is a data frame containing three vectors n, s, b.

> n = c(2, 3, 5) 
> s = c("aa", "bb", "cc") 
> b = c(TRUE, FALSE, TRUE) 
> df = data.frame(n, s, b)       # df is a data frame

Build-in Data Frame

We use built-in data frames in R for our tutorials. For example, here is a built-in data frame in R, called mtcars.

> mtcars 
               mpg cyl disp  hp drat   wt ... 
Mazda RX4     21.0   6  160 110 3.90 2.62 ... 
Mazda RX4 Wag 21.0   6  160 110 3.90 2.88 ... 
Datsun 710    22.8   4  108  93 3.85 2.32 ... 

The top line of the table, called the header, contains the column names. Each horizontal line afterward denotes a data row, which begins with the name of the row, and then followed by the actual data. Each data member of a row is called a cell.

To retrieve data in a cell, we would enter its row and column coordinates in the single square bracket "[]" operator. The two coordinates are separated by a comma. In other words, the coordinates begins with row position, then followed by a comma, and ends with the column position. The order is important.

Here is the cell value from the first row, second column of mtcars.

> mtcars[1, 2] 
[1] 6

Moreover, we can use the row and column names instead of the numeric coordinates.

> mtcars["Mazda RX4", "cyl"] 
[1] 6

Lastly, the number of data rows in the data frame is given by the nrow function.

> nrow(mtcars)    # number of data rows 
[1] 32

And the number of columns of a data frame is given by the ncol function.

> ncol(mtcars)    # number of columns 
[1] 11

Further details of the mtcars data set is available in the R documentation.

> help(mtcars)


Instead of printing out the entire data frame, it is often desirable to preview it with the head function beforehand.

> head(mtcars) 
               mpg cyl disp  hp drat   wt ... 
Mazda RX4     21.0   6  160 110 3.90 2.62 ...