One of the most common errors that you’ll encounter in R is:
undefined columns selected
This error typically occurs when you try to subset the rows of a data frame and forget to add a comma inside the square brackets.
For example, suppose we have the following data frame in R:
#create data frame with three variables
data <- data.frame(var1=c(0, 4, 2, 2, 5),
                   var2=c(5, 5, 7, 8, 9),
                   var3=c(2, 7, 9, 9, 7))

#view data frame
data
var1 var2 var3
1 0 5 2
2 4 5 7
3 2 7 9
4 2 8 9
5 5 9 7
Now suppose we attempt to select all rows where var1 is greater than 3:
data[data$var1>3]
Error in `[.data.frame`(data, data$var1 > 3) : undefined columns selected
We receive an error because we forgot to add a comma after the 3. Once we add the comma, the error will go away:
data[data$var1>3, ]
var1 var2 var3
2 4 5 7
5 5 9 7
You need the comma because R uses the following syntax for subsetting data frames:
data[rows you want, columns you want]
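As a minimal sketch of this syntax, the snippet below rebuilds the example data frame from the values shown earlier and subsets it by rows only, by columns only, and by both. The object names first_two_rows, two_columns, and both are made up for illustration.

```r
# Recreate the example data frame (values taken from the output shown earlier)
data <- data.frame(var1 = c(0, 4, 2, 2, 5),
                   var2 = c(5, 5, 7, 8, 9),
                   var3 = c(2, 7, 9, 9, 7))

# Rows only: keep rows 1 and 2 and every column (note the trailing comma)
first_two_rows <- data[1:2, ]

# Columns only: keep every row and two named columns (note the leading comma)
two_columns <- data[, c("var1", "var3")]

# Rows and columns: rows where var1 > 3, columns var1 and var2
both <- data[data$var1 > 3, c("var1", "var2")]
print(both)
```

Leaving one side of the comma empty means "keep everything" along that dimension.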
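If you only type data[data$var1>3], R does not treat the condition as a row filter. With a single index and no comma, R uses that index to select columns, so the five-element logical vector asks for columns 2 and 5. Because the data frame has only three columns, R reports undefined columns selected.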
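To see where the error comes from, here is a small sketch showing how R handles a single index with no comma. It assumes the same example data frame; the names mask and result are illustrative only. tryCatch() is used so that the error message is captured instead of stopping the script.

```r
# Recreate the example data frame (values taken from the output shown earlier)
data <- data.frame(var1 = c(0, 4, 2, 2, 5),
                   var2 = c(5, 5, 7, 8, 9),
                   var3 = c(2, 7, 9, 9, 7))

# With one index and no comma, the index picks columns, not rows
print(data[c(TRUE, FALSE, TRUE)])   # keeps var1 and var3

# The row condition has five elements, one per row
mask <- data$var1 > 3
print(mask)   # FALSE TRUE FALSE FALSE TRUE

# Without a comma, the TRUE in position 5 asks for a fifth column,
# which does not exist, so R raises the error
result <- tryCatch(data[mask], error = function(e) conditionMessage(e))
print(result)
```

The same condition works as soon as it is placed before a comma, because R then applies it to the rows.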
By using data[data$var1>3, ], you’re telling R to return the rows where var1>3 and all of the columns in the data frame. An equivalent command would be data[data$var1>3, 1:3].
data[data$var1>3, 1:3]
var1 var2 var3
2 4 5 7
5 5 9 7
Notice that this command returns the same subset of data as before.
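If you would rather confirm the match in code than by eye, one possible check uses all.equal() on the two results. This is only a sketch on the example data frame; the names all_columns, explicit_columns, and same_result are illustrative.

```r
# Recreate the example data frame (values taken from the output shown earlier)
data <- data.frame(var1 = c(0, 4, 2, 2, 5),
                   var2 = c(5, 5, 7, 8, 9),
                   var3 = c(2, 7, 9, 9, 7))

# Subset with the columns left blank, then with the columns listed explicitly
all_columns <- data[data$var1 > 3, ]
explicit_columns <- data[data$var1 > 3, 1:3]

# isTRUE(all.equal(...)) is TRUE only when the two data frames match
same_result <- isTRUE(all.equal(all_columns, explicit_columns))
print(same_result)
```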