In machine learning, one often needs to divide a dataset into two parts, namely a training set and a testing set. R's “sample” command does not split a dataset directly, but there is a simple workaround. Essentially, use the “sample” command to randomly select a set of row index numbers, and then use those index numbers to divide the dataset into training and testing sets. Below is sample code for doing this. In the code below I use 20% of the data for testing and the remaining 80% for training.
# By default R comes with a few datasets.
data = mtcars
dim(data)  # 32 11

# Sample indexes
indexes = sample(1:nrow(data), size=0.2*nrow(data))

# Split data
test = data[indexes,]
dim(test)  # 6 11
train = data[-indexes,]
dim(train)  # 26 11
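Because “sample” draws different indexes on every run, the split above changes each time the code is executed. The sketch below is one possible way to make it repeatable, by fixing the random seed and wrapping the steps in a function. The function name `split_data`, its `test_fraction` and `seed` parameters, and the seed value 42 are my own illustrative choices and are not part of the original code.

```r
# Hypothetical helper (an assumption, not from the original post):
# split a data frame into training and testing sets by sampling row indexes.
split_data <- function(data, test_fraction = 0.2, seed = 42) {
  set.seed(seed)  # fix the random number generator so the split is repeatable
  n_test <- floor(test_fraction * nrow(data))  # whole number of test rows
  stopifnot(n_test >= 1)  # an empty index vector would make data[-idx, ] empty
  test_idx <- sample(seq_len(nrow(data)), size = n_test)
  list(train = data[-test_idx, , drop = FALSE],
       test  = data[test_idx, , drop = FALSE])
}

parts <- split_data(mtcars)
dim(parts$test)   # 6 11
dim(parts$train)  # 26 11

# No row appears in both sets
length(intersect(rownames(parts$train), rownames(parts$test)))  # 0
```

Here `floor()` makes the rounding explicit: 0.2 × 32 = 6.4, which gives 6 test rows, the same count as in the code above. Calling `split_data(mtcars)` again with the same seed returns the same rows.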