Bootstrap and Monte Carlo Simulation
Published:
Methods: Bootstrap, Monte Carlo Simulation
I found myself perplexed by the difference and relationship between Bootstrap and Monte Carlo Simulation. Then, I read Comparing Groups: Randomization and Bootstrap Methods Using R and it clearly explains these two methods. This book uses simple language to explain intricate statistical concepts,offering concrete examples with orgnized code. It also introduced about effectively presenting statistical findings in research papers. This book is highly beneficial for individuals inclined toward statistical analysis within the realm of social sciences. The primary content of this blog post draws heavily from insights gleaned from this book.
The bootstrap methodology uses Monte Carlo simulation to resample many replicate data sets from a probability model assumed to underlie the population, or from a model that can be estimated from the data. (p.140)
I think both Bootstrap and Monte Carlo Simulation are resampling methods. Monte Carlo Simulation resamples in a random way, while Bootstrap resamples according to the empirical distribution of the data.
Example
The dataset Latino has 150 observations with two columns. Column 1 is “Mex” which suggests whether the person is from Mexico, and Column 2 is “Achieve” which is the level of the person’s fluency in English. In the dataset, 116 people of them are from Mexico and the other 34 are not.
H0 Hypothesis: People from and not from Mexico have the same level of English. In other words, the average difference of their English level is not statistically significant
Monte Carlo Simulation:
Step 1: Resample 5000 times
permuted <- replicate(n = 4999, expr = sample(latino$Achieve))
Step 2: For the resampled dataset, calculate the difference in means of the two groups
mean.diff <- function(data) {
mean(data [1:34]) - mean(data[35:150])
}
diffs <- apply(X = permuted, MARGIN = 2, FUN = mean.diff)
Step3:Calculate p-value
(length(diffs[abs(diffs) >= 0.39])+1) /5000
# 0.39 is the group difference in mean in the original dataset.
Bootstrap:
Step 1: Resample 5000 times under empirical distribution
Step 2: For the resampled dataset, calculate the difference in means of the two groups
Step3:Calculate p-value
library(boot)
mean.diff.np <- function(data, indices) {
d <- data[indices, ]
mean(d$Achieve[1:34]) - mean(d$Achieve[35:150])
}
nonpar.boot <- boot(data = latino, statistic = mean.diff.np, R = 4999)
(length(par.boot$t[abs(par.boot$t) >= 0.39])+1)/5000
References
Zieffler, Harring, Long (2011). Comparing Groups: Randomization and Bootstrap Methods Using R. Wiley.
