Sampling Distributions | Alfonso J. Martinez

Introduction

The notion of a sampling distribution is one of the most important concepts taught in introductory statistics because it lays the foundation for, and motivates the use of, hypothesis testing. For instance, introductory statistics courses commonly teach the $t$-test for comparing two population means. The $t$ statistic we learn is given by the equation *****, and it is common knowledge (at least among statisticians and quantitative methodologists with several years of experience) that, under the null hypothesis and assuming normally distributed data, this statistic follows a $t$ distribution with $n - 1$ degrees of freedom. Two important questions arise from this. Notice the following:

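RS <- function(nDraws)
{
  # Rejection sampler. Proposal g(x) = N(0, sd = sqrt(0.5)); envelope r * g(x).
  # Kx() is the target kernel and must be defined before calling RS();
  # r must satisfy Kx(x) <= r * g(x) for every x.
  r <- 1.85
  draws <- numeric(nDraws)             # preallocate instead of growing a vector
  nTotal <- nAccept <- 0
  while (nAccept < nDraws)
  {
    nTotal <- nTotal + 1
    x <- rnorm(1, 0, sqrt(0.5))        # candidate from the proposal
    rgx <- r * dnorm(x, 0, sqrt(0.5))  # envelope height at x
    kx <- Kx(x)                        # target kernel height at x
    if (runif(1, 0, rgx) < kx)         # accept with probability kx / rgx
    {
      nAccept <- nAccept + 1
      draws[nAccept] <- x
    }
  }
  # Return the accepted draws and the empirical acceptance rate
  list(draws = draws, acceptRate = nAccept / nTotal)
}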
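Because `Kx()` is not defined in this post, here is a self-contained, vectorized variant of the same accept/reject step that can be run end to end. The target kernel `Kx(x) = exp(-x^2) * cos(x)^2` is a hypothetical choice, picked only because it provably sits under the envelope: `1.85 * dnorm(x, 0, sqrt(0.5))` equals `(1.85 / sqrt(pi)) * exp(-x^2)`, which is at least `exp(-x^2)`. Since the proposal density integrates to one, the expected acceptance rate is the area under `Kx`, namely $\frac{\sqrt{\pi}}{2}(1 + e^{-1})$, divided by `r`.

```r
set.seed(1)
r <- 1.85
# Hypothetical target kernel (unnormalized); NOT the post's Kx()
Kx <- function(x) exp(-x^2) * cos(x)^2

nProp <- 1e5
cand <- rnorm(nProp, 0, sqrt(0.5))                    # candidates from the proposal
u <- runif(nProp, 0, r * dnorm(cand, 0, sqrt(0.5)))   # uniform heights under the envelope
draws <- cand[u < Kx(cand)]                           # keep points that fall under Kx

area <- (sqrt(pi) / 2) * (1 + exp(-1))                # integral of Kx over the real line
acceptRate <- length(draws) / nProp
expectedRate <- area / r
c(simulated = acceptRate, expected = expectedRate)

# Accepted draws against the normalized target density
hist(draws, breaks = 60, freq = FALSE, main = "Accepted draws")
curve(Kx(x) / area, add = TRUE)
```

Drawing all candidates at once trades a little memory for speed; the loop in `RS()` produces draws from the same distribution, one at a time.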

The following draws 100 values from a standard normal distribution and plots them against their index:

plot(rnorm(100))
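The claim about the $t$ statistic in the introduction can be checked by simulation. This is a minimal sketch assuming the one-sample (equivalently, paired-difference) statistic $t = \bar{x} / (s / \sqrt{n})$ tested against $\mu_0 = 0$, which is the form with $n - 1$ degrees of freedom; the pooled two-sample statistic instead has $n_1 + n_2 - 2$. The sample size, number of replications, and seed are arbitrary. With normal data and a true null, the simulated quantiles should sit close to those from `qt()`, and a two-sided 5% test should reject about 5% of the time.

```r
set.seed(42)
n <- 10        # sample size (arbitrary)
reps <- 10000  # number of simulated samples

# Draw samples under H0 (mu = 0) and compute the one-sample t statistic for each
tStats <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = 1)
  mean(x) / (sd(x) / sqrt(n))
})

# Compare simulated quantiles with those of a t distribution with n - 1 df
probs <- c(0.025, 0.5, 0.975)
rbind(simulated = quantile(tStats, probs),
      theoretical = qt(probs, df = n - 1))

# Proportion of |t| beyond the two-sided 5% critical value (should be near 0.05)
rejectRate <- mean(abs(tStats) > qt(0.975, df = n - 1))
rejectRate

hist(tStats, breaks = 60, freq = FALSE, main = "Simulated t statistics")
curve(dt(x, df = n - 1), add = TRUE)
```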

Plotting the Mean


Simulate it!

knitr::include_app("https://yihui.shinyapps.io/miniUI/",
  height = "600px")
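Outside the app, the sampling distribution of the mean can be sketched in a few lines. This assumes a skewed exponential population with rate 1 (mean 1, standard deviation 1), samples of size `n = 25`, and 10,000 replications; all three are illustrative choices. The standard deviation of the simulated means should be close to the theoretical standard error $\sigma / \sqrt{n} = 1/5 = 0.2$, and the histogram should look roughly normal even though the population is not.

```r
set.seed(123)
n <- 25        # size of each sample
reps <- 10000  # number of samples

# Mean of each of `reps` samples drawn from an Exp(rate = 1) population
means <- replicate(reps, mean(rexp(n, rate = 1)))

# Empirical vs theoretical standard error of the mean
c(empirical = sd(means), theoretical = 1 / sqrt(n))

hist(means, breaks = 50, freq = FALSE,
     main = "Sampling distribution of the mean", xlab = "Sample mean")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE)  # normal approximation
```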

In section 3.XX, it was mentioned that failing to remove the level-2 variation from a level-1 predictor leads to a phenomenon colloquially known as ‘smushing’ (technical terms include conflation). Think of it this way: a student who attends school X is inevitably influenced by the characteristics of that particular school. Hence, even though we may be interested in a ‘person-specific’ variable (e.g., mathematics achievement), this variable will contain some information that is due to the context the student is in. In other words, students don’t learn mathematics in isolation; they learn in academic environments (a.k.a. schools), so the practices at a given school will invariably influence the performance of its students. If we fail to account for this in our analyses, the estimated level-1 slope mixes the within-school and between-school effects and is therefore biased for either one whenever the two differ. Let’s explore this idea via a small-scale simulation. First, let’s simulate some data so that we can compare what happens when we appropriately disaggregate the variability at the two levels versus what happens when we don’t:

knitr::include_app("https://semlab.shinyapps.io/rmsea-efa-cfa/",
  height = "600px")
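The smushing comparison described above can also be sketched without the app. This is a minimal base-R sketch that uses ordinary least squares rather than a full multilevel model, with hypothetical values throughout: 500 schools of 20 students, a within-school effect of 0.5, a between-school effect of 2, and a predictor whose within- and between-school variances are both 1. Regressing the outcome on the raw predictor returns a single slope that blends the two effects (about 1.25 here, the variance-weighted average of 0.5 and 2), whereas splitting the predictor into its school mean and the group-mean-centered deviation separates them.

```r
set.seed(2024)
J <- 500; nj <- 20                       # hypothetical: schools, students per school
school <- rep(seq_len(J), each = nj)
mu <- rnorm(J)                           # true school mean of the predictor
u  <- rnorm(J, sd = sqrt(0.5))           # school random intercept
x  <- mu[school] + rnorm(J * nj)         # student-level predictor (within sd = 1)
bW <- 0.5; bB <- 2                       # hypothetical within / between effects
y  <- bW * (x - mu[school]) + bB * mu[school] + u[school] + rnorm(J * nj)

xbar <- ave(x, school)                   # observed school mean (level-2 part)
xc   <- x - xbar                         # group-mean-centered predictor (level-1 part)

naive  <- coef(lm(y ~ x))[["x"]]         # conflated ("smushed") slope
disagg <- coef(lm(y ~ xc + xbar))        # disaggregated slopes
round(c(naive = naive, within = disagg[["xc"]], between = disagg[["xbar"]]), 3)
```

The between-school estimate comes out slightly below 2 because the observed school means contain sampling noise. In practice the same centered and mean predictors would usually be entered into a mixed model (for example with `lme4::lmer()`), but the conflation already shows up in this simpler setup.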