B Formula Summaries
B.1 Chapter 2
B.1.1 Statistical Formulas
Mean: \(\overline{x} = \frac{1}{m}(x_1 + x_2 + ... + x_m) = \frac{1}{m}\sum_{i=1}^m x_i\)
Variance: \(s^2 = \frac{1}{m-1}\sum_{i=1}^m (x_i - \overline{x})^2 = \frac{1}{m-1}\left[\left(\sum_{i=1}^m x_i^2\right) - \left(m\overline{x}^2\right)\right]\)
Probability as long-run frequency: \(\P(S = s) = \lim_{\text{number of repetitions} \to \infty} \frac{\text{number of times outcome } s \text{ is obtained}}{\text{number of repetitions}}\)
Probability for a RV: \(\P(Z = z) = \sum_{s \text{ so that when }S=s, \text{ then }Z=z}\P(S=s)\)
Expectation: \(\E[Z] = \sum_{\text{possible values }z \text{ of }Z} \P(Z = z) \times z\)
Bias: \(\text{Bias}(\widehat{\theta}) = \E[\widehat{\theta}] - \theta\)
Variance: \(Var(Z) = \E[(Z - \E[Z])^2] = Cov(Z, Z)\)
Covariance: \(Cov(Z_1, Z_2) = \E[(Z_1 - \E[Z_1])(Z_2 - \E[Z_2])] = \E[Z_1Z_2] - \E[Z_1]\E[Z_2]\)
Linearity of Expectation: \(\E\left[\sum_{i=1}^N a_iZ_i\right] = \sum_{i=1}^N a_i\E\big[Z_i\big]\)
Variance of Linear Combination: \(Var\left(\sum_{i=1}^Na_iZ_i\right) = \sum_{i=1}^N a_i^2 Var(Z_i) + 2\sum_{i < j}a_ia_j Cov(Z_i,Z_j)\)
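As a quick check of the sample mean and the two equivalent forms of the sample variance listed at the start of this subsection, here is a minimal Python sketch. The function names and the data are made up for illustration.

```python
import math

def sample_mean(xs):
    """Sample mean: (1/m) * sum of x_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Definitional form: sum of squared deviations over (m - 1)."""
    m = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (m - 1)

def sample_variance_shortcut(xs):
    """Shortcut form: (sum of x_i^2 - m * xbar^2) over (m - 1)."""
    m = len(xs)
    xbar = sample_mean(xs)
    return (sum(x * x for x in xs) - m * xbar ** 2) / (m - 1)

data = [2.0, 4.0, 4.0, 6.0, 9.0]  # hypothetical observations
print(sample_mean(data), sample_variance(data), sample_variance_shortcut(data))
```

Both variance functions should agree up to floating-point rounding; the shortcut form can lose precision when the mean is large relative to the spread.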
B.1.2 Notation
| Summary | Population Quantity | Sample Quantity |
|---|---|---|
| Total | \(t_U = \sum_{i=1}^N y_i\) | \(T_S = \sum_{i\in S} y_i\) |
| Mean | \(\overline{y}_U = \frac{1}{N}t_U\) | \(\overline{Y}_S = \frac{1}{n}T_S\) |
| Variance | \(s_U^2 = \frac{1}{N-1}\sum_{i=1}^N (y_i - \overline{y}_U)^2\) | \(S_S^2 = \frac{1}{n-1}\sum_{i\in S}(y_i - \overline{Y}_S)^2\) |
| Standard deviation | \(s_U\) | \(S_S\) |
B.1.3 Horvitz-Thompson Estimator
Probability of selection (where \(Z_i = 1\) if unit \(i\) is in the sample \(S\) and \(Z_i = 0\) otherwise): \(\pi_i = \P(Z_i = 1) = \E[Z_i]\)
Weights: \(w_i = 1/\pi_i\)
HT estimator for Population Total: \(\widehat{T}_{HT} = \sum_{i \in S}\frac{y_i}{\P(Z_i = 1)} = \sum_{i=1}^N \frac{Z_iy_i}{\P(Z_i = 1)}\)
HT estimator for Population Mean: \(\widehat{\overline{Y}}_{HT} = \frac{1}{N}\widehat{T}_{HT}\)
Variance of HT Estimators: \[\begin{align*} Var(\widehat{T}_{HT}) & = \sum_{i=1}^N\frac{y_i^2}{\pi_i^2}\pi_i(1-\pi_i) + 2\sum_{i < j}\frac{y_iy_j}{\pi_i\pi_j}(\E[Z_iZ_j] - \pi_i\pi_j) \\ Var(\widehat{\overline{Y}}_{HT}) & = \frac{1}{N^2}Var(\widehat{T}_{HT}) \end{align*}\]
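The unbiasedness of \(\widehat{T}_{HT}\) and its variance formula can be checked by brute force on a tiny population. The sketch below assumes simple random sampling of \(n = 2\) units from a made-up population of \(N = 4\) values, enumerates every possible sample, and uses exact fractions so the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations

y = [3, 5, 8, 12]          # hypothetical population values y_1..y_N
N, n = len(y), 2
samples = list(combinations(range(N), n))   # under SRS each sample is equally likely

# First- and second-order inclusion probabilities, obtained by counting samples.
pi = [Fraction(sum(i in s for s in samples), len(samples)) for i in range(N)]
pi2 = {(i, j): Fraction(sum(i in s and j in s for s in samples), len(samples))
       for i, j in combinations(range(N), 2)}

def ht_total(sample):
    """HT estimate of the population total from the sampled indices."""
    return sum(Fraction(y[i]) / pi[i] for i in sample)

estimates = [ht_total(s) for s in samples]
expectation = sum(estimates) / len(estimates)
bias = expectation - sum(y)
var_direct = sum((t - expectation) ** 2 for t in estimates) / len(estimates)

# Variance formula; each unordered pair {i, j} is counted once, then doubled.
var_formula = (
    sum(Fraction(y[i] ** 2) / pi[i] ** 2 * pi[i] * (1 - pi[i]) for i in range(N))
    + 2 * sum(Fraction(y[i] * y[j]) / (pi[i] * pi[j]) * (pi2[i, j] - pi[i] * pi[j])
              for i, j in combinations(range(N), 2))
)
print(bias, var_direct, var_formula)
```

Enumeration is only practical for very small \(N\); it is used here purely to confirm the formulas, not as an estimation method.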
B.1.4 Simple Random Sampling
Under SRS, \(\pi_i = \frac{n}{N}\) for all \(i \in U\). Plugging this and other properties of SRS into the HT estimator gives:
| Summary | Estimator | Variance (\(Var_{SRS}\)) | Estimated Variance \((\widehat{Var}_{SRS})\) |
|---|---|---|---|
| Total | \(\widehat{T}_{HT} = N\overline{Y}_S\) | \(N^2\left(1-\frac{n}{N}\right)\frac{s_U^2}{n}\) | \(N^2\left(1-\frac{n}{N}\right)\frac{S_S^2}{n}\) |
| Mean | \(\widehat{\overline{Y}}_{HT} = \overline{Y}_S\) | \(\left(1-\frac{n}{N}\right)\frac{s_U^2}{n}\) | \(\left(1-\frac{n}{N}\right)\frac{S_S^2}{n}\) |
| Proportion | \(\widehat{P}_S\) | \(\left(1-\frac{n}{N}\right)\frac{N}{N-1}\frac{p_U(1-p_U)}{n}\) | \(\left(1-\frac{n}{N}\right)\frac{\widehat{P}_S(1-\widehat{P}_S)}{n-1}\) |
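The estimator and estimated-variance columns of the table translate directly into code. The following is a minimal sketch with hypothetical function names and made-up data; it also illustrates that the proportion row is the mean row applied to 0/1 data, since in that case \(S_S^2/n = \widehat{P}_S(1-\widehat{P}_S)/(n-1)\).

```python
import math

def srs_estimates(sample, N):
    """Estimated mean and total under SRS, with estimated variances."""
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)   # S_S^2
    var_mean = (1 - n / N) * s2 / n                       # includes the fpc
    return {"mean": ybar, "total": N * ybar,
            "var_mean": var_mean, "var_total": N ** 2 * var_mean}

def srs_proportion_variance(p_hat, n, N):
    """Estimated variance of a sample proportion under SRS."""
    return (1 - n / N) * p_hat * (1 - p_hat) / (n - 1)

est = srs_estimates([4, 6, 8, 10], N=20)      # hypothetical sample
binary = [1, 0, 1, 1, 0]                      # hypothetical 0/1 responses
p_hat = sum(binary) / len(binary)
print(est, srs_proportion_variance(p_hat, len(binary), N=50))
```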
B.2 Chapter 3
B.2.1 General Formula for HT Estimators
| Summary | Estimator | Variance |
|---|---|---|
| Total | \(\widehat{T}_{HT} = \sum_{h=1}^H\widehat{T}_{HT,h}\) | \(Var_{strt}(\widehat{T}_{HT}) = \sum_{h=1}^HVar_{SRS}(\widehat{T}_{HT,h})\) |
| Mean | \(\widehat{\overline{Y}}_{HT} = \frac{1}{N}\widehat{T}_{HT}\) | \(Var_{strt}(\widehat{\overline{Y}}_{HT}) = \frac{1}{N^2}Var_{strt}(\widehat{T}_{HT})\) |
where \[\begin{align*} \widehat{T}_{HT,h} & = N_h\overline{Y}_{S,h} \\ Var_{SRS}(\widehat{T}_{HT,h}) & = N_h^2 \left(1-\frac{n_h}{N_h}\right)\frac{s_{U,h}^2}{n_h} \\ \widehat{Var}_{SRS}(\widehat{T}_{HT,h}) & = N_h^2 \left(1-\frac{n_h}{N_h}\right)\frac{S_{S,h}^2}{n_h} \\ s_{U,h}^2 & = \frac{1}{N_h-1}\sum_{i\in U_h}(y_i - \overline{y}_{U,h})^2 \\ S_{S,h}^2 & = \frac{1}{n_h-1}\sum_{i\in S_h}(y_i - \overline{Y}_{S,h})^2 \\ \overline{y}_{U,h} & = \frac{1}{N_h}\sum_{i \in U_h}y_i = \frac{1}{N_h}\sum_{i = 1}^{N_h} y_{h,i} \\ \overline{Y}_{S,h} & = \frac{1}{n_h}\sum_{i \in S_h}y_i = \frac{1}{n_h}\sum_{i = 1}^{n_h} y_{h,i}\\ \end{align*}\]
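A minimal sketch of the stratified estimators, assuming an SRS is drawn independently within each stratum. The input layout (a list of `(N_h, sample values)` pairs) and the numbers are illustrative assumptions.

```python
import math

def stratified_estimates(strata):
    """strata: list of (N_h, sample values from stratum h).

    Returns the estimated total and mean with their estimated variances.
    """
    N = sum(N_h for N_h, _ in strata)
    total, var_total = 0.0, 0.0
    for N_h, ys in strata:
        n_h = len(ys)
        ybar_h = sum(ys) / n_h
        s2_h = sum((y - ybar_h) ** 2 for y in ys) / (n_h - 1)   # S_{S,h}^2
        total += N_h * ybar_h                                   # stratum total estimate
        var_total += N_h ** 2 * (1 - n_h / N_h) * s2_h / n_h    # stratum variance estimate
    return {"total": total, "var_total": var_total,
            "mean": total / N, "var_mean": var_total / N ** 2}

# Two hypothetical strata with N_1 = 10 and N_2 = 30.
strat = stratified_estimates([(10, [2, 4, 6]), (30, [10, 14])])
print(strat)
```

Each stratum needs at least two sampled units, otherwise \(S_{S,h}^2\) is undefined.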
B.2.2 Additional Formulas
Sum of squares (population version):
| Sum of Squares | Formula | Notes |
|---|---|---|
| SSW | \(\sum_{h=1}^H\sum_{i=1}^{N_h}(y_{h,i}-\overline{y}_{U,h})^2\) | \(SSW = \sum_{h=1}^H(N_h - 1)s_{U,h}^2\) |
| SSB | \(\sum_{h=1}^H N_h (\overline{y}_{U,h} - \overline{y}_U)^2\) | \(\overline{y}_U = \sum_{h=1}^H\frac{N_h}{N}\overline{y}_{U,h}\) |
| SST = SSW + SSB | \(\sum_{h=1}^H\sum_{i=1}^{N_h}(y_{h,i}-\overline{y}_{U})^2\) | \(SST = (N-1)s_{U}^2\) |
Sum of squares (sample version):
| Sum of Squares | Formula | Notes |
|---|---|---|
| \(\widehat{SSW}\) | \(\sum_{h=1}^H\sum_{i=1}^{n_h}(y_{h,i}-\overline{Y}_{S,h})^2\) | \(\widehat{SSW} = \sum_{h=1}^H(n_h - 1)S_{S,h}^2\) |
| \(\widehat{SSB}\) | \(\sum_{h=1}^H n_h (\overline{Y}_{S,h} - \overline{Y}_S)^2\) | \(\overline{Y}_S = \sum_{h=1}^H\frac{n_h}{n}\overline{Y}_{S,h}\) |
| \(\widehat{SST} = \widehat{SSW} + \widehat{SSB}\) | \(\sum_{h=1}^H\sum_{i=1}^{n_h}(y_{h,i}-\overline{Y}_{S})^2\) | \(\widehat{SST} = (n-1)S_{S}^2\) |
Design effect: \(DE = \frac{Var_{complex}(\widehat{\theta}_n)}{Var_{SRS}(\widehat{\theta}_n)}\)
Effective sample size: \(n_{eff} = \frac{n}{DE}\)
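The decomposition \(SST = SSW + SSB\) and the effective sample size can be checked numerically. This sketch uses a made-up population with two strata; the helper names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

strata = [[1.0, 2.0, 3.0], [10.0, 12.0]]     # hypothetical population, H = 2
all_y = [y for h in strata for y in h]
ybar_U = mean(all_y)

ssw = sum((y - mean(h)) ** 2 for h in strata for y in h)      # within strata
ssb = sum(len(h) * (mean(h) - ybar_U) ** 2 for h in strata)   # between strata
sst = sum((y - ybar_U) ** 2 for y in all_y)                   # total

def effective_sample_size(n, design_effect):
    """n_eff = n / DE."""
    return n / design_effect

print(ssw, ssb, sst, effective_sample_size(400, 1.6))
```

A design effect above 1 shrinks the effective sample size: here a nominal sample of 400 with \(DE = 1.6\) carries roughly the information of an SRS of 250.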
B.3 Chapter 4
B.3.1 Ratio Estimation
| Type | Estimator | Variance (approx) | Variance (est) |
|---|---|---|---|
| Mean | \(\widehat{\overline{Y}}_r = \frac{\overline{Y}_S}{\overline{X}_S}\overline{x}_U\) | \(\left(1-\frac{n}{N}\right)\frac{s_{U,d}^2}{n}\) | \(\left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n}\) |
| Total | \(\widehat{T}_r = N\widehat{\overline{Y}}_r\) | \(N^2\left(1-\frac{n}{N}\right)\frac{s_{U,d}^2}{n}\) | \(N^2\left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n}\) |
| Ratio | \(\widehat{B}_r = \frac{1}{\overline{x}_U}\widehat{\overline{Y}}_r\) | \(\left(\frac{1}{\overline{x}_U}\right)^2\left(1-\frac{n}{N}\right)\frac{s_{U,d}^2}{n}\) | \(\left(\frac{1}{\overline{x}_U}\right)^2\left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n}\) |
where \[\begin{align*} d_i & = y_i - b_Ux_i = y_i - \frac{\overline{y}_U}{\overline{x}_U}x_i \\ e_i & = y_i - \widehat{B}_rx_i = y_i - \frac{\overline{Y}_S}{\overline{X}_S}x_i \\ s_{U,d}^2 & = \frac{1}{N-1}\sum_{i=1}^Nd_i^2 = s_{U,y}^2 - 2b_Ur_Us_{U,y}s_{U,x} + b_U^2s_{U,x}^2 \\ S_{S,e}^2 & = \frac{1}{n-1}\sum_{i\in S}e_i^2 = S_{S,y}^2 - 2\hat{B}_rR_SS_{S,y}S_{S,x} + \hat{B}_r^2S_{S,x}^2 \\ S_{S,y}^2 & = \frac{1}{n-1}\sum_{i\in S}(y_i - \overline{Y}_S)^2 = \frac{(\sum_{i\in S}y_i^2) - n(\overline{Y}_S)^2}{n-1} \\ S_{S,x}^2 & = \frac{1}{n-1}\sum_{i\in S}(x_i - \overline{X}_S)^2 = \frac{(\sum_{i\in S}x_i^2) - n(\overline{X}_S)^2}{n-1} \\ R_{S} & = \frac{\sum_{i\in S}(y_i - \overline{Y}_S)(x_i - \overline{X}_S)}{(n-1)S_{S,x}S_{S,y}} = \frac{(\sum_{i\in S}x_iy_i) - n\overline{Y}_S\overline{X}_S}{(n-1)S_{S,x}S_{S,y}} \end{align*}\]
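A minimal sketch of the ratio estimator of the mean and its estimated variance under SRS, with \(\widehat{B}_r = \overline{Y}_S/\overline{X}_S\). The function name, the data, and the values \(\overline{x}_U = 6\), \(N = 40\) are assumptions made for illustration. The last lines check the expanded form of \(S_{S,e}^2\), using the sample covariance in place of \(R_SS_{S,y}S_{S,x}\).

```python
import math

def ratio_estimate(x, y, xbar_U, N):
    """Ratio estimate of the population mean of y, using auxiliary variable x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    B = ybar / xbar                                   # estimated ratio B_r
    e = [yi - B * xi for xi, yi in zip(x, y)]         # residuals e_i
    s2_e = sum(ei ** 2 for ei in e) / (n - 1)         # S_{S,e}^2
    return {"B": B, "mean": B * xbar_U, "s2_e": s2_e,
            "var_mean": (1 - n / N) * s2_e / n}

x = [2.0, 4.0, 6.0, 8.0]      # hypothetical auxiliary values
y = [3.0, 5.0, 10.0, 12.0]    # hypothetical responses
res = ratio_estimate(x, y, xbar_U=6.0, N=40)

# Expanded form: S_y^2 - 2 * B * cov(x, y) + B^2 * S_x^2.
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
s2_x = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
s2_y = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
cov_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
s2_e_expanded = s2_y - 2 * res["B"] * cov_xy + res["B"] ** 2 * s2_x
print(res, s2_e_expanded)
```

The two forms of \(S_{S,e}^2\) agree because the residuals \(e_i\) sum to zero when \(\widehat{B}_r\) is the ratio of sample means.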
B.3.2 Post-Stratification
Post-stratified estimators:
\[\begin{align*} \widehat{T}_{post} & = \sum_{h=1}^H N_h \overline{Y}_{S,h} \\ \widehat{\overline{Y}}_{post} & = \frac{1}{N}\widehat{T}_{post} = \sum_{h=1}^H \frac{N_h}{N} \overline{Y}_{S,h} \end{align*}\]
Variance estimation: \[ \widehat{Var}_{SRS}(\widehat{\overline{Y}}_{post}) \approx \widehat{Var}_{strt, prop}(\widehat{\overline{Y}}_{HT}) = \left(1-\frac{n}{N}\right)\sum_{h=1}^H \frac{N_h}{N}\frac{S_{S,h}^2}{n}. \]
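The post-stratified mean and its approximate variance estimate can be sketched as follows, assuming an SRS whose units are sorted into post-strata after sampling and that the post-stratum sizes \(N_h\) are known. The input layout and numbers are illustrative.

```python
import math

def post_stratified_mean(groups):
    """groups: list of (N_h, sample values that fell in post-stratum h)."""
    N = sum(N_h for N_h, _ in groups)
    n = sum(len(ys) for _, ys in groups)
    mean, weighted_s2 = 0.0, 0.0
    for N_h, ys in groups:
        n_h = len(ys)
        ybar_h = sum(ys) / n_h
        s2_h = sum((y - ybar_h) ** 2 for y in ys) / (n_h - 1)
        mean += (N_h / N) * ybar_h
        weighted_s2 += (N_h / N) * s2_h / n      # note: divides by n, not n_h
    return {"mean": mean, "total": N * mean,
            "var_mean": (1 - n / N) * weighted_s2}

# Hypothetical post-strata with N_1 = 60 and N_2 = 40.
post = post_stratified_mean([(60, [2, 4, 6]), (40, [10, 14])])
print(post)
```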
B.3.3 Regression Estimation
Regression estimators: \[\begin{align*} \widehat{T}_{reg} & = N\widehat{B}_{S,reg,0} + \widehat{B}_{S,reg,1} t_{U,x} \\ \widehat{\overline{Y}}_{reg} & = \widehat{B}_{S,reg,0} + \widehat{B}_{S,reg,1} \overline{x}_{U} \end{align*}\]
Least squares estimates of the intercept and slope of the regression line: \[\begin{align*} \widehat{B}_{S,reg,0} & = \overline{Y}_S - \widehat{B}_{S,reg,1} \times \overline{X}_S \\ \widehat{B}_{S,reg,1} & = \frac{\sum_{i\in S} (x_i - \overline{X}_S)(y_i - \overline{Y}_S)}{\sum_{i\in S}(x_i - \overline{X}_S)^2} = \frac{R_SS_{S,y}}{S_{S,x}} \\ \end{align*}\]
Variance: \[ \widehat{Var}_{SRS}(\widehat{\overline{Y}}_{reg}) \approx \widehat{MSE}_{SRS}(\widehat{\overline{Y}}_{reg}) = \left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n} = \left(1-\frac{n}{N}\right)\frac{S_{S,y}^2(1-R_S^2)}{n} \]
where \(e_i = y_i - \widehat{B}_{S,reg,0} -\widehat{B}_{S,reg,1} x_i = y_i - \overline{Y}_S - \widehat{B}_{S,reg,1}(x_i - \overline{X}_S)\).
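A minimal sketch of the regression estimator of the mean under SRS, using the least squares slope and intercept and the residual-based variance estimate. The function name, the data, and the values \(\overline{x}_U = 6\), \(N = 40\) are assumptions for illustration.

```python
import math

def regression_estimate(x, y, xbar_U, N):
    """Regression estimate of the population mean of y and its estimated variance."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b1 = sxy / sxx                     # least squares slope
    b0 = ybar - b1 * xbar              # least squares intercept
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2_e = sum(ei ** 2 for ei in e) / (n - 1)
    r2 = sxy ** 2 / (sxx * syy)        # squared sample correlation R_S^2
    return {"b0": b0, "b1": b1, "mean": b0 + b1 * xbar_U,
            "s2_e": s2_e, "s2_y_times_1_minus_r2": syy / (n - 1) * (1 - r2),
            "var_mean": (1 - n / N) * s2_e / n}

reg = regression_estimate([2.0, 4.0, 6.0, 8.0], [3.0, 5.0, 10.0, 12.0],
                          xbar_U=6.0, N=40)
print(reg)
```

The fields `s2_e` and `s2_y_times_1_minus_r2` should match up to rounding, which is the equality \(S_{S,e}^2 = S_{S,y}^2(1-R_S^2)\) in the variance line.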
B.4 Chapter 5
B.4.1 One-Stage Cluster Sampling
HT Estimators:
| Quantity | Estimator | Variance | Variance (est) |
|---|---|---|---|
| Total | \(\widehat{T}_{HT} = \frac{N}{n}\sum_{i\in S}t_i\) | \(N^2\left(1-\frac{n}{N}\right)\frac{s_{U,t}^2}{n}\) | \(N^2\left(1-\frac{n}{N}\right)\frac{S_{S,t}^2}{n}\) |
| Mean | \(\widehat{\overline{Y}} = \frac{1}{M_0}\widehat{T}_{HT}\) | \(\frac{N^2}{M_0^2}\left(1-\frac{n}{N}\right)\frac{s_{U,t}^2}{n}\) | \(\frac{N^2}{M_0^2}\left(1-\frac{n}{N}\right)\frac{S_{S,t}^2}{n}\) |
where \[\begin{align*} s_{U,t}^2 & = \frac{1}{N-1}\sum_{i=1}^N\left(t_i - \frac{t_U}{N}\right)^2 \\ S_{S,t}^2 & = \frac{1}{n-1}\sum_{i\in S}\left(t_i - \frac{\sum_{i\in S}t_i}{n}\right)^2 \end{align*}\]
Ratio Estimator:
\[\begin{align*} \widehat{\overline{Y}}_r & = \frac{\sum_{i\in S}t_i}{\sum_{i\in S}M_i} \\ Var_{1clus}(\widehat{\overline{Y}}_r) & = \left(1 - \frac{n}{N}\right)\frac{s_{U,d}^2}{n\overline{M}_U^2} \\ \widehat{Var}_{1clus}(\widehat{\overline{Y}}_r) & = \left(1 - \frac{n}{N}\right)\frac{S_{S,e}^2}{n\overline{M}_S^2} \end{align*}\]
where \[\begin{align*} d_i & = t_i - \overline{y}_UM_i \\ e_i & = t_i - M_i\widehat{\overline{Y}}_r \\ s_{U,d}^2 & = \frac{1}{N-1}\sum_{i=1}^N d_i^2 \\ S_{S,e}^2 & = \frac{1}{n-1}\sum_{i\in S}e_i^2 \\ \overline{M}_U & = \frac{1}{N}\sum_{i=1}^N M_i \\ \overline{M}_S & = \frac{1}{n}\sum_{i\in S} M_i \end{align*}\]
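For one-stage cluster sampling, both the HT total and the ratio estimator of the mean need only the cluster totals \(t_i\) and sizes \(M_i\) of the sampled clusters. The sketch below assumes an SRS of \(n\) clusters from \(N\); names and numbers are hypothetical.

```python
import math

def one_stage_cluster(t, M, N):
    """t: totals of the sampled clusters; M: their sizes; N: clusters in the population."""
    n = len(t)
    fpc = 1 - n / N
    tbar = sum(t) / n
    s2_t = sum((ti - tbar) ** 2 for ti in t) / (n - 1)        # S_{S,t}^2
    ybar_r = sum(t) / sum(M)                                  # ratio estimate of the mean
    e = [ti - Mi * ybar_r for ti, Mi in zip(t, M)]            # residuals e_i
    s2_e = sum(ei ** 2 for ei in e) / (n - 1)                 # S_{S,e}^2
    Mbar_S = sum(M) / n
    return {"total_ht": N * tbar,
            "var_total_ht": N ** 2 * fpc * s2_t / n,
            "mean_ratio": ybar_r,
            "var_mean_ratio": fpc * s2_e / (n * Mbar_S ** 2)}

# Hypothetical sample of n = 3 clusters out of N = 10.
clus = one_stage_cluster(t=[20, 30, 40], M=[4, 6, 10], N=10)
print(clus)
```

The ratio estimator does not require the population size \(M_0\), which is one reason it is often preferred when cluster sizes vary.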
Equal-sized clusters:
Population quantities:
| Sum of Squares | Formula | Degrees of Freedom (df) | Mean Square | Notes |
|---|---|---|---|---|
| \(SSB\) | \(\sum_{i=1}^N M(\overline{y}_{U,i} - \overline{y}_U)^2\) | \(N-1\) | \(MSB = \frac{SSB}{N-1}\) | \(s_{U,t}^2 = M \times MSB\) |
| \(SSW\) | \(\sum_{i=1}^N\sum_{j\in U_i}(y_{ij} - \overline{y}_{U,i})^2\) | \(N(M-1)\) | \(MSW = \frac{SSW}{N(M-1)}\) | \(\sum_{i=1}^Ns_{U,i}^2 = N\times MSW\) |
| \(SST\) | \(\sum_{i=1}^N\sum_{j\in U_i} (y_{ij} - \overline{y}_U)^2\) | \(NM - 1\) | \(MST = \frac{SST}{NM-1}\) | \(s_{U,y}^2 = MST\) |
Sample quantities:
| Sum of Squares | Formula | Degrees of Freedom (df) | Mean Square | Notes |
|---|---|---|---|---|
| \(\widehat{SSB}\) | \(\sum_{i\in S} M(\overline{Y}_{S,i} - \overline{Y}_S)^2\) | \(n-1\) | \(\widehat{MSB} = \frac{\widehat{SSB}}{n-1}\) | \(S_{S,t}^2 = M \times \widehat{MSB}\) |
| \(\widehat{SSW}\) | \(\sum_{i\in S}\sum_{j\in S_i}(y_{ij} - \overline{Y}_{S,i})^2\) | \(n(M-1)\) | \(\widehat{MSW} = \frac{\widehat{SSW}}{n(M-1)}\) | \(\sum_{i\in S}S_{S,i}^2 = n\times \widehat{MSW}\) |
| \(\widehat{SST}\) | \(\sum_{i\in S}\sum_{j\in S_i} (y_{ij} - \overline{Y}_S)^2\) | \(nM - 1\) | \(\widehat{MST} = \frac{\widehat{SST}}{nM-1}\) | \(S_{S,y}^2 = \widehat{MST}\) |
Additionally: \[\begin{align*} ICC & = 1 - \frac{M}{M-1}\frac{SSW}{SST} \\ R_a^2 & = 1 - \frac{MSW}{MST} \\ DE & = \frac{MSB}{MST} \\ & = \frac{NM-1}{M(N-1)}[1 + (M-1)ICC] \\ & = 1 + \frac{N(M-1)}{N-1}R_a^2 \end{align*}\]
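The population ANOVA quantities for equal-sized clusters, together with \(ICC\), \(R_a^2\) and the design effect, can be computed directly from a small made-up population. The sketch also checks numerically that \(MSB/MST\) equals the \(ICC\) form of the design effect.

```python
import math

clusters = [[1.0, 2.0], [3.0, 5.0], [8.0, 9.0]]   # hypothetical population: N = 3, M = 2
N, M = len(clusters), len(clusters[0])
all_y = [y for c in clusters for y in c]
ybar_U = sum(all_y) / len(all_y)
cluster_means = [sum(c) / M for c in clusters]

ssb = sum(M * (m - ybar_U) ** 2 for m in cluster_means)
ssw = sum((y - m) ** 2 for c, m in zip(clusters, cluster_means) for y in c)
sst = sum((y - ybar_U) ** 2 for y in all_y)

msb = ssb / (N - 1)
msw = ssw / (N * (M - 1))
mst = sst / (N * M - 1)

icc = 1 - M / (M - 1) * ssw / sst
ra2 = 1 - msw / mst
de = msb / mst
de_from_icc = (N * M - 1) / (M * (N - 1)) * (1 + (M - 1) * icc)
print(icc, ra2, de, de_from_icc)
```

Here the clusters are internally homogeneous (high \(ICC\)), so the design effect is well above 1.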
B.4.2 Two-Stage Cluster Sampling
HT estimator:
\[\begin{align*} \widehat{T}_{HT} & = \frac{N}{n}\sum_{i\in S}\widehat{T}_{HT,i} = \frac{N}{n}\sum_{i\in S}\frac{M_i}{m_i}\sum_{j\in S_i}y_{ij} \\ Var_{2clus}(\widehat{T}_{HT}) & = N^2\left(1 - \frac{n}{N}\right)\frac{s_{U,t}^2}{n} + \frac{N}{n}\sum_{i=1}^N M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_{U,i}^2}{m_i} \\ \widehat{Var}_{2clus}(\widehat{T}_{HT}) & = N^2\left(1 - \frac{n}{N}\right)\frac{\widehat{S}_{S,t}^2}{n} + \frac{N}{n}\sum_{i\in S} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{S_{S,i}^2}{m_i} \\ \end{align*}\]
where
\[\begin{align*} s_{U,t}^2 & = \frac{1}{N-1}\sum_{i=1}^N\left(t_i - \frac{t_U}{N}\right)^2 \\ \widehat{S}_{S,t}^2 & = \frac{1}{n-1}\sum_{i\in S}\left(\widehat{T}_{HT,i} - \frac{1}{n}\sum_{i\in S}\widehat{T}_{HT,i}\right)^2 \\ s_{U,i}^2 & = \frac{1}{M_i-1}\sum_{j\in U_i}(y_{ij} - \overline{y}_{U,i})^2 \\ S_{S,i}^2 & = \frac{1}{m_i-1}\sum_{j\in S_i}(y_{ij} - \overline{Y}_{S,i})^2 \end{align*}\]
Ratio estimator:
\[\begin{align*} \widehat{\overline{Y}}_r & = \frac{\sum_{i\in S}\widehat{T}_{HT,i}}{\sum_{i\in S}M_i} = \frac{\sum_{i\in S}M_i \overline{Y}_{S,i}}{\sum_{i\in S}M_i} \\ \widehat{Var}_{2clus}(\widehat{\overline{Y}}_r) & = \frac{1}{\overline{M}^2_S}\left(1-\frac{n}{N}\right)\frac{\widehat{S}_{S,e}^2}{n} + \frac{1}{nN\overline{M}^2_S}\sum_{i\in S}M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{S_{S,i}^2}{m_i} \end{align*}\]
where
\[\begin{align*} \overline{M}_S & = \frac{1}{n}\sum_{i\in S}M_i \\ \widehat{S}_{S,e}^2 & = \frac{1}{n-1}\sum_{i\in S}(\widehat{T}_{HT,i} - M_i\widehat{\overline{Y}}_r)^2 \end{align*}\]
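A minimal sketch of the two-stage estimators, assuming an SRS of \(n\) clusters out of \(N\) followed by an SRS of \(m_i\) units within each sampled cluster. The input layout (a list of `(M_i, sampled values)` pairs) and the numbers are illustrative; the residuals in the ratio variance are taken about the ratio estimate \(\widehat{\overline{Y}}_r\).

```python
import math

def two_stage_estimates(clusters, N):
    """clusters: list of (M_i, values sampled from cluster i); N: clusters in the population."""
    n = len(clusters)
    fpc = 1 - n / N
    t_hat, sizes, within = [], [], 0.0
    for M_i, ys in clusters:
        m_i = len(ys)
        ybar_i = sum(ys) / m_i
        s2_i = sum((y - ybar_i) ** 2 for y in ys) / (m_i - 1)     # S_{S,i}^2
        t_hat.append(M_i * ybar_i)                                # estimated cluster total
        sizes.append(M_i)
        within += M_i ** 2 * (1 - m_i / M_i) * s2_i / m_i         # second-stage term
    tbar = sum(t_hat) / n
    s2_t = sum((t - tbar) ** 2 for t in t_hat) / (n - 1)
    ybar_r = sum(t_hat) / sum(sizes)
    s2_e = sum((t - M_i * ybar_r) ** 2 for t, M_i in zip(t_hat, sizes)) / (n - 1)
    Mbar_S = sum(sizes) / n
    return {"total_ht": N / n * sum(t_hat),
            "var_total_ht": N ** 2 * fpc * s2_t / n + N / n * within,
            "mean_ratio": ybar_r,
            "var_mean_ratio": fpc * s2_e / (n * Mbar_S ** 2)
                              + within / (n * N * Mbar_S ** 2)}

# Hypothetical sample: n = 2 clusters out of N = 10, with M_1 = 4 and M_2 = 6.
two = two_stage_estimates([(4, [2, 4]), (6, [3, 5, 7])], N=10)
print(two)
```

With only two sampled clusters the variance estimates are very unstable; the small sample is used here only so the arithmetic can be followed by hand.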