B Formula Summaries
B.1 Chapter 2
B.1.1 Statistical Formulas
Mean: \(\overline{x} = \frac{1}{m}(x_1 + x_2 + \cdots + x_m) = \frac{1}{m}\sum_{i=1}^m x_i\)
Variance: \(s^2 = \frac{1}{m-1}\sum_{i=1}^m (x_i - \overline{x})^2 = \frac{1}{m-1}\left[\left(\sum_{i=1}^m x_i^2\right) - \left(m\overline{x}^2\right)\right]\)
Probability as long-run frequency: \(\P(S = s) = \lim_{\text{number of repetitions} \to \infty} \frac{\text{number of times outcome } s \text{ is obtained}}{\text{number of repetitions}}\)
Probability for a RV: \(\P(Z = z) = \sum_{s \text{ so that when }S=s, \text{ then }Z=z}\P(S=s)\)
Expectation: \(\E[Z] = \sum_{\text{possible values }z \text{ of }Z} \P(Z = z) \times z\)
Bias: \(\text{Bias}(\widehat{\theta}) = \E[\widehat{\theta}] - \theta\)
Variance: \(Var(Z) = \E[(Z - \E[Z])^2] = Cov(Z, Z)\)
Covariance: \(Cov(Z_1, Z_2) = \E[(Z_1 - \E[Z_1])(Z_2 - \E[Z_2])] = \E[Z_1Z_2] - \E[Z_1]\E[Z_2]\)
Linearity of Expectation: \(\E\left[\sum_{i=1}^N a_iZ_i\right] = \sum_{i=1}^N a_i\E\big[Z_i\big]\)
Variance of Linear Combination: \(Var\left(\sum_{i=1}^Na_iZ_i\right) = \sum_{i=1}^N a_i^2 Var(Z_i) + 2\sum_{i < j}a_ia_j Cov(Z_i,Z_j)\)
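The expectation, covariance, and linear-combination formulas above can be checked by direct enumeration on a small discrete distribution. The sketch below is illustrative only: the joint distribution `joint`, the coefficients `a1`, `a2`, and the helper names `expect` and `cov` are assumptions made for this example, not part of the text.

```python
from fractions import Fraction as F

# Hypothetical joint distribution of (Z1, Z2): outcome -> probability.
joint = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(3, 8)}

def expect(g):
    """E[g(Z1, Z2)]: probability-weighted sum over all outcomes."""
    return sum(p * g(z1, z2) for (z1, z2), p in joint.items())

def cov(g, h):
    """Cov(g, h) = E[g h] - E[g] E[h]; cov(g, g) is the variance of g."""
    return expect(lambda a, b: g(a, b) * h(a, b)) - expect(g) * expect(h)

z1 = lambda a, b: a
z2 = lambda a, b: b
a1, a2 = 3, -2  # hypothetical coefficients

# Left side: variance of a1*Z1 + a2*Z2 computed directly.
combo = lambda a, b: a1 * a + a2 * b
lhs = cov(combo, combo)

# Right side: sum of a_i^2 Var(Z_i) plus 2 a_1 a_2 Cov(Z_1, Z_2),
# with the single pair (1, 2) counted once.
rhs = a1 ** 2 * cov(z1, z1) + a2 ** 2 * cov(z2, z2) + 2 * a1 * a2 * cov(z1, z2)
```

Exact fractions are used so that the two sides can be compared with `==` rather than with a floating-point tolerance.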
B.1.2 Notation
| Summary | Population Quantity | Sample Quantity |
|---|---|---|
| Total | \(t_U = \sum_{i=1}^N y_i\) | \(T_S = \sum_{i\in S} y_i\) |
| Mean | \(\overline{y}_U = \frac{1}{N}t_U\) | \(\overline{Y}_S = \frac{1}{n}T_S\) |
| Variance | \(s_U^2 = \frac{1}{N-1}\sum_{i=1}^N (y_i - \overline{y}_U)^2\) | \(S_S^2 = \frac{1}{n-1}\sum_{i\in S}(y_i - \overline{Y}_S)^2\) |
| Standard deviation | \(s_U\) | \(S_S\) |
B.1.3 Horvitz-Thompson Estimator
Probability of selection: \(\pi_i = \P(Z_i = 1) = \E[Z_i]\)
Weights: \(w_i = 1/\pi_i\)
HT estimator for Population Total: \(\widehat{T}_{HT} = \sum_{i \in S}\frac{y_i}{\P(Z_i = 1)} = \sum_{i=1}^N \frac{Z_iy_i}{\P(Z_i = 1)}\)
HT estimator for Population Mean: \(\widehat{\overline{Y}}_{HT} = \frac{1}{N}\widehat{T}_{HT}\)
Variance of HT Estimators: \[\begin{align*} Var(\widehat{T}_{HT}) & = \sum_{i=1}^N\frac{y_i^2}{\pi_i^2}\pi_i(1-\pi_i) + 2\sum_{i < j}\frac{y_iy_j}{\pi_i\pi_j}(\E[Z_iZ_j] - \pi_i\pi_j) \\ Var(\widehat{\overline{Y}}_{HT}) & = \frac{1}{N^2}Var(\widehat{T}_{HT}) \end{align*}\]
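As a sanity check on the Horvitz-Thompson formulas, the following sketch enumerates every possible sample of a tiny population and compares the enumerated mean and variance of \(\widehat{T}_{HT}\) with the closed-form variance. The population values `y`, the choice of an equal-probability design, and the helper names `incl` and `ht_total` are assumptions for illustration. The cross term counts each unordered pair \(\{i, j\}\) once and multiplies by 2.

```python
from fractions import Fraction as F
from itertools import combinations

y = [1, 2, 4, 7]                      # hypothetical population values
N, n = len(y), 2
samples = list(combinations(range(N), n))
# Assumed design: every sample of size n is equally likely (SRS).
prob = {s: F(1, len(samples)) for s in samples}

def incl(*units):
    """P(all given units are sampled): E[Z_i] for one unit, E[Z_i Z_j] for two."""
    return sum(p for s, p in prob.items() if all(u in s for u in units))

pi = [incl(i) for i in range(N)]      # first-order inclusion probabilities

def ht_total(s):
    """HT estimate of the population total from sample s."""
    return sum(F(y[i]) / pi[i] for i in s)

# Mean and variance of the estimator by enumerating the design.
expected = sum(p * ht_total(s) for s, p in prob.items())
var_enum = sum(p * (ht_total(s) - expected) ** 2 for s, p in prob.items())

# Closed-form variance: diagonal terms plus twice the sum over pairs i < j.
var_formula = sum(F(y[i] ** 2) / pi[i] ** 2 * pi[i] * (1 - pi[i]) for i in range(N))
var_formula += 2 * sum(
    F(y[i] * y[j]) / (pi[i] * pi[j]) * (incl(i, j) - pi[i] * pi[j])
    for i, j in combinations(range(N), 2)
)
```

Because `prob` is an explicit sample-to-probability map, the same code can be rerun with an unequal-probability design by editing that dictionary.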
B.1.4 Simple Random Sampling
Under SRS, \(\pi_i = \frac{n}{N}\) for all \(i \in U\). Plugging this and the other properties of SRS into the HT estimator gives:
| Summary | Estimator | Variance (\(Var_{SRS}\)) | Estimated Variance \((\widehat{Var}_{SRS})\) |
|---|---|---|---|
| Total | \(\widehat{T}_{HT} = N\overline{Y}_S\) | \(N^2\left(1-\frac{n}{N}\right)\frac{s_U^2}{n}\) | \(N^2\left(1-\frac{n}{N}\right)\frac{S_S^2}{n}\) |
| Mean | \(\widehat{\overline{Y}}_{HT} = \overline{Y}_S\) | \(\left(1-\frac{n}{N}\right)\frac{s_U^2}{n}\) | \(\left(1-\frac{n}{N}\right)\frac{S_S^2}{n}\) |
| Proportion | \(\widehat{P}_S\) | \(\left(1-\frac{n}{N}\right)\frac{N}{N-1}\frac{p_U(1-p_U)}{n}\) | \(\left(1-\frac{n}{N}\right)\frac{\widehat{P}_S(1-\widehat{P}_S)}{n-1}\) |
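A minimal sketch of the SRS estimators and their estimated variances from the table above. The function names `srs_estimates` and `srs_proportion`, the sample data, and the population size `N=20` are hypothetical.

```python
from statistics import mean, variance

def srs_estimates(sample, N):
    """Estimates and estimated variances under SRS without replacement."""
    n = len(sample)
    ybar = mean(sample)          # sample mean
    s2 = variance(sample)        # sample variance S_S^2 (divisor n - 1)
    fpc = 1 - n / N              # finite population correction
    var_mean = fpc * s2 / n
    return {
        "mean": ybar,
        "total": N * ybar,
        "var_mean": var_mean,
        "var_total": N ** 2 * var_mean,
    }

def srs_proportion(indicators, N):
    """Sample proportion of 0/1 indicators and its estimated variance."""
    n = len(indicators)
    p_hat = sum(indicators) / n
    return p_hat, (1 - n / N) * p_hat * (1 - p_hat) / (n - 1)

# Hypothetical data: four sampled values from a population of 20 units.
est = srs_estimates([3.0, 5.0, 4.0, 8.0], N=20)
p_hat, var_p = srs_proportion([1, 0, 1, 1], N=20)
```

For 0/1 data the proportion is a mean, so `srs_proportion` agrees with `srs_estimates` applied to the same indicators, since \(S_S^2 = \frac{n}{n-1}\widehat{P}_S(1-\widehat{P}_S)\).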
B.2 Chapter 3
B.2.1 General Formula for HT Estimators
| Summary | Estimator | Variance |
|---|---|---|
| Total | \(\widehat{T}_{HT} = \sum_{h=1}^H\widehat{T}_{HT,h}\) | \(Var_{strt}(\widehat{T}_{HT}) = \sum_{h=1}^HVar_{SRS}(\widehat{T}_{HT,h})\) |
| Mean | \(\widehat{\overline{Y}}_{HT} = \frac{1}{N}\widehat{T}_{HT}\) | \(Var_{strt}(\widehat{\overline{Y}}_{HT}) = \frac{1}{N^2}Var_{strt}(\widehat{T}_{HT})\) |
where \[\begin{align*} \widehat{T}_{HT,h} & = N_h\overline{Y}_{S,h} \\ Var_{SRS}(\widehat{T}_{HT,h}) & = N_h^2 \left(1-\frac{n_h}{N_h}\right)\frac{s_{U,h}^2}{n_h} \\ \widehat{Var}_{SRS}(\widehat{T}_{HT,h}) & = N_h^2 \left(1-\frac{n_h}{N_h}\right)\frac{S_{S,h}^2}{n_h} \\ s_{U,h}^2 & = \frac{1}{N_h-1}\sum_{i\in U_h}(y_i - \overline{y}_{U,h})^2 \\ S_{S,h}^2 & = \frac{1}{n_h-1}\sum_{i\in S_h}(y_i - \overline{Y}_{S,h})^2 \\ \overline{y}_{U,h} & = \frac{1}{N_h}\sum_{i \in U_h}y_i = \frac{1}{N_h}\sum_{i = 1}^{N_h} y_{h,i} \\ \overline{Y}_{S,h} & = \frac{1}{n_h}\sum_{i \in S_h}y_i = \frac{1}{n_h}\sum_{i = 1}^{n_h} y_{h,i}\\ \end{align*}\]
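The stratified formulas can be sketched as a loop over strata, summing stratum-level SRS totals and estimated variances. The function name `stratified_estimates`, the input layout (a list of `(N_h, sample)` pairs), and the numbers are assumptions for illustration.

```python
from statistics import mean, variance

def stratified_estimates(strata):
    """strata: list of (N_h, sample values) pairs, with SRS inside each stratum."""
    N = sum(N_h for N_h, _ in strata)
    total = 0.0
    var_total = 0.0
    for N_h, sample in strata:
        n_h = len(sample)
        # Stratum total estimate: N_h times the stratum sample mean.
        total += N_h * mean(sample)
        # Estimated variance of the stratum total, using S_{S,h}^2.
        var_total += N_h ** 2 * (1 - n_h / N_h) * variance(sample) / n_h
    return {
        "total": total,
        "var_total": var_total,
        "mean": total / N,
        "var_mean": var_total / N ** 2,
    }

# Hypothetical strata: 10 units (3 sampled) and 30 units (4 sampled).
est = stratified_estimates([(10, [2.0, 4.0, 6.0]), (30, [10.0, 12.0, 14.0, 16.0])])
```

Each stratum needs at least two sampled units, otherwise \(S_{S,h}^2\) is undefined and `variance` raises an error.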
B.2.2 Additional Formulas
| Sum of Squares | Formula | Notes |
|---|---|---|
| SSW | \(\sum_{h=1}^H\sum_{i=1}^{N_h}(y_{h,i}-\bar{y}_{U,h})^2\) | \(\sum_{h=1}^H(N_h - 1)s_{U,h}^2\) |
| SSB | \(\sum_{h=1}^H N_h (\bar{y}_{U,h} - \bar{y}_U)^2\) | \(\bar{y}_U = \frac{1}{N}\sum_{h=1}^HN_h\bar{y}_{U,h}\) |
| SST = SSW + SSB | \(\sum_{h=1}^H\sum_{i=1}^{N_h}(y_{h,i}-\bar{y}_{U})^2\) | \((N-1)s_{U}^2\) |
Design effect: \(DE = \frac{Var_{complex}(\widehat{\theta}_n)}{Var_{SRS}(\widehat{\theta}_n)}\)
Effective sample size: \(n_{eff} = \frac{n}{DE}\)
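The decomposition SST = SSW + SSB and the design-effect formulas can be verified numerically. In this sketch the function names, the two-stratum population, and the variance values passed to `effective_sample_size` are all hypothetical.

```python
def sums_of_squares(strata):
    """strata: list of lists holding every population value in each stratum."""
    all_values = [y for stratum in strata for y in stratum]
    ybar_U = sum(all_values) / len(all_values)   # overall population mean
    ssw = 0.0
    ssb = 0.0
    for stratum in strata:
        ybar_h = sum(stratum) / len(stratum)     # stratum population mean
        ssw += sum((y - ybar_h) ** 2 for y in stratum)
        ssb += len(stratum) * (ybar_h - ybar_U) ** 2
    sst = sum((y - ybar_U) ** 2 for y in all_values)
    return ssw, ssb, sst

def effective_sample_size(n, var_complex, var_srs):
    """n_eff = n / DE, where DE = Var_complex / Var_SRS."""
    design_effect = var_complex / var_srs
    return n / design_effect

# Hypothetical population with two strata.
ssw, ssb, sst = sums_of_squares([[1.0, 2.0, 3.0], [10.0, 12.0]])
# Hypothetical variances: the complex design is 1.5 times as variable as SRS.
n_eff = effective_sample_size(300, var_complex=1.5, var_srs=1.0)
```

Here almost all of the variation is between strata (SSB is much larger than SSW), which is the situation in which stratification helps most.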
B.3 Chapter 4
B.3.1 Ratio Estimation
| Type | Estimator | Variance (approx) | Variance (est) |
|---|---|---|---|
| Mean | \(\widehat{\overline{Y}}_r = \frac{\overline{Y}_S}{\overline{X}_S}\overline{x}_U\) | \(\left(1-\frac{n}{N}\right)\frac{s_{U,d}^2}{n}\) | \(\left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n}\) |
| Total | \(\widehat{T}_r = N\widehat{\overline{Y}}_r\) | \(N^2\left(1-\frac{n}{N}\right)\frac{s_{U,d}^2}{n}\) | \(N^2\left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n}\) |
| Ratio | \(\widehat{B}_r = \frac{1}{\overline{x}_U}\widehat{\overline{Y}}_r\) | \(\left(\frac{1}{\overline{x}_U}\right)^2\left(1-\frac{n}{N}\right)\frac{s_{U,d}^2}{n}\) | \(\left(\frac{1}{\overline{x}_U}\right)^2\left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n}\) |
where \(e_i = y_i - \widehat{B}_rx_i = y_i - \frac{\overline{Y}_S}{\overline{X}_S}x_i\), \(s_{U,d}^2 = \frac{1}{N-1}\sum_{i=1}^N d_i^2\) with \(d_i = y_i - \frac{\overline{y}_U}{\overline{x}_U}x_i\), and \[\begin{align*} S_{S,e}^2 & = \frac{1}{n-1}\sum_{i\in S}e_i^2 = S_{S,y}^2 - 2\widehat{B}_rR_SS_{S,y}S_{S,x} + \widehat{B}_r^2S_{S,x}^2 \\ S_{S,y}^2 & = \frac{1}{n-1}\sum_{i\in S}(y_i - \overline{Y}_S)^2 = \frac{(\sum_{i\in S}y_i^2) - n(\overline{Y}_S)^2}{n-1} \\ S_{S,x}^2 & = \frac{1}{n-1}\sum_{i\in S}(x_i - \overline{X}_S)^2 = \frac{(\sum_{i\in S}x_i^2) - n(\overline{X}_S)^2}{n-1} \\ R_{S} & = \frac{\sum_{i\in S}(y_i - \overline{Y}_S)(x_i - \overline{X}_S)}{(n-1)S_{S,x}S_{S,y}} = \frac{(\sum_{i\in S}x_iy_i) - n\overline{Y}_S\overline{X}_S}{(n-1)S_{S,x}S_{S,y}} \end{align*}\]
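A minimal sketch of ratio estimation under SRS, following the estimator and estimated-variance columns of the table. The function name `ratio_estimate`, the paired data, `N=40`, and the known population mean `xbar_U=3.0` are assumptions chosen so the arithmetic is easy to follow.

```python
from statistics import mean

def ratio_estimate(x, y, N, xbar_U):
    """Ratio estimates and estimated variances from an SRS of (x_i, y_i) pairs."""
    n = len(x)
    B_hat = mean(y) / mean(x)                        # estimated ratio
    residuals = [yi - B_hat * xi for xi, yi in zip(x, y)]
    # S_{S,e}^2: residuals from a ratio fit sum to zero, so no centering is needed.
    s2_e = sum(e ** 2 for e in residuals) / (n - 1)
    var_mean = (1 - n / N) * s2_e / n
    return {
        "B": B_hat,
        "mean": B_hat * xbar_U,
        "total": N * B_hat * xbar_U,
        "s2_e": s2_e,
        "var_mean": var_mean,
        "var_total": N ** 2 * var_mean,
        "var_B": var_mean / xbar_U ** 2,
    }

# Hypothetical sample of four (x, y) pairs from a population of 40 units.
est = ratio_estimate(x=[1.0, 2.0, 3.0, 4.0], y=[2.0, 4.0, 7.0, 7.0], N=40, xbar_U=3.0)
```

For these data \(\widehat{B}_r = 2\) and the residuals are \(0, 0, 1, -1\), so \(S_{S,e}^2 = 2/3\), which matches \(S_{S,y}^2 - 2\widehat{B}_rR_SS_{S,y}S_{S,x} + \widehat{B}_r^2S_{S,x}^2 = 6 - 12 + 20/3\).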
B.3.2 Post-Stratification
Post-stratified estimators:
\[\begin{align*} \widehat{T}_{post} & = \sum_{h=1}^H N_h \overline{Y}_{S,h} \\ \widehat{\overline{Y}}_{post} & = \frac{1}{N}\widehat{T}_{post} = \sum_{h=1}^H \frac{N_h}{N} \overline{Y}_{S,h} \end{align*}\]
Variance estimation: \[ \widehat{Var}_{SRS}(\widehat{\overline{Y}}_{post}) \approx \widehat{Var}_{strt, prop}(\widehat{\overline{Y}}_{HT}) = \left(1-\frac{n}{N}\right)\sum_{h=1}^H \frac{N_h}{N}\frac{S_{S,h}^2}{n}. \]
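The post-stratified mean and its approximate variance estimate can be sketched as follows. The function name `post_stratified_mean`, the dictionary layout, and the numbers are hypothetical; the sample is assumed to be a single SRS of size \(n\) that is split into post-strata after selection.

```python
from statistics import mean, variance

def post_stratified_mean(sample_by_stratum, N_by_stratum):
    """Post-stratified mean and its approximate estimated variance."""
    N = sum(N_by_stratum.values())
    n = sum(len(values) for values in sample_by_stratum.values())
    # Weight each post-stratum sample mean by its known population share N_h / N.
    estimate = sum(
        N_by_stratum[h] / N * mean(values)
        for h, values in sample_by_stratum.items()
    )
    # Approximation: variance of a proportionally allocated stratified sample.
    var = (1 - n / N) * sum(
        N_by_stratum[h] / N * variance(values) / n
        for h, values in sample_by_stratum.items()
    )
    return estimate, var

# Hypothetical post-strata with known population sizes 60 and 40.
estimate, var = post_stratified_mean(
    {"A": [2.0, 4.0], "B": [10.0, 12.0, 14.0]},
    {"A": 60, "B": 40},
)
```

Every post-stratum must contain at least two sampled units for \(S_{S,h}^2\) to be computable, which is not guaranteed when strata are formed after sampling.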
B.3.3 Regression Estimation
Regression estimators: \[\begin{align*} \widehat{T}_{reg} & = N\widehat{B}_{S,reg,0} + \widehat{B}_{S,reg,1} t_{U,x} \\ \widehat{\overline{Y}}_{reg} & = \widehat{B}_{S,reg,0} + \widehat{B}_{S,reg,1} \overline{x}_{U} \end{align*}\]
Least squares estimates of the intercept and slope of the regression line: \[\begin{align*} \widehat{B}_{S,reg,0} & = \overline{Y}_S - \widehat{B}_{S,reg,1} \times \overline{X}_S \\ \widehat{B}_{S,reg,1} & = \frac{\sum_{i\in S} (x_i - \overline{X}_S)(y_i - \overline{Y}_S)}{\sum_{i\in S}(x_i - \overline{X}_S)^2} = \frac{R_SS_{S,y}}{S_{S,x}} \\ \end{align*}\]
Variance: \[ \widehat{Var}_{SRS}(\widehat{\overline{Y}}_{reg}) \approx \widehat{MSE}_{SRS}(\widehat{\overline{Y}}_{reg}) = \left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n} = \left(1-\frac{n}{N}\right)\frac{S_{S,y}^2(1-R_S^2)}{n} \]
where \(e_i = y_i - \widehat{B}_{S,reg,0} -\widehat{B}_{S,reg,1} x_i = y_i - \overline{Y}_S - \widehat{B}_{S,reg,1}(x_i - \overline{X}_S)\).
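A minimal sketch of the regression estimator under SRS, using the least squares slope and intercept and the divisor \(n-1\) for \(S_{S,e}^2\) as in this summary (some references divide by \(n-2\) instead). The function name `regression_estimate`, the data, `N=40`, and `xbar_U=3.0` are assumptions for illustration.

```python
from statistics import mean

def regression_estimate(x, y, N, xbar_U):
    """Regression estimates of the mean and total, with estimated variance."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    b1 = sxy / sxx                    # least squares slope
    b0 = ybar - b1 * xbar             # least squares intercept
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2_e = sum(e ** 2 for e in residuals) / (n - 1)
    r2 = sxy ** 2 / (sxx * syy)       # squared sample correlation R_S^2
    return {
        "b0": b0,
        "b1": b1,
        "mean": b0 + b1 * xbar_U,
        "total": N * (b0 + b1 * xbar_U),
        "s2_e": s2_e,
        "s2_e_alt": syy / (n - 1) * (1 - r2),   # S_{S,y}^2 (1 - R_S^2)
        "var_mean": (1 - n / N) * s2_e / n,
    }

# Hypothetical sample of four (x, y) pairs from a population of 40 units.
est = regression_estimate(x=[1.0, 2.0, 3.0, 4.0], y=[2.0, 4.0, 7.0, 7.0], N=40, xbar_U=3.0)
```

The entries `s2_e` and `s2_e_alt` compute the two sides of the identity \(S_{S,e}^2 = S_{S,y}^2(1-R_S^2)\) and should agree up to rounding.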