B Formula Summaries

B.1 Chapter 2

B.1.1 Statistical Formulas

Mean: \(\overline{x} = \frac{1}{m}(x_1 + x_2 + ... + x_m) = \frac{1}{m}\sum_{i=1}^m x_i\)

Variance: \(s^2 = \frac{1}{m-1}\sum_{i=1}^m (x_i - \overline{x})^2 = \frac{1}{m-1}\left[\left(\sum_{i=1}^m x_i^2\right) - \left(m\overline{x}^2\right)\right]\)
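The two forms of the sample variance can be checked against each other numerically. Below is a minimal Python sketch. The function names and the data are hypothetical.

```python
def sample_mean(xs):
    # Mean: (1/m) * (x_1 + ... + x_m)
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Definitional form: sum of squared deviations divided by m - 1
    m = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (m - 1)

def sample_variance_shortcut(xs):
    # Shortcut form: [(sum of x_i^2) - m * xbar^2] / (m - 1)
    m = len(xs)
    xbar = sample_mean(xs)
    return (sum(x * x for x in xs) - m * xbar ** 2) / (m - 1)

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical sample
print(sample_mean(data), sample_variance(data), sample_variance_shortcut(data))
```

The shortcut form is convenient by hand. In floating point, however, it can lose precision when the values are large relative to their spread.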

Probability as long-run frequency: \(\P(S = s) = \lim_{\text{number of repetitions} \to \infty} \frac{\text{number of times outcome } s \text{ is obtained}}{\text{number of repetitions}}\)

Probability for a random variable: \(\P(Z = z) = \sum_{s \,:\, Z = z \text{ when } S = s}\P(S=s)\)

Expectation: \(\E[Z] = \sum_{\text{possible values }z \text{ of }Z} \P(Z = z) \times z\)
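To make the two sums concrete, assume purely as an example that \(S\) is the ordered outcome of rolling two fair dice and \(Z\) is their sum. The sketch below uses exact fractions. The dictionary `p_s` and the helper `prob_Z` are illustrative names only.

```python
from fractions import Fraction
from itertools import product

# Outcomes s of S: ordered pairs from two fair dice (assumed example).
outcomes = list(product(range(1, 7), repeat=2))
p_s = {s: Fraction(1, 36) for s in outcomes}

def Z(s):
    # Random variable Z: the sum of the two dice.
    return s[0] + s[1]

def prob_Z(z):
    # P(Z = z): add P(S = s) over every outcome s for which Z = z.
    return sum((p for s, p in p_s.items() if Z(s) == z), Fraction(0))

# E[Z]: sum over possible values z of P(Z = z) * z.
values = sorted({Z(s) for s in outcomes})
expectation = sum(prob_Z(z) * z for z in values)
print(prob_Z(7), expectation)  # prints: 1/6 7
```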

Bias: \(\text{Bias}(\widehat{\theta}) = \E[\widehat{\theta}] - \theta\)

Variance: \(Var(Z) = \E[(Z - \E[Z])^2] = Cov(Z, Z)\)

Covariance: \(Cov(Z_1, Z_2) = \E[(Z_1 - \E[Z_1])(Z_2 - \E[Z_2])] = \E[Z_1Z_2] - \E[Z_1]\E[Z_2]\)

Linearity of Expectation: \(\E\left[\sum_{i=1}^N a_iZ_i\right] = \sum_{i=1}^N a_i\E\big[Z_i\big]\)

Variance of Linear Combination: \(Var\left(\sum_{i=1}^Na_iZ_i\right) = \sum_{i=1}^N a_i^2 Var(Z_i) + 2\sum_{i < j}a_ia_j Cov(Z_i,Z_j)\)
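A quick numerical check of the linear-combination formula is sketched below. It assumes a hypothetical covariance matrix for \((Z_1, Z_2, Z_3)\). The factor 2 goes with the sum over unordered pairs \(i < j\). Doubling a sum that already runs over all \(i \ne j\) counts each covariance twice.

```python
# Hypothetical covariance matrix: Sigma[i][j] = Cov(Z_i, Z_j), diagonal = Var(Z_i).
Sigma = [
    [4.0, 1.0, -0.5],
    [1.0, 9.0, 2.0],
    [-0.5, 2.0, 1.0],
]
a = [1.0, -2.0, 3.0]
N = len(a)

# Direct computation: sum over all ordered pairs (i, j), diagonal included.
direct = sum(a[i] * a[j] * Sigma[i][j] for i in range(N) for j in range(N))

# Formula: variance terms plus twice the covariance over unordered pairs i < j.
variances = sum(a[i] ** 2 * Sigma[i][i] for i in range(N))
pairs = sum(a[i] * a[j] * Sigma[i][j] for i in range(N) for j in range(i + 1, N))
formula = variances + 2 * pairs

# Doubling a sum that already runs over all i != j counts each pair twice.
double_counted = variances + 2 * (2 * pairs)
print(direct, formula, double_counted)  # prints: 18.0 18.0 -13.0
```

The double-counted version is negative, which is impossible for a variance.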

B.1.2 Notation

\begin{tabular}{lll}
Summary & Population Quantity & Sample Quantity \\
\hline
Total & \(t_U = \sum_{i=1}^N y_i\) & \(T_S = \sum_{i\in S} y_i\) \\
Mean & \(\overline{y}_U = \frac{1}{N}t_U\) & \(\overline{Y}_S = \frac{1}{n}T_S\) \\
Variance & \(s_U^2 = \frac{1}{N-1}\sum_{i=1}^N (y_i - \overline{y}_U)^2\) & \(S_S^2 = \frac{1}{n-1}\sum_{i\in S}(y_i - \overline{Y}_S)^2\) \\
Standard deviation & \(s_U\) & \(S_S\) \\
\end{tabular}

B.1.3 Horvitz-Thompson Estimator

Probability of selection: \(\pi_i = \P(Z_i = 1) = \E[Z_i]\), where \(Z_i = 1\) if unit \(i\) is in the sample \(S\) and \(Z_i = 0\) otherwise

Weights: \(w_i = 1/\pi_i\)

HT estimator for Population Total: \(\widehat{T}_{HT} = \sum_{i \in S}\frac{y_i}{\P(Z_i = 1)} = \sum_{i=1}^N \frac{Z_iy_i}{\P(Z_i = 1)}\)

HT estimator for Population Mean: \(\widehat{\overline{Y}}_{HT} = \frac{1}{N}\widehat{T}_{HT}\)

Variance of HT Estimators: \[\begin{align*} Var(\widehat{T}_{HT}) & = \sum_{i=1}^N\frac{y_i^2}{\pi_i^2}\pi_i(1-\pi_i) + 2\sum_{i < j}\frac{y_iy_j}{\pi_i\pi_j}(\E[Z_iZ_j] - \pi_i\pi_j) \\ Var(\widehat{\overline{Y}}_{HT}) & = \frac{1}{N^2}Var(\widehat{T}_{HT}) \end{align*}\]
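The HT formulas can be verified on a toy population by listing every possible sample. The sketch below assumes SRS of size \(n = 2\) from a hypothetical population of \(N = 5\) values. Under that design the joint inclusion probability is \(\E[Z_iZ_j] = \frac{n(n-1)}{N(N-1)}\) for \(i \ne j\).

```python
from itertools import combinations

# Hypothetical population (N = 5) and SRS design of size n = 2.
y = [3.0, 7.0, 1.0, 9.0, 5.0]
N, n = len(y), 2
pi = [n / N] * N                            # pi_i = P(Z_i = 1)
pi_joint = n * (n - 1) / (N * (N - 1))      # E[Z_i Z_j], i != j, under SRS

def ht_total(sample):
    # Sum of y_i * w_i over sampled units, with weights w_i = 1 / pi_i.
    return sum(y[i] / pi[i] for i in sample)

# Every sample of size n is equally likely under SRS, so enumerate them all.
samples = list(combinations(range(N), n))
estimates = [ht_total(s) for s in samples]
mean_est = sum(estimates) / len(estimates)                      # E[T_hat_HT]
var_exact = sum((t - mean_est) ** 2 for t in estimates) / len(estimates)

# Variance formula; summing over ordered pairs i != j equals twice the sum over i < j.
var_formula = sum(y[i] ** 2 / pi[i] ** 2 * pi[i] * (1 - pi[i]) for i in range(N))
var_formula += sum(
    y[i] * y[j] / (pi[i] * pi[j]) * (pi_joint - pi[i] * pi[j])
    for i in range(N) for j in range(N) if i != j
)
print(mean_est, var_exact, var_formula)
```

The average of the estimates should equal \(t_U = 25\), which shows the estimator is unbiased. Both variance computations should agree, up to rounding, with each other and with \(N^2(1-\frac{n}{N})\frac{s_U^2}{n} = 75\) for these values.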

B.1.4 Simple Random Sampling

Under SRS, \(\pi_i = \frac{n}{N}\) for all \(i \in U\), and \(\E[Z_iZ_j] = \frac{n(n-1)}{N(N-1)}\) for \(i \ne j\). Substituting these into the HT formulas gives:

\begin{tabular}{llll}
Summary & Estimator & Variance (\(Var_{SRS}\)) & Estimated Variance (\(\widehat{Var}_{SRS}\)) \\
\hline
Total & \(\widehat{T}_{HT} = N\overline{Y}_S\) & \(N^2\left(1-\frac{n}{N}\right)\frac{s_U^2}{n}\) & \(N^2\left(1-\frac{n}{N}\right)\frac{S_S^2}{n}\) \\
Mean & \(\widehat{\overline{Y}}_{HT} = \overline{Y}_S\) & \(\left(1-\frac{n}{N}\right)\frac{s_U^2}{n}\) & \(\left(1-\frac{n}{N}\right)\frac{S_S^2}{n}\) \\
Proportion & \(\widehat{P}_S\) & \(\frac{N-n}{N-1}\frac{p_U(1-p_U)}{n}\) & \(\left(1-\frac{n}{N}\right)\frac{\widehat{P}_S(1-\widehat{P}_S)}{n-1}\) \\
\end{tabular}

Here \(p_U\) is the population proportion and \(\widehat{P}_S\) is the sample proportion, that is, the population and sample means of a 0/1 variable.
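The "Estimated Variance" column can be computed directly from a sample. The sketch below treats a proportion as the mean of a 0/1 variable. The function names and the sample are hypothetical.

```python
def srs_estimates(sample, N):
    """Estimated mean and total with their estimated variances under SRS."""
    n = len(sample)
    ybar = sum(sample) / n                                   # Y-bar_S
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)     # S_S^2
    fpc = 1 - n / N                                          # finite population correction
    var_mean = fpc * s2 / n
    return {"mean": ybar, "var_mean": var_mean,
            "total": N * ybar, "var_total": N ** 2 * var_mean}

def srs_proportion(flags, N):
    """Sample proportion of a 0/1 variable and its estimated variance."""
    n = len(flags)
    p_hat = sum(flags) / n
    return p_hat, (1 - n / N) * p_hat * (1 - p_hat) / (n - 1)

# Hypothetical 0/1 sample of n = 10 from a population of N = 200.
flags = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
p_hat, var_p = srs_proportion(flags, N=200)
as_mean = srs_estimates(flags, N=200)
```

For 0/1 data, \(S_S^2 = \frac{n}{n-1}\widehat{P}_S(1-\widehat{P}_S)\). As a result, `var_p` and `as_mean["var_mean"]` coincide.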

B.1.5 Others

Standard error: \(SE(\widehat{\theta}) = \sqrt{Var(\widehat{\theta})}\), estimated by \(\widehat{SE}(\widehat{\theta}) = \sqrt{\widehat{Var}(\widehat{\theta})}\)

Confidence intervals: \(\widehat{\theta} \pm z_{\alpha/2}SE(\widehat{\theta})\)

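Sample size calculation: set the margin of error \(e = z_{\alpha/2}SE(\widehat{\theta})\) and solve for \(n\). For the mean under SRS, this gives \(n = \frac{n_0}{1 + n_0/N}\), where \(n_0 = \frac{z_{\alpha/2}^2 s_U^2}{e^2}\).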
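Solving \(e^2 = z_{\alpha/2}^2\left(1-\frac{n}{N}\right)\frac{s_U^2}{n}\) for \(n\) gives \(n = n_0/(1 + n_0/N)\). A minimal sketch for the SRS mean is shown below. It assumes a guessed value `s` for \(s_U\), for example from a pilot study. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def srs_sample_size_mean(e, s, N, alpha=0.05):
    """Smallest n whose margin of error z_{alpha/2} * SE(Y-bar_S) is at most e under SRS.

    s is a guess for the population standard deviation s_U (it must come from
    prior data or a pilot study).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}
    n0 = (z * s / e) ** 2                      # size ignoring the finite population correction
    return math.ceil(n0 / (1 + n0 / N))        # solves e^2 = z^2 (1 - n/N) s^2 / n for n

print(srs_sample_size_mean(e=1.0, s=10.0, N=10000))  # prints: 370
```

Rounding up ensures that the margin of error at the returned \(n\) does not exceed \(e\).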

B.2 Chapter 3

B.2.1 General Formula for HT Estimators

\begin{tabular}{lll}
Summary & Estimator & Variance \\
\hline
Total & \(\widehat{T}_{HT} = \sum_{h=1}^H\widehat{T}_{HT,h}\) & \(Var_{strt}(\widehat{T}_{HT}) = \sum_{h=1}^HVar_{SRS}(\widehat{T}_{HT,h})\) \\
Mean & \(\widehat{\overline{Y}}_{HT} = \frac{1}{N}\widehat{T}_{HT}\) & \(Var_{strt}(\widehat{\overline{Y}}_{HT}) = \frac{1}{N^2}Var_{strt}(\widehat{T}_{HT})\) \\
\end{tabular}

where \[\begin{align*} \widehat{T}_{HT,h} & = N_h\overline{Y}_{S,h} \\ Var_{SRS}(\widehat{T}_{HT,h}) & = N_h^2 \left(1-\frac{n_h}{N_h}\right)\frac{s_{U,h}^2}{n_h} \\ \widehat{Var}_{SRS}(\widehat{T}_{HT,h}) & = N_h^2 \left(1-\frac{n_h}{N_h}\right)\frac{S_{S,h}^2}{n_h} \\ s_{U,h}^2 & = \frac{1}{N_h-1}\sum_{i\in U_h}(y_i - \overline{y}_{U,h})^2 \\ S_{S,h}^2 & = \frac{1}{n_h-1}\sum_{i\in S_h}(y_i - \overline{Y}_{S,h})^2 \\ \overline{y}_{U,h} & = \frac{1}{N_h}\sum_{i \in U_h}y_i = \frac{1}{N_h}\sum_{i = 1}^{N_h} y_{h,i} \\ \overline{Y}_{S,h} & = \frac{1}{n_h}\sum_{i \in S_h}y_i = \frac{1}{n_h}\sum_{i = 1}^{n_h} y_{h,i} \end{align*}\]
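The stratified estimates add up stratum by stratum. The sketch below assumes an SRS within each stratum, as in the formulas above. The input layout, a list of \((N_h, \text{sample}_h)\) pairs, and the data are hypothetical.

```python
def stratified_estimates(strata):
    """strata: list of (N_h, sample_h) pairs, with an SRS drawn within each stratum."""
    N = sum(N_h for N_h, _ in strata)
    total, var_total = 0.0, 0.0
    for N_h, ys in strata:
        n_h = len(ys)
        ybar_h = sum(ys) / n_h                                   # Y-bar_{S,h}
        s2_h = sum((y - ybar_h) ** 2 for y in ys) / (n_h - 1)   # S_{S,h}^2
        total += N_h * ybar_h                                    # T_hat_{HT,h}
        var_total += N_h ** 2 * (1 - n_h / N_h) * s2_h / n_h     # Var_hat_SRS(T_hat_{HT,h})
    return total, var_total, total / N, var_total / N ** 2

# Hypothetical design: stratum sizes 100 and 50, samples of 3 and 2 units.
total, var_total, mean, var_mean = stratified_estimates([(100, [2.0, 4.0, 6.0]), (50, [10.0, 12.0])])
```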

B.2.2 Additional Formulas

\begin{tabular}{lll}
Sum of Squares & Formula & Notes \\
\hline
SSW & \(\sum_{h=1}^H\sum_{i=1}^{N_h}(y_{h,i}-\bar{y}_{U,h})^2\) & \(\sum_{h=1}^H(N_h - 1)s_{U,h}^2\) \\
SSB & \(\sum_{h=1}^H N_h (\bar{y}_{U,h} - \bar{y}_U)^2\) & \(\bar{y}_U = \frac{1}{N}\sum_{h=1}^HN_h\bar{y}_{U,h}\) \\
SST = SSW + SSB & \(\sum_{h=1}^H\sum_{i=1}^{N_h}(y_{h,i}-\bar{y}_{U})^2\) & \((N-1)s_{U}^2\) \\
\end{tabular}
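The decomposition SST = SSW + SSB can be checked on a small, fully known population. The strata below are hypothetical.

```python
# Hypothetical population with H = 3 strata; every y-value is known.
strata = [[2.0, 4.0, 6.0], [10.0, 12.0], [1.0, 3.0, 5.0, 7.0]]
all_y = [y for stratum in strata for y in stratum]
N = len(all_y)
ybar_U = sum(all_y) / N
means = [sum(s) / len(s) for s in strata]                            # ybar_{U,h}

ssw = sum((y - m) ** 2 for s, m in zip(strata, means) for y in s)     # within strata
ssb = sum(len(s) * (m - ybar_U) ** 2 for s, m in zip(strata, means))  # between strata
sst = sum((y - ybar_U) ** 2 for y in all_y)                           # total
print(ssw, ssb, sst, ssw + ssb)
```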

Design effect: \(DE = \frac{Var_{complex}(\widehat{\theta}_n)}{Var_{SRS}(\widehat{\theta}_n)}\), with both variances computed for the same sample size \(n\)

Effective sample size: \(n_{eff} = \frac{n}{DE}\)
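As a small arithmetic illustration, the sketch below uses hypothetical variances for two designs at the same \(n\). A design effect above 1 shrinks the effective sample size, and one below 1 enlarges it.

```python
def design_effect(var_complex, var_srs):
    # Both variances must refer to the same sample size n.
    return var_complex / var_srs

def effective_sample_size(n, de):
    # SRS sample size that would give the same variance as the complex design.
    return n / de

# Hypothetical variances of the same estimator at n = 600.
n, var_srs = 600, 1.2
results = {}
for label, var_complex in [("less precise design", 2.4), ("more precise design", 0.9)]:
    de = design_effect(var_complex, var_srs)
    results[label] = (de, effective_sample_size(n, de))
print(results)
```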

B.2.3 Allocations

Proportional allocation: \(n_h \propto N_h\)

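Optimal allocation: \(n_h \propto \frac{N_hs_{U,h}}{\sqrt{c_h}}\), where \(c_h\) is the cost of sampling one unit in stratum \(h\)

Neyman allocation (optimal allocation with equal costs \(c_h\)): \(n_h \propto N_hs_{U,h}\)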
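All three allocations share the form \(n_h = n \cdot k_h / \sum_g k_g\) for different scores \(k_h\). The sketch below makes that explicit. The function `allocate`, the guessed stratum standard deviations, and the costs are hypothetical. The returned sizes are left unrounded, although in practice they would be rounded to integers.

```python
import math

def allocate(n, N_h, s_h=None, c_h=None):
    """Split a total sample size n with n_h proportional to N_h * s_h / sqrt(c_h).

    Omitting s_h and c_h gives proportional allocation, passing s_h only gives
    Neyman allocation, and passing both gives optimal allocation.
    """
    H = len(N_h)
    s_h = s_h if s_h is not None else [1.0] * H
    c_h = c_h if c_h is not None else [1.0] * H
    scores = [N * s / math.sqrt(c) for N, s, c in zip(N_h, s_h, c_h)]
    total = sum(scores)
    return [n * k / total for k in scores]

N_h, s_h, c_h = [400, 600], [10.0, 5.0], [1.0, 4.0]   # hypothetical strata
print(allocate(100, N_h))               # proportional
print(allocate(100, N_h, s_h))          # Neyman
print(allocate(100, N_h, s_h, c_h))     # optimal
```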

B.3 Chapter 4

B.3.1 Ratio Estimation

\begin{tabular}{llll}
Type & Estimator & Variance (approx) & Variance (est) \\
\hline
Mean & \(\widehat{\overline{Y}}_r = \frac{\overline{Y}_S}{\overline{X}_S}\overline{x}_U\) & \(\left(1-\frac{n}{N}\right)\frac{s_{U,d}^2}{n}\) & \(\left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n}\) \\
Total & \(\widehat{T}_r = N\widehat{\overline{Y}}_r\) & \(N^2\left(1-\frac{n}{N}\right)\frac{s_{U,d}^2}{n}\) & \(N^2\left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n}\) \\
Ratio & \(\widehat{B}_r = \frac{1}{\overline{x}_U}\widehat{\overline{Y}}_r\) & \(\left(\frac{1}{\overline{x}_U}\right)^2\left(1-\frac{n}{N}\right)\frac{s_{U,d}^2}{n}\) & \(\left(\frac{1}{\overline{x}_U}\right)^2\left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n}\) \\
\end{tabular}

where \(d_i = y_i - Bx_i\) with \(B = \overline{y}_U/\overline{x}_U\) and \(s_{U,d}^2 = \frac{1}{N-1}\sum_{i=1}^N d_i^2\), \(e_i = y_i - \widehat{B}_rx_i = y_i - \frac{\overline{Y}_S}{\overline{X}_S}x_i\), and \[\begin{align*} S_{S,e}^2 & = \frac{1}{n-1}\sum_{i\in S}e_i^2 = S_{S,y}^2 - 2\hat{B}_rR_SS_{S,y}S_{S,x} + \hat{B}_r^2S_{S,x}^2 \\ S_{S,y}^2 & = \frac{1}{n-1}\sum_{i\in S}(y_i - \overline{Y}_S)^2 = \frac{(\sum_{i\in S}y_i^2) - n(\overline{Y}_S)^2}{n-1} \\ S_{S,x}^2 & = \frac{1}{n-1}\sum_{i\in S}(x_i - \overline{X}_S)^2 = \frac{(\sum_{i\in S}x_i^2) - n(\overline{X}_S)^2}{n-1} \\ R_{S} & = \frac{\sum_{i\in S}(y_i - \overline{Y}_S)(x_i - \overline{X}_S)}{(n-1)S_{S,x}S_{S,y}} = \frac{(\sum_{i\in S}x_iy_i) - n\overline{Y}_S\overline{X}_S}{(n-1)S_{S,x}S_{S,y}} \end{align*}\]
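A sketch of the ratio estimates under SRS is shown below. It assumes that the population mean \(\overline{x}_U\) of the auxiliary variable is known. It also checks that the residual form and the shortcut form of \(S_{S,e}^2\) agree. The function names and the data are hypothetical.

```python
import math

def ratio_estimates(xs, ys, xbar_U, N):
    """Ratio estimates under SRS; xbar_U is the known population mean of x."""
    n = len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    B_hat = ybar / xbar                                    # B_hat_r
    e = [y - B_hat * x for x, y in zip(xs, ys)]            # residuals e_i
    s2_e = sum(ei ** 2 for ei in e) / (n - 1)              # S_{S,e}^2
    fpc = 1 - n / N
    mean_r = B_hat * xbar_U
    return {
        "B_r": B_hat, "var_B_r": fpc * s2_e / n / xbar_U ** 2,
        "mean_r": mean_r, "var_mean_r": fpc * s2_e / n,
        "total_r": N * mean_r, "var_total_r": N ** 2 * fpc * s2_e / n,
        "s2_e": s2_e,
    }

def s2_e_shortcut(xs, ys):
    # S_{S,y}^2 - 2 B_hat R_S S_{S,y} S_{S,x} + B_hat^2 S_{S,x}^2
    n = len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    r = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    B_hat = ybar / xbar
    return s_y ** 2 - 2 * B_hat * r * s_y * s_x + B_hat ** 2 * s_x ** 2

xs, ys = [10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 41.0]   # hypothetical SRS
est = ratio_estimates(xs, ys, xbar_U=26.0, N=100)
```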

B.3.2 Post-Stratification

Post-stratified estimators:

\[\begin{align*} \widehat{T}_{post} & = \sum_{h=1}^H N_h \overline{Y}_{S,h} \\ \widehat{\overline{Y}}_{post} & = \frac{1}{N}\widehat{T}_{post} = \sum_{h=1}^H \frac{N_h}{N} \overline{Y}_{S,h} \end{align*}\]

Variance estimation: \[ \widehat{Var}_{SRS}(\widehat{\overline{Y}}_{post}) \approx \widehat{Var}_{strt, prop}(\widehat{\overline{Y}}_{HT}) = \left(1-\frac{n}{N}\right)\sum_{h=1}^H \frac{N_h}{N}\frac{S_{S,h}^2}{n}. \]
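A sketch of the post-stratified mean and its approximate variance estimate is given below. It assumes the post-stratum sizes \(N_h\) are known and that at least two sampled units fall in each post-stratum, so that \(S_{S,h}^2\) is defined. The function name and the data are hypothetical.

```python
def post_stratified(samples_by_h, N_h):
    """Post-stratified mean, total and approximate variance of the mean.

    samples_by_h[h] holds the y-values of SRS units that fell in post-stratum h;
    N_h[h] is the known population size of post-stratum h.
    """
    N = sum(N_h)
    n = sum(len(ys) for ys in samples_by_h)
    mean_post, weighted_s2 = 0.0, 0.0
    for ys, Nh in zip(samples_by_h, N_h):
        nh = len(ys)
        ybar_h = sum(ys) / nh                                  # Y-bar_{S,h}
        s2_h = sum((y - ybar_h) ** 2 for y in ys) / (nh - 1)  # S_{S,h}^2 (needs nh >= 2)
        mean_post += Nh / N * ybar_h
        weighted_s2 += Nh / N * s2_h
    var_hat = (1 - n / N) * weighted_s2 / n
    return mean_post, N * mean_post, var_hat

# Hypothetical SRS of n = 5 units that split 3 / 2 across two post-strata.
mean_post, total_post, var_hat = post_stratified([[3.0, 5.0, 4.0], [10.0, 14.0]], [600, 400])
```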

B.3.3 Regression Estimation

Regression estimators: \[\begin{align*} \widehat{T}_{reg} & = N\widehat{B}_{S,reg,0} + \widehat{B}_{S,reg,1} t_{U,x} \\ \widehat{\overline{Y}}_{reg} & = \widehat{B}_{S,reg,0} + \widehat{B}_{S,reg,1} \overline{x}_{U} \end{align*}\]

Least squares estimates of the intercept and slope of the regression line: \[\begin{align*} \widehat{B}_{S,reg,0} & = \overline{Y}_S - \widehat{B}_{S,reg,1} \times \overline{X}_S \\ \widehat{B}_{S,reg,1} & = \frac{\sum_{i\in S} (x_i - \overline{X}_S)(y_i - \overline{Y}_S)}{\sum_{i\in S}(x_i - \overline{X}_S)^2} = \frac{R_SS_{S,y}}{S_{S,x}} \end{align*}\]

Variance: \[ \widehat{Var}_{SRS}(\widehat{\overline{Y}}_{reg}) \approx \widehat{MSE}_{SRS}(\widehat{\overline{Y}}_{reg}) = \left(1-\frac{n}{N}\right)\frac{S_{S,e}^2}{n} = \left(1-\frac{n}{N}\right)\frac{S_{S,y}^2(1-R_S^2)}{n} \]

where \(e_i = y_i - \widehat{B}_{S,reg,0} -\widehat{B}_{S,reg,1} x_i = y_i - \overline{Y}_S - \widehat{B}_{S,reg,1}(x_i - \overline{X}_S)\).
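The regression estimator and its variance estimate can be sketched as follows, assuming \(\overline{x}_U\) is known. The code also checks the identity \(S_{S,e}^2 = S_{S,y}^2(1-R_S^2)\) for least squares residuals. The function name and the data are hypothetical.

```python
def regression_estimates(xs, ys, xbar_U, N):
    """Regression estimates of the mean and total under SRS (xbar_U known)."""
    n = len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                                   # B_hat_{S,reg,1}
    b0 = ybar - b1 * xbar                            # B_hat_{S,reg,0}
    mean_reg = b0 + b1 * xbar_U
    e = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    s2_e = sum(ei ** 2 for ei in e) / (n - 1)        # S_{S,e}^2
    r2 = sxy ** 2 / (sxx * syy)                      # R_S^2
    return {
        "mean_reg": mean_reg,
        "total_reg": N * mean_reg,                   # equals N*b0 + b1*t_{U,x}
        "var_mean_reg": (1 - n / N) * s2_e / n,
        "s2_e": s2_e,
        "s2_e_via_r2": syy / (n - 1) * (1 - r2),     # S_{S,y}^2 (1 - R_S^2)
    }

xs = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical SRS
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
est = regression_estimates(xs, ys, xbar_U=3.5, N=50)
```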