Overview
Methodological rigor is fundamental to producing high-quality data and generating objective, reliable, and accurate statistical information. By employing probability sampling and design-based inference, household surveys can achieve these standards within a scientifically robust and globally recognized methodological framework. The Living Standards Measurement Study (LSMS) sampling team adheres strictly to this framework, leveraging established methods and best practices to address the complex data production challenges frequently encountered in low- and middle-income countries (LMICs).
The LSMS team designs and implements sampling and estimation processes for large-scale, multi-topic household surveys and survey experiments. This work encompasses enriching and integrating sampling frames, estimating sample size requirements, conducting power calculations, designing sampling strategies, and performing stratification, allocation and sample selection. The team also calculates survey weights, computes estimates, and assesses the uncertainty of these estimates.
In the resource-constrained and data-scarce environments of LMICs, standard approaches commonly employed in countries with advanced statistical infrastructure are often impractical. To overcome this challenge, our team continuously explores the latest methodological advancements and emerging technologies. This commitment to innovation enhances data quality and strengthens statistical capacity in client and partner countries, aligning seamlessly with the LSMS mission and vision.
Work Areas
1. Sampling Frame
Sampling frames are crucial for household surveys: they list all units in the target population, as area frames or list frames, so that every sampling unit has a known, non-zero probability of being selected. This is essential for producing statistically valid and reliable results. A good sampling frame is complete (it covers the entire target population), accurate (its information is up to date and correct) and informative (it carries auxiliary information about the target population).
The LSMS helps partners identify an appropriate sampling frame, check whether it meets the basic requirements and enhance its information content. Georeferenced sampling frames are particularly advantageous because remote sensing or other georeferenced data can be integrated into the frame, enabling more efficient sampling designs (see also the sections on climate-sensitive sampling and geospatial sampling).
2. Sampling Design
The sampling design of a survey aims to maximize estimate precision while satisfying budgetary and logistical constraints. In LMICs, these constraints are particularly demanding, making sampling design a complex task, especially for multi-topic household surveys.
Basic approaches for calculating the minimum sample size required to achieve a desired level of estimate precision, and for optimally allocating samples across strata, are limited to handling a single estimator within dissemination domains of a single kind. While these methods yield straightforward closed-form formulas, they cannot accommodate the complexity of large-scale, multi-topic household surveys, which generate estimates for numerous population parameters across multiple dissemination domains.
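As an illustration of the closed-form formulas mentioned above, the following minimal sketch computes a minimum sample size for a single proportion and a Neyman allocation for a single variable. The function names, the design-effect inflation and the example figures are assumptions made for illustration; they are not taken from LSMS tools.

```python
import math


def sample_size_proportion(p, margin, z=1.96, deff=1.0):
    """Minimum sample size to estimate a proportion p within +/- margin.

    z is the normal quantile for the confidence level (1.96 for 95%).
    deff is an assumed design effect that inflates the simple random
    sampling size to account for clustering.
    """
    n_srs = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n_srs * deff)


def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Split a total sample n across strata proportionally to N_h * S_h."""
    products = [N * S for N, S in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * x / total for x in products]


# Hypothetical example: a 50% prevalence, a 5-point margin and a design effect of 2.
n_required = sample_size_proportion(p=0.5, margin=0.05, deff=2.0)

# Hypothetical example: two strata, the second smaller but more heterogeneous.
allocation = neyman_allocation(1000, stratum_sizes=[6000, 4000], stratum_sds=[1.0, 3.0])
```

Both formulas target one estimator at a time, which is exactly the limitation described above: with many indicators and several kinds of domains, each formula would return a different answer.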
To address this limitation, our sampling team employs the Bethel algorithm, an iterative convex optimization method that extends the principles of Neyman optimal allocation to a more general multivariate and multi-domain setting. While originally designed for basic estimators such as means and proportions, the team has further extended the Bethel algorithm to accommodate nonlinear estimators, including ratios like the unemployment rate.
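The allocation problem that the Bethel algorithm addresses can be written compactly. The formulation below is the standard textbook statement for stratified simple random sampling without replacement, not a description of the team's extended implementation. The notation is introduced here for illustration: H strata with population sizes N_h, sample sizes n_h and per-unit costs c_h; J target variables with stratum variances S_{hj}^2; and V_j the largest variance tolerated for the estimated total of variable j.

```latex
\min_{n_1,\dots,n_H} \; \sum_{h=1}^{H} c_h \, n_h
\quad \text{subject to} \quad
\sum_{h=1}^{H} N_h^2 \, S_{hj}^2 \left( \frac{1}{n_h} - \frac{1}{N_h} \right) \le V_j ,
\qquad j = 1, \dots, J .
```

Substituting x_h = 1/n_h makes every constraint linear in x_h while the cost remains convex, which is why convex optimization applies. With a single variable (J = 1) and equal costs, the solution reduces to Neyman allocation, with n_h proportional to N_h S_h. Multiple dissemination domains add one such constraint per variable and domain.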
3. Weighting & Estimation
Survey weights are numerical expansion factors that must be assigned to the analysis units to enable valid inference from the sample to the target population. This inference is achieved through weighted estimators, where the values of the variables observed on each analysis unit are multiplied by the survey weight of the unit.
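A minimal sketch of weighted estimation, using hypothetical function names and made-up data: the estimated total multiplies each observed value by the unit's weight, and the estimated mean divides that total by the sum of the weights.

```python
def weighted_total(values, weights):
    """Estimate a population total as the sum of weight * value."""
    return sum(w * y for y, w in zip(values, weights))


def weighted_mean(values, weights):
    """Estimate a population mean as the weighted total over the sum of weights."""
    return weighted_total(values, weights) / sum(weights)


# Hypothetical data: three sampled households and their survey weights.
consumption = [10.0, 20.0, 30.0]
weights = [100.0, 50.0, 50.0]

total = weighted_total(consumption, weights)
mean = weighted_mean(consumption, weights)
```

Note that the unweighted mean of these three values is 20, whereas the weighted mean is lower because the household with the smallest value represents the most population units.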
Household surveys use complex sampling designs that result in unequal inclusion probabilities. Moreover, they are inevitably affected by non-sampling errors, such as undercoverage and non-response. Survey weights are, therefore, necessary to compensate for differences in inclusion probabilities and to counteract bias arising from non-sampling errors.
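The two roles of survey weights can be sketched as follows, assuming a two-stage design and a simple weighting-class adjustment for non-response. The function names, the adjustment classes and the figures are hypothetical, and the adjustment shown is one common textbook approach rather than the team's specific procedure.

```python
def design_weight(pi_psu, pi_household_given_psu):
    """Inverse of the overall inclusion probability in a two-stage design."""
    return 1.0 / (pi_psu * pi_household_given_psu)


def nonresponse_adjust(weights, responded, classes):
    """Weighting-class adjustment.

    Within each class, respondent weights are scaled up so that they sum to
    the weighted total of all sampled units in that class. Non-respondents
    receive a weight of zero. A class with no respondents loses its total,
    so classes should be formed to avoid that case.
    """
    sampled_total, respondent_total = {}, {}
    for w, r, c in zip(weights, responded, classes):
        sampled_total[c] = sampled_total.get(c, 0.0) + w
        if r:
            respondent_total[c] = respondent_total.get(c, 0.0) + w
    return [
        w * sampled_total[c] / respondent_total[c] if r else 0.0
        for w, r, c in zip(weights, responded, classes)
    ]


# Hypothetical example: one of two sampled urban households did not respond.
base = [10.0, 10.0, 10.0, 10.0]
adjusted = nonresponse_adjust(
    base,
    responded=[True, False, True, True],
    classes=["urban", "urban", "rural", "rural"],
)
```

The responding urban household ends up carrying the weight of both sampled urban households, so the sum of the weights is preserved.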
To achieve these goals, our team implements rigorous weight calculation processes. This includes tackling complexities such as multiple-frame sampling, refreshment sampling and the creation of both cross-sectional and longitudinal weights for panel surveys. The team extensively uses calibration methods and tools, leveraging auxiliary information on the target population from external sources to increase the accuracy of survey estimates.
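Calibration can take several forms. The sketch below shows raking (iterative proportional fitting), one of the simplest, in which weights are repeatedly rescaled until the weighted sample totals match known population totals for each auxiliary variable. The function, its arguments and the population figures are hypothetical, and it is not the implementation used in ReGenesees or any other LSMS tool.

```python
def rake(weights, categories, margins, max_iter=100, tol=1e-8):
    """Rake weights to known population margins.

    categories: dict mapping a variable name to the list of category labels,
                one label per sampled unit.
    margins:    dict mapping a variable name to a dict of known population
                totals per category label.
    Assumes every category in the margins has at least one sampled unit.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_gap = 0.0
        for var, targets in margins.items():
            labels = categories[var]
            # Current weighted total in each category of this variable.
            current = {}
            for wi, lab in zip(w, labels):
                current[lab] = current.get(lab, 0.0) + wi
            for lab, target in targets.items():
                max_gap = max(max_gap, abs(current[lab] / target - 1.0))
            # Rescale so this variable's margins are matched exactly.
            w = [wi * targets[lab] / current[lab] for wi, lab in zip(w, labels)]
        if max_gap < tol:
            break
    return w


# Hypothetical example: four units, two auxiliary variables.
calibrated = rake(
    [1.0, 1.0, 1.0, 1.0],
    categories={"sex": ["m", "m", "f", "f"], "area": ["u", "r", "u", "r"]},
    margins={"sex": {"m": 60.0, "f": 40.0}, "area": {"u": 50.0, "r": 50.0}},
)
```

Raking only requires marginal totals for each auxiliary variable, not their joint distribution, which is useful when external sources publish only margins.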
4. Uncertainty Estimation
Estimating the uncertainty of point estimates is crucial in both household surveys and survey experiments. Measures of uncertainty, such as standard errors and confidence intervals, enable users to evaluate the reliability of survey estimates and make informed decisions. In the context of hypothesis testing, precise estimation of uncertainty is vital to ensure that inferences about experimental effects or causal relationships are valid and not unduly influenced by sampling variability.
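A minimal sketch of how a standard error can be obtained for an estimated total under a stratified, clustered design, using the widely used ultimate cluster approximation (primary sampling units treated as if drawn with replacement within strata). The function name and data are hypothetical, and this is a textbook estimator rather than the team's own tooling.

```python
import math
from collections import defaultdict


def total_and_se(values, weights, strata, psus):
    """Weighted total and its ultimate-cluster standard error."""
    # Weighted total of the variable within each PSU.
    psu_totals = defaultdict(float)
    for y, w, h, i in zip(values, weights, strata, psus):
        psu_totals[(h, i)] += w * y

    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _), t in psu_totals.items():
        by_stratum[h].append(t)

    estimate = sum(psu_totals.values())
    variance = 0.0
    for totals in by_stratum.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean_h = sum(totals) / n_h
        # Between-PSU variability within the stratum.
        variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals)
    return estimate, math.sqrt(variance)


# Hypothetical example: one stratum, two PSUs, two households per PSU.
estimate, se = total_and_se(
    values=[1.0, 2.0, 3.0, 4.0],
    weights=[10.0, 10.0, 10.0, 10.0],
    strata=["s", "s", "s", "s"],
    psus=["a", "a", "b", "b"],
)
```

The variance depends only on how much the weighted PSU totals differ within each stratum, which is why ignoring the clustering typically understates uncertainty.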
A common complication in survey experiments arises when control and treatment groups are not sampled independently, for instance when households from both groups are selected within the same set of primary sampling units. This induces correlations that make uncertainty estimation harder. We have developed tools to rigorously address this issue. By accurately quantifying sampling uncertainty, the team upholds the reliability of survey estimates and fosters trust in research findings.
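The role of the correlation can be seen from the general identity for the variance of a difference, written here with illustrative notation for the treatment and control estimates:

```latex
\operatorname{Var}\!\left( \hat{\theta}_T - \hat{\theta}_C \right)
= \operatorname{Var}\!\left( \hat{\theta}_T \right)
+ \operatorname{Var}\!\left( \hat{\theta}_C \right)
- 2 \operatorname{Cov}\!\left( \hat{\theta}_T , \hat{\theta}_C \right) .
```

When both groups are drawn from the same primary sampling units, the covariance term is generally not zero, so adding the two variances as if the groups were independent misstates the uncertainty of the estimated effect.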
5. Long-Term Panel Designs
Through the Living Standards Measurement Study Integrated Surveys on Agriculture initiative (LSMS-ISA), we have acquired extensive experience in the development of sampling designs and fieldwork protocols for panel surveys. As the LSMS embarks on the next generation of panel surveys under the Resilient Futures initiative, we have developed a comprehensive sampling design for long-term panel surveys that enables long-term records at the individual level.
One important aspect of the design is its following and tracking rules. These were informed by analytical research conducted by the LSMS team on how the different following and tracking rules adopted in the LSMS-ISA affected cross-sectional and longitudinal representativeness, as well as the evolution of sample size and survey cost. The proposed design also adopts periodic, partial refreshment of the sample to better ensure cross-sectional representativeness and to limit the impact of attrition and sample expansion throughout the lifespan of the survey.
6. Climate-Sensitive Sampling for Household Surveys
Nationally representative household surveys offer valuable insights into the socioeconomic factors influencing climate resilience. However, they often fall short of providing rigorous estimates of the responses to extreme weather events and their impacts, because their sampling designs do not adequately cover affected areas. To address these challenges, we are developing sampling methods that support robust climate resilience analyses.
These methods include the use of spatial resources to identify areas prone to extreme weather events and to oversample communities and households within those areas. This new line of work will commence with an initial research phase focused on developing the methodology, including the identification and selection of appropriate data sources and sampling techniques. The ultimate objective is then to operationalize the approach, integrating it into forthcoming national data collection efforts under the new Resilient Futures initiative.
7. Phone Surveys
Through the design and implementation of the LSMS High Frequency Phone Surveys initiative (LSMS-HFPS), our team has devoted substantial attention to sampling and weighting approaches for phone surveys. Phone surveys in LMICs present a set of challenges that often differ from those faced in face-to-face surveys.
The first challenge is identifying a suitable frame of phone numbers from which to sample. The second is that concerns about coverage and non-response bias are heightened, since not all members of the target population own a mobile phone and it can be difficult to make successful contact with a respondent over the phone. Research by our team showed that certain techniques for adjusting survey weights ex post are effective in reducing biases related to sample selection at the household level. The findings of our work have also contributed to the development of guidance on sampling for phone surveys.
8. Geospatial Sampling
Geospatial sampling is an innovative approach that leverages spatial resources and geographic information systems (GIS) to enhance the accuracy and efficiency of household surveys. By integrating spatial data, such as satellite imagery, land use maps and population density grids, researchers can design more representative and precise sampling frames.
This method allows the identification of specific geographic areas and the selection of households within those areas, ensuring that the sample accurately reflects the spatial distribution of the population. Geospatial sampling also facilitates the monitoring and updating of sampling frames over time, improving the reliability of survey data and enabling more effective targeting of interventions and policies.
Our team of experts can advise on the appropriateness of your sampling frame and on the most efficient sampling design. You can contact us via email: lsms@worldbank.org
9. Sampling Tools
The LSMS is committed to developing free, open-source, user-friendly and standardized statistical software to improve the reproducibility and transparency of production processes and to strengthen the statistical capacity of clients and partners.
Below is a list of some of the tools available:
- susospatsample. Application for streamlining spatially-aware sampling, both for drawing a sample based on spatial resources and for generating spatial resources for the sample drawn.
- susogrdframe. Application to generate replacement units when sampling from a grid frame.
- susolisting. Application to list and sample structures using Google Maps.
- susorastoframe. Application for updating a spatial polygons area sampling frame with one or more raster layers and using them for probability proportional to size (PPS) sampling of enumeration areas.
- ReGenesees. An R package for calibration, estimation and sampling error assessment in complex sample surveys.
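For readers unfamiliar with PPS sampling of enumeration areas, mentioned in the list above, the following is a minimal sketch of systematic PPS selection. It is a generic textbook procedure with hypothetical names and data, and it does not reproduce the implementation of any of the listed tools.

```python
import random


def pps_systematic(sizes, n, seed=None):
    """Select n units with probability proportional to size, systematically.

    sizes: a size measure per unit (for example, households per enumeration area).
    Returns the indices of the selected units. A unit whose size is at least
    the sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    step = sum(sizes) / n              # sampling interval
    start = rng.random() * step        # random start in [0, step)
    picks, cumulative, k = [], 0.0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Take every selection point that falls inside this unit's interval.
        while k < n and start + k * step < cumulative:
            picks.append(index)
            k += 1
    return picks


# Hypothetical frame of four enumeration areas with their household counts.
selected = pps_systematic([10, 20, 30, 40], n=2, seed=1)
```

For units smaller than the sampling interval, the inclusion probability is n times the unit's size divided by the total size, so the corresponding first-stage design weight is the inverse of that quantity.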