
Living Standards Measurement Study

Tools for Data Quality Assurance

Overview

Survey work is a series of complex processes. At the outset, there is sample and questionnaire design, as well as field training. During the data collection stage, there is monitoring and remedial action. And after data collection, there is cleaning, analysis and preparation of data for publication. 

The work can always be made more efficient, and that is the goal of the survey tools our team of experts has been working on for more than a decade. Where possible, tools aim to automate repeated processes. Where processes are complex, they strive to simplify and streamline. And where processes defy automation or streamlining, tools aspire to advise on how to complete processes more efficiently.

Work Areas

1.      Survey Preparation 

Before data collection begins, many activities must occur. First, the survey must be designed; this covers both the strategy for sampling the population of interest and the survey instruments for collecting the data. Second, data entry applications must be developed to faithfully translate the instruments from paper to computer and to continuously check collected data against validation rules. Third, field staff must be trained on how to understand the survey instruments’ questions, what protocol to follow when asking them, and how to capture, review and submit data using the entry application.

Currently, survey tools focus on the sampling and training components of the survey preparation process. 

Sampling 

To draw a sample, one needs a sampling frame and a measure of size for each primary sampling unit in it. In cases where a traditional sampling frame does not exist, one needs to use - and combine - spatial resources to establish a frame and draw a sample.

To fill the void in off-the-shelf solutions, the Living Standards Measurement Study (LSMS) team developed various survey tools to manage several scenarios where a traditional sampling frame may be missing: 

  • susospatsample. Application for streamlining spatially-aware sampling, both for drawing a sample based on spatial resources and for generating spatial resources for the sample drawn. 
  • susogrdframe. Application to generate replacement units for samples drawn from a grid frame. 
  • susolisting. Application to list and sample structures using Google Maps. 
  • susorastoframe. Application for updating an area sampling frame of spatial polygons with one or more raster layers, in support of probability proportional to size (PPS) sampling of enumeration areas (a sketch of PPS selection follows this list). 
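
Several of the tools above hinge on probability proportional to size (PPS) selection, where a unit’s chance of selection grows with its measure of size. As a rough illustration of that idea only - not the tools’ own code - the sketch below draws a systematic PPS sample of enumeration areas from a frame that records a measure of size for each unit.

```python
import random

def pps_systematic_sample(frame, n):
    """Select n units with probability proportional to size (PPS),
    using systematic sampling on the cumulated measures of size.

    frame: list of (unit_id, measure_of_size) tuples.
    """
    total = sum(size for _, size in frame)
    interval = total / n                      # sampling interval
    start = random.uniform(0, interval)       # random start
    targets = [start + i * interval for i in range(n)]

    selected, cumulative, t = [], 0.0, 0
    for unit_id, size in frame:
        cumulative += size
        while t < n and targets[t] <= cumulative:
            selected.append(unit_id)
            t += 1
    return selected

# Example: enumeration areas with household counts as the measure of size
frame = [("EA-001", 120), ("EA-002", 430), ("EA-003", 95), ("EA-004", 310)]
print(pps_systematic_sample(frame, 2))
```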

Training 

To conduct field staff training, a suite of materials must be produced that explains each question and provides guidance on how to ask it and how to record the answer. Traditionally, the production process involves a laborious copy-and-paste operation of question content from the survey instruments into the corresponding manuals.

Relying on Survey Solutions’ questionnaire model, our team has devised a survey tool that streamlines this process. susoquestionnairemanual creates data collection training materials in HTML, Word, or PowerPoint format by automating the copy-paste of question content, so the user only has to add explanations and instructions. 
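
To make the idea concrete, here is a hypothetical sketch of the kind of automation involved - not susoquestionnairemanual’s actual interface or output - where question metadata is poured into a manual template so that only the explanations and instructions remain to be written.

```python
# Toy sketch: generate a manual stub from question metadata.
# The metadata structure below is illustrative, not Survey Solutions' export format.
questions = [
    {"varname": "hh_size", "text": "How many people live in this household?", "type": "numeric"},
    {"varname": "water_src", "text": "What is the main source of drinking water?", "type": "single-select"},
]

def manual_stub(questions):
    """Copy question content into a manual skeleton, leaving explanation fields blank."""
    lines = ["# Interviewer manual", ""]
    for q in questions:
        lines += [
            f"## {q['varname']} ({q['type']})",
            f"> {q['text']}",
            "",
            "**Explanation:** _to be written by the training team_",
            "**Instructions:** _to be written by the training team_",
            "",
        ]
    return "\n".join(lines)

print(manual_stub(questions))
```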

2.      Monitoring 

Once data collection begins, data must be monitored continuously, issues identified quickly, and feedback provided to field teams frequently. To that end, our team has developed several survey tools for each part of the data monitoring process and for each type of data encountered. 

End-to-End Monitoring for Select Survey Initiatives 

For select survey initiatives, the LSMS team has developed user-friendly tools that are tailored to fit the data and needs of those surveys. For the Multi-Tier Framework energy surveys, the monitorMTF application aids with acquiring data from the server for the survey’s multiple instruments, executing high-frequency data quality checks and producing reports whose tables highlight potentially problematic trends related to key survey indicators.
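
As a hedged illustration of what a high-frequency check can look like - not monitorMTF’s actual code, and with made-up field names - the sketch below flags interviews whose reported electricity expenditure falls outside a plausible range and tallies the flags by field team.

```python
from collections import Counter

# Illustrative interview records; field names are assumptions, not the MTF data model.
interviews = [
    {"id": "A1", "team": "T1", "electricity_spend": 25.0},
    {"id": "A2", "team": "T1", "electricity_spend": 900.0},   # implausibly high
    {"id": "B1", "team": "T2", "electricity_spend": 0.0},
    {"id": "B2", "team": "T2", "electricity_spend": 40.0},
]

def flag_out_of_range(records, var, low, high):
    """Return ids of records whose value for `var` lies outside [low, high]."""
    return [r["id"] for r in records if not (low <= r[var] <= high)]

flagged = flag_out_of_range(interviews, "electricity_spend", 1.0, 500.0)
by_team = Counter(r["team"] for r in interviews if r["id"] in flagged)
print("Flagged interviews:", flagged)       # ['A2', 'B1']
print("Flags by team:", dict(by_team))      # {'T1': 1, 'T2': 1}
```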

For the 50x30 agricultural surveys, the LSMS team is constructing a similar set of survey tools. Where the Multi-Tier Framework initiative implements several survey instruments at the same time, 50x30 fields different survey instruments over the multi-year lifecycle of a country’s survey initiative, e.g., first a light monitoring survey, then a heavier survey. 

Accordingly, tbls50x30 produces a meaningful set of monitoring tables for each of the initiative’s survey instruments. Meanwhile, monitor50x30 offers an application where the user can specify the context of use, such as which survey, which survey visit and which additional modules, among other elements, and then acquire data, execute high-frequency checks and produce monitoring reports tailored to that implementation context. 

Tools for Discrete Parts of the Monitoring Process 

To power applications for these and other surveys, our team has developed a set of lower-level tools that streamline several parts of the survey monitoring process and facilitate the use of survey microdata, metadata and paradata. 

Acquire 

To automate regular acquisition of updated survey data, we have several tools that interact with Survey Solutions’ Application Programming Interface (API) and others that build on it to implement common acquisition workflows. 
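
The pattern these tools automate looks roughly like the sketch below. The server URL, endpoint path, parameters, and credentials are placeholders for illustration only; they are not Survey Solutions’ documented API.

```python
import requests

# Placeholder values: the server URL, endpoint path, and parameters are
# illustrative only and do not reflect any real server's documented API.
SERVER = "https://example-survey-server.org"
ENDPOINT = "/api/export/questionnaire-data"
AUTH = ("api_user", "api_password")

def download_export(questionnaire_id, out_path):
    """Request an export file for one questionnaire and save it to disk."""
    response = requests.get(
        f"{SERVER}{ENDPOINT}",
        params={"questionnaire": questionnaire_id, "format": "STATA"},
        auth=AUTH,
        timeout=60,
    )
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)

download_export("hh_survey_v3", "hh_survey_v3.zip")
```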

Process

When data is small in size and rectangular in form, conventional data management tools work. But for data that is neither small nor rectangular, our team of experts has designed special tools. For survey metadata - JavaScript Object Notation (JSON) data describing the data entry application - susometa parses the metadata and provides methods for extracting content of interest, such as sections, rosters, questions and answer options.
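
As an illustration of the kind of work involved - with a simplified, made-up questionnaire document and helper functions that are not susometa’s API - the sketch below walks a nested JSON tree and collects every question it finds.

```python
import json

# Simplified, made-up questionnaire document; real questionnaire JSON is far richer.
doc = json.loads("""
{
  "title": "Household survey",
  "children": [
    {"type": "section", "title": "Demographics", "children": [
      {"type": "question", "varname": "hh_size", "text": "Household size?"},
      {"type": "roster", "title": "Members", "children": [
        {"type": "question", "varname": "age", "text": "Age of member?"}
      ]}
    ]}
  ]
}
""")

def extract_questions(node):
    """Recursively collect every question node in the document tree."""
    found = []
    if node.get("type") == "question":
        found.append({"varname": node["varname"], "text": node["text"]})
    for child in node.get("children", []):
        found.extend(extract_questions(child))
    return found

print(extract_questions(doc))
# [{'varname': 'hh_size', ...}, {'varname': 'age', ...}]
```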

For survey paradata - large transactional data sets of user events, like answering questions, and data entry application events, like enabling, disabling and validating answers - susopara provides functions for processing the data for custom calculations. paradataviewer generates standardized reports with monitoring statistics related to the questionnaire, interviewer and area, such as interview duration and number of answer changes. 
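
For example, duration and answer-change statistics boil down to grouping a long event log by interview. The event layout below is a simplified assumption rather than the actual paradata format, and the code is illustrative rather than susopara’s.

```python
from datetime import datetime

# Simplified paradata events: (interview_id, event, variable, timestamp).
events = [
    ("int-1", "AnswerSet", "hh_size", "2024-03-01T09:00:05"),
    ("int-1", "AnswerSet", "hh_size", "2024-03-01T09:02:10"),  # changed answer
    ("int-1", "AnswerSet", "water_src", "2024-03-01T09:40:00"),
    ("int-2", "AnswerSet", "hh_size", "2024-03-01T10:00:00"),
    ("int-2", "AnswerSet", "age", "2024-03-01T10:25:00"),
]

def summarize(events):
    """Per interview: span between first and last event, and number of answer changes."""
    stats = {}
    for interview, event, variable, ts in events:
        t = datetime.fromisoformat(ts)
        s = stats.setdefault(interview, {"first": t, "last": t, "answers": {}})
        s["first"], s["last"] = min(s["first"], t), max(s["last"], t)
        s["answers"][variable] = s["answers"].get(variable, 0) + 1
    return {
        i: {
            "duration_min": (s["last"] - s["first"]).total_seconds() / 60,
            "answer_changes": sum(n - 1 for n in s["answers"].values()),
        }
        for i, s in stats.items()
    }

print(summarize(events))
# {'int-1': {'duration_min': 39.9..., 'answer_changes': 1}, 'int-2': {...}}
```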

Review 

To review incoming data for anomalies or suspicious behavior, the LSMS team has developed tools to streamline both traditional and more modern checks. For the former, susoreview provides functions for composing quality checks and for automating workflow actions when data quality issues are detected, e.g., rejecting an interview with an appropriate set of messages. For more modern checks, rissk relies on machine learning to identify at-risk interviews, generating a unit risk score based on patterns found in survey microdata and paradata. 
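
A toy sketch of the traditional pattern - not susoreview’s functions - pairs each check with a message for the field team and bundles the failing checks for an interview into a rejection decision.

```python
# Each check pairs a predicate over one interview with a message for the field team.
# Check names, fields, and thresholds are illustrative assumptions.
checks = [
    ("household size missing",
     lambda r: r.get("hh_size") is None,
     "Please record the number of household members."),
    ("implausible plot area",
     lambda r: r.get("plot_area_ha", 0) > 100,
     "Plot area above 100 ha; please confirm with the respondent."),
]

def review(interview):
    """Return the messages of all failing checks for one interview."""
    return [msg for name, predicate, msg in checks if predicate(interview)]

interview = {"id": "int-7", "hh_size": None, "plot_area_ha": 250}
issues = review(interview)
if issues:
    print(f"Reject interview {interview['id']} with comments:")
    for msg in issues:
        print(" -", msg)
```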

3.      Cleaning and Preparation 

Once data collection ends, data must be cleaned and prepared for publication, a process that tends to be lengthy and laborious. At this stage, analysts typically create countless scripts to inspect and clean every variable in each survey data file. cleanstart provides an interactive graphical interface to create template cleaning programs, so the analyst can dedicate their time, skill and judgment to a task that computers are not (yet) good at doing: cleaning data. 
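
As an illustration of template generation - not cleanstart’s actual interface or output - the sketch below writes a skeleton cleaning block for every variable in a data set, leaving the judgment calls to the analyst.

```python
# Illustrative only: emit one skeleton cleaning block per variable.
variables = {"hh_size": "numeric", "water_src": "categorical", "income": "numeric"}

def cleaning_template(variables):
    """Build a template cleaning script with a TODO block for each variable."""
    blocks = []
    for name, vtype in variables.items():
        blocks.append("\n".join([
            f"# --- {name} ({vtype}) ---",
            f"# TODO: tabulate/summarize {name} and note anomalies",
            f"# TODO: decide treatment of missing values in {name}",
            f"# TODO: document any recodes applied to {name}",
        ]))
    return "\n\n".join(blocks)

with open("cleaning_template.py", "w") as f:
    f.write(cleaning_template(variables))
```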

During batch data operations, it is often useful to select similar variables and give them similar treatment - for example, collect all numerical variables and inspect them individually for outliers. For those who find Stata’s glob patterns limiting, selector offers selection by regex pattern. And for those who use Survey Solutions, selector enables variable selection based on questionnaire metadata, e.g., question type. 
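
The underlying idea is language agnostic. The Python sketch below, a stand-in for what selector does inside Stata, selects variables by regex pattern or by an assumed metadata tag.

```python
import re

# Variable names with a metadata tag for each; the tags are illustrative.
variables = {
    "income_wage": "numeric", "income_farm": "numeric",
    "water_src": "single-select", "crop_code_1": "numeric",
}

def select(pattern=None, vartype=None):
    """Select variable names by regex pattern and/or metadata type."""
    return [
        name for name, t in variables.items()
        if (pattern is None or re.search(pattern, name))
        and (vartype is None or t == vartype)
    ]

print(select(pattern=r"^income_"))   # ['income_wage', 'income_farm']
print(select(vartype="numeric"))     # all numeric variables
```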

Before data can be published, it has to be documented in the data sets themselves through informative variable and value labels. Rather than performing tedious, manual inspection and remediation of labels, analysts can use labeller to find flaws in labels, address those issues, and confirm that labels conform to expectations – all without having to inspect each label individually. 
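
A minimal sketch of this kind of automated label audit - with hypothetical rules rather than labeller’s - flags variables whose labels are missing, merely repeat the variable name, or run too long.

```python
# Illustrative variable labels; the flaws to flag are assumptions for the sketch.
labels = {
    "hh_size": "Number of household members",
    "water_src": "",                         # missing label
    "q17b": "q17b",                          # label repeats the variable name
    "income": "Total annual household income from all sources including " + "x" * 80,
}

def audit_labels(labels, max_len=80):
    """Return a dict of variable -> list of label problems."""
    problems = {}
    for var, label in labels.items():
        issues = []
        if not label.strip():
            issues.append("missing label")
        elif label.strip().lower() == var.lower():
            issues.append("label repeats the variable name")
        if len(label) > max_len:
            issues.append(f"label longer than {max_len} characters")
        if issues:
            problems[var] = issues
    return problems

print(audit_labels(labels))
```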

4.      Development Tools 

During or after a survey, the project team may want to create a package that solves a common problem they face, one that is likely an issue for others as well. To facilitate this type of work, the LSMS team has developed packages that streamline Stata package development. 

adodown offers workflow commands that automate manual tasks at each stage of development. When the project starts, adodown creates the necessary scaffolding for the package, e.g., folders and the pkg file. For each package command, it uses templates to create the required files, e.g., ado, documentation and unit test, and adds the appropriate entries to the pkg file. For documentation, it allows developers to draft in plain Markdown while it creates standard help files in SMCL. 
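
As a toy illustration of the scaffolding step - the folder layout and pkg file contents below are placeholders, not adodown’s actual output - a script might lay out a new package like this.

```python
from pathlib import Path

# Hypothetical layout and file contents; not adodown's actual scaffold or pkg format.
def scaffold(package, commands):
    """Create placeholder folders, command stubs, and a pkg file for a new package."""
    root = Path(package)
    for sub in ("ado", "mdhlp", "tests"):
        (root / "src" / sub).mkdir(parents=True, exist_ok=True)

    pkg_lines = ["v 3", f"d '{package.upper()}': placeholder package description"]
    for cmd in commands:
        (root / "src" / "ado" / f"{cmd}.ado").touch()       # command program stub
        (root / "src" / "mdhlp" / f"{cmd}.md").touch()       # Markdown help draft
        (root / "src" / "tests" / f"test_{cmd}.do").touch()  # unit test stub
        pkg_lines.append(f"f {cmd}.ado")
    (root / f"{package}.pkg").write_text("\n".join(pkg_lines) + "\n")

scaffold("mytools", ["mycount", "mycheck"])
```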

adodownr automatically deploys a package documentation website. For users, this provides an easy way to discover packages, to understand what they do and to explore how commands work - all without having to install the package. For developers, this gives packages a welcome web presence and offers a home for additional documentation, such as how-to guides, technical notes and FAQs. Continuous deployment via GitHub Actions keeps the HTML documentation up to date with the SMCL documentation. 

Resources

All the tools mentioned above, as well as many others, are available on the Living Standards Measurement Study’s GitHub.