An Update on The MatchIt Package in R

By Noah Greifer

One of the things we hope to do at Code Horizons is help steer you toward the best tools to meet your needs. Here’s a guest post by Noah Greifer, a postdoctoral fellow at Johns Hopkins and the developer of WeightIt and cobalt. Noah has just completed a massive overhaul of the workhorse MatchIt package for matching in R. This post will introduce you to some of its exciting new features. You can learn how to use all of these packages in Matching and Weighting for Causal Inference with R, offered by Statistical Horizons.

When evaluators cannot randomly assign participants into treatment groups, confounding variables can bias the estimated impact of a treatment, intervention, or exposure on an outcome. One method for adjusting for confounding is matching, which involves reorganizing the sample by pairing, discarding, or stratifying units so that, in the reorganized sample, the treatment is independent of the measured confounding variables, mimicking a randomized experiment. Although other approaches exist for removing confounding when the relevant confounding variables are thought to have been adequately measured, such as outcome regression and inverse probability weighting, matching has certain advantages, including imparting robustness to model misspecification and a clear separation between the design and analysis phases of an effect estimation procedure. A popular, though often misunderstood, method of matching is propensity score matching, which involves computing the propensity score—the predicted probability of receiving treatment—for each unit and pairing units on this score. Dr. Stephen Vaisey teaches an excellent course on propensity score methods that explains the theory and procedures in detail. Stuart (2010) and Austin (2011) are excellent introductory articles for applied researchers.

Many statistical packages offer tools for propensity score matching, and, for R users, the MatchIt package has been a staple since the release of its companion article, “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference” by Ho, Imai, King, and Stuart in 2007. MatchIt implements the recommendations in Ho et al. (2007), which involve treating matching as a data preprocessing step that adds robustness to a subsequent statistical procedure rather than as a unique statistical method unto itself. MatchIt offers a familiar and simple interface and a variety of methods to customize the matching specification, and it acts as a flexible wrapper to functions in various packages that implement a number of matching methods, each of which has a different syntax that MatchIt users do not need to learn.

Despite its popularity, MatchIt has had a significant number of issues that have arisen and gone unaddressed over the years. As someone passionate about writing R packages and making causal inference methods accessible, I felt I was the right person to embark on a major update of the package, which had long ago inspired the development of my own R packages cobalt and WeightIt. This update involved completely rewriting the internals of virtually every function in the package and rewriting the documentation from scratch. The purpose of this post is to describe some of the more significant improvements to MatchIt with the hope of encouraging users to take another look at all that MatchIt 4.0.0 may have to offer for their research endeavors.

Basic Use of MatchIt

The basic use of MatchIt is the following: first, we start with a dataset (in this case the lalonde dataset included with the package) and use the matchit() function to estimate a propensity score and perform the matching. Here we perform full matching on the propensity score.

library("MatchIt")
data("lalonde")

m.out <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "full")

A printout of the matchit object reveals the details of the matching procedure. Other types of matching, including nearest neighbor matching with and without replacement, optimal matching, genetic matching, (coarsened) exact matching, and propensity score subclassification are available as well.

m.out

## A matchit object
##  - method: Optimal full matching
##  - distance: Propensity score
##              - estimated with logistic regression
##  - number of obs.: 614 (original), 614 (matched)
##  - target estimand: ATT
##  - covariates: age, educ, race, married, nodegree, re74, re75

Next we assess balance on the matched set, ensuring that the distribution of covariates is similar between the treatment groups.

summary(m.out, un = FALSE)

## 
## Call:
## matchit(formula = treat ~ age + educ + race + married + nodegree + 
##     re74 + re75, data = lalonde, method = "full")
## 
## Summary of Balance for Matched Data:
##            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
## distance          0.5774        0.5761          0.0060     0.9918    0.0039   0.0486          0.0192
## age              25.8162       24.6928          0.1570     0.4853    0.0838   0.3220          1.2606
## educ             10.3459       10.3227          0.0116     0.5577    0.0235   0.0620          1.2200
## raceblack         0.8432        0.8347          0.0236          .    0.0086   0.0086          0.0378
## racehispan        0.0595        0.0583          0.0049          .    0.0012   0.0012          0.5638
## racewhite         0.0973        0.1071         -0.0329          .    0.0098   0.0098          0.4168
## married           0.1892        0.1285          0.1549          .    0.0607   0.0607          0.4806
## nodegree          0.7081        0.7040          0.0090          .    0.0041   0.0041          0.9143
## re74           2095.5737     2199.7126         -0.0213     1.2008    0.0383   0.2350          0.8668
## re75           1532.0553     1524.8362          0.0022     2.0048    0.0651   0.2308          0.7932
## 
## Sample Sizes:
##               Control Treated
## All            429.       185
## Matched (ESS)   53.33     185
## Matched        429.       185
## Unmatched        0.         0
## Discarded        0.         0

If balance is achieved, we extract the matched dataset and estimate a treatment effect and its standard error, accounting for the paired nature of the data.

library("lmtest")
library("sandwich")

md <- match.data(m.out)

fit <- lm(re78 ~ treat, data = md, weights = weights)
coeftest(fit, vcov. = vcovCL, cluster = ~subclass)

## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  4611.20     681.48  6.7664 3.093e-11 ***
## treat        1737.95     773.92  2.2456   0.02508 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With this basic use in mind, below we’ll go into the new features added in the update.

New Features and Fixes

A number of frequently reported issues have been fixed. These include unclear or unhelpful errors and warnings, inconsistent syntax and output, and long runtimes with little ability to monitor progress. Previously, many features were available only for some methods; now a consistent set of features works smoothly with every method that allows them, as displayed in the table below.

Feature                        Nearest neighbor matching  Genetic matching  Optimal matching  Full matching
Caliper matching               Improved                   Improved          -                 New
Exact matching                 Existed                    Improved          New               New
Mahalanobis distance matching  Existed                    New               New               New
K:1 matching                   Existed                    Improved          Existed           -
Subclass output                New                        New               Existed           Existed
Variable ratio matching        New                        -                 New               -

I will go into more detail on a few of these additions below. Some of the most important updates are related to pairing, generalizability, and documentation, and those will be the focus here. A full list of new features in MatchIt 4.0.0 is available on MatchIt’s new website.

Pairing

Most matching methods offered in MatchIt involve pairing—assigning control units to be paired with treated units with whom they are close on some metric (e.g., on the propensity score distance). With most methods, unpaired units are dropped from the sample. There has been some debate in the statistics literature about the importance of retaining and using pair membership in the estimation of treatment effects in the matched sample. Those in favor of ignoring pair membership argue that pairing is simply an instrument to achieve subset selection, that pairing on the propensity score doesn’t suggest paired units will actually be close to each other on the covariates, and that pair membership becomes obsolete when conditioning on the variables used in the matching (e.g., using outcome regression). Those in favor of retaining pair membership argue that pair membership is not just instrumental but part of the analysis procedure itself, that pairing on the covariates themselves does yield close pairs in the covariate space, and that accounting for pair membership is necessary for achieving nominal confidence interval coverage.

The MatchIt update includes several features that allow users to interface more directly with pairing than they could in previous versions. These features include consistently returning pair membership after matching, providing methods to assess pair closeness, providing methods to customize pairing and ensure closer pairs, and offering documentation on how to use pair membership in effect and standard error estimation.

Pair membership. In prior versions of MatchIt, pair membership was only available for some methods, leading people to use custom code snippets to recover pair membership when using the other forms of matching, including the default and most common method, nearest neighbor matching without replacement. Now, pair membership is included with all matching methods in the subclass component of the output object. When using the match.data() function to extract the matched dataset from the matchit output object, pair membership is always included in the subclass column. When matching with replacement, control units may be part of multiple pairs; the get_matches() function has been rewritten to create a matched dataset that allows pair membership to be included even in this case.
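As a minimal sketch of what this looks like in practice, the code below reuses the lalonde data and the covariates from the basic example above. The object names (m.nn, md.nn, m.rep, gm.rep, pair.counts) are chosen here purely for illustration.

```r
library("MatchIt")
data("lalonde")

# 1:1 nearest neighbor matching without replacement (the default method)
m.nn <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                data = lalonde, method = "nearest")

# The matched dataset carries pair membership in the subclass column
md.nn <- match.data(m.nn)
head(md.nn[order(md.nn$subclass), c("treat", "distance", "weights", "subclass")])

# Cross-tabulate pair membership by treatment status:
# each pair should hold one treated and one control unit
pair.counts <- table(md.nn$subclass, md.nn$treat)

# Matching with replacement: a control unit may be reused across pairs
m.rep <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest", replace = TRUE)

# get_matches() returns one row per unit per pair, so a reused control
# appears once for every pair it belongs to
gm.rep <- get_matches(m.rep)
head(gm.rep[, c("id", "subclass", "weights", "treat")])
```

Because get_matches() repeats reused control units, the id column is what identifies rows that belong to the same original unit.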

A measure of pair closeness is now included in the balance output provided by the summary() function, in the Std. Pair Dist. column (the average absolute within-pair difference on each covariate, standardized). This makes it easy to see on which covariates paired units are close to each other; all else equal, matching specifications that yield closer pairs should be preferred. This is the same quantity that optimal pair matching and optimal full matching optimize, making it easy to see the benefits these methods offer and to assess their performance.
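For readers who want the pair distances as numbers rather than as printed output, here is a short sketch. It assumes the matched-sample balance table is stored in the sum.matched component of the summary object and that the column is named as in the printout above; m.nn, s.nn, and pair.dist are illustrative names.

```r
library("MatchIt")
data("lalonde")

m.nn <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                data = lalonde, method = "nearest")

# Balance summary for the matched sample only
s.nn <- summary(m.nn, un = FALSE)

# Pull the standardized pair distance for each covariate from the balance table
pair.dist <- s.nn$sum.matched[, "Std. Pair Dist."]
round(pair.dist, 3)
```

Running the same lines on a second matchit object gives a quick side-by-side comparison of pair closeness across matching specifications.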

Creating close pairs. There are several ways to reduce the distance between pairs, including exact matching on some variables, Mahalanobis distance matching on others, and using calipers. Now, calipers can be placed not just on the propensity score but also on the covariates themselves, making it possible to impose constraints such as requiring that members of a pair be within 5 years of each other, an often-requested feature. When combining exact matching with another matching method, the exact argument now allows a formula interface, making it easy to request exact matching on coarsened versions of variables, e.g., by specifying exact = ~cut(x1, 5) to exact match on a 5-binned version of x1. This syntax also works for the mahvars argument, which allows for Mahalanobis distance matching when a propensity score is estimated for another purpose (e.g., a caliper); extensive documentation on this argument explains how to use matchit() to perform Mahalanobis distance matching within propensity score calipers, a previously available but rarely used method.
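The sketch below combines these options in one call, again using the lalonde data. The specific choices (a 0.25 standard deviation caliper on the propensity score, a 5-year caliper on age, and three education bins) are arbitrary values picked for illustration, not recommendations.

```r
library("MatchIt")
data("lalonde")

m.cal <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest",
                 # exact match on marital status and on a 3-binned version of educ
                 exact = ~ married + cut(educ, 3),
                 # pair on the Mahalanobis distance computed from these covariates;
                 # the propensity score is then used only for its caliper
                 mahvars = ~ age + educ + re74 + re75,
                 # unnamed entry: caliper on the propensity score;
                 # named entry: caliper on age
                 caliper = c(0.25, age = 5),
                 # first caliper in standard deviation units, second in raw years
                 std.caliper = c(TRUE, FALSE))

md.cal <- match.data(m.cal)

# Largest age difference within each pair
age.gap <- tapply(md.cal$age, md.cal$subclass, function(a) diff(range(a)))
max(age.gap)
```

Tighter constraints generally leave more treated units unmatched, so it is worth checking the sample sizes in summary(m.cal) after adding them.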

Using pair membership. Finally, extensive documentation on estimating effects after matching has been included in package vignettes, also available on the new MatchIt website. The vignettes explain how to incorporate pair membership into effect and standard error estimation, adjust for covariates in the outcome model, and use bootstrapping to estimate confidence intervals. There are detailed explanations for estimating the effect of a treatment on a continuous, binary, and survival outcome and with each different form of matching. Much of the evidence guiding these recommendations comes from the extensive simulation work done by P. C. Austin and others on estimating effects after matching. Simulation work by Austin and Small (2014) and recent theoretical work by Abadie and Spiess (2020) have highlighted the importance of incorporating pair membership in standard error estimation, and the new tools and documentation available to capitalize on pair membership make following these recommendations possible and straightforward.

Generalizability

Although the primary purpose of matching is to enhance internal validity by removing the effects of confounding, matching can also affect external validity, the validity of generalization of the estimated effect to a target population. MatchIt offers a few new features to enhance generalizability, including the choice of additional estimands for some matching methods and the ability to incorporate sampling weights into propensity score estimation and balance assessment.

Specifying the estimand. In many cases, it is important to identify and retain the target population to which an estimated effect is to generalize. In the causal inference literature, the target population corresponds to an estimand, the average treatment effect in a specified population. Common estimands include the average treatment effect in the population (ATE), the average treatment effect among the treated units (ATT) and the average treatment effect among the control units (ATC). Matching is typically used to estimate the ATT when pairing control units to treated units and discarding the unmatched control units.

In prior versions of MatchIt, only the ATT was available for estimation. Some matching methods are able to allow estimation of the ATE; these methods typically do not discard units from the sample and instead re-weight units in each treatment group to resemble the target population, in a similar fashion to the related method of inverse probability weighting. The formula for computing these weights determines the estimand. With full matching and propensity score subclassification, the ATE can now be targeted using MatchIt by supplying "ATE" to the new estimand argument. In addition, all methods now support ATC estimation by supplying "ATC" to estimand; in the past, users would have to manually switch the values of the treatment assignment vector. The default for all methods remains the ATT, but these new capabilities expand MatchIt’s utility in estimating generalizable effects, ensuring that the target population is determined not just by the method used to adjust for confounding but also by the substantive considerations brought by the user.
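As a rough illustration of the estimand argument, the following sketch targets the ATE with propensity score subclassification and the ATC with nearest neighbor matching. Matching with replacement is used for the ATC here only because lalonde has fewer treated than control units; m.ate, m.atc, and fit.ate are illustrative names.

```r
library("MatchIt")
data("lalonde")

# ATE via propensity score subclassification: no units are discarded;
# the subclassification weights re-weight both groups toward the full sample
m.ate <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "subclass", estimand = "ATE")

# ATC via nearest neighbor matching: each control unit is paired with a
# treated unit, and treated units may be reused
m.atc <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest", replace = TRUE,
                 estimand = "ATC")

# The weights in the matched data reflect the requested estimand
fit.ate <- lm(re78 ~ treat, data = match.data(m.ate), weights = weights)
coef(fit.ate)["treat"]
```

Standard errors are left out of this sketch; the appropriate method depends on the form of matching used.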

Using sampling weights. Large, representative datasets often come with sampling weights, and it is now straightforward to include the sampling weights as part of the matching, balance assessment, and effect estimation processes. The new s.weights argument in matchit() allows users to supply sampling weights to the function used to estimate the propensity scores and ensures that the sampling weights are accounted for in the balance assessment by automatically weighting the unmatched and matched samples in the summary() output. match.data() automatically incorporates sampling weights into the matching weights used for effect estimation so the user doesn’t have to take the extra step of applying them separately. This allows for a single smooth workflow, which is described in more detail in a standalone vignette dedicated just to the use of sampling weights in matching.
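A minimal sketch of that workflow follows. The lalonde data do not include sampling weights, so the sw column below is randomly generated and entirely hypothetical; it is there only to show where real survey weights would go.

```r
library("MatchIt")
data("lalonde")

# Hypothetical sampling weights, made up for illustration only
set.seed(1234)
lalonde$sw <- runif(nrow(lalonde), min = 0.5, max = 2)

# Sampling weights are passed to the propensity score model via s.weights
m.sw <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                data = lalonde, method = "nearest", s.weights = ~ sw)

# Balance statistics are computed with the sampling weights applied
summary(m.sw, un = FALSE)

# The weights column returned by match.data() already combines the matching
# weights with the sampling weights, so it can be used directly
md.sw <- match.data(m.sw)
fit.sw <- lm(re78 ~ treat, data = md.sw, weights = weights)
coef(fit.sw)["treat"]
```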

Documentation

Most of MatchIt’s documentation existed solely in the 2011 Journal of Statistical Software paper on MatchIt (Ho et al., 2011) and in its HTML version on author Gary King’s website. All-new documentation has been written not just for every function in MatchIt but also separately for each matching method, and all of that documentation is now included in the package and accessible in R as it is with most other R packages. In addition, five new vignettes are available detailing the recommended uses of MatchIt and reviewing the statistical literature on best practices for matching:

“Getting Started”: demonstrates the typical use of MatchIt functions to perform matching, evaluate and compare matching specifications, proceed with effect estimation, and report the results of a matching analysis
“Matching Methods”: describes a taxonomy of matching methods available in MatchIt, the many ways to customize them, and the reasons for and implications of these specifications
“Assessing Balance”: describes how to assess balance after matching using numerical and graphical summaries of balance, including functions available both in MatchIt and in its companion package cobalt
“Estimating Effects”: describes best practices in estimating effects and standard errors after matching, synthesizing a vast literature on this topic and providing code demonstrations of estimating effects using models for continuous, binary, and survival outcomes, incorporating pair membership and covariates into the outcome model, and using robust and bootstrap-based standard errors and confidence intervals
“Matching with Sampling Weights”: explains how to perform matching in the presence of sampling weights

All documentation is available on the new MatchIt website built using pkgdown and hosted on GitHub, which makes it easy to click between cross-references and access support in a simple and straightforward way.

Other New Features

The list of new features is long, so readers are encouraged to view them at the MatchIt website. Some more notable changes not mentioned above include increased speed for nearest neighbor matching due to programming with Rcpp, more options for estimating propensity scores, and new balance measures and plots. Commonly reported errors have been fixed, and error and warning messages are more informative. Procedures that unexpectedly involved random processes now give the same result on every run, which enhances replicability.

With these updates, we hope users will find new ways MatchIt can help them in performing robust, replicable, and valid research. We welcome users to provide comments, questions, bug reports, and suggestions on MatchIt’s GitHub Issues page.

References

Abadie, A., & Spiess, J. (2020). Robust Post-Matching Inference. Journal of the American Statistical Association, 0(ja), 1–37. https://doi.org/10.1080/01621459.2020.1840383

Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3), 399–424. https://doi.org/10.1080/00273171.2011.568786

Austin, P. C., & Small, D. S. (2014). The use of bootstrapping when using propensity-score matching without replacement: A simulation study. Statistics in Medicine, 33(24), 4306–4319. https://doi.org/10.1002/sim.6276

Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3), 199–236. https://doi.org/10.1093/pan/mpl013

Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2011). MatchIt: Nonparametric Preprocessing for Parametric Causal Inference. Journal of Statistical Software, 42(8). https://doi.org/10.18637/jss.v042.i08

Stuart, E. A. (2010). Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science, 25(1), 1–21. https://doi.org/10.1214/09-STS313