/****************************************************************
MATCHING2.DO

This code is available at:
http://pauldickman.com/software/stata/matching2.do

Construct a matched cohort study from two data sets (exposed and
unexposed). Note that we are matching on baseline characteristics
(i.e., this is not for a nested case-control study).

The code illustrates how to randomly select up to 5 unexposed
comparators matched on sex, year, and age (plus or minus 5 years).

exposed.dta contains the data for the exposed (n=156)
unexposed.dta contains the data for the unexposed (n=7775)

See http://pauldickman.com/software/stata/matching/ for details.

Requires rangejoin, which in turn requires rangestat:
ssc install rangejoin
ssc install rangestat

matching.do contains code written by Paul Dickman.
matching2.do contains more elegant and easier to generalise
code thanks to Bjarte Aagnes.

Bjarte Aagnes & Paul Dickman
16 March 2023
*****************************************************************/
clear all

// download both data sets to temporary files
local base http://pauldickman.com/software/stata/
tempfile exposed unexposed
copy `base'/exposed.dta `exposed'
copy `base'/unexposed.dta `unexposed'

use `exposed', clear

// pair each exposed with every unexposed of the same sex and year (yydx)
// whose age lies within 5 years of the exposed individual's age
rangejoin age -5 5 using `unexposed', by(sex yydx)

rename (id age status dx exit)(=1) // exposed
rename (*_U)(*0)                   // unexposed

// randomly select 5 unexposed if there are more than 5 matches
// (all matches are kept when there are 5 or fewer)
set seed 8675309
gen double shuffle = runiform()
by id1 (shuffle), sort: keep if _n <= 5
drop shuffle

// reshape from wide format to long format:
// keep the unexposed rows, then append the exposed as the first member of each set
keep id1 sex yydx *0
rename (id1 *0)(set_id *)
append using `exposed'
replace set_id = id if mi(set_id)
gen byte exposed = (set_id == id), after(set_id)
order *id exposed
gsort set_id -exposed id

list set_id id exposed sex yydx age status dx exit in 1/18, sepby(set_id)
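
/****************************************************************
Optional checks on the matched data. This is a minimal sketch and
not part of the original code. It assumes the long-format data
created above are in memory (variables set_id, id, exposed).
The names n_unexposed and first are introduced here purely for
illustration.

1. How many unexposed comparators did each matched set receive?
   Sets with fewer than 5 had fewer than 5 eligible matches.
2. Was any unexposed individual selected for more than one set?
   The random selection above is done separately for each exposed
   individual, so the same comparator can appear in several sets.
*****************************************************************/

// number of unexposed per matched set (rows with missing id are not counted)
bysort set_id: egen byte n_unexposed = total(exposed == 0 & !mi(id))

// tabulate the set sizes using one row per set
egen byte first = tag(set_id)
tabulate n_unexposed if first
drop first

// report unexposed individuals who appear in more than one matched set
duplicates report id if exposed == 0 & !mi(id)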