Matching an exposed cohort with an unexposed cohort
The code used in this tutorial is available in matching.do. There is some more elegant code in matching2.do.
The code uses the user-written rangejoin command, which in turn requires the rangestat package. Install both using:
ssc install rangejoin
ssc install rangestat
Introduction
This page illustrates how to construct a matched cohort study from two data sets (exposed and unexposed). Note we are matching on baseline characteristics (i.e., this is not for a nested case-control study).
The page illustrates how to use the rangejoin command to randomly select up to 5 unexposed comparators matched on sex, year, and age (plus or minus 5 years). It assumes you have a dataset containing the exposed and a separate dataset containing the unexposed. In my example, these datasets are called exposed (with 156 observations) and unexposed (with 6054 observations).
Here is how we use the rangejoin command:
use exposed, clear
rangejoin age -5 5 using unexposed, by(sex yydx)
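To see the matching rule on a small scale, here is a minimal sketch on made-up data. The toy values, the numeric coding of sex, and the tempfile name toy_unexposed are assumptions for illustration only; rangejoin and rangestat must be installed.

```stata
* Toy unexposed data (hypothetical values, not the tutorial data)
clear
input id sex yydx age
101 1 1987 68
102 1 1987 80
103 2 1987 70
104 1 1988 73
end
tempfile toy_unexposed
save `toy_unexposed'

* One toy exposed individual: sex 1, year 1987, age 73
clear
input id sex yydx age
1 1 1987 73
end

* Pair with unexposed of the same sex and year whose age lies in [73-5, 73+5]
rangejoin age -5 5 using `toy_unexposed', by(sex yydx)
list id id_U sex yydx age age_U
```

Only unexposed individual 101 satisfies all three criteria: 102 is outside the age window, 103 differs on sex, and 104 differs on year.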
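For each observation in the exposed dataset, rangejoin creates one observation for every matching observation in the unexposed dataset. Variables from the unexposed dataset whose names clash with those in the exposed dataset are given the suffix _U. For the exposed patient with id 8353 we identified 53 matches. Here is a list of the first 10.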
. list id id_U sex yydx dx dx_U age age_U if id==8353
+-------------------------------------------------------------------+
| id id_U sex yydx dx dx_U age age_U |
|-------------------------------------------------------------------|
1. | 8353 3349 Female 1987 16jul1987 15jun1987 73 68 |
2. | 8353 3962 Female 1987 16jul1987 15jun1987 73 68 |
3. | 8353 4050 Female 1987 16jul1987 14jan1987 73 68 |
4. | 8353 4391 Female 1987 16jul1987 14feb1987 73 68 |
5. | 8353 3741 Female 1987 16jul1987 16may1987 73 69 |
|-------------------------------------------------------------------|
6. | 8353 3939 Female 1987 16jul1987 16mar1987 73 69 |
7. | 8353 3992 Female 1987 16jul1987 15dec1987 73 69 |
8. | 8353 4195 Female 1987 16jul1987 16apr1987 73 69 |
9. | 8353 4240 Female 1987 16jul1987 04jan1987 73 69 |
10. | 8353 3623 Female 1987 16jul1987 15oct1987 73 70 |
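Before sampling, it can be useful to check how many candidate matches each exposed individual has. The following is a sketch on made-up pair data, not part of the original code. The variable names n_matches and first are hypothetical, and it assumes that an exposed individual with no match is retained with id_U missing, which you should verify in your own data.

```stata
* Toy paired data in the layout produced by rangejoin (hypothetical values)
clear
input id id_U
1 101
1 102
1 103
2 201
3 .
end

* Number of non-missing matches per exposed individual
bysort id: egen n_matches = count(id_U)

* Flag one row per exposed individual so each is counted once
egen byte first = tag(id)
tabulate n_matches if first

* Exposed individuals with fewer than 5 candidate matches
list id n_matches if first & n_matches < 5, noobs
```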
The next step is to randomly select 5 unexposed individuals for each exposed patient who has more than 5 matches. We assign a random number to each observation and then, for each exposed patient, keep only the 5 observations with the lowest values of the random number.
set seed 8675309
gen double shuffle = runiform()
by id (shuffle), sort: keep if _n <= 5
drop shuffle
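Setting the seed fixes the sequence of random numbers, but which observation receives which number depends on the order of the data at that point. The sketch below, on made-up data, sorts on a unique key before drawing the random numbers so that the selection is reproducible, and then checks the result. The explicit sort is a suggested safeguard, not part of the original code.

```stata
* Toy data: 3 exposed individuals with 8 candidate matches each (hypothetical)
clear
set obs 24
gen id   = ceil(_n/8)
gen id_U = 1000 + _n

* Fix the order of the data before drawing random numbers
sort id id_U
set seed 8675309
gen double shuffle = runiform()

* Keep the 5 candidates with the lowest random numbers within each exposed
by id (shuffle), sort: keep if _n <= 5
drop shuffle

* Check that no exposed individual has more than 5 matches
bysort id: assert _N <= 5
```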
There are now 5 matches for exposed patient with ID 8353.
. count if id==8353
5
. list id id_U sex yydx dx dx_U age age_U if id==8353
+-------------------------------------------------------------------+
| id id_U sex yydx dx dx_U age age_U |
|-------------------------------------------------------------------|
396. | 8353 4052 Female 1987 16jul1987 17aug1987 73 75 |
397. | 8353 4196 Female 1987 16jul1987 28aug1987 73 70 |
398. | 8353 3962 Female 1987 16jul1987 15jun1987 73 68 |
399. | 8353 4050 Female 1987 16jul1987 14jan1987 73 68 |
400. | 8353 4085 Female 1987 16jul1987 16apr1987 73 72 |
+-------------------------------------------------------------------+
The final step is to reshape the data from wide format (one observation per exposed-unexposed pair) to long format (one observation per individual). Have a look at the code in matching.do or matching2.do. matching.do contains code that is less elegant but may be easier to understand, whereas matching2.do contains code that is more elegant and easier to generalise.
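One possible way to do this restructuring is sketched below on made-up data. This is not the code in matching.do or matching2.do; it is an illustration that assumes the variable layout shown above and uses a hypothetical tempfile name, unexp_long. Other variables (for example dx and dx_U) would be handled in the same way as age and age_U.

```stata
* Toy paired data after the random selection step (hypothetical values)
clear
input id id_U sex yydx age age_U
8353 3962 1 1987 73 68
8353 4050 1 1987 73 68
8353 4052 1 1987 73 75
9001 3962 1 1987 70 68
end

* The id of the exposed individual indexes the matched set
gen set_id = id

* Unexposed members: one row per pair, dropping the _U suffix
preserve
drop if missing(id_U)
keep set_id id_U sex yydx age_U
rename (id_U age_U) (id age)
gen byte exposed = 0
tempfile unexp_long
save `unexp_long'
restore

* Exposed members: one row per matched set
bysort set_id: keep if _n == 1
keep set_id id sex yydx age
gen byte exposed = 1

* Stack exposed and unexposed, listing the exposed first within each set
append using `unexp_long'
gsort set_id -exposed id
list, sepby(set_id) noobs
```

In this toy example the unexposed individual 3962 appears in two matched sets, because the selection is made separately for each exposed individual.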
After reshaping and some data manipulation we have the following observations for ID 8353. We created a variable, set_id, to index the matched sets and a binary variable, exposed. This matched set contains one exposed and five unexposed individuals; in general, a set contains one exposed and up to five unexposed.
+--------------------------------------------------------------------------------------+
| set_id id exposed sex yydx age status dx exit |
|--------------------------------------------------------------------------------------|
| 8353 8353 1 Female 1987 73 Alive 16jul1987 31dec1995 |
|--------------------------------------------------------------------------------------|
| 8353 3962 0 Female 1987 68 Dead: cancer 15jun1987 30jan1992 |
| 8353 4050 0 Female 1987 68 Alive 14jan1987 31dec1995 |
| 8353 4052 0 Female 1987 75 Dead: cancer 17aug1987 01feb1988 |
| 8353 4085 0 Female 1987 72 Alive 16apr1987 31dec1995 |
| 8353 4196 0 Female 1987 70 Dead: other 28aug1987 13may1989 |
+--------------------------------------------------------------------------------------+
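A few quick checks on the long-format data can confirm the structure of the matched sets. The sketch below uses made-up data laid out as in the listing above; the variable names n_exposed and n_unexposed are hypothetical.

```stata
* Toy long-format data for one matched set (hypothetical values)
clear
input set_id id exposed
8353 8353 1
8353 3962 0
8353 4050 0
8353 4052 0
8353 4085 0
8353 4196 0
end

* Count exposed and unexposed members within each matched set
bysort set_id: egen n_exposed   = total(exposed)
bysort set_id: egen n_unexposed = total(!exposed)

* Each set should have exactly one exposed and at most five unexposed
assert n_exposed == 1
assert inrange(n_unexposed, 0, 5)

* Each individual should appear at most once within a set
isid set_id id
```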