Comparison of life table estimators of relative/net survival

Paul Dickman, Enzo Coviello

The code used in this tutorial, along with links to the data, is available here.

In this tutorial we will estimate net survival using both the Ederer II approach and the Pohar Perme approach. We will do this using both stnet and strs (and get the same results).

We first load the colon cancer data (restricting to localised stage) and stset.

. use colon if stage==1, clear
(Colon carcinoma, diagnosed 1975-94, follow-up to 1995)

. stset exit, origin(dx) fail(status==1,2) id(id) scale(365.24)
[output omitted]

. generate birthdate=dx-age*365.24

In some other examples, we stset using pre-calculated survival times (surv_mm). Here we specify the day of diagnosis (dx) and day of exit (exit) and let Stata calculate the survival time for us. stnet requires that we specify a date of birth; we don’t have this in our data set so we approximate it from day of diagnosis and age (in completed years) at diagnosis. Note that the day part of day of diagnosis has also been simulated (before the data were made available).

stnet requires us to specify date of birth and date of diagnosis; it then calculates attained age and attained year (i.e., the values of age and year during follow-up). strs requires us to specify age at diagnosis and year of diagnosis; it then calculates attained age and attained year. Our data set uses the strs default variables names for age at diagnosis (age) and year of diagnosis (yydx) so we don’t need to specify them in the code. strs allows decimal values for year of diagnosis, but the variable in the data set in yhear as an integer. Because stnet calculates year of diagnosis as a decimal, we will update the variable we have in our data from an integer to a decimal so that stnet and strs calculate year in the same way.

. replace yydx = 1960 + dx/365.241

We now call stnet.

. stnet using popmort, mergeby(_year sex _age) ///
>  breaks(0(.083333333)10) diagdate(dx) birthdate(birthdate) ederer ///
>  list(n d cre2 cns locns upcns secns) listyearly

         failure _d:  status == 1 2
   analysis time _t:  (exit-origin)/365.24
             origin:  time dx
                 id:  id

Cumulative net survival according to Pohar Perme, Stare and Estève method.
and cumulative relative survival according to Ederer II method.

  +----------------------------------------------------------------------+
  | start   end      n    d     cre2      cns    locns    upcns    secns |
  |----------------------------------------------------------------------|
  | .9167     1   5566   44   0.9207   0.9202   0.9112   0.9282   0.0043 |
  | 1.917     2   4681   34   0.8662   0.8651   0.8532   0.8760   0.0058 |
  | 2.917     3   3950   32   0.8276   0.8277   0.8136   0.8408   0.0069 |
  | 3.917     4   3324   22   0.7983   0.7994   0.7831   0.8146   0.0080 |
  | 4.917     5   2835   23   0.7755   0.7797   0.7612   0.7970   0.0091 |
  |----------------------------------------------------------------------|
  | 5.917     6   2367   16   0.7476   0.7508   0.7292   0.7710   0.0107 |
  | 6.917     7   1981    6   0.7271   0.7279   0.7033   0.7509   0.0121 |
  | 7.917     8   1667    9   0.7162   0.7187   0.6907   0.7446   0.0137 |
  | 8.917     9   1410   11   0.7115   0.7212   0.6891   0.7505   0.0157 |
  | 9.917    10   1173    9   0.7065   0.7198   0.6803   0.7554   0.0192 |
  +----------------------------------------------------------------------+

We see that the Pohar Perme (cns) and Ederer II (cre2) approaches give similar estimates of net survival, especially in the first 5 years. The two estimates will be even closer if we stratify on age (see later example).

We now call strs using the pohar option. The ht option specifies that the hazard transformation approach to estimation (rather than the actuarial) is used. The results are identical to those obtained with stnet.

. strs using popmort, breaks(0(.083333333)10) mergeby(_year sex _age) ///
>      ht pohar list(n d cr_e2 cns_pp lo_cns_pp hi_cns_pp) notables save(replace)

         failure _d:  status == 1 2
   analysis time _t:  (exit-origin)/365.24
             origin:  time dx
                 id:  id

The conditional survival proportion (p) is estimated by transforming the
estimated cumulative hazard rather than by the actuarial method (default).
See http://pauldickman.com/rsmodel/stata_colon/standard_errors.pdf for details.
       
. use grouped, clear
(Collapsed (or grouped) survival data)

. list end n d cr_e2 cns_pp lo_cns_pp hi_cns_pp if floor(end)==end

     +---------------------------------------------------------+
     | end      n    d    cr_e2   cns_pp   lo_cns~p   hi_cns~p |
     |---------------------------------------------------------|
 12. |   1   5566   44   0.9207   0.9202     0.9112     0.9282 |
 24. |   2   4681   34   0.8662   0.8651     0.8532     0.8760 |
 36. |   3   3950   32   0.8276   0.8277     0.8136     0.8408 |
 48. |   4   3324   22   0.7983   0.7994     0.7831     0.8146 |
 60. |   5   2835   23   0.7755   0.7797     0.7612     0.7970 |
     |---------------------------------------------------------|
 72. |   6   2367   16   0.7476   0.7508     0.7292     0.7710 |
 84. |   7   1981    6   0.7271   0.7279     0.7033     0.7509 |
 96. |   8   1667    9   0.7162   0.7187     0.6907     0.7446 |
108. |   9   1410   11   0.7115   0.7212     0.6891     0.7505 |
120. |  10   1173    9   0.7065   0.7198     0.6803     0.7554 |
     +---------------------------------------------------------+

Age-specific estimates

We now estimate survival within age groups, by including the by(agegrp) option to strs. The Ederer II and Pohar Perme estimates will be identical if expected survival is identical for all individuals. This won’t happen in practice, but the difference between them (i.e., the bias in Ederer II) will be propoprtional to the heterogeneity in expected survival. As such, performing the analysis within age groups will reduce the heterogeneity in expected survival and therefore the bias in in Ederer II (Lambert et al 2015).

We see from the results below, that the Ederer II and Pohar Perme estimates are closer than they were when all ages were analysed in a single life table. The biggest differences are for the oldest age group, which is to be expected because the heterogeneity in expected survival is greater for that age group.

. strs using popmort, breaks(0(.083333333)10) mergeby(_year sex _age) by(agegrp) ///
>      ht pohar list(n d cr_e2 cns_pp lo_cns_pp hi_cns_pp) notables save(replace)

[output omitted]

. use grouped, clear
(Collapsed (or grouped) survival data)

. list agegrp end n d cr_e2 cns_pp lo_cns_pp hi_cns_pp if floor(end)==end

     +------------------------------------------------------------------+
     | agegrp   end      n    d    cr_e2   cns_pp   lo_cns~p   hi_cns~p |
     |------------------------------------------------------------------|
 12. |   0-44     1    288    3   0.9618   0.9618     0.9315     0.9788 |
 24. |   0-44     2    254    0   0.8916   0.8916     0.8490     0.9227 |
 36. |   0-44     3    229    1   0.8498   0.8498     0.8016     0.8870 |
 48. |   0-44     4    207    1   0.8101   0.8101     0.7576     0.8523 |
 60. |   0-44     5    183    1   0.7797   0.7798     0.7242     0.8255 |
     |------------------------------------------------------------------|
 72. |   0-44     6    167    1   0.7555   0.7554     0.6973     0.8040 |
 84. |   0-44     7    149    0   0.7484   0.7483     0.6890     0.7980 |
 96. |   0-44     8    137    0   0.7410   0.7409     0.6803     0.7918 |
108. |   0-44     9    125    0   0.7271   0.7267     0.6640     0.7798 |
120. |   0-44    10    115    1   0.7179   0.7174     0.6528     0.7721 |
     |------------------------------------------------------------------|
132. |  45-59     1    947    3   0.9589   0.9589     0.9431     0.9704 |
144. |  45-59     2    850    5   0.9188   0.9188     0.8978     0.9356 |
156. |  45-59     3    739    6   0.8670   0.8670     0.8412     0.8889 |
168. |  45-59     4    647    1   0.8381   0.8382     0.8098     0.8628 |
180. |  45-59     5    586    4   0.8076   0.8078     0.7768     0.8350 |
     |------------------------------------------------------------------|
192. |  45-59     6    505    0   0.7803   0.7805     0.7472     0.8101 |
204. |  45-59     7    452    0   0.7704   0.7707     0.7357     0.8017 |
216. |  45-59     8    401    1   0.7464   0.7464     0.7089     0.7798 |
228. |  45-59     9    356    0   0.7512   0.7513     0.7127     0.7855 |
240. |  45-59    10    315    1   0.7512   0.7517     0.7116     0.7872 |
     |------------------------------------------------------------------|
252. |  60-74     1   2496   19   0.9386   0.9385     0.9265     0.9486 |
264. |  60-74     2   2118   12   0.8795   0.8791     0.8629     0.8935 |
276. |  60-74     3   1810   11   0.8367   0.8360     0.8170     0.8533 |
288. |  60-74     4   1546    8   0.8032   0.8025     0.7810     0.8221 |
300. |  60-74     5   1330   12   0.7713   0.7693     0.7454     0.7913 |
     |------------------------------------------------------------------|
312. |  60-74     6   1127    6   0.7377   0.7361     0.7098     0.7604 |
324. |  60-74     7    956    4   0.7177   0.7161     0.6874     0.7426 |
336. |  60-74     8    801    3   0.7122   0.7119     0.6808     0.7407 |
348. |  60-74     9    676    7   0.7015   0.7004     0.6660     0.7320 |
360. |  60-74    10    546    4   0.6885   0.6879     0.6497     0.7229 |
     |------------------------------------------------------------------|
372. |    75+     1   1835   19   0.8768   0.8758     0.8564     0.8927 |
384. |    75+     2   1459   17   0.8229   0.8212     0.7961     0.8436 |
396. |    75+     3   1172   14   0.7978   0.7978     0.7676     0.8245 |
408. |    75+     4    924   12   0.7757   0.7775     0.7416     0.8091 |
420. |    75+     5    736    6   0.7748   0.7812     0.7382     0.8180 |
     |------------------------------------------------------------------|
432. |    75+     6    568    9   0.7559   0.7565     0.7037     0.8012 |
444. |    75+     7    424    2   0.7195   0.7208     0.6596     0.7729 |
456. |    75+     8    328    5   0.7077   0.7117     0.6398     0.7718 |
468. |    75+     9    253    4   0.7102   0.7340     0.6463     0.8032 |
480. |    75+    10    197    3   0.7226   0.7470     0.6314     0.8311 |
     +------------------------------------------------------------------+
Paul Dickman
Paul Dickman
Professor of Biostatistics

Biostatistician working with register-based cancer epidemiology.