Further survival analyses

Set up

Let’s first load the packages required.

library(CDMConnector)
library(CohortSurvival)
library(dplyr)
library(cmprsk)
library(survival)

We’ll create a cdm reference to use our example MGUS2 survival dataset. In practice you would use the CDMConnector package to connect to your data mapped to the OMOP CDM.

cdm <- CohortSurvival::mockMGUS2cdm()

The CohortSurvival package does not have implemented functionality to do more complex survival analyses than Kaplar Meier curves, like Cox Proportional Hazards modelling. However, the format the data has to be in to be inputted to well-known modelling functions from packages like survival or cmprskcan be retrieved from OMOP data with some in-built functions in this package. Let’s see how to do it in both single event and competing risk survival settings.

Further analysis with single event survival

To get the time and status information we need for the coxph function in the package survival, for instance, we only need to call addCohortSurvival. The stratification variables need to be columns previously added to the cohort by the user.

input_survival_single <- cdm$mgus_diagnosis %>%
       addCohortSurvival(
       cdm = cdm,
       outcomeCohortTable = "death_cohort",
       outcomeCohortId = 1
       ) 

input_survival_single %>% 
  glimpse()
#> Rows: ??
#> Columns: 13
#> Database: DuckDB v0.10.3-dev797 [eburn@Windows 10 x64:R 4.2.1/:memory:]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id           <dbl> 1, 2, 4, 5, 6, 10, 11, 12, 13, 15, 17, 18, 19, 20…
#> $ cohort_start_date    <date> 1981-01-01, 1968-01-01, 1977-01-01, 1973-01-01, …
#> $ cohort_end_date      <date> 1981-01-01, 1968-01-01, 1977-01-01, 1973-01-01, …
#> $ age                  <dbl> 88, 78, 68, 90, 90, 79, 86, 89, 87, 85, 94, 86, 8…
#> $ sex                  <fct> F, F, M, F, M, F, M, F, M, M, F, M, M, M, F, F, F…
#> $ hgb                  <dbl> 13.1, 11.5, 15.2, 10.7, 12.9, 9.4, 11.8, 11.3, 11…
#> $ creat                <dbl> 1.30, 1.20, 1.20, 0.80, 1.00, 1.10, 1.00, 1.30, 1…
#> $ mspike               <dbl> 0.5, 2.0, 1.2, 1.0, 0.5, 2.3, 2.3, 1.2, 1.3, 1.0,…
#> $ age_group            <chr> ">=70", ">=70", "<70", ">=70", ">=70", ">=70", ">…
#> $ days_to_exit         <int> 30, 25, 92, 8, 4, 136, 2, 108, 10, 18, 34, 67, 16…
#> $ status               <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ time                 <dbl> 30, 25, 92, 8, 4, 136, 2, 108, 10, 18, 34, 67, 16…

This information should be enough to call any advanced function, like:

survival::coxph(survival::Surv(time, status) ~ age + sex, data = input_survival_single)
#> Call:
#> survival::coxph(formula = survival::Surv(time, status) ~ age + 
#>     sex, data = input_survival_single)
#> 
#>          coef exp(coef) se(coef)      z        p
#> age  0.061622  1.063561 0.003402 18.114  < 2e-16
#> sexM 0.358258  1.430835 0.065693  5.454 4.94e-08
#> 
#> Likelihood ratio test=391.2  on 2 df, p=< 2.2e-16
#> n= 1384, number of events= 963
survival::survdiff(survival::Surv(time, status) ~ sex, data = input_survival_single)
#> Call:
#> survival::survdiff(formula = survival::Surv(time, status) ~ sex, 
#>     data = input_survival_single)
#> 
#>         N Observed Expected (O-E)^2/E (O-E)^2/V
#> sex=F 631      423      471      4.88      9.67
#> sex=M 753      540      492      4.67      9.67
#> 
#>  Chisq= 9.7  on 1 degrees of freedom, p= 0.002

Further analysis with competing risk survival

For competing risk, there is a similar function that adds time and status information to the cohort of interest. We only need to specify which are the outcome and competing outcome of interest. We can also choose other options such as follow up time, censoring on a specific date, washout periods, or others.

input_survival_cr <- cdm$mgus_diagnosis %>%
  addCompetingRiskCohortSurvival(
    cdm = cdm,
    outcomeCohortTable = "progression",
    outcomeCohortId = 1,
    competingOutcomeCohortTable = "death_cohort",
    competingOutcomeCohortId = 1
  ) %>% 
  glimpse()
#> Rows: 1,384
#> Columns: 13
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id           <dbl> 56, 124, 127, 147, 163, 167, 180, 186, 195, 206, …
#> $ cohort_start_date    <date> 1978-01-01, 1974-01-01, 1978-01-01, 1975-01-01, …
#> $ cohort_end_date      <date> 1978-01-01, 1974-01-01, 1978-01-01, 1975-01-01, …
#> $ age                  <dbl> 78, 73, 73, 58, 57, 54, 70, 76, 54, 79, 67, 62, 7…
#> $ sex                  <fct> M, M, M, M, F, F, F, F, F, M, M, M, M, F, M, F, M…
#> $ hgb                  <dbl> 10.3, 15.3, 12.4, 13.1, 12.2, 11.1, 14.3, 12.8, 1…
#> $ creat                <dbl> 3.0, 1.2, 1.6, 1.1, 0.8, 1.1, 1.2, 0.9, 1.1, 1.1,…
#> $ mspike               <dbl> 1.9, 1.7, 1.4, 0.8, 1.9, 2.0, 1.6, 1.9, 1.8, 2.1,…
#> $ age_group            <chr> ">=70", ">=70", ">=70", "<70", "<70", "<70", ">=7…
#> $ days_to_exit         <int> 44, 82, 60, 189, 260, 172, 107, 104, 171, 11, 206…
#> $ time                 <dbl> 29, 80, 30, 188, 201, 118, 81, 51, 153, 8, 190, 2…
#> $ status               <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…

We can use the package cmprsk to fit a Fine and Gray model to the competing risk data. We first change our sex covariate to numeric.

input_survival_cr <- input_survival_cr %>%
  dplyr::mutate(sex = dplyr::if_else(sex == "M", 0, 1))

covs <- data.frame(input_survival_cr$age, input_survival_cr$sex)
names(covs) <- c("age", "sex")

summary(cmprsk::crr(ftime = input_survival_cr$time,
            fstatus = input_survival_cr$status,
            cov1 = covs,
            failcode = 1,
            cencode = 0))
#> Competing Risks Regression
#> 
#> Call:
#> cmprsk::crr(ftime = input_survival_cr$time, fstatus = input_survival_cr$status, 
#>     cov1 = covs, failcode = 1, cencode = 0)
#> 
#>        coef exp(coef) se(coef)     z p-value
#> age -0.0192     0.981  0.00585 -3.28   0.001
#> sex  0.2871     1.333  0.19309  1.49   0.140
#> 
#>     exp(coef) exp(-coef)  2.5% 97.5%
#> age     0.981       1.02 0.970 0.992
#> sex     1.333       0.75 0.913 1.945
#> 
#> Num. cases = 1384
#> Pseudo Log-likelihood = -726 
#> Pseudo likelihood ratio test = 8.32  on 2 df,

Disconnect from the cdm database connection

cdm_disconnect(cdm)