
[time-series] Survival Analysis

pip install lifelines
import pandas as pd
cancer = pd.read_excel('cancer_survive.xlsx')
cancer.head()
   type  time  delta
0     1     1      1
1     1     3      1
2     1     3      1
3     1     4      1
4     1    10      1
cancer.type.value_counts()
1    52
2    28
Name: type, dtype: int64
import seaborn as sns
sns.stripplot(data=cancer, x='time', hue='delta')
<Axes: xlabel='time'>
 

Kaplan-Meier Estimate

from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter()
kmf.fit(cancer['time'], cancer['delta'])
<lifelines.KaplanMeierFitter:"KM_estimate", fitted with 80 total observations, 27 right-censored observations>
kmf.survival_function_
kmf.plot_survival_function()
<Axes: xlabel='timeline'>
cancer1 = cancer.query('type == 1')
kmf1 = KaplanMeierFitter()
kmf1.fit(cancer1['time'], event_observed=cancer1['delta'], label='type 1')
<lifelines.KaplanMeierFitter:"type 1", fitted with 52 total observations, 21 right-censored observations>
cancer2 = cancer.query('type == 2')
kmf2 = KaplanMeierFitter()
kmf2.fit(cancer2['time'], event_observed=cancer2['delta'], label='type 2')
<lifelines.KaplanMeierFitter:"type 2", fitted with 28 total observations, 6 right-censored observations>
ax = kmf1.survival_function_.plot()
kmf2.survival_function_.plot(ax=ax)
<Axes: xlabel='timeline'>
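To make it clearer what `KaplanMeierFitter` estimates, here is a minimal hand-rolled sketch of the product-limit formula S(t) = Π (1 - d_i / n_i), where d_i is the number of events at time t_i and n_i is the number of subjects still at risk just before t_i. The function name `kaplan_meier` and the toy data are made up for illustration; they are not part of lifelines or of the cancer dataset.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time (hypothetical helper)."""
    # Number of observed events (events == 1) at each time.
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    # Everyone who leaves the risk set at each time (event or censored).
    exits = Counter(durations)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Subjects censored at t are counted as at risk at t, then removed.
        at_risk -= exits[t]
    return curve

# Toy data, invented for this example: 1 = event observed, 0 = censored.
durations = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.4f}")
```

Censored subjects never cause a drop in the curve; they only shrink the risk set used at later event times.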
 

Statistical Hypothesis Test for the Difference Between Survival Functions

from lifelines.statistics import logrank_test
res = logrank_test(
    cancer1['time'], cancer2['time'],
    event_observed_A=cancer1['delta'],
    event_observed_B=cancer2['delta'], alpha=.95)
res.print_summary()
t_0                 -1
null_distribution   chi squared
degrees_of_freedom  1
alpha               0.95
test_name           logrank_test

   test_statistic     p  -log2(p)
0            2.79  0.09      3.40
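As a sanity check on the summary, the p-value can be recovered from the test statistic alone. With one degree of freedom, the chi-squared tail probability equals erfc(sqrt(x / 2)). The sketch below uses only the standard library and the rounded statistic 2.79 from the output, so the result is approximate.

```python
import math

# Rounded test statistic taken from the log-rank summary (1 degree of freedom).
test_statistic = 2.79

# For chi-squared with 1 df: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(test_statistic / 2))
neg_log2_p = -math.log2(p_value)

print(f"p = {p_value:.2f}, -log2(p) = {neg_log2_p:.2f}")

# Compare against the conventional 5% significance level.
reject_at_5_percent = p_value < 0.05
print("reject H0 at 5%:", reject_at_5_percent)
```

Since p is about 0.09, the difference between the type 1 and type 2 survival curves is not significant at the conventional 5% level.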
 

Nelson-Aalen Estimate

from lifelines import NelsonAalenFitter
naf1 = NelsonAalenFitter()
naf1.fit(cancer1['time'], event_observed=cancer1['delta'],
         label='type 1')
naf1.plot()
<Axes: xlabel='timeline'>
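The Nelson-Aalen estimator targets the cumulative hazard rather than the survival function: H(t) = Σ d_i / n_i, summed over event times t_i ≤ t. Below is a minimal sketch of that basic formula. The helper `nelson_aalen` and the toy data are hypothetical, and lifelines may handle tied event times differently, so treat this as an illustration rather than a reimplementation of `NelsonAalenFitter`.

```python
from collections import Counter

def nelson_aalen(durations, events):
    """Return [(t, H(t))] at each distinct event time (hypothetical helper)."""
    # Number of observed events (events == 1) at each time.
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    # Everyone who leaves the risk set at each time (event or censored).
    exits = Counter(durations)
    at_risk = len(durations)
    cumulative_hazard = 0.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Add the hazard increment d_i / n_i at this event time.
            cumulative_hazard += d / at_risk
            curve.append((t, cumulative_hazard))
        at_risk -= exits[t]
    return curve

# Toy data, invented for this example: 1 = event observed, 0 = censored.
durations = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]

hazard_curve = nelson_aalen(durations, events)
for t, h in hazard_curve:
    print(f"t={t}: H(t)={h:.4f}")
```

The cumulative hazard only ever increases, and a steeper stretch of the curve indicates a period of higher risk.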