# Building a Cox Proportional Hazard Model

### Introduction

Proportional hazards models are a class of survival models that relate the time elapsed before a particular event occurs to one or more covariates. In this post, I’ll walk you through building a Cox proportional hazards regression model in Python with the lifelines library. The model is mainly used with quantitative variables.
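For reference, the Cox model is usually written in the form below. The notation is the standard textbook one rather than something taken from this post: $h_0(t)$ is an unspecified baseline hazard, $x$ is the vector of covariates, and $\beta$ is the vector of coefficients estimated during fitting.

$$
h(t \mid x) = h_0(t)\,\exp\left(\beta^\top x\right)
$$

The "proportional" part follows from taking the ratio of the hazards of two patients $a$ and $b$, in which the baseline cancels and no dependence on $t$ remains:

$$
\frac{h(t \mid x_a)}{h(t \mid x_b)} = \exp\left(\beta^\top (x_a - x_b)\right)
$$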

### Required Packages

```python
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index as cindex
from sklearn.model_selection import train_test_split
```

#### The Dataset

For illustration, I’ve used the “Mayo Clinic Primary Biliary Cirrhosis Data” dataset, which is available on Kaggle. The code below assumes it has already been loaded into a pandas DataFrame named `df`.

#### A Bit of pre-processing

Cox proportional hazards models can also be used with categorical variables like those present in this dataset, provided they are encoded numerically. In this dataset, female patients are represented by “f” and male patients by “m”. With a bit of pre-processing, I changed “f” to 0 and “m” to 1 so that the column can be used in the regression.

```python
# Encode sex numerically: "f" -> 0, anything else ("m") -> 1
df['sex'] = (df['sex'] != 'f').astype(int)
```

#### Splitting the Dataset into Training, Validation and Test Sets

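```python
# Seed NumPy's global generator so the splits are reproducible
np.random.seed(0)
# Hold out 20% of the rows for testing
df_dev, df_test = train_test_split(df, test_size = 0.2)
# 25% of the remaining 80% gives a 60/20/20 train/validation/test split
df_train, df_val = train_test_split(df_dev, test_size = 0.25)
```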

#### Normalizing the Data

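If you view the dataset, you’ll observe that the continuous columns aren’t normalized at all. Standardizing them to zero mean and unit variance generally helps the fitting procedure converge and makes the resulting coefficients easier to compare. The mean and standard deviation are computed on the training set only and then applied to all three splits, so no information leaks from the validation or test data.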

```python
continuous_columns = ['age', 'bili', 'chol', 'albumin', 'copper', 'alk.phos', 'ast', 'trig', 'platelet', 'protime']

# Statistics come from the training set only and are reused for every split
mean = df_train.loc[:, continuous_columns].mean()
std = df_train.loc[:, continuous_columns].std()

df_train.loc[:, continuous_columns] = (df_train.loc[:, continuous_columns] - mean) / std
df_val.loc[:, continuous_columns] = (df_val.loc[:, continuous_columns] - mean) / std
df_test.loc[:, continuous_columns] = (df_test.loc[:, continuous_columns] - mean) / std
```

#### One-Hot Encoding the Values

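```python
def one_hot_encode(dataframe, columns):
    # drop_first=True drops one level per column so the dummies are not perfectly collinear.
    # The builtin float replaces np.float, which was removed in NumPy 1.24.
    return pd.get_dummies(dataframe, columns = columns, drop_first=True, dtype = float)

to_encode = ["edema", "stage"]
one_hot_train = one_hot_encode(df_train, to_encode)
one_hot_test = one_hot_encode(df_test, to_encode)
one_hot_val = one_hot_encode(df_val, to_encode)
```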
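One thing to watch when encoding each split separately is that a category missing from a smaller split produces fewer dummy columns than the training set has. The sketch below shows one possible way to align them with `DataFrame.reindex`. The toy frames and the names `train`, `val` and `val_aligned` are made up for illustration and are not part of the original pipeline.

```python
import pandas as pd

# Toy stand-ins for two splits; stage 4 never appears in the second one
train = pd.DataFrame({"stage": [1, 2, 3, 4], "age": [50, 60, 55, 70]})
val = pd.DataFrame({"stage": [1, 2, 3], "age": [45, 65, 52]})

train_enc = pd.get_dummies(train, columns=["stage"], drop_first=True, dtype=float)
val_enc = pd.get_dummies(val, columns=["stage"], drop_first=True, dtype=float)

# Reorder/extend the validation columns to match training;
# dummy columns that were never created are filled with 0.0
val_aligned = val_enc.reindex(columns=train_enc.columns, fill_value=0.0)

print(list(train_enc.columns))
print(list(val_aligned.columns))
```

This only covers the case where a later level is missing. If the first level is absent from a split, `drop_first=True` drops a different category there, so it is safer to check the category levels of each split before relying on this.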

#### Removing the NaN Values

```python
# CoxPHFitter cannot handle missing values, so drop incomplete rows before fitting
one_hot_train.dropna(inplace=True)
```

#### Fitting the Model

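```python
cph = CoxPHFitter()
# A smaller step size can help the Newton-Raphson solver converge.
# Note: recent lifelines releases may expect this as fit_options=dict(step_size=0.1)
# instead of a direct keyword argument; check the version you have installed.
cph.fit(one_hot_train, duration_col = 'time', event_col = 'status', step_size = 0.1)
```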

#### Analysing the Results

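```python
# Coefficients, hazard ratios (exp(coef)), confidence intervals and p-values
cph.print_summary()
# Survival curves for the two treatment groups, other covariates held at baseline.
# Note: newer lifelines releases rename this method to plot_partial_effects_on_outcome.
cph.plot_covariate_groups('trt', values = [0,1])
```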
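The `concordance_index` function imported at the top is a common way to score a survival model: it measures how often the patient with the higher predicted risk is also the one who experiences the event first. To make the idea concrete, here is a simplified pure-Python sketch of Harrell's C-index. The function name `harrell_c_index` and the toy numbers are hypothetical, and tie handling in lifelines may differ from this version.

```python
def harrell_c_index(times, risk_scores, events):
    """Fraction of comparable pairs in which the higher-risk subject fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's event or censoring time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third subject is censored (event = 0)
times = [5, 3, 8, 1]
events = [1, 1, 0, 1]
risk = [0.2, 0.6, 0.1, 0.9]  # higher risk goes with shorter survival

perfect = harrell_c_index(times, risk, events)
reversed_ranking = harrell_c_index(times, [-r for r in risk], events)
print(perfect, reversed_ranking)
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 is what random guessing would give, and 0.0 means the ranking is exactly backwards. With the fitted model, the same kind of score could be obtained from lifelines with something like `cindex(one_hot_val['time'], -cph.predict_partial_hazard(one_hot_val), one_hot_val['status'])`, after dropping rows with NaN values from the validation frame. The negation is needed because `concordance_index` expects scores that increase with survival time.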