INTRODUCTION

In this data analysis project, I work with Alzheimer’s Disease (AD) data and the factors affecting the disease. The project focuses mainly on fitting a logistic regression model to the data.

MOTIVE

  • To understand how the parameter levels, scores and factors relate to whether a person has AD or not.

DATA OVERVIEW

Let’s take a look at the data.

First few rows of AD Data
DX_bl AGE PTGENDER PTEDUCAT FDG AV45 HippoNV e2_1 e4_1 rs3818361 rs744373 rs11136000 rs610932 rs3851179 rs3764650 rs3865444 MMSCORE TOTAL13 ID
0 71.7 2 14 6.82111 1.105695 0.5292994 1 0 1 1 1 1 1 0 0 26 8.00 1
0 77.7 1 18 6.36744 1.105695 0.5377612 0 0 1 0 1 1 0 0 1 30 1.67 2
0 72.8 2 18 6.36744 1.105695 0.2688816 0 1 1 1 1 0 1 0 1 30 12.00 3
0 69.6 1 13 6.36744 1.105695 0.5762121 0 0 1 1 1 1 0 0 0 28 3.00 4
0 70.9 1 13 6.36744 1.105695 0.6007317 1 0 1 1 1 0 0 0 0 29 10.00 5
0 65.1 2 20 6.36744 1.105695 0.4944231 0 1 1 0 0 1 1 0 0 30 3.67 6
  • The dataset contains 517 rows, each representing one individual’s measurements on 19 parameters.
  • The dataset was obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).
  • The dataset consists of a healthy control group and AD patients. The description of each column is given below:
    Column description
    Column name   Description
    DX_bl         AD diagnosis result (0: AD negative, 1: AD positive)
    AGE           Age
    PTGENDER      Gender
    PTEDUCAT      Education level
    FDG           Biomarker measure after administration of 18F-FDG
    AV45          Biomarker measure after administration of AV45
    HippoNV       Normalized hippocampus volume
    e2_1          APOE2 gene (0: variant absent, 1: variant present)
    e4_1          APOE4 gene (0: variant absent, 1: variant present)
    rs3818361     Specific gene variant (0: variant absent, 1: variant present)
    rs744373      Specific gene variant (0: variant absent, 1: variant present)
    rs11136000    Specific gene variant (0: variant absent, 1: variant present)
    rs610932      Specific gene variant (0: variant absent, 1: variant present)
    rs3851179     Specific gene variant (0: variant absent, 1: variant present)
    rs3764650     Specific gene variant (0: variant absent, 1: variant present)
    rs3865444     Specific gene variant (0: variant absent, 1: variant present)
    MMSCORE       Mini-Mental State Examination (MMSE) score
    TOTAL13       Neurobattery score
    ID            ID of individual
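
As a quick sanity check on the layout described above, the sketch below parses the header and the first two preview rows (copied verbatim from the table) and counts the columns. The report's own output appears to come from R; Python is used here purely for illustration, and the helper name `parse_rows` is hypothetical.

```python
# Header and the first two preview rows, copied from the table above.
HEADER = (
    "DX_bl AGE PTGENDER PTEDUCAT FDG AV45 HippoNV e2_1 e4_1 rs3818361 "
    "rs744373 rs11136000 rs610932 rs3851179 rs3764650 rs3865444 "
    "MMSCORE TOTAL13 ID"
)
ROWS = [
    "0 71.7 2 14 6.82111 1.105695 0.5292994 1 0 1 1 1 1 1 0 0 26 8.00 1",
    "0 77.7 1 18 6.36744 1.105695 0.5377612 0 0 1 0 1 1 0 0 1 30 1.67 2",
]


def parse_rows(header, rows):
    """Split whitespace-separated text into a list of {column: value} records."""
    names = header.split()
    records = []
    for line in rows:
        values = [float(v) for v in line.split()]
        if len(values) != len(names):
            raise ValueError("row length does not match the header")
        records.append(dict(zip(names, values)))
    return names, records


names, records = parse_rows(HEADER, ROWS)

# DX_bl is the response and ID is only an identifier, so neither is a predictor.
predictors = [n for n in names if n not in ("DX_bl", "ID")]

print(len(names), "columns,", len(predictors), "candidate predictors")
```

Excluding the response `DX_bl` and the identifier `ID` leaves the parameters that can serve as predictors of the diagnosis.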

This wraps up the introduction and overview of the data; we now understand what the dataset contains. Next, we inspect the data visually.

VISUAL OVERVIEW

Here we explore the data visually through plots and diagrams to understand which parameters affect the diagnosis status, and to what degree. The main focus is to see graphically, at a high level, how well each parameter separates AD patients from healthy individuals.

This wraps up the visual overview of the data. Here we found some of the parameters that are quite important in separating the two classes that our model aims to explain. In the next section we build the model.

DATA MODELLING

Here, the goal is to explain whether a person has AD or not using all the other parameter levels. Since the response is binary, it is reasonable to fit a logistic regression model, which has a linear decision boundary, because the data do not show much non-linearity. Below we go through fitting and choosing the model.
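
To illustrate what a linear decision boundary means for logistic regression, here is a minimal sketch in Python. The intercept and coefficients are hypothetical values for two made-up predictors, not estimates from the AD data; the function names are likewise only illustrative. The model passes a linear score through the logistic function, so the predicted probability is 0.5 when the score is 0, which defines a straight-line (hyperplane) boundary.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(x, intercept, coefs):
    """Linear predictor: intercept + sum of coefficient * feature value."""
    return intercept + sum(c * v for c, v in zip(coefs, x))


def predict_proba(x, intercept, coefs):
    """Modelled probability of the positive class (here: AD positive)."""
    return sigmoid(linear_score(x, intercept, coefs))


# HYPOTHETICAL parameters for two made-up predictors (not fitted values).
INTERCEPT = -1.0
COEFS = [2.0, -0.5]

on_boundary = [0.5, 0.0]    # score = -1 + 2*0.5 - 0.5*0 = 0
positive_side = [1.0, 0.0]  # score = 1
negative_side = [0.0, 2.0]  # score = -2

for point in (on_boundary, positive_side, negative_side):
    score = linear_score(point, INTERCEPT, COEFS)
    prob = predict_proba(point, INTERCEPT, COEFS)
    print(point, "score:", score, "probability:", round(prob, 3))
```

Points with a positive score fall on the side classified as AD positive at a 0.5 threshold, and points with a negative score fall on the other side.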

## Analysis of Deviance Table
## 
## Model 1: DX_bl ~ (AGE + PTGENDER + PTEDUCAT + FDG + AV45 + HippoNV + e2_1 + 
##     e4_1 + rs3818361 + rs744373 + rs11136000 + rs610932 + rs3851179 + 
##     rs3764650 + rs3865444 + MMSCORE + TOTAL13 + ID) - ID
## Model 2: DX_bl ~ FDG + HippoNV + rs3818361 + rs11136000 + rs610932 + MMSCORE + 
##     TOTAL13
##   Resid. Df Resid. Dev  Df Deviance Pr(>Chi)
## 1       499     179.44                      
## 2       509     184.92 -10  -5.4803   0.8569
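
The table above compares the full model (Model 1) with the reduced model (Model 2). Dropping 10 predictors raises the residual deviance by only 5.4803, and the reported p-value of 0.8569 indicates that the reduced model does not fit significantly worse. The sketch below reproduces that arithmetic in Python (the report's output appears to come from R). The function `chi2_sf_even_df` is a hypothetical helper that uses a closed form valid only for an even number of degrees of freedom, which is the case here.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


N_ROWS = 517              # rows in the dataset
n_params_full = 17 + 1    # 17 predictors (ID excluded) plus intercept
n_params_reduced = 7 + 1  # 7 predictors plus intercept

resid_df_full = N_ROWS - n_params_full        # matches "Resid. Df" of Model 1
resid_df_reduced = N_ROWS - n_params_reduced  # matches "Resid. Df" of Model 2
df_diff = resid_df_reduced - resid_df_full    # number of predictors dropped

deviance_diff = 5.4803  # increase in residual deviance, taken from the table
p_value = chi2_sf_even_df(deviance_diff, df_diff)

print(resid_df_full, resid_df_reduced, df_diff, round(p_value, 4))
```

The degrees of freedom follow directly from the 517 rows and the number of fitted coefficients, and the computed p-value should agree with the reported 0.8569 up to rounding.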

CONCLUSION

The above modelling leads to the following conclusions:

  • The FDG biomarker measure, hippocampus volume, Mental Examination Score (MMSCORE), TOTAL13, and the genes rs3818361, rs11136000 and rs610932 play an important role in predicting and explaining a person’s AD status.
  • Medications that improve the body’s condition so as to increase the above parameter values (except TOTAL13) can have a great impact on reducing the AD condition.

BIBLIOGRAPHY

The data used for this analysis were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).