Thrombosis Diagnosis Analysis Report
Abstract
Autoimmune conditions known as collagen diseases occur when the body’s immune system attacks its own skin, tissues, and organs. Thrombosis is a serious complication of these conditions and one of the leading causes of death among people with collagen diseases. Physicians recently learned that anti-cardiolipin antibodies and thrombosis are closely related. The aim of this project is to find other features, such as demographic information and common medical tests, that can also be taken into consideration when diagnosing Thrombosis.
I. INTRODUCTION
Collagen is a fibrous protein found in cartilage and other connective tissue. Collagen diseases are autoimmune diseases in which the body's immune system attacks its own skin, tissues, and organs. For example, if a patient generates antibodies against lung tissue, the patient loses the ability to breathe and will die. The extent and causes of these diseases are only partially understood, so classifying them can be a challenging task. Thrombosis is an important and severe complication of collagen diseases and one of their major causes of death. Physicians recently discovered that Thrombosis is closely related to anti-cardiolipin antibodies.
II. BACKGROUND
Collagen is a powerful thrombotic stimulus that functions by direct and indirect binding to various platelet receptors. A variety of collagen types are known, and several (e.g., collagen Types I, III, IV) are found in vascular tissues and are exposed upon disruption of the endothelium or more extensive vessel wall rupture. Some murine models of thrombosis purport to expose collagen to initiate thrombosis; however, the nature and extent of this exposure are not clear (Cooley 2011). Deep vein thrombosis (DVT) and its complication pulmonary embolism (PE), collectively known as venous thromboembolism (VTE), affect more than one per 1000 persons a year in western populations, with a higher risk among the elderly and patients with recent surgery, immobilization, fractures, pregnancy, and cancer (Kyrle and Eichinger 2005).
Anticardiolipin antibodies (ACLAs) are strongly associated with thrombosis and appear to be the most common of the acquired blood protein defects causing thrombosis. Although the precise mechanism(s) whereby ACLAs alter hemostasis to induce a hypercoagulable state remain unclear, several theories have been advanced. The most common thrombotic events associated with ACLAs are deep vein thrombosis and pulmonary embolus (type I syndrome), coronary or peripheral artery thrombosis (type II syndrome), and cerebrovascular/retinal vessel thrombosis (type III syndrome); occasionally patients present with mixtures (type IV syndrome). The relative frequency of ACLAs in association with arterial and venous thrombosis strongly suggests that these should be looked for in any individual with unexplained thrombosis; all three idiotypes (IgG, IgA, and IgM) should be assessed (Bick and Baker 1992).
III. DATA DESCRIPTION
The datasets used in this project were donated by a physician at a University Hospital. The patients either visited the hospital for collagen diseases or were referred by local physicians, home doctors, or other medical specialists. The three datasets provided by the University Hospital center on Thrombosis diagnosis. Every dataset contains the patient ID, which served as the key for joining them. Together, the datasets cover the demographics of each patient, the medical test history of each patient, and, for some patients, the factors recorded for Thrombosis diagnosis. The datasets were provided as CSV files and were cleaned and processed using Python by removing missing values, unwanted columns, and unwanted rows.
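The join and cleaning steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the project; the column names (ID, SEX, Thrombosis) and the sample rows are hypothetical stand-ins for the CSV contents.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def drop_incomplete(rows: Iterable[Row], required: List[str]) -> List[Row]:
    """Keep only rows in which every required column is present and non-empty."""
    return [r for r in rows if all((r.get(col) or "").strip() for col in required)]


def inner_join(left: Iterable[Row], right: Iterable[Row], key: str = "ID") -> List[Row]:
    """Inner-join left rows to right rows on `key`.

    Assumes `right` holds at most one row per key (for example, one
    demographics row per patient). Left-hand values win on name clashes.
    """
    index = {r[key]: r for r in right}
    return [{**index[row[key]], **row} for row in left if row[key] in index]


# Hypothetical rows standing in for two of the CSV files
# (in practice they would be read with csv.DictReader).
patients = [
    {"ID": "1", "SEX": "F"},
    {"ID": "2", "SEX": "M"},
    {"ID": "3", "SEX": ""},  # missing value, removed during cleaning
]
exams = [
    {"ID": "1", "Thrombosis": "1"},
    {"ID": "3", "Thrombosis": "0"},
    {"ID": "4", "Thrombosis": "2"},  # no matching patient row
]

clean_patients = drop_incomplete(patients, ["ID", "SEX"])
joined = inner_join(exams, clean_patients)
```

An inner join keeps only patients that appear in both files, which matches the statement that diagnosis factors are available for only some patients.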
IV. EXPLORATORY DATA ANALYSIS
The dataset contains a variety of columns, which fall into three categories. The Demographic category gives information such as age group and gender. The Thrombosis category covers the Thrombosis-specific tests performed on the patient, such as Antinucleus Antibody (ANA) and Anticardiolipin antibodies (ACLAs). The Common Medical tests category holds the patient's common medical tests over a certain time period, such as urine tests, blood counts, and chemistry tests. To better understand these columns and their relation to Thrombosis, some visual analysis is performed.
A. DEMOGRAPHIC ANALYSIS
Demographics such as the age and gender of a patient can be very useful when predicting any kind of disease. In this dataset, females are more likely than males to test positive for Thrombosis (see Figure 1 (a)). However, these reports come from a single hospital, so this observation cannot be generalized to the entire world. Making that generalization would require more information about the other demographics of the area where the University Hospital is located and the extent to which those factors affected the patients.
Additionally, there are more Mild Thrombosis patients in all age groups than other Degrees of Thrombosis, but patients aged 19 to 45 have the highest chance of being diagnosed with Mild Thrombosis. This pattern shifts for Severe Thrombosis, where patients who are not yet adults (0-18 years) are more likely to be diagnosed. Finally, patients between the ages of 31 and 60 are diagnosed with Extremely Severe Thrombosis. This data suggests that people aged 61 and up are the least likely to be diagnosed with Thrombosis (see Figure 1 (b)).
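A cross-tabulation of age group against degree of Thrombosis is one way to produce the counts behind a comparison like this. The sketch below is illustrative only: the bin edges are an assumption chosen to be compatible with the age ranges mentioned above (0-18, 19 to 45, 31 to 60, 61 and up), and the sample records are made up.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed bins; the report does not state the exact age-group boundaries.
AGE_BINS = [(0, 18, "0-18"), (19, 30, "19-30"), (31, 45, "31-45"), (46, 60, "46-60")]


def age_group(age: int) -> str:
    """Map an age in whole years to its (assumed) age-group label."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    return "61+"


def crosstab(records: Iterable[Tuple[int, str]]) -> Counter:
    """Count patients per (age group, degree of Thrombosis) pair."""
    return Counter((age_group(age), degree) for age, degree in records)


# Made-up (age, degree) records, for illustration only.
sample = [(12, "Severe"), (25, "Mild"), (40, "Mild"), (44, "Extremely Severe"), (70, "None")]
counts = crosstab(sample)
```

Dividing each count by the total for its age group would turn these counts into per-group rates, which is the quantity needed to compare how likely each group is to receive a given diagnosis.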
B. FEATURE ANALYSIS
Preliminary Analysis to check for Relation between Medical Tests and Thrombosis Diagnosis
A Correlation Plot is used to find the relation between any two variables. This analysis focuses on the relation between the medical tests and the diagnosis of Thrombosis. The plot (see Figure 2) shows a correlation between the Thrombosis diagnosis and some of the Thrombosis-specific tests, such as Antinucleus Antibody (ANA) and Anticardiolipin Antibodies (ACL_IGG). A correlation is also observed for several Common Medical tests: U-Pro (Proteinuria), GPT (ALT, glutamic pyruvic transaminase), GOT (AST, glutamic oxaloacetic transaminase), C3 (Complement 3), C4 (Complement 4), HCT (Hematocrit), PLT (Platelet Count) and HGB (Hemoglobin).
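Each cell of a correlation plot is a pairwise correlation coefficient. The report does not state which coefficient was used; the sketch below assumes Pearson's r and computes it by hand with the standard library, for one test column against a numeric Thrombosis indicator.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences.

    Returns NaN when either sequence is constant, because the
    coefficient is undefined in that case.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)
```

Calling `pearson(test_values, thrombosis_flags)` for every test column gives the row of the correlation matrix that relates each test to the diagnosis; `test_values` and `thrombosis_flags` are hypothetical names for the cleaned columns.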
Since correlation coefficients cannot be used alone to determine whether these tests are relevant for diagnosing thrombosis, Visual analysis is used to determine whether there are any significant patterns in these tests to look for when diagnosing thrombosis.
Deeper Visual Analysis to detect some patterns in the Medical Tests
Since Anti-Cardiolipin Antibody (IgG) is correlated with Thrombosis (see Figure 2 and Section 4.2.1), it was compared with the other medical tests that were correlated with Thrombosis to see whether any significant patterns emerge when diagnosing Thrombosis.
The normal range for GPT and GOT is less than 60. It can be seen that many patients with Thrombosis had GPT and GOT below 60, with only a few exceptions where patients with GPT or GOT outside the normal range were also diagnosed with Thrombosis (see Figure 3). Further investigation into Degrees of Thrombosis reveals that, with the exception of a few patients, the majority of patients diagnosed with Thrombosis have normal GPT. Interestingly, the patient with a GPT of 300 or higher has a severe case of Thrombosis (see Figure 3 (a)), and the patients with a GOT above 100 also have severe cases (see Figure 3 (b)). Hence it can be inferred that GPT and GOT are not good tests to use when diagnosing Mild Thrombosis, but the relation between abnormally high GPT/GOT and Thrombosis severity merits further research with a larger volume of data.
C3 has a normal range of more than 35, while C4 has a normal range of more than 10. Many patients who were diagnosed with Thrombosis had a C3 below 35, implying that patients with an abnormal C3 are more prone to Thrombosis. It can also be observed that almost all patients who were diagnosed with Thrombosis had a C4 below 18, which implies that patients with a low C4 (from 0 to 15) are more prone to Thrombosis. This suggests that C3 and C4 are good tests to consider when diagnosing Thrombosis (see Figure 4).
HGB has a normal range between 10 and 17 while HCT has a normal range between 29 and 52. It can be observed that all patients who were diagnosed with Thrombosis had their HGB and HCT inside their normal ranges. This shows that HGB and HCT are not good tests to be taken into consideration while diagnosing Thrombosis (see Figure 5).
PLT has a normal range between 100 and 400. It can be observed that more than 95% of the patients who were diagnosed with Thrombosis had their PLT between 100 and 400. This shows that PLT is not a good test to be taken into consideration while diagnosing Thrombosis (see Figure 6 (a)).
On the other hand, U-PRO has a normal range between 0 and 30. Many patients who were diagnosed with Thrombosis had a U-PRO between 0 and 30, and there were only a few exceptions where patients with U-PRO outside the normal range were also diagnosed with Thrombosis. Looking further into Degrees of Thrombosis, most patients diagnosed with Thrombosis have normal U-PRO, with a few exceptions. Interestingly, the patients with a U-PRO above 100 have severe cases of Thrombosis. This shows that U-PRO is not a good test to consider when diagnosing Mild Thrombosis, but the relation between abnormally high U-PRO and Thrombosis severity merits further research with a larger volume of data (see Figure 6 (b)).
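The range comparisons in this section all follow the same recipe: check each value against the test's normal range, then look at the share of Thrombosis patients that fall inside it. A minimal sketch is given below. The ranges are the ones quoted in this section; whether the boundaries themselves count as normal is an assumption, and the sample values in the usage line are made up.

```python
from typing import Callable, Dict, Sequence

# Normal ranges as quoted in this section. Treating "between a and b" as
# inclusive and "less/more than" as strict is an assumption.
NORMAL: Dict[str, Callable[[float], bool]] = {
    "GPT": lambda v: v < 60,
    "GOT": lambda v: v < 60,
    "C3": lambda v: v > 35,
    "C4": lambda v: v > 10,
    "HGB": lambda v: 10 <= v <= 17,
    "HCT": lambda v: 29 <= v <= 52,
    "PLT": lambda v: 100 <= v <= 400,
    "U-PRO": lambda v: 0 <= v <= 30,
}


def share_within_normal(test: str, values: Sequence[float]) -> float:
    """Fraction of values that lie inside the normal range of `test`."""
    if not values:
        raise ValueError("no values given")
    is_normal = NORMAL[test]
    return sum(1 for v in values if is_normal(v)) / len(values)


# Made-up platelet counts for four hypothetical Thrombosis patients.
plt_share = share_within_normal("PLT", [150, 250, 90, 410])
```

A share close to 1 among diagnosed patients, as reported for HGB, HCT and PLT, indicates that the test does little to separate them from a normal reading; a low share, as reported for C3, points the other way.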
V. INSIGHTS
The inferences drawn from this analysis are not conclusive, for three reasons: the data come from a single University Hospital (with no additional knowledge about the surrounding area), many assumptions were required when imputing the missing values, and the analysis was only preliminary. However, the analysis can serve as a starting point for further research into the diagnosis of Thrombosis, in particular for deciding which tests are worth considering and which are not. C3 and C4 showed significant patterns, while HGB, HCT and Platelet count did not show any significant patterns when diagnosing Thrombosis. U-PRO, GOT and GPT showed a few patterns that might be worth examining when diagnosing extremely severe cases of Thrombosis. Demographic information such as age group and gender proved relevant when analysing the Diagnosis and Degrees of Thrombosis and can be combined with the quantitative variables (Medical Tests). For further research, statistical tests and Machine Learning algorithms can be used to determine the significance of the patterns observed in this analysis.
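As one possible starting point for the statistical tests mentioned above, a chi-square test of independence on a 2x2 table (for example, gender against Thrombosis diagnosis) can be computed with the standard library alone. The choice of test is an assumption made here for illustration, not something the analysis performed, and the counts in the usage line are made up. No continuity correction is applied.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Chi-square statistic and p-value for a 2x2 contingency table.

    With one degree of freedom the p-value equals erfc(sqrt(chi2 / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: rows = female/male, columns = Thrombosis yes/no.
chi2, p_value = chi_square_2x2([[30, 70], [10, 90]])
```

A small p-value would indicate that the two variables are unlikely to be independent in the sample; it would not by itself address the single-hospital limitation noted above.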