Titanic Survival Predictor
Data Science project to predict if you would have survived the Titanic disaster
Machine learning can sound very mysterious at times, and it is not always clear what you can do with it. Therefore, I created the Titanic Survival Predictor to show the power of data science, and more precisely, machine learning.
In this project, I used a Kaggle dataset that many data scientists use to practice their machine learning skills. The task is to predict whether a person survived the Titanic based on passenger information, such as age, passenger class, port of embarkation and number of relatives on board. The final product is a Heroku app where you can find out if you would have survived the Titanic disaster.
As with most datasets, the data first needed to be cleaned and the relevant information selected, because garbage in means garbage out. I then enriched the data through feature engineering, and scaled and encoded it to prepare it for modelling. Since there is no "one model fits all" in machine learning, I tried out 5 different classification models and scored each on its accuracy in predicting whether a passenger survived the Titanic disaster. The best scoring model was a Random Forest with an accuracy of 78% on the Kaggle test dataset. To allow others to interact with the Titanic prediction model, I created a Heroku app that predicts the survival probability based on user inputs.
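The workflow described above (feature engineering, scaling and encoding, comparing classifiers on accuracy, then predicting a survival probability) could look roughly like the sketch below. It is not the project's actual code. It assumes scikit-learn and pandas, uses randomly generated stand-in data rather than the Kaggle data, borrows Kaggle-style column names (Age, Pclass, Embarked, SibSp, Parch), introduces a hypothetical Relatives feature, and compares three illustrative models instead of the five used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumption: NOT the real Kaggle passengers).
rng = np.random.default_rng(0)
n = 200
passengers = pd.DataFrame({
    "Age": rng.integers(1, 80, size=n),
    "Pclass": rng.integers(1, 4, size=n),
    "Embarked": rng.choice(["C", "Q", "S"], size=n),
    "SibSp": rng.integers(0, 4, size=n),
    "Parch": rng.integers(0, 3, size=n),
})
# Made-up label rule so the models have a pattern to learn.
survived = ((passengers["Pclass"] == 1) | (passengers["Age"] < 12)).astype(int)

# Feature engineering: total relatives on board (hypothetical feature name).
passengers["Relatives"] = passengers["SibSp"] + passengers["Parch"]

# Data preparation: scale numeric columns, one-hot encode categorical ones.
# Columns not listed here (SibSp, Parch) are dropped by default.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["Age", "Relatives"]),
    ("encode", OneHotEncoder(handle_unknown="ignore"), ["Pclass", "Embarked"]),
])

# Candidate classifiers (an illustrative selection, not the project's list).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Score every candidate on mean 5-fold cross-validated accuracy.
scores = {}
for name, model in candidates.items():
    pipeline = Pipeline([("prep", preprocess), ("model", model)])
    scores[name] = cross_val_score(
        pipeline, passengers, survived, cv=5, scoring="accuracy"
    ).mean()

# Refit the best scoring candidate on all data.
best_name = max(scores, key=scores.get)
best_pipeline = Pipeline([("prep", preprocess), ("model", candidates[best_name])])
best_pipeline.fit(passengers, survived)

# Survival probability for one hypothetical user input, as an app might show it.
visitor = pd.DataFrame([{
    "Age": 30, "Pclass": 2, "Embarked": "S",
    "SibSp": 1, "Parch": 0, "Relatives": 1,
}])
survival_probability = best_pipeline.predict_proba(visitor)[0, 1]
print(best_name, round(float(survival_probability), 2))
```

Keeping the preprocessing and the model in one pipeline means the scaler and encoder are fitted only on the training folds during cross-validation, and the same transformations are applied automatically to new user inputs.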
Would you have survived the Titanic disaster?
Find out here: https://too-titanic-app.herokuapp.com/