
I am Elina Chhabra. I'm a junior pursuing a Bachelor of Science degree in Computer Science with a minor in Mathematics. I am totally in love with data analysis. I aspire to be a data scientist and to slice and dice data to create meaningful information for the good of society.
To some, numbers may look scary, but to me all kinds of statistics are an exciting journey into new horizons. My love for mathematics and numbers came in handy, and with some creativity I was able to overcome the overdose of data. As a first project I took up a topic that had a huge amount of data. I wished to peep into the metadata and figure out what it depicted. Hence, I took up the topic of crime in the US. Later, I realized the data was humongous and I needed to focus on a smaller set to make some sense of it. I pulled out a smaller subset and restricted myself to gun crimes in the USA between 2012 and 2014.
For my project I had a lot of data for every age, and any graph I used made the representation look very messy. I searched for functions to consolidate the data into smaller sets so that it became more manageable without losing meaning. To convert a lot of data into a meaningful, easily comprehensible representation, it is important to choose the right methodology. Representation is important, but the data should not lose its significance along the way. So I grouped ages into ranges for a clearer depiction. Instead of showing a graph for each age, I used the cut function to aggregate the data over ranges of 10 years and plotted a pie chart to find which ages are most prone to gun deaths. I also made a grouped bar graph comparing the cause of gun death for every age group.
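The grouping step can be sketched in a few lines. This is only an illustration of the idea, written with the Python standard library rather than the cut function used in the project. The helper name `age_range`, the bin edges (0-9, 10-19, and so on), and the sample ages are all assumptions made up for the example, not values from the dataset.

```python
from collections import Counter


def age_range(age, width=10):
    """Return a label such as '20-29' for the `width`-year bin containing `age`.

    Hypothetical helper: the bin edges used in the original project are not known.
    """
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical ages, invented for illustration; not taken from the dataset.
ages = [17, 22, 25, 31, 34, 38, 45, 52, 58, 63]

# Count how many records fall into each 10-year range.
counts = Counter(age_range(a) for a in ages)

# Print the ranges in ascending order of their lower bound.
for label in sorted(counts, key=lambda s: int(s.split("-")[0])):
    print(label, counts[label])
```

The per-range counts are the kind of numbers a pie chart would be drawn from: one slice per age range instead of one per individual age.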
I also looked for interesting attributes captured along with the data that would give insight into the social aspect. So I presented different graphs showing attributes like gender, race, location, etc. This gave a view of the social considerations behind the crimes. For this I used graph types like the mosaic plot and the stacked bar graph.
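Both a stacked bar graph and a mosaic plot are drawn from a table of counts for two attributes at once. Here is a rough sketch of building such a table with the Python standard library. The records, the attribute values, and the variable names are invented for illustration and do not come from the dataset.

```python
from collections import Counter, defaultdict

# Hypothetical (gender, location) records, invented for illustration.
records = [
    ("M", "Home"),
    ("M", "Street"),
    ("M", "Home"),
    ("F", "Home"),
    ("F", "Home"),
    ("F", "Street"),
    ("M", "Street"),
]

# table[gender][location] holds the number of records with that combination.
table = defaultdict(Counter)
for gender, location in records:
    table[gender][location] += 1

locations = sorted({location for _, location in records})

for gender in sorted(table):
    total = sum(table[gender].values())
    # Share of each location within this gender: the relative sizes of the
    # segments in one stacked bar or one mosaic column.
    shares = {loc: table[gender][loc] / total for loc in locations}
    print(gender, dict(table[gender]), shares)
```

Each row of the table corresponds to one bar, and the counts within the row are the stacked segments.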
From this project I learnt that once the data is well charted out, drawing meaningful inferences from it becomes simpler. Things that are not obvious when looking at reams and reams of data become much clearer after segregating and classifying it. It was very important to choose the right attributes for classification so that the charts gave a true picture of the data.
So don't ever get bogged down by lots of data. Identify the classification that serves your purpose and watch the data turn into information with the numerous tools available today. To get a copy of this data, contact: elina.chhabra.16@gmail.com.
Happy Data Mining!
Data source: https://www.kaggle.com/hakabuk/gun-deaths-in-the-us