"Data Science is a field that is growing at a rapid pace. It is a field that is constantly evolving and changing. The demand for data scientists is increasing every day."
This is a statement I have heard many times, and it is the reason I decided to pursue my Master's in Data Science in the US. But when I arrived, the job market was not what I expected. Numerous layoffs and hiring freezes at hundreds of companies startled me, and I was not sure I had made the right decision. Being a Data Scientist, I decided to analyze the job market and see where it currently stands. This would help not only me but also other students in the same situation to better understand the market and make informed decisions about their future careers.
The Title of the Job posted
The Name of the Company that posted the Job
The Platform through which the Job was posted
The Description of the posted Job
The Qualifications required for the posted Job
The Responsibilities of the posted Job
The Benefits of the posted Job
The Location of the posted Job
As mentioned above, the data was extracted in JSON format, which is difficult to work with during analysis. Hence, I used Python to combine the data from all the JSON files into a single CSV file.
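The conversion step could look roughly like the sketch below. It is an illustration under stated assumptions, not the exact code used for this project: the column names in `FIELDS` are hypothetical labels modelled on the fields listed above, and each JSON file is assumed to hold an array of objects, one per posting.

```python
import csv
import json
from pathlib import Path

# Hypothetical column names, modelled on the fields listed above.
FIELDS = ["title", "company_name", "platform", "description",
          "qualifications", "responsibilities", "benefits", "location"]


def load_postings(json_dir):
    """Read every *.json file in json_dir and return one flat list of postings.

    Assumes each file holds a JSON array of objects (one object per posting).
    """
    postings = []
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            postings.extend(json.load(handle))
    return postings


def to_cell(value):
    """Flatten one JSON value into a CSV cell: lists are joined, None becomes ''."""
    if value is None:
        return ""
    if isinstance(value, list):
        return "; ".join(str(item) for item in value)
    return str(value)


def write_csv(postings, csv_path, fields=FIELDS):
    """Write postings to csv_path; keys missing from a posting become empty cells."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        for posting in postings:
            writer.writerow({name: to_cell(posting.get(name)) for name in fields})
```

With a folder of raw files, `write_csv(load_postings("raw_json"), "jobs.csv")` would produce the combined file. The folder and file names here are placeholders.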
The data obtained was mostly text and needed to be cleaned. There were also many missing values, since job postings differ from one another and not all of them provide every piece of information.
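As a rough illustration of this kind of cleaning, the sketch below normalizes whitespace in text fields and measures how often each column is empty. The helper names and the column names in the usage note are hypothetical, and rows are assumed to be dictionaries such as those produced by `csv.DictReader`.

```python
def clean_text(value):
    """Collapse runs of whitespace and trim; treat None as an empty string."""
    return " ".join((value or "").split())


def missing_share(rows, columns):
    """Return {column: fraction of rows whose cleaned value is empty}."""
    total = len(rows)
    if total == 0:
        return {column: 0.0 for column in columns}
    return {
        column: sum(1 for row in rows if not clean_text(row.get(column))) / total
        for column in columns
    }
```

Calling `missing_share(rows, ["salary", "benefits"])` gives the fraction of postings that leave each of those fields blank, which is a quick way to see where the gaps are before plotting anything.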
Finally, the data was analyzed using data visualization tools such as Plotly, Altair, and Highcharts. The inferences from these plots should help me and other students understand the current job market better.
This pie chart displays the distribution of job titles and areas of expertise within the field of data science and artificial intelligence. Neural Networks and Deep Learning is the most common area of expertise, representing 19.48% of the data. Data Scientist and Data Analyst are also among the most common job titles, representing 13.28% and 12.59% of the data respectively. Machine Learning is another common area of expertise, representing 12.24% of the data. The remaining areas, including Big Data and Cloud Computing, Reinforcement Learning, Blockchain, Natural Language Processing, and Time Series, each represent smaller portions of the data.
This map displays the distribution of job postings across the United States. The majority of job postings are located in California, the DMV area (D.C., Maryland, and Virginia), Texas, New York, and Washington. These distributions change when the data is filtered by job title. For example, the majority of Data Scientist job postings are located in the DMV, Illinois, and Texas, whereas the majority of Reinforcement Learning job postings are located in California, New York, Massachusetts, and Washington. Hence, different states and regions of the US are more likely to have job postings for different job titles.
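The filtering described here amounts to counting locations with an optional restriction on job title. A minimal sketch, assuming each posting is a dictionary with hypothetical `title` and `location` keys:

```python
from collections import Counter


def top_locations(rows, job_title=None, k=3):
    """Return the k most common locations, optionally restricted to one job title."""
    counts = Counter(
        row["location"]
        for row in rows
        if row.get("location")
        and (job_title is None or row.get("title") == job_title)
    )
    return counts.most_common(k)
```

For example, `top_locations(rows, job_title="Data Scientist")` would list the regions with the most Data Scientist postings, while omitting `job_title` gives the overall ranking.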
This chart shows the distribution of job attributes by job title, using data from a job postings dataset. The chart is composed of four bar charts, each representing a different job attribute category: salary, qualifications, responsibilities, and benefits. Each category is further broken down into job titles, with the y-axis showing the number of jobs for each attribute. Almost all postings list some Qualifications and Responsibilities, but around 60% of them do not specify a salary range or the Benefits that come with the job. One interesting fact is that a higher percentage of Machine Learning jobs are posted with all of the details.
This plot displays the distribution of job postings across different platforms. The majority of job postings are on LinkedIn, followed by ZipRecruiter, Upwork, and AngelList. These distributions change when the data is filtered by job title. For example, Upwork has the highest number of job postings for Neural Networks and Deep Learning, and Karkidi has the highest number of job postings for Natural Language Processing. Hence, different platforms are more likely to have job postings for different job titles.
These wordclouds show the most frequently used words in the job descriptions for each job title. The size of a word represents its frequency in the job descriptions. I was surprised to see that neither "Data" nor "Python" appeared in any of the wordclouds. The word "Experience" had a high frequency in all of them. The term "Machine Learning" had a high frequency in the Machine Learning wordcloud, but also in the Neural Networks and Deep Learning job descriptions, since that area falls under ML. "Reports" and "Businesses" appear more in Data Analyst job descriptions, while "develop" and "model" appear more in Data Scientist job descriptions. This gave me a clearer understanding of the differences between the job titles.
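The frequencies behind a wordcloud can be approximated with a simple word count. The sketch below is one possible way to do it, not necessarily how the wordclouds here were built: the tokenizer and the tiny stop-word list are assumptions for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "on", "is", "are"}


def word_frequencies(descriptions, stop_words=STOP_WORDS):
    """Count lowercase alphabetic words across descriptions, skipping stop words."""
    counts = Counter()
    for text in descriptions:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts
```

Running this separately on the descriptions for each job title and comparing `counts.most_common(20)` gives a quick textual version of the comparison. Note that this sketch counts single words only, so a phrase like "Machine Learning" would be split into two tokens unless bigrams are counted as well.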
Out of all the active jobs, the companies in the above plot have posted the highest number. Note that Upwork had the highest count in the company name column, but it has been removed since it is a platform and not a company.
From the previous plot we can see that Booz Allen Hamilton has the highest number of jobs posted. A deeper dive into its job postings shows that the majority are for the position of Data Scientist. The above plot shows the distribution of job postings for Booz Allen Hamilton by job title. As this is the company with the highest number of job postings, candidates who possess the required skills have more openings to apply for here, which may improve their chances of getting hired.
Our last analysis shows that the majority of the jobs posted are not work-from-home jobs. This is surprising: the pandemic forced many companies to allow their employees to work from home, and it was widely believed that this trend would continue even after the pandemic. That is not the case, and several companies are still not offering work-from-home jobs. Another concerning fact is that the majority of the currently active jobs are full-time jobs. This is concerning because many students like me, who are in the first year of their degree, are looking for part-time jobs and internships to gain experience and earn some money, but only a fraction of the jobs fall under that category.
"Data Science is a field that is growing at a rapid pace. It is a field that is constantly evolving and changing. The demand for data scientists is increasing every day."
Recall this statement from the Introduction. After analyzing the data, the statement still holds up. The number of posted jobs that are related to Data Science, or that emerge from the field, is high (around 720 active jobs as of 04/14/2023). These jobs are fairly evenly distributed across the different areas of Data Science, the exception being jobs related to Time Series. There may be relatively lower demand for time series skills in the US job market compared to other skills, such as machine learning, data analysis, or software engineering, and employers may not require a large number of employees with this specific skill set.
This project has also made me realize that, since I want to be a Data Scientist in the future, I should look for jobs in the DMV, Texas, and Illinois, since there is a higher density of Data Scientist jobs in these areas. I would also prefer not to work for a company that does not post a salary range or benefits along with the qualifications and responsibilities. The last and most concerning part is that there are very few internships right now, which makes me wonder if I will be able to find one before summer. But on the bright side, there are a lot of full-time jobs, which is what I will be aiming for, since I will be a second-year graduate student next year.