500 Latest Job Listings for “Data Scientist” Position Analysed: EDA

An end-to-end task comprising data collection, analysis and visualization

Vijay Vankayalapati
4 min read · Jun 18, 2021
Gamut of Skills Sought in ‘Data Scientist’ Job Listings

Analysing job listings gives us an idea of what exactly employers are seeking from prospective employees for the advertised positions. In this post, I will walk you through how I performed exploratory data analysis on around 500 “Data Scientist” job listings from Indeed.com, one of the top websites for job listings. I chose “United States” as the location, but since our objective is to get a general idea of the job requirements, this shouldn’t be a problem for readers from elsewhere.

Scraping Data:

Scraping is a bit of a hack: it is achieved by inspecting the elements on a webpage, finding the relevant tags and grabbing the associated data. I modified a script from here so that it follows the link found on the search results page and also collects the full job description from the second page. I couldn’t get location data on my first try, so I scraped it separately. This led to a mismatch between the number of job titles and the number of locations, but that shouldn’t be a problem since we are looking for general trends, not exact data.

Indeed.com Webpage Displaying Search Results

The following script fetches our data and saves it in a CSV file.
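The original script is not reproduced in this text, so what follows is only a minimal sketch of the idea: walk the tags of a results page, pick out the fields of interest and write them to CSV. The markup, the class names (`job`, `title`, `company`) and the helper `scrape_to_csv` are hypothetical; Indeed.com's real tags differ and change often, and a live run would first have to download each results page. The sketch uses only the standard library so it runs without extra dependencies.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded results page.
SAMPLE_HTML = """
<div class="job"><h2 class="title">Data Scientist</h2><span class="company">Acme</span></div>
<div class="job"><h2 class="title">Senior Data Scientist</h2><span class="company">Globex</span></div>
"""


class JobCardParser(HTMLParser):
    """Collects the text of elements whose class is 'title' or 'company'."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per job card
        self._field = None   # column currently being filled, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "job":
            # A new job card starts: open a fresh row.
            self.rows.append({"Title": "", "Company": ""})
        elif cls in ("title", "company") and self.rows:
            self._field = cls.capitalize()

    def handle_data(self, data):
        if self._field:
            self.rows[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        # Good enough for this flat markup: any closing tag ends the field.
        self._field = None


def scrape_to_csv(html, out):
    """Parse the HTML string and write one CSV row per job card to `out`."""
    parser = JobCardParser()
    parser.feed(html)
    writer = csv.DictWriter(out, fieldnames=["Title", "Company"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return parser.rows


buffer = io.StringIO()
rows = scrape_to_csv(SAMPLE_HTML, buffer)
print(buffer.getvalue())
```

Passing an open file instead of the `io.StringIO` buffer would save the rows to disk.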

Exploratory Data Analysis:

First we load the downloaded CSV file into a pandas DataFrame. Exploring the data, we can see it has 490 rows and three columns: Title, Company and Description. My script couldn’t grab all the company names, but the 402 data points collected still give us a sense of who is recruiting.
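A minimal sketch of this loading step, assuming the three column names mentioned above. The in-memory sample stands in for the real file and its rows are made up; with the actual CSV you would pass its path to `pd.read_csv`.

```python
import io

import pandas as pd

# Made-up stand-in for the scraped file; the real one has 490 rows.
SAMPLE_CSV = io.StringIO(
    "Title,Company,Description\n"
    "Data Scientist,Acme,Build models in Python\n"
    "Senior Data Scientist,,Lead a team of analysts\n"
    "Data Scientist,Globex,SQL and statistics\n"
)

df = pd.read_csv(SAMPLE_CSV)

print(df.shape)    # (number of rows, number of columns)
print(df.count())  # non-null values per column; missing company names lower the count
```

Comparing `df.count()` with the number of rows is a quick way to spot columns the scraper only partly filled.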

Job Titles: We will now see which job titles are the most common among the Data Scientist listings using df['Title'].value_counts() and plot them with matplotlib.
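Roughly, the counting and plotting could look like the sketch below. The sample titles are invented and `plot_top_titles` is a hypothetical helper, not the plotting code used for the figures in this post; matplotlib is imported inside it so that the counting part runs on its own.

```python
import pandas as pd

# Invented sample; in the post this column comes from the scraped CSV.
df = pd.DataFrame({
    "Title": [
        "Data Scientist", "Senior Data Scientist", "Data Scientist",
        "Data Analyst", "Data Scientist",
    ]
})

# value_counts() sorts by frequency, most common first.
top_titles = df["Title"].value_counts().head(10)
print(top_titles)


def plot_top_titles(counts):
    """Draw a horizontal bar chart of the title counts."""
    import matplotlib.pyplot as plt

    # Sort ascending so the most common title ends up at the top of the chart.
    ax = counts.sort_values().plot(kind="barh")
    ax.set_xlabel("Number of listings")
    ax.set_title("Top job titles")
    plt.tight_layout()
    plt.show()
```

Calling `plot_top_titles(top_titles)` draws the chart.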

A word cloud of job titles will be an interesting way to visualise the data.

Top Job Titles in ‘Data Scientist’ Job Listings

Companies: Visualising the data in the ‘Company’ column:

Top Companies Posting ‘Data Scientist’ Job Listings

Locations: As already mentioned, location data was collected separately and we shall now explore this data and visualise it.

Top Locations for ‘Data Scientist’ Job listings

Job Descriptions: This is the key part of analysing job listings, since all the job requirements and skills are mentioned in this column. After converting the column to a list and joining all the job descriptions with a space between them, we get a text corpus of almost 2 million characters. We can analyse this using regular expressions. The approach isn’t completely accurate, but it gives a quick and dirty idea of the information we are looking for.

First we will try to capture the “years of experience” mentioned in the corpus using a regex pattern and plot the result. Only experience of up to 12 years is considered, since anything greater than that would mostly refer to a company’s track record rather than to what is expected of a candidate. Also, for a pattern like “3–5 years”, only the upper bound is captured.
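The exact pattern is not shown in this text, so here is a sketch of one that behaves as described. The sample corpus and the name `YEARS` are made up for illustration.

```python
import re
from collections import Counter

# Made-up corpus; the real one is all job descriptions joined together.
corpus = (
    "Requires 3-5 years of experience in machine learning. "
    "At least 2+ years experience with SQL. "
    "Our company has 25 years of industry leadership. "
    "Minimum 5 years of relevant work."
)

# One or two digits, an optional '+', then 'year' or 'years'.
# In '3-5 years' only the 5 sits next to 'years', so the upper bound is what gets captured.
YEARS = re.compile(r"\b(\d{1,2})\+?\s*years?", re.IGNORECASE)

years = [int(m) for m in YEARS.findall(corpus)]

# Drop values above 12, which tend to describe the company rather than the candidate.
years = [y for y in years if y <= 12]

experience_counts = Counter(years)
print(sorted(experience_counts.items()))
```

The resulting counter can be plotted as a bar chart of years against number of mentions.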

Similarly, the data for the other requirements is captured using respective regex patterns, then grouped and plotted. In the pie charts, the values are relative shares within each group, not absolute counts. Note that this is just a raw count of how often each keyword occurs, so it does not reflect the context in which the keyword appears.
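As an illustration of this kind of keyword counting, the sketch below counts a few skills and turns the counts into the relative shares a pie chart would show. The keyword group in `PATTERNS` is hypothetical; the actual lists used for the charts are not shown in this text.

```python
import re
from collections import Counter

corpus = "Python and SQL required. Experience with python, R and Spark. SQL a plus."

# Hypothetical keyword group, one regex per skill.
PATTERNS = {
    "Python": r"\bpython\b",
    "SQL": r"\bsql\b",
    "R": r"\bR\b",  # matched case-sensitively so a stray lowercase 'r' is not counted
    "Spark": r"\bspark\b",
}

counts = Counter()
for skill, pattern in PATTERNS.items():
    flags = 0 if skill == "R" else re.IGNORECASE
    counts[skill] = len(re.findall(pattern, corpus, flags))

# Relative shares within the group, which is what a pie chart displays.
total = sum(counts.values())
shares = {skill: n / total for skill, n in counts.items()}

print(counts)
print(shares)
```

Word boundaries (`\b`) keep short keywords from matching inside longer words, but the count still says nothing about context: “no SQL needed” would count as a mention of SQL.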

Finally, here is the word cloud of the gamut of skills sought in the job descriptions.

Skills Sought in ‘Data Scientist’ Job Listings

Github link here.
