Comparison of Adzuna and IVI Job Ads Data

Andreas Duenser

CSIRO Data61

Introduction

I’m a Senior Research Scientist with Data61 in Hobart, with a background in psychology and human factors. I’m interested in the convergence of human behaviour and cognition with emerging interactive technology. Combining these areas enables innovative Human-Machine Interaction (HMI) research. My research focuses on:

My Project

Online job advertisements are now being used around the world as a source of insight into the nature of our rapidly changing labour markets. However, it is important to check the quality and potential bias of this data source in order to determine what inferences can appropriately be drawn from the data, and whether and how we can use the data to inform decisions about labour markets. The goal of the wider project that this report is part of is to understand the strengths and weaknesses of online job advertisements as a source of information about changing demand for workers and skills. Here we present one example of the data validation and characterisation process: a comparison of two online job ad datasets, the Adzuna Australia job ads data (a dataset we are using to create an online skills dashboard) and the Internet Vacancy Index (IVI, a dataset collected by the Department of Jobs and Small Business). Both datasets comprise online job ads drawn from different sources (online job ad portals).

Results

Comparison of overall Adzuna Australia and IVI job ads counts between 2015 and 2019

Figure 1 compares job ads counts from both datasets over time. The IVI data already have a three-month moving average filter applied. The red line shows the original (unfiltered) Adzuna Australia data, and the green line shows the same data with a three-month moving average (mav) filter applied. As expected, the filtered series corresponds better with the IVI data (see also Table 1). We therefore use only the filtered Adzuna Australia data for further visualisation and analysis.
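
As a minimal sketch of the smoothing step, the snippet below applies a three-month moving average in base R. The counts are made-up values, the helper name `mav3` is hypothetical, and the trailing window (current month plus the two before it) is an assumption, since the text does not say whether the window is trailing or centred.

```r
# Hypothetical monthly job ads counts (illustrative values, not the real data)
counts <- c(100, 120, 110, 130, 150, 140)

# Trailing three-month moving average: each value is the mean of the
# current month and the two preceding months.
# Assumption: a trailing window; the actual filter may be centred instead.
mav3 <- function(x) {
  as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 1))
}

counts_mav <- mav3(counts)

# The first two months have no complete window, so they come out as NA
print(counts_mav)
```

Because the first two values are `NA`, any later correlation has to either drop those months or handle missing values explicitly.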

Figure 1: Adzuna Australia and IVI job ad counts - collapsed over all occupations and GCCSAs

Table 1: Correlations Adzuna Australia and IVI job ads counts over time, all occupations and GCCSAs
                  Adzuna_count  Adzuna_mav_count  IVI_count
Adzuna_count              1.00              0.99       0.92
Adzuna_mav_count          0.99              1.00       0.93
IVI_count                 0.92              0.93       1.00
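
A correlation matrix like Table 1 could be produced roughly as follows. This is only a sketch on made-up numbers: the data frame `df_counts` is hypothetical, and the choice of `use = "pairwise.complete.obs"` is an assumption made here to cope with the missing values that a moving average leaves at the start of a series.

```r
# Hypothetical monthly series (illustrative values, not the real data)
df_counts <- data.frame(
  Adzuna_count = c(100, 120, 110, 130, 150, 140),
  IVI_count    = c(90, 95, 100, 110, 120, 125)
)

# Smoothed Adzuna series (trailing three-month mean; the window type is an assumption)
df_counts$Adzuna_mav_count <- as.numeric(
  stats::filter(df_counts$Adzuna_count, rep(1 / 3, 3), sides = 1)
)

# Pearson correlations between all three series, rounded to two decimals.
# Pairwise deletion skips the NA months introduced by the moving average.
cols <- c("Adzuna_count", "Adzuna_mav_count", "IVI_count")
cor_matrix <- round(cor(df_counts[, cols], use = "pairwise.complete.obs"), 2)
print(cor_matrix)
```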

Comparison of Adzuna Australia and IVI job ads counts per GCCSAs

For another perspective on how the counts from the two datasets compare over time, we draw a scatter plot of the data and fit a LOESS (locally estimated scatterplot smoothing) curve to assess temporal trends. Figure 2 shows the data per GCCSA (Greater Capital City Statistical Area) for all States and Territories.
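
To illustrate the smoothing idea, here is a small base R sketch that fits a LOESS curve to a single synthetic monthly series. The data frame `df_example`, its values, and the `span` setting are all assumptions for illustration; the figures in this report may have been produced with different tooling and settings.

```r
# Synthetic monthly counts: a linear trend plus a slow oscillation
months <- seq(as.Date("2015-01-01"), by = "month", length.out = 48)
idx <- seq_along(months)
df_example <- data.frame(
  Month = months,
  count = 1000 + 5 * idx + 50 * sin(idx / 3)
)

# loess() needs a numeric predictor, so convert the dates to numbers
df_example$t <- as.numeric(df_example$Month)

# Fit the local regression; span controls how much of the data each
# local fit uses (0.75 is the loess() default)
fit <- loess(count ~ t, data = df_example, span = 0.75)

# Fitted values give the smoothed trend at each observed month
df_example$trend <- predict(fit)
head(df_example[, c("Month", "count", "trend")])
```

Plotting `count` against `Month` as points and adding `trend` as a line gives the scatter-plus-curve view described above; in ggplot2, `geom_smooth(method = "loess")` draws a comparable curve.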

Figure 2: Adzuna Australia and IVI job ads counts - GCCSA

Table 2 shows the correlations between Adzuna Australia and IVI job ads counts per GCCSA, over time and across all occupations.

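library(dplyr)       # group_by(), summarise(), filter(), %>%
library(kableExtra)  # kable_styling()

# Sum the monthly counts over all occupations within each GCCSA
df_merged_GCCSA <- df_merged %>%
  group_by(Month, GCCSA) %>%
  summarise(Adzuna_mav_count = sum(Adzuna_mav_count),
            IVI_count = sum(IVI_count))

# Empty results table: one row per state or territory, with the
# capital-city and rest-of-state correlations in separate columns
correlations_GCCSA <- data.frame(state=character(),
                           capital_city=numeric(), 
                           rest_of_state=numeric(),
                           stringsAsFactors=FALSE)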
correlations_GCCSA[1,1] <- "NSW"
correlations_GCCSA[1,2] <- with(df_merged_GCCSA %>% filter(GCCSA == "Greater Sydney"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[1,3] <- with(df_merged_GCCSA %>% filter(GCCSA == "Rest of NSW"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[2,1] <- "VIC"
correlations_GCCSA[2,2] <- with(df_merged_GCCSA %>% filter(GCCSA == "Greater Melbourne"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[2,3] <- with(df_merged_GCCSA %>% filter(GCCSA == "Rest of Vic."), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[3,1] <- "QLD"
correlations_GCCSA[3,2] <- with(df_merged_GCCSA %>% filter(GCCSA == "Greater Brisbane"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[3,3] <- with(df_merged_GCCSA %>% filter(GCCSA == "Rest of Qld"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[4,1] <- "SA"
correlations_GCCSA[4,2] <- with(df_merged_GCCSA %>% filter(GCCSA == "Greater Adelaide"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[4,3] <- with(df_merged_GCCSA %>% filter(GCCSA == "Rest of SA"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[5,1] <- "WA"
correlations_GCCSA[5,2] <- with(df_merged_GCCSA %>% filter(GCCSA == "Greater Perth"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[5,3] <- with(df_merged_GCCSA %>% filter(GCCSA == "Rest of WA"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[6,1] <- "TAS"
correlations_GCCSA[6,2] <- with(df_merged_GCCSA %>% filter(GCCSA == "Greater Hobart"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[6,3] <- with(df_merged_GCCSA %>% filter(GCCSA == "Rest of Tas."), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[7,1] <- "NT"
correlations_GCCSA[7,2] <- with(df_merged_GCCSA %>% filter(GCCSA == "Greater Darwin"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_GCCSA[7,3] <- with(df_merged_GCCSA %>% filter(GCCSA == "Rest of NT"), round(cor(Adzuna_mav_count, IVI_count), 2))
# The ACT has no "rest of state" area, so that column stays NA
correlations_GCCSA[8,1] <- "ACT"
correlations_GCCSA[8,2] <- with(df_merged_GCCSA %>% filter(GCCSA == "Australian Capital Territory"), round(cor(Adzuna_mav_count, IVI_count), 2))

knitr::kable(correlations_GCCSA, format = "html", caption = "Correlations Adzuna Australia and IVI job ads counts - GCCSAs") %>%
  kable_styling("striped", full_width = FALSE)
Table 2: Correlations Adzuna Australia and IVI job ads counts - GCCSAs
state  capital_city  rest_of_state
NSW            0.19           0.51
VIC            0.06           0.77
QLD            0.17           0.20
SA             0.27           0.37
WA            -0.14           0.78
TAS            0.11           0.76
NT             0.09           0.58
ACT            0.39             NA
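
The repeated filter-and-correlate pattern above could also be written as a single grouped computation. The sketch below is one possible way to do that in base R, not the code used for Table 2: the helper `cor_by_group` is hypothetical, and the example data are made-up stand-ins for `df_merged_GCCSA`.

```r
# Hypothetical stand-in for df_merged_GCCSA (made-up values)
df_example <- data.frame(
  GCCSA            = rep(c("Greater Hobart", "Rest of Tas."), each = 5),
  Adzuna_mav_count = c(10, 12, 11, 15, 14, 20, 22, 25, 24, 28),
  IVI_count        = c(30, 29, 33, 35, 34, 40, 45, 50, 47, 55)
)

# Split the data by a grouping column and compute one Pearson
# correlation per group, rounded to two decimals
cor_by_group <- function(df, group_col) {
  groups <- split(df, df[[group_col]])
  vapply(
    groups,
    function(d) round(cor(d$Adzuna_mav_count, d$IVI_count), 2),
    numeric(1)
  )
}

correlations <- cor_by_group(df_example, "GCCSA")
print(correlations)
```

The same helper would work for any other grouping column, such as an occupation title, which avoids one hand-written line per group and makes label typos less likely.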

Comparison of Adzuna Australia and IVI job ads counts per occupation

Figure 3 compares counts of Adzuna Australia and IVI job ads per major occupational group (using ANZSCO, the Australian and New Zealand Standard Classification of Occupations) over time and across all GCCSAs. Again, we draw a scatter plot of the data and fit a LOESS curve.

Figure 3: IVI and Adzuna job ads count - Occupations

Table 3 shows the correlations between Adzuna Australia and IVI job ads counts per major ANZSCO occupational group, over time and across all GCCSAs.

# Sum the monthly counts over all GCCSAs within each major occupational group
df_merged_occupation <- df_merged %>%
  group_by(Month, ANZSCO_TITLE) %>%
  summarise(Adzuna_mav_count = sum(Adzuna_mav_count),
            IVI_count = sum(IVI_count))

correlations_occupation <- data.frame("Occupation"=character(), 
                                      "Correlation"=numeric(),
                                      stringsAsFactors=FALSE)

correlations_occupation[1,1] <- "Managers"
correlations_occupation[1,2] <- with(df_merged_occupation %>% filter(ANZSCO_TITLE == "Managers"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_occupation[2,1] <- "Professionals"
correlations_occupation[2,2] <- with(df_merged_occupation %>% filter(ANZSCO_TITLE == "Professionals"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_occupation[3,1] <- "Technicians and Trades Workers"
correlations_occupation[3,2] <- with(df_merged_occupation %>% filter(ANZSCO_TITLE == "Technicians and Trades Workers"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_occupation[4,1] <- "Community and Personal Service Workers"
correlations_occupation[4,2] <- with(df_merged_occupation %>% filter(ANZSCO_TITLE == "Community and Personal Service Workers"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_occupation[5,1] <- "Clerical and Administrative Workers"
correlations_occupation[5,2] <- with(df_merged_occupation %>% filter(ANZSCO_TITLE == "Clerical and Administrative Workers"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_occupation[6,1] <- "Sales Workers"
correlations_occupation[6,2] <- with(df_merged_occupation %>% filter(ANZSCO_TITLE == "Sales Workers"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_occupation[7,1] <- "Machinery Operators and Drivers"
correlations_occupation[7,2] <- with(df_merged_occupation %>% filter(ANZSCO_TITLE == "Machinery Operators and Drivers"), round(cor(Adzuna_mav_count, IVI_count), 2))
correlations_occupation[8,1] <- "Labourers"
correlations_occupation[8,2] <- with(df_merged_occupation %>% filter(ANZSCO_TITLE == "Labourers"), round(cor(Adzuna_mav_count, IVI_count), 2))

knitr::kable(correlations_occupation, format = "html", caption = "Correlations Adzuna Australia and IVI job ads counts - occupation") %>%
  kable_styling("striped", full_width = FALSE)
Table 3: Correlations Adzuna Australia and IVI job ads counts - occupation
Occupation                              Correlation
Managers                                       0.33
Professionals                                  0.39
Technicians and Trades Workers                -0.38
Community and Personal Service Workers        -0.49
Clerical and Administrative Workers            0.27
Sales Workers                                  0.51
Machinery Operators and Drivers               -0.01
Labourers                                      0.03

Summary / Discussion:

The main take-away points from the above data comparisons are:

My time went …

…by mostly trying to figure out nitty-gritty details, deciphering error messages, and working out why some of my code did not work. And continuous iterations of tidying, visualising, analysing, tidying…

Next steps

My next challenge (apart from honing my R skills) will be getting into Python.

My Data School Experience

Although it was a considerable time investment and often challenging to balance with project and other work, attending Data School gave me time to practise my (emerging) R skills.