Reproducible and dynamic access to OECD data

Introduction

The OECD package allows the user to download data from the OECD’s API in a dynamic and reproducible way.

The package can be installed from either CRAN or GitHub (development version):

# from CRAN
install.packages("OECD")

# from GitHub
library(devtools)
install_github("expersso/OECD")

library(OECD)

How to use the package

The best way to use the package is to start in the OECD Data Explorer, where you can both browse the available datasets and filter a specific dataset down to the series you need.

In this example we will use data from National Accounts at a Glance, Chapter 1: GDP.

After filtering the data using the in-browser data explorer, click the “Developer API” button as seen in the screenshot below.

We extract the first string (representing the dataset as a whole) and the second string (representing the filter we’ve applied):

dataset <- "OECD.SDD.NAD,DSD_NAAG@DF_NAAG_I,1.0"
filter <- "A.USA+EU.B1GQ_R_POP+B1GQ_R_GR.USD_PPP_PS+PC."
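The filter string appears to follow the usual SDMX key convention: dimension positions are separated by `.`, multiple values within a position are joined by `+`, and an empty position places no restriction on that dimension. As a minimal sketch, assuming that convention holds for this dataset, the same string can be assembled from R vectors. The helper `build_filter` is hypothetical and not part of the OECD package.

```r
# Hypothetical helper (not part of the OECD package): collapse one character
# vector per dimension position into a single filter string.
build_filter <- function(dims) {
  # Join the values within each position with "+"
  parts <- vapply(dims, function(x) paste(x, collapse = "+"), character(1))
  # Join the positions with "."
  paste(parts, collapse = ".")
}

# One list element per position, in the order used by the filter above.
dims <- list(
  "A",
  c("USA", "EU"),
  c("B1GQ_R_POP", "B1GQ_R_GR"),
  c("USD_PPP_PS", "PC"),
  character(0)  # empty position: no restriction
)

filter_built <- build_filter(dims)
filter_built
```

Building the string this way makes it easier to swap countries or measures in a script without editing the string by hand.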

We then use the get_dataset function to retrieve the data:

df <- get_dataset(dataset, filter)
head(df)

When run, this returns a data frame like:

# A tibble: 6 × 9
  DATAFLOW     REF_AREA MEASURE       UNIT_MEASURE TIME_PERIOD ObsValue UNIT_MULT OBS_STATUS DECIMALS
  <chr>        <chr>    <chr>         <chr>              <dbl>    <dbl>     <dbl> <chr>         <dbl>
1 DSD_NAAG@DF… EU       B1GQ_R_POP    USD_PPP_PS          2010     35.7         3 ""                1
2 DSD_NAAG@DF… EU       B1GQ_R_POP    USD_PPP_PS          2011     36.7         3 ""                1
3 DSD_NAAG@DF… EU       B1GQ_R_POP    USD_PPP_PS          2012     37.0         3 ""                1
4 DSD_NAAG@DF… EU       B1GQ_R_POP    USD_PPP_PS          2013     37.2         3 ""                1
5 DSD_NAAG@DF… EU       B1GQ_R_POP    USD_PPP_PS          2014     37.8         3 ""                1
6 DSD_NAAG@DF… EU       B1GQ_R_POP    USD_PPP_PS          2015     38.5         3 ""                1
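Each call to get_dataset queries the live API, and published figures may be revised over time. One possible way to keep an analysis reproducible is to save the first result to disk and reuse it afterwards. The sketch below is an assumption about workflow, not a feature of the package: `cache_result` is a hypothetical helper, and the demonstration uses a stand-in function so that it runs offline.

```r
# Hypothetical helper, not part of the OECD package: call `fetch()` only if
# no saved copy exists at `path`, otherwise read the saved copy.
cache_result <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch()
  saveRDS(result, path)
  result
}

# Offline demonstration with a stand-in for the API call.
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(TIME_PERIOD = 2010:2011, ObsValue = c(35.7, 36.7))
}

path <- tempfile(fileext = ".rds")
first <- cache_result(path, fake_fetch)   # runs fake_fetch() and saves the result
second <- cache_result(path, fake_fetch)  # reads the saved file instead
```

With the real API, the same helper could be called as `cache_result("naag.rds", function() get_dataset(dataset, filter))`, where the file name is arbitrary.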

We select the relevant variables, convert the value and time columns to numeric, and lower-case the column names:

df <- df |>
  subset(select = c(REF_AREA, MEASURE, UNIT_MEASURE, TIME_PERIOD, ObsValue)) |>
  transform(
    ObsValue = as.numeric(ObsValue),
    TIME_PERIOD = as.numeric(TIME_PERIOD)
  )

names(df) <- tolower(names(df))

head(df)

It’s not immediately clear what the values of the variables measure and unit_measure represent, so we fetch a data dictionary and join it to the dataset:

data_structure <- get_data_structure(dataset)
str(data_structure, max.level = 1)

names(data_structure$CL_MEASURE_NA_DASH) <- c("measure", "measure_lbl")
names(data_structure$CL_UNIT_MEASURE) <- c("unit_measure", "unit_measure_lbl")

df <- df |>
  merge(data_structure$CL_MEASURE_NA_DASH, by = "measure") |>
  merge(data_structure$CL_UNIT_MEASURE, by = "unit_measure")

head(df)
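Note that merge performs an inner join by default, so any row whose code has no match in the dictionary is silently dropped. The toy example below illustrates the difference from a left join (`all.x = TRUE`). The data frames, the code `UNKNOWN_CODE`, and the labels are made up for illustration and do not come from the OECD API.

```r
# Made-up observations; "UNKNOWN_CODE" has no entry in the dictionary below.
obs <- data.frame(
  measure = c("B1GQ_R_POP", "B1GQ_R_GR", "UNKNOWN_CODE"),
  obsvalue = c(35.7, 1.2, 9.9),
  stringsAsFactors = FALSE
)

# Made-up dictionary with placeholder labels.
labels <- data.frame(
  measure = c("B1GQ_R_POP", "B1GQ_R_GR"),
  measure_lbl = c("placeholder label A", "placeholder label B"),
  stringsAsFactors = FALSE
)

inner <- merge(obs, labels, by = "measure")                # drops unmatched rows
left  <- merge(obs, labels, by = "measure", all.x = TRUE)  # keeps them, label is NA

nrow(inner)
nrow(left)
```

Comparing the row count before and after the join is a quick way to check that no observations were lost.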

The get_data_structure function returns a list of data frames with human-readable labels for variable names and values. The first data frame contains the variable names and shows the dimensions of a dataset:

data_structure$VAR_DESC

This would typically return:

            id          description
1     DATAFLOW Data flow identifier
2     REF_AREA       Reference area
3      MEASURE              Measure
4 UNIT_MEASURE         Unit measure
5  TIME_PERIOD          Time period
6   OBS_STATUS   Observation Status
7    UNIT_MULT      Unit multiplier
8     DECIMALS             Decimals
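A table of this shape can be turned into a named vector for quick lookups, for example to annotate column names in a report. The sketch below rebuilds the table from the output shown above so that it runs without network access; in practice `data_structure$VAR_DESC` would take its place, assuming it has the same two columns.

```r
# Stand-in for data_structure$VAR_DESC, copied from the output above.
var_desc <- data.frame(
  id = c("DATAFLOW", "REF_AREA", "MEASURE", "UNIT_MEASURE",
         "TIME_PERIOD", "OBS_STATUS", "UNIT_MULT", "DECIMALS"),
  description = c("Data flow identifier", "Reference area", "Measure",
                  "Unit measure", "Time period", "Observation Status",
                  "Unit multiplier", "Decimals"),
  stringsAsFactors = FALSE
)

# Named character vector: names are the ids, values are the descriptions.
describe <- setNames(var_desc$description, var_desc$id)

# Look up descriptions for a few columns of interest.
cols <- c("REF_AREA", "MEASURE", "TIME_PERIOD")
unname(describe[cols])
```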

Other information

This package is in no way officially related to or endorsed by the OECD.