The OECD package allows the user to download data from
the OECD’s API in a dynamic and reproducible way.
The package can be installed from either CRAN or GitHub (development version):
The easiest way to use the package is to browse the available datasets and filter them in the OECD Data Explorer.
In this example we use the dataset National Accounts at a Glance, Chapter 1: GDP.
After filtering the data in the browser's Data Explorer, click the “Developer API” button, as shown in the screenshot below.
We extract the first string (representing the dataset as a whole) and the second string (representing the filter we’ve applied):
library(OECD)
dataset <- "OECD.SDD.NAD,DSD_NAAG@DF_NAAG_I,1.0"
filter <- "A.USA+EU.B1GQ_R_POP+B1GQ_R_GR.USD_PPP_PS+PC."
We then use the get_dataset function to retrieve the data:
df <- get_dataset(dataset, filter)
When run, this returns a data frame like:
# A tibble: 6 × 9
DATAFLOW REF_AREA MEASURE UNIT_MEASURE TIME_PERIOD ObsValue UNIT_MULT OBS_STATUS DECIMALS
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl>
1 DSD_NAAG@DF… EU B1GQ_R_POP USD_PPP_PS 2010 35.7 3 "" 1
2 DSD_NAAG@DF… EU B1GQ_R_POP USD_PPP_PS 2011 36.7 3 "" 1
3 DSD_NAAG@DF… EU B1GQ_R_POP USD_PPP_PS 2012 37.0 3 "" 1
4 DSD_NAAG@DF… EU B1GQ_R_POP USD_PPP_PS 2013 37.2 3 "" 1
5 DSD_NAAG@DF… EU B1GQ_R_POP USD_PPP_PS 2014 37.8 3 "" 1
6 DSD_NAAG@DF… EU B1GQ_R_POP USD_PPP_PS 2015 38.5 3 "" 1
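The filter string copied from the Data Explorer is an SDMX data key: one position per dimension, in the dataflow's dimension order, separated by `.`; several values in one position joined by `+`; and an empty position (the trailing `.` here) leaving that dimension unfiltered. The dataset string similarly packs the agency, dataflow ID and version, separated by commas. A minimal sketch for rebuilding the key in R, so that countries or measures can be changed without going back to the browser. `build_filter()` is a hypothetical helper, not part of the OECD package, and the dimension meanings in the comments are inferred from the codes rather than taken from the dataset's structure.

```r
# Hypothetical helper (not part of the OECD package): build an SDMX data key
# from one character vector per dimension, given in the dataflow's dimension order.
build_filter <- function(...) {
  dims <- list(...)
  # Values within one dimension are joined with "+"; an empty vector or NULL
  # collapses to "", which leaves that dimension unfiltered.
  parts <- vapply(dims, function(x) paste(x, collapse = "+"), character(1))
  # Dimensions are separated by ".".
  paste(parts, collapse = ".")
}

# Rebuild the key from the example (the dimension labels are assumptions).
filter_key <- build_filter(
  "A",                          # frequency: annual
  c("USA", "EU"),               # reference areas
  c("B1GQ_R_POP", "B1GQ_R_GR"), # measures
  c("USD_PPP_PS", "PC"),        # units of measure
  NULL                          # last dimension: no filter
)
filter_key
#> [1] "A.USA+EU.B1GQ_R_POP+B1GQ_R_GR.USD_PPP_PS+PC."

# Split the dataset identifier into its three comma-separated parts.
dataset_parts <- setNames(
  strsplit("OECD.SDD.NAD,DSD_NAAG@DF_NAAG_I,1.0", ",", fixed = TRUE)[[1]],
  c("agency", "dataflow", "version")
)
dataset_parts
```

A key built this way is an ordinary string, so it can be passed to get_dataset in place of the copied filter; whether a given combination of codes returns any rows still depends on what the dataset actually contains.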
Next, we keep only the relevant variables, make sure the values and years are numeric, and lowercase the column names:
df <- df |>
  # keep only the columns needed for the analysis
  subset(select = c(REF_AREA, MEASURE, UNIT_MEASURE, TIME_PERIOD, ObsValue)) |>
  # make sure observation values and years are numeric
  transform(
    ObsValue = as.numeric(ObsValue),
    TIME_PERIOD = as.numeric(TIME_PERIOD)
  )
names(df) <- tolower(names(df))
head(df)
It’s not immediately clear what the values of the variables measure and unit_measure represent, so we fetch a data dictionary and join it to the dataset:
# download the data dictionary (code lists) for the dataset
data_structure <- get_data_structure(dataset)
str(data_structure, max.level = 1)
# rename each code list's two columns (code, label) to match df's column names
names(data_structure$CL_MEASURE_NA_DASH) <- c("measure", "measure_lbl")
names(data_structure$CL_UNIT_MEASURE) <- c("unit_measure", "unit_measure_lbl")
# attach the human-readable labels
df <- df |>
  merge(data_structure$CL_MEASURE_NA_DASH, by = "measure") |>
  merge(data_structure$CL_UNIT_MEASURE, by = "unit_measure")
head(df)
The get_data_structure function returns a list of data frames that map variable names and codes to human-readable labels. The first data frame lists the variables, i.e. the dimensions of the dataset:
data_structure[[1]]
This would typically return:
id description
1 DATAFLOW Data flow identifier
2 REF_AREA Reference area
3 MEASURE Measure
4 UNIT_MEASURE Unit measure
5 TIME_PERIOD Time period
6 OBS_STATUS Observation Status
7 UNIT_MULT Unit multiplier
8 DECIMALS Decimals
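The UNIT_MULT variable listed above is, in SDMX data, a power-of-ten exponent, so in the sample output earlier an ObsValue of 35.7 with a UNIT_MULT of 3 would read as 35,700 (presumably US dollars at PPP per person, given the USD_PPP_PS unit). The selection step above drops UNIT_MULT, so it has to be kept if values are to be rescaled. A minimal sketch on a small hand-made data frame whose values are copied from that sample output; `scale_obs()` is a hypothetical helper, and treating a missing multiplier as 0 is an assumption.

```r
# Hypothetical helper: rescale observation values by their unit multiplier,
# assuming UNIT_MULT is a power-of-ten exponent (0 = units, 3 = thousands).
scale_obs <- function(obs_value, unit_mult) {
  unit_mult <- as.numeric(unit_mult)
  # Assumption: a missing multiplier means the value is already in units.
  unit_mult[is.na(unit_mult)] <- 0
  as.numeric(obs_value) * 10^unit_mult
}

# Three rows copied from the sample output (EU, B1GQ_R_POP, USD_PPP_PS).
obs <- data.frame(
  TIME_PERIOD = c(2010, 2011, 2012),
  ObsValue = c(35.7, 36.7, 37.0),
  UNIT_MULT = c(3, 3, 3)
)
obs$value <- scale_obs(obs$ObsValue, obs$UNIT_MULT)
obs
```

In the main example, the same rescaling could be applied by adding UNIT_MULT to the select list and then calling transform(value = scale_obs(ObsValue, UNIT_MULT)) before the column names are lowercased.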
This package is in no way officially related to or endorsed by the OECD.