---
title: "Reproducible and dynamic access to OECD data"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Reproducible and programmatic access to OECD data}
  %\VignetteEngine{knitr::knitr}
  %\usepackage[utf8]{inputenc}
---

```{r options, echo=FALSE}
# Don't evaluate code chunks during R CMD check
knitr::opts_chunk$set(cache = FALSE, warning = FALSE, error = FALSE, eval = FALSE)
library(OECD)
```

### Introduction

The `OECD` package allows the user to download data from the OECD's API in a
dynamic and reproducible way.

The package can be installed from either CRAN or GitHub (development version):

```{r loadLibrary, eval=FALSE}
# from CRAN
install.packages("OECD")

# from GitHub
library(devtools)
install_github("expersso/OECD")

library(OECD)
```

### How to use the package

The best way to use the package is to start from the
[OECD Data Explorer](https://data-explorer.oecd.org), where you can both browse
the available datasets and filter a specific dataset.

In this example we will use data from National Accounts at a Glance, Chapter 1: GDP:

![](figures/search_result.png)

After filtering the data using the in-browser data explorer, click the
"Developer API" button as seen in the screenshot below.

![](figures/filter.png)

We extract the first string (representing the dataset as a whole) and the second
string (representing the filter we've applied):

```{r}
dataset <- "OECD.SDD.NAD,DSD_NAAG@DF_NAAG_I,1.0"
filter <- "A.USA+EU.B1GQ_R_POP+B1GQ_R_GR.USD_PPP_PS+PC."
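
# A minimal sketch of how the filter string is laid out, assuming it follows
# standard SDMX key syntax: "." separates dimension positions, "+" separates
# alternative codes within a position, and an empty position means "all codes".
# `split_filter` is a hypothetical helper for illustration only; it is not part
# of the OECD package.
split_filter <- function(key) {
  # strsplit() drops a trailing empty piece, so re-append it when the key
  # ends with "." (i.e. the last dimension is left unfiltered).
  positions <- strsplit(key, ".", fixed = TRUE)[[1]]
  if (endsWith(key, ".")) positions <- c(positions, "")
  lapply(positions, function(p) strsplit(p, "+", fixed = TRUE)[[1]])
}
# split_filter(filter) yields one element per dimension position, for example
# c("USA", "EU") for the second position.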
```

We then use the `get_dataset` function to retrieve the data:

```{r}
df <- get_dataset(dataset, filter)
head(df)
```

When run, this returns a data frame like:

```
# A tibble: 6 × 9
  DATAFLOW     REF_AREA MEASURE    UNIT_MEASURE TIME_PERIOD ObsValue UNIT_MULT OBS_STATUS DECIMALS
1 DSD_NAAG@DF… EU       B1GQ_R_POP USD_PPP_PS   2010        35.7     3         ""         1
2 DSD_NAAG@DF… EU       B1GQ_R_POP USD_PPP_PS   2011        36.7     3         ""         1
3 DSD_NAAG@DF… EU       B1GQ_R_POP USD_PPP_PS   2012        37.0     3         ""         1
4 DSD_NAAG@DF… EU       B1GQ_R_POP USD_PPP_PS   2013        37.2     3         ""         1
5 DSD_NAAG@DF… EU       B1GQ_R_POP USD_PPP_PS   2014        37.8     3         ""         1
6 DSD_NAAG@DF… EU       B1GQ_R_POP USD_PPP_PS   2015        38.5     3         ""         1
```

We select the relevant variables, convert the observation values and time
periods to numeric, and lowercase the column names:

```{r}
df <- df |>
  subset(select = c(REF_AREA, MEASURE, UNIT_MEASURE, TIME_PERIOD, ObsValue)) |>
  transform(
    ObsValue = as.numeric(ObsValue),
    TIME_PERIOD = as.numeric(TIME_PERIOD)
  )
names(df) <- tolower(names(df))
head(df)
```

It's not immediately clear what the values of the variables `measure` and
`unit_measure` represent, so we fetch a data dictionary and join it to the
dataset:

```{r}
data_structure <- get_data_structure(dataset)
str(data_structure, max.level = 1)

# Rename the code list columns so they match the join keys in df
names(data_structure$CL_MEASURE_NA_DASH) <- c("measure", "measure_lbl")
names(data_structure$CL_UNIT_MEASURE) <- c("unit_measure", "unit_measure_lbl")

df <- df |>
  merge(data_structure$CL_MEASURE_NA_DASH, by = "measure") |>
  merge(data_structure$CL_UNIT_MEASURE, by = "unit_measure")
head(df)
```

The `get_data_structure` function returns a list of data frames with
human-readable labels for variable names and values.
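
To make the label join above concrete without calling the API, here is a minimal sketch using small hand-made data frames. The codes are taken from the example, but `toy_df`, `toy_labels`, the observation values for the second measure and the label text are illustrative assumptions, not the OECD's actual code list entries.

```r
# Toy observations keyed by a coded `measure` column (values are made up).
toy_df <- data.frame(
  measure  = c("B1GQ_R_POP", "B1GQ_R_GR", "B1GQ_R_POP"),
  obsvalue = c(35.7, 1.9, 36.7),
  stringsAsFactors = FALSE
)

# Toy code list: one row per code; the label text is a placeholder.
toy_labels <- data.frame(
  measure     = c("B1GQ_R_POP", "B1GQ_R_GR"),
  measure_lbl = c("Label for B1GQ_R_POP", "Label for B1GQ_R_GR"),
  stringsAsFactors = FALSE
)

# merge() is an inner join by default, so rows whose code is missing from the
# code list would be dropped; all.x = TRUE keeps them with an NA label instead.
toy_joined <- merge(toy_df, toy_labels, by = "measure", all.x = TRUE)
toy_joined
```

In the real `data_structure` list, each `CL_*` element plays the role of `toy_labels` here. Whether to keep unmatched rows with `all.x = TRUE` or rely on the default inner join, as the chunk above does, is a judgment call that depends on how complete the code lists are.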
The first data frame contains the variable names and shows the dimensions of a
dataset:

```{r show_var_desc}
data_structure$VAR_DESC
```

This would typically return:

```
            id          description
1     DATAFLOW Data flow identifier
2     REF_AREA       Reference area
3      MEASURE              Measure
4 UNIT_MEASURE         Unit measure
5  TIME_PERIOD          Time period
6   OBS_STATUS   Observation Status
7    UNIT_MULT      Unit multiplier
8     DECIMALS             Decimals
```

### Other information

This package is in no way officially related to or endorsed by the OECD.