Analyzing growth in two datasets

In this tutorial, we'll use the R library to extract two datasets, then calculate and compare the wage and population growth rates through a D3-based visualization package.

Before we get started, you'll need to register for a BEA (API key). It'll be needed to make API requests.

key = "your-BEA-key-goes-here"

As usual, the library needs to be called.

library(eu.us.opendata)

Get the data

We'll now look for two datasets that contain the same geographic units. As it turns out, Population by State/NUTS2 and Wage and salary employment by State/NUTS2 have similar geographic units. We'll save these as ‘dataset' and ‘dataset2'.

  searchRel("state nuts2", asHTML = T) 
  dataset <- getRel("<JOINT#SAL_A_2>", beaKey = key)
## [1] "You have selected '<JOINT#SAL_A_2>'"
## [1] "A total of 32535 records were retrieved."
## [1] "EU = 27315, US = 5220"
  dataset2 <- getRel("<JOINT#POP_A_2>", beaKey = key) 
## [1] "You have selected '<JOINT#POP_A_2>'"
## [1] "A total of 32475 records were retrieved."
## [1] "EU = 27255, US = 5220"

Process the data

To process the data, we'll need to first convert all values into indices so that the levels are comparable. Below, we've written a simple function that uses as number of manipulation methods developed for the eu.us.opendata library:

  • timeSync() extracts the data based on overlapping date ranges. Specifying option “2” will pull overlapping dates whereas specifying “1” will pull the latest year.
  • wide() converts the data from long form to wide form such that each row contains a unique time period and each column contains the values for each geographic unit
  • the remaining code transforms data so that values are expressed as an index of the latest year of data (e.g. 1 = index leve for most recent year of data)
#Standardize values to latest record
  indexVal <- function(dataset){
    data <- timeSync(dataset,2) #an eu.us.opendata function for extracting data by overlapping time ranges
    data = wide(data) #an eu.us.opendata method for reshaping data wide
    data <- data[order(data$TIME),]
    time <- data$TIME 
    data = as.matrix(data)
    data = as.data.frame(t(t(data)/data[nrow(data),])) #index values based on latest year
    data$TIME <- time
    return(data)
  }

#apply indexVal() to datasets
  out1 = indexVal(dataset)
  out2 = indexVal(dataset2)

#example
  plot(out1[,c(1,3)])

To calculate the period growth rate, we'll calculate the simple linear time trend using a regression (e.g. Y ~ B0 + B1*Time + E). Remember that each column now represents a distinct, which means we'll need to loop through each column. To make this process simple, we've written a function called growthCalc() and have applied it to each resulting dataset from the section above.

#Calculate growth rate in each column using linear time trend
  growthCalc <- function(data){
    
    #Set up a empty dataframe to collect results
    val = data.frame()
    
    #Loop through each column 
    for(k in 2:ncol(data)){
      
      #If there are no blanks
      if(sum(is.na(data[,k]))==0){
        
        #Record the location, linear time trend coeficient, check if location ID is numeric or not
        val = rbind(val,
                    data.frame(loc = gsub("OBS_VALUE.","",colnames(data)[k]),
                               val = lm(data[,c(k,1)])$coef[2]))
      }
    }
    
    #fill in data_source columns with US vs EU location name
    val$loc <- as.character(val$loc)
    return(val)
  }

##apply growthCalc to indexed values
  grow1 <- growthCalc(out1)
  grow2 <- growthCalc(out2)

As a last step, we'll merge the datasets using location, then infer the region based on if the location code is numeric (US) or alphanumeric (EU).

##Merge
  join <- merge(grow1, grow2, by=c("loc"))
  colnames(join) <- c("location","wage","pop")
  
##Label region    
  join$region <- "EU" 
  join$region[!is.na(as.numeric(join$location))] <- "US"

Visualize the relationships

Lastly, we'll use the scatterD3 library to create a color-coded, interactive scatter plot. The library can be installed using the following command.

  install.packages("scatterD3")

To create a color-coded scatter plot, we'll need to plug in field names and options.

library(scatterD3)
scatterD3(data = join, 
          x = pop, 
          y = wage, 
          lab = location, 
          col_var = region, 
          labels_size = 9, 
          point_opacity = 0.8, 
          ellipses = TRUE)