Data wrangling nc4 files

Author

Clay Ford

Published

September 18, 2023

Read in nc4 data

library(ncdf4) # package for netcdf manipulation
nc_data <- nc_open('R01_JULES_1901.nc4') # Open a netCDF File (nc4)

The nc_data object is a list with 15 components, some of which are themselves lists. The ncvar_get() function extracts a named variable from the nc_data object. Below we extract longitude, latitude, time and soil moisture fraction.

lon <- ncvar_get(nc_data, "lon")
lat <- ncvar_get(nc_data, "lat")
time <- ncvar_get(nc_data, "time")
frac <- ncvar_get(nc_data, "SoilMoistFrac")

The first three are vectors. The last, frac, is a large 4-dimensional array.

dim(frac)
[1] 720 123  16  12

The contents of this array are soil moisture fraction values from 1901 at various coordinates, various depths, and at various points in time.

  • 720 (longitude coordinate)
  • 123 (latitude coordinate)
  • 16 (16 depths)
  • 12 (12 months)

The four dimensions correspond to location (longitude and latitude), depth, and time. However, the dimensions are not labeled, so we need to label them.
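One way to confirm which dimension is which is to look at the dimension metadata stored in the opened file object. The sketch below is only an illustration: it builds a tiny throwaway netCDF file (the file, its values and the object name nc_toy are all hypothetical) so that it runs without the JULES data, then lists the dimension order of the variable and the length of each dimension.

```r
library(ncdf4)

# Hypothetical toy file so the sketch runs without the JULES data
tmp <- tempfile(fileext = ".nc")
lon_dim <- ncdim_def("lon", "degrees_east", c(-128.25, -127.75))
# A dimension with no coordinate variable: units must be "" and vals 1:n
depth_dim <- ncdim_def("depth", "", 1:3, create_dimvar = FALSE)
sm_var <- ncvar_def("SoilMoistFrac", "1", list(lon_dim, depth_dim))
nc_toy <- nc_create(tmp, sm_var)
ncvar_put(nc_toy, sm_var, matrix(runif(6), nrow = 2))
nc_close(nc_toy)

# Reopen the file and inspect its dimensions
nc_toy <- nc_open(tmp)
# Names of the variable's dimensions, in storage order
dim_order <- sapply(nc_toy$var$SoilMoistFrac$dim, function(d) d$name)
dim_order
# Length of every dimension defined in the file
dim_len <- sapply(nc_toy$dim, function(d) d$len)
dim_len
nc_close(nc_toy)
```

Applied to the real file, the same idea would be sapply(nc_data$var$SoilMoistFrac$dim, function(d) d$name), assuming the variable is stored under that name.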

Label the dimensions

The labels for three of the four dimensions are in the lon, lat and time vectors. Oddly, there was no component in the nc_data object with depth values. Those may not be available for the 1901 data, so I just used the integers 1 through 16 as depth labels.

dimnames(frac) <- list("lon" = lon, "lat" = lat, 
                       "depth" = 1:16, "month" = time)
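To see what the labels do, here is a minimal sketch on a made-up 2 x 3 x 2 array (the object toy and its values are hypothetical, not taken from the nc4 file). Note that R stores dimnames as character, even when they are supplied as numbers.

```r
# Toy 2 x 3 x 2 array standing in for frac (values are made up)
toy <- array(1:12, dim = c(2, 3, 2))
dimnames(toy) <- list("lon" = c(-128.25, -127.75),
                      "lat" = c(59.25, 59.75, 60.25),
                      "depth" = 1:2)

# The labels are stored as a named list of character vectors
str(dimnames(toy))

# Because the labels are character, look values up with strings
toy["-128.25", "59.75", "2"]
```

This is why the coordinate columns come out as character later on and need converting back to numeric.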

Convert to data frame

Only a subset of the latitude and longitude coordinates was needed, so I extracted the index positions of the relevant longitude and latitude values.

i <- which(lon <= -128.0 & lon >= -162.0)
j <- which(lat <= 69.5 & lat >= 59.0)

Now we can use the as.data.table() function from the {data.table} package to convert the array to a long-format data.table, which is also a data frame. The base R as.data.frame.table() function can do this too, but the {data.table} method for arrays drops NA values by default (na.rm = TRUE). Notice I used the i and j vectors to subset the array on the fly.

library(data.table)
fracDF <- as.data.table(frac[i, j, , ], value.name = "SoilMoistFrac")
head(fracDF)
       lon   lat depth   month SoilMoistFrac
1: -128.25 59.25     1 73427.5     0.9293682
2: -128.25 59.25     1   73457     0.9294412
3: -128.25 59.25     1 73486.5     0.9467671
4: -128.25 59.25     1   73517     0.9907405
5: -128.25 59.25     1 73547.5     0.9604079
6: -128.25 59.25     1   73578     0.6936272
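The difference between the two conversion functions is easier to see on a small example. The sketch below uses a made-up 2 x 2 x 2 array with one missing value (the objects toy, toyDT and toyDF are hypothetical) and compares the row counts.

```r
library(data.table)

# Toy array with one NA cell (values are made up)
toy <- array(c(0.91, NA, 0.88, 0.95, 0.72, 0.64, 0.81, 0.77),
             dim = c(2, 2, 2))
dimnames(toy) <- list("lon" = c(-128.25, -127.75),
                      "lat" = c(59.25, 59.75),
                      "month" = c(73427.5, 73457))

# data.table method: one row per non-missing cell (na.rm = TRUE by default)
toyDT <- as.data.table(toy, value.name = "SoilMoistFrac")
nrow(toyDT)

# Base R method: one row per cell, including the NA
toyDF <- as.data.frame.table(toy, responseName = "SoilMoistFrac")
nrow(toyDF)

# The dimension columns come back as character in the data.table
sapply(toyDT, class)
```

The array has 8 cells, so the base R result has 8 rows while the data.table result has 7.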

The dimension labels are stored as character, so next we need to convert the first three columns to numeric:

fracDF[,1:3] <- lapply(fracDF[,1:3], as.numeric)
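An alternative, if you prefer {data.table} syntax, is to update the columns by reference with := and .SDcols. This is just a sketch on a made-up table (toyDT and cols are hypothetical names), not a change to the approach above.

```r
library(data.table)

# Toy table with character coordinate columns (values are made up)
toyDT <- data.table(lon = c("-128.25", "-127.75"),
                    lat = c("59.25", "59.75"),
                    depth = c("1", "2"),
                    SoilMoistFrac = c(0.93, 0.88))

# Convert the named columns to numeric in place
cols <- c("lon", "lat", "depth")
toyDT[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]

sapply(toyDT, class)
```

Naming the columns rather than using positions 1:3 keeps the code working if the column order ever changes.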

Then we format the time column as a date and add the year. The origin date for this data is “1700-01-01”, not the usual “1970-01-01” that base R defaults to.

fracDF$month <- as.Date(as.numeric(fracDF$month), 
                       origin = "1700-01-01")
fracDF$year <- format(fracDF$month, "%Y") # year of each row, as character
head(fracDF)
       lon   lat depth      month SoilMoistFrac year
1: -128.25 59.25     1 1901-01-15     0.9293682 1901
2: -128.25 59.25     1 1901-02-14     0.9294412 1901
3: -128.25 59.25     1 1901-03-15     0.9467671 1901
4: -128.25 59.25     1 1901-04-15     0.9907405 1901
5: -128.25 59.25     1 1901-05-15     0.9604079 1901
6: -128.25 59.25     1 1901-06-15     0.6936272 1901
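The origin date usually does not have to be guessed, because netCDF time variables typically carry it in a units attribute. The sketch below assumes a units string of the form "days since 1700-01-01 00:00:00"; that string is hypothetical here, and with {ncdf4} it could be read using ncatt_get(nc_data, "time", "units")$value. The three time values are the first three shown in the output above.

```r
# Hypothetical units string; with ncdf4 it could be read via
# ncatt_get(nc_data, "time", "units")$value
time_units <- "days since 1700-01-01 00:00:00"

# Strip the "days since " prefix and keep the YYYY-MM-DD part
origin <- as.Date(substr(sub("^days since ", "", time_units), 1, 10))

# First three time values from the file, in days since the origin
time_vals <- c(73427.5, 73457, 73486.5)
dates <- as.Date(time_vals, origin = origin)
format(dates)

# Year of each date, as character
years <- format(dates, "%Y")
years
```

The resulting dates are 1901-01-15, 1901-02-14 and 1901-03-15, matching the month column above.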