Data wrangling nc4 files
Read in nc4 data
library(ncdf4) # package for netcdf manipulation
nc_data <- nc_open('R01_JULES_1901.nc4') # Open a netCDF File (nc4)
The nc_data object is a list with 15 components, some of which contain more lists. The ncvar_get() function extracts data of interest from the nc_data object. Below we extract longitude, latitude, time and soil moisture fraction.
lon <- ncvar_get(nc_data, "lon")
lat <- ncvar_get(nc_data, "lat")
time <- ncvar_get(nc_data, "time")
frac <- ncvar_get(nc_data, "SoilMoistFrac")
The first three are vectors. The last, frac, is a large 4-dimensional array.
dim(frac)
[1] 720 123 16 12
The contents of this array are soil moisture fraction values from 1901 at various coordinates, various depths, and at various points in time.
- 720 (longitude coordinate)
- 123 (latitude coordinate)
- 16 (16 depths)
- 12 (12 months)
The dimensions encode location, depth, and time. However, they are not labeled, so we need to label them.
Label the dimensions
The dimension labels are in the lon, lat and time vectors. Oddly, no component of the nc_data object held depth values; they may not be available for the 1901 data. So I just used the numbers 1 to 16.
dimnames(frac) <- list("lon" = lon, "lat" = lat,
"depth" = 1:16, "month" = time)
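To see what the labels buy us, here is a minimal sketch on a toy array. The array, its size, and the coordinate values (toy, toy_lon, toy_lat, toy_time) are made up for illustration and are not taken from the nc4 file. It assumes only base R.

```r
# Toy stand-in for frac: 3 lon x 2 lat x 2 depths x 2 months.
# All values and coordinates are invented for illustration.
toy <- array(seq_len(24) / 24, dim = c(3, 2, 2, 2))
toy_lon  <- c(-128.75, -128.25, -127.75)
toy_lat  <- c(59.25, 59.75)
toy_time <- c(73427.5, 73457)

# Same labeling pattern as above; R coerces the labels to character
dimnames(toy) <- list("lon" = toy_lon, "lat" = toy_lat,
                      "depth" = 1:2, "month" = toy_time)

# The dimensions now have names
dim_labels <- names(dimnames(toy))
print(dim_labels)

# Because dimnames are character, a cell is looked up with strings
cell <- toy["-128.25", "59.75", "2", "73457"]
print(cell)
```

Note that the labels are stored as character strings, which is why numeric columns have to be restored after the array is reshaped into a data frame.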
Convert to data frame
They only wanted data for a subset of the latitude and longitude coordinates, so I extracted the relevant row and column numbers.
i <- which(lon <= -128.0 & lon >= -162.0)
j <- which(lat <= 69.5 & lat >= 59.0)
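As a quick sanity check of this kind of bounding-box selection, the sketch below applies the same which() calls to a made-up half-degree grid. The toy_lon and toy_lat vectors are assumptions for illustration; the real coordinates come from ncvar_get() and may cover a different range.

```r
# Hypothetical half-degree grid, not the coordinates from the file
toy_lon <- seq(-179.75, 179.75, by = 0.5)
toy_lat <- seq(50.25, 75.25, by = 0.5)

# Index positions (not values) that fall inside the bounding box
i <- which(toy_lon <= -128.0 & toy_lon >= -162.0)
j <- which(toy_lat <= 69.5 & toy_lat >= 59.0)

# Inspect what was selected before using i and j to subset an array
print(range(toy_lon[i]))
print(range(toy_lat[j]))
print(c(n_lon = length(i), n_lat = length(j)))
```

Since which() returns positions, i and j can be passed straight into the first two slots of the array index.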
Now we can use the as.data.table() function from the {data.table} package to convert the array to a data frame. We can also use the base R as.data.frame.table() function for this, but the {data.table} version automatically drops all NAs. Notice I used the i and j vectors to subset the array on-the-fly.
library(data.table)
fracDF <- as.data.table(frac[i,j,,], value.name = "SoilMoistFrac")
head(fracDF)
lon lat depth month SoilMoistFrac
1: -128.25 59.25 1 73427.5 0.9293682
2: -128.25 59.25 1 73457 0.9294412
3: -128.25 59.25 1 73486.5 0.9467671
4: -128.25 59.25 1 73517 0.9907405
5: -128.25 59.25 1 73547.5 0.9604079
6: -128.25 59.25 1 73578 0.6936272
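For comparison, here is a rough sketch of the base R route using as.data.frame.table(). It runs on a small made-up array (toy) rather than the real data, and because the base function keeps NA cells, they are dropped explicitly with na.omit().

```r
# Small labeled array with two missing cells; values are invented
toy <- array(c(0.91, NA, 0.88, 0.95, 0.72, 0.80, NA, 0.67),
             dim = c(2, 2, 2))
dimnames(toy) <- list("lon" = c(-128.75, -128.25),
                      "lat" = c(59.25, 59.75),
                      "depth" = 1:2)

# One row per cell; responseName plays the role of value.name
toy_df <- as.data.frame.table(toy, responseName = "SoilMoistFrac",
                              stringsAsFactors = FALSE)
n_before <- nrow(toy_df)

# Unlike the data.table method, NAs must be removed by hand
toy_df <- na.omit(toy_df)
n_after <- nrow(toy_df)

str(toy_df)
```

The coordinate columns come back as character here as well, since they are built from the array's dimnames.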
Next we need to convert the first three columns to numeric, since they were built from the array's dimnames, which R stores as character strings:
fracDF[,1:3] <- lapply(fracDF[,1:3], as.numeric)
Then we format the time column as a date and add the year. The origin date for this data is “1700-01-01”, not the usual “1970-01-01” that base R defaults to.
fracDF$month <- as.Date(as.numeric(fracDF$month),
                        origin = "1700-01-01")
fracDF$year <- substr(fracDF$month[1], 1, 4)
head(fracDF)
lon lat depth month SoilMoistFrac year
1: -128.25 59.25 1 1901-01-15 0.9293682 1901
2: -128.25 59.25 1 1901-02-14 0.9294412 1901
3: -128.25 59.25 1 1901-03-15 0.9467671 1901
4: -128.25 59.25 1 1901-04-15 0.9907405 1901
5: -128.25 59.25 1 1901-05-15 0.9604079 1901
6: -128.25 59.25 1 1901-06-15 0.6936272 1901
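The date conversion can be checked in isolation. The sketch below takes the first three time values from the earlier output and converts them with the same 1700-01-01 origin. The per-row year via format(d, "%Y") is an alternative I am suggesting, not what the code above does; it would matter if a file spanned more than one year.

```r
# First three time values from the output above (days since 1700-01-01)
days <- c(73427.5, 73457, 73486.5)

# Convert day offsets to dates using the file's origin
d <- as.Date(days, origin = "1700-01-01")
print(d)

# Alternative to recycling the first row's year: derive it per row
yr <- format(d, "%Y")
print(yr)
```

The fractional day (the .5) is dropped when the date is printed, which is why 73427.5 shows up as 1901-01-15.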