Backend API

Core functions

xagg.core.aggregate(ds, wm)

Aggregate raster variable(s) to polygon(s)

Aggregates (N-D) raster variables in ds to the polygons in wm; in other words, gives the weighted average of the values in ds based on each pixel's relative area of overlap with the polygons.

The values will additionally be weighted if weights were provided to xagg.core.create_raster_polygons()

The code checks whether the input lat/lon grid in ds is equivalent to the linearly indexed grid in wm, or if it can be cropped to that grid.

Parameters
ds: xarray.Dataset

an xarray.Dataset containing one or more variables with dimensions lat, lon (and possibly more). The dataset’s geographic grid has to include the lat/lon coordinates used in determining the pixel overlaps in xagg.core.get_pixel_overlaps() (and saved in wm['source_grid'])

wm: xagg.classes.weightmap

the output of xagg.core.get_pixel_overlaps(); an xagg.classes.weightmap object containing

  • ['agg']

    a dataframe, with one row per polygon, and the columns pix_idxs and rel_area, giving the linear indices and the relative area of each pixel over the polygon, respectively

  • ['source_grid']

    the lat/lon grid on which the aggregating parameters were calculated (and on which the linear indices are based)

Returns
agg_out: xagg.classes.aggregated

an xagg.classes.aggregated object with the aggregated variables
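A minimal end-to-end sketch (file names and variable names are placeholders; it assumes ds carries the ‘lat_bnds’/‘lon_bnds’ variables required by xagg.core.create_raster_polygons(), described below):

    import xarray as xr
    import geopandas as gpd
    from xagg.core import create_raster_polygons, get_pixel_overlaps, aggregate

    # Gridded data and the polygons to aggregate to (placeholder filenames)
    ds = xr.open_dataset('tas_monthly.nc')
    gdf = gpd.read_file('regions.shp')

    # Build pixel polygons, calculate pixel/polygon overlaps, then aggregate
    pix_agg = create_raster_polygons(ds, subset_bbox=gdf)
    wm = get_pixel_overlaps(gdf, pix_agg)
    agg_out = aggregate(ds, wm)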

xagg.core.create_raster_polygons(ds, mask=None, subset_bbox=None, weights=None, weights_target='ds')

Create polygons for each pixel in a raster

Note: ‘lat_bnds’ and ‘lon_bnds’ can be created through the xagg.aux.get_bnds() function if they are not already included in the input raster file.

Note: Currently this code only supports regular rectangular grids (i.e., grids where every pixel side is a straight line in lat/lon space). Future versions may include support for irregular grids.

Parameters
ds: xarray.Dataset

an xarray dataset with the variables ‘lat_bnds’ and ‘lon_bnds’, which are both lat/lon x 2 arrays giving the min and max values of lat and lon for each pixel given by lat/lon

subset_bbox: geopandas.GeoDataFrame, optional, default = None

if a geopandas.GeoDataFrame is entered, the bounding box around the geometries in the gdf is used to mask the grid, reducing the number of pixel polygons created

Returns
pix_agg: dict

a dictionary containing:

  • 'gdf_pixels'

    a geopandas.GeoDataFrame containing a ‘geometry’ column giving the pixel boundaries for each ‘lat’ / ‘lon’ pair

  • 'source_grid'

    a dictionary containing the original lat and lon inputs under the keys “lat” and “lon” (just the xarray.DataArray of those variables in the input ds)
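A short sketch (the filenames are placeholders; the xagg.aux.get_bnds() call follows the note above and is assumed to take the dataset and return it with ‘lat_bnds’/‘lon_bnds’ added):

    import xarray as xr
    import geopandas as gpd
    from xagg.aux import get_bnds
    from xagg.core import create_raster_polygons

    ds = xr.open_dataset('precip.nc')     # placeholder filename
    ds = get_bnds(ds)                     # add 'lat_bnds'/'lon_bnds' if missing (assumed usage)

    # Restrict the pixel polygons to the bounding box of the target geometries
    gdf = gpd.read_file('counties.shp')   # placeholder filename
    pix_agg = create_raster_polygons(ds, subset_bbox=gdf)

    pix_agg['gdf_pixels'].head()          # one pixel polygon per row, with 'lat', 'lon', 'geometry'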

xagg.core.get_pixel_overlaps(gdf_in, pix_agg)

Get, for each polygon, the pixels that overlap and their area of overlap

Finds, for each polygon in gdf_in, which pixels intersect it, and by how much.

Note: Uses EASE-Grid 2.0 on the WGS84 datum to calculate relative areas (see https://nsidc.org/data/ease)

Parameters
gdf_in: geopandas.GeoDataFrame

a geopandas.GeoDataFrame giving the polygons over which the variables should be aggregated. Can be just a read shapefile (with the added column of “poly_idx”, which is just the index as a column).

pix_agg: dict

the output of xagg.core.create_raster_polygons(); a dict containing:

  • 'gdf_pixels'

    a geopandas.GeoDataFrame giving for each row the columns “lat” and “lon” (with coordinates) and a polygon giving the boundary of the pixel given by lat/lon

  • 'source_grid'

    [da.lat,da.lon] of the grid used to create the pixel polygons

Returns
wm_out: dict

A dictionary containing:

  • 'agg'

    a dataframe containing all the fields of gdf_in (except geometry) and the additional columns:

    • coords: the lat/lon coordinates of all pixels that overlap the polygon of that row

    • pix_idxs: the linear indices of those pixels within the gdf_pixels grid

    • rel_area: the relative area of each of the overlaps between the pixels and the polygon (summing to 1; e.g. if the polygon is exactly the size and location of two pixels, their rel_areas would be 0.5 each)

  • 'source_grid':

    a dictionary with keys ‘lat’ and ‘lon’ giving the original lat/lon grid whose overlaps with the polygons was calculated

  • 'geometry':

    just the polygons from gdf_in
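Continuing the sketch from xagg.core.create_raster_polygons() above (gdf and pix_agg as built there):

    from xagg.core import get_pixel_overlaps

    wm = get_pixel_overlaps(gdf, pix_agg)

    # One row of wm['agg'] per polygon: pix_idxs lists the overlapping pixels,
    # rel_area their relative overlap areas (summing to 1 for each polygon)
    wm['agg'][['pix_idxs', 'rel_area']].head()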

xagg.core.process_weights(ds, weights=None, target='ds')

Process weights - including regridding

If target == 'ds', regrid weights to ds. If target == 'weights', regrid ds to weights.

Parameters
ds: xarray.Dataset, xarray.DataArray

an xarray.Dataset/xarray.DataArray to regrid

weights: xarray.DataArray, optional, default = None

an xarray.DataArray containing a weight (numeric) at each location

target: str, optional, default = 'ds'

whether weights should be regridded to the ds grid (the default) or vice-versa (not yet supported; raises NotImplementedError)

Returns
ds: xarray.Dataset, xarray.DataArray

the input xarray.Dataset/xarray.DataArray, with a new variable ‘weights’ specifying the weight for each pixel

weights_info: dict

a dictionary storing information about the weights regridding process, with the fields:

  • target: showing which of the two grids was retained

  • ds_grid: a dictionary with the grid {"lat":ds.lat,"lon":ds.lon}

  • weights_grid: a dictionary with the grid {"lat":weights.lat,"lon":weights.lon}
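An illustrative sketch (the weights array here is random noise, standing in for any numeric weighting field such as population or area; ds is a dataset on a lat/lon grid, as above):

    import numpy as np
    import xarray as xr
    from xagg.core import process_weights

    # Illustrative weights on the same lat/lon grid as ds
    weights = xr.DataArray(np.random.rand(ds.sizes['lat'], ds.sizes['lon']),
                           coords={'lat': ds.lat, 'lon': ds.lon},
                           dims=['lat', 'lon'])

    # Regrid the weights to the grid of ds (the default, target='ds') and
    # attach them to ds as a 'weights' variable
    ds, weights_info = process_weights(ds, weights=weights, target='ds')
    weights_info['target']                # which grid was retained ('ds' here)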

Export functions

xagg.export.output_data(agg_obj, output_format, output_fn, loc_dim='poly_idx')

Wrapper for prep_for_* functions

Parameters
agg_obj: xagg.classes.aggregated

object to be exported

output_format: str

‘netcdf’, ‘csv’, or ‘shp’

output_fn: str

the output filename

loc_dim: str, optional, default = 'poly_idx'

the name of the dimension with location indices; used only by xagg.export.prep_for_nc()

Returns
the variable that gets saved, which depends on output_format:
  • “netcdf”: the xarray.Dataset on which .to_netcdf was called

  • “csv”: the pandas.DataFrame on which .to_csv was called

  • “shp”: the geopandas.GeoDataFrame on which .to_file was called
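For example (the output filename is a placeholder; agg_out is the result of aggregate(), as above):

    from xagg.export import output_data

    # Write the aggregated data as a netcdf file; the xarray.Dataset that was
    # written is also returned
    ds_out = output_data(agg_out, output_format='netcdf', output_fn='aggregated.nc')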

xagg.export.prep_for_csv(agg_obj)

Preps aggregated data for output as a csv

Concretely, aggregated data is placed in a new pandas dataframe and expanded wide: each aggregated variable is placed in new columns, one column per coordinate in each dimension that isn’t the location (polygon). So, for example, a lat x lon x time variable “tas”, aggregated to location x time, would be reshaped wide into columns “tas0”, “tas1”, “tas2”,… for timesteps 0, 1, 2, etc.

Note: Currently there is no support for variables with more than one extra dimension beyond their location dimension. A potential option: multi-index column names, e.g. [var]0-0, [var]0-1, etc.

Parameters
agg_obj: xagg.classes.aggregated

the output from aggregate()

Returns
df: pandas.DataFrame

a pandas dataframe containing all the fields from the original location polygons, plus columns containing the values of the aggregated variables at each location. This can then easily be exported directly as a csv (using df.to_csv) or to shapefiles by first converting it into a geodataframe.
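A brief sketch (output path is a placeholder; agg_out from aggregate(), as above):

    from xagg.export import prep_for_csv

    df = prep_for_csv(agg_out)
    df.to_csv('aggregated.csv')           # wide format: one column per variable and timestep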

xagg.export.prep_for_nc(agg_obj, loc_dim='poly_idx')

Preps aggregated data for output as a netcdf

Concretely, aggregated data is placed in a new xarray dataset with dimensions of location (the different polygons in gdf_out) and any other dimension(s) in the original input raster data. All fields from the input polygons are kept as variables with dimension of location.

Parameters
agg_obj: xagg.classes.aggregated

the output from aggregate()

loc_dim: str, optional, default = 'poly_idx'

the name of the location dimension; by default ‘poly_idx’. Values of that dimension are currently only an integer index (with further information given by the field variables). Future versions may allow the dimension to be replaced with the values of a field, if loc_dim is set to the name of a field in the input polygons.
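A brief sketch (output path is a placeholder; agg_out from aggregate(), as above):

    from xagg.export import prep_for_nc

    ds_out = prep_for_nc(agg_out, loc_dim='poly_idx')
    ds_out.to_netcdf('aggregated.nc')     # dimensions: poly_idx plus any non-spatial dims of the input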