pyspatialml: machine learning for raster datasets

Submodules

pyspatialml.plotting module

class pyspatialml.plotting.RasterPlot

Bases: object

plot(cmap=None, norm=None, figsize=None, out_shape=None, title_fontsize=8, label_fontsize=6, legend_fontsize=6, names=None, fig_kwds=None, legend_kwds=None, subplots_kwds=None)

Plot a Raster object as a raster matrix

Parameters
  • cmap (str (opt), default=None) – Specify a single cmap to apply to all of the RasterLayers. This overrides the cmap attribute of each RasterLayer.

  • norm (matplotlib.colors.Normalize (opt), default=None) – A matplotlib.colors.Normalize to apply to all of the RasterLayers. This overrides the norm attribute of each RasterLayer.

  • figsize (tuple (opt), default=None) – Size of the resulting matplotlib.figure.Figure.

  • out_shape (tuple, default=(100, 100)) – Number of rows, cols to read from the raster datasets for plotting.

  • title_fontsize (any number, default=8) – Size in pts of titles.

  • label_fontsize (any number, default=6) – Size in pts of axis ticklabels.

  • legend_fontsize (any number, default=6) – Size in pts of legend ticklabels.

  • names (list (opt), default=None) – Optionally supply a list of names for each RasterLayer to override the default layer names for the titles.

  • fig_kwds (dict (opt), default=None) – Additional arguments to pass to the matplotlib.pyplot.figure call when creating the figure object.

  • legend_kwds (dict (opt), default=None) – Additional arguments to pass to the matplotlib.pyplot.colorbar call when creating the colorbar object.

  • subplots_kwds (dict (opt), default=None) – Additional arguments to pass to the matplotlib.pyplot.subplots_adjust function. These are used to control the spacing and position of each subplot, and can include {left=None, bottom=None, right=None, top=None, wspace=None, hspace=None}.

Returns

axs – array of matplotlib.axes._subplots.AxesSubplot or a single matplotlib.axes._subplots.AxesSubplot if Raster object contains only a single layer.

Return type

numpy.ndarray

pyspatialml.plotting.discrete_cmap(N, base_cmap=None)

Create an N-bin discrete colormap from the specified input map.

Source: https://gist.github.com/jakevdp/91077b0cae40f8f8244a

Parameters
  • N (int) – The number of colors in the colormap

  • base_cmap (str) – The name of the matplotlib cmap to convert into a discrete map.

Returns

The cmap converted to a discrete map.

Return type

matplotlib.cmap

pyspatialml.plotting.rasterio_normalize(arr, axis=None)

Scales an array using min-max scaling.

Parameters
  • arr (ndarray) – A numpy array containing the image data.

  • axis (int (opt)) – The axis to perform the normalization along.

Returns

The normalized array

Return type

numpy.ndarray
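
As a rough illustration of min-max scaling, the sketch below rescales a flat sequence of numbers to the range [0, 1] in plain Python. The helper name min_max_scale is hypothetical and is not part of pyspatialml; the handling of constant input is an assumption, and the axis argument is not modelled.

```python
def min_max_scale(values):
    """Rescale a flat sequence of numbers to the range [0, 1].

    Hypothetical helper for illustration only; not the library's code.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: constant input maps to zeros to avoid division by zero.
        return [0.0 for _ in values]
    # Subtract the minimum, then divide by the range.
    return [(v - lo) / span for v in values]


scaled = min_max_scale([2.0, 4.0, 6.0])
print(scaled)
```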

pyspatialml.plotting.shiftedColorMap(cmap, start=0, midpoint=0.5, stop=1.0, name='shiftedcmap')

Function to offset the “center” of a colormap. Useful for data with a negative minimum and a positive maximum, where the middle of the colormap’s dynamic range should sit at zero.

Source: http://stackoverflow.com/questions/7404116/defining-the-midpoint-of-a-colormap-in-matplotlib

Parameters
  • cmap (str) – The matplotlib colormap to be altered

  • start (any number) – Offset from lowest point in the colormap’s range. Defaults to 0.0 (no lower offset). Should be between 0.0 and midpoint.

  • midpoint (any number between 0.0 and 1.0) – The new center of the colormap. Defaults to 0.5 (no shift). In general, this should be 1 - vmax/(vmax + abs(vmin)). For example, if your data range from -15.0 to +5.0 and you want the center of the colormap at 0.0, midpoint should be set to 1 - 5/(5 + 15), or 0.75.

  • stop (any number between midpoint and 1.0) – Offset from the highest point in the colormap’s range. Defaults to 1.0 (no upper offset).

Returns

The colormap with its centre shifted to the midpoint value.

Return type

matplotlib.cmap
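
The midpoint formula given for shiftedColorMap can be checked with a few lines of arithmetic. The helper name colormap_midpoint is hypothetical; it only computes the value to pass as midpoint and assumes the data range straddles zero.

```python
def colormap_midpoint(vmin, vmax):
    """Return the midpoint that places zero at the center of the colormap.

    Hypothetical helper; implements 1 - vmax / (vmax + abs(vmin)).
    """
    if not vmin < 0 < vmax:
        raise ValueError("expected vmin < 0 < vmax")
    return 1 - vmax / (vmax + abs(vmin))


# Data ranging from -15.0 to +5.0, as in the parameter description.
midpoint = colormap_midpoint(-15.0, 5.0)
print(midpoint)
```

A symmetric range such as -1.0 to 1.0 gives 0.5, which corresponds to no shift.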

pyspatialml.raster module

class pyspatialml.raster.Raster(src, crs=None, transform=None, nodata=None, mode='r', file_path=None, driver=None, tempdir=None, in_memory=False)

Bases: pyspatialml.plotting.RasterPlot, pyspatialml.rasterbase.BaseRaster

Creates a collection of file-based GDAL-supported raster datasets that share a common coordinate reference system and geometry.

Raster objects encapsulate RasterLayer objects, which represent single band raster datasets that can physically be represented by either separate single-band raster files, multi-band raster files, or any combination of individual bands from multi-band raster and single-band raster datasets.

Methods defined in the Raster class comprise those that would typically be applied to a stack of raster datasets. In addition, these methods always return a new Raster object.

loc

Access pyspatialml.RasterLayer objects within a Raster using a key or a list of keys.

Type

_LocIndexer object

iloc

Access pyspatialml.RasterLayer objects using an index position. A wrapper around _LocIndexer to enable integer-based indexing of the items in the OrderedDict. Setting and getting items can occur using a single index position, a list or tuple of positions, or a slice of positions.

Type

_ILocIndexer object

files

A list of the raster dataset files that are used in the Raster. This does not have to be the same length as the number of RasterLayers because some files may have multiple bands.

Type

list

dtypes

A list of numpy dtypes for each RasterLayer.

Type

list

nodatavals

A list of the nodata values for each RasterLayer.

Type

list

count

The number of RasterLayers in the Raster.

Type

int

res

The resolution in (x, y) dimensions of the Raster.

Type

tuple

meta

A dict containing the raster metadata. The dict contains the following keys/values:

  • crs : the crs object

  • transform : the Affine.affine transform object

  • width : width of the Raster in pixels

  • height : height of the Raster in pixels

  • count : number of RasterLayers within the Raster

  • dtype : the numpy datatype that represents the lowest common denominator of the different dtypes for all of the layers in the Raster.

Type

dict

names

A list of the RasterLayer names.

Type

list

block_shape

The default block_shape in (rows, cols) for reading windows of data in the Raster for out-of-memory processing.

Type

tuple

tempdir

Path to a directory to store temporary files that are produced during geoprocessing operations.

Type

str, default is tempfile.tempdir

in_memory

Whether to initiate the Raster from an array and store the data in-memory using Rasterio’s in-memory files.

Type

bool, default is False

aggregate(out_shape, resampling='nearest', file_path=None, in_memory=False, driver='GTiff', dtype=None, nodata=None, **kwargs)

Aggregates a raster to (usually) a coarser grid cell size.

Parameters
  • out_shape (tuple) – New shape in (rows, cols).

  • resampling (str (default 'nearest')) – Resampling method to use when applying decimated reads when out_shape is specified. Supported methods are: ‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘q1’, ‘q3’.

  • file_path (str (optional, default None)) – File path to save the aggregated raster. If not supplied then the aggregated raster is saved to a temporary file.

  • in_memory (bool, default is False) – Whether to initiate the Raster from an array and store the data in-memory using Rasterio’s in-memory files.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the new aggregated Raster is created using the dtype of the existing Raster dataset, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.

  • nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s dtype. Note that this does not change the pixel nodata values of the raster, it only changes the metadata of what value represents a nodata pixel.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster object aggregated to a new pixel size.

Return type

Raster
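
To see how out_shape relates to the resulting cell size, the following sketch computes the new resolution for a raster whose extent stays the same while its shape changes. The helper aggregated_resolution is hypothetical and assumes the extent is unchanged; it is not the library's implementation.

```python
def aggregated_resolution(shape, res, out_shape):
    """Estimate the (x, y) cell size after resampling to out_shape.

    Hypothetical helper. shape and out_shape are (rows, cols);
    res is (xres, yres). Assumes the raster extent is unchanged.
    """
    rows, cols = shape
    out_rows, out_cols = out_shape
    xres, yres = res
    # Fewer columns over the same width means wider cells, and likewise for rows.
    return (xres * cols / out_cols, yres * rows / out_rows)


new_res = aggregated_resolution(shape=(1000, 2000), res=(30.0, 30.0), out_shape=(100, 200))
print(new_res)
```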

append(other, in_place=False)

Method to add new RasterLayers to a Raster object.

Note that this returns a new Raster object by default; set in_place=True to modify the Raster object in-place.

Parameters
  • other (Raster object, or list of Raster objects) – Object to append to the Raster.

  • in_place (bool (default False)) – Whether to change the Raster object in-place or leave original and return a new Raster object.

Returns

Returned only if in_place is False

Return type

Raster

apply(function, file_path=None, in_memory=False, driver='GTiff', dtype=None, nodata=None, progress=False, n_jobs=1, function_args={}, **kwargs)

Apply user-supplied function to a Raster object.

Parameters
  • function (function) – Function that takes a numpy array as a single argument.

  • file_path (str (optional, default None)) – Optional path to save calculated Raster object. If not specified then a tempfile is used.

  • in_memory (bool, default is False) – Whether to initiate the Raster from an array and store the data in-memory using Rasterio’s in-memory files.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the new Raster is created using the dtype of the calculation result.

  • nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this changes the values of the pixels that represent nodata pixels.

  • n_jobs (int (default 1)) – Number of processing cores to use for parallel execution. The default is n_jobs=1; -1 uses all cores.

  • progress (bool (default False)) – Optionally show progress of transform operations.

  • function_args (dict (optional)) – Optionally pass arguments to the function as a dict or keyword arguments.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster containing the calculated result.

Return type

Raster

copy()

Creates a shallow copy of a Raster object

Note that shallow in the context of a Raster object means that an immutable copy of the object is made, however the on-disk file locations remain the same.

Returns

Return type

Raster

crop(bounds, file_path=None, in_memory=False, driver='GTiff', dtype=None, nodata=None, **kwargs)

Crops a Raster object by the supplied bounds.

Parameters
  • bounds (tuple) – A tuple containing the bounding box to clip by in the form of (xmin, ymin, xmax, ymax).

  • file_path (str (optional, default None)) – File path to save the cropped raster. If not supplied then the cropped raster is saved to a temporary file.

  • in_memory (bool, default is False) – Whether to initiate the Raster from an array and store the data in-memory using Rasterio’s in-memory files.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the new cropped Raster is created using the dtype of the existing Raster dataset, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.

  • nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this does not change the pixel nodata values of the raster, it only changes the metadata of what value represents a nodata pixel.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster cropped to new extent.

Return type

Raster
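
The bounds tuple is given in map coordinates, so cropping implies converting it to a pixel window. The sketch below shows one way to do that for a north-up raster. The helper bounds_to_window and its parameters (left, top, xres, yres) are hypothetical, and snapping outward to whole pixels is an assumption rather than documented library behaviour.

```python
import math


def bounds_to_window(bounds, left, top, xres, yres):
    """Convert (xmin, ymin, xmax, ymax) to (col_off, row_off, width, height).

    Hypothetical helper for a north-up raster whose upper-left corner is at
    (left, top) and whose cells measure xres by yres map units.
    """
    xmin, ymin, xmax, ymax = bounds
    # Columns increase eastwards from the left edge.
    col_off = math.floor((xmin - left) / xres)
    # Rows increase southwards from the top edge.
    row_off = math.floor((top - ymax) / yres)
    # Assumption: snap outward so that partially covered pixels are kept.
    width = math.ceil((xmax - left) / xres) - col_off
    height = math.ceil((top - ymin) / yres) - row_off
    return col_off, row_off, width, height


window = bounds_to_window((15.0, 25.0, 55.0, 85.0), left=0.0, top=100.0, xres=10.0, yres=10.0)
print(window)
```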

drop(labels, in_place=False)

Drop individual RasterLayers from a Raster object

Note that this returns a new Raster object by default; set in_place=True to modify the Raster object in-place.

Parameters
  • labels (single label or list-like) – Index (int) or layer name to drop. Can be a single integer or label, or a list of integers or labels.

  • in_place (bool (default False)) – Whether to change the Raster object in-place or leave original and return a new Raster object.

Returns

Returned only if in_place is False

Return type

pyspatialml.Raster

extract_raster(src, return_array=False, progress=False)

Sample a Raster object by an aligned raster of labelled pixels.

Parameters
  • src (rasterio DatasetReader) – Single band raster containing labelled pixels as an open rasterio DatasetReader object.

  • return_array (bool (opt), default=False) – By default the extracted pixel values are returned as a geopandas.GeoDataFrame. If return_array=True then the extracted pixel values are returned as a tuple of numpy.ndarrays.

  • progress (bool (opt), default=False) – Show a progress bar for extraction.

Returns

  • geopandas.GeoDataFrame – Geodataframe containing extracted data as point features if return_array=False

  • tuple with three items if return_array is True –

    • numpy.ndarray

      Numpy masked array of extracted raster values, typically 2d.

    • numpy.ndarray

      1d numpy masked array of the labelled samples.

    • numpy.ndarray

      2d numpy masked array of row and column indexes of training pixels.

extract_vector(gdf, return_array=False, progress=False)

Sample a Raster/RasterLayer using a geopandas GeoDataFrame containing point, line or polygon features.

Parameters
  • gdf (geopandas.GeoDataFrame) – Containing either point, line or polygon geometries. Overlapping geometries will cause the same pixels to be sampled.

  • return_array (bool (opt), default=False) – By default the extracted pixel values are returned as a geopandas.GeoDataFrame. If return_array=True then the extracted pixel values are returned as a tuple of numpy.ndarrays.

  • progress (bool (opt), default=False) – Show a progress bar for extraction.

Returns

  • geopandas.GeoDataFrame – Containing extracted data as point geometries (one point per pixel) if return_array=False. The resulting GeoDataFrame is indexed using a named pandas.MultiIndex, with the pixel_idx index referring to the index of each pixel that was sampled, and the geometry_idx index referring to the index of each geometry in the supplied gdf. This makes it possible to keep track of how each sampled pixel relates to the original geometries, e.g. multiple pixels extracted within the area of a single polygon can be referred to using the geometry_idx.

    The extracted data can subsequently be joined with the attribute table of the supplied gdf using:

        training_py = geopandas.read_file(nc.polygons)
        df = self.stack.extract_vector(gdf=training_py)
        df = df.dropna()

        df = df.merge(
            right=training_py.loc[:, ["id", "label"]],
            left_on="geometry_idx",
            right_index=True
        )

  • tuple – A tuple (geodataframe index, extracted values, coordinates) of the extracted raster values as a masked array and the coordinates of the extracted pixels if return_array=True.

extract_xy(xys, return_array=False, progress=False)

Samples pixel values using an array of xy locations.

Parameters
  • xys (2d array-like) – x and y coordinates from which to sample the raster (n_samples, xys).

  • return_array (bool (opt), default=False) – By default the extracted pixel values are returned as a geopandas.GeoDataFrame. If return_array=True then the extracted pixel values are returned as a tuple of numpy.ndarrays.

  • progress (bool (opt), default=False) – Show a progress bar for extraction.

Returns

  • geopandas.GeoDataFrame – Containing extracted data as point geometries if return_array=False.

  • numpy.ndarray – 2d masked array containing sampled raster values (sample, bands) at the x,y locations.

intersect(file_path=None, in_memory=False, driver='GTiff', dtype=None, nodata=None, **kwargs)

Perform an intersect operation on the Raster object.

Computes the geometric intersection of the RasterLayers with the Raster object. This will cause nodata values in any of the rasters to be propagated through all of the output rasters.

Parameters
  • file_path (str (optional, default None)) – File path to save the resulting Raster. If not supplied then the resulting Raster is saved to a temporary file.

  • in_memory (bool, default is False) – Whether to initiate the Raster from an array and store the data in-memory using Rasterio’s in-memory files.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the new intersected Raster is created using the dtype of the existing Raster dataset, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.

  • nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this changes the values of the pixels that represent nodata to the new value.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster with layers that are masked based on a union of all masks in the suite of RasterLayers.

Return type

Raster
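
The propagation of nodata values described above can be pictured with plain lists, where None stands in for a nodata pixel. The helper intersect_layers is hypothetical and only illustrates the masking logic; the library operates on raster files rather than Python lists.

```python
def intersect_layers(layers):
    """Mask every layer wherever any layer holds nodata (None).

    Hypothetical helper. layers is a list of equal-length flat lists.
    """
    n_pixels = len(layers[0])
    # A pixel is valid only if it is valid in every layer.
    valid = [all(layer[i] is not None for layer in layers) for i in range(n_pixels)]
    return [[v if ok else None for v, ok in zip(layer, valid)] for layer in layers]


layer_a = [1, None, 3, 4]
layer_b = [10, 20, None, 40]
masked_a, masked_b = intersect_layers([layer_a, layer_b])
print(masked_a, masked_b)
```

Both outputs share the same mask, which is the union of the two input masks.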

mask(shapes, invert=False, crop=True, pad=False, file_path=None, in_memory=False, driver='GTiff', dtype=None, nodata=None, **kwargs)

Mask a Raster object based on the outline of shapes in a geopandas.GeoDataFrame

Parameters
  • shapes (geopandas.GeoDataFrame) – GeoDataFrame containing masking features.

  • invert (bool (default False)) – If False then pixels outside shapes will be masked. If True then pixels inside shape will be masked.

  • crop (bool (default True)) – Crop the raster to the extent of the shapes.

  • pad (bool (default False)) – If True, the features will be padded in each direction by one half of a pixel prior to cropping raster.

  • file_path (str (optional, default None)) – File path to save the resulting Raster. If not supplied then the resulting Raster is saved to a temporary file.

  • in_memory (bool, default is False) – Whether to initiate the Raster from an array and store the data in-memory using Rasterio’s in-memory files.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the cropped Raster is created using the existing dtype, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.

  • nodata (any number (optional, default None)) – Nodata value for cropped dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this changes the values of the pixels to the new nodata value, and changes the metadata of the raster.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

predict(estimator, file_path=None, in_memory=False, driver='GTiff', dtype=None, nodata=None, n_jobs=1, progress=False, **kwargs)

Apply prediction of a scikit-learn model to a Raster.

The model can be any scikit-learn model, or any object with a compatible API that provides fit and predict methods. These can be classification or regression models. Multi-class classifications and multi-target regressions are also supported.

Parameters
  • estimator (estimator object implementing 'fit') – The fitted estimator used to generate the predictions.

  • file_path (str (optional, default None)) – Path to a GeoTiff raster for the prediction results. If not specified then the output is written to a temporary file.

  • in_memory (bool, default is False) – Whether to initiate the Raster from an array and store the data in-memory using Rasterio’s in-memory files.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Optionally specify a GDAL compatible data type when saving to file. If not specified, np.float32 is assumed.

  • nodata (any number (optional, default None)) – Nodata value for file export. If not specified then the nodata value is derived from the minimum permissible value for the given data type.

  • n_jobs (int (default 1)) – Number of processing cores to use for parallel execution. Default is n_jobs=1. -1 uses all cores; -2 uses all cores but one.

  • progress (bool (default False)) – Show progress bar for prediction.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster object containing the prediction results as RasterLayers. For classification and regression models, the Raster will contain a single RasterLayer, unless the model is multi-class or multi-target. Layers are named automatically as pred_raw_n with n = 1, 2, 3, ..., n.

Return type

Raster
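
The n_jobs convention described above (1 for a single core, -1 for all cores, -2 for all cores but one) can be sketched as follows. The helper resolve_n_jobs is hypothetical and assumes the joblib-style rule n_cpus + 1 + n_jobs for negative values; how pyspatialml resolves the value internally is not stated in this reference.

```python
def resolve_n_jobs(n_jobs, n_cpus):
    """Translate an n_jobs setting into a concrete worker count.

    Hypothetical helper following the joblib-style convention.
    """
    if n_jobs == 0:
        raise ValueError("n_jobs must not be 0")
    if n_jobs > 0:
        return n_jobs
    # Negative values count back from the number of available cores.
    return max(1, n_cpus + 1 + n_jobs)


# On a machine with 8 cores:
workers = [resolve_n_jobs(n, n_cpus=8) for n in (1, -1, -2)]
print(workers)
```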

predict_proba(estimator, file_path=None, in_memory=False, indexes=None, driver='GTiff', dtype=None, nodata=None, progress=False, n_jobs=1, **kwargs)

Apply class probability prediction of a scikit-learn model to a Raster.

Parameters
  • estimator (estimator object implementing 'fit') – The fitted estimator used to generate the class probabilities.

  • file_path (str (optional, default None)) – Path to a GeoTiff raster for the prediction results. If not specified then the output is written to a temporary file.

  • in_memory (bool, default is False) – Whether to initiate the Raster from an array and store the data in-memory using Rasterio’s in-memory files.

  • indexes (list of integers (optional, default None)) – List of class indices to export. In some circumstances, only a subset of the class probability estimations are desired, for instance when performing a binary classification only the probabilities for the positive class may be desired.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Optionally specify a GDAL compatible data type when saving to file. If not specified, a data type is set based on the data type of the prediction.

  • nodata (any number (optional, default None)) – Nodata value for file export. If not specified then the nodata value is derived from the minimum permissible value for the given data type.

  • n_jobs (int (default 1)) – Number of processing cores to use for parallel execution. Default is n_jobs=1. -1 uses all cores; -2 uses all cores but one.

  • progress (bool (default False)) – Show progress bar for prediction.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster containing predicted class probabilities. Each predicted class is represented by a RasterLayer object. The RasterLayers are named prob_n for 1,2,3..n, with n based on the index position of the classes, not the number of the class itself.

For example, a classification model predicting classes with integer values of 1, 3, and 5 would result in three RasterLayers named ‘prob_1’, ‘prob_2’ and ‘prob_3’.

Return type

Raster
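
The naming rule for probability layers (position-based, not label-based) can be illustrated with a small mapping. The helper probability_layer_names is hypothetical and only mirrors the rule stated above for the case where all classes are exported.

```python
def probability_layer_names(classes):
    """Map position-based layer names to the class labels they represent.

    Hypothetical helper; classes is the ordered list of class labels.
    """
    # Names count from 1 by position, regardless of the label values.
    return {f"prob_{i}": label for i, label in enumerate(classes, start=1)}


names = probability_layer_names([1, 3, 5])
print(names)
```

Keeping such a mapping alongside the output makes it easier to tell which class each prob_n layer refers to.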

read(masked=False, window=None, out_shape=None, resampling='nearest', as_df=False, **kwargs)

Reads data from the Raster object into a numpy array.

Parameters
  • masked (bool (default False)) – Read data into a masked array.

  • window (rasterio.windows.Window object (optional, default None)) – Window defined by col_off, row_off, width and height, used to read a chunk of data into an ndarray.

  • out_shape (tuple (optional, default None)) – Shape of the array (rows, cols) to read data into using decimated reads.

  • resampling (str (default 'nearest')) – Resampling method to use when applying decimated reads when out_shape is specified. Supported methods are: ‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘q1’, ‘q3’.

  • as_df (bool (default False)) – Whether to return the data as a pandas.DataFrame with columns named by the RasterLayer names.

  • **kwargs (dict) – Other arguments to pass to rasterio.DatasetReader.read method

Returns

Raster values in 3d ndarray with the dimensions in order of (band, row, and column).

Return type

ndarray
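
To make the window parameters concrete, the sketch below selects a chunk from a single band stored as a list of rows. The helper read_window is hypothetical and uses nested lists in place of a real raster; it only shows which pixels col_off, row_off, width and height select.

```python
def read_window(band, col_off, row_off, width, height):
    """Return the rows and columns of band covered by the window.

    Hypothetical helper; band is a list of rows (row-major order).
    """
    rows = band[row_off:row_off + height]
    return [row[col_off:col_off + width] for row in rows]


band = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
chunk = read_window(band, col_off=1, row_off=2, width=2, height=2)
print(chunk)
```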

rename(names, in_place=False)

Rename a RasterLayer within the Raster object.

Parameters
  • names (dict) – dict of old_name : new_name

  • in_place (bool (default False)) – Whether to change names of the Raster object in-place or leave original and return a new Raster object.

Returns

Returned only if in_place is False

Return type

pyspatialml.Raster

sample(size, strata=None, return_array=False, random_state=None)

Generates a random sample of pixel locations according to size, and samples the pixel values at those locations.

Parameters
  • size (int) – Number of random samples, or number of samples per stratum if strata is supplied.

  • strata (rasterio DatasetReader (opt)) – Whether to use stratified instead of random sampling. Strata can be supplied using an open rasterio DatasetReader object.

  • return_array (bool (opt), default=False) – Optionally return extracted data as separate X, y and xy masked numpy arrays.

  • random_state (int (opt)) – Integer to use within random.seed.

Returns

A tuple containing two elements:

  • numpy.ndarray

    Numpy array of extracted raster values, typically 2d.

  • numpy.ndarray

    2D numpy array of xy coordinates of extracted values.

Return type

tuple
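
The role of random_state can be shown with a seeded generator that draws pixel positions. The helper random_pixel_sample is hypothetical; sampling without replacement and returning (row, col) pairs are assumptions made for the sketch, not documented behaviour of sample.

```python
import random


def random_pixel_sample(rows, cols, size, random_state=None):
    """Draw size distinct (row, col) positions from a rows x cols grid.

    Hypothetical helper; a fixed random_state makes the draw repeatable.
    """
    rng = random.Random(random_state)
    # Assumption: sample flat pixel indices without replacement.
    flat_indices = rng.sample(range(rows * cols), size)
    # divmod converts a flat index into a (row, col) pair.
    return [divmod(i, cols) for i in flat_indices]


sample_a = random_pixel_sample(100, 200, size=5, random_state=42)
sample_b = random_pixel_sample(100, 200, size=5, random_state=42)
print(sample_a == sample_b)
```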

to_crs(crs, resampling='nearest', file_path=None, in_memory=False, driver='GTiff', nodata=None, n_jobs=1, warp_mem_lim=0, progress=False, **kwargs)

Reprojects a Raster object to a different crs.

Parameters
  • crs (rasterio.crs.CRS object, or dict) – Example: CRS({'init': 'EPSG:4326'})

  • resampling (str (default 'nearest')) – Resampling method to use. One of the following: nearest, bilinear, cubic, cubic_spline, lanczos, average, mode, max (GDAL >= 2.2), min (GDAL >= 2.2), med (GDAL >= 2.2), q1 (GDAL >= 2.2), q3 (GDAL >= 2.2)

  • file_path (str (optional, default None)) – Optional path to save reprojected Raster object. If not specified then a tempfile is used.

  • in_memory (bool, default is False) – Whether to initiate the Raster from an array and store the data in-memory using Rasterio’s in-memory files.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then the existing nodata value of the Raster object is used, which can accommodate the dtypes of the individual layers in the Raster.

  • n_jobs (int (default 1)) – The number of warp worker threads.

  • warp_mem_lim (int (default 0)) – The warp operation memory limit in MB. Larger values allow the warp operation to be carried out in fewer chunks. The amount of memory required to warp a 3-band uint8 2000 row x 2000 col raster to a destination of the same size is approximately 56 MB. The default (0) means 64 MB with GDAL 2.2.

  • progress (bool (default False)) – Optionally show progress of transform operations.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster following reprojection.

Return type

Raster

to_pandas(max_pixels=None, resampling='nearest')

Raster to pandas DataFrame.

Parameters
  • max_pixels (int (default None)) – Maximum number of pixels to sample. By default all pixels are used.

  • resampling (str (default 'nearest')) – Resampling method to use when applying decimated reads when out_shape is specified. Supported methods are: ‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘q1’, ‘q3’.

Returns

DataFrame with one column per RasterLayer, named after the layer, and one row per pixel.

Return type

pandas.DataFrame

write(file_path, driver='GTiff', dtype=None, nodata=None, **kwargs)

Write the Raster object to a file.

Overrides the write RasterBase class method, which is a partial function of the rasterio.DatasetReader.write method.

Parameters
  • file_path (str) – File path used to save the Raster object.

  • driver (str (default is 'GTiff')) – Name of GDAL driver used to save Raster data.

  • dtype (str (opt, default None)) – Optionally specify a numpy compatible data type when saving to file. If not specified, a data type is selected based on the data types of RasterLayers in the Raster object.

  • nodata (any number (opt, default None)) – Optionally assign a new nodata value when saving to file. If not specified a nodata value based on the minimum permissible value for the data types of RasterLayers in the Raster object is used. Note that this does not change the pixel nodata values of the raster, it only changes the metadata of what value represents a nodata pixel.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

New Raster object from saved file.

Return type

Raster
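
Several methods derive a default nodata value from "the minimum permissible value" of a data type. The sketch below lists what those minimums are for a few common dtypes. The helper default_nodata and the selection of dtypes are hypothetical, and whether the library uses exactly these values is an assumption.

```python
import sys

# Smallest representable value for some common raster data types.
DTYPE_MINIMUMS = {
    "uint8": 0,
    "uint16": 0,
    "int16": -(2 ** 15),
    "int32": -(2 ** 31),
    # Largest finite float32 magnitude, negated.
    "float32": -(2 - 2 ** -23) * 2.0 ** 127,
    "float64": -sys.float_info.max,
}


def default_nodata(dtype):
    """Return the minimum permissible value for dtype (hypothetical helper)."""
    try:
        return DTYPE_MINIMUMS[dtype]
    except KeyError:
        raise ValueError(f"unsupported dtype: {dtype}") from None


print(default_nodata("int16"), default_nodata("float32"))
```

Note that for unsigned types the minimum is 0, which is often a valid data value, so an explicit nodata argument may be preferable in that case.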

pyspatialml.rasterlayer module

class pyspatialml.rasterlayer.RasterLayer(band)

Bases: object

Represents a single raster band derived from a single or multi-band raster dataset

Simple wrapper around a rasterio.Band object with additional methods. Used because the Rasterio.Band.ds.read method reads all bands from a multi-band dataset, whereas the RasterLayer read method only reads a single band.

Methods encapsulated in RasterLayer objects represent those that typically would only be applied to a single-band of a raster, i.e. sieve-clump, distance to non-NaN pixels, or arithmetic operations on individual layers.

bidx

The band index of the RasterLayer within the file dataset.

Type

int

dtype

The data type of the RasterLayer.

Type

str

ds

The underlying rasterio.band object.

Type

rasterio.band

name

A syntactically valid name for the RasterLayer.

Type

str

file

The file path to the dataset.

Type

str

nodata

The number that is used to represent nodata pixels in the RasterLayer.

Type

any number

driver

The name of the GDAL format driver.

Type

str

meta

A python dict storing the RasterLayer metadata.

Type

dict

transform

The affine transform parameters.

Type

affine.Affine object

count

Number of layers; always equal to 1.

Type

int

shape

Shape of RasterLayer in (rows, columns)

Type

tuple

width, height

The width (cols) and height (rows) of the dataset.

Type

int

bounds

A named tuple with left, bottom, right and top coordinates of the dataset.

Type

BoundingBox named tuple

cmap

The name of a matplotlib colormap, or a custom matplotlib.colors.LinearSegmentedColormap or ListedColormap object.

Type

str

norm

A matplotlib.colors.Normalize to apply to the RasterLayer. This overrides the norm attribute of the RasterLayer.

Type

matplotlib.colors.Normalize (opt)

close()

max(max_pixels=10000)

Maximum value.

Parameters

max_pixels (int) – Number of pixels used to inform statistical estimate.

Returns

The maximum value of the object’s pixels.

Return type

numpy.float32

mean(max_pixels=10000)

Mean value

Parameters

max_pixels (int) – Number of pixels used to inform statistical estimate.

Returns

The mean value of the object’s pixels.

Return type

numpy.float32

median(max_pixels=10000)

Median value

Parameters

max_pixels (int) – Number of pixels used to inform statistical estimate.

Returns

The median value of the object’s pixels.

Return type

numpy.float32

min(max_pixels=10000)

Minimum value.

Parameters

max_pixels (int) – Number of pixels used to inform statistical estimate.

Returns

The minimum value of the object’s pixels.

Return type

numpy.float32
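
The max_pixels argument of max, mean, median and min indicates that each statistic is estimated from a limited number of pixels rather than from the full layer. A minimal NumPy sketch of that idea follows. It assumes a regular stride-based subsample and NaN as the nodata marker; the helper name estimate_stat is hypothetical and is not part of pyspatialml, which may sample pixels differently.

```python
import numpy as np


def estimate_stat(arr, func=np.nanmean, max_pixels=10000):
    """Estimate a statistic from a regular subsample of a 2D array.

    Hypothetical helper for illustration; not part of pyspatialml.
    """
    rows, cols = arr.shape
    # Stride chosen so the subsample holds roughly max_pixels values or fewer.
    step = max(1, int(np.ceil(np.sqrt(rows * cols / max_pixels))))
    sample = arr[::step, ::step]
    return func(sample)


# Synthetic 1000 x 1000 layer with a block of nodata (NaN) pixels.
rng = np.random.default_rng(0)
layer = rng.normal(loc=50.0, scale=5.0, size=(1000, 1000)).astype("float32")
layer[:10, :10] = np.nan

approx_mean = estimate_stat(layer, np.nanmean)
approx_max = estimate_stat(layer, np.nanmax)
```

Because only a subsample is read, the result is an estimate: the approximate maximum can be lower than the true maximum of the layer.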

plot(cmap=None, norm=None, ax=None, cax=None, figsize=None, out_shape=(100, 100), categorical=None, legend=False, vmin=None, vmax=None, fig_kwds=None, legend_kwds=None)

Plot a RasterLayer using matplotlib.pyplot.imshow.

Parameters
  • cmap (str (opt)) – The name of a colormap recognized by matplotlib. Overrides the cmap attribute of the RasterLayer.

  • norm (matplotlib.colors.Normalize (opt)) – A matplotlib.colors.Normalize to apply to the RasterLayer. This overrides the norm attribute of the RasterLayer.

  • ax (matplotlib.pyplot.Artist (optional, default None)) – Axes instance on which to draw the plot.

  • cax (matplotlib.pyplot.Artist (optional, default None)) – axes on which to draw the legend.

  • figsize (tuple of integers (optional, default None)) – Size of the matplotlib.figure.Figure. If the ax argument is given explicitly, figsize is ignored.

  • out_shape (tuple, default=(100, 100)) – Number of rows, cols to read from the raster datasets for plotting.

  • categorical (bool (optional, default False)) – If True, the raster values are treated as discrete values; otherwise they are treated as continuous values. This overrides the RasterLayer ‘categorical’ attribute. Setting the argument categorical to True is ignored if the RasterLayer.categorical is already True.

  • legend (bool (optional, default False)) – Whether to plot the legend.

  • vmin (scalar (optional, default None)) – vmin and vmax define the data range that the colormap covers. By default, the colormap covers the complete value range of the supplied data. vmin, vmax are ignored if the norm parameter is used.

  • vmax (scalar (optional, default None)) – vmin and vmax define the data range that the colormap covers. By default, the colormap covers the complete value range of the supplied data. vmin, vmax are ignored if the norm parameter is used.

  • fig_kwds (dict (optional, default None)) – Additional arguments to pass to the matplotlib.pyplot.figure call when creating the figure object. Ignored if ax is passed to the plot function.

  • legend_kwds (dict (optional, default None)) – Keyword arguments to pass to matplotlib.pyplot.colorbar().

Returns

ax

Return type

matplotlib axes instance

read(**kwargs)

Read method for a single RasterLayer.

Reads the pixel values from a RasterLayer into a ndarray that always will have two dimensions in the order of (rows, columns).

Parameters
  • **kwargs (opt) – Named arguments that can be passed to the rasterio.DatasetReader.read method.

write(file_path, driver='GTiff', dtype=None, nodata=None, **kwargs)

Write method for a single RasterLayer.

Parameters
  • file_path (str (opt)) – File path to save the dataset.

  • driver (str) – GDAL-compatible driver used for the file format.

  • dtype (str (opt)) – Numpy dtype used for the file. If omitted then the RasterLayer’s dtype is used.

  • nodata (any number (opt)) – A value used to represent the nodata pixels. If omitted then the RasterLayer’s nodata value is used (if assigned already).

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Return type

pyspatialml.RasterLayer

pyspatialml.transformers module

class pyspatialml.transformers.GeoDistTransformer(refs, log=False)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Transformer to add new features based on geographical distances to reference locations.

Parameters
  • refs (ndarray) –

    Array of coordinates of reference locations in (m, n-dimensional) order, such as {n_locations, x_coordinates, y_coordinates, …} for as many dimensions as required. For example to calculate distances to a single x,y,z location:

    refs = [-57.345, -110.134, 1012]

    And to calculate distances to three x,y reference locations:

    refs = [
        [-57.345, -110.134],
        [-56.345, -109.123],
        [-58.534, -112.123]
    ]

    The supplied array has to have at least x,y coordinates with a (1, 2) shape for a single location.

  • log (bool (opt), default=False) – Optionally log-transform the distance measures.

Returns

X_new – Array of shape (n_samples, n_features) with new geodistance features appended to the right-most columns of the array.

Return type

ndarray
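
The distance features described above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the library's implementation: it assumes every column of X is a coordinate, uses Euclidean distance, and uses log1p for the optional log transform. The helper name add_geodist_features is hypothetical.

```python
import numpy as np


def add_geodist_features(X, refs, log=False):
    """Append one distance column per reference location to X.

    Hypothetical helper: assumes every column of X is a coordinate.
    """
    X = np.asarray(X, dtype=float)
    refs = np.atleast_2d(np.asarray(refs, dtype=float))
    # Pairwise differences with shape (n_samples, n_refs, n_dims).
    diff = X[:, np.newaxis, :] - refs[np.newaxis, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    if log:
        # log1p keeps a zero distance finite (an assumption of this sketch).
        dists = np.log1p(dists)
    # New features go in the right-most columns.
    return np.hstack([X, dists])


X = np.array([[0.0, 0.0], [3.0, 4.0]])
refs = [[0.0, 0.0], [3.0, 0.0]]
X_new = add_geodist_features(X, refs)
```

With two samples and two reference locations, X_new has four columns: the two original coordinates followed by one distance column per reference location.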

fit(X, y=None)
transform(X, y=None)
class pyspatialml.transformers.KNNTransformer(n_neighbors=7, weights='distance', measure='mean', radius=1.0, algorithm='auto', leaf_size=30, metric='minkowski', p=2, normalize=True, metric_params=None, kernel_params=None, n_jobs=1)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Transformer to generate new lag features by weighted aggregation of K-neighboring observations.

A lag transformer uses a weighted mean/mode of the values of the K-neighboring observations to generate new lagged features. The weighted mean/mode of the surrounding observations are appended as a new feature to the right-most column in the training data.

The K-neighboring observations are determined using the distance metric specified in the metric argument. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric.

Parameters
  • n_neighbors (int, default = 7) – Number of neighbors to use by default for kneighbors queries.

  • weights ({‘uniform’, ‘distance’} or callable, default=‘distance’) –

    Weight function used in prediction. Possible values:

    • ‘uniform’: uniform weights. All points in each neighborhood are weighted equally.

    • ‘distance’: weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

    • [callable]: a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

  • measure ({'mean', 'mode'}) – Function that is used to apply the weights to y. Use ‘mean’ if the target variable is continuous and ‘mode’ if the target variable is discrete.

  • radius (float, default=1.0) – Range of parameter space to use by default for radius_neighbors queries.

  • algorithm ({‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=‘auto’) –

    Algorithm used to compute the nearest neighbors:

    • ‘ball_tree’ will use BallTree

    • ‘kd_tree’ will use KDTree

    • ‘brute’ will use a brute-force search.

    • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit method.

    Note: fitting on sparse input will override the setting of this parameter, using brute force.

  • leaf_size (int, default=30) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

  • metric (str or callable, default=’minkowski’) – The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of DistanceMetric for a list of available metrics. If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.

  • p (int, default=2) – Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

  • normalize (bool, default=True) – Whether to standardize the inputs using sklearn.preprocessing.Normalizer

  • metric_params (dict, default=None) – Additional keyword arguments for the metric function.

  • kernel_params (dict, default=None) – Additional keyword arguments to pass to a custom kernel function.

  • n_jobs (int, default=None) – The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
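
To make the role of the p parameter concrete, the Minkowski distance can be written in a few lines of plain Python. The function below is a hypothetical illustration of the formula only and is not the routine that the transformer calls.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski_distance(a, b, p=1)  # l1 distance
euclidean = minkowski_distance(a, b, p=2)  # l2 distance
```

For these two points the l1 distance is the sum of the coordinate differences (3 + 4), while the l2 distance is the straight-line distance between them.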

fit(X, y=None)

Fit the base_estimator with features from X {n_samples, n_features} and with an additional spatially lagged variable added to the right-most column of the training data.

During fitting, the k-neighbors to each training point are used to estimate the spatial lag component. The training point is not included in the calculation, i.e. the training point is not considered its own neighbor.

Parameters
  • X (array-like of shape (n_samples, n_features)) – The training input samples used for model fitting.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).
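
The fit-time behaviour described above, where a training point is never its own neighbor, can be sketched as follows. This is a simplified illustration that assumes a brute-force Euclidean search over the columns of X and inverse-distance weights; the helper name lagged_mean_feature is hypothetical and the library's own implementation may differ.

```python
import numpy as np


def lagged_mean_feature(X, y, n_neighbors=2):
    """Inverse-distance weighted mean of each point's nearest neighbors.

    Hypothetical helper using a brute-force search; the point itself is
    excluded from its own neighborhood.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Full pairwise Euclidean distance matrix, shape (n_samples, n_samples).
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # An infinite self-distance stops a point being its own neighbor.
    np.fill_diagonal(dist, np.inf)
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    neigh_dist = np.take_along_axis(dist, idx, axis=1)
    # Inverse-distance weights; this sketch assumes no duplicate rows in X.
    weights = 1.0 / neigh_dist
    return (weights * y[idx]).sum(axis=1) / weights.sum(axis=1)


X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
y = np.array([10.0, 20.0, 40.0])
lag = lagged_mean_feature(X, y, n_neighbors=2)
# The lag feature is appended as the right-most column.
X_aug = np.column_stack([X, lag])
```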

transform(X, y=None)

Transform method for spatial lag models.

Augments new observations with a spatial lag variable created from a weighted mean/mode (regression/classification) of k-neighboring observations.

Parameters
  • X (array-like of shape (n_samples, n_features)) – New samples for the prediction.

  • y (None) – Not used.
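
For a discrete target, the weighted mode mentioned above can be thought of as a weighted vote among the nearest training observations. The sketch below assumes a brute-force Euclidean search and inverse-distance weights; the helper name lagged_mode_feature is hypothetical and is not part of pyspatialml.

```python
import numpy as np


def lagged_mode_feature(X_train, y_train, X_new, n_neighbors=3):
    """Distance-weighted mode of the training labels nearest each new sample.

    Hypothetical helper for illustration only.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    y_train = np.asarray(y_train)
    # Distances from every new sample to every training sample.
    diff = X_new[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    neigh_dist = np.take_along_axis(dist, idx, axis=1)
    # Clip tiny distances so that exact matches do not divide by zero.
    weights = 1.0 / np.maximum(neigh_dist, 1e-12)
    classes = np.unique(y_train)
    # Total weight voting for each class, shape (n_new, n_classes).
    votes = np.stack(
        [(weights * (y_train[idx] == c)).sum(axis=1) for c in classes], axis=1
    )
    return classes[votes.argmax(axis=1)]


X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0, 0, 1, 1])
X_new = np.array([[0.5], [9.0]])
lag_mode = lagged_mode_feature(X_train, y_train, X_new, n_neighbors=3)
```

Unlike the fit-time case, new observations are not part of the training set, so no self-exclusion step is needed here.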

pyspatialml.vector module

pyspatialml.vector.filter_points(gdf, min_dist=0, remove='first')

Filter points in geodataframe using a minimum distance buffer.

Parameters
  • gdf (Geopandas GeoDataFrame) – Containing point geometries.

  • min_dist (int or float, optional (default=0)) – Minimum distance by which to filter out closely spaced points.

  • remove (str, optional (default='first')) – Optionally choose to remove ‘first’ occurrences or ‘last’ occurrences.

Returns

xy – Numpy array of filtered coordinates.

Return type

2d array-like
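
Minimum-distance filtering can be illustrated with a greedy pass over an array of coordinates. This sketch is an assumption about the general technique rather than a description of filter_points itself: it always keeps the earlier point of any closely spaced pair and does not model the remove option. The helper name filter_by_min_dist is hypothetical.

```python
import numpy as np


def filter_by_min_dist(xy, min_dist=0.0):
    """Greedily keep points that are at least min_dist from all kept points.

    Hypothetical helper; earlier points take priority over later ones.
    """
    xy = np.asarray(xy, dtype=float)
    kept = []
    for point in xy:
        if all(np.linalg.norm(point - other) >= min_dist for other in kept):
            kept.append(point)
    return np.array(kept)


xy = np.array([[0.0, 0.0], [0.5, 0.0], [2.0, 0.0], [2.2, 0.0]])
filtered = filter_by_min_dist(xy, min_dist=1.0)
```

With min_dist=0 every point is kept, which matches the idea that the default applies no filtering.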

pyspatialml.vector.get_random_point_in_polygon(poly)

Generates random shapely Point geometry objects within a single shapely Polygon object.

Parameters

poly (Shapely Polygon object) –

Returns

p

Return type

Shapely Point object
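
A common way to draw a random point inside a polygon is rejection sampling: draw uniformly from the polygon's bounding box and retry until the point falls inside. Whether get_random_point_in_polygon works this way is an assumption. The sketch below uses only the standard library and represents the polygon as a list of (x, y) vertices instead of a shapely Polygon; both function names are hypothetical.

```python
import random


def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_point_in_polygon(vertices, rng=random):
    """Rejection sampling: draw from the bounding box until a point is inside."""
    xs, ys = zip(*vertices)
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    while True:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, vertices):
            return x, y


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
px, py = random_point_in_polygon(triangle, random.Random(42))
```

Rejection sampling is simple and unbiased, but it becomes slow for polygons that cover only a small fraction of their bounding box.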