pyspatialml: machine learning for raster datasets

Submodules

pyspatialml.estimators module

class pyspatialml.estimators.SpatialLagBase(base_estimator, n_neighbors=7, weights='distance', radius=1.0, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, kernel_params=None, feature_indices=None, n_jobs=1)

Bases: abc.ABC, sklearn.base.BaseEstimator

Base class for spatial lag estimators.

A spatial lag estimator uses a weighted mean/mode of the values of the K-neighboring observations to augment the base_estimator. The weighted mean/mode of the surrounding observations is appended as a new feature in the right-most column of the training data.

The K-neighboring observations are determined using the distance metric specified in the metric argument. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric.
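The augmentation described above can be sketched with numpy alone. This is a minimal brute-force illustration, not the library's implementation: `spatial_lag` is a hypothetical helper that uses the Euclidean distance (minkowski with p=2) and 'distance' (inverse-distance) weighting. It excludes each training point from its own neighborhood and appends the lag as the right-most column. Zero distances are clamped to a small value, which differs from how scikit-learn treats exact duplicates.

```python
import numpy as np


def spatial_lag(X, y, n_neighbors=7, feature_indices=None):
    """Append an inverse-distance weighted mean of the k nearest neighbors' targets.

    Hypothetical sketch: brute-force Euclidean distances, each training point
    excluded from its own neighborhood, lag appended as the last column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Optionally restrict the distance calculation to selected columns (e.g. x, y)
    coords = X if feature_indices is None else X[:, feature_indices]
    # Pairwise Euclidean distance matrix (minkowski with p=2)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A training point is not considered its own neighbor
    np.fill_diagonal(dist, np.inf)
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    d = np.take_along_axis(dist, idx, axis=1)
    # 'distance' weighting: inverse distance, clamped to avoid division by zero
    w = 1.0 / np.maximum(d, 1e-12)
    lag = (w * y[idx]).sum(axis=1) / w.sum(axis=1)
    # The lag becomes the right-most feature
    return np.column_stack([X, lag])
```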

Parameters
  • base_estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.

  • n_neighbors (int, default = 7) – Number of neighbors to use by default for kneighbors queries.

  • weights ({‘uniform’, ‘distance’} or callable, default=’distance’) –

    Weight function used in prediction. Possible values:

    • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.

    • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

    • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

  • radius (float, default=1.0) – Range of parameter space to use by default for radius_neighbors queries.

  • algorithm ({‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’) –

    Algorithm used to compute the nearest neighbors:

    • ‘ball_tree’ will use BallTree

    • ‘kd_tree’ will use KDTree

    • ‘brute’ will use a brute-force search.

    • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit method.

    • Note: fitting on sparse input will override the setting of this parameter, using brute force.

  • leaf_size (int, default=30) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

  • metric (str or callable, default=’minkowski’) – The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of DistanceMetric for a list of available metrics. If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.

  • p (int, default=2) – Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

  • metric_params (dict, default=None) – Additional keyword arguments for the metric function.

  • kernel_params (dict, default=None) – Additional keyword arguments to pass to a custom kernel function.

  • feature_indices (list, default=None) – By default, the nearest neighbors are determined from the distance metric calculated using all of the features. If feature_indices are supplied, then the distance calculation is restricted to the specified column indices. For spatial data, these might represent the x, y coordinates, for example.

  • n_jobs (int, default=1) – The number of parallel jobs to run for the neighbors search. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors. See the scikit-learn Glossary for more details.
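The p and weights parameters can be illustrated with two small functions. `minkowski` shows how p=1 and p=2 reduce to the Manhattan and Euclidean distances. `gaussian_weights` is a hypothetical example of a callable for weights: it accepts an array of distances and returns weights of the same shape. The Gaussian kernel and its bandwidth argument are illustrative choices, not part of pyspatialml.

```python
import numpy as np


def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan (l1), p=2 is Euclidean (l2)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float((np.abs(a - b) ** p).sum() ** (1.0 / p))


def gaussian_weights(distances, bandwidth=1.0):
    """Hypothetical callable for `weights`: same-shaped array of weights.

    Weights decay smoothly with distance; a distance of 0 gets weight 1.
    """
    distances = np.asarray(distances, dtype=float)
    return np.exp(-0.5 * (distances / bandwidth) ** 2)
```

Under these assumptions, passing weights=gaussian_weights would make the lag decay smoothly with distance instead of as 1/d.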

fit(X, y)

Fit the base_estimator with features from X {n_samples, n_features} and with an additional spatially lagged variable added to the right-most column of the training data.

During fitting, the k-neighbors to each training point are used to estimate the spatial lag component. The training point is not included in the calculation, i.e. the training point is not considered its own neighbor.

Parameters
  • X (array-like of shape (n_samples, n_features)) – The training input samples.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).

predict(X, y=None)

Predict method for spatial lag models.

Augments new observations with a spatial lag variable created from a weighted mean/mode (regression/classification) of k-neighboring observations.

Parameters
  • X (array-like of shape (n_samples, n_features)) – New samples for the prediction.

  • y (None) – Not used.
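For classification, the weighted mode replaces the weighted mean. A minimal sketch, assuming the neighbor labels and their weights are already available (`weighted_mode` is a hypothetical helper, not a pyspatialml function): the weights of each label are summed and the label with the largest total wins.

```python
from collections import defaultdict


def weighted_mode(labels, weights):
    """Return the label with the largest summed weight among the neighbors."""
    totals = defaultdict(float)
    for label, w in zip(labels, weights):
        totals[label] += w
    # Ties resolve to the label encountered first (dict insertion order)
    return max(totals, key=totals.get)
```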

class pyspatialml.estimators.SpatialLagClassifier(base_estimator, n_neighbors=7, weights='distance', radius=1.0, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, kernel_params=None, feature_indices=None, n_jobs=1)

Bases: sklearn.base.ClassifierMixin, pyspatialml.estimators.SpatialLagBase

class pyspatialml.estimators.SpatialLagRegressor(base_estimator, n_neighbors=7, weights='distance', radius=1.0, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, kernel_params=None, feature_indices=None, n_jobs=1)

Bases: sklearn.base.RegressorMixin, pyspatialml.estimators.SpatialLagBase

class pyspatialml.estimators.ThresholdClassifierCV(estimator, thresholds=array([0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89]), scoring=None, refit=False, cv=3, random_state=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

Metaclassifier to perform cutoff threshold optimization.

This implementation is restricted to binary classification problems.

Notes

  • The training data are partitioned into k folds.

  • In turn, the metaclassifier trains the base estimator on k-1 of the partitions.

  • The held-out kth partition is used to find the threshold that maximizes the scoring metric; the optimal cutoff is the mean of these thresholds across the folds.

  • The optimal cutoff is applied to all classifier predictions.
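The cutoff selection in these notes can be sketched as follows. This is an illustration under stated assumptions, not the class's code: the hypothetical `optimal_cutoff` receives the held-out labels and positive-class probabilities for each partition, picks the threshold that maximizes the score in each, and averages the picks. F1 is used only as an example of a metric to maximize; the class itself takes a scoring argument.

```python
import numpy as np


def f1_score(y_true, y_pred):
    """Binary F1 score (example metric; bigger is better)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def optimal_cutoff(folds, thresholds, score=f1_score):
    """Mean of the per-fold thresholds that maximize `score`.

    folds: iterable of (y_true, positive_class_probability) pairs,
    one pair per held-out partition.
    """
    best = []
    for y_true, prob in folds:
        y_true = np.asarray(y_true)
        prob = np.asarray(prob)
        scores = [score(y_true, (prob >= t).astype(int)) for t in thresholds]
        # argmax returns the first (lowest) threshold when scores tie
        best.append(thresholds[int(np.argmax(scores))])
    return float(np.mean(best))
```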

fit(X, y=None, groups=None, **fit_params)

Run fit method with all sets of parameters

Parameters
  • X (array-like, shape = [n_samples, n_features]) – Training vector, where n_samples is the number of samples and n_features is the number of features

  • y (array-like, shape = [n_samples] or [n_samples, n_output], optional) – Target relative to X for classification or regression; None for unsupervised learning

  • groups (array-like, shape = [n_samples], optional) – Training vector groups for cross-validation

  • **fit_params (dict of string -> object) – Parameters passed to the fit method of the estimator

Notes

Rules

  • Parameters are checked during the fit method

  • New attributes created during fitting should end in _, e.g. fitted_

  • The fit method must return self for compatibility with scikit-learn

  • The response vector, i.e. y, should default to None

predict(X, y=None)
predict_proba(X)
score(X, y)

Overrides the classifier score method, which is required for compatibility with GridSearchCV. The scoring metric should be one that can be maximized (bigger is better).

pyspatialml.plotting module

class pyspatialml.plotting.RasterPlot

Bases: object

plot(cmap=None, norm=None, figsize=None, out_shape=(100, 100), title_fontsize=8, label_fontsize=6, legend_fontsize=6, names=None, fig_kwds=None, legend_kwds=None, subplots_kwds=None)

Plot a Raster object as a raster matrix

Parameters
  • cmap (str (opt), default=None) – Specify a single cmap to apply to all of the RasterLayers. This overrides the cmap attribute of each RasterLayer.

  • norm (matplotlib.colors.Normalize (opt), default=None) – A matplotlib.colors.Normalize to apply to all of the RasterLayers. This overrides the norm attribute of each RasterLayer.

  • figsize (tuple (opt), default=None) – Size of the resulting matplotlib.figure.Figure.

  • out_shape (tuple, default=(100, 100)) – Number of rows, cols to read from the raster datasets for plotting.

  • title_fontsize (any number, default=8) – Size in pts of titles.

  • label_fontsize (any number, default=6) – Size in pts of axis ticklabels.

  • legend_fontsize (any number, default=6) – Size in pts of legend ticklabels.

  • names (list (opt), default=None) – Optionally supply a list of names for each RasterLayer to override the default layer names for the titles.

  • fig_kwds (dict (opt), default=None) – Additional arguments to pass to the matplotlib.pyplot.figure call when creating the figure object.

  • legend_kwds (dict (opt), default=None) – Additional arguments to pass to the matplotlib.pyplot.colorbar call when creating the colorbar object.

  • subplots_kwds (dict (opt), default=None) – Additional arguments to pass to the matplotlib.pyplot.subplots_adjust function. These are used to control the spacing and position of each subplot, and can include {left=None, bottom=None, right=None, top=None, wspace=None, hspace=None}.

Returns

axs – array of matplotlib.axes._subplots.AxesSubplot or a single matplotlib.axes._subplots.AxesSubplot if Raster object contains only a single layer.

Return type

numpy.ndarray

pyspatialml.plotting.discrete_cmap(N, base_cmap=None)

Create an N-bin discrete colormap from the specified input map.

Source: https://gist.github.com/jakevdp/91077b0cae40f8f8244a

Parameters
  • N (int) – The number of colors in the colormap

  • base_cmap (str) – The name of the matplotlib cmap to convert into a discrete map.

Returns

The cmap converted to a discrete map.

Return type

matplotlib.cmap

pyspatialml.plotting.rasterio_normalize(arr, axis=None)

Scales an array using min-max scaling.

Parameters
  • arr (ndarray) – A numpy array containing the image data.

  • axis (int (opt)) – The axis to perform the normalization along.

Returns

The normalized array

Return type

numpy.ndarray
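Min-max scaling maps each value x to (x - min) / (max - min). A numpy sketch, assuming NaN values should be ignored when finding the minimum and maximum and that constant slices should map to 0 (both choices are assumptions; `minmax_scale` is a hypothetical name):

```python
import numpy as np


def minmax_scale(arr, axis=None):
    """Scale values to [0, 1] along `axis` (all values if axis is None)."""
    arr = np.asarray(arr, dtype=float)
    # keepdims=True lets the min/max broadcast back against the input
    amin = np.nanmin(arr, axis=axis, keepdims=True)
    amax = np.nanmax(arr, axis=axis, keepdims=True)
    rng = amax - amin
    # Constant slices would divide by zero; map them to 0 instead
    rng = np.where(rng == 0, 1.0, rng)
    return (arr - amin) / rng
```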

pyspatialml.plotting.shiftedColorMap(cmap, start=0, midpoint=0.5, stop=1.0, name='shiftedcmap')

Function to offset the “center” of a colormap. Useful for data with a negative minimum and a positive maximum, where you want the middle of the colormap’s dynamic range to be at zero.

Source: http://stackoverflow.com/questions/7404116/defining-the-midpoint-of-a-colormap-in-matplotlib

Parameters
  • cmap (str) – The matplotlib colormap to be altered

  • start (any number) – Offset from lowest point in the colormap’s range. Defaults to 0.0 (no lower offset). Should be between 0.0 and midpoint.

  • midpoint (any number between 0.0 and 1.0) – The new center of the colormap. Defaults to 0.5 (no shift). In general, this should be 1 - vmax/(vmax + abs(vmin)). For example, if your data range from -15.0 to +5.0 and you want the center of the colormap at 0.0, midpoint should be set to 1 - 5/(5 + 15), or 0.75.

  • stop (any number between midpoint and 1.0) – Offset from the highest point in the colormap’s range. Defaults to 1.0 (no upper offset).

Returns

The colormap with its centre shifted to the midpoint value.

Return type

matplotlib.cmap
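The formula given for the midpoint parameter can be computed directly from the data range. A small sketch (the helper name `colormap_midpoint` is hypothetical) that reproduces the worked example from -15.0 to +5.0:

```python
def colormap_midpoint(vmin, vmax):
    """Return 1 - vmax / (vmax + abs(vmin)), the position of zero in [vmin, vmax]."""
    if vmin >= 0 or vmax <= 0:
        raise ValueError("data must span zero: need vmin < 0 < vmax")
    return 1 - vmax / (vmax + abs(vmin))
```

The result could then be passed as midpoint when calling shiftedColorMap.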

pyspatialml.raster module

class pyspatialml.raster.Raster(src=None, arr=None, crs=None, transform=None, nodata=None, mode='r', file_path=None)

Bases: pyspatialml.plotting.RasterPlot, pyspatialml.base.BaseRaster

Flexible class that represents a collection of file-based GDAL-supported raster datasets which share a common coordinate reference system and geometry.

Raster objects encapsulate RasterLayer objects, which represent single band raster datasets that can physically be represented by either separate single-band raster files, multi-band raster files, or any combination of individual bands from multi-band raster and single-band raster datasets.

Methods defined in a Raster class comprise those that would typically be applied to a stack of raster datasets. Except for methods that modify the Raster in place (such as append, drop and rename with in_place=True), these methods return a new Raster object.

aggregate(out_shape, resampling='nearest', file_path=None, driver='GTiff', dtype=None, nodata=None, **kwargs)

Aggregates a raster to (usually) a coarser grid cell size.

Parameters
  • out_shape (tuple) – New shape in (rows, cols).

  • resampling (str (default 'nearest')) – Resampling method to use when applying decimated reads when out_shape is specified. Supported methods are: ‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘q1’, ‘q3’.

  • file_path (str (optional, default None)) – File path to save the aggregated raster. If not supplied then the aggregated raster is saved to a temporary file.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the aggregated Raster is created using the dtype of the existing Raster dataset, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.

  • nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s dtype. Note that this does not change the pixel nodata values of the raster, it only changes the metadata of what value represents a nodata pixel.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster object aggregated to a new pixel size.

Return type

pyspatialml.Raster
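For intuition, aggregation with 'average' resampling amounts to averaging blocks of pixels when the new shape divides the old one evenly. The numpy sketch below (`block_average` is a hypothetical helper) covers only that special case for a single band; the method itself uses decimated reads with the resampling methods listed above.

```python
import numpy as np


def block_average(arr, out_shape):
    """Average non-overlapping pixel blocks of a 2D array down to out_shape."""
    rows, cols = arr.shape
    out_rows, out_cols = out_shape
    if rows % out_rows or cols % out_cols:
        raise ValueError("out_shape must evenly divide the input shape")
    fr, fc = rows // out_rows, cols // out_cols
    # Split each axis into (blocks, pixels per block) and average the pixels
    return arr.reshape(out_rows, fr, out_cols, fc).mean(axis=(1, 3))
```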

append(other, in_place=True)

Method to add new RasterLayers to a Raster object.

Note that this modifies the Raster object in-place by default.

Parameters
  • other (Raster object, or list of Raster objects) – Object to append to the Raster.

  • in_place (bool (default True)) – Whether to change the Raster object in-place or leave original and return a new Raster object.

Returns

Returned only if in_place is False

Return type

pyspatialml.Raster

apply(function, file_path=None, driver='GTiff', dtype=None, nodata=None, progress=False, n_jobs=-1, **kwargs)

Apply user-supplied function to a Raster object.

Parameters
  • function (function) – Function that takes a numpy array as a single argument.

  • file_path (str (optional, default None)) – Optional path to save calculated Raster object. If not specified then a tempfile is used.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the new Raster is created using the dtype of the calculation result.

  • nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this changes the values of the pixels that represent nodata pixels.

  • n_jobs (int (default -1)) – Number of processing cores to use for parallel execution. Default of -1 is all cores.

  • progress (bool (default False)) – Optionally show progress of transform operations.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster containing the calculated result.

Return type

pyspatialml.Raster
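A function passed to apply receives raster data as a numpy array. The example below assumes that array is shaped (bands, rows, cols), with red as the first band and near-infrared as the second, and returns a single-layer (1, rows, cols) array. The band order, the array shapes and the name `normalized_difference` are all assumptions for illustration.

```python
import numpy as np


def normalized_difference(arr):
    """Hypothetical user function: (nir - red) / (nir + red) as one layer.

    Assumes arr is shaped (bands, rows, cols) with red first and NIR second.
    """
    red = arr[0].astype("float32")
    nir = arr[1].astype("float32")
    denom = nir + red
    # Suppress warnings where both bands are 0, then replace those pixels with 0
    with np.errstate(divide="ignore", invalid="ignore"):
        out = np.where(denom == 0, 0.0, (nir - red) / denom)
    # Add a leading band axis so the result is (1, rows, cols)
    return out[np.newaxis, ...].astype("float32")
```

With a hypothetical two-band Raster named stack, the call would look like stack.apply(normalized_difference).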

astype(dtype, file_path=None, driver='GTiff', nodata=None, **kwargs)

Coerce Raster to a different dtype.

Parameters
  • dtype (str or np.dtype) – Data type to coerce the Raster object to.

  • file_path (str (optional, default None)) – Optional path to save calculated Raster object. If not specified then a tempfile is used.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this changes the values of the pixels that represent nodata pixels.

Returns

Raster object coerced to the new dtype.

Return type

pyspatialml.Raster
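Several methods fall back to the minimum permissible value of the data type when nodata is not given. A numpy sketch of that rule, where `default_nodata` is a hypothetical helper and using np.finfo(...).min for floating-point types is an assumption. Note that for unsigned integer types this yields 0, which may collide with valid data.

```python
import numpy as np


def default_nodata(dtype):
    """Minimum permissible value of `dtype` (sketch of the documented default)."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        return int(np.iinfo(dtype).min)
    if np.issubdtype(dtype, np.floating):
        return float(np.finfo(dtype).min)
    raise TypeError(f"unsupported dtype: {dtype}")
```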

property block_shape

Return the window size used for raster calculations, specified as a tuple (rows, columns).

Returns

Block window shape that is currently set for the Raster as a tuple in the format of (n_rows, n_columns) in pixels.

Return type

tuple

block_shapes(rows, cols, min_rows=5, min_cols=5, overlap=0)

Generator of windows for optimal reading and writing based on the raster format. Windows are returned as a tuple of (xoff, yoff, width, height).

Parameters
  • rows (int) – Height of window in rows.

  • cols (int) – Width of window in columns.
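The tiling idea behind block_shapes can be sketched as a plain generator. The hypothetical `iter_windows` helper yields (xoff, yoff, width, height) tuples that cover the raster, trimming the last row and column of windows at the edges; it does not model the min_rows, min_cols or overlap arguments.

```python
def iter_windows(height, width, rows, cols):
    """Yield (xoff, yoff, w, h) windows tiling a height x width raster."""
    for yoff in range(0, height, rows):
        # Windows on the bottom edge may be shorter than `rows`
        h = min(rows, height - yoff)
        for xoff in range(0, width, cols):
            # Windows on the right edge may be narrower than `cols`
            w = min(cols, width - xoff)
            yield (xoff, yoff, w, h)
```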

close()

Close all of the RasterLayer objects in the Raster.

Note that this will cause any rasters based on temporary files to be removed. This is intended as a method of clearing temporary files that may have accumulated during an analysis session.

crop(bounds, file_path=None, driver='GTiff', dtype=None, nodata=None, **kwargs)

Crops a Raster object by the supplied bounds.

Parameters
  • bounds (tuple) – A tuple containing the bounding box to clip by in the form of (xmin, ymin, xmax, ymax).

  • file_path (str (optional, default None)) – File path to save the cropped raster. If not supplied then the cropped raster is saved to a temporary file.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the cropped Raster is created using the dtype of the existing Raster dataset, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.

  • nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this does not change the pixel nodata values of the raster, it only changes the metadata of what value represents a nodata pixel.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster cropped to new extent.

Return type

pyspatialml.Raster
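Cropping requires translating the (xmin, ymin, xmax, ymax) bounds into a pixel window. A sketch assuming a north-up grid with top-left corner (x0, y0) and positive pixel sizes; `bounds_to_window` is a hypothetical helper, and rasterio provides rasterio.windows.from_bounds for the general case.

```python
import math


def bounds_to_window(bounds, x0, y0, xres, yres):
    """Convert (xmin, ymin, xmax, ymax) to (col_off, row_off, width, height).

    Assumes a north-up grid whose top-left corner is (x0, y0) and whose
    pixels are xres wide and yres tall (both positive).
    """
    xmin, ymin, xmax, ymax = bounds
    col_off = math.floor((xmin - x0) / xres)
    # Rows count downwards from the top edge, so ymax gives the first row
    row_off = math.floor((y0 - ymax) / yres)
    col_end = math.ceil((xmax - x0) / xres)
    row_end = math.ceil((y0 - ymin) / yres)
    return col_off, row_off, col_end - col_off, row_end - row_off
```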

drop(labels, in_place=True)

Drop individual RasterLayers from a Raster object

Note that this modifies the Raster object in-place by default.

Parameters
  • labels (single label or list-like) – Index (int) or layer name to drop. Can be a single integer or label, or a list of integers or labels.

  • in_place (bool (default True)) – Whether to change the Raster object in-place or leave original and return a new Raster object.

Returns

Returned only if in_place is False

Return type

pyspatialml.Raster

intersect(file_path=None, driver='GTiff', dtype=None, nodata=None, **kwargs)

Perform an intersect operation on the Raster object.

Computes the geometric intersection of the RasterLayers with the Raster object. This will cause nodata values in any of the rasters to be propagated through all of the output rasters.

Parameters
  • file_path (str (optional, default None)) – File path to save the resulting Raster. If not supplied then the resulting Raster is saved to a temporary file.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the new intersected Raster is created using the dtype of the existing Raster dataset, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.

  • nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this changes the values of the pixels that represent nodata to the new value.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster with layers that are masked based on a union of all masks in the suite of RasterLayers.

Return type

pyspatialml.Raster
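The nodata propagation described above can be sketched with numpy masked arrays: a pixel that is nodata in any band is masked in every band. The sketch assumes a single nodata value shared by all bands, and `intersect_masks` is a hypothetical name.

```python
import numpy as np


def intersect_masks(stack, nodata):
    """Mask every band wherever any band equals `nodata`.

    stack: array shaped (bands, rows, cols).
    """
    stack = np.asarray(stack)
    # 2D mask: True where at least one band holds the nodata value
    combined = np.any(stack == nodata, axis=0)
    # Repeat the combined mask for every band
    mask = np.repeat(combined[np.newaxis, ...], stack.shape[0], axis=0)
    return np.ma.masked_array(stack, mask=mask)
```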

mask(shapes, invert=False, crop=True, pad=False, file_path=None, driver='GTiff', dtype=None, nodata=None, **kwargs)

Mask a Raster object based on the outline of shapes in a geopandas.GeoDataFrame

Parameters
  • shapes (geopandas.GeoDataFrame) – GeoDataFrame containing masking features.

  • invert (bool (default False)) – If False then pixels outside the shapes will be masked. If True then pixels inside the shapes will be masked.

  • crop (bool (default True)) – Crop the raster to the extent of the shapes.

  • pad (bool (default False)) – If True, the features will be padded in each direction by one half of a pixel prior to cropping raster.

  • file_path (str (optional, default None)) – File path to save the resulting Raster. If not supplied then the resulting Raster is saved to a temporary file.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the cropped Raster is created using the existing dtype, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.

  • nodata (any number (optional, default None)) – Nodata value for cropped dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this changes the values of the pixels to the new nodata value, and changes the metadata of the raster.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.
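The effect of invert can be sketched with a boolean array marking the pixels covered by the shapes (for example, the result of rasterizing the GeoDataFrame geometries). `apply_shape_mask` is a hypothetical helper; cropping and padding are not modeled.

```python
import numpy as np


def apply_shape_mask(arr, inside, invert=False):
    """Mask pixels outside the shapes (invert=False) or inside them (invert=True).

    inside: boolean array, True for pixels covered by the shapes.
    """
    inside = np.asarray(inside, dtype=bool)
    hide = inside if invert else ~inside
    return np.ma.masked_array(arr, mask=hide)
```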

property names

Return the names of the RasterLayers in the Raster object

Returns

List of names of RasterLayer objects

Return type

list

predict(estimator, file_path=None, driver='GTiff', dtype=None, nodata=None, as_df=False, n_jobs=-1, progress=False, **kwargs)

Apply prediction of a scikit-learn model to a Raster.

The model can be any scikit-learn model, or any model with a compatible API that implements fit and predict methods. These can be classification or regression models. Multi-class classifications and multi-target regressions are also supported.

Parameters
  • estimator (estimator object implementing 'fit' and 'predict') – A fitted estimator used to make the predictions.

  • file_path (str (optional, default None)) – Path to a GeoTiff raster for the prediction results. If not specified then the output is written to a temporary file.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Optionally specify a GDAL compatible data type when saving to file. If not specified, np.float32 is assumed.

  • nodata (any number (optional, default None)) – Nodata value for file export. If not specified then the nodata value is derived from the minimum permissible value for the given data type.

  • n_jobs (int (default -1)) – Number of processing cores to use for parallel execution. The default of -1 uses all cores; -2 uses all cores but one.

  • progress (bool (default False)) – Show progress bar for prediction.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster object containing prediction results as RasterLayers. For classification and regression models, the Raster will contain a single RasterLayer, unless the model is multi-class or multi-target. Layers are named automatically as pred_raw_n with n = 1, 2, 3 ..n.

Return type

pyspatialml.Raster
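Conceptually, prediction treats each pixel as a sample and each RasterLayer as a feature. The numpy sketch below (`predict_raster_array` is a hypothetical helper) shows that reshape for a single-output model; nodata masking, block-wise processing and multi-class or multi-target output are deliberately left out.

```python
import numpy as np


def predict_raster_array(stack, predict):
    """Apply a per-pixel model to an array shaped (bands, rows, cols).

    `predict` maps an (n_samples, n_features) array to (n_samples,),
    e.g. the predict method of a fitted scikit-learn estimator.
    """
    bands, rows, cols = stack.shape
    # Each pixel becomes a row (sample); each band becomes a column (feature)
    X = stack.reshape(bands, rows * cols).T
    y = np.asarray(predict(X))
    # Put the flat predictions back onto the grid as a single layer
    return y.reshape(1, rows, cols)
```

With a hypothetical fitted estimator named model, the call would be predict_raster_array(arr, model.predict).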

predict_proba(estimator, file_path=None, indexes=None, driver='GTiff', dtype=None, nodata=None, progress=False, **kwargs)

Apply class probability prediction of a scikit-learn model to a Raster.

Parameters
  • estimator (estimator object implementing 'fit' and 'predict_proba') – A fitted estimator used to predict the class probabilities.

  • file_path (str (optional, default None)) – Path to a GeoTiff raster for the prediction results. If not specified then the output is written to a temporary file.

  • indexes (list of integers (optional, default None)) – List of class indices to export. In some circumstances, only a subset of the class probability estimations are desired, for instance when performing a binary classification only the probabilities for the positive class may be desired.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • dtype (str (optional, default None)) – Optionally specify a GDAL compatible data type when saving to file. If not specified, a data type is set based on the data type of the prediction.

  • nodata (any number (optional, default None)) – Nodata value for file export. If not specified then the nodata value is derived from the minimum permissible value for the given data type.

  • progress (bool (default False)) – Show progress bar for prediction.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster containing predicted class probabilities. Each predicted class is represented by a RasterLayer object. The RasterLayers are named prob_n for 1,2,3..n, with n based on the index position of the classes, not the number of the class itself.

For example, a classification model predicting classes with integer values of 1, 3, and 5 would result in three RasterLayers named prob_1, prob_2 and prob_3.

Return type

pyspatialml.Raster
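The naming rule above depends only on the position of each class, as listed for example in the classes_ attribute of a fitted scikit-learn classifier. A one-line sketch (`probability_layer_names` is a hypothetical helper) reproducing the example with classes 1, 3 and 5:

```python
def probability_layer_names(classes):
    """Map prob_n layer names (1-based position) to the class labels."""
    return {f"prob_{i}": label for i, label in enumerate(classes, start=1)}
```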

read(masked=False, window=None, out_shape=None, resampling='nearest', as_df=False, **kwargs)

Reads data from the Raster object into a numpy array.

Overrides the BaseRaster class read method with a method that reads from multiple RasterLayer objects.

Parameters
  • masked (bool (default False)) – Read data into a masked array.

  • window (rasterio.windows.Window object (optional, default None)) – Window (col_off, row_off, width, height) specifying a chunk of data to read into an ndarray.

  • out_shape (tuple (optional, default None)) – Shape of the array (rows, cols) to read data into using decimated reads.

  • resampling (str (default 'nearest')) – Resampling method to use when applying decimated reads when out_shape is specified. Supported methods are: ‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘q1’, ‘q3’.

  • as_df (bool (default False)) – Whether to return the data as a pandas.DataFrame with columns named by the RasterLayer names.

  • **kwargs (dict) – Other arguments to pass to rasterio.DatasetReader.read method

Returns

Raster values in 3d ndarray with the dimensions in order of (band, row, and column).

Return type

ndarray

rename(names, in_place=True)

Rename a RasterLayer within the Raster object.

Note that by default this modifies the Raster object in-place.

Parameters
  • names (dict) – dict of old_name : new_name

  • in_place (bool (default True)) – Whether to change names of the Raster object in-place or leave original and return a new Raster object.

Returns

Returned only if in_place is False

Return type

pyspatialml.Raster

to_crs(crs, resampling='nearest', file_path=None, driver='GTiff', nodata=None, n_jobs=1, warp_mem_lim=0, progress=False, **kwargs)

Reprojects a Raster object to a different crs.

Parameters
  • crs (rasterio.crs.CRS object, or dict) – Example: CRS.from_epsg(4326)

  • resampling (str (default 'nearest')) – Resampling method to use. One of the following: nearest, bilinear, cubic, cubic_spline, lanczos, average, mode, max (GDAL >= 2.2), min (GDAL >= 2.2), med (GDAL >= 2.2), q1 (GDAL >= 2.2), q3 (GDAL >= 2.2)

  • file_path (str (optional, default None)) – Optional path to save reprojected Raster object. If not specified then a tempfile is used.

  • driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.

  • nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then the existing nodata value of the Raster object is used, which can accommodate the dtypes of the individual layers in the Raster.

  • n_jobs (int (default 1)) – The number of warp worker threads.

  • warp_mem_lim (int (default 0)) – The warp operation memory limit in MB. Larger values allow the warp operation to be carried out in fewer chunks. The amount of memory required to warp a 3-band uint8 2000 row x 2000 col raster to a destination of the same size is approximately 56 MB. The default (0) means 64 MB with GDAL 2.2.

  • progress (bool (default False)) – Optionally show progress of transform operations.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Raster following reprojection.

Return type

pyspatialml.Raster

write(file_path, driver='GTiff', dtype=None, nodata=None, **kwargs)

Write the Raster object to a file.

Overrides the BaseRaster class write method, which is a partial function of the rasterio.DatasetReader.write method.

Parameters
  • file_path (str) – File path used to save the Raster object.

  • driver (str (default is 'GTiff')) – Name of GDAL driver used to save Raster data.

  • dtype (str (opt, default None)) – Optionally specify a numpy compatible data type when saving to file. If not specified, a data type is selected based on the data types of RasterLayers in the Raster object.

  • nodata (any number (opt, default None)) – Optionally assign a new nodata value when saving to file. If not specified a nodata value based on the minimum permissible value for the data types of RasterLayers in the Raster object is used. Note that this does not change the pixel nodata values of the raster, it only changes the metadata of what value represents a nodata pixel.

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

New Raster object from saved file.

Return type

Raster

pyspatialml.rasterlayer module

class pyspatialml.rasterlayer.RasterLayer(band)

Bases: pyspatialml.base.BaseRaster

Represents a single raster band derived from a single or multi-band raster dataset

Simple wrapper around a rasterio.Band object with additional methods. It is used because the rasterio.Band.ds.read method reads all bands from a multi-band dataset, whereas the RasterLayer read method only reads a single band.

Methods encapsulated in RasterLayer objects represent those that typically would only be applied to a single-band of a raster, i.e. sieve-clump, distance to non-NaN pixels, or arithmetic operations on individual layers.

bidx

The band index of the RasterLayer within the file dataset.

Type

int

dtype

The data type of the RasterLayer.

Type

str

nodata

The number that is used to represent nodata pixels in the RasterLayer.

Type

any number

file

The file path to the dataset.

Type

str

ds

The underlying rasterio.Band object.

Type

rasterio.Band

driver

The name of the GDAL format driver.

Type

str

meta

A python dict storing the RasterLayer metadata.

Type

dict

cmap

The name of a matplotlib colormap, or a custom matplotlib.cm.LinearSegmentedColormap or ListedColormap object.

Type

str

norm

A matplotlib.colors.Normalize to apply to the RasterLayer.

Type

matplotlib.colors.Normalize (opt)

count

Number of layers; always equal to 1.

Type

int

close()

Close the RasterLayer for reading/writing.

plot(cmap=None, norm=None, ax=None, cax=None, figsize=None, out_shape=(100, 100), categorical=None, legend=False, vmin=None, vmax=None, fig_kwds=None, legend_kwds=None)

Plot a RasterLayer using matplotlib.pyplot.imshow.

Parameters
  • cmap (str (default None)) – The name of a colormap recognized by matplotlib. Overrides the cmap attribute of the RasterLayer.

  • norm (matplotlib.colors.Normalize (opt)) – A matplotlib.colors.Normalize to apply to the RasterLayer. This overrides the norm attribute of the RasterLayer.

  • ax (matplotlib.axes.Axes (optional, default None)) – Axes instance on which to draw the plot.

  • cax (matplotlib.axes.Axes (optional, default None)) – Axes on which to draw the legend.

  • figsize (tuple of integers (optional, default None)) – Size of the matplotlib.figure.Figure. If the ax argument is given explicitly, figsize is ignored.

  • out_shape (tuple, default=(100, 100)) – Number of (rows, columns) to read from the raster dataset for plotting.

  • categorical (bool (optional, default False)) – If True, the raster values are treated as discrete values; otherwise they are treated as continuous values. This overrides the RasterLayer ‘categorical’ attribute. Setting categorical to True has no effect if RasterLayer.categorical is already True.

  • legend (bool (optional, default False)) – Whether to plot the legend.

  • vmin (scalar (optional, default None)) – vmin and vmax define the data range that the colormap covers. By default, the colormap covers the complete value range of the supplied data. vmin and vmax are ignored if the norm parameter is used.

  • vmax (scalar (optional, default None)) – vmin and vmax define the data range that the colormap covers. By default, the colormap covers the complete value range of the supplied data. vmin and vmax are ignored if the norm parameter is used.

  • fig_kwds (dict (optional, default None)) – Additional arguments to pass to the matplotlib.pyplot.figure call when creating the figure object. Ignored if ax is passed to the plot function.

  • legend_kwds (dict (optional, default None)) – Keyword arguments to pass to matplotlib.pyplot.colorbar().

Returns

ax

Return type

matplotlib axes instance
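How out_shape, vmin and vmax shape what is drawn can be illustrated with NumPy alone. This is a sketch under assumptions, not the library's plotting code: simple index decimation stands in for reading the raster at a reduced out_shape, and the scaling mirrors a linear matplotlib.colors.Normalize without clipping (returning zeros when vmin equals vmax). The function name prepare_for_plot is hypothetical.

```python
import numpy as np


def prepare_for_plot(arr, out_shape=(100, 100), vmin=None, vmax=None):
    """Downsample a 2D array and scale it to [0, 1] for a colormap.

    Hypothetical helper: decimation stands in for reading at out_shape;
    values outside [vmin, vmax] are not clipped.
    """
    arr = np.asarray(arr, dtype=float)
    n_rows, n_cols = arr.shape
    out_rows, out_cols = out_shape
    # Map each output row/column to a source row/column by integer scaling
    rows = np.arange(out_rows) * n_rows // out_rows
    cols = np.arange(out_cols) * n_cols // out_cols
    small = arr[np.ix_(rows, cols)]
    # By default the colormap covers the full (non-NaN) range of the data
    if vmin is None:
        vmin = np.nanmin(small)
    if vmax is None:
        vmax = np.nanmax(small)
    if vmax == vmin:
        # A constant image maps to the bottom of the colormap
        return np.zeros_like(small)
    return (small - vmin) / (vmax - vmin)


grid = np.arange(16, dtype=float).reshape(4, 4)
print(prepare_for_plot(grid, out_shape=(2, 2)))
print(prepare_for_plot(grid, out_shape=(2, 2), vmin=0, vmax=20))
```

Fixing vmin and vmax is useful when several layers should share one colour scale, because otherwise each plot stretches the colormap to its own data range.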

read(**kwargs)

Read method for a single RasterLayer.

Reads the pixel values from a RasterLayer into an ndarray that always has two dimensions, in the order (rows, columns).

Parameters
  • **kwargs – Named arguments to pass to the rasterio.DatasetReader.read method.
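The difference between reading a whole multi-band dataset and reading one RasterLayer can be shown with a plain NumPy stack. This is only an illustration: the stack stands in for what rasterio's DatasetReader.read returns when no band index is given (an array shaped (bands, rows, columns)), and read_single_band is a hypothetical helper, not part of pyspatialml.

```python
import numpy as np


def read_single_band(stack, bidx):
    """Return band bidx (1-based, like rasterio) as a 2D (rows, columns) array."""
    stack = np.asarray(stack)
    if stack.ndim != 3:
        raise ValueError("expected an array shaped (bands, rows, columns)")
    if not 1 <= bidx <= stack.shape[0]:
        raise IndexError(f"band index {bidx} out of range")
    # rasterio numbers bands from 1, NumPy indexes from 0
    return stack[bidx - 1]


# Three bands of 2 x 3 pixels, as read from a multi-band file
stack = np.arange(18).reshape(3, 2, 3)
band2 = read_single_band(stack, 2)
print(band2.shape)
```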

write(file_path, driver='GTiff', dtype=None, nodata=None, **kwargs)

Write method for a single RasterLayer.

Parameters
  • file_path (str) – File path used to save the dataset.

  • driver (str, default 'GTiff') – GDAL-compatible driver used for the file format.

  • dtype (str (opt)) – NumPy dtype used for the file. If omitted, the RasterLayer’s dtype is used.

  • nodata (any number (opt)) – A value used to represent the nodata pixels. If omitted, the RasterLayer’s nodata value is used (if already assigned).

  • kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.

Returns

Return type

pyspatialml.RasterLayer

pyspatialml.transformers module

class pyspatialml.transformers.GeoDistTransformer(ref_xs=None, ref_ys=None, log=False)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Transformer to add new features based on geographical distances to reference locations.

Parameters
  • ref_xs (list) – A list of x-coordinates of reference locations.

  • ref_ys (list) – A list of y-coordinates of reference locations.

  • log (bool (opt), default=False) – Optionally log-transform the distance measures.

Returns

X_new – array of shape (n_samples, n_features) with new geodistance features appended to the right-most columns of the array.

Return type

ndarray

fit(X, y=None)
transform(X, y=None)
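The transformer's behaviour can be approximated in a few lines of NumPy. This is a sketch under stated assumptions rather than the library's code: it assumes the first two columns of X hold the x and y coordinates, that distances are planar Euclidean, and that log=True applies numpy.log1p so zero distances stay finite. The class name GeoDistSketch is hypothetical, and scikit-learn's base classes are omitted to keep the example dependency-free.

```python
import numpy as np


class GeoDistSketch:
    """Hypothetical stand-in for GeoDistTransformer (not the library's code)."""

    def __init__(self, ref_xs=None, ref_ys=None, log=False):
        self.ref_xs = ref_xs
        self.ref_ys = ref_ys
        self.log = log

    def fit(self, X, y=None):
        # Nothing is learned from the data; the reference points are fixed
        if len(self.ref_xs) != len(self.ref_ys):
            raise ValueError("ref_xs and ref_ys must have the same length")
        return self

    def transform(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Assumption: columns 0 and 1 are the x and y coordinates
        dx = X[:, [0]] - np.asarray(self.ref_xs, dtype=float)
        dy = X[:, [1]] - np.asarray(self.ref_ys, dtype=float)
        dists = np.hypot(dx, dy)  # shape (n_samples, n_refs)
        if self.log:
            dists = np.log1p(dists)
        # Append one distance column per reference location
        return np.hstack([X, dists])


X = np.array([[0.0, 0.0, 5.0], [3.0, 4.0, 1.0]])
X_new = GeoDistSketch(ref_xs=[0.0, 3.0], ref_ys=[0.0, 0.0]).fit(X).transform(X)
print(X_new)
```

With k reference locations, this sketch returns n_features + k columns, the distance features occupying the right-most k.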

pyspatialml.vector module

pyspatialml.vector.filter_points(gdf, min_dist=0, remove='first')

Filter points in a GeoDataFrame using a minimum distance buffer.

Parameters
  • gdf (Geopandas GeoDataFrame) – Containing point geometries.

  • min_dist (int or float, optional (default=0)) – Minimum distance by which to filter out closely spaced points.

  • remove (str, optional (default='first')) – Optionally choose to remove ‘first’ occurrences or ‘last’ occurrences.

Returns

xy – NumPy array of filtered coordinates.

Return type

2d array-like
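One plausible way to implement a minimum-distance filter is to find every pair of points closer than min_dist and drop one member of each pair. The sketch below assumes that reading of remove ('first' drops the earlier point of each pair, 'last' the later one). It works on a plain coordinate array instead of a GeoDataFrame and uses a brute-force O(n^2) distance matrix, so it illustrates the idea rather than reproducing pyspatialml's code. filter_xy is a hypothetical name.

```python
import numpy as np


def filter_xy(xy, min_dist=0, remove="first"):
    """Drop points lying closer than min_dist to another point (hypothetical helper)."""
    if remove not in ("first", "last"):
        raise ValueError("remove must be 'first' or 'last'")
    xy = np.asarray(xy, dtype=float)
    # Pairwise Euclidean distances between all points
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Index pairs (i, j) with i < j that are too close together
    i, j = np.nonzero(np.triu(dist < min_dist, k=1))
    drop = i if remove == "first" else j
    keep = np.ones(len(xy), dtype=bool)
    keep[drop] = False
    return xy[keep]


pts = [[0.0, 0.0], [0.5, 0.0], [10.0, 10.0]]
print(filter_xy(pts, min_dist=1, remove="first"))
print(filter_xy(pts, min_dist=1, remove="last"))
```

Because every offending pair loses a member, chains of close points (A near B, B near C) can lose more points than strictly necessary; a greedy pass that keeps a point only if it is far enough from all points already kept is a common alternative.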

pyspatialml.vector.get_random_point_in_polygon(poly)

Generates a random shapely Point geometry object within a single shapely Polygon object.

Parameters

poly (Shapely Polygon object) – The polygon within which the point is generated.

Returns

p – A random point located inside poly.

Return type

Shapely Point object
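Random points inside a polygon are commonly generated by rejection sampling: draw uniform points in the polygon's bounding box until one falls inside. The sketch below assumes that approach. To stay dependency-free it replaces the shapely Polygon with a hypothetical list of (x, y) vertices and an even-odd (ray-casting) containment test; with shapely, poly.bounds and poly.contains would play those roles.

```python
import random


def point_in_polygon(x, y, vertices):
    """Even-odd rule: count edge crossings of a ray cast to the right of (x, y)."""
    inside = False
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_point_in_polygon(vertices, rng=random):
    """Rejection sampling inside the bounding box (hypothetical helper)."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    minx, maxx, miny, maxy = min(xs), max(xs), min(ys), max(ys)
    while True:
        x = rng.uniform(minx, maxx)
        y = rng.uniform(miny, maxy)
        if point_in_polygon(x, y, vertices):
            return x, y


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(random_point_in_polygon(triangle, random.Random(42)))
```

Rejection sampling can be slow for thin or sparse polygons that fill only a small fraction of their bounding box, since most draws are rejected.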