pyspatialml: machine learning for raster datasets¶
Subpackages¶
Submodules¶
pyspatialml.estimators module¶
-
class
pyspatialml.estimators.
SpatialLagBase
(base_estimator, n_neighbors=7, weights='distance', radius=1.0, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, kernel_params=None, feature_indices=None, n_jobs=1)¶ Bases:
abc.ABC
,sklearn.base.BaseEstimator
Base class for spatial lag estimators.
A spatial lag estimator uses a weighted mean/mode of the values of the K-neighboring observations to augment the base_estimator. The weighted mean/mode of the surrounding observations is appended as a new feature to the right-most column in the training data.
The K-neighboring observations are determined using the distance metric specified in the metric argument. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric.
- Parameters
base_estimator (estimator object.) – This is assumed to implement the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed.
n_neighbors (int, default = 7) – Number of neighbors to use by default for kneighbors queries.
weights ({‘uniform’, ‘distance’} or callable, default=’distance’) –
Weight function used in prediction. Possible values:
‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
radius (float, default=1.0) – Range of parameter space to use by default for radius_neighbors queries.
algorithm ({‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’) –
Algorithm used to compute the nearest neighbors:
‘ball_tree’ will use BallTree
‘kd_tree’ will use KDTree
‘brute’ will use a brute-force search.
‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
leaf_size (int, default=30) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
metric (str or callable, default=’minkowski’) – The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of DistanceMetric for a list of available metrics. If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.
p (int, default=2) – Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
metric_params (dict, default=None) – Additional keyword arguments for the metric function.
kernel_params (dict, default=None) – Additional keyword arguments to pass to a custom kernel function.
feature_indices (list, default=None) – By default, the nearest neighbors are determined from the distance metric calculated using all of the features. If feature_indices are supplied then the distance calculation is restricted to the specific column indices. For spatial data, these might represent the x,y coordinates for example.
n_jobs (int, default=1) – The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
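The relationship between metric='minkowski' and p noted in the parameter list can be checked with a small sketch. The helper name minkowski is hypothetical and uses only the standard library; it is not part of pyspatialml or scikit-learn.

```python
import math


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(a, b, p=1)  # sum of absolute differences (l1)
euclidean = minkowski(a, b, p=2)  # straight-line distance (l2)
```

With p=1 the result is the Manhattan distance between the two points, and with p=2 it matches math.hypot(3, 4).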
-
fit
(X, y)¶ Fit the base_estimator with features from X {n_samples, n_features} and with an additional spatially lagged variable added to the right-most column of the training data.
During fitting, the k-neighbors to each training point are used to estimate the spatial lag component. The training point is not included in the calculation, i.e. the training point is not considered its own neighbor.
- Parameters
X (array-like of shape (n_samples, n_features)) – The training input samples used for model fitting.
y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).
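The fitting step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function name spatial_lag is hypothetical, the features are assumed to be x,y coordinates only, weights are inverse distances, and no two points share the same location.

```python
import math


def spatial_lag(coords, y, n_neighbors=2):
    """Inverse-distance weighted mean of the k nearest *other* targets."""
    lags = []
    for i, pt in enumerate(coords):
        # Distances to every other training point; the point itself is excluded.
        nearest = sorted(
            (math.dist(pt, other), y[j])
            for j, other in enumerate(coords)
            if j != i
        )[:n_neighbors]
        weights = [1.0 / d for d, _ in nearest]  # assumes no zero distances
        weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
        lags.append(weighted_sum / sum(weights))
    return lags


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
y = [10.0, 20.0, 40.0]
lag = spatial_lag(coords, y)

# Append the lag as the right-most column of the feature matrix.
X_augmented = [list(c) + [lag_value] for c, lag_value in zip(coords, lag)]
```

Excluding the training point from its own neighborhood keeps the lag feature from simply echoing the target value of that point.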
-
predict
(X, y=None)¶ Predict method for spatial lag models.
Augments new observations with a spatial lag variable created from a weighted mean/mode (regression/classification) of k-neighboring observations.
- Parameters
X (array-like of shape (n_samples, n_features)) – New samples for the prediction.
y (None) – Not used.
-
class
pyspatialml.estimators.
SpatialLagClassifier
(base_estimator, n_neighbors=7, weights='distance', radius=1.0, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, kernel_params=None, feature_indices=None, n_jobs=1)¶ Bases:
sklearn.base.ClassifierMixin
,pyspatialml.estimators.SpatialLagBase
-
class
pyspatialml.estimators.
SpatialLagRegressor
(base_estimator, n_neighbors=7, weights='distance', radius=1.0, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, kernel_params=None, feature_indices=None, n_jobs=1)¶ Bases:
sklearn.base.RegressorMixin
,pyspatialml.estimators.SpatialLagBase
-
class
pyspatialml.estimators.
ThresholdClassifierCV
(estimator, thresholds=array([0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89]), scoring=None, refit=False, cv=3, random_state=None)¶ Bases:
sklearn.base.BaseEstimator
,sklearn.base.ClassifierMixin
Metaclassifier to perform cutoff threshold optimization
This implementation is restricted to binary classification problems
Notes
The training data are partitioned into k folds, split into k-1 training partitions and one held-out kth partition
The metaclassifier trains the base estimator on the k-1 partitions
The kth partitions are used to determine the optimal cutoff, taking the mean of the thresholds that maximize the scoring metric
The optimal cutoff is applied to all classifier predictions
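The procedure in these notes can be sketched as follows. The sketch assumes accuracy as the scoring metric, a >= comparison against the cutoff, and made-up held-out probabilities; the name best_threshold is hypothetical and the real class may differ in all of these details.

```python
def best_threshold(probs, labels, thresholds):
    """Return the threshold that maximizes accuracy on one held-out fold."""

    def accuracy(t):
        preds = [int(p >= t) for p in probs]
        return sum(p == l for p, l in zip(preds, labels)) / len(labels)

    return max(thresholds, key=accuracy)  # the first maximizer wins on ties


# Hypothetical positive-class probabilities and true labels for two held-out folds.
folds = [
    ([0.2, 0.4, 0.6, 0.8], [0, 0, 1, 1]),
    ([0.1, 0.3, 0.7, 0.9], [0, 1, 1, 1]),
]
thresholds = [0.25, 0.5, 0.75]

per_fold = [best_threshold(p, l, thresholds) for p, l in folds]
optimal_cutoff = sum(per_fold) / len(per_fold)  # mean of the per-fold optima
```

Here the two folds prefer cutoffs of 0.5 and 0.25, so the averaged cutoff applied to later predictions would be 0.375.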
-
fit
(X, y=None, groups=None, **fit_params)¶ Run fit method with all sets of parameters
- Parameters
X (array-like, shape = [n_samples, n_features]) – Training vector, where n_samples is the number of samples and n_features is the number of features
y (array-like, shape = [n_samples] or [n_samples, n_output], optional) – Target relative to X for classification or regression; None for unsupervised learning
groups (array-like, shape = [n_samples], optional) – Training vector groups for cross-validation
**fit_params (dict of string -> object) – Parameters passed to the fit method of the estimator
Notes
Rules
Parameters are checked during the fit method
New attributes created during fitting should end in _, e.g. fitted_
Fit method needs to return self for compatibility reasons with sklearn
The response vector, i.e. y, should be initiated with None
-
predict
(X, y=None)¶
-
predict_proba
(X)¶
-
score
(X, y)¶ Overloads the classifier score method. A score method is required for compatibility with GridSearch. The scoring metric should be one that can be maximized (bigger=better).
pyspatialml.plotting module¶
-
class
pyspatialml.plotting.
RasterPlot
¶ Bases:
object
-
plot
(cmap=None, norm=None, figsize=None, out_shape=(100, 100), title_fontsize=8, label_fontsize=6, legend_fontsize=6, names=None, fig_kwds=None, legend_kwds=None, subplots_kwds=None)¶ Plot a Raster object as a raster matrix
- Parameters
cmap (str (opt), default=None) – Specify a single cmap to apply to all of the RasterLayers. This overrides the cmap attribute of each RasterLayer.
norm (matplotlib.colors.Normalize (opt), default=None) – A matplotlib.colors.Normalize to apply to all of the RasterLayers. This overrides the norm attribute of each RasterLayer.
figsize (tuple (opt), default=None) – Size of the resulting matplotlib.figure.Figure.
out_shape (tuple, default=(100, 100)) – Number of rows, cols to read from the raster datasets for plotting.
title_fontsize (any number, default=8) – Size in pts of titles.
label_fontsize (any number, default=6) – Size in pts of axis ticklabels.
legend_fontsize (any number, default=6) – Size in pts of legend ticklabels.
names (list (opt), default=None) – Optionally supply a list of names for each RasterLayer to override the default layer names for the titles.
fig_kwds (dict (opt), default=None) – Additional arguments to pass to the matplotlib.pyplot.figure call when creating the figure object.
legend_kwds (dict (opt), default=None) – Additional arguments to pass to the matplotlib.pyplot.colorbar call when creating the colorbar object.
subplots_kwds (dict (opt), default=None) – Additional arguments to pass to the matplotlib.pyplot.subplots_adjust function. These are used to control the spacing and position of each subplot, and can include {left=None, bottom=None, right=None, top=None, wspace=None, hspace=None}.
- Returns
axs – array of matplotlib.axes._subplots.AxesSubplot or a single matplotlib.axes._subplots.AxesSubplot if Raster object contains only a single layer.
- Return type
numpy.ndarray
-
-
pyspatialml.plotting.
discrete_cmap
(N, base_cmap=None)¶ Create an N-bin discrete colormap from the specified input map.
Source: https://gist.github.com/jakevdp/91077b0cae40f8f8244a
- Parameters
N (int) – The number of colors in the colormap
base_cmap (str) – The name of the matplotlib cmap to convert into a discrete map.
- Returns
The cmap converted to a discrete map.
- Return type
matplotlib.cmap
-
pyspatialml.plotting.
rasterio_normalize
(arr, axis=None)¶ Scales an array using min-max scaling.
- Parameters
arr (ndarray) – A numpy array containing the image data.
axis (int (opt)) – The axis to perform the normalization along.
- Returns
The normalized array
- Return type
numpy.ndarray
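Min-max scaling as described for rasterio_normalize can be illustrated on a flat list. This is a pure-Python sketch that ignores the axis argument; the name min_max_scale and the handling of constant input are assumptions, not the library's behavior.

```python
def min_max_scale(values):
    """Rescale a flat sequence of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10, 20, 30, 50])
```

The smallest input maps to 0.0 and the largest to 1.0, with the remaining values placed proportionally between them.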
-
pyspatialml.plotting.
shiftedColorMap
(cmap, start=0, midpoint=0.5, stop=1.0, name='shiftedcmap')¶ Function to offset the “center” of a colormap. Useful for data with a negative min and positive max when you want the middle of the colormap’s dynamic range to be at zero.
Source: http://stackoverflow.com/questions/7404116/defining-the-midpoint-of-a-colormap-in-matplotlib
- Parameters
cmap (str) – The matplotlib colormap to be altered
start (any number) – Offset from lowest point in the colormap’s range. Defaults to 0.0 (no lower offset). Should be between 0.0 and midpoint.
midpoint (any number between 0.0 and 1.0) – The new center of the colormap. Defaults to 0.5 (no shift). In general, this should be 1 - vmax/(vmax + abs(vmin)). For example if your data range from -15.0 to +5.0 and you want the center of the colormap at 0.0, midpoint should be set to 1 - 5/(5 + 15) or 0.75.
stop (any number between midpoint and 1.0) – Offset from highest point in the colormap’s range. Defaults to 1.0 (no upper offset).
- Returns
The colormap with its centre shifted to the midpoint value.
- Return type
matplotlib.cmap
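The midpoint formula given for shiftedColorMap can be worked through directly. The helper name colormap_midpoint is hypothetical and only reproduces the arithmetic from the parameter description.

```python
def colormap_midpoint(vmin, vmax):
    """Fraction of the colormap range at which the data value 0 falls."""
    return 1 - vmax / (vmax + abs(vmin))


# Data ranging from -15.0 to +5.0, as in the example in the parameter list.
midpoint = colormap_midpoint(-15.0, 5.0)
```

For this range the midpoint evaluates to 0.75, so three quarters of the colormap would be spent on the negative values.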
pyspatialml.raster module¶
-
class
pyspatialml.raster.
Raster
(src=None, arr=None, crs=None, transform=None, nodata=None, mode='r', file_path=None)¶ Bases:
pyspatialml.plotting.RasterPlot
,pyspatialml.base.BaseRaster
Flexible class that represents a collection of file-based GDAL-supported raster datasets which share a common coordinate reference system and geometry.
Raster objects encapsulate RasterLayer objects, which represent single band raster datasets that can physically be represented by either separate single-band raster files, multi-band raster files, or any combination of individual bands from multi-band raster and single-band raster datasets.
Methods defined in a Raster class comprise those that would typically be applied to a stack of raster datasets. In addition, these methods always return a new Raster object.
-
aggregate
(out_shape, resampling='nearest', file_path=None, driver='GTiff', dtype=None, nodata=None, **kwargs)¶ Aggregates a raster to (usually) a coarser grid cell size.
- Parameters
out_shape (tuple) – New shape in (rows, cols).
resampling (str (default 'nearest')) – Resampling method to use when applying decimated reads when out_shape is specified. Supported methods are: ‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘q1’, ‘q3’.
file_path (str (optional, default None)) – File path to save the aggregated raster. If not supplied then the aggregated raster is saved to a temporary file.
driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.
dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the new aggregated Raster is created using the dtype of the existing Raster dataset, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.
nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s dtype. Note that this does not change the pixel nodata values of the raster, it only changes the metadata of what value represents a nodata pixel.
kwargs (opt) – Optional named arguments to pass to the format drivers. For example can be compress=”deflate” to add compression.
- Returns
Raster object aggregated to a new pixel size.
- Return type
pyspatialml.Raster
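Aggregation to a coarser grid can be pictured with a small stand-alone sketch. It uses block averaging (comparable in spirit to the 'average' resampling option rather than the default 'nearest'), works on nested lists instead of raster files, and assumes that out_shape divides the input evenly; the name aggregate_mean is hypothetical.

```python
def aggregate_mean(grid, out_shape):
    """Average non-overlapping blocks of a 2D list down to out_shape (rows, cols)."""
    out_rows, out_cols = out_shape
    block_h = len(grid) // out_rows  # assumes the shape divides evenly
    block_w = len(grid[0]) // out_cols
    result = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            block = [
                grid[r * block_h + i][c * block_w + j]
                for i in range(block_h)
                for j in range(block_w)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


grid = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 6, 6],
    [4, 4, 8, 8],
]
coarse = aggregate_mean(grid, out_shape=(2, 2))
```

Each output cell summarizes one 2 x 2 block of the input, which is why the result has a quarter of the original number of pixels.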
-
append
(other, in_place=True)¶ Method to add new RasterLayers to a Raster object.
Note that this modifies the Raster object in-place by default.
- Parameters
other (Raster object, or list of Raster objects) – Object to append to the Raster.
in_place (bool (default True)) – Whether to change the Raster object in-place or leave original and return a new Raster object.
- Returns
Returned only if in_place is False
- Return type
pyspatialml.Raster
-
apply
(function, file_path=None, driver='GTiff', dtype=None, nodata=None, progress=False, n_jobs=-1, **kwargs)¶ Apply user-supplied function to a Raster object.
- Parameters
function (function) – Function that takes a numpy array as a single argument.
file_path (str (optional, default None)) – Optional path to save calculated Raster object. If not specified then a tempfile is used.
driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.
dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the new Raster is created using the dtype of the calculation result.
nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this changes the values of the pixels that represent nodata pixels.
n_jobs (int (default -1)) – Number of processing cores to use for parallel execution. Default of -1 is all cores.
progress (bool (default False)) – Optionally show progress of transform operations.
kwargs (opt) – Optional named arguments to pass to the format drivers. For example can be compress=”deflate” to add compression.
- Returns
Raster containing the calculated result.
- Return type
pyspatialml.Raster
-
astype
(dtype, file_path=None, driver='GTiff', nodata=None, **kwargs)¶ Coerce Raster to a different dtype.
- Parameters
dtype (str or np.dtype) – Datatype to coerce Raster object
file_path (str (optional, default None)) – Optional path to save calculated Raster object. If not specified then a tempfile is used.
driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.
nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this changes the values of the pixels that represent nodata pixels.
- Returns
- Return type
pyspatialml.Raster
-
property
block_shape
¶ Return the window size used for raster calculations, specified as a tuple (rows, columns).
- Returns
Block window shape that is currently set for the Raster as a tuple in the format of (n_rows, n_columns) in pixels.
- Return type
tuple
-
block_shapes
(rows, cols, min_rows=5, min_cols=5, overlap=0)¶ Generator for windows for optimal reading and writing based on the raster format. Windows are returned as a tuple with xoff, yoff, width, height.
- Parameters
rows (int) – Height of window in rows.
cols (int) – Width of window in columns.
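A window generator of this kind might look like the sketch below. It is a simplified stand-in that ignores min_rows, min_cols and overlap, and it takes the raster dimensions as explicit arguments; the name iter_windows and its signature are hypothetical.

```python
def iter_windows(n_rows, n_cols, rows, cols):
    """Yield (xoff, yoff, width, height) tuples that tile an n_rows x n_cols grid."""
    for yoff in range(0, n_rows, rows):
        height = min(rows, n_rows - yoff)  # clip the last row of windows
        for xoff in range(0, n_cols, cols):
            width = min(cols, n_cols - xoff)  # clip the last column of windows
            yield (xoff, yoff, width, height)


windows = list(iter_windows(n_rows=5, n_cols=7, rows=2, cols=4))
```

Windows on the right and bottom edges are clipped so that the tiles cover every pixel exactly once.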
-
close
()¶ Close all of the RasterLayer objects in the Raster.
Note that this will cause any rasters based on temporary files to be removed. This is intended as a method of clearing temporary files that may have accumulated during an analysis session.
-
crop
(bounds, file_path=None, driver='GTiff', dtype=None, nodata=None, **kwargs)¶ Crops a Raster object by the supplied bounds.
- Parameters
bounds (tuple) – A tuple containing the bounding box to clip by in the form of (xmin, ymin, xmax, ymax).
file_path (str (optional, default None)) – File path to save the cropped raster. If not supplied then the cropped raster is saved to a temporary file.
driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.
dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the new cropped Raster is created using the dtype of the existing Raster dataset, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.
nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this does not change the pixel nodata values of the raster, it only changes the metadata of what value represents a nodata pixel.
kwargs (opt) – Optional named arguments to pass to the format drivers. For example can be compress=”deflate” to add compression.
- Returns
Raster cropped to new extent.
- Return type
pyspatialml.Raster
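How a bounding box in map coordinates relates to a pixel window can be sketched for a simple north-up raster with square pixels. The function name bounds_to_window, the origin and the resolution are all hypothetical; the library itself may compute the window differently.

```python
import math


def bounds_to_window(bounds, x_origin, y_origin, res):
    """Convert (xmin, ymin, xmax, ymax) to (col_off, row_off, width, height)."""
    xmin, ymin, xmax, ymax = bounds
    col_off = math.floor((xmin - x_origin) / res)
    row_off = math.floor((y_origin - ymax) / res)  # rows count down from the top
    col_end = math.ceil((xmax - x_origin) / res)
    row_end = math.ceil((y_origin - ymin) / res)
    return (col_off, row_off, col_end - col_off, row_end - row_off)


# Hypothetical raster: top-left corner at (100, 200) with 10-unit square pixels.
window = bounds_to_window((120, 150, 160, 190), x_origin=100, y_origin=200, res=10)
```

Note that the bounds are given as (xmin, ymin, xmax, ymax), while the row offset is measured from the top edge, so ymax determines where the window starts.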
-
drop
(labels, in_place=True)¶ Drop individual RasterLayers from a Raster object
Note that this modifies the Raster object in-place by default.
- Parameters
labels (single label or list-like) – Index (int) or layer name to drop. Can be a single integer or label, or a list of integers or labels.
in_place (bool (default True)) – Whether to change the Raster object in-place or leave original and return a new Raster object.
- Returns
Returned only if in_place is False
- Return type
pyspatialml.Raster
-
intersect
(file_path=None, driver='GTiff', dtype=None, nodata=None, **kwargs)¶ Perform an intersect operation on the Raster object.
Computes the geometric intersection of the RasterLayers with the Raster object. This will cause nodata values in any of the rasters to be propagated through all of the output rasters.
- Parameters
file_path (str (optional, default None)) – File path to save the resulting Raster. If not supplied then the resulting Raster is saved to a temporary file.
driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.
dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the new intersected Raster is created using the dtype of the existing Raster dataset, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.
nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this changes the values of the pixels that represent nodata to the new value.
kwargs (opt) – Optional named arguments to pass to the format drivers. For example can be compress=”deflate” to add compression.
- Returns
Raster with layers that are masked based on a union of all masks in the suite of RasterLayers.
- Return type
pyspatialml.Raster
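The propagation of nodata values across layers can be illustrated with nested lists standing in for raster bands. The function name intersect_layers and the nodata value of -9999 are assumptions made for the example.

```python
NODATA = -9999


def intersect_layers(layers, nodata=NODATA):
    """Set a pixel to nodata in every layer if it is nodata in any layer."""
    n_pixels = len(layers[0])
    # Union of the per-layer masks: True where any layer is missing.
    mask = [any(layer[i] == nodata for layer in layers) for i in range(n_pixels)]
    return [
        [nodata if mask[i] else layer[i] for i in range(n_pixels)]
        for layer in layers
    ]


layer_a = [1, 2, NODATA, 4]
layer_b = [5, NODATA, 7, 8]
result = intersect_layers([layer_a, layer_b])
```

After the operation both layers share the same set of valid pixels, which is usually what a model expects when the layers are used together as features.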
-
mask
(shapes, invert=False, crop=True, pad=False, file_path=None, driver='GTiff', dtype=None, nodata=None, **kwargs)¶ Mask a Raster object based on the outline of shapes in a geopandas.GeoDataFrame
- Parameters
shapes (geopandas.GeoDataFrame) – GeoDataFrame containing masking features.
invert (bool (default False)) – If False then pixels outside shapes will be masked. If True then pixels inside shapes will be masked.
crop (bool (default True)) – Crop the raster to the extent of the shapes.
pad (bool (default False)) – If True, the features will be padded in each direction by one half of a pixel prior to cropping raster.
file_path (str (optional, default None)) – File path to save the resulting Raster. If not supplied then the resulting Raster is saved to a temporary file.
driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.
dtype (str (optional, default None)) – Coerce RasterLayers to the specified dtype. If not specified then the cropped Raster is created using the existing dtype, which uses a dtype that can accommodate the data types of all of the individual RasterLayers.
nodata (any number (optional, default None)) – Nodata value for cropped dataset. If not specified then a nodata value is set based on the minimum permissible value of the Raster’s data type. Note that this changes the values of the pixels to the new nodata value, and changes the metadata of the raster.
kwargs (opt) – Optional named arguments to pass to the format drivers. For example can be compress=”deflate” to add compression.
-
property
names
¶ Return the names of the RasterLayers in the Raster object
- Returns
List of names of RasterLayer objects
- Return type
list
-
predict
(estimator, file_path=None, driver='GTiff', dtype=None, nodata=None, as_df=False, n_jobs=-1, progress=False, **kwargs)¶ Apply prediction of a scikit-learn model to a Raster.
The model can represent any scikit-learn model or compatible API with a fit and predict method. These can consist of classification or regression models. Multi-class classifications and multi-target regressions are also supported.
- Parameters
estimator (estimator object implementing 'fit') – The estimator used to generate the predictions.
file_path (str (optional, default None)) – Path to a GeoTiff raster for the prediction results. If not specified then the output is written to a temporary file.
driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.
dtype (str (optional, default None)) – Optionally specify a GDAL compatible data type when saving to file. If not specified, np.float32 is assumed.
nodata (any number (optional, default None)) – Nodata value for file export. If not specified then the nodata value is derived from the minimum permissible value for the given data type.
n_jobs (int (default -1)) – Number of processing cores to use for parallel execution. The default of -1 uses all cores; -2 uses all cores but one.
progress (bool (default False)) – Show progress bar for prediction.
kwargs (opt) – Optional named arguments to pass to the format drivers. For example can be compress=”deflate” to add compression.
- Returns
Raster object containing prediction results as RasterLayers. For classification and regression models, the Raster will contain a single RasterLayer, unless the model is multi-class or multi-target. Layers are named automatically as pred_raw_n with n = 1, 2, 3 ..n.
- Return type
pyspatialml.Raster
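The negative n_jobs values mentioned for predict follow the joblib convention, which can be sketched as below. The helper name effective_n_jobs and the explicit n_cpus argument are assumptions for illustration; values of zero are not handled.

```python
import os


def effective_n_jobs(n_jobs, n_cpus=None):
    """Translate a joblib-style n_jobs value into a worker count."""
    n_cpus = n_cpus or os.cpu_count() or 1
    if n_jobs < 0:
        # -1 -> all cores, -2 -> all cores but one, and so on.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


# Worker counts on a hypothetical 8-core machine.
workers = [effective_n_jobs(n, n_cpus=8) for n in (1, -1, -2)]
```

On the 8-core machine assumed here, the three settings resolve to 1, 8 and 7 workers respectively.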
-
predict_proba
(estimator, file_path=None, indexes=None, driver='GTiff', dtype=None, nodata=None, progress=False, **kwargs)¶ Apply class probability prediction of a scikit-learn model to a Raster.
- Parameters
estimator (estimator object implementing 'fit') – The estimator used to generate the class probability predictions.
file_path (str (optional, default None)) – Path to a GeoTiff raster for the prediction results. If not specified then the output is written to a temporary file.
indexes (list of integers (optional, default None)) – List of class indices to export. In some circumstances, only a subset of the class probability estimations are desired, for instance when performing a binary classification only the probabilities for the positive class may be desired.
driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.
dtype (str (optional, default None)) – Optionally specify a GDAL compatible data type when saving to file. If not specified, a data type is set based on the data type of the prediction.
nodata (any number (optional, default None)) – Nodata value for file export. If not specified then the nodata value is derived from the minimum permissible value for the given data type.
progress (bool (default False)) – Show progress bar for prediction.
kwargs (opt) – Optional named arguments to pass to the format drivers. For example can be compress=”deflate” to add compression.
- Returns
Raster containing predicted class probabilities. Each predicted class is represented by a RasterLayer object. The RasterLayers are named prob_n for 1,2,3..n, with n based on the index position of the classes, not the number of the class itself.
For example, a classification model predicting classes with integer values of 1, 3, and 5 would result in three RasterLayers named prob_1, prob_2 and prob_3.
- Return type
pyspatialml.Raster
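The naming rule for the probability layers can be reproduced in a few lines. The classes list below is a stand-in for the class labels of a fitted classifier and is an assumption of the example.

```python
classes = [1, 3, 5]  # class labels as a fitted classifier might report them

# Names follow the 1-based index position of each class, not its label.
layer_names = [f"prob_{i}" for i, _ in enumerate(classes, start=1)]
name_to_class = dict(zip(layer_names, classes))
```

Keeping a mapping such as name_to_class alongside the output makes it easier to tell which class each layer refers to, since prob_2 here corresponds to class 3.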
-
read
(masked=False, window=None, out_shape=None, resampling='nearest', as_df=False, **kwargs)¶ Reads data from the Raster object into a numpy array.
Overrides the BaseRaster class read method and replaces it with a method that reads from multiple RasterLayer objects.
- Parameters
masked (bool (default False)) – Read data into a masked array.
window (rasterio.windows.Window object (optional, default None)) – Tuple of col_off, row_off, width, height of a window of data to read a chunk of data into a ndarray.
out_shape (tuple (optional, default None)) – Shape of array (rows, cols) to read data into using decimated reads.
resampling (str (default 'nearest')) – Resampling method to use when applying decimated reads when out_shape is specified. Supported methods are: ‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘q1’, ‘q3’.
as_df (bool (default False)) – Whether to return the data as a pandas.DataFrame with columns named by the RasterLayer names.
**kwargs (dict) – Other arguments to pass to rasterio.DatasetReader.read method
- Returns
Raster values in 3d ndarray with the dimensions in order of (band, row, and column).
- Return type
ndarray
-
rename
(names, in_place=True)¶ Rename a RasterLayer within the Raster object.
Note that by default this modifies the Raster object in-place.
- Parameters
names (dict) – dict of old_name : new_name
in_place (bool (default True)) – Whether to change names of the Raster object in-place or leave original and return a new Raster object.
- Returns
Returned only if in_place is False
- Return type
pyspatialml.Raster
-
to_crs
(crs, resampling='nearest', file_path=None, driver='GTiff', nodata=None, n_jobs=1, warp_mem_lim=0, progress=False, **kwargs)¶ Reprojects a Raster object to a different crs.
- Parameters
crs (rasterio.crs.CRS object, or dict) – Example: CRS({‘init’: ‘EPSG:4326’})
resampling (str (default 'nearest')) – Resampling method to use. One of the following: nearest, bilinear, cubic, cubic_spline, lanczos, average, mode, max (GDAL >= 2.2), min (GDAL >= 2.2), med (GDAL >= 2.2), q1 (GDAL >= 2.2), q3 (GDAL >= 2.2)
file_path (str (optional, default None)) – Optional path to save reprojected Raster object. If not specified then a tempfile is used.
driver (str (default 'GTiff')) – Name of GDAL-supported driver for file export.
nodata (any number (optional, default None)) – Nodata value for new dataset. If not specified then the existing nodata value of the Raster object is used, which can accommodate the dtypes of the individual layers in the Raster.
n_jobs (int (default 1)) – The number of warp worker threads.
warp_mem_lim (int (default 0)) – The warp operation memory limit in MB. Larger values allow the warp operation to be carried out in fewer chunks. The amount of memory required to warp a 3-band uint8 2000 row x 2000 col raster to a destination of the same size is approximately 56 MB. The default (0) means 64 MB with GDAL 2.2.
progress (bool (default False)) – Optionally show progress of transform operations.
kwargs (opt) – Optional named arguments to pass to the format drivers. For example can be compress=”deflate” to add compression.
- Returns
Raster following reprojection.
- Return type
pyspatialml.Raster
-
write
(file_path, driver='GTiff', dtype=None, nodata=None, **kwargs)¶ Write the Raster object to a file.
Overrides the BaseRaster class write method, which is a partial function of the rasterio.DatasetReader.write method.
- Parameters
file_path (str) – File path used to save the Raster object.
driver (str (default is 'GTiff')) – Name of GDAL driver used to save Raster data.
dtype (str (opt, default None)) – Optionally specify a numpy compatible data type when saving to file. If not specified, a data type is selected based on the data types of RasterLayers in the Raster object.
nodata (any number (opt, default None)) – Optionally assign a new nodata value when saving to file. If not specified a nodata value based on the minimum permissible value for the data types of RasterLayers in the Raster object is used. Note that this does not change the pixel nodata values of the raster, it only changes the metadata of what value represents a nodata pixel.
kwargs (opt) – Optional named arguments to pass to the format drivers. For example can be compress=”deflate” to add compression.
- Returns
New Raster object from saved file.
- Return type
pyspatialml.Raster
-
pyspatialml.rasterlayer module¶
-
class
pyspatialml.rasterlayer.
RasterLayer
(band)¶ Bases:
pyspatialml.base.BaseRaster
Represents a single raster band derived from a single or multi-band raster dataset
Simple wrapper around a rasterio.Band object with additional methods. Used because the rasterio.Band.ds.read method reads all bands from a multi-band dataset, whereas the RasterLayer read method only reads a single band.
Methods encapsulated in RasterLayer objects represent those that typically would only be applied to a single-band of a raster, i.e. sieve-clump, distance to non-NaN pixels, or arithmetic operations on individual layers.
-
bidx
¶ The band index of the RasterLayer within the file dataset.
- Type
int
-
dtype
¶ The data type of the RasterLayer.
- Type
str
-
nodata
¶ The number that is used to represent nodata pixels in the RasterLayer.
- Type
any number
-
file
¶ The file path to the dataset.
- Type
str
-
ds
¶ The underlying rasterio.Band object.
- Type
rasterio.Band
-
driver
¶ The name of the GDAL format driver.
- Type
str
-
meta
¶ A python dict storing the RasterLayer metadata.
- Type
dict
-
cmap
¶ The name of a matplotlib colormap, or a custom matplotlib.colors.LinearSegmentedColormap or ListedColormap object.
- Type
str
-
norm
¶ A matplotlib.colors.Normalize to apply to the RasterLayer. This overrides the norm attribute of the RasterLayer.
- Type
matplotlib.colors.Normalize (opt)
- : int
Number of layers; always equal to 1.
-
close
()¶ Close the RasterLayer for reading/writing
-
plot
(cmap=None, norm=None, ax=None, cax=None, figsize=None, out_shape=(100, 100), categorical=None, legend=False, vmin=None, vmax=None, fig_kwds=None, legend_kwds=None)¶ Plot a RasterLayer using matplotlib.pyplot.imshow
- Parameters
cmap (str (default None)) – The name of a colormap recognized by matplotlib. Overrides the cmap attribute of the RasterLayer.
norm (matplotlib.colors.Normalize (opt)) – A matplotlib.colors.Normalize to apply to the RasterLayer. This overrides the norm attribute of the RasterLayer.
ax (matplotlib.pyplot.Artist (optional, default None)) – Axes instance on which to draw the plot.
cax (matplotlib.pyplot.Artist (optional, default None)) – axes on which to draw the legend.
figsize (tuple of integers (optional, default None)) – Size of the matplotlib.figure.Figure. If the ax argument is given explicitly, figsize is ignored.
out_shape (tuple, default=(100, 100)) – Number of rows, cols to read from the raster datasets for plotting.
categorical (bool (optional, default None)) – If True, the raster values are treated as discrete values; otherwise they are treated as continuous values. This overrides the RasterLayer ‘categorical’ attribute. Setting the argument categorical to True has no effect if RasterLayer.categorical is already True.
legend (bool (optional, default False)) – Whether to plot the legend.
vmin (scalar (optional, default None)) – vmin and vmax define the data range that the colormap covers. By default, the colormap covers the complete value range of the supplied data. vmin and vmax are ignored if the norm parameter is used.
vmax (scalar (optional, default None)) – vmin and vmax define the data range that the colormap covers. By default, the colormap covers the complete value range of the supplied data. vmin and vmax are ignored if the norm parameter is used.
fig_kwds (dict (optional, default None)) – Additional arguments to pass to the matplotlib.pyplot.figure call when creating the figure object. Ignored if ax is passed to the plot function.
legend_kwds (dict (optional, default None)) – Keyword arguments to pass to matplotlib.pyplot.colorbar().
- Returns
ax
- Return type
matplotlib axes instance
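As a rough illustration of how vmin and vmax bound the range that the colormap covers, the sketch below rescales values linearly so that vmin maps to 0 and vmax maps to 1, clipping anything outside that range. It uses plain NumPy rather than matplotlib or pyspatialml, and the helper name rescale is hypothetical.

```python
import numpy as np

def rescale(values, vmin=None, vmax=None):
    """Linearly map values so that vmin -> 0 and vmax -> 1 (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Fall back to the full data range when no limits are given
    vmin = np.nanmin(values) if vmin is None else vmin
    vmax = np.nanmax(values) if vmax is None else vmax
    scaled = (values - vmin) / (vmax - vmin)
    # Values outside [vmin, vmax] are clipped to the ends of the range
    return np.clip(scaled, 0.0, 1.0)

band = np.array([[0.0, 50.0], [100.0, 200.0]])
full_range = rescale(band)                      # limits taken from the data
narrow_range = rescale(band, vmin=0, vmax=100)  # 200.0 saturates at 1.0
```

Narrowing the range stretches the colormap over fewer values, which increases contrast at the cost of saturating pixels outside the limits.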
-
read
(**kwargs)¶ Read method for a single RasterLayer.
Reads the pixel values from a RasterLayer into a ndarray that always will have two dimensions in the order of (rows, columns).
- Parameters
**kwargs – Named arguments that can be passed to the rasterio.DatasetReader.read method.
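The difference in shape between reading a whole multi-band dataset and reading a single layer can be illustrated with NumPy alone. The array below merely stands in for a multi-band raster; no pyspatialml or rasterio call is made, and the band index is assumed here to be 1-based, following rasterio's convention.

```python
import numpy as np

# Stand-in for a 3-band raster stored as (bands, rows, columns)
dataset = np.arange(3 * 4 * 5).reshape(3, 4, 5)

bidx = 2  # band index, assumed 1-based as in rasterio
layer = dataset[bidx - 1]  # a single band is two-dimensional: (rows, columns)

print(dataset.shape, layer.shape)
```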
-
write
(file_path, driver='GTiff', dtype=None, nodata=None, **kwargs)¶ Write method for a single RasterLayer.
- Parameters
file_path (str) – File path to save the dataset.
driver (str) – GDAL-compatible driver used for the file format.
dtype (str (opt)) – Numpy dtype used for the file. If omitted then the RasterLayer’s dtype is used.
nodata (any number (opt)) – A value used to represent the nodata pixels. If omitted then the RasterLayer’s nodata value is used (if assigned already).
kwargs (opt) – Optional named arguments to pass to the format drivers. For example, compress="deflate" adds compression.
- Returns
- Return type
pyspatialml.RasterLayer
-
pyspatialml.transformers module¶
-
class
pyspatialml.transformers.
GeoDistTransformer
(ref_xs=None, ref_ys=None, log=False)¶ Bases:
sklearn.base.BaseEstimator
,sklearn.base.TransformerMixin
Transformer to add new features based on geographical distances to reference locations.
- Parameters
ref_xs (list) – A list of x-coordinates of reference locations.
ref_ys (list) – A list of y-coordinates of reference locations.
log (bool (opt), default=False) – Optionally log-transform the distance measures.
- Returns
X_new – array of shape (n_samples, n_features) with new geodistance features appended to the right-most columns of the array.
- Return type
ndarray
-
fit
(X, y=None)¶
-
transform
(X, y=None)¶
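The idea of appending distance features can be sketched in NumPy as follows. This is not the library's implementation: it assumes that the first two columns of X hold the x and y coordinates, that distances are Euclidean, and that the log option uses the natural logarithm. The function name add_geodist_features is hypothetical.

```python
import numpy as np

def add_geodist_features(X, ref_xs, ref_ys, log=False):
    """Append the distance to each reference location as new right-most columns.

    Assumption: columns 0 and 1 of X are the x and y coordinates.
    """
    X = np.asarray(X, dtype=float)
    refs = np.column_stack([ref_xs, ref_ys])      # shape (n_refs, 2)
    diffs = X[:, None, :2] - refs[None, :, :]     # shape (n_samples, n_refs, 2)
    dists = np.sqrt((diffs ** 2).sum(axis=2))     # Euclidean distances
    if log:
        # Natural log assumed; a zero distance would give -inf
        dists = np.log(dists)
    return np.hstack([X, dists])

X = [[0.0, 0.0, 10.0],
     [3.0, 4.0, 20.0]]
X_new = add_geodist_features(X, ref_xs=[0.0], ref_ys=[0.0])
```

Each reference location adds one column, so X_new here has one more column than X.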
pyspatialml.vector module¶
-
pyspatialml.vector.
filter_points
(gdf, min_dist=0, remove='first')¶ Filter points in a GeoDataFrame using a minimum distance buffer.
- Parameters
gdf (Geopandas GeoDataFrame) – Containing point geometries.
min_dist (int or float, optional (default=0)) – Minimum distance by which to filter out closely spaced points.
remove (str, optional (default='first')) – Optionally choose to remove ‘first’ occurrences or ‘last’ occurrences.
- Returns
xy – NumPy array of filtered coordinates.
- Return type
2d array-like
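Minimum-distance filtering can be sketched as a greedy pass over plain coordinate pairs. This is an illustration of the general technique only, not the function's actual algorithm: it works on a list of (x, y) tuples rather than a GeoDataFrame, it does not model the remove option, and the name thin_points is hypothetical.

```python
import numpy as np

def thin_points(xy, min_dist=0):
    """Keep a point only if it lies at least min_dist from every point kept so far."""
    xy = np.asarray(xy, dtype=float)
    kept = []
    for point in xy:
        # np.hypot gives the Euclidean distance from the x and y offsets
        if all(np.hypot(*(point - other)) >= min_dist for other in kept):
            kept.append(point)
    return np.array(kept)

coords = [(0, 0), (0.5, 0), (3, 0), (3, 0.2), (10, 10)]
filtered = thin_points(coords, min_dist=1)
```

Because the pass is greedy, the result depends on the order of the input points: the earlier point of each closely spaced pair is the one retained.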
-
pyspatialml.vector.
get_random_point_in_polygon
(poly)¶ Generates a random shapely Point geometry object within a single shapely Polygon object.
- Parameters
poly (Shapely Polygon object) –
- Returns
p
- Return type
Shapely Point object
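One common way to draw a random point inside a polygon is rejection sampling: draw uniformly within the bounding box and retry until the point falls inside. Whether the function uses this approach is an assumption. The sketch below uses only the standard library, represents the polygon as a list of (x, y) vertices, and both function names are hypothetical; with shapely, poly.bounds and poly.contains(Point(x, y)) would play the same roles.

```python
import random

def point_in_polygon(x, y, vertices):
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_point_in_polygon(vertices, rng=random):
    """Sample uniformly in the bounding box until a point lands inside."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    while True:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, vertices):
            return x, y

triangle = [(0, 0), (4, 0), (0, 4)]
x, y = random_point_in_polygon(triangle, rng=random.Random(0))
```

The expected number of retries grows as the polygon fills less of its bounding box, so long, thin, or highly concave polygons are slower to sample.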