Machine Learning Regression Algorithms Toolbox

Biophysical parameter mapping from optical remote sensing images always requires an intermediate modeling step to transform spectral observations into useful estimates. This modeling step can be approached with statistical, physical or hybrid methods. Here the emphasis is on statistical methods, which can be categorized into parametric and non-parametric approaches.

The machine learning regression algorithms (MLRAs) assessment toolbox presented here provides a suite of non-parametric techniques that enable semiautomatic mapping of surface biophysical variables.

Non-parametric models are trained to predict a variable of interest from a training dataset of input-output data pairs, which come from concurrent measurements of the parameter and the corresponding radiometric observation.
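
As a minimal sketch of the non-parametric idea, the Python snippet below predicts a variable as a Gaussian-kernel-weighted average of the training outputs (the Nadaraya-Watson estimator). The three-band spectra and LAI values are hypothetical toy data, the function name and the `sigma` bandwidth are illustrative choices, and this is only one common form of Gaussian kernel regression, not necessarily how the toolbox implements its Gaussian Kernel Regression option.

```python
import math

# Hypothetical training pairs: each input is a 3-band reflectance spectrum,
# each output a concurrently measured variable (e.g. LAI).
train_X = [
    [0.05, 0.08, 0.30],
    [0.04, 0.06, 0.40],
    [0.03, 0.05, 0.50],
]
train_y = [1.0, 2.0, 3.0]


def gaussian_kernel_regression(X, y, x_new, sigma=0.05):
    """Nadaraya-Watson estimate: Gaussian-kernel-weighted mean of training outputs."""
    weights = []
    for x in X:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_new))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # The query is so far from every training spectrum that all weights underflow.
        raise ValueError("query spectrum is outside the range of the training data")
    return sum(w * t for w, t in zip(weights, y)) / total


# A new spectrum lying between the second and third training samples.
estimate = gaussian_kernel_regression(train_X, train_y, [0.035, 0.055, 0.45])
```

No explicit functional form is assumed: the prediction is built directly from the stored input-output pairs, which is what makes the model non-parametric.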

MLRAs have the potential to generate adaptive, robust relationships and, once trained, they are very fast to apply. Typically, MLRAs are able to cope with the strong nonlinearity of the functional dependence between the biophysical variable and the observed reflected radiance. They may therefore be powerful candidates for mapping applications.
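
The point about nonlinearity can be illustrated with a toy example. It assumes a hypothetical saturating relation, signal = 1 - exp(-0.5 * LAI), standing in for real canopy reflectance (it is not output of a radiative transfer model), and compares a parametric straight-line fit of LAI against the signal with a non-parametric k-nearest neighbors regression. All names are illustrative.

```python
import math


def signal(lai):
    """Hypothetical saturating relation between LAI and a reflectance-like signal."""
    return 1.0 - math.exp(-0.5 * lai)


train_lai = [i * 0.1 for i in range(61)]          # LAI 0.0 ... 6.0
train_sig = [signal(v) for v in train_lai]
test_lai = [0.05 + i * 0.1 for i in range(60)]    # points between training LAI values
test_sig = [signal(v) for v in test_lai]


def fit_linear(x, y):
    """Ordinary least squares fit y = intercept + slope * x (a parametric model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def knn_predict(x_train, y_train, v, k=2):
    """Average the outputs of the k training samples whose signal is nearest to v."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - v))[:k]
    return sum(y_train[i] for i in nearest) / k


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


linear = fit_linear(train_sig, train_lai)
lin_rmse = rmse([linear(s) for s in test_sig], test_lai)
knn_rmse = rmse([knn_predict(train_sig, train_lai, s) for s in test_sig], test_lai)
```

On this noise-free toy data the k-nearest neighbors error is far below the linear one: the neighbors follow the curve locally, whereas a single straight line cannot follow the saturation at high LAI.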

Conceptual design of MLRA-based biophysical variable mapping.

The MLRA toolbox requires training data to train an advanced regression model (i.e. an MLRA). Training data may originate from simulations, e.g. as generated by the optical radiative transfer models in ARTMO, or from field campaigns. The trained model can then be validated and applied to a remote sensing image to enable mapping.
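
The train, validate and apply sequence can be sketched as follows. This is a simplified illustration, not the toolbox's code: it assumes hypothetical two-band (red, NIR) spectra, uses a 1-nearest-neighbor predictor as a stand-in for any trained MLRA, stores the image as nested lists of shape rows x cols x bands, and reports RMSE as the validation statistic. The function names `fit_knn`, `validate` and `apply_to_image` are illustrative.

```python
import math


def fit_knn(X, y, k=1):
    """Return a predictor that averages the outputs of the k nearest training spectra."""
    def predict(x):
        order = sorted(range(len(X)),
                       key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        return sum(y[i] for i in order[:k]) / k
    return predict


def validate(model, X_val, y_val):
    """Root-mean-square error of the model on independent validation pairs."""
    errors = [(model(x) - t) ** 2 for x, t in zip(X_val, y_val)]
    return math.sqrt(sum(errors) / len(errors))


def apply_to_image(model, image):
    """Map every pixel spectrum of a rows x cols x bands image to an estimate."""
    return [[model(pixel) for pixel in row] for row in image]


# Hypothetical training and validation pairs (red, NIR) -> LAI.
X_train = [[0.08, 0.20], [0.05, 0.35], [0.03, 0.50]]
y_train = [0.5, 2.0, 4.0]
X_val = [[0.07, 0.22], [0.03, 0.48]]
y_val = [0.6, 3.8]

model = fit_knn(X_train, y_train)      # 1. train
val_rmse = validate(model, X_val, y_val)  # 2. validate

# 3. apply to a hypothetical 2 x 2 pixel image to obtain a map
image = [[[0.08, 0.21], [0.04, 0.47]],
         [[0.05, 0.34], [0.03, 0.52]]]
lai_map = apply_to_image(model, image)
```

Because the model is just a callable, any other regression model could be swapped in without changing the validation or mapping steps.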

ARTMO’s MLRA toolbox v.1.28.

The MLRA toolbox is based on the simple Regression toolbox (simpleR) developed by Gustavo Camps-Valls.

SimpleR contains a set of functions in Matlab to illustrate the capabilities of the following statistical regression algorithms:

Least squares linear regression
Partial least squares regression
Regularized least-squares regression
Principal components regression
Elastic Net regression
Adaptive Regression Splines
K-nearest neighbors regression
Weighted k-nearest neighbors regression
Regression tree
Regression tree (LS boosting)
Boosting trees
Bagging trees
Gradient Boosting/Boosted Trees
Random Forest (TreeBagger)
Canonical Correlation Forests
Extreme Learning Machine
Bayesian Regularized Neural Networks
Neural Network
Neural Network (Adam)
Radial Basis Function Neural Networks
Relevance Vector Machine
Support Vector Regression - Matlab
Kernel Ridge Regression
Kernel Signal-to-Noise Ratio
Gaussian Kernel Regression
Gaussian Processes Regression
Gaussian Processes Regression - Matlab
Sparse Spectrum Gaussian Process Regression
Warped Gaussian Processes regression
Twin Gaussian process
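
To make one of the listed methods concrete, here is a textbook kernel ridge regression sketch with a Gaussian (RBF) kernel: the weights solve (K + λI)α = y, and a prediction is Σᵢ αᵢ k(xᵢ, x). It is written in plain Python with a small Gaussian-elimination solver. The kernel width `sigma`, the regularization `lam` and the toy 1-D data are assumptions, and the toolbox's Matlab implementation may differ, for instance in how hyperparameters are tuned.

```python
import math


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_krr(X, y, lam=1e-4, sigma=0.1):
    """Kernel ridge regression: alpha = (K + lam*I)^-1 y, f(x) = sum_i alpha_i k(x_i, x)."""
    n = len(X)
    K = [[rbf(X[i], X[j], sigma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)
    return lambda x: sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X))


# Toy 1-D example (hypothetical data): learn y = sin(3x) on x in [0, 1].
X = [[i / 10] for i in range(11)]
y = [math.sin(3 * x[0]) for x in X]
model = fit_krr(X, y)
```

With a small `lam` the fit nearly interpolates the training points; larger values trade training accuracy for smoothness. In practice `sigma` and `lam` would be tuned, for example by cross-validation, rather than fixed.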

In short, the MLRA toolbox offers the following:

Multiple MLRAs can be applied and evaluated according to customized training strategies, e.g. with different noise levels and train/validation partitions.
Either single-output or multi-output models can be chosen.
Data can come from radiative transfer models, from field measurements, or from a mix of both.
If a land cover map is provided, a distinct MLRA can be optimized for each land cover class.
When validation data are available, multiple MLRA strategies can be analyzed against the validation dataset using goodness-of-fit statistics. Results are stored in a relational database.
The best-performing strategy can be loaded and applied to an image, or a model can be developed and applied to an image directly, for mapping applications.
From v. 1.17 onwards, an active learning module and an automated band analysis tool (GPR-BAT) are included.
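
The strategy evaluation listed above can be sketched in Python under several assumptions: a hypothetical two-band dataset simulated from LAI, Gaussian noise added to the training spectra only (one possible noise scheme), a k-nearest neighbors model standing in for any MLRA, RMSE and R² as the goodness-of-fit statistics, and an in-memory SQLite table with a hypothetical `results` schema standing in for the toolbox's relational database.

```python
import math
import random
import sqlite3


def knn_fit(X, y, k=3):
    """Stand-in MLRA: average the outputs of the k nearest training spectra."""
    def predict(x):
        order = sorted(range(len(X)),
                       key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        return sum(y[i] for i in order[:k]) / k
    return predict


def goodness_of_fit(pred, obs):
    """Return (RMSE, R²) of predictions against observations."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return math.sqrt(ss_res / n), 1.0 - ss_res / ss_tot


def run_strategy(X, y, noise_std, train_frac, seed=0):
    """Shuffle, split into train/validation, add noise to training spectra, fit, score."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_train = int(train_frac * len(idx))
    tr, va = idx[:n_train], idx[n_train:]
    X_tr = [[v + rng.gauss(0.0, noise_std) for v in X[i]] for i in tr]
    model = knn_fit(X_tr, [y[i] for i in tr])
    pred = [model(X[i]) for i in va]
    return goodness_of_fit(pred, [y[i] for i in va])


# Hypothetical simulated dataset: two-band "spectra" generated from LAI.
data_rng = random.Random(42)
lai = [data_rng.uniform(0.0, 6.0) for _ in range(200)]
X = [[0.1 * math.exp(-0.4 * v), 1.0 - math.exp(-0.5 * v)] for v in lai]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (noise REAL, train_frac REAL, rmse REAL, r2 REAL)")
for noise in (0.0, 0.01, 0.05):
    for frac in (0.5, 0.7):
        rmse, r2 = run_strategy(X, lai, noise, frac)
        db.execute("INSERT INTO results VALUES (?, ?, ?, ?)", (noise, frac, rmse, r2))

# The lowest-RMSE strategy is the candidate to load and apply to an image.
best = db.execute(
    "SELECT noise, train_frac, rmse, r2 FROM results ORDER BY rmse LIMIT 1"
).fetchone()
```

Keeping every strategy's statistics in a table makes it straightforward to rank, filter or revisit strategies later instead of keeping only the final model.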