Machine Learning Regression Algorithms Toolbox

 

Biophysical parameter mapping from optical remote sensing images always requires an intermediate modeling step to transform spectral observations into useful estimates. This step can be approached with statistical, physical, or hybrid methods. The emphasis here is on statistical methods, which can be categorized as either parametric or non-parametric.

 

The machine learning regression algorithms (MLRAs) assessment toolbox presented here provides a suite of non-parametric techniques for semiautomatic mapping of surface biophysical variables.

Non-parametric models are adjusted to predict a variable of interest using a training dataset of input-output data pairs, which come from concurrent measurements of the parameter and the corresponding radiometric observation.

MLRAs have the potential to generate adaptive, robust relationships and, once trained, they are very fast to apply. Typically, MLRAs are able to cope with the strong nonlinearity of the functional dependence between the biophysical variable and the observed reflected radiance. They may therefore be powerful candidates for mapping applications.
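To make the idea of learning from input-output pairs concrete, the following is a minimal Python sketch of k-nearest neighbors regression, one of the simplest non-parametric regressors. It is not the toolbox's Matlab code: the function name `knn_predict` and the synthetic data (a single band value with a quadratic target) are assumptions chosen only for illustration.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean of its k nearest training targets."""
    # Euclidean distance between the query spectrum and every training spectrum.
    dists = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    dists.sort(key=lambda pair: pair[0])
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Synthetic, made-up training pairs: one "band" value x and a target that
# depends on it nonlinearly (y = x^2). These are not real measurements.
train_x = [(i / 10.0,) for i in range(11)]   # 0.0, 0.1, ..., 1.0
train_y = [x[0] ** 2 for x in train_x]

# Estimate the variable for an unseen observation at x = 0.5.
estimate = knn_predict(train_x, train_y, (0.5,), k=3)
print(round(estimate, 4))
```

No functional form is assumed here: the prediction is driven entirely by the training pairs, which is why such models can follow a nonlinear relationship without it being specified in advance.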

 


Conceptual design of MLRA-based biophysical variable mapping.

 

The MLRA toolbox requires training data to train an advanced regression model, i.e. an MLRA. Training data may originate from simulations, e.g. as generated by the optical radiative transfer models in ARTMO, or from field campaigns. The trained model can then be validated and applied to a remote sensing image to enable mapping.
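The train, validate, and apply sequence described above can be sketched in a few lines of Python. This is an illustrative outline only, not the toolbox's implementation: a simple least-squares line stands in for a real MLRA, the data are synthetic, and the names `fit_line`, `rmse`, `image`, and `variable_map` are hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a * x + b for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def rmse(pred, obs):
    """Root mean squared error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

# 1) Training data: synthetic pairs standing in for simulations or field data.
rng = random.Random(0)
band = [i / 20.0 for i in range(20)]
variable = [4.0 * x + 1.0 + rng.gauss(0.0, 0.05) for x in band]

# 2) Hold out every fourth sample for validation, train on the rest.
pairs = list(zip(band, variable))
train = [p for i, p in enumerate(pairs) if i % 4 != 0]
valid = [p for i, p in enumerate(pairs) if i % 4 == 0]

a, b = fit_line([x for x, _ in train], [y for _, y in train])
valid_rmse = rmse([a * x + b for x, _ in valid], [y for _, y in valid])

# 3) Apply the trained model pixel by pixel to a tiny 2 x 3 "image".
image = [[0.1, 0.5, 0.9],
         [0.2, 0.4, 0.6]]
variable_map = [[a * pixel + b for pixel in row] for row in image]
print(round(valid_rmse, 3))
```

Swapping `fit_line` for a non-parametric regressor would leave the three steps unchanged, which is the point of the sketch.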

 


ARTMO’s MLRA toolbox v.1.28.

The MLRA toolbox is based on the simple Regression toolbox (simpleR), developed by Gustavo Camps-Valls.

SimpleR contains a set of Matlab functions that illustrate the capabilities of the following statistical regression algorithms:

 

  • Least squares linear regression
  • Partial least squares regression
  • Regularized least-squares regression
  • Principal components regression
  • Elastic Net regression
  • Adaptive Regression Splines
  • K-nearest neighbors regression
  • Weighted k-nearest neighbors regression
  • Regression tree
  • Regression tree (LS boosting)
  • Boosting trees
  • Bagging trees
  • Gradient Boosting/Boosted Trees
  • Random Forest (TreeBagger)
  • Canonical Correlation Forests
  • Extreme Learning Machine
  • Neural Network
  • Relevance Vector Machine
  • Support Vector Regression - Matlab
  • Kernel Ridge Regression
  • Kernel signal to noise ratio
  • Gaussian Processes Regression
  • Gaussian Processes Regression - Matlab
  • Sparse Spectrum Gaussian Process Regression
  • Warped Gaussian Processes regression
  • Twin Gaussian process

In short, the MLRA toolbox makes it possible to:

  • Apply and evaluate multiple MLRAs according to customized training strategies, e.g. with different noise levels and train/validation partitioning.
  • Choose between single-output and multi-output models.
  • Use training data from radiative transfer models, from field measurements, or from a mix of both.
  • Optimize a distinct MLRA for each land cover class, if a land cover map is provided.
  • Analyze multiple MLRA strategies against a validation dataset, when one is available, using goodness-of-fit statistics. Results are stored in a relational database.
  • Load the best-performing strategy and apply it to an image, or develop a model directly and apply it to an image, for mapping applications.
  • Use an active learning module and an automated band analysis tool (GPR-BAT), both added from v. 1.17 onwards.
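
As an illustration of comparing strategies against validation data, the Python sketch below scores three hypothetical strategies with two common goodness-of-fit statistics and picks the best one. The choice of RMSE and R² is an assumption, since the text does not name the statistics used. The strategy names and all numbers are made up for the example.

```python
import math

def rmse(pred, obs):
    """Root mean squared error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def r_squared(pred, obs):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Made-up validation values and predictions from three hypothetical strategies.
observed = [1.0, 2.0, 3.0, 4.0]
strategies = {
    "strategy_a": [1.1, 1.9, 3.2, 3.8],
    "strategy_b": [1.5, 2.5, 2.5, 3.5],
    "strategy_c": [1.0, 2.0, 3.0, 4.5],
}

# Score every strategy, then select the one with the lowest RMSE.
scores = {
    name: {"rmse": rmse(pred, observed), "r2": r_squared(pred, observed)}
    for name, pred in strategies.items()
}
best = min(scores, key=lambda name: scores[name]["rmse"])
print(best, scores[best])
```

Ranking by RMSE alone is a simplification; in practice several statistics are usually inspected together before a strategy is selected for mapping.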