Extreme Learning Machine

by Kemal Erdem | @burnpiro

ELM what?

  • Guang-Bin Huang et al. - "Extreme learning machine: Theory and applications", 2006
  • Does not depend on backpropagation!
ML Basics - SLFN

Single Hidden Layer Feedforward Neural Network © Shifei Ding under CC BY 3.0

ELM

ELM structure © Shifei Ding under CC BY 3.0

Theorem

$$ \| H_{N \times L}\beta_{L \times m} - T_{N \times m} \| < \epsilon, \quad \epsilon > 0 $$

If $g$ is infinitely differentiable, then for randomly assigned $w_i$, $b_i$ and $L \leq N$ hidden nodes there exists $\beta$ that makes the error above smaller than any $\epsilon > 0$; for $L = N$, $H$ is square and invertible, so the training error is exactly zero.

More on https://erdem.pl/2020/05/introduction-to-extreme-learning-machines
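
A quick numerical sanity check of the theorem (a minimal NumPy sketch with illustrative names; with $L = N$ and a sigmoid $g$, randomly drawn weights make $H$ invertible with probability one):

```python
import numpy as np

rng = np.random.default_rng(0)
N = L = 20                      # as many hidden nodes as samples
d, m = 3, 2                     # input and output dimensions

X = rng.normal(size=(N, d))     # N distinct training inputs
T = rng.normal(size=(N, m))     # training targets
W = rng.normal(size=(d, L))     # random input weights w_i
b = rng.normal(size=L)          # random biases b_i

H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid g; H is N x L (square here)
beta = np.linalg.solve(H, T)            # exact solve: H is invertible w.p. 1

print(np.linalg.norm(H @ beta - T))     # ~1e-10: training error is ~zero
```
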
ELM - Equations

$$ \small o_j = \sum_{i=1}^{L}\beta_i g_i(x_j) = \sum_{i=1}^{L}\beta_i g(w_i \cdot x_j + b_i), \quad j = 1, \dots, N $$

$$ T = H\beta $$ where $H$ is called the hidden layer output matrix

ELM - Vector version

$$ H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L} $$ $$ \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m} \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m} $$
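
The matrices above translate directly to NumPy (a minimal sketch; names are illustrative):

```python
import numpy as np

def hidden_output_matrix(X, W, b, g=np.tanh):
    """H[j, i] = g(w_i . x_j + b_i): X is N x d, W is d x L, b has length L."""
    return g(X @ W + b)   # N x L: one row per sample, one column per node

# With H (N x L), beta (L x m) and T (N x m), the network output is H @ beta.
```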

Matrix inverse solution

$$|| H\hat\beta - T || = \min_{\beta}|| H\beta - T ||$$

$$ \hat\beta = H^{\dagger}T $$ where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$ (ELM-NC-2006, Theorem 5.1; see C.R. Rao and S.K. Mitra, 1971)
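
In code, this minimum-norm least-squares solution is one call to the pseudoinverse (a sketch with stand-in matrices; `np.linalg.lstsq` solves the same minimization directly):

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(100, 20))   # stand-in hidden layer output, N x L
T = rng.normal(size=(100, 3))    # stand-in targets, N x m

beta_hat = np.linalg.pinv(H) @ T          # beta = H^dagger T

# np.linalg.lstsq minimizes ||H beta - T|| directly and agrees with it
beta_ls, *_ = np.linalg.lstsq(H, T, rcond=None)
assert np.allclose(beta_hat, beta_ls)
```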

ELM algorithm

  1. Randomly assign input weights $w_i$ and biases $b_i$, $i = 1, \dots, L$
  2. Calculate the hidden layer output matrix $H$
  3. Calculate the output weight matrix $\hat\beta = H^\dagger T$
  4. Use $\hat\beta$ to make predictions on new data: $T = H\hat\beta$
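
Put together, the four steps fit in a few lines of NumPy (a minimal sketch with a sigmoid activation, not the notebook's exact code):

```python
import numpy as np

class ELM:
    """Minimal ELM sketch: random hidden layer + least-squares readout."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Step 1: random input weights and biases; these are never trained
        self.W = rng.normal(size=(n_inputs, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.beta = None

    def _hidden(self, X):
        # Step 2: hidden layer output matrix H (N x L), sigmoid activation
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        # Step 3: output weights via the Moore-Penrose pseudoinverse
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        # Step 4: T = H beta on new data
        return self._hidden(X) @ self.beta
```

For classification, $T$ is typically one-hot encoded and the predicted class is the argmax of each output row.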

Live example

Offline version available here: https://github.com/burnpiro/elm-pure/blob/master/ELM%20example.ipynb
Network performance

$$ \begin{array}{|l|r|r|r|r|} \hline \text{Problem} & \text{Training samples} & \text{Testing samples} & \text{Attributes} & \text{Classes} \\ \hline \text{Satellite image} & 4400 & 2000 & 36 & 7 \\ \hline \text{Image segmentation} & 1500 & 810 & 18 & 7 \\ \hline \text{Shuttle} & 43500 & 14500 & 9 & 7 \\ \hline \text{Banana} & 5200 & 490000 & 2 & 2 \\ \hline \end{array} $$

Testing env: Pentium 4, 1.9 GHz CPU (it's 2006!!!)
Network performance

$$ \begin{array}{|l|l|r|r|r|r|r|r|r|} \hline \text{Problem} & \text{Algorithm} & \text{Training [s]} & \text{Testing [s]} & \text{Acc train [\%]} & \text{Dev train [\%]} & \text{Acc test [\%]} & \text{Dev test [\%]} & \text{Nodes} \\ \hline \text{Satellite image} & \text{ELM} & 14.92 & 0.34 & 93.52 & 1.46 & 89.04 & 1.50 & 500 \\ & \text{BP} & 12561 & 0.08 & 95.26 & 0.97 & 82.34 & 1.25 & 100 \\ \hline \text{Image segment} & \text{ELM} & 1.40 & 0.07 & 97.35 & 0.32 & 95.01 & 0.78 & 200 \\ & \text{BP} & 4745.7 & 0.04 & 96.92 & 0.45 & 86.27 & 1.80 & 100 \\ \hline \text{Shuttle} & \text{ELM} & 5.740 & 0.23 & 99.65 & 0.12 & 99.40 & 0.12 & 50 \\ & \text{BP} & 6132.2 & 0.22 & 99.77 & 0.10 & 99.27 & 0.13 & 50 \\ \hline \text{Banana} & \text{ELM} & 2.19 & 20.06 & 92.36 & 0.17 & 91.57 & 0.25 & 100 \\ & \text{BP} & 6132.2 & 21.10 & 90.26 & 0.27 & 88.09 & 0.70 & 100 \\ \hline \end{array} $$
Early evolution of ELMs
  1. I-ELM (incremental), 2006 - adds new nodes to the hidden layer and freezes the existing ones
  2. P-ELM (pruning), 2008 - starts with a huge network and removes irrelevant nodes
  3. Regularized ELM, 2009 - $\hat\beta = \left (\frac{I}{C}+H^TH \right )^{-1} H^TT$ (see the sketch after this list)
  4. TS-ELM (two-stage), 2010 - combination of I-ELM and P-ELM
  5. V-ELM (voting), 2013 - trains many independent ELMs and makes the final decision by majority voting
  6. KELM (kernel), 2014 - replaces $HH^T$ with a kernel matrix
  7. ELM-AE (autoencoder), 2014 - unsupervised feature mapping
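
The regularized variant from point 3 is a ridge-style solve (a sketch assuming the common form with the $L \times L$ identity $I$ and regularization constant $C$):

```python
import numpy as np

def regularized_elm_beta(H, T, C=1e3):
    """Regularized ELM readout: beta = (I/C + H^T H)^{-1} H^T T."""
    L = H.shape[1]
    return np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
```
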
Changes in ELM structure - ELM-LC (local connections)

$$ H^{\dagger} = \begin{cases} H^T(HH^T)^{-1} & N \leq L \\ (H^TH)^{-1}H^T & N > L \end{cases} $$
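
Both closed forms agree with the SVD-based pseudoinverse whenever $H$ has full rank (a sketch; `np.linalg.pinv` is the general-purpose route):

```python
import numpy as np

def h_dagger(H):
    N, L = H.shape
    if N <= L:                                # fewer samples than nodes
        return H.T @ np.linalg.inv(H @ H.T)   # right inverse: H H^dagger = I
    return np.linalg.inv(H.T @ H) @ H.T       # left inverse: H^dagger H = I

H = np.random.default_rng(2).normal(size=(50, 10))
assert np.allclose(h_dagger(H), np.linalg.pinv(H))
```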

ELM goes Deep Learning (TELM, MELM, DELM)

A Multiple Hidden Layers Extreme Learning Machine Method and Its Application, 2017 - D. Xiao, B. Li, and Y. Mao
Benchmarks

$$ \begin{array}{|l|l|r|} \hline \text{Dataset} & \text{Algorithm} & \text{Acc test [\%]} \\ \hline \text{CIFAR-10} & \text{ELM 1000 (1x)} & 10.64 \\ & \text{ELM 3000 (20x)} & 71.40 \\ & \text{ELM 3500 (30x)} & 87.55 \\ & \text{ReNet (2015)} & 87.65 \\ & \text{EfficientNet (2019)} & 98.90 \\ \hline \text{MNIST} & \text{ELM 512} & 92.15 \\ & \text{DELM 15000} & 99.43 \\ & \text{RNN} & 99.55 \\ & \text{BP 6-layer 5700} & 99.65 \\ \hline \end{array} $$

Image Super-Resolution by KELM

Image super-resolution by extreme learning machine, 2012 - Le An, Bir Bhanu
Wind speed prediction in France

An efficient scenario-based and fuzzy self-adaptive learning particle swarm optimization approach for dynamic economic emission dispatch considering load and wind power uncertainties, 2013 - B. Bahmani-Firouzi, E. Farjah, R. Azizipanah-Abarghooee
Energy price forecasting

Electricity price forecasting with extreme learning machine and bootstrapping, 2012 - X. Chen, Z.Y. Dong, K. Meng, Y. Xu, K.P. Wong, H.W. Ngan
3D shape segmentation

3D Shape Segmentation and Labeling via Extreme Learning Machine, 2014 - Zhige Xie, Kai Xu, Ligang Liu, Yueshan Xiong
Object tracking

Extreme Learning Machine for Multilayer Perceptron, 2015 - J. Tang, C. Deng and G. Huang

References

  • "Extreme learning machine: Theory and applications" 2006 G.B. Huang, Q.Y. Zhu, C.K. Siew
  • “Extreme learning machine for regression and multiclass classification” 2012 - G.-B. Huang, H. Zhou, X. Ding and R. Zhang
  • "Clustering in Extreme Learning Machine Feature Space" 2014 - He Qing, Xin Jin, Changying Du, Fuzhen Zhuang and Zhongzhi Shi
  • “Deep Extreme Learning Machine and Its Application in EEG Classification” 2015 - S. Ding, N. Zhang, X. Xu, L. Guo and J. Zhang
  • "Extreme Learning Machine: A Review." 2017- Albadr, Musatafa & Tiuna, Sabrina.
  • “A Multiple Hidden Layers Extreme Learning Machine Method and Its Application” 2017 - Dong Xiao, Beijing Li and Yachun Mao
  • "Extreme Learning Machines" 2013 Erik Cambria, Guang-Bin Huang
  • “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks” 2019 - Mingxing Tan, Quoc V. Le
  • "ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks." 2015 - Visin, F.; Kastner, K.; Cho, K.; Matteucci, M.; Courville, A.; Bengio, Y.

References 2

  • "Deep, big, simple neural nets for handwritten digit recognition", 2010 - D.C. Cireşan, U. Meier, L.M. Gambardella, J. Schmidhuber
  • "Fast, Simple and Accurate Handwritten Digit Classification by Training Shallow Neural Network Classifiers with the 'Extreme Learning Machine' Algorithm", 2015 - M.D. McDonnell, M.D. Tissera, T. Vladusich, A. van Schaik, J. Tapson
  • "A Survey of Handwritten Character Recognition with MNIST and EMNIST", 2019 - Alejandro Baldominos, Yago Saez and Pedro Isasi
  • "An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels", 2014 - Guang-Bin Huang
  • "Extreme Learning Machine with Local Connections", 2018 - Feng Li, Sibo Yang, Huanhuan Huang, and Wei Wu
  • "Image super-resolution by extreme learning machine", 2012 - L. An and B. Bhanu
  • "An efficient scenario-based and fuzzy self-adaptive learning particle swarm optimization approach for dynamic economic emission dispatch considering load and wind power uncertainties", 2013 - B. Bahmani-Firouzi, E. Farjah, R. Azizipanah-Abarghooee
  • "Electricity price forecasting with extreme learning machine and bootstrapping", 2012 - X. Chen, Z.Y. Dong, K. Meng, Y. Xu, K.P. Wong, H.W. Ngan
  • "3D Shape Segmentation and Labeling via Extreme Learning Machine", 2014 - Zhige Xie, Kai Xu, Ligang Liu, Yueshan Xiong

Thanks

"There's no such thing as a stupid question!"

Kemal Erdem | @burnpiro
https://erdem.pl