Deep and Shallow Networks
When and Why Are Deep Networks Better than Shallow Ones? (Hrushikesh Mhaskar, Qianli Liao, Tomaso Poggio)
This paper gives a technically thorough account of the advantages of Deep Learning technology over shallow neural models, describing when DL is preferable and why. It is an in-depth, balanced technical analysis that supports neither those who consider shallow neural models "obsolete" nor those who consider DL a kind of "alchemy".
OUR HISTORY AND OUR THOUGHT

We started doing research and development with neural networks in the 1980s, and in the first years of activity the most widely used model was, of course, the Multilayer Perceptron trained with the Error Back-Propagation algorithm. In those years there was no big data, and the best that could be obtained for pattern-recognition tests on images was the MNIST set of handwritten digits. The most we could have as an accelerator for our 33 MHz Intel 386DX was an ISA card with a DSP (Digital Signal Processor): since the DSP was designed to compute the FFT (Fast Fourier Transform), it was optimized for multiply-accumulate (MAC) operations, which allowed us to speed up the computation of each single neuron.
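To see why a MAC unit helps, recall that a single neuron's pre-activation is exactly a multiply-accumulate loop over its inputs and weights. The minimal sketch below (plain Python, purely illustrative, not the code that ran on the DSP) makes this explicit.

```python
def neuron_output(inputs, weights, bias, activation=lambda s: max(0.0, s)):
    """Compute one neuron's output as a multiply-accumulate (MAC) loop.

    Each iteration performs one multiplication and one accumulation,
    which is precisely the operation a DSP's MAC unit accelerates.
    """
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w          # one MAC per synapse
    return activation(acc)

# Example: a 3-input neuron with a ReLU-like activation
print(neuron_output([0.5, -1.0, 2.0], [0.8, 0.1, -0.3], bias=0.2))
```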
We developed alternatives and additions to the Error Back-Propagation algorithm, such as Genetic Algorithms and Simulated Annealing, which were triggered automatically to escape local minima. We dreamed of feeding the image pixels directly to the network, assuming at least ten hidden layers: the software to do all this was ready, but there were not enough images, and the memory and computing power of the computers we had were orders of magnitude below what was required. Despite these limitations, with a Multilayer Perceptron with two hidden layers we created one of the first infrared image recognition systems for land mine detection.
In a second phase we mainly used shallow neural networks with supervised and unsupervised learning algorithms: from Adaptive Resonance Theory (ART) to Self-Organizing Maps (SOM), from Support Vector Machines (SVM) to Extreme Learning Machines (ELM), from Radial Basis Functions (RBF) to Restricted Coulomb Energy (RCE) classifiers. Today we consider Convolutional Neural Networks (CNN), which are essentially an evolution of Kunihiko Fukushima's Neocognitron (1979), the most effective solution for image recognition where large labelled datasets are available.
There are quality-control contexts in industry where the number of available images is not sufficient for a DL solution, so other methodologies must be used. DL technology has amply proven to be extremely effective, but it is equally vulnerable and therefore unreliable in safety-critical applications (DARPA GARD Program). There are contexts in which, for ethical or moral reasons, or because humans and algorithms must cooperate, the inference of a neural network must be explainable (DARPA XAI Program). There are contexts (AEROSPACE) in which current GPUs cannot operate because they are too vulnerable to cosmic radiation, and we need algorithms that remain efficient on radiation-hardened processors, which typically run at very low clock frequencies.
OUR CURRENT TECHNOLOGY

An application based entirely on DEEP LEARNING is unable to learn new data in real time. However effective this solution may be, it can be specialized only for a single task and cannot evolve in real time from experience, as the human brain does. It is, in effect, a computer program written by learning from data. Our technologies derive from neuronal models based on three fundamental theories:
1) Hebb's rule (Donald O. Hebb)
2) ART (Adaptive Resonance Theory) (Stephen Grossberg / Gail Carpenter)
3) RCE (Restricted Coulomb Energy) (Leon Cooper)
We are inclined to use Deep Learning technology only in the field of image processing, and even there our technology uses DL solely as a tool for feature extraction (e.g. DESERT™). In image processing, obtaining explainable inference is decidedly harder, even when traditional feature-extraction methods are used. The decision layer is always based on our SHARP™, LASER™ and ROCKET™ classification algorithms. In this way we combine the potential of Deep Learning technology with the continuous-learning ability of our classifier models. This approach also makes the inference more robust, since it is no longer bound to the Error Back-Propagation algorithm.
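As a rough illustration of this kind of architecture (a frozen deep feature extractor feeding a prototype-based decision layer that keeps learning), here is a minimal sketch. It is not SHARP™, LASER™ or ROCKET™, whose internals are proprietary; it is a generic nearest-prototype classifier in plain Python, with the deep feature extractor left as a placeholder.

```python
import numpy as np

class PrototypeClassifier:
    """Generic nearest-prototype classifier (RCE/ART-flavoured sketch).

    New examples can be learned at any time by committing a new prototype;
    no gradient descent or back-propagation is involved.
    """

    def __init__(self, radius=1.0):
        self.radius = radius          # influence field of each prototype
        self.prototypes = []          # list of (feature_vector, label)

    def learn(self, feature_vec, label):
        """Continuous learning: store a new prototype in O(1)."""
        self.prototypes.append((np.asarray(feature_vec, dtype=np.float32), label))

    def classify(self, feature_vec):
        """Return the label of the closest prototype within its radius, else None."""
        x = np.asarray(feature_vec, dtype=np.float32)
        best_label, best_dist = None, self.radius
        for proto, label in self.prototypes:
            d = np.linalg.norm(x - proto)
            if d <= best_dist:
                best_label, best_dist = label, d
        return best_label

def extract_features(image):
    """Placeholder for a frozen deep feature extractor (e.g. a pre-trained CNN)."""
    return np.asarray(image, dtype=np.float32).ravel()

# Usage: the deep network only produces features; the decision layer keeps learning.
clf = PrototypeClassifier(radius=2.0)
clf.learn(extract_features([0.0, 0.1, 0.0]), "defect")
clf.learn(extract_features([1.0, 0.9, 1.0]), "good")
print(clf.classify(extract_features([0.9, 1.0, 1.1])))   # -> "good"
```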
Reading "When and Why Are Deep Networks Better than Shallow Ones?", you can understand that shallow neural networks can solve the same problems as deep neural networks, since both are universal approximators. The price to pay for using shallow neural networks is that the number of parameters (synapses) grows almost exponentially as the complexity of the problem increases. The same does not happen for a deep neural network.
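To make the contrast concrete, the complexity bounds in the paper can be summarized roughly as follows, paraphrased up to constants for a compositional target function of n variables built from two-variable constituent functions of smoothness m, approximated to accuracy ε (see the paper for the precise statements and hypotheses):

```latex
N_{\text{shallow}} = O\!\left(\varepsilon^{-n/m}\right)
\qquad\text{vs.}\qquad
N_{\text{deep}} = O\!\left((n-1)\,\varepsilon^{-2/m}\right)
```

In other words, the number of units grows exponentially with the input dimension for the shallow network, but only linearly for a deep network matched to the compositional structure of the function.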
If we analyze the actual number of parameters needed to solve the most complex problems (those currently solved with deep neural networks) using a shallow neural classifier, we realize that the real obstacle is the lack of a SIMD (Single Instruction Multiple Data) machine able to process all those parameters simultaneously.
Are we therefore facing a technological problem? Yes, just as there was a time when no adequate GPUs existed to implement learning in deep neural networks. Today we do not have SIMD processors with millions of Processing Elements (PE), let alone billions; such devices would be enough to bridge the gap in computational capacity between deep neural networks and shallow neural networks. But which technologies have reached those numbers? RAM and FLASH memory technologies. MYTHOS™ technology uses memory to accelerate the execution speed of prototype-based neural classifiers exponentially. MYTHOS™ technology pays off when the neural classifier has to learn BIG DATA or HUGE DATA: the prototype scan time grows only linearly while the number of prototypes grows exponentially. MYTHOS™ technology, together with Neuromem® technology and NAND-FLASH memory, allows the input pattern to be "broadcast" to billions of prototypes in constant time.
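Purely as an illustration of what such a broadcast amounts to (not the MYTHOS™ or Neuromem® implementation, which rely on dedicated memory hardware), the sketch below uses NumPy to compare one input pattern against a whole bank of prototypes in a single vectorized pass; in hardware the same comparison is performed in parallel across all prototypes rather than by scanning them one at a time.

```python
import numpy as np

# Hypothetical sizes: a bank of 100,000 prototypes of dimension 256.
N_PROTOTYPES, DIM = 100_000, 256
rng = np.random.default_rng(0)
prototypes = rng.standard_normal((N_PROTOTYPES, DIM)).astype(np.float32)
labels = rng.integers(0, 10, size=N_PROTOTYPES)

def broadcast_match(pattern, prototypes, labels):
    """Compare one input pattern against every prototype at once.

    NumPy broadcasting evaluates all distances in one vectorized pass,
    mimicking (in software) a SIMD / in-memory parallel comparison.
    """
    # L1 (Manhattan) distance of the pattern to every prototype, row-wise.
    distances = np.abs(prototypes - pattern).sum(axis=1)
    best = int(np.argmin(distances))
    return labels[best], float(distances[best])

pattern = rng.standard_normal(DIM).astype(np.float32)
label, dist = broadcast_match(pattern, prototypes, labels)
print(f"closest prototype label={label}, L1 distance={dist:.2f}")
```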
Our technological response is application dependent. We are mainly oriented towards defence and aerospace applications, where we want to use MYTHOS™ technology with Radiation-Hardened CPUs. We design and build hardware solutions aimed at supporting DL in RAD-HARD devices for aerospace applications. MIND™, with shallow (Neuromem®) NN accelerators and deep (TPU) NN accelerators, is a RAD-HARD AI device for aerospace applications.