Introduction
Contributors
Hobson (hobs at totalgood dot com?Subject=BYO%20Brain%20Workshop)
Chick (chick at thewells dot org?Subject=BYO%20Brain%20Slides)
Thunder (melange.au.bleu at gmail dot com?Subject=BYO%20Brain%20Hyperopt%20Talk)
Zeke (me at ze6ke dot com?Subject=BYO%20Brain%20Workshop)
Total Good
Topics
Turing
Layers
Logic Gate
Biology
Neuron
Images
Learning
Time Series
Automata
XOR Fail
AI: "Artificial" intelligence -> "machine" intelligence
Think of "artificial" as "pretend".
Or a model.
Like the model you have for how society works.
If you are nice people will be nice to you.
If you work hard, you'll get paid by your boss.
Advanced Topics
Feedback/Recursion
Optimization
Reinforcement
Dropout
Bonus
NN Applications (by Zeke)
NN Tuning (by Thunder)
History
1943 MCP neuron (binary logic)
1949 Hebbian Learning
1958 Rosenblatt Perceptron
1964 Kernel Perceptron
1969 XOR fail
1972 ELIZA vs. Parry
AI Dies
We'll start with a single neuron cell, like biology.
Artificial neurons (perceptrons) need a way to learn.
Two neurons are better than one (required for XOR)
Then I'll show you how to leverage others' code to build a real brain
It's Alive!
1970 Conway: Game of Life
1972 [ELIZA](https://en.wikipedia.org/wiki/ELIZA)
1980s Multilayer FFNN
1990 robotics and games
2000 sports (basketball)
2010 Deep learning
2012 convolutional
2013 hyperopt
Turing's Universal Computer
After helping crack the Enigma code in WWII, Turing designed one of the first programmable computers
Logic Gates (Relays)
Memory (Tape)
Biological Brains
Neuroscientists simulated a whole brain
... of a nematode worm (C. elegans)
~300 neurons
~200 in central nervous system
C. elegans: the most photographed organism of all time?
Pretend Brains
Artificial brains aren't at all like human brains,
. . .
or even a worm brain
Neuron simulations are broad abstractions:
pulses in time not modeled
no chemistry
no internal neuron feedback loops
The "artificial" in AI is being replaced by "Machine" Or you can think of artifical as "pretend". Or a model. Like the model you have in your head for how the economy or the world works. If you are nice people will be nice to you. If you work hard, you'll get paid by your boss.
Brain? Really?
No, not really. Just pretend. In a computer.
Zeke pointed out that we're not building a brain, but an approximate model of a brain, an abstraction. What you build today won't be able to have a conversation with you, but it might be able to tell you whether to bring your umbrella to work on Monday. And it won't act in the physical world (except for the light emitted by your laptop's LEDs)
Abstraction
Middle Ages: stars modeled as crystal spheres
Copernicus: Sun at the center
Too much complexity can hide the truth. In the Middle Ages they could predict the motions of the stars to within a few degrees. Now we can predict satellites, planets, and stars to microradians (millimeters for satellites)
C. elegans' brain is shaped like a donut around its pharynx
And it can't even slither through the dirt like this nematode worm
Neuron
Typical animal neuron
"Pretend" Neuron
Math mirrors life
Math
You don't need to know Linear Algebra, just...
multiply
add
check thresholds
Equation/code is short (in python)
100s of neurons
Train for minutes
McCulloch Pitts Neuron (MCP)
Modeled after biological neurons. Can be combined to perform any logical or mathematical operation.
Binary output: 0 or +1
Any number of binary inputs
Inhibitory input with "veto" power
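A minimal sketch of an MCP neuron in Python (the function name and threshold argument are my own; the veto rule is from the slide above):

{% highlight python %}
def mcp_neuron(inputs, inhibitory=(), threshold=2):
    """Fire (1) iff enough excitatory inputs are on
    and no inhibitory input exercises its veto."""
    if any(inhibitory):                     # inhibitory input has "veto" power
        return 0
    return 1 if sum(inputs) >= threshold else 0

mcp_neuron([1, 1])                          # AND gate -> 1
mcp_neuron([1, 0])                          # -> 0
mcp_neuron([1, 1], inhibitory=[1])          # vetoed -> 0
{% endhighlight %}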
Let's Simulate
raise your hand if:
either of your neighbors raises her hand
put your hand down:
if both of your neighbors are in the same "state"
XOR
Cellular Automata, Wolfram Rule 90 = XOR
Complexity
Cellular Automata, Wolfram Rule 110 = Complex
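A minimal sketch of an elementary cellular automaton in Python (function and variable names are my own); run it with rule=90 for the XOR/Sierpinski triangle or rule=110 for complexity:

{% highlight python %}
import numpy as np

def ca_step(cells, rule=90):
    """One update of an elementary cellular automaton.
    Rule 90 sets each cell to left XOR right."""
    left = np.roll(cells, 1)
    right = np.roll(cells, -1)
    idx = 4 * left + 2 * cells + right    # each neighborhood as a 3-bit number
    table = (rule >> np.arange(8)) & 1    # rule number unpacked into a lookup table
    return table[idx]

cells = np.zeros(11, dtype=int)
cells[5] = 1
for _ in range(4):
    print(''.join('#' if c else '.' for c in cells))
    cells = ca_step(cells, rule=90)
{% endhighlight %}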
Game of Life
Game of Life gliders
Rosenblatt's Perceptron
Designed to be "trainable": Rosenblatt provided a training algorithm
Binary output: -1 or +1
Any number of real inputs
Threshold = 0
Weights and inputs can be real-valued
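A minimal sketch of Rosenblatt's training rule in Python (the function names and the toy AND-gate data are my own):

{% highlight python %}
import numpy as np

def predict(weights, x):
    """Perceptron activation: sign of the weighted sum (threshold = 0)."""
    return 1 if np.dot(weights, x) > 0 else -1

def train(X, y, epochs=10, learning_rate=0.1):
    """Rosenblatt's rule: nudge weights toward each misclassified example."""
    weights = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = target - predict(weights, x)   # 0 when correct, +-2 when wrong
            weights += learning_rate * error * x
    return weights

# Toy linearly separable data (last column is a constant bias input)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([-1, -1, -1, 1])                      # AND gate
weights = train(X, y)
print([predict(weights, x) for x in X])            # -> [-1, -1, -1, 1]
{% endhighlight %}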
Activation functions
sigmoid
saturation
threshold
linear
sinc
tanh
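A few of these as short Python functions (a sketch; numpy already ships tanh and sinc as np.tanh and np.sinc):

{% highlight python %}
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # smooth "S", output in (0, 1)

def threshold(x):
    return np.where(x > 0, 1, 0)       # hard step, like the MCP neuron

def saturating_linear(x):
    return np.clip(x, -1.0, 1.0)       # linear in the middle, flat at the ends
{% endhighlight %}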
Priorities Matter
Neanderthals' big eyes likely drove them to extinction
Too much of a good thing
Less room for imagination
Fewer neurons for social interaction
Neanderthals' large eyes may have hurt their chances
Lesson: Depth Wins
A deeper brain may be better
than
High resolution sensing
Layers
Many layers (6+ for deep learning)
Many neurons/layer
Sophisticated Connection Architectures
fully-connected
convolutional
recursive
sparse
random
scale-free
Neural Nets were "made" for ... Hyperdimensionality
Images (object recognition)
Sound (speech recognition)
Time series (weather, finance, election prediction)
Pattern Recognition
Prediction
Segmentation (sound, image)
Feature detection
Fraud detection
Intrusion detection
Game cheating detection . . .
But often they can produce useful features that seemingly don't make sense
. . .
except for images
Neural Nets help when ...
You don't know what to look for (feature engineering)
FFT
DCT
Wavelets
PCA/SVD
RF
Statistics (mean, std, diff, polynomial)
LPF/BPF/HPF
Resampling/Interpolation/Extrapolation
Conventional control laws fail
shooting a basketball
kicking a soccer ball
stabilizing an inverted pendulum
helicopter stunts
Neural Nets can help invert "Physics" models
Infer reflectance despite shadow/glare/haze
2-D image -> 3-D object
When direct measurement of 3-D not possible
stereoscopic vision
structured light
lidar
radar
sonar
Kinect or RealSense
NNs "see" through structured noise
Both images and sound often suffer from
occlusion
obscuration/haze/fog/fade
rotation/translation/warping
NNs need data and power
Lots of examples to learn from
CPU/GPU cycles to burn
Google speech recognition doesn't run on your phone...yet
Classification
The most basic ML task is classification
Predict "rain" (1) "no rain" (0) for PDX tomorrow
Supervised Learning
We have historical "examples" of rain and shine
Weather Underground
Since we know the classification (training set)...
Supervised classification (association)
Rain, Shine, Partly-Cloudy ?
Wunderground lists several possible "conditions" or classes
If we wanted to predict them all
We would just make a binary classifier for each one
All classification problems can be reduced to binary classification
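That reduction is usually called one-vs-rest: train one binary classifier per condition and report every class whose classifier fires. A minimal sketch (the class list and weight variables are hypothetical):

{% highlight python %}
import numpy as np

classes = ['Rain', 'Shine', 'Partly-Cloudy']     # hypothetical Wunderground conditions

def predict_conditions(classifiers, x):
    """One binary perceptron per class; report each one that fires."""
    return [label for label, weights in zip(classes, classifiers)
            if np.dot(weights, x) > 0]
{% endhighlight %}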
Sounds mysterious, like a "flux capacitor" or something...
It's just a multiply and threshold check:
{% highlight python %}
import numpy as np

# weighted sum of the inputs, compared to a threshold of 0
if np.dot(weights, inputs) > 0:
    output = 1
else:
    output = 0
{% endhighlight %}
Need something a little better
Works fine for "using" (activating) your NN
But for learning (backpropagation) you need it to be predictable (smooth and differentiable)...
Again, sounds mysterious... like a transcendental function
It is a transcendental function, but the word just means
Curved, smooth like the letter "C"
What Greek letter do you think of when I say "Sigma"?
"Σ"
What Roman (English) character?
You didn't know this was a Latin/Greek class, did you...
Σ (uppercase)
σ (lowercase)
ς (at the end of a word)
c (alternatively)
Most English speakers think of an "S"
when they hear "Sigma".
So the meaning has evolved to mean S-shaped.
Shaped like an "S"
The trainer ([backpropagator](https://en.wikipedia.org/wiki/Backpropagation)) can predict the change in weights required to nudge the output closer to the target
target: known classification for training examples
output: predicted classification your network spits out
But just a nudge.
Don't get greedy and push all the way to the answer, because your linear slope predictions are wrong, and there may be nonlinear interactions between the weights (across multiple layers)
So set the learning rate to something less than 1: the portion of the predicted nudge you want to "dial back" to
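A minimal sketch of that nudge for a single sigmoid neuron (variable names and values are my own; a full backpropagator chains this slope calculation through every layer):

{% highlight python %}
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

learning_rate = 0.1                   # "dial back": take only 10% of the predicted step

inputs = np.array([0.5, -1.0, 0.2])   # one training example
weights = np.array([0.1, 0.4, -0.3])
target = 1.0                          # known classification

output = sigmoid(np.dot(weights, inputs))
slope = output * (1.0 - output)       # sigmoid derivative: the linear slope prediction
weights += learning_rate * (target - output) * slope * inputs
{% endhighlight %}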
Example: Predict Rain in Portland
Visualizing a Brain
watch the weights evolve
activate with examples and watch intermediate layers
Output column heatmap
Input row heatmap
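A minimal matplotlib sketch of such a heatmap (the weight matrix here is random, standing in for a trained layer):

{% highlight python %}
import numpy as np
import matplotlib.pyplot as plt

weights = np.random.randn(30, 90)     # stand-in for a trained hidden-layer weight matrix
plt.imshow(weights, cmap='hot', aspect='auto')
plt.xlabel('input (columns)')
plt.ylabel('hidden node (rows)')
plt.colorbar(label='weight')
plt.show()
{% endhighlight %}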
Get historical weather for Portland then ...
Backpropagate: train a perceptron
Activate: predict the weather for tomorrow!
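A sketch of the whole loop, assuming you've saved historical Wunderground data to a CSV; the filename, feature columns, and rain_tomorrow label are hypothetical:

{% highlight python %}
import numpy as np
import pandas as pd

df = pd.read_csv('pdx_weather.csv')                    # hypothetical historical data
X = df[['high_temp', 'humidity', 'pressure']].values   # hypothetical feature columns
y = df['rain_tomorrow'].values                         # 1 = rain, 0 = no rain

weights = np.zeros(X.shape[1])
for epoch in range(100):                               # backpropagate: train the perceptron
    for x, target in zip(X, y):
        output = 1 if np.dot(weights, x) > 0 else 0
        weights += 0.01 * (target - output) * x

rain = np.dot(weights, X[-1]) > 0                      # activate: predict tomorrow
print('bring an umbrella' if rain else 'leave it home')
{% endhighlight %}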
NN Advantages
Easy
No math!
No tuning!
Just plug and chug.
General
One model can apply to many problems
Advanced
They often beat all other "tuned" approaches
Disadvantage #1: Slow to Learn
Example
24+ hr for a complex Kaggle example on a laptop
90x30x20x10 ~= 0.5M DOF
90 input dimensions (regressors)
30 nodes for hidden layer 1
20 nodes for hidden layer 2
10 output dimensions (predicted values)
Disadvantage #2: They often don't scale (difficult to parallelize)
Fully-connected NNs can't be easily hyper-parallelized (GPU)
Large matrix multiplications
Layers depend on all elements of previous layers
Scaling Workaround
At the Kaggle workshop we discussed parallelizing linear algebra
Split matrices up and work on "tiles"
Theano, Keras for python
PLASMA for BLAS
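A toy numpy illustration of the tiling idea (tile size and names are my own; libraries like PLASMA schedule tiles like these across cores):

{% highlight python %}
import numpy as np

def tiled_matmul(A, B, tile=2):
    """Multiply square matrices A @ B by accumulating products of small tiles.
    Each (i, j, k) tile product is independent work that can be farmed out;
    the consolidation step is the scaling bottleneck."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A, B = np.random.rand(4, 4), np.random.rand(4, 4)
assert np.allclose(tiled_matmul(A, B), A @ B)
{% endhighlight %}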
Scaling Workaround Limitations
But tiles must be shared/consolidated and there's redundancy
Disadvantage #3: They overfit
Too many nodes = overfitting
What is the big O?
Degrees of freedom grow with number of nodes & layers
Each layer's nodes connected to each previous layer's
That's a lot of wasted "freedom"
Many weights are randomly zeroed/ignored (Random Dropout)
O(N^2) to activate
O(N^3) to learn
Not so fast, big O...
{% highlight python %}
>>> import numpy as np
>>> np.prod([30, 20, 10])
6000
>>> np.sum([30, 20, 10])**2
3600
{% endhighlight %}
Rule of thumb
NOT N**2
But M * N**2
N: number of nodes
M: number of layers
Automated Architecture Limits
assert(M * N**2 < len(training_set) / 10.)
I'm serious... put this into your code. I wasted a lot of time training models for Kaggle that were overfit.
Augment with your Brain
Imprint your net with the structure of the problem
Feature engineering
Choose activation function
Partition your NN
Prune and evolve your NN
This is a virtuous cycle!
More structure (no longer fully connected)
Each independent path (segment) is parallelizable!
Automatic tuning, pruning, evolving is all parallelizable!
Just train each NN separately
Check back in with Prefrontal to "compete"
Engineering
jargon: receptive fields
jargon: weight sharing
All the rage: convolutional networks
Ideas
limit weight ranges (e.g. -1 to 1, 0 to 1, etc)
weight "snap to grid" (snap learning)
dream up your own activation function
improve the back-propagation function
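For example, the first two ideas as one toy helper (a sketch; the 0.1 grid step is my own choice), to be applied to the weights after each training nudge:

{% highlight python %}
import numpy as np

def constrain(weights, step=0.1):
    clipped = np.clip(weights, -1.0, 1.0)     # limit weight ranges to [-1, 1]
    return np.round(clipped / step) * step    # weight "snap to grid" in increments of step
{% endhighlight %}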