I am supportive of clear explanations of the building blocks, but I worry that repeatedly describing things as grade-school-math level gives the wrong impression about the actual learning curve for getting up to speed with CNNs. Yes, the building blocks are easy to understand, but actually understanding why a given network structure or optimization technique isn't working is a black art. And if you don't have a workstation with a $2k GPU or two, you're probably not going to have a good time.
I set up an automated script to launch an AWS g2 instance, train my neural net using tensorflow, copy the model to my personal computer, and spin down. It costs like $5-$10 to train/test most neural network models. My most expensive model cost around $100 and required a ton of time and resources - it took like 4 days or something.
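The workflow described above can be sketched as a small Python driver. This is only an outline of the shape of such a script - the AMI id, key file, hostname, and model path below are all placeholders, not real identifiers:

```python
def build_commands(instance_type="g2.2xlarge", ami="ami-PLACEHOLDER",
                   key="mykey.pem", host="EC2-HOST-PLACEHOLDER",
                   model="model.ckpt"):
    """Return the steps of a train-and-spin-down workflow as argv lists:
    launch a GPU instance, run training over ssh, copy the trained
    model back, and terminate the instance so billing stops."""
    return [
        ["aws", "ec2", "run-instances", "--image-id", ami,
         "--instance-type", instance_type, "--count", "1"],
        ["ssh", "-i", key, f"ubuntu@{host}", "python", "train.py"],
        ["scp", "-i", key, f"ubuntu@{host}:{model}", "."],
        ["aws", "ec2", "terminate-instances",
         "--instance-ids", "i-PLACEHOLDER"],
    ]

# In a real script you would run each step with subprocess.run(cmd),
# waiting for the instance to reach the "running" state before ssh-ing in.
for cmd in build_commands():
    print(" ".join(cmd))
```

The important part is the last step: terminating (not just stopping) the instance is what keeps the cost in the $5-$10 range.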
You really don't need a $2k workstation.
Of course, for personal use I do have a GTX 1080, because I like to game and play with tensorflow/caffe.
Spot instances are way cheaper. The only downside is you need to create an AMI every time before termination.
But also, AWS g2 has the NVIDIA GRID K520 with 4GB memory, so the performance isn't great.
I tried to get it released before, but was shut down by the "open source office", so I can't give you the exact script. However, h2o has a script that launches a cluster that's very similar.
Even if it may not matter in this case specifically, this is a terrible example. "sudo chmod 777" in public code is basically saying "I don't know what I'm doing, but go on, do the same thing yourself" :(
You don't need sudo, because it's your file. You just need "chmod 755", not "777". And you don't need chmod in the first place - just run "python script_demo.py".
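To make the 755-vs-777 distinction concrete, here is a small stdlib-only sketch (using a throwaway temp file) showing what those octal modes actually grant - each digit is owner/group/other, with read=4, write=2, execute=1:

```python
import os
import stat
import tempfile

# Create a throwaway file to experiment on.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

# 755 = rwxr-xr-x : only the owner may write.
os.chmod(path, 0o755)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))                  # 0o755
print(bool(mode & stat.S_IWOTH))  # False: others cannot write

# 777 = rwxrwxrwx : anyone on the machine may overwrite the file.
os.chmod(path, 0o777)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(bool(mode & stat.S_IWOTH))  # True: world-writable, the problem case

os.unlink(path)
```

And as noted above, the execute bit is irrelevant here anyway: `python script_demo.py` runs the file through the interpreter regardless of its mode.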
Very true, it doesn't need 777. It's also not my script; I just copied and pasted some code that fit what I was doing. Basically it follows the same format; the specific implementation varies.
Also, you are being rather pedantic here. If people can figure out how to take what I copied & pasted and turn it into a script, they probably know chmod 777 isn't great.
That said - and this is why I think this is ridiculous - you are assuming this matters. Going into the weeds here, to play along: the script is immediately run on a temporary, very recently launched EC2 instance, probably with the use of a .pem key, presumably even within an AWS security group that only allows your IP, and the instance is shut down following its execution.
I can't picture this being a security vulnerability at all. Calling it a terrible example is relative - I wrote this copied and pasted on a cell phone, trying to help someone. Honestly, I didn't even see the "sudo chmod 777"; I just pasted away.
As I said: "Even if it may not matter in this case". But what about the next time? (The poster obviously doesn't understand the code.) What about people who read this comment and follow the advice because it works? (Happens all the time with SO.) What about when your script becomes the standard deployment method for the company? (Happens everywhere.) What about people who don't know about file privileges either and blindly copy what they see here?
There are lots of things we do that you could say "who cares?" about: single-letter variables, comments, const-correctness, unnecessary N^2 algorithms. Usually it turns out that either you or someone you work with cares a few months or years afterwards. So just learn to do it correctly the first time - especially when the correct way takes less time than the "magic fix".
These things come about because people learn that the simple way to solve all permission problems is to just do everything as su (or root) and chmod everything to 777. The problem is that it does solve all problems (well, sweeps them under the rug) - I guess when the only tool you have is a hammer, everything looks like a nail.
Yes. On the other hand, there isn't a strong overlap between machine learning experts and security experts (I'm sure there will be in the coming years though, as security experts start using NNs to detect anomalous/dangerous behavior).
I know what you mean, but I disagree with "security experts". You don't need to be a security expert, just know the basics of how privileges work on your system. And if people don't, we just need to keep calling it out, because otherwise code like this ends up in production one day, where it actually matters.
Exactly. If you're an ML person, you're most likely working with Linux. And in that case, you need to be familiar with the permission types, because that's relevant even when installing various libraries, doing ssh, etc.
Basically, I couldn't get authorization to publish internal code. In this case, it was a combination of who was going to maintain it and ensuring nothing internal was leaked.
Question: I have a database of 1,000,000 vehicle pictures, organized by make and model. What would be the easiest way to play with this data, so that I can train a model to predict the make/model? I don't want to reinvent the wheel now that so many tutorials have been written and so much software is being released. What would be the easiest way to start?
This is the easiest way to set up a CNN and train it with your sample images (at least compared to Caffe, Tensorflow, and Theano). I say that because it's all GUI-based! Real convenient.
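Whatever tool you pick, most of them (Keras's `flow_from_directory`, Caffe's data-prep scripts, GUI trainers) expect the same on-disk layout: one subfolder per class, since your data is already organized by make and model that mapping is nearly free. A stdlib-only sketch of indexing such a tree - the folder names here are invented for illustration:

```python
import os

def index_dataset(root):
    """Walk a root folder whose subfolders are class names
    (e.g. 'ford_f150/', 'john_deere_6330/') and return a list of
    (image_path, label) pairs plus the sorted label list."""
    labels = sorted(
        d for d in os.listdir(root)
        if os.path.isdir(os.path.join(root, d))
    )
    samples = []
    for label in labels:
        class_dir = os.path.join(root, label)
        for name in sorted(os.listdir(class_dir)):
            # Keep only common image extensions.
            if name.lower().endswith((".jpg", ".jpeg", ".png")):
                samples.append((os.path.join(class_dir, name), label))
    return samples, labels
```

With a million images you would feed these paths to a batched loader rather than reading everything into memory at once.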
Good question. Detecting just one class (in this case, cats) will require negative examples. Finding good negatives is a somewhat challenging task because they should be pretty comprehensive, but if you create an account on ImageNet (http://image-net.org/), you can download thousands of images.
It's not cars, but mostly tractors. I have been running an online community for almost 10 years now and the images have been submitted and organized by the members.
> CNNs can be used to categorize other types of data too. The trick is, whatever data type you start with, to transform it to make it look like an image.
This is an interesting point, and I assume that 'make it look like an image' means the same thing as 'think of it as an image'. Can others here who work with CNNs regularly or professionally comment on whether the author's intuition is essentially correct (give or take some details, of course)?
It comes down to the characteristic architecture of convolutional nets - weight sharing - and the assumption this makes about the data. If by "image" one means something where you can expect any pattern (at some level in the hierarchy) to be equally likely to occur anywhere across an input dimension, then yes, this is true. Personally I would say that this is too narrow a definition of an image (too great an assumption), and, interestingly enough, perhaps too broad too. I am not a pro.
[Edit] Too broad in the sense that, intuitively, there is perhaps an implied assumption of continuity of the input function defining the image. Note that such assumptions can be made explicit with various so-called statistical priors incorporated into the network.
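A concrete version of "make it look like an image": audio is the classic case, where a 1-D waveform becomes a 2-D time-by-frequency array via a short-time Fourier transform, and a CNN then convolves over it like a picture. A minimal numpy sketch (the window length and hop size are arbitrary choices for illustration):

```python
import numpy as np

def naive_spectrogram(signal, win=64, hop=32):
    """Slice a 1-D signal into overlapping windows and take the FFT
    magnitude of each, yielding a 2-D 'image': time steps along one
    axis, frequency bins along the other."""
    frames = [
        signal[start:start + win]
        for start in range(0, len(signal) - win + 1, hop)
    ]
    # rfft of a real window of length `win` keeps win//2 + 1 frequency bins.
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

# A 440 Hz tone sampled at 8 kHz shows up as a bright band at one frequency.
t = np.arange(8000) / 8000.0
spec = naive_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # 2-D, ready to be fed to a conv layer
```

The weight-sharing assumption discussed above translates here to: a pattern (say, a harmonic stack) is treated as equally meaningful wherever it occurs in time.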
Really great write-up. I've been trying to wrap my not-too-mathematically-talented head around convolutional filters, and this really helped in visualizing what is happening.
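For anyone else trying to visualize filters: a convolution is just a small grid of weights slid across the image, taking a dot product at each position. A numpy sketch with a hand-made Sobel-style vertical-edge kernel, showing how an edge "lights up" in the output:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (strictly, cross-correlation, as in most
    deep learning libraries): slide the kernel over the image and take
    a dot product at every position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Image: dark left half (0), bright right half (1) -> a vertical edge.
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# Sobel-like vertical-edge detector: negative weights left, positive right.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d_valid(img, kernel)
# The response is strongest in the output columns straddling the edge
# and exactly zero over the flat dark and bright regions.
```

In a trained CNN the kernels aren't hand-made like this; they are learned, but the sliding dot product is the same operation.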
Great post. I like how he didn't go into too much detail on the math of backprop etc. As a layperson, I find the conceptual understanding of ML more interesting.
If you google "Hinton machine learning" on YouTube, you will find Hinton's lectures. They are non-mathematical; he is a psychologist/math guy, and he is the inventor of almost all this stuff: backprop, dropout, etc.
You will find his lectures very entertaining and easy to understand. Being a psychologist whose desire is to make a computer operate like a human brain, he's more interested in how the brain actually works than in hacking ML code.
Hinton describes backprop, why he invented it, and exactly how it emulates the way the human brain works.
Hinton now works at Google; he is considered the modern-day 'godfather' of deep learning/ML.
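For the curious, the mechanics of backprop fit in a few lines: run the network forward, measure the error, then push the error gradient back through the chain rule and nudge each weight. A toy numpy sketch - a one-hidden-layer net learning XOR, with an arbitrary hidden size and learning rate:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])  # XOR truth table

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule, layer by layer (squared-error loss).
    d_out = (out - y) * out * (1 - out)   # gradient at output pre-activation
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient at hidden pre-activation
    # Gradient descent step.
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))
```

With enough iterations the outputs move toward [0, 1, 1, 0]; exact convergence depends on the random initialization, which is part of the "black art" mentioned upthread.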
>You will find his lectures very entertaining and easy to understand. Being a psychologist whose desire is to make a computer operate like a human brain, he's more interested in how the brain actually works than in hacking ML code.
>
>Hinton describes backprop, why he invented it, and exactly how it emulates the way the human brain works.
Of course, basically no actual neuroscientists or cognitive scientists think the brain really works via supervised backpropagation. So he has a bit of a holy war going on with the people who work on human learning proper rather than machine learning.
So it works just like I thought it would. Why are CNNs so hyped? Wasn't all this already known decades ago? Or is it just because we can now afford the computing power?
The basic CNN structure was in place, but as the saying goes, "the devil's in the details." Early CNNs were applied to problems such as handwritten character recognition, with rows of small grayscale image cells as inputs, and were much shallower, smaller models. Today's CNNs operate on full-resolution, multi-channel images and video, and can be orders of magnitude deeper and larger. For instance, ResNets have been trained with over 1200 layers on benchmark datasets. This would have been unthinkable even a couple of years ago. By way of comparison, even the state-of-the-art VGG network architecture of a couple of years ago originally had to be trained in stages to reach 16 and 19 layers for submission to ILSVRC 2014 (Xavier / MSRA initialization makes this unnecessary now). At the time, VGG and GoogLeNet (22 layers) were considered extraordinarily deep CNNs.
The underlying math was figured out a long time ago, but it's only been in recent years that we've had the computing power to test these out on lots of complicated, real-world classification problems, and had some incredible success.
I argued back in 2000 (a year after I got my computer engineering degree) that AI wouldn't take off until computing moved from single-threaded/single-core to multithreaded/multicore processing. The fact that we are only hearing about this stuff 15 years later makes me feel that assertion was largely right.
The biggest problem I see in AI is that the algorithms are generally fairly straightforward, but people haven't had the computing power to explore the problem space. We are seeing drastic improvement in things like video cards (routinely 1000+ cores) and data processing locality (map reduce). But processors have stagnated.
If we really want AI in any reasonable timescale, we need large arrays of general-purpose cores with a sane communication protocol that doesn't fixate on things like caching, we need a hybrid between Go and Erlang to do concurrent functional programming in a readable way with automagic scaling over a network, and we need all this yesterday. The fancy schmancy AI algorithms will become apparent when processing power is no longer the primary limitation, and at that point we can optimize them.
Decades ago I played around with neural nets but was frustrated because I either had to preprocess and normalize my inputs to the point where I didn't need a network anymore or I had to train a large network with so much data that it was not practical.
Having a cookbook approach with a catchy name and orders of magnitude more processing power have revived neural nets and now they are finally doing something useful.
Now everyone is jumping on the bandwagon so the field is progressing very quickly. Just because it's hyped doesn't mean it's not worth giving it a second look (although I'm still on the sidelines myself.)
> I suspect that BIG-OIL, and BIG NSA of today have stuff that is super good and advanced and most of what they leak to GIT HUB is just garbage
I don't think it works that way now. What I see is timely publishing of papers, code, and sometimes data. It's more advantageous to cooperate.
The bottleneck is not the algorithms, but expert knowledge of their fine-tuning and correct application. We have lots of algorithms already, and more are being published. They are not "garbage" if used properly.