How to intuitively understand neural networks

A framework for thinking about neural networks and supervised machine learning.

I was recently asked the question:

How can I better understand neural networks?

There are two powerful things that make neural networks useful:

  1. Universality
  2. Parameter optimization

These things may sound mystical to you now, but you already have an intuitive understanding of the concepts.

But before we get started, let's establish the basics.

A neural network is just a function.

Really, a neural network is just a function, like $f(x)$.

This function depends on a number of parameters. This is also nothing special; you’ve probably seen something like

$$f(x) = ax + b$$

Here, $a$ and $b$ are parameters which change how the function behaves. With some fancy notation, we can write $\theta = [a, b]$.

This again is not scary; it’s just a fancy way of writing both $a$ and $b$ using a single symbol. (We must save the trees, right?)

We can now write $f_\theta(x)$, which means we have a function $f$ that depends on the parameters $\theta$ and takes the independent variable $x$ as input.

Take a moment to appreciate that writing $f_\theta(x) = \theta_1 x + \theta_2$ is still fundamentally no different from writing $f(x) = ax + b$, where $\theta_1 = a$ and $\theta_2 = b$.
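To see that bundling the parameters changes nothing, here is a small Python sketch. The function names `f` and `f_theta` and the values $a = 2$, $b = -1$ are illustrative choices, not from any library; the point is only that both forms compute the same thing.

```python
def f(x, a, b):
    """The familiar form: f(x) = a*x + b."""
    return a * x + b


def f_theta(x, theta):
    """The same function with its parameters bundled into theta = [a, b]."""
    return theta[0] * x + theta[1]


a, b = 2.0, -1.0
theta = [a, b]  # theta_1 = a, theta_2 = b

# Both forms give identical outputs for every input we try.
for x in [-3.0, 0.0, 0.5, 4.0]:
    assert f(x, a, b) == f_theta(x, theta)

print(f_theta(4.0, theta))  # 7.0
```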

The reason people are interested in neural network functions is that they have some neat properties, one of which is that if we find the right parameters $\theta$, a neural network can approximate any other function to arbitrary precision.[1]

This means that you can find a value for $\theta$ such that our neural network is basically equal to $f(x) = ax + b$, or $f(x) = \int_{-\infty}^{\infty} \hat f(\xi)\, e^{2 \pi i \xi x}\, d\xi$, or anything else you might write.
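A minimal sketch can make this universality concrete, under several assumptions of our own: a network with one hidden layer of ReLU units, $f_\theta(x) = c + \sum_k w_k \max(0, x - t_k)$, a continuous target ($\sin$) on a bounded interval ($[0, 2\pi]$), and parameters picked by hand rather than learned. With evenly spaced "switch-on" points $t_k$, the network traces a piecewise-linear curve through the target, and adding hidden units shrinks the maximum error. This illustrates the idea; it is not a proof of the universal approximation theorem, and all names here are hypothetical.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_network(g, lo, hi, n_units):
    """Hand-pick theta for f_theta(x) = c + sum_k w_k * relu(x - t_k) so that
    it linearly interpolates g at n_units + 1 evenly spaced points on [lo, hi]."""
    knots = [lo + (hi - lo) * k / n_units for k in range(n_units + 1)]
    values = [g(t) for t in knots]
    # Slope of the interpolating line on each segment [t_k, t_{k+1}].
    slopes = [(values[k + 1] - values[k]) / (knots[k + 1] - knots[k])
              for k in range(n_units)]
    # Each hidden unit switches on at t_k and adds the change in slope there.
    weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n_units)]
    bias = values[0]
    return bias, knots[:-1], weights  # this tuple plays the role of theta


def f_theta(x, theta):
    bias, offsets, weights = theta
    return bias + sum(w * relu(x - t) for w, t in zip(weights, offsets))


def max_error(g, theta, lo, hi, n_test=1000):
    """Largest gap between network and target over a grid of test points."""
    xs = [lo + (hi - lo) * i / n_test for i in range(n_test + 1)]
    return max(abs(f_theta(x, theta) - g(x)) for x in xs)


lo, hi = 0.0, 2 * math.pi
for n_units in [4, 16, 64]:
    theta = build_relu_network(math.sin, lo, hi, n_units)
    print(n_units, "hidden units -> max error", max_error(math.sin, theta, lo, hi))
```

Running this, the printed error should drop sharply as the number of hidden units grows. Swapping `math.sin` for another continuous function on a bounded interval behaves the same way, which is the intuition behind "any other function".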

This is really, really powerful!

With this general form $f_\theta(x)$, you can do anything that you can describe as a function. And since everything is a function, this includes everything from finding an optimal route, to understanding the contents of an image or a book, to knowing which movie you want to watch or which dress you want to buy.


I can’t stress enough how powerful this universality is.

But we still have one huge problem.

How do we find $\theta$?

This is where the second powerful thing comes in.

In order to find the optimal value for $\theta$, we just need a way of expressing how bad or wrong our current estimate $f_\theta(x)$ is, and then minimize that expression.
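As one common example (the article does not commit to a particular choice), suppose we have $N$ example pairs $(x_i, y_i)$ of inputs and the outputs we want. The mean squared error measures how wrong $f_\theta$ is on those examples, and "finding $\theta$" means finding the parameters that make it as small as possible:

$$
L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( f_\theta(x_i) - y_i \right)^2,
\qquad
\theta^* = \arg\min_{\theta} L(\theta)
$$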

This too is really, really powerful.

We don’t need to know the solution beforehand; all we need is a way of assessing how good our current solution is, and then the computer automagically figures out an optimal solution.[2]
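Here is a minimal Python sketch of that idea for $f_\theta(x) = \theta_1 x + \theta_2$. It assumes mean squared error as the measure of wrongness and gradient descent as the minimizer; both are common choices, but neither is specified in this article. The examples are hypothetical, generated from $y = 3x + 0.5$. Notice that `fit` never sees the numbers 3 or 0.5; it only sees examples and how wrong its current guess is.

```python
def f_theta(x, theta):
    return theta[0] * x + theta[1]


def loss(theta, xs, ys):
    """Mean squared error: how wrong the current theta is on the examples."""
    return sum((f_theta(x, theta) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(theta, xs, ys):
    """Partial derivatives of the mean squared error w.r.t. theta[0] and theta[1]."""
    n = len(xs)
    g0 = sum(2 * (f_theta(x, theta) - y) * x for x, y in zip(xs, ys)) / n
    g1 = sum(2 * (f_theta(x, theta) - y) for x, y in zip(xs, ys)) / n
    return [g0, g1]


def fit(xs, ys, learning_rate=0.1, steps=500):
    theta = [0.0, 0.0]  # start from an arbitrary guess
    for _ in range(steps):
        g = gradient(theta, xs, ys)
        # Step downhill: move each parameter against its gradient.
        theta = [t - learning_rate * gt for t, gt in zip(theta, g)]
    return theta


# Hypothetical examples produced by the "unknown" function y = 3x + 0.5.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [3 * x + 0.5 for x in xs]

theta = fit(xs, ys)
print(theta, loss(theta, xs, ys))
```

After training, `theta` should land very close to `[3.0, 0.5]` with a loss near zero. The learning rate and step count are illustrative; too large a learning rate can make gradient descent diverge instead of converge.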

I don’t know how to describe how awesome that is. When I read that, my brain just goes ‘wow, this is cheating’.

If you read through all of that, you now know much more about neural networks than the people worrying about neural networks becoming sentient and taking over the world. You now understand why researchers are amused by comments like those.

You know it’s silly that a function which is fundamentally no different from $f(x) = ax + b$ should suddenly become sentient.

If you want a mathematically more rigorous introduction which addresses all the “some conditions apply” notes, as well as exactly how neural network functions are constructed and how we optimize $\theta$, you may want to read Introduction to Neural Networks.

  1. Some conditions apply. ↩︎

  2. Some conditions apply. ↩︎
