Universality - neural networks are amazing

Jason Chitla | neural networks, deep learning, software 2.0

Universality: For any arbitrary function f(x) (formally, any continuous function on a bounded domain), there exists a neural network that approximates it as closely as you like, for any input x.
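
To make that concrete, here is a minimal sketch (assuming PyTorch is installed) of a tiny network learning to approximate one arbitrary function, sin(x). Nothing here is specific to sin; you can swap in almost any target.

```python
# A tiny MLP learns to approximate f(x) = sin(x) on [-pi, pi],
# illustrating universal approximation on one concrete example.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# The "arbitrary" target function we want the network to approximate.
def f(x):
    return torch.sin(x)

# Training data: inputs sampled from the domain, labels from f.
x = torch.linspace(-math.pi, math.pi, 256).unsqueeze(1)
y = f(x)

# A small MLP: its weights are the "knobs" that training adjusts.
model = nn.Sequential(
    nn.Linear(1, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)   # how far the net is from f on the samples
    loss.backward()               # gradients with respect to every knob
    optimizer.step()              # nudge every knob to reduce the error

print(f"final mean-squared error: {loss.item():.5f}")  # should be tiny
```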

This is an incredible property. You can think of driving as a function. There exists a neural network that can drive perfectly. How is this possible?

(Credit: Lex Fridman, MIT Deep Learning course, 2017)

Well, what is a neural network? A neural network is a mathematical abstraction of the brain. It has thousands and thousands of knobs, which loosely resemble synapses in the brain. They are trainable/modifiable and hold different weights. Training is the process by which a neural network's knobs are optimized so that the network accomplishes its goal with the greatest accuracy. This goal could be identifying an object in an image, responding to a prompt, observing the state of the world while driving and choosing an action to take, and so on.
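
As a hedged illustration of what those "knobs" are, the sketch below (again assuming PyTorch) counts a small network's trainable weights and shows a single training step nudging one of them. The sizes and the goal are made up purely for demonstration.

```python
# Every weight and bias in the network is one trainable number ("knob"),
# and a training step nudges all of them to reduce the error on a goal.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

num_knobs = sum(p.numel() for p in net.parameters())
print(f"trainable knobs (weights + biases): {num_knobs}")  # 10*32+32 + 32*2+2 = 418

# One optimization step on a made-up goal, to show the knobs actually move.
x = torch.randn(8, 10)               # a batch of 8 fake inputs
target = torch.randn(8, 2)           # fake "correct answers"
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

before = net[0].weight[0, 0].item()  # inspect one knob before the step
loss = nn.functional.mse_loss(net(x), target)
loss.backward()
optimizer.step()
after = net[0].weight[0, 0].item()   # the same knob after the step

print(f"one knob before/after a training step: {before:.4f} -> {after:.4f}")
```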

Okay, let's go back to driving as our arbitrary function. There exists a neural net with certain weights, across those thousands and thousands of knobs, that can drive perfectly. Meaning, the neural net takes the state of the world as input (via computer vision) and outputs an action to take. So why don't we have a neural network that drives perfectly? Non-exhaustive datasets.
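
Here is a hypothetical sketch of that "state of the world in, action out" idea. The DrivingPolicy class, its input shape, and its two-number action are all illustrative assumptions, not any real self-driving stack.

```python
# A toy policy network: a camera image goes in, a driving action comes out.
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        # Tiny vision backbone: image in, feature vector out.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Head: features -> [steering angle, throttle].
        self.head = nn.Linear(32, 2)

    def forward(self, image):
        return self.head(self.vision(image))

policy = DrivingPolicy()
frame = torch.randn(1, 3, 96, 96)       # one fake camera frame
steering, throttle = policy(frame)[0]   # the "action to take"
print(steering.item(), throttle.item())
```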

Right now, there are two things to get right when building a neural network: 1) the choice of architecture, and 2) your dataset. You are always trying to improve the quality of your dataset. The dataset has missing pockets that need to be filled. Extremely good execution is hard with data at the scale of self-driving cars.

In a lot of applications today, the neural net architectures and training systems are increasingly standardized into a commodity, so the active “software development” takes the form of curating, growing, massaging and cleaning labeled datasets.

Welcome to the world of Software 2.0!

(Credit: Andrej Karpathy, who is amazing, in this podcast with Lex)

Software 2.0 (Coined by Andrej Karpathy)

Andrej Karpathy first coined Software 2.0 in this post in 2017.

Software 1.0 is code we write. Software 2.0 is code written by an optimization process based on an evaluation criterion (such as “classify this training data correctly”). It is likely that any setting where the program is not obvious but its performance can be repeatedly evaluated (did you classify some images correctly? do you win games of Go?) will be subject to this transition, because the optimization can find much better code than what a human can write.

Rather than creating a program and writing code, you collect data. You massage the data, find gaps, and find biases. You make sure the data is diverse. This is the focus, because what is left is the standardized training of the neural net on that data.
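
A minimal sketch of what that "programming" can look like in practice: auditing a labeled dataset for missing pockets. The scenario labels and the 1% threshold below are invented for illustration.

```python
# Software 2.0 "programming": instead of writing logic, audit the labeled
# dataset for gaps and biases, then go collect what is missing.
from collections import Counter

# Pretend these are the scenario labels attached to each training example.
labels = (["highway_day"] * 9000 + ["city_day"] * 4000 +
          ["highway_night"] * 600 + ["rain"] * 120 + ["snow"] * 3)

counts = Counter(labels)
total = sum(counts.values())

print("scenario distribution:")
for scenario, n in counts.most_common():
    print(f"  {scenario:15s} {n:6d}  ({n / total:6.2%})")

# "Missing pockets": scenarios so rare the model will likely fail on them.
MIN_FRACTION = 0.01
gaps = [s for s, n in counts.items() if n / total < MIN_FRACTION]
print("needs more data:", gaps)   # -> drives what you collect and label next
```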

After "programming" within software 2.0, you are then left with a large neural net that works well but you are not exactly sure how! You have to choose between using a 90% accurate model you understand (software 1.0), or a 99% accurate model you don’t (software 2.0).

Where are we today?

Up until now, neural network architectures have come and gone. Today, there is a convergence toward one architecture: the Transformer.

First, let me share the paper that introduced this incredible technology: "Attention Is All You Need" (2017).

Transformers are general-purpose computers that are trainable and very efficient to run on our hardware. They do not just do translation. They are differentiable, optimizable, and efficient computers.
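
For the curious, here is a bare-bones sketch (PyTorch assumed) of the core operation inside a Transformer, scaled dot-product self-attention, in which every token computes a weighted mix over every other token. It strips away multi-head attention, masking, and the rest of the architecture.

```python
# Scaled dot-product self-attention: the piece of the Transformer that is
# differentiable, optimizable, and efficient on modern hardware.
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    # x: (sequence_length, d_model) -- one token embedding per row.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(k.shape[-1])   # how much each token attends to each other token
    weights = torch.softmax(scores, dim=-1)     # each row sums to 1
    return weights @ v                          # weighted mix of value vectors

d_model = 16
seq = torch.randn(5, d_model)                                   # 5 tokens
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
out = self_attention(seq, w_q, w_k, w_v)
print(out.shape)   # torch.Size([5, 16])
```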

The talk of the town today is: Do not touch the Transformer! Touch everything else (scale the datasets).

ChatGPT uses this architecture. GPT is just a language model. Language models have existed since 2003 and even before then. The remarkable results and power of GPT come from this new Transformer technology! It is trained on a massive amount of text from the internet and tries to predict the next word from a given sequence.
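
To illustrate "predict the next word from a given sequence" in the simplest possible way, here is a toy count-based model in plain Python. GPT does the same job, but with a huge Transformer trained on internet-scale text instead of a lookup table.

```python
# A toy next-word predictor: count which word follows which (a bigram model).
from collections import Counter, defaultdict

corpus = "the car drives on the road . the car stops at the light .".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # The most likely next word given the previous one.
    return following[word].most_common(1)[0][0]

print(predict_next("the"))   # -> 'car' (appears twice after 'the')
print(predict_next("car"))   # -> 'drives' (tied with 'stops'; first seen wins)
```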

What are some problems with GPT? Well, the entire history of knowledge/truth does not exist in the internet data that we feed the transformer.

Text is a communication medium between humans, and it's not an all-encompassing medium of knowledge. - Andrej Karpathy


Thank you to Andrej Karpathy and Lex Fridman. None of my writing/ideas here are new. They are the pioneers, and one day I hope to have an impact on this field as great as theirs! - Jason Chitla

p.s. Andrej has amazing lectures and learning series on his site!

© Jason Chitla