Implementing Neural Networks in 16 lines of raw Julia

When it comes to building Neural Networks and Deep Learning models, TensorFlow and PyTorch are the de facto standards. While everyday models are quickly implemented out of the box, customized algorithms can, at times, result in quite verbose-looking code.

This is owed, in part, to the fact that TensorFlow and PyTorch both use Python merely as a 'frontend' to their lower-level APIs. Julia, on the other hand, promises performant differentiable Machine Learning, end-to-end, in a single language. This certainly raises hopes of cleaner codebases!

While preparing an introductory Julia talk, I therefore tried to implement a simple feedforward Neural Network in as few lines as possible. Without blank lines, the code could be cut down even further, to 14 lines; however, this would have reduced readability quite a lot. Here is the result:
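The full implementation can be sketched as follows (the type and field names `Layer`, `Network`, `W`, `b`, as well as the random initialization, are my assumptions; the pieces are explained one by one in the sections below):

```julia
struct Layer
    W::Matrix{Float64}       # weight matrix
    b::Vector{Float64}       # bias vector
    activation::Function     # elementwise activation
end

# Convenience constructor with scaled random initialization
Layer(in::Int, out::Int, activation=identity) =
    Layer(randn(out, in) ./ sqrt(in), zeros(out), activation)

# Feedforward pass of a single layer
(l::Layer)(x) = l.activation.(l.W * x .+ l.b)

relu(x) = max(zero(x), x)

struct Network
    layers::Vector{Layer}
    Network(layers::Layer...) = new(collect(layers))
end

# Feedforward pass of the whole network via function composition
(nn::Network)(x) = reduce(∘, reverse(nn.layers))(x)
```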

Let us go through each component step-by-step. If you are unfamiliar with the internals of a feedforward Neural Network, you might find these Wikipedia references useful.

Writing a feedforward layer as a Julia struct
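A minimal struct for such a layer could look as follows (the field names are assumptions on my part):

```julia
struct Layer
    W::Matrix{Float64}       # weight matrix
    b::Vector{Float64}       # bias vector
    activation::Function     # elementwise activation, e.g. relu
end
```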

This is the standard setup for a feedforward layer. We need a weight matrix, W, a bias vector, b, and an activation function. Next, we define the layer's feedforward pass:
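One way to sketch that pass is a single method that makes each `Layer` instance callable (the struct is repeated here so the snippet runs on its own):

```julia
# The `Layer` struct from above, repeated so this snippet stands alone
struct Layer
    W::Matrix{Float64}
    b::Vector{Float64}
    activation::Function
end

# Feedforward pass: the affine map W*x + b, then the activation, elementwise
(l::Layer)(x) = l.activation.(l.W * x .+ l.b)
```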

Julia allows us to call a function via an instantiated struct. This makes later usage of layer and network instances, to some extent, more elegant.
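The mechanism at play here is sometimes called 'functor' syntax; a toy example independent of our network:

```julia
struct Scale
    a::Float64
end

# Defining a method on the type itself makes every instance callable
(s::Scale)(x) = s.a * x

double = Scale(2.0)
double(3.0)   # → 6.0: the instance is used like a function
```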

The implementation of the Rectified Linear Unit (ReLU) and identity activation functions should be self-explanatory.
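For completeness, the two activations can be sketched like this — note that `identity` already ships with Julia's Base:

```julia
relu(x) = max(zero(x), x)   # Rectified Linear Unit: max(0, x), type-stable via zero(x)
# identity(x) = x is already defined in Base, so no definition is needed
```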

Aggregating layers into a full feedforward Neural Network

The network itself is nothing more than a collection of its individual layers:
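A sketch of that collection (the `Layer` struct is repeated so the snippet stands alone):

```julia
struct Layer
    W::Matrix{Float64}
    b::Vector{Float64}
    activation::Function
end

struct Network
    layers::Vector{Layer}
    # Vararg inner constructor: accepts Network(l1, l2, ...) directly
    Network(layers::Layer...) = new(collect(layers))
end
```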

We use a Vararg argument in the constructor to allow for an arbitrary number of layers. While we could just use a Vector in the constructor call, this approach saves us two brackets there :).

Finally, we want to define the global feedforward pass. Remember that this can be expressed as function composition over all layers:

NN(x) = (f_L ∘ ⋯ ∘ f_1)(x)

Here, the f's denote the layer functions, with L feedforward layers in total. We can write this in Julia as follows:
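A sketch of that one-liner (the `Layer` and `Network` definitions are repeated so the snippet runs standalone):

```julia
# Definitions from the previous sections, compressed for brevity
struct Layer; W::Matrix{Float64}; b::Vector{Float64}; activation::Function; end
(l::Layer)(x) = l.activation.(l.W * x .+ l.b)
struct Network; layers::Vector{Layer}; Network(ls::Layer...) = new(collect(ls)); end

# Compose all layer functions; `reverse` makes layers[1] the innermost call
(nn::Network)(x) = reduce(∘, reverse(nn.layers))(x)
```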

Since function composition is associative, we can use the reduce() function. This allows us to make this crucial element another one-liner in Julia.

If you look closely, you’ll notice that we reverse the order of layers::Vector{Layer} during reduce(). While it is easier to reason about the network topology from left to right, the composition operation itself is performed from right to left.

Also, notice that we cannot pre-compute this composition if we want to optimize the network. Let us actually do that in order to verify that our implementation works as expected.

Training our Julia Neural Network on a toy example

As gradient calculation and optimizer implementation would be a larger task on their own, we will use two Julia libraries to help with this step. For the toy dataset, let us just use a sine function evaluated on evenly spaced points in [-3.0, 3.0]:
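One possible setup is sketched below, using Zygote.jl for the gradients and a hand-rolled gradient-descent update (the library choice, network sizes, learning rate, and epoch count are all my assumptions; the network definitions are repeated so the snippet runs standalone):

```julia
using Random, Zygote   # Zygote.jl: an assumed choice for automatic differentiation
Random.seed!(42)

# Network definitions from the previous sections, repeated for completeness
struct Layer; W::Matrix{Float64}; b::Vector{Float64}; activation::Function; end
(l::Layer)(x) = l.activation.(l.W * x .+ l.b)
relu(x) = max(zero(x), x)
struct Network; layers::Vector{Layer}; Network(ls::Layer...) = new(collect(ls)); end
(nn::Network)(x) = reduce(∘, reverse(nn.layers))(x)

# 1. Toy data: the sine function on evenly spaced points in [-3.0, 3.0]
X = collect(range(-3.0, 3.0; length=100))'   # 1×100 input matrix
Y = sin.(X)
loss(nn) = sum(abs2, nn(X) .- Y) / length(Y)   # mean squared error

# 2. Instantiate a 1 → 32 → 1 network
nn = Network(Layer(randn(32, 1), zeros(32), relu),
             Layer(randn(1, 32) ./ sqrt(32), zeros(1), identity))

# 3./4. Take gradients w.r.t. the network weights and apply plain gradient descent
losses = Float64[]
for epoch in 1:2000
    push!(losses, loss(nn))
    g = Zygote.gradient(loss, nn)[1]           # gradients mirror the struct layout
    for (l, gl) in zip(nn.layers, g.layers)
        l.W .-= 0.01 .* gl.W                   # in-place update of the weights
        l.b .-= 0.01 .* gl.b
    end
end
```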

In summary, the above snippet performs the following steps:

  1. Create toy data and define a loss function
  2. Instantiate the Neural Network
  3. Extract the target parameters (network weights) from the network instance
  4. Train the model (i.e. optimize its parameters)

Inspecting the output

Finally, let us verify that everything we have done so far is indeed correct. If everything went well, our model should have learnt to approximate the sine function.
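For instance, with Plots.jl (my choice here; any plotting package would do), and assuming the trained network `nn`, the data `X` and `Y`, and a recorded loss history `losses` from the training step:

```julia
using Plots   # assumed plotting package

# Plot the in-sample fit next to the training-loss curve
function inspect(nn, X, Y, losses)
    p1 = plot(vec(X), vec(Y); label="sin(x)", linewidth=2)
    plot!(p1, vec(X), vec(nn(X)); label="network output", linewidth=2)
    p2 = plot(losses; xlabel="epoch", label="training loss", yscale=:log10)
    plot(p1, p2; layout=(1, 2), size=(900, 350))
end
```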

This produces the following plot:

In-sample fit and loss curve of a Neural Network implemented in raw Julia
From a randomly initialized Neural Network to learning a sine function: while the loss rapidly decreases towards zero, it took some time (empirically) until the network output somewhat resembled a smooth sine function.

The output confirms that our implementation is, indeed, correct. We could take this example even further and implement e.g. regularization in a compact manner with Julia.

Great! Now can I just drop TensorFlow and PyTorch?

It definitely depends! On the one hand, Julia is a great language for writing fast Machine Learning algorithms in an efficient manner. On the other hand, Julia is still a relatively young language. Personally, I would not use Julia for any non-personal commercial product yet.

The following two issues, which I have either experienced personally or seen in online discussions, still prevent me from recommending Julia to my clients:

Rare sightings of erroneous gradients – while I have never encountered this issue myself during three years of regular Julia usage, these types of bugs are particularly blocking. This argument is, of course, only valid when it comes to differentiable Machine Learning models.

Deployability is challenging with JIT compilation – in general, Julia provides all the functionality that is necessary for successfully deploying a model or even complex applications. However, the warmup time of the JIT compiler makes it quite difficult to do so efficiently.

Every time you start a new Julia session, you basically have to wait for a minute or two until your code is somewhat usable. If you need your application to scale quickly in response to sudden spikes in usage, this can be a prohibitive issue.

Nevertheless, Julia is a fantastic language for doing research or creating prototypes of new algorithms. Once your Julia Neural Network or related model performs reasonably well, you can easily transfer it to PyTorch or TensorFlow via ONNX. From there, you are able to deploy it with the usual, proven toolset.

If you are open to some experimentation and can handle occasional quirks, you should definitely give Julia a try.
