In this post, we will explain how to change the learning rate on each iteration (mini-batch) during the training of a Keras model using the R language.

Changing the learning rate (lr) on each epoch is the most common usage; this can be done easily with callback_learning_rate_scheduler() if you are using the Keras package for R with the TensorFlow backend, and it integrates efficiently into the fitting process.

On the other hand, changing the lr on each iteration, although easy to do, is not so obvious, and documentation can be hard to find. The idea of changing the learning rate on each iteration comes from the results in the paper Cyclical Learning Rates for Training Neural Networks (paper), published in 2015, where this method is called training with cyclical learning rates.

In this post, we will explain the code needed to change the learning rate on each iteration, and also how to log the loss and the accuracy of every batch. Remember that there are roughly (training sample size / batch size) iterations in each epoch, so changing the lr on each iteration makes the training process much more dynamic than changing it on each epoch.

In the next post, we will look at whether this technique brings any benefits.

Loading MNIST data

First of all, we will need to install and load the keras package. Please visit R interface to Keras for installation instructions.

The keras package comes with some example datasets; we will use the MNIST dataset, a handwritten digit database whose training set consists of 60,000 grayscale images of 28×28 pixels covering the 10 digits, plus 10,000 images for the test set.

In the code below we will load the dataset into train and test objects.
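A minimal sketch of that loading step, assuming the keras package is installed and configured with a TensorFlow backend:

```r
library(keras)

# dataset_mnist() downloads MNIST on first use and returns train/test splits
mnist <- dataset_mnist()
train <- mnist$train  # $x: 60000 x 28 x 28 image array, $y: labels 0-9
test  <- mnist$test   # $x: 10000 x 28 x 28 image array, $y: labels 0-9
```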

Below you can see an example of the handwritten digit images and their associated labels.

mnist handwritten digit images examples

First of all, because we will train a basic neural network (with only dense/fully connected layers), we will reshape the 28×28 pixel images into vectors. We will not cover the details here; you can learn more in Deep Learning with R – Github (François Chollet, J.J. Allaire), chapter 2.1 – A first look at a neural network.
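The reshaping could look like the following sketch, which flattens each image into a 784-length vector, scales pixel values to [0, 1], and one-hot encodes the labels (the same preprocessing used in that chapter; it assumes the train and test objects loaded earlier):

```r
# Flatten 28x28 images to vectors of length 784 and scale to [0, 1]
x_train <- array_reshape(train$x, c(nrow(train$x), 28 * 28)) / 255
x_test  <- array_reshape(test$x,  c(nrow(test$x),  28 * 28)) / 255

# One-hot encode the digit labels (10 classes)
y_train <- to_categorical(train$y, 10)
y_test  <- to_categorical(test$y, 10)
```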

Cyclic learning rate function

The next step is to create the learning rate function. For this example, we will use a cyclic learning rate that follows a sine wave with an exponential decay.

In training we will use 25 epochs, a batch size of 128, and a training sample size of 60,000. This results in roughly 11,719 iterations [25 × (60000 / 128) = 11,718.75]. Below you will find the plot of our cyclic learning rate example function.
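One possible clr() function of this shape is sketched below; the base rate, amplitude, decay rate, and period used here are illustrative assumptions, not values from the original experiment:

```r
# Cyclic learning rate: a sine wave damped by an exponential decay.
# lr0, amplitude, decay and period are illustrative assumptions.
clr <- function(iter, lr0 = 0.001, amplitude = 0.0005,
                decay = 5e-4, period = 2000) {
  (lr0 + amplitude * sin(2 * pi * iter / period)) * exp(-decay * iter)
}

n_iter <- 25 * ceiling(60000 / 128)  # 11725 iterations counting partial batches
lr_schedule <- clr(seq_len(n_iter))
plot(lr_schedule, type = "l", xlab = "iteration", ylab = "learning rate")
```

Since lr0 is larger than the amplitude, the schedule stays strictly positive while the envelope decays toward zero.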

cyclic learning rate

Callback for changing the learning rate

Next, we will define our callback functions. In Keras, a callback is a function or a set of functions applied at given stages of the training procedure (at the beginning or end of training, of each epoch, or of each batch).

We will define three functions:

  • callback_lr_init: sets some global variables to their initial state
  • callback_lr_set: changes the learning rate on each iteration according to the clr() function
  • callback_lr_log: logs the learning rate used by the model (for verification only)

These functions must be wrapped with the callback_lambda() function, as you will see in the code below.
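A sketch of those three helpers and their callback_lambda() wrappers is shown below. It assumes a clr() function as described above and a model object that already exists; iter and lr_hist are globals updated with the <<- operator:

```r
# Initialize globals at the start of training
callback_lr_init <- function(logs) {
  iter <<- 0
  lr_hist <<- c()
}

# Before each batch, set the optimizer's lr from clr()
callback_lr_set <- function(batch, logs) {
  iter <<- iter + 1
  k_set_value(model$optimizer$lr, clr(iter))
}

# After each batch, record the lr actually used (verification only)
callback_lr_log <- function(batch, logs) {
  lr_hist <<- c(lr_hist, k_get_value(model$optimizer$lr))
}

# Wrap the helpers so fit() can call them at the right stages
callback_lr     <- callback_lambda(on_train_begin = callback_lr_init,
                                   on_batch_begin = callback_lr_set)
callback_logger <- callback_lambda(on_batch_end = callback_lr_log)
```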

Callback for logging metrics

The next code logs the accuracy and the loss of the training set on each iteration. It's useful in our case because Keras natively reports these statistics only once per epoch. By implementing this piece of code, you gain more insight into how the cyclic lr behaves on each iteration.
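A minimal sketch of such a logger, reading the per-batch metrics from the logs list that Keras passes to on_batch_end (the vector names are illustrative):

```r
loss_hist <- c()
acc_hist  <- c()

# Append this batch's loss and accuracy to the history vectors
callback_log_acc <- callback_lambda(
  on_batch_end = function(batch, logs) {
    loss_hist <<- c(loss_hist, logs[["loss"]])
    acc_hist  <<- c(acc_hist, logs[["acc"]])
  }
)
```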

Keras model training

Next, we will set up a simple Keras model with 2 fully connected layers, the last of which outputs ten units, one for each digit.
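Such a model could look like the sketch below; the hidden layer size and optimizer choice are illustrative assumptions:

```r
model <- keras_model_sequential() %>%
  layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>%
  layer_dense(units = 10, activation = "softmax")  # one unit per digit

model %>% compile(
  optimizer = optimizer_rmsprop(),
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
```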

The code below is where the callback functions are plugged in. Note the callbacks=list(callback_lr,callback_logger,callback_log_acc) argument, where we tell Keras to execute the callback functions on each step (mini-batch iteration) of the training process. We will train for 25 epochs with a batch size of 128 samples.
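A sketch of that fitting call, assuming the model, data, and callbacks defined earlier:

```r
history <- model %>% fit(
  x_train, y_train,
  epochs = 25, batch_size = 128,
  # executed on every mini-batch: set the lr, log it, log loss/accuracy
  callbacks = list(callback_lr, callback_logger, callback_log_acc)
)
```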

Although the purpose of this post is to demonstrate the code for applying a cyclic learning rate on each batch iteration, below I also show the evolution of the training process for each epoch.

training result

Results for each iteration

For verification, we can check the learning rate actually used by the model by plotting the lr_hist vector.
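For example, with base R graphics (lr_hist was filled by the logging callback on every batch):

```r
plot(lr_hist, type = "l", xlab = "iteration", ylab = "learning rate")
```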

cyclic lr

Finally, we can plot the accuracy and loss for the training dataset for each iteration.
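Assuming the per-batch logging callback stored the metrics in vectors such as acc_hist and loss_hist (illustrative names), this can be done with:

```r
plot(acc_hist,  type = "l", xlab = "iteration", ylab = "train accuracy")
plot(loss_hist, type = "l", xlab = "iteration", ylab = "train loss")
```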

Train accuracy each iteration

Train loss each iteration

I hope you find it useful. In the next post we will try to squeeze some benefits out of this new tool to improve the performance of neural networks.

Session Info:

Appendix, all the code:
