In this post, we will explain how to change the learning rate in each iteration or batch in the training process of a Keras model using R language.

Change the learning rate (lr) in each epoch is usually the most common usage, this can be done easily with callback_learning_rate_scheduler() if you are using the Keras package for R with the backend of Tensorflow, and can be efficiently implemented in the fitting process.

On the other hand, to change lr in each iteration, although it is easy to do, it’s not so obvious and could be difficult to find some documentation. The idea of changing the learning rate in each iteration comes from the results obtained in the paper Cyclical Learning Rates for Training Neural Networks (paper) published in 2015, the authors called this method as training with cyclical learning rates.

In this post, we will explain the necessary code for changing the learning rate on each iteration, and also how to log the loss and the accuracy of every batch. We must remember that there are near sample training/batch size iterations on each epoch, so changing the lr in each iteration becomes a more dynamic training process than changing it on each epoch.

In the next post, we will have a look if there are benefits using this technique.

Loading mnist data

First of all we will need to install and load the keras package. Please visit R interface to Keras for installing instructions.

keras package comes with some examples datasets, we will use the mnist dataset that is a handwritten digit database formed for the training set by 60000 grayscale images of 28×28 pixels of the 10 number digits, and 10000 images for the test set.

In the code below we will load the dataset into train and test objects.

library(keras)
use_session_with_seed(1)

mnist <- dataset_mnist()
train_images <- mnist$train$x
train_labels <- mnist$train$y
test_images <- mnist$test$x
test_labels <- mnist$test$y

library(keras)

use_session_with_seed(1)

mnist <- dataset_mnist()

train_images <- mnist$train$x

train_labels <- mnist$train$y

test_images <- mnist$test$x

test_labels <- mnist$test$y

Below you can see an example of the handwritten digit images and its associated labels.

library(imager)
IM <- list()
for(i in 1:(20*40)) IM[[i]] <- as.cimg(abs(t(train_images[i,,])-255))
par(mfrow=c(20,40), mar=c(0,0,0,0))
for(i in 1:(20*40)){
      plot(IM[[i]], axes=FALSE)
      text(3,5,train_labels[i], cex=1, pos=4, col="red", offset=0)
}

library(imager)

IM <- list()

for(i in 1:(20*40)) IM[[i]] <- as.cimg(abs(t(train_images[i,,])-255))

par(mfrow=c(20,40), mar=c(0,0,0,0))

for(i in 1:(20*40)){

plot(IM[[i]], axes=FALSE)

text(3,5,train_labels[i], cex=1, pos=4, col="red", offset=0)

}

First of all, because we will train a basic neural network (with only dense/fully connected layers) we will reshape the 28×28 pixel images into vectors. We will not cover the details about that, you can learn more at Deep Learning with R – Github (François Chollet, J.J. Allaire) chapter: 2.1 – A first look at a neural network.

train_images <- array_reshape(train_images, c(60000, 28*28))
train_images <- train_images/255

test_images <- array_reshape(test_images, c(10000, 28*28))
test_images <- test_images/255

train_labels <- to_categorical(train_labels)
test_labels <- to_categorical(test_labels)

train_images <- array_reshape(train_images, c(60000, 28*28))

train_images <- train_images/255

test_images <- array_reshape(test_images, c(10000, 28*28))

test_images <- test_images/255

train_labels <- to_categorical(train_labels)

test_labels <- to_categorical(test_labels)

Cyclic learning rate function

The next step is to create the learning rate mathematical function. For this example, we will use a cyclic learning rate that follows the wave of a sinus with a exponential decay.

We will use in the training, 25 epochs, 128 batch size, and a training sample size of 60000. This results in more than 11718.75 iterations [25(60000/128)]. Below you will find the plot of our **cyclic learning rate* example function.

clr <- function(x) (sin(x/300)+1)*exp(-x/10000)/100
plot(clr(1:(25*(60000/128))), cex=0.1, xlab="iteration", ylab="learning rate")

clr <- function(x) (sin(x/300)+1)*exp(-x/10000)/100

plot(clr(1:(25*(60000/128))), cex=0.1, xlab="iteration", ylab="learning rate")

Callback for changing learnig rate

Next, we will define our callbacks functions. In keras a callback is a function or a set of functions that can be applied at given stages of the training procedure (before/end of training/epoch/batch).

We will define three functions:

callback_lr_init: set some global variables to the initial stage
callback_lr_set: changes the learning rate for each iteration according to the clr() function
callback_lr_log: logs the learning rate of the model (is for verification only)

callback_lr_init <- function(x){
      iter <<- 0
      lr_hist <<- c()
}
callback_lr_set <- function(batch, logs){
      iter <<- iter + 1
      learning_r <- clr(iter)
      k_set_value(model$optimizer$lr, learning_r)
}
callback_lr_log <- function(batch, logs){
      # k_get_value(): https://keras.rstudio.com/articles/backend.html#backend-functions
      lr_hist <<- c(lr_hist, k_get_value(model$optimizer$lr))
      #cat(paste0("iter: ",iter," - lr: ",k_get_value(model$optimizer$lr),"\n"))
}

callback_lr_init <- function(x){

iter <<- 0

lr_hist <<- c()

}

callback_lr_set <- function(batch, logs){

iter <<- iter + 1

learning_r <- clr(iter)

k_set_value(model$optimizer$lr, learning_r)

}

callback_lr_log <- function(batch, logs){

# k_get_value(): https://keras.rstudio.com/articles/backend.html#backend-functions

lr_hist <<- c(lr_hist, k_get_value(model$optimizer$lr))

#cat(paste0("iter: ",iter," - lr: ",k_get_value(model$optimizer$lr),"\n"))

}

These functions must be embedded into the callback_lambda() function as you will find in the code below.

callback_lr <- callback_lambda(on_train_begin=callback_lr_init, on_batch_begin=callback_lr_set)
callback_logger <- callback_lambda(on_batch_begin=callback_lr_log)

callback_lr <- callback_lambda(on_train_begin=callback_lr_init, on_batch_begin=callback_lr_set)

callback_logger <- callback_lambda(on_batch_begin=callback_lr_log)

Callback for logging metrics

The next code will log the accuracy and the loss of the training set for each iteration. It’s usefull in our case, because keras nativelly will only offer this statistics on each epoch. So implementing this piece of code you will have more control how the cyclic lr works on each iteration.

LogMetrics <- R6::R6Class("LogMetrics",
  inherit = KerasCallback,
  public = list(
    loss = NULL,
    acc = NULL,
    on_batch_end = function(batch, logs=list()) {
      self$loss <- c(self$loss, logs[["loss"]])
      self$acc <- c(self$acc, logs[["acc"]])
    }
))

# initializing "LogMetrics" object
callback_log_acc <- LogMetrics$new()

LogMetrics <- R6::R6Class("LogMetrics",

inherit = KerasCallback,

public = list(

loss = NULL,

acc = NULL,

on_batch_end = function(batch, logs=list()) {

self$loss <- c(self$loss, logs[["loss"]])

self$acc <- c(self$acc, logs[["acc"]])

}

))

# initializing "LogMetrics" object

callback_log_acc <- LogMetrics$new()

Keras model training

Next, we will set up a simple keras model with 2 fully connected layers that output ten units, one for each digit to evaluate.

model <- keras_model_sequential() %>% 
  layer_dense(units=512, activation="relu", input_shape=c(28*28)) %>%
  layer_dense(units=10, activation="softmax")

model %>% compile(
  optimizer=optimizer_rmsprop(),
  loss="categorical_crossentropy",
  metrics=c("accuracy")
)

model <- keras_model_sequential() %>%

layer_dense(units=512, activation="relu", input_shape=c(28*28)) %>%

layer_dense(units=10, activation="softmax")

model %>% compile(

optimizer=optimizer_rmsprop(),

loss="categorical_crossentropy",

metrics=c("accuracy")

)

The code below is where the callback functions are implemented. Note the callbacks=list(callback_lr,callback_logger,callback_log_acc), where we are telling keras to execute the callback functions on each step (mini-batch iteration) of the training process. We will train 25 epochs with a batch size of 128 samples.

# start trainning
history <- model %>% fit(train_images, train_labels, epochs=25, batch_size=128,
                              validation_data=list(test_images, test_labels),
                              callbacks=list(callback_lr,callback_logger,callback_log_acc), verbose=2)

# start trainning

history <- model %>% fit(train_images, train_labels, epochs=25, batch_size=128,

validation_data=list(test_images, test_labels),

callbacks=list(callback_lr,callback_logger,callback_log_acc), verbose=2)

Although the purpose of this post is to demonstrate how to implement the cyclic learning rate on each batch iteration code, I will show you the evolution of the training process below for each epoch.

plot(history, theme_bw=getOption("keras.plot.history.theme_bw", TRUE))

1 2	plot(history, theme_bw=getOption("keras.plot.history.theme_bw", TRUE))

Results for each iteration

For verification only, we can plot the learning rate used by the model plotting the lr_hist vector.

# number of total iterarions
NROW(lr_hist)
## [1] 11725
plot(lr_hist, type="l", cex=0.2, xlab="iterations")

# number of total iterarions

NROW(lr_hist)

## [1] 11725

plot(lr_hist, type="l", cex=0.2, xlab="iterations")

Finally, we can plot the accuracy and loss for the training dataset for each iteration.

plot(callback_log_acc$acc, type="l", cex=0.2, xlab="iterations", ylab="train accuracy", ylim=c(0.8, 1))

1 2	plot(callback_log_acc$acc, type="l", cex=0.2, xlab="iterations", ylab="train accuracy", ylim=c(0.8, 1))

library(zoo)
plot(rollmean(callback_log_acc$acc,25), type="l", cex=0.2, xlab="iterations", ylab="train accuracy: rollmean(25)", ylim=c(0.8, 1))

library(zoo)

plot(rollmean(callback_log_acc$acc,25), type="l", cex=0.2, xlab="iterations", ylab="train accuracy: rollmean(25)", ylim=c(0.8, 1))

plot(callback_log_acc$loss, type="l", cex=0.2, xlab="iterations", ylab="train loss", ylim=c(0, 1))

1 2	plot(callback_log_acc$loss, type="l", cex=0.2, xlab="iterations", ylab="train loss", ylim=c(0, 1))

plot(rollmean(callback_log_acc$loss,25), type="l", cex=0.2, xlab="iterations", ylab="train loss: rollmean(25)", ylim=c(0, 1))

1 2	plot(rollmean(callback_log_acc$loss,25), type="l", cex=0.2, xlab="iterations", ylab="train loss: rollmean(25)", ylim=c(0, 1))

I hope you find it useful, in the next post we will try to squeeze some of the benefits of this new tool to improve the performance of neural networks.

Session Info:

------------------------------------
Total R execution time:  8.4 mins 
------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 os       macOS High Sierra 10.13.6   
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  es_ES.UTF-8                 
 tz       Europe/Madrid               
 date     2018-12-27                  

------------------------------------
Packages:
 [1] "bitops - 1.0-6 - 2013-08-17 - CRAN (R 3.4.0)"                          
 [2] "imager - 0.40.2 - 2017-04-24 - CRAN (R 3.4.0)"                         
 [3] "keras - 2.2.0 - 2018-08-24 - CRAN (R 3.4.4)"                           
 [4] "knitr - 1.20 - 2018-02-20 - CRAN (R 3.4.3)"                            
 [5] "magrittr - 1.5 - 2014-11-22 - CRAN (R 3.4.0)"                          
 [6] "plyr - 1.8.4 - 2016-06-08 - CRAN (R 3.4.0)"                            
 [7] "RCurl - 1.95-4.11 - 2018-07-15 - cran (@1.95-4.)"                      
 [8] "reshape2 - 1.4.2 - 2016-10-22 - CRAN (R 3.4.0)"                        
 [9] "RWordPress - 0.2-3 - 2018-03-04 - Github (duncantl/RWordPress@ce6d2d6)"
[10] "sessioninfo - 1.0.0 - 2017-06-21 - CRAN (R 3.4.1)"                     
[11] "stringr - 1.2.0 - 2017-02-18 - CRAN (R 3.4.0)"                         
[12] "XMLRPC - 0.3-1 - 2018-08-17 - Github (duncantl/XMLRPC@add9496)"        
[13] "zoo - 1.8-0 - 2017-04-12 - CRAN (R 3.4.0)"

------------------------------------

Total R execution time: 8.4 mins

------------------------------------

setting value

version R version 3.4.3 (2017-11-30)

os macOS High Sierra 10.13.6

system x86_64, darwin15.6.0

ui RStudio

language (EN)

collate es_ES.UTF-8

tz Europe/Madrid

date 2018-12-27

------------------------------------

Packages:

[1] "bitops - 1.0-6 - 2013-08-17 - CRAN (R 3.4.0)"

[2] "imager - 0.40.2 - 2017-04-24 - CRAN (R 3.4.0)"

[3] "keras - 2.2.0 - 2018-08-24 - CRAN (R 3.4.4)"

[4] "knitr - 1.20 - 2018-02-20 - CRAN (R 3.4.3)"

[5] "magrittr - 1.5 - 2014-11-22 - CRAN (R 3.4.0)"

[6] "plyr - 1.8.4 - 2016-06-08 - CRAN (R 3.4.0)"

[7] "RCurl - 1.95-4.11 - 2018-07-15 - cran (@1.95-4.)"

[8] "reshape2 - 1.4.2 - 2016-10-22 - CRAN (R 3.4.0)"

[9] "RWordPress - 0.2-3 - 2018-03-04 - Github (duncantl/RWordPress@ce6d2d6)"

[10] "sessioninfo - 1.0.0 - 2017-06-21 - CRAN (R 3.4.1)"

[11] "stringr - 1.2.0 - 2017-02-18 - CRAN (R 3.4.0)"

[12] "XMLRPC - 0.3-1 - 2018-08-17 - Github (duncantl/XMLRPC@add9496)"

[13] "zoo - 1.8-0 - 2017-04-12 - CRAN (R 3.4.0)"

Appendix, all the code:

library(keras)
use_session_with_seed(1)

mnist <- dataset_mnist()
train_images <- mnist$train$x
train_labels <- mnist$train$y
test_images <- mnist$test$x
test_labels <- mnist$test$y
library(imager)
IM <- list()
for (i in 1:(20 * 40)) IM[[i]] <- as.cimg(abs(t(train_images[i, , ]) - 255))
par(mfrow = c(20, 40), mar = c(0, 0, 0, 0))
for (i in 1:(20 * 40)) {
    plot(IM[[i]], axes = FALSE)
    text(3, 5, train_labels[i], cex = 1, pos = 4, col = "red", offset = 0)
}
train_images <- array_reshape(train_images, c(60000, 28 * 28))
train_images <- train_images/255

test_images <- array_reshape(test_images, c(10000, 28 * 28))
test_images <- test_images/255

train_labels <- to_categorical(train_labels)
test_labels <- to_categorical(test_labels)
clr <- function(x) (sin(x/300) + 1) * exp(-x/10000)/100
plot(clr(1:(25 * (60000/128))), cex = 0.1, xlab = "iteration", ylab = "learning rate")
callback_lr_init <- function(x) {
    iter <<- 0
    lr_hist <<- c()
}
callback_lr_set <- function(batch, logs) {
    iter <<- iter + 1
    learning_r <- clr(iter)
    k_set_value(model$optimizer$lr, learning_r)
}
callback_lr_log <- function(batch, logs) {
    # k_get_value():
    # https://keras.rstudio.com/articles/backend.html#backend-functions
    lr_hist <<- c(lr_hist, k_get_value(model$optimizer$lr))
    # cat(paste0('iter: ',iter,' - lr: ',k_get_value(model$optimizer$lr),'\n'))
}
callback_lr <- callback_lambda(on_train_begin = callback_lr_init, on_batch_begin = callback_lr_set)
callback_logger <- callback_lambda(on_batch_begin = callback_lr_log)
LogMetrics <- R6::R6Class("LogMetrics", inherit = KerasCallback, public = list(loss = NULL, 
    acc = NULL, on_batch_end = function(batch, logs = list()) {
        self$loss <- c(self$loss, logs[["loss"]])
        self$acc <- c(self$acc, logs[["acc"]])
    }))

# initializing 'LogMetrics' object
callback_log_acc <- LogMetrics$new()
model <- keras_model_sequential() %>% layer_dense(units = 512, activation = "relu", 
    input_shape = c(28 * 28)) %>% layer_dense(units = 10, activation = "softmax")

model %>% compile(optimizer = optimizer_rmsprop(), loss = "categorical_crossentropy", 
    metrics = c("accuracy"))
# start trainning
history <- model %>% fit(train_images, train_labels, epochs = 25, batch_size = 128, 
    validation_data = list(test_images, test_labels), callbacks = list(callback_lr, 
        callback_logger, callback_log_acc), verbose = 2)
plot(history, theme_bw = getOption("keras.plot.history.theme_bw", TRUE))
# number of total iterarions
NROW(lr_hist)
plot(lr_hist, type = "l", cex = 0.2, xlab = "iterations")
plot(callback_log_acc$acc, type = "l", cex = 0.2, xlab = "iterations", ylab = "train accuracy", 
    ylim = c(0.8, 1))
library(zoo)
plot(rollmean(callback_log_acc$acc, 25), type = "l", cex = 0.2, xlab = "iterations", 
    ylab = "train accuracy: rollmean(25)", ylim = c(0.8, 1))
plot(callback_log_acc$loss, type = "l", cex = 0.2, xlab = "iterations", ylab = "train loss", 
    ylim = c(0, 1))
plot(rollmean(callback_log_acc$loss, 25), type = "l", cex = 0.2, xlab = "iterations", 
    ylab = "train loss: rollmean(25)", ylim = c(0, 1))

library(keras)

use_session_with_seed(1)

mnist <- dataset_mnist()

train_images <- mnist$train$x

train_labels <- mnist$train$y

test_images <- mnist$test$x

test_labels <- mnist$test$y

library(imager)

IM <- list()

for (i in 1:(20 * 40)) IM[[i]] <- as.cimg(abs(t(train_images[i, , ]) - 255))

par(mfrow = c(20, 40), mar = c(0, 0, 0, 0))

for (i in 1:(20 * 40)) {

plot(IM[[i]], axes = FALSE)

text(3, 5, train_labels[i], cex = 1, pos = 4, col = "red", offset = 0)

}

train_images <- array_reshape(train_images, c(60000, 28 * 28))

train_images <- train_images/255

test_images <- array_reshape(test_images, c(10000, 28 * 28))

test_images <- test_images/255

train_labels <- to_categorical(train_labels)

test_labels <- to_categorical(test_labels)

clr <- function(x) (sin(x/300) + 1) * exp(-x/10000)/100

plot(clr(1:(25 * (60000/128))), cex = 0.1, xlab = "iteration", ylab = "learning rate")

callback_lr_init <- function(x) {

iter <<- 0

lr_hist <<- c()

}

callback_lr_set <- function(batch, logs) {

iter <<- iter + 1

learning_r <- clr(iter)

k_set_value(model$optimizer$lr, learning_r)

}

callback_lr_log <- function(batch, logs) {

# k_get_value():

# https://keras.rstudio.com/articles/backend.html#backend-functions

lr_hist <<- c(lr_hist, k_get_value(model$optimizer$lr))

# cat(paste0('iter: ',iter,' - lr: ',k_get_value(model$optimizer$lr),'\n'))

}

callback_lr <- callback_lambda(on_train_begin = callback_lr_init, on_batch_begin = callback_lr_set)

callback_logger <- callback_lambda(on_batch_begin = callback_lr_log)

LogMetrics <- R6::R6Class("LogMetrics", inherit = KerasCallback, public = list(loss = NULL,

acc = NULL, on_batch_end = function(batch, logs = list()) {

self$loss <- c(self$loss, logs[["loss"]])

self$acc <- c(self$acc, logs[["acc"]])

}))

# initializing 'LogMetrics' object

callback_log_acc <- LogMetrics$new()

model <- keras_model_sequential() %>% layer_dense(units = 512, activation = "relu",

input_shape = c(28 * 28)) %>% layer_dense(units = 10, activation = "softmax")

model %>% compile(optimizer = optimizer_rmsprop(), loss = "categorical_crossentropy",

metrics = c("accuracy"))

# start trainning

history <- model %>% fit(train_images, train_labels, epochs = 25, batch_size = 128,

validation_data = list(test_images, test_labels), callbacks = list(callback_lr,

callback_logger, callback_log_acc), verbose = 2)

plot(history, theme_bw = getOption("keras.plot.history.theme_bw", TRUE))

# number of total iterarions

NROW(lr_hist)

plot(lr_hist, type = "l", cex = 0.2, xlab = "iterations")

plot(callback_log_acc$acc, type = "l", cex = 0.2, xlab = "iterations", ylab = "train accuracy",

ylim = c(0.8, 1))

library(zoo)

plot(rollmean(callback_log_acc$acc, 25), type = "l", cex = 0.2, xlab = "iterations",

ylab = "train accuracy: rollmean(25)", ylim = c(0.8, 1))

plot(callback_log_acc$loss, type = "l", cex = 0.2, xlab = "iterations", ylab = "train loss",

ylim = c(0, 1))

plot(rollmean(callback_log_acc$loss, 25), type = "l", cex = 0.2, xlab = "iterations",

ylab = "train loss: rollmean(25)", ylim = c(0, 1))

Share it!:

a blog about data science

Changing learning rate during training on each batch/iteration (using callbacks in R keras)

Loading mnist data

Cyclic learning rate function

Callback for changing learnig rate

Callback for logging metrics

Keras model training

Results for each iteration

Leave a Reply Cancel reply