Chapter 4 Deep Learning

  • In recent years there has been a lot of hype about Deep Learning (DL)
  • Deep Neural Networks are Neural Networks with many hidden layers
  • Several heuristics are often used in DL (see the sketch after this list):
    • Dropout: some units are randomly ignored during training, which acts as regularization
    • ReLU units: help avoid vanishing gradients
    • Transfer learning: reuse weights already trained on different datasets (and possibly fine-tune them on your own dataset)
  • DL includes some novel architectures
    • Convolutional Neural Networks (CNN): images
    • Long Short-Term Memory (LSTM): time series
  • Improvements outside Machine Learning theory
    • Hardware: GPUs
    • Software: e.g. TensorFlow (with keras as an interface), H2O, fast.ai, torch, etc.
    • Funding: Netflix, Google, Facebook…
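
A minimal sketch of what the first two heuristics do, using the R torch package that also appears in the lab below; the tensor values, the dropout probability, and the commented transfer-learning line are illustrative only:

###
library(torch)

# ReLU keeps positive values and zeroes negative ones, so gradients do not
# vanish for active units
nnf_relu(torch_tensor(c(-2, -0.5, 0, 1.5)))  # 0, 0, 0, 1.5

# Dropout randomly zeroes about 40% of the inputs at training time and
# rescales the remaining values so the expected value is unchanged
nnf_dropout(torch_ones(8), p = 0.4, training = TRUE)

# Transfer-learning idea (sketch only): freeze the pretrained parameters and
# train just a new output layer
# for (p in pretrained$parameters) p$requires_grad_(FALSE)
###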

4.1 Regression with deep Neural Networks

This is the lab from Section 10.9 of An Introduction to Statistical Learning; the code below follows the R torch version of that lab.

### Lab: Deep Learning

## In this version of the Ch10 lab, we use the `luz` package, which interfaces to the
## `torch` package which in turn links to efficient
## `C++` code in the LibTorch library.

## This version of the lab was produced by Daniel Falbel and Sigrid
## Keydana, both data scientists at RStudio, where these packages were
## developed.

## An advantage over our original `keras` implementation is that this
## version does not require a separate `python` installation.

## Single Layer Network on Hitters Data


###
library(ISLR2)
Gitters <- na.omit(Hitters)
n <- nrow(Gitters)
set.seed(13)
ntest <- trunc(n / 3)
testid <- sample(1:n, ntest)
###

###
lfit <- lm(Salary ~ ., data = Gitters[-testid, ])
lpred <- predict(lfit, Gitters[testid, ])
with(Gitters[testid, ], mean(abs(lpred - Salary)))
###


###
x <- scale(model.matrix(Salary ~ . - 1, data = Gitters))
y <- Gitters$Salary
###



###
library(torch)
library(luz) # high-level interface for torch
library(torchvision) # for datasets and image transformation
library(torchdatasets) # for datasets we are going to use
library(zeallot)
torch_manual_seed(13)
###

###
modnn <- nn_module(
    initialize = function(input_size) {
        self$hidden <- nn_linear(input_size, 50)
        self$activation <- nn_relu()
        self$dropout <- nn_dropout(0.4)
        self$output <- nn_linear(50, 1)
    },
    forward = function(x) {
        x %>%
            self$hidden() %>%
            self$activation() %>%
            self$dropout() %>%
            self$output()
    }
)
###
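
As a quick, purely illustrative check (not part of the original lab), the module generator can be instantiated and run on a small random batch to verify the output shape; `tmp_net` is just a throwaway name:

###
tmp_net <- modnn(input_size = ncol(x))  # instantiate the module generator
tmp_net(torch_randn(4, ncol(x)))        # forward pass; should return a 4 x 1 tensor
###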

###
modnn <- modnn %>%
    setup(
        loss = nn_mse_loss(),
        optimizer = optim_rmsprop,
        metrics = list(luz_metric_mae())
    ) %>%
    set_hparams(input_size = ncol(x))
###

###
fitted <- modnn %>%
    fit(
        data = list(x[-testid, ], matrix(y[-testid], ncol = 1)),
        valid_data = list(x[testid, ], matrix(y[testid], ncol = 1)),
        epochs = 20
    )
###
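
luz's `fit()` also accepts callbacks and dataloader options, which can help on longer runs. The variant below is not in the original lab and is only a sketch: the `callbacks` and `dataloader_options` arguments and `luz_callback_early_stopping()` are taken from the luz documentation as I recall it, so check `?fit.luz_module_generator` against your installed version; `fitted_es` is just an illustrative name.

###
fitted_es <- modnn %>%
    fit(
        data = list(x[-testid, ], matrix(y[-testid], ncol = 1)),
        valid_data = list(x[testid, ], matrix(y[testid], ncol = 1)),
        epochs = 50,
        # stop early if the validation loss has not improved for 5 epochs
        callbacks = list(luz_callback_early_stopping(patience = 5)),
        dataloader_options = list(batch_size = 32)
    )
###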

###
plot(fitted)
###


###
npred <- predict(fitted, x[testid, ])
mean(abs(y[testid] - npred))
###

4.2 Generative Networks

  • Generative Models produce new data with the same underlying probability distribution as the observed data
  • Generative Models are Unsupervised Learning techniques
  • Generative Adversarial Networks (GANs) use Supervised Learning (regression and classification) to build an unsupervised generative model (see the training-loop sketch below)

[Figure credit: Zhang, Aston and Lipton, Zachary C. and Li, Mu and Smola, Alexander J. - https://github.com/d2l-ai/d2l-en, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=152265649]
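
To make the "supervised learning inside an unsupervised model" point concrete, here is a minimal adversarial training loop written directly in torch (it is not taken from RGAN); the toy data, network sizes, learning rates, and number of steps are all illustrative assumptions:

library(torch)
torch_manual_seed(13)

# Toy "real" data: a 2-D Gaussian cloud
real <- torch_randn(1000, 2) * 0.5 + 1

# Generator: maps 2-D noise to 2-D samples; Discriminator: real-vs-fake classifier
g <- nn_sequential(nn_linear(2, 16), nn_relu(), nn_linear(16, 2))
d <- nn_sequential(nn_linear(2, 16), nn_relu(), nn_linear(16, 1))

opt_g <- optim_adam(g$parameters, lr = 1e-3)
opt_d <- optim_adam(d$parameters, lr = 1e-3)
bce <- nn_bce_with_logits_loss()

for (step in 1:2000) {
    x_real <- real[sample(1000, 64), ]
    x_fake <- g(torch_randn(64, 2))

    # Discriminator step: ordinary supervised classification, real (1) vs fake (0)
    opt_d$zero_grad()
    d_loss <- bce(d(x_real), torch_ones(64, 1)) +
        bce(d(x_fake$detach()), torch_zeros(64, 1))
    d_loss$backward()
    opt_d$step()

    # Generator step: try to make the discriminator label fresh fakes as real
    opt_g$zero_grad()
    g_loss <- bce(d(g(torch_randn(64, 2))), torch_ones(64, 1))
    g_loss$backward()
    opt_g$step()
}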

The code below is taken from the RGAN package documentation.

library(torch)
library(RGAN)

# Sample some toy data to play with.
data <- sample_toydata()

# Transform (here standardize) the data to facilitate learning.
# First, create a new data transformer.
transformer <- data_transformer$new()

# Fit the transformer to your data.
transformer$fit(data)

# Use the fitted transformer to transform your data.
transformed_data <- transformer$transform(data)

# Have a look at the transformed data.
# Set up a 3 x 2 grid of panels (the training call below adds progress plots)
par(mfrow = c(3, 2))
# Reduce the plot margins so all panels fit on one device
par(mar = c(1, 1, 1, 1))
plot(
    transformed_data,
    bty = "n",
    col = viridis::viridis(2, alpha = 0.7)[1],
    pch = 19,
    xlab = "Var 1",
    ylab = "Var 2",
    main = "The Real Data",
    las = 1
)

# Train on the CPU (no CUDA device is assumed to be available)
device <- "cpu"

# Now train the GAN and observe some intermediate results.
res <-
    gan_trainer(
        transformed_data,
        eval_dropout = TRUE,
        plot_progress = TRUE,
        plot_interval = 600,
        device = device
    )
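
After training, one would typically draw synthetic observations from the generator and compare them with the real data. The sketch below assumes that your RGAN version exports `sample_synthetic_data()`, taking the trained object and the fitted transformer, as in the package documentation; treat the exact call as an assumption and check `?sample_synthetic_data` before relying on it.

# Assumption: sample_synthetic_data() is exported by RGAN and accepts the
# trained object plus the fitted transformer (returning data on the original scale).
synthetic <- sample_synthetic_data(res, transformer)

plot(
    synthetic,
    bty = "n",
    col = viridis::viridis(2, alpha = 0.7)[2],
    pch = 19,
    main = "The Synthetic Data",
    las = 1
)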