Chapter 4 Deep Learning

  • In recent years there has been a lot of hype about Deep Learning (DL)
  • Deep Neural Networks are Neural Networks with many hidden layers
  • Several heuristics are often used in DL:
    • Dropout: some connections are randomly ignored during training, which acts as a form of regularization
    • ReLU units: avoid the vanishing-gradient problem
    • Transfer learning: reuse weights already trained on a different dataset (and optionally fine-tune them on your own data); see the sketch after this list
  • DL includes some novel architectures
    • Convolutional Neural Networks (CNN): images
    • Long Short Term Memory (LSTM): time series
  • Improvements outside Machine Learning theory
    • Cloud environments such as Google Colab
    • Hardware: GPUs
    • Software packages: e.g. tensorflow (using keras as interface), H2O, fast.ai, torch, etc.
    • Funding: Netflix, Google, Facebook…
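
As an illustration of transfer learning, the sketch below loads a CNN pretrained on ImageNet with the torchvision package, freezes its weights, and replaces its final layer with a new one. The choice of resnet18 and the number of output classes (10) are arbitrary assumptions made only for this example.

library(torch)
library(torchvision)
# Load a CNN with weights already trained on ImageNet
backbone <- model_resnet18(pretrained = TRUE)
# Freeze the pretrained weights: only the new head will be trained
for (p in backbone$parameters) p$requires_grad_(FALSE)
# Replace the final fully connected layer with one sized for our own task
# (10 classes is just an illustrative choice)
backbone$fc <- nn_linear(backbone$fc$in_features, out_features = 10)
# Fine-tuning would then train only backbone$fc, e.g. with luz as in Section 4.1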

4.1 Classification with deep Neural Networks

This is the MNIST digit classification lab from Section 10.9 of An Introduction to Statistical Learning. The code follows the R torch version of the lab.

  1. First, the data is not loaded completely into memory as a standard variable. Instead, a dataset object is configured that loads the data as it is needed (see the dataloader sketch after the code below).
library(torch)
library(luz) # high-level interface for torch
library(torchvision) # for datasets and image transformation
library(torchdatasets) # for datasets we are going to use
# library(zeallot)
# Load datasets
transform <- function(x) {
  x %>% 
    torch_tensor() %>% 
    torch_flatten() %>% 
    torch_div(255)
}
train_ds <- mnist_dataset(
  root = ".", 
  train = TRUE, 
  download = TRUE, 
  transform = transform
)
## Dataset <mnist> (~12 MB) will be downloaded and processed if not
## already available.
## Dataset <mnist> loaded with 60000 images.
test_ds <- mnist_dataset(
  root = ".", 
  train = FALSE, 
  download = TRUE,
  transform = transform
)
## Dataset <mnist> (~12 MB) will be downloaded and processed if not
## already available.
## Dataset <mnist> loaded with 10000 images.
length(train_ds)
## [1] 60000
length(test_ds)
## [1] 10000
# Show an example image (each digit is stored as a 28x28 matrix of pixel intensities)
image(train_ds$data[1, 1:28, 1:28])

# Targets are stored as class indices 1..10, so subtract 1 to recover the digit
train_ds$targets[1] - 1
## [1] 5
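To make the lazy loading explicit, a dataloader can be built on top of the dataset to assemble mini-batches on demand. This is a minimal sketch; fit() below builds an equivalent dataloader internally from dataloader_options.
dl <- dataloader(train_ds, batch_size = 256, shuffle = TRUE)
# Draw a single batch: images and labels are only read and transformed now
batch <- dataloader_next(dataloader_make_iter(dl))
batch[[1]]$shape # 256 x 784 (flattened images)
batch[[2]]$shape # 256 (class labels)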
  2. Then, the NN is configured
modelnn <- nn_module(
  initialize = function() {
    self$linear1 <- nn_linear(in_features = 28*28, out_features = 256)
    self$linear2 <- nn_linear(in_features = 256, out_features = 128)
    self$linear3 <- nn_linear(in_features = 128, out_features = 10)
    
    self$drop1 <- nn_dropout(p = 0.4)
    self$drop2 <- nn_dropout(p = 0.3)
    
    self$activation <- nn_relu()
  },
  forward = function(x) {
    x %>% 
      
      self$linear1() %>% 
      self$activation() %>% 
      self$drop1() %>% 
      
      self$linear2() %>% 
      self$activation() %>% 
      self$drop2() %>% 
      
      self$linear3()
  }
)
print(modelnn())
## An `nn_module` containing 235,146 parameters.
## 
## ── Modules ──────────────────────────────────────────────────────
## • linear1: <nn_linear> #200,960 parameters
## • linear2: <nn_linear> #32,896 parameters
## • linear3: <nn_linear> #1,290 parameters
## • drop1: <nn_dropout> #0 parameters
## • drop2: <nn_dropout> #0 parameters
## • activation: <nn_relu> #0 parameters
# Configure the loss, optimizer and accuracy metric (luz setup)
modelnn <- modelnn %>% 
  setup(
    loss = nn_cross_entropy_loss(),
    optimizer = optim_rmsprop, 
    metrics = list(luz_metric_accuracy())
  )
  3. Once everything is prepared, the NN is actually fitted
system.time(
  fitted <- modelnn %>%
    fit(
      data = train_ds,
      epochs = 1, # a single epoch for a quick run; increase (e.g. to 15) for better accuracy
      valid_data = 0.2, # hold out 20% of the training data for validation
      dataloader_options = list(batch_size = 256),
      verbose = TRUE
    )
)
plot(fitted)
  4. Finally, accuracy can be assessed
accuracy <- function(pred, truth) {
  mean(pred == truth)
}

# Get the true classes (stored as indices 1..10) for all observations in test_ds
truth <- sapply(seq_along(test_ds), function(x) test_ds[x][[2]])

fitted %>% 
  predict(test_ds) %>% 
  torch_argmax(dim = 2) %>%  # the predicted class is the one with the highest logit
  as_array() %>% # we convert to an R object
  accuracy(truth)

4.2 Generative Networks

  • Generative models produce new data that follow the same underlying probability distribution as the observed data
  • Generative models are Unsupervised Learning techniques
  • Generative Adversarial Networks (GANs) use Supervised Learning (classification of real vs. generated samples) to build an unsupervised generative model; a minimal training-loop sketch follows the figure

(Figure by Zhang, Aston and Lipton, Zachary C. and Li, Mu and Smola, Alexander J. - https://github.com/d2l-ai/d2l-en, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=152265649)
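
To make the adversarial idea concrete, here is a minimal GAN training loop written directly in torch. This is an illustrative sketch, not the RGAN implementation used below; the toy data, network sizes, learning rates and number of steps are arbitrary choices. The discriminator solves a supervised classification problem (real vs. generated points), and the generator is updated so that its samples get classified as real.

library(torch)
# Toy "real" data: 1000 two-dimensional points
real_data <- torch_randn(1000, 2) * 0.5 + 2
n <- real_data$size(1)
# Generator: maps random noise (dimension 8) to fake data points
generator <- nn_sequential(
  nn_linear(8, 16), nn_relu(),
  nn_linear(16, 2)
)
# Discriminator: outputs a logit for "this point is real"
discriminator <- nn_sequential(
  nn_linear(2, 16), nn_relu(),
  nn_linear(16, 1)
)
loss_fn <- nn_bce_with_logits_loss()
opt_g <- optim_adam(generator$parameters, lr = 1e-3)
opt_d <- optim_adam(discriminator$parameters, lr = 1e-3)
batch_size <- 64
for (step in 1:200) {
  real_batch <- real_data[sample(n, batch_size), ]
  fake_batch <- generator(torch_randn(batch_size, 8))
  # Discriminator step: supervised classification of real (1) vs. fake (0)
  opt_d$zero_grad()
  d_loss <- loss_fn(discriminator(real_batch), torch_ones(batch_size, 1)) +
    loss_fn(discriminator(fake_batch$detach()), torch_zeros(batch_size, 1))
  d_loss$backward()
  opt_d$step()
  # Generator step: update the generator so its samples are labelled as real
  opt_g$zero_grad()
  g_loss <- loss_fn(discriminator(generator(torch_randn(batch_size, 8))),
                    torch_ones(batch_size, 1))
  g_loss$backward()
  opt_g$step()
}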

This code is from the RGAN package.

library(torch)
library(RGAN)

# Sample some toy data to play with.
data <- sample_toydata()

# Transform (here standardize) the data to facilitate learning.
# First, create a new data transformer.
transformer <- data_transformer$new()

# Fit the transformer to your data.
transformer$fit(data)

# Use the fitted transformer to transform your data.
transformed_data <- transformer$transform(data)

# Have a look at the transformed data.
par(mfrow = c(3, 2))
# Reduce the plot margins so all panels fit
par(mar = c(1, 1, 1, 1))
plot(
    transformed_data,
    bty = "n",
    col = viridis::viridis(2, alpha = 0.7)[1],
    pch = 19,
    xlab = "Var 1",
    ylab = "Var 2",
    main = "The Real Data",
    las = 1
)

# No CUDA device available, so train on the CPU
device <- "cpu"

# Now train the GAN and observe some intermediate results.
res <-
    gan_trainer(
        transformed_data,
        eval_dropout = TRUE,
        plot_progress = TRUE,
        plot_interval = 600,
        device = device
    )
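
After training, the fitted generator can be used to produce new observations. The call below is a sketch that assumes the sample_synthetic_data() helper described in the RGAN documentation; passing the fitted transformer should return the samples on the original scale.

# Draw synthetic observations from the trained GAN (assumes RGAN's
# sample_synthetic_data() helper) and plot them
synthetic_data <- sample_synthetic_data(res, transformer)
plot(
    synthetic_data,
    bty = "n",
    col = viridis::viridis(2, alpha = 0.7)[2],
    pch = 19,
    xlab = "Var 1",
    ylab = "Var 2",
    main = "The Synthetic Data",
    las = 1
)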