Chapter 4 Deep Learning
- In recent years there has been a lot of hype about Deep Learning (DL)
- Deep Neural Networks are Neural Networks with many hidden layers
- Several heuristics are often used in DL:
- Dropout: some connections are randomly ignored during training, which acts as regularization (see the short R sketch after this list)
- ReLU units: help to avoid the vanishing-gradient problem
- Transfer learning: reuse weights already trained on different datasets (and optionally fine-tune them on your own dataset)
- DL includes some novel architectures
- Convolutional Neural Networks (CNNs): for images
- Long Short-Term Memory (LSTM) networks: for time series
- Improvements outside Machine Learning theory
- Hardware: GPUs
- Software: e.g. TensorFlow (often used through the keras interface), H2O, fast.ai, torch, etc.
- Funding: Netflix, Google, Facebook…
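To make the first two heuristics concrete, here is a minimal sketch in plain R (written from scratch for illustration; it is not lab or package code):
# ReLU and (inverted) dropout, implemented by hand.
relu <- function(x) pmax(0, x)          # ReLU: max(0, x)
relu(c(-2, -0.5, 0, 1.5))               # -> 0.0 0.0 0.0 1.5
dropout <- function(a, p = 0.4) {       # drop each unit with probability p
  keep <- rbinom(length(a), size = 1, prob = 1 - p)
  a * keep / (1 - p)                    # rescale so the expected activation is unchanged
}
set.seed(1)
dropout(c(1, 2, 3, 4))                  # some activations zeroed, the rest scaled up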
4.1 Regression with deep Neural Networks
This is the task in Section 10.9 of An Introduction to Statistical Learning. The code below is from the book's R torch version of the lab.
### Lab: Deep Learning
## In this version of the Ch10 lab, we use the `luz` package, which interfaces to the
## `torch` package which in turn links to efficient
## `C++` code in the LibTorch library.
## This version of the lab was produced by Daniel Falbel and Sigrid
## Keydana, both data scientists at RStudio, where these packages were
## produced.
## An advantage over our original `keras` implementation is that this
## version does not require a separate `python` installation.
## Single Layer Network on Hitters Data
###
library(ISLR2)
Gitters <- na.omit(Hitters)   # drop players with missing values (e.g. Salary)
n <- nrow(Gitters)
set.seed(13)
ntest <- trunc(n / 3)         # hold out one third of the data as a test set
testid <- sample(1:n, ntest)
###
###
lfit <- lm(Salary ~ ., data = Gitters[-testid, ])   # linear-model baseline on the training set
lpred <- predict(lfit, Gitters[testid, ])
with(Gitters[testid, ], mean(abs(lpred - Salary)))  # test-set mean absolute error
###
###
x <- scale(model.matrix(Salary ~ . - 1, data = Gitters))  # standardized model matrix (no intercept)
y <- Gitters$Salary
###
###
library(torch)
library(luz) # high-level interface for torch
library(torchvision) # for datasets and image transformation
library(torchdatasets) # for datasets we are going to use
library(zeallot)
torch_manual_seed(13)
###
###
modnn <- nn_module(
  initialize = function(input_size) {
    self$hidden <- nn_linear(input_size, 50)  # one hidden layer with 50 units
    self$activation <- nn_relu()
    self$dropout <- nn_dropout(0.4)           # dropout regularization (p = 0.4)
    self$output <- nn_linear(50, 1)           # single output: predicted Salary
  },
  forward = function(x) {
    x %>%
      self$hidden() %>%
      self$activation() %>%
      self$dropout() %>%
      self$output()
  }
)
###
###
modnn <- modnn %>%
  setup(
    loss = nn_mse_loss(),             # squared-error loss
    optimizer = optim_rmsprop,
    metrics = list(luz_metric_mae())  # also track mean absolute error
  ) %>%
  set_hparams(input_size = ncol(x))
###
###
fitted <- modnn %>%
  fit(
    data = list(x[-testid, ], matrix(y[-testid], ncol = 1)),
    valid_data = list(x[testid, ], matrix(y[testid], ncol = 1)),
    epochs = 20
  )
###
###
plot(fitted)
###
###
npred <- predict(fitted, x[testid, ])
mean(abs(y[testid] - npred))   # test-set mean absolute error of the network
###
4.2 Generative Networks
- Generative Models produce new data with the same underlying probability distribution as the observed data
- Generative Models are Unsupervised Learning techniques
- Generative Adversarial Networks (GANs) use Supervised Learning losses (regression and classification) to build an unsupervised generative model; a minimal sketch of one training step follows the figure below
[Figure: GAN architecture diagram. By Zhang, Aston and Lipton, Zachary C. and Li, Mu and Smola, Alexander J., https://github.com/d2l-ai/d2l-en, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=152265649]
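Before the packaged example, here is a minimal sketch of one adversarial training step in R torch (the network sizes, batch size, and noise dimension are arbitrary assumptions for illustration; gan_trainer() below handles all of this internally):
library(torch)
# Generator: noise (2-d) -> fake data (2-d); Discriminator: data -> P(real).
G <- nn_sequential(nn_linear(2, 16), nn_relu(), nn_linear(16, 2))
D <- nn_sequential(nn_linear(2, 16), nn_relu(), nn_linear(16, 1), nn_sigmoid())
opt_g <- optim_adam(G$parameters)
opt_d <- optim_adam(D$parameters)
bce <- nn_bce_loss()
real <- torch_randn(64, 2)              # stand-in for a batch of real data
z <- torch_randn(64, 2)                 # noise fed to the generator
# Discriminator step: classify real as 1 and fake as 0 (supervised losses).
opt_d$zero_grad()
d_loss <- bce(D(real), torch_ones(64, 1)) +
  bce(D(G(z)$detach()), torch_zeros(64, 1))
d_loss$backward()
opt_d$step()
# Generator step: try to make the discriminator label fakes as real.
opt_g$zero_grad()
g_loss <- bce(D(G(z)), torch_ones(64, 1))
g_loss$backward()
opt_g$step()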
This code is adapted from the example in the RGAN package.
library(torch)
library(RGAN)
# Sample some toy data to play with.
data <- sample_toydata()
# Transform (here standardize) the data to facilitate learning.
# First, create a new data transformer.
transformer <- data_transformer$new()
# Fit the transformer to your data.
transformer$fit(data)
# Use the fitted transformer to transform your data.
transformed_data <- transformer$transform(data)
# Have a look at the transformed data.
par(mfrow = c(3, 2))  # grid of panels: real data plus the progress plots from training
# Shrink the margins so all panels fit.
par(mar = c(1, 1, 1, 1))
plot(
  transformed_data,
  bty = "n",
  col = viridis::viridis(2, alpha = 0.7)[1],
  pch = 19,
  xlab = "Var 1",
  ylab = "Var 2",
  main = "The Real Data",
  las = 1
)
# Train on the CPU (no CUDA device available here).
device <- "cpu"
# Now train the GAN and observe some intermediate results.
res <-
  gan_trainer(
    transformed_data,
    eval_dropout = TRUE,
    plot_progress = TRUE,   # show intermediate results while training
    plot_interval = 600,
    device = device
  )
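Once training has finished, new observations can be drawn from the fitted generator and mapped back to the original scale. A sketch using RGAN's sample_synthetic_data() helper together with the transformer fitted above (the exact arguments follow the RGAN example and are an assumption here):
# Draw synthetic observations from the trained GAN and plot them
# next to the real data.
synth <- sample_synthetic_data(res, transformer)
plot(
  synth,
  bty = "n",
  col = viridis::viridis(2, alpha = 0.7)[2],
  pch = 19,
  xlab = "Var 1",
  ylab = "Var 2",
  main = "The Synthetic Data",
  las = 1
)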