Can auto-encoders generate images?

Gunjan Chhablani
5 min read · May 17, 2020

A while back, while reading about generative models and deep learning, I came across auto-encoders. The first thought that came to my mind was whether I could use them to generate image data. I'm sure some of you have wondered about this before, so in this article I want to provide empirical results and answer this question. The post contains basic PyTorch implementations of auto-encoders and variational auto-encoders, and checks the ability of both techniques to generate images similar to the FashionMNIST dataset. If you are interested in a brief overview of the various kinds of auto-encoders, read Lilian Weng's blog article.

Auto-encoders

Auto-encoders are models that learn an encoding for any form of data. They are made up of two networks: an encoder and a decoder.

The purpose of the encoder is to compress the input into a lower-dimensional latent space. This latent representation should retain enough information for the decoder network to reconstruct the input.

Auto-encoder architecture (Image Source: Lilian Weng’s Article)

The aim of the auto-encoder is to learn a good representation of the data while keeping its dimensionality smaller than that of the input. A bottleneck layer in the middle is therefore required: it forces the network to learn compact representations and prevents it from simply copying the input features.

The PyTorch implementation of an auto-encoder is very simple; the main consideration is the shape of the network architecture. The bottleneck layer should have fewer features than the input and output layers.

Basic AutoEncoder Code
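The original code was embedded as a gist; the following is a minimal sketch of such a module. The exact layer sizes (784 → 128 → 32 and back) are my assumptions, chosen because FashionMNIST images are 28×28 = 784 pixels.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=128, latent_dim=32):
        super().__init__()
        # Encoder: compress the flattened 28x28 image down to the bottleneck
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
            nn.ReLU(),
        )
        # Decoder: reconstruct the image from the bottleneck representation
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid(),  # pixel values lie in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```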

The __init__ function of the auto-encoder contains the encoder and decoder layers. The network isn't very deep, since it is only here for a simple comparison between auto-encoders and variational auto-encoders.

This module can be trained using a DataLoader object, which can easily be created from the FashionMNIST dataset.

Auto-encoder Training
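A sketch of the training loop, reusing the AutoEncoder class above; the optimizer choice, learning rate, and batch size are assumptions:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_data = datasets.FashionMNIST(
    root="data", train=True, download=True, transform=transforms.ToTensor()
)
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

for epoch in range(100):
    for images, _ in train_loader:  # the labels are not needed
        images = images.view(images.size(0), -1)  # flatten to (batch, 784)
        optimizer.zero_grad()
        reconstructions = model(images)
        loss = criterion(reconstructions, images)
        loss.backward()
        optimizer.step()
```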

After running the model for 100 epochs on the training set of the FashionMNIST data, the reconstruction loss (MSELoss in this case) was very low. The decoder's outputs for a batch of images are shown below, alongside the original batch.

Reconstruction using Auto-Encoders

To be able to 'generate' images, we have to feed a random latent vector to the decoder part of the auto-encoder network and plot the resulting outputs.

Autoencoder Generation
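A sketch of this step, reusing the trained model above: sample random vectors of the latent size (32 in this sketch) and pass them through the decoder only.

```python
with torch.no_grad():
    z = torch.randn(16, 32)                    # random latent vectors
    generated = model.decoder(z)               # decode, skipping the encoder
    generated = generated.view(-1, 1, 28, 28)  # reshape for plotting
```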

Finally, the results were as follows:

Auto-Encoder Generation Results

We see that none of the generated images closely resemble the actual images, and it is hard to identify any particular class in them.

Variational Auto-encoders

Variational auto-encoders are models that attempt to learn a latent distribution of the input data, instead of a single latent representation. To do so, the encoder learns the mean and variance of a multivariate Gaussian distribution, and the gap between the learned distribution and a standard normal distribution is minimized. This introduces an additional term in the loss function: the KL-divergence loss between the learned distribution and the standard normal distribution. There is, of course, a lot of math involved in deriving this loss term, but it is out of scope for this article.
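For reference (leaving out the derivation), the KL-divergence between the diagonal Gaussian learned by the encoder and the standard normal has a well-known closed form, which is what typical implementations compute:

$$
D_{\mathrm{KL}}\left(\mathcal{N}(\mu, \sigma^2)\,\|\,\mathcal{N}(0, I)\right) = -\frac{1}{2}\sum_{j}\left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
$$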

Variational Auto-Encoder (Image Source: Lilian Weng’s Article)

I will be using a very simple implementation of a variational auto-encoder, which uses a KL-divergence loss similar to the one discussed in this StackOverflow post.

Variational Auto-Encoder Basic
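Again a minimal sketch rather than the original gist; the layer sizes are assumptions, but the two separate 'Linear' heads and the sampling step follow the description below.

```python
class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=128, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
        )
        # Two separate linear layers: one for the mean, one for the log-variance
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid(),
        )

    def reparameterize(self, mu, logvar):
        # Sample z = mu + sigma * eps, with eps drawn from N(0, I)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar
```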

We see that the encoder output is fed through two separate 'Linear' layers, one for the mean and one for the (log-)variance; a randomly sampled vector z is then formed from these, and this vector is passed through the decoder to reconstruct the example.

The training is done in a similar way to the auto-encoder, except that the KL-divergence loss is added to the reconstruction loss:

Variational-AutoEncoder Training
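A sketch of that training loop, reusing the train_loader from earlier and the closed-form KL term given above; the equal weighting of the two loss terms is an assumption:

```python
vae = VAE()
optimizer = torch.optim.Adam(vae.parameters(), lr=1e-3)

for epoch in range(100):
    for images, _ in train_loader:
        images = images.view(images.size(0), -1)
        optimizer.zero_grad()
        reconstructions, mu, logvar = vae(images)
        recon_loss = nn.functional.mse_loss(reconstructions, images, reduction="sum")
        # Closed-form KL divergence between N(mu, sigma^2) and N(0, I)
        kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        loss = recon_loss + kl_loss
        loss.backward()
        optimizer.step()
```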

There are more complex, and probably better, variational auto-encoder implementations out there, but I used this one for its simplicity.

We check whether the variational auto-encoder does a good job of reconstructing the images. The results are shown below:

We see that it does well for certain images, while for others it does not. We can also note that, given an image of pants, it generates a particular type of pants rather than the same pants. This is in contrast with the auto-encoder, which is able to reconstruct all the images very well. Hence, it would not be a good idea to choose a variational auto-encoder over an auto-encoder for a reconstruction task.

However, the images generated by the variational auto-encoder are much better than those generated by the auto-encoder.

To generate images, we need to pass random samples through the whole variational auto-encoder, not just the decoder, because the linear layers that learn the mean and sigma values sit between the encoder and the decoder. Hence, to generate an image, we go through the encoder to the mean and variance, draw a sample, and then decode it.

Generate Images On VAE
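Following the approach described above, here is a sketch that pushes a batch of random inputs through the entire model rather than only the decoder (using uniform noise shaped like flattened images is my assumption):

```python
with torch.no_grad():
    noise = torch.rand(16, 784)                # random 'images' as input
    generated, _, _ = vae(noise)               # encoder -> sample -> decoder
    generated = generated.view(-1, 1, 28, 28)  # reshape for plotting
```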

The generated images are shown below:

Variational Auto-Encoder Results

We see that even though various objects 'overlap' in some of the images, some of the shoe and pants images are almost perfect. We can also see that the model is biased towards shoes and pants. It should be noted, however, that the number of epochs, the number of neurons in the hidden layers, and the type of network (fully connected/convolutional/recurrent) also affect the generated images. Also, the auto-encoder was given the same architecture, yet it does not perform as well.

In conclusion, the answer is no: plain auto-encoders are poor at generating images.

Hope you had fun reading this short article. Please feel free to comment or provide me with feedback.
