Text to Image Synthesis

This repository hosts all the code related to my diploma thesis, titled "Text to Image Synthesis Using GANs", for my degree in Electrical and Computer Engineering at the National Technical University of Athens. The main focus of this thesis is to propose a novel architecture that generates high-resolution images conditioned on a given text description. In addition, we examine how different text representations affect the quality of the images generated by well-known models.

Abstract

The problem of text-to-image synthesis is a research area that combines the fields of Computer Vision and Natural Language Processing. The goal is to build a model that, given a text description, generates images that are not only realistic but also contain visual details matching that description.

The emergence of Generative Adversarial Networks (GANs) marked a period of significant progress in this direction. The systems proposed since then use a variety of techniques to generate high-resolution images that match their corresponding text descriptions. Stacked GANs are arguably the most important of these developments: such models first generate a low-quality initial image, which then passes through a number of sketch-refinement stages to produce the high-resolution image.
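The sketch-refinement idea can be pictured as a chain of stages, each receiving the previous stage's image. The following is a minimal, shape-only sketch and not code from this repository: `Stage` and `run_stacked` are hypothetical names, each placeholder stage only reports the image shape a real (text-conditioned) generator would output, and the 64x64 → 256x256 resolutions are those of the original two-stage StackGAN.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


@dataclass
class Stage:
    """Hypothetical stand-in for one generator stage of a stacked GAN.

    A real stage is a neural network conditioned on the text embedding;
    this placeholder only reports the image shape the stage would output.
    """
    name: str
    out_shape: Shape

    def __call__(self, prev: Optional[Shape]) -> Shape:
        if prev is not None:
            _, h, w = prev
            _, out_h, out_w = self.out_shape
            # A refinement stage is expected to keep or raise the resolution.
            if out_h < h or out_w < w:
                raise ValueError(f"{self.name} would shrink the image")
        return self.out_shape


def run_stacked(stages: List[Stage]) -> List[Shape]:
    """Apply stages in order, feeding each stage the previous stage's output."""
    shapes: List[Shape] = []
    prev: Optional[Shape] = None  # the first stage starts from noise + text only
    for stage in stages:
        prev = stage(prev)
        shapes.append(prev)
    return shapes


# Two-stage setup with the resolutions reported for the original StackGAN.
stackgan = [Stage("stage-I", (3, 64, 64)), Stage("stage-II", (3, 256, 256))]
print(run_stacked(stackgan))  # [(3, 64, 64), (3, 256, 256)]
```

Keeping every stage behind the same call interface is what lets the number of refinement stages vary without changing the driver loop.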

In this diploma thesis, we propose a novel architecture (TeleGAN) for generating high-resolution images. We use a three-stage Stacked GAN structure to decompose the difficult problem of generating high-quality images into more manageable sub-problems. The first-stage network generates a black-and-white image at 128x128 resolution. The second stage adds color to this image. Finally, the third stage enhances the image of the second stage to a higher resolution (256x256).
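The channel and resolution flow through the three TeleGAN stages can be traced with the minimal, shape-only sketch below. It is not the repository's implementation: the function names are hypothetical, the networks and their text conditioning are omitted, and it assumes that stage 2 keeps the 128x128 resolution while turning one channel into three (RGB), which the description implies but does not state explicitly.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def stage1_generate() -> Shape:
    # Hypothetical stage 1: text embedding + noise -> 128x128 black-and-white image.
    return (1, 128, 128)


def stage2_colorize(img: Shape) -> Shape:
    # Hypothetical stage 2: adds color, assumed to keep the spatial resolution.
    c, h, w = img
    if c != 1:
        raise ValueError("stage 2 expects a single-channel image")
    return (3, h, w)


def stage3_upscale(img: Shape, factor: int = 2) -> Shape:
    # Hypothetical stage 3: enhances the RGB image to a higher resolution.
    c, h, w = img
    if c != 3:
        raise ValueError("stage 3 expects an RGB image")
    return (c, h * factor, w * factor)


def telegan_shapes() -> List[Shape]:
    """Return the output shape of each stage, in order."""
    s1 = stage1_generate()
    s2 = stage2_colorize(s1)
    s3 = stage3_upscale(s2)
    return [s1, s2, s3]


print(telegan_shapes())  # [(1, 128, 128), (3, 128, 128), (3, 256, 256)]
```

Separating structure (stage 1), color (stage 2) and detail (stage 3) means each stage only has to learn one kind of change on top of its input.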

In addition, we examine the impact of different text representations, produced by the char-CNN-RNN text encoder and the GPT-2 and RoBERTa language models, on the quality of the images generated by the gan-int-cls and StackGAN models on the Oxford-102 and CUB datasets. We also train these networks on the Flickr8k dataset and report the results.
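GPT-2 and RoBERTa output one hidden vector per token, whereas gan-int-cls and StackGAN condition on a single fixed-size text vector. One common way to bridge the two is masked mean pooling, sketched below in plain Python. This is an assumption for illustration: the README does not say which pooling (mean, first token, last token) the thesis used, and `masked_mean_pool` is a hypothetical helper, not a function from this repository.

```python
from typing import List, Sequence


def masked_mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the vectors of real tokens, ignoring padding positions (mask == 0)."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is needed per token vector")
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    if not kept:
        raise ValueError("all tokens are masked out")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token vectors of size 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 0]
print(masked_mean_pool(tokens, mask))  # [2.0, 3.0]
```

With the Hugging Face `transformers` library, the per-token vectors would typically come from the model output's `last_hidden_state` and the mask from the tokenizer's `attention_mask`; the pooled vector then replaces the char-CNN-RNN embedding as the conditioning input.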

Author

Thanos Masouris (ThanosM97)

Repository contents

StackGAN
StackGANv2
TeleGAN
gan-int-cls
README.md
config.json
hdf5_converter.py

Repository: ails-lab/teleGAN