Synthesizing Pictures From Text Using a DC-GAN

Department

Mathematical Sciences

Author Type

Undergraduate Student

Submission Type

Poster

Start Date

27-7-2017 1:15 PM

End Date

27-7-2017 2:15 PM

Description

Generative Adversarial Text-to-Image Synthesis (Reed et al., 2016) is a model that synthesizes images from given text. We have worked to apply the model to different data and to improve on the results reported in the original paper. The model performs two main tasks: it collects the visually relevant information about each training image to form a text feature representation, and it then uses these learned text features to synthesize images from new text. To accomplish this, the model uses a DC-GAN (deep convolutional generative adversarial network) conditioned on text features derived from visually-discriminative vector representations of the images assembled from the training data set. The text descriptions are encoded by a hybrid character-level convolutional-recurrent neural network, so synthesis requires only feed-forward inference. Several training algorithms can be used to produce synthesized images: GAN, GAN-CLS, GAN-INT, and GAN-INT-CLS. Each of these algorithms produces different results; performance depends on the dataset used for training and on the kind of images desired from the input text.
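To make the text conditioning and the GAN-CLS objective concrete, below is a minimal PyTorch sketch. It is an illustration, not the authors' implementation: all layer sizes, the 32x32 output resolution, and the 1024-dimensional sentence embedding are assumed values, and mismatched text is approximated by rolling the batch.

```python
# Minimal sketch of a text-conditional DC-GAN with the GAN-CLS
# "matching-aware" discriminator loss (Reed et al., 2016).
# All dimensions below are illustrative assumptions, not the paper's exact values.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, txt_dim=1024, proj_dim=128):
        super().__init__()
        # Compress the sentence embedding, then concatenate it with the noise z.
        self.project = nn.Sequential(nn.Linear(txt_dim, proj_dim), nn.LeakyReLU(0.2))
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + proj_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),  # 32x32 RGB image
        )

    def forward(self, z, txt_emb):
        cond = self.project(txt_emb)
        x = torch.cat([z, cond], dim=1).unsqueeze(-1).unsqueeze(-1)  # (B, z+proj, 1, 1)
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self, txt_dim=1024, proj_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
        )  # 32x32 image -> 4x4 feature grid
        self.project = nn.Sequential(nn.Linear(txt_dim, proj_dim), nn.LeakyReLU(0.2))
        # Score the image features jointly with the replicated text embedding.
        self.out = nn.Conv2d(256 + proj_dim, 1, 4, 1, 0)

    def forward(self, img, txt_emb):
        h = self.conv(img)                                   # (B, 256, 4, 4)
        cond = self.project(txt_emb)                         # (B, proj_dim)
        cond = cond.view(-1, cond.size(1), 1, 1).expand(-1, -1, 4, 4)
        return torch.sigmoid(self.out(torch.cat([h, cond], dim=1))).view(-1)

def gan_cls_d_loss(D, G, real_img, txt_emb, z):
    """GAN-CLS discriminator loss: besides real/fake, D also sees a real image
    paired with MISMATCHED text, so it must learn text-image alignment, not
    just image realism. Rolling the batch is a stand-in for sampling a
    genuinely mismatched description."""
    bce = nn.BCELoss()
    ones = torch.ones(real_img.size(0))
    zeros = torch.zeros(real_img.size(0))
    s_real = D(real_img, txt_emb)                            # real image, right text
    s_wrong = D(real_img, txt_emb.roll(1, dims=0))           # real image, wrong text
    s_fake = D(G(z, txt_emb).detach(), txt_emb)              # fake image, right text
    return bce(s_real, ones) + 0.5 * (bce(s_wrong, zeros) + bce(s_fake, zeros))

# Usage with random stand-ins for char-CNN-RNN sentence embeddings:
G, D = Generator(), Discriminator()
z = torch.randn(8, 100)
txt = torch.randn(8, 1024)
loss = gan_cls_d_loss(D, G, torch.randn(8, 3, 32, 32), z=z, txt_emb=txt)
```

GAN-INT would additionally train the generator on interpolations between two sentence embeddings; GAN-INT-CLS combines both ideas.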
