We propose a novel approach that exploits latent correlations among three views: a visual view, a textual view, and a sentiment view constructed using SentiWordNet. In the proposed method, we find a latent embedding space in which correlations among the three views are maximized. The features projected into this latent space are used to train a sentiment classifier, which can thus draw on the complementary information from the different views.
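As a rough illustration of what "maximizing correlations among the three views" can mean, the block below writes out a simplified multi-view CCA-style objective. This is a sketch under stated assumptions, not necessarily the exact formulation of the cited paper: the symbols X_v (centered feature matrix of view v), W_v (projection matrix), Σ_uv (cross-view covariance), and the latent dimension d are all introduced here for illustration.

```latex
% Assumed notation (illustrative, not taken from the text above):
%   X_v : n x d_v centered feature matrix of view v (visual, textual, sentiment)
%   W_v : d_v x d projection into the shared latent space
\[
\min_{W_1, W_2, W_3} \sum_{1 \le u < v \le 3}
  \left\| X_u W_u - X_v W_v \right\|_F^2
\quad \text{s.t.} \quad
W_v^\top \Sigma_{vv} W_v = I_d, \quad v = 1, 2, 3,
\qquad \Sigma_{uv} = X_u^\top X_v .
\]
% Under the constraint, each pairwise distance expands to
\[
\left\| X_u W_u - X_v W_v \right\|_F^2
  = 2d - 2 \operatorname{tr}\!\left( W_u^\top \Sigma_{uv} W_v \right),
\]
% so minimizing the distances is the same as maximizing the summed
% cross-view correlations tr(W_u^T Sigma_uv W_v).
```

Under this reading, the projected features X_v W_v from each view are what would be passed to the sentiment classifier.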
To evaluate the performance of image sentiment classification, we collected a set of images from Flickr and Instagram and then prepared their sentiment labels via crowdsourcing. For each image, three workers were asked to provide a sentiment score. They chose from a discrete five-point scale labeled "highly positive," "positive," "neutral," "negative," and "highly negative." The datasets with sentiment labels (the number of workers who chose each sentiment polarity) are available. Note that we divided the whole dataset into three batches for download.
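The labeling scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' processing code: the mapping from the five-point scale to three polarities, the function names `polarity_counts` and `majority_polarity`, and the strict-majority rule are all assumptions made for this example.

```python
from collections import Counter

# Assumed mapping from the five-point scale to three polarities (illustrative).
POLARITY = {
    "highly positive": "positive",
    "positive": "positive",
    "neutral": "neutral",
    "negative": "negative",
    "highly negative": "negative",
}


def polarity_counts(worker_labels):
    """Count how many workers chose each polarity for one image."""
    counts = Counter(POLARITY[label] for label in worker_labels)
    return {p: counts.get(p, 0) for p in ("positive", "neutral", "negative")}


def majority_polarity(counts):
    """Return the polarity chosen by a strict majority of workers, else None."""
    total = sum(counts.values())
    for polarity, n in counts.items():
        if 2 * n > total:
            return polarity
    return None


# Hypothetical example: three workers label one image.
counts = polarity_counts(["highly positive", "positive", "neutral"])
label = majority_polarity(counts)
```

Returning `None` when the three workers disagree leaves it to the user to decide whether such images are discarded or handled separately.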
If you use our dataset, please cite the following paper.
M. Katsurai and S. Satoh, "Image Sentiment Analysis Using Latent Correlations Among Visual, Textual, and Sentiment Views," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2837–2841, 2016.