MambaPainter (2024)

Abstract

Stroke-based rendering aims to reconstruct an input image in an oil painting style by predicting a sequence of brush strokes. Conventional methods predict strokes one by one or require multiple inference steps because of the limited number of strokes they can predict at once, which leads to slow translation and limits their practicality. In this study, we propose MambaPainter, which can predict a sequence of over 100 brush strokes in a single inference step, resulting in rapid translation. We achieve this sequence prediction by incorporating a selective state-space model. Additionally, we introduce a simple extension to patch-based rendering for translating high-resolution images, which improves visual quality with a minimal increase in computational cost. Experimental results demonstrate that MambaPainter translates inputs into oil painting-style images more efficiently than state-of-the-art methods. The code is available at the repository linked below.
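
As a rough illustration of the idea of predicting all stroke parameters in one forward pass, the following minimal PyTorch sketch decodes a fixed-length sequence of stroke parameters from a single image encoding. It is not the official implementation: the selective state-space (Mamba) blocks are replaced by a plain GRU as a stand-in sequence model, and names such as StrokePredictor, n_strokes, and stroke_dim are illustrative assumptions; see the repository below for the actual code.

    # Minimal sketch (not the official MambaPainter implementation).
    import torch
    import torch.nn as nn

    class StrokePredictor(nn.Module):
        def __init__(self, n_strokes=100, stroke_dim=8, d_model=256):
            super().__init__()
            # Small CNN encoder that summarizes the target image.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, d_model, 4, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            # Learned queries, one per stroke, decoded together in one pass.
            self.queries = nn.Parameter(torch.randn(n_strokes, d_model))
            # Stand-in for the selective state-space blocks used in the paper.
            self.sequence_model = nn.GRU(d_model, d_model, batch_first=True)
            # Map each hidden state to stroke parameters (position, size, color, ...).
            self.head = nn.Linear(d_model, stroke_dim)

        def forward(self, image):
            b = image.size(0)
            ctx = self.encoder(image).flatten(1)               # (B, d_model)
            seq = self.queries.unsqueeze(0).expand(b, -1, -1)  # (B, n_strokes, d_model)
            seq = seq + ctx.unsqueeze(1)                       # condition the queries on the image
            hidden, _ = self.sequence_model(seq)               # single sequence pass
            return torch.sigmoid(self.head(hidden))            # (B, n_strokes, stroke_dim) in [0, 1]

    # Usage: one forward pass yields the parameters of all strokes at once.
    model = StrokePredictor()
    strokes = model(torch.rand(1, 3, 128, 128))
    print(strokes.shape)  # torch.Size([1, 100, 8])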

Demo

Source Code

https://github.com/STomoya/MambaPainter

Reference

Tomoya Sawada and Marie Katsurai, “MambaPainter: Neural Stroke-Based Rendering in a Single Step,” in SIGGRAPH Asia 2024 Posters (SA ’24), Association for Computing Machinery, New York, NY, USA, Article 98, 1–2, 2024. DOI: 10.1145/3681756.3697906.

Super-Deformation of Character Faces (2024)

Abstract

Super-deformation in character design refers to a simplified depiction of character illustrations that are otherwise drawn in detail. Such super-deformation requires both textural and geometric translation. However, directly adopting conventional image-to-image translation methods for super-deformation is challenging because these methods rely on a pixel-wise loss, which makes the translated images highly dependent on the spatial layout of the input image. This study proposes a novel deep architecture-based method for the super-deformation of illustrated character faces using an unpaired dataset of detailed and super-deformed character face images collected from the Internet. First, we created a dataset construction pipeline based on image classification and character face detection using deep learning. Then, we designed a generative adversarial network (GAN) trained with two discriminators, one each for detailed and super-deformed images, and a single generator capable of synthesizing pairs of images of the same character with different textural and geometric appearances. As ornaments are an important element in character identification, we further introduced ornament augmentation to enable the generator to synthesize a variety of ornaments on the generated character faces. Finally, we constructed a loss function that projects a character illustration provided by the user into the learned GAN latent space, from which an identity-preserving super-deformed version can be found. The experimental results show that, compared to baseline methods, the proposed method successfully translates character illustrations into identity-preserving super-deformed versions. The code is available at the repository linked below.
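
The following minimal PyTorch sketch illustrates the two-discriminator setup described above: a single generator synthesizes a detailed and a super-deformed image from the same latent code, and each output is scored by its own domain discriminator. This is not the official ChibiGAN code; the network shapes, the domain-flag conditioning, and the non-saturating loss are illustrative assumptions, and ornament augmentation and latent-space projection are omitted.

    # Minimal sketch (not the official ChibiGAN code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    latent_dim, img_size = 128, 64

    def make_discriminator():
        # Tiny critic; one instance per domain (detailed / super-deformed).
        return nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(128 * (img_size // 4) ** 2, 1),
        )

    class Generator(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(latent_dim + 1, 256 * 8 * 8)  # +1 for the domain flag
            self.deconv = nn.Sequential(
                nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
                nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
                nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
            )

        def forward(self, z, domain):
            # domain: 0 = detailed, 1 = super-deformed; the shared z keeps the identity.
            flag = torch.full_like(z[:, :1], float(domain))
            h = self.fc(torch.cat([z, flag], dim=1)).view(-1, 256, 8, 8)
            return self.deconv(h)

    G = Generator()
    D_detail, D_chibi = make_discriminator(), make_discriminator()

    z = torch.randn(4, latent_dim)
    fake_detail = G(z, 0)  # detailed rendering of the identity
    fake_chibi = G(z, 1)   # super-deformed rendering of the same identity
    # Non-saturating generator loss against the two domain discriminators.
    g_loss = (F.softplus(-D_detail(fake_detail)).mean()
              + F.softplus(-D_chibi(fake_chibi)).mean())
    print(fake_detail.shape, fake_chibi.shape, g_loss.item())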

Demo

The following visualization shows how the source image (left) is gradually translated into the target domain (right).

Source Code

https://github.com/STomoya/ChibiGAN

Reference

Tomoya Sawada and Marie Katsurai, “Illustrated Character Face Super-Deformation via Unsupervised Image-to-Image Translation,” Multimedia Systems 30, 63, 2024. DOI: 10.1007/s00530-023-01255-y.

Map Image Classification (2020)

Abstract

Map images have been published around the world. The management of map data, however, has remained an open issue in several research fields. This paper explores an approach for classifying diverse map images by their themes using map content features. Specifically, we present a novel strategy for preprocessing the text that appears inside map images, which is extracted using OCR. The activations of the text-based model are then fused with the visual features in an early fusion manner. Finally, we train a classifier that predicts the thematic class of an input map image. We have made our dataset available below to facilitate this new task.
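
The early-fusion step can be pictured with the following minimal PyTorch sketch, in which textual features derived from the OCR'd map text and visual features from an image encoder are concatenated and passed to a single classifier. The feature dimensions, the classifier architecture, and the variable names are illustrative assumptions rather than the paper's exact model.

    # Minimal early-fusion sketch (not the paper's exact model).
    import torch
    import torch.nn as nn

    text_dim, visual_dim, n_themes = 300, 512, 10

    classifier = nn.Sequential(
        nn.Linear(text_dim + visual_dim, 256), nn.ReLU(),
        nn.Linear(256, n_themes),
    )

    # Stand-ins for features extracted upstream (e.g., embeddings of the OCR'd
    # text and penultimate-layer activations of an image CNN).
    text_features = torch.randn(8, text_dim)
    visual_features = torch.randn(8, visual_dim)

    fused = classifier(torch.cat([text_features, visual_features], dim=1))  # early fusion
    print(fused.argmax(dim=1))  # predicted theme index per map image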

Download

Reference

If you use our dataset, please refer to the following paper.

T. Sawada and M. Katsurai, “A Deep Multimodal Approach for Map Image Classification,” in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4457–4461, 2020.

Emoji Sentiment Lexicon (2017)

Abstract

Emojis have been frequently used to express users’ sentiments, emotions, and feelings in text-based communication. We presented a simple and efficient method for automatically constructing an emoji sentiment lexicon with arbitrary sentiment categories. The proposed method extracts sentiment words from WordNet-Affect and calculates the co-occurrence frequency between the sentiment words and each emoji. Based on the ratio of each emoji’s occurrences across the sentiment categories, each emoji is assigned a multidimensional vector whose elements indicate the strength of the corresponding sentiment. In experiments conducted on a collection of tweets, we showed a high correlation between a conventional lexicon and our lexicon for three sentiment categories. We also showed the results for a new lexicon constructed with additional sentiment categories.
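
The counting and normalization step can be illustrated with the following toy Python sketch, which counts how often each emoji co-occurs with words of each sentiment category and normalizes the counts per emoji into a sentiment-strength vector. The word lists and tweets are illustrative stand-ins for WordNet-Affect and a real tweet collection, not data from the paper.

    # Toy sketch of the lexicon construction (illustrative data only).
    from collections import defaultdict

    sentiment_words = {
        "joy": {"happy", "glad", "love"},
        "anger": {"angry", "hate", "furious"},
        "sadness": {"sad", "cry", "miss"},
    }
    categories = list(sentiment_words)
    emojis = {"😊", "😠", "😢"}

    tweets = [
        "so happy today 😊",
        "i love this 😊",
        "angry and furious 😠",
        "miss you so much 😢",
        "sad day 😢",
    ]

    # Co-occurrence counts between each emoji and each sentiment category.
    counts = defaultdict(lambda: {c: 0 for c in categories})
    for tweet in tweets:
        tokens = tweet.split()
        for emoji in emojis & set(tokens):
            for cat, words in sentiment_words.items():
                counts[emoji][cat] += sum(1 for t in tokens if t in words)

    # Each emoji gets a vector of category ratios (its sentiment strengths).
    lexicon = {}
    for emoji, c in counts.items():
        total = sum(c.values()) or 1
        lexicon[emoji] = [c[cat] / total for cat in categories]

    print(lexicon)  # e.g. {'😊': [1.0, 0.0, 0.0], ...}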

Download

Reference

If you use our lexicons, please refer to the following paper.

M. Kimura and M. Katsurai, “Automatic Construction of an Emoji Sentiment Lexicon,” in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1033–1036, 2017.

Image Sentiment Analysis (2016)

Abstract

We propose a novel approach that exploits latent correlations among multiple views: visual and textual views, and a sentiment view constructed using SentiWordNet. In the proposed method, we find a latent embedding space in which correlations among the three views are maximized. The projected features in the latent space are used to train a sentiment classifier, which considers the complementary information from different views.
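
The following sketch gives a simplified picture of this approach using scikit-learn. For brevity it projects only two synthetic views with standard CCA and trains a classifier on the concatenated projections; the third (sentiment) view and the exact multi-view correlation formulation of the paper are omitted, and all data and dimensions are illustrative.

    # Simplified two-view sketch (not the paper's three-view formulation).
    import numpy as np
    from sklearn.cross_decomposition import CCA
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n, d_visual, d_text, d_latent = 200, 50, 30, 5

    visual = rng.normal(size=(n, d_visual))  # stand-in for visual features
    textual = rng.normal(size=(n, d_text))   # stand-in for textual features
    labels = rng.integers(0, 2, size=n)      # stand-in sentiment polarity labels

    # Learn projections that maximize the correlation between the two views.
    cca = CCA(n_components=d_latent)
    z_visual, z_text = cca.fit_transform(visual, textual)

    # Train a sentiment classifier on the latent (projected) features.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(np.hstack([z_visual, z_text]), labels)
    print(clf.score(np.hstack([z_visual, z_text]), labels))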

Dataset description

To evaluate the performance of image sentiment classification, we collected a set of images from Flickr and Instagram and then obtained their sentiment labels via crowdsourcing. For each image, three workers were asked to provide a sentiment score, choosing from a discrete five-point scale labeled “highly positive,” “positive,” “neutral,” “negative,” and “highly negative.” The datasets with sentiment labels (the number of workers who selected each sentiment polarity) are available below. Note that we divided the whole dataset into three batches for download.
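
As a minimal sketch of working with such labels, the snippet below turns per-image vote counts into a single label by majority vote, assuming (hypothetically) that each record stores how many of the three workers chose each of the five options; the actual format of the released files may differ.

    # Hypothetical label-aggregation sketch; the released file format may differ.
    scale = ["highly negative", "negative", "neutral", "positive", "highly positive"]

    def majority_label(counts):
        # counts: votes per option, ordered as in `scale`; ties resolve to the
        # first (most negative) option with the maximum count.
        return scale[max(range(len(counts)), key=lambda i: counts[i])]

    print(majority_label([0, 0, 1, 2, 0]))  # -> 'positive'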

Download

Flickr dataset

Instagram dataset

Reference

If you use our dataset, please refer to the following paper.

M. Katsurai and S. Satoh, “Image Sentiment Analysis Using Latent Correlations Among Visual, Textual, and Sentiment Views,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2837–2841, 2016.