MambaPainter (2024)

Abstract

Stroke-based rendering aims to reconstruct an input image in an oil painting style by predicting brush stroke sequences. Conventional methods perform this prediction stroke by stroke, or require multiple inference steps, because the number of strokes that can be predicted at once is limited. This procedure leads to slow translation, limiting practicality. In this study, we propose MambaPainter, which can predict a sequence of over 100 brush strokes in a single inference step, resulting in rapid translation. We achieve this sequence prediction by incorporating a selective state-space model. Additionally, we introduce a simple extension to patch-based rendering, which we use to translate high-resolution images, improving visual quality with a minimal increase in computational cost. Experimental results demonstrate that MambaPainter translates inputs into oil painting-style images more efficiently than state-of-the-art methods. The code is available at the repository linked below.
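
As a rough illustration of the single-step idea, the following PyTorch sketch predicts the parameters of a fixed number of strokes in one forward pass. The module names, the 8-value stroke parameterization, and the GRU standing in for the selective state-space (Mamba) blocks are assumptions made for this sketch, not the released implementation.

    # Minimal sketch: predict parameters for K strokes in a single forward pass.
    # The real MambaPainter stacks selective SSM (Mamba) blocks; a GRU stands in here.
    import torch
    import torch.nn as nn

    class SingleStepStrokePredictor(nn.Module):
        def __init__(self, num_strokes=100, stroke_dim=8, hidden=256):
            super().__init__()
            # Encode the target image into a single feature vector.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, hidden),
            )
            # Learned queries, one per stroke slot.
            self.queries = nn.Parameter(torch.randn(num_strokes, hidden))
            # Sequence model over the stroke slots (stand-in for the Mamba blocks).
            self.seq = nn.GRU(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, stroke_dim)  # e.g. position, size, angle, color

        def forward(self, image):
            ctx = self.encoder(image)                          # (B, hidden)
            q = self.queries.unsqueeze(0) + ctx.unsqueeze(1)   # (B, K, hidden)
            h, _ = self.seq(q)
            return self.head(h)                                # (B, K, stroke_dim) in one step

    model = SingleStepStrokePredictor()
    strokes = model(torch.randn(2, 3, 128, 128))  # all 100 strokes at once
    print(strokes.shape)  # torch.Size([2, 100, 8])

The predicted parameters would then be rasterized into brush strokes by a renderer, which is omitted here.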

Demo

Source Code

https://github.com/STomoya/MambaPainter

Reference

Tomoya Sawada and Marie Katsurai, “MambaPainter: Neural Stroke-Based Rendering in a Single Step,” in SIGGRAPH Asia 2024 Posters (SA ’24), Association for Computing Machinery, New York, NY, USA, Article 98, 1–2, 2024. DOI: 10.1145/3681756.3697906.

Super-Deformation of Character Faces (2024)

Abstract

Super-deformation in character design refers to a simplified modeling of character illustrations that are drawn in detail. Such super-deformation requires both textural and geometrical translation. However, directly adopting conventional image-to-image translation methods for super-deformation is challenging because these methods use a pixel-wise loss, which makes the translated images highly dependent on the spatial information of the input image. This study proposes a novel deep architecture-based method for the super-deformation of illustrated character faces using an unpaired dataset of detailed and super-deformed character face images collected from the Internet. First, we created a dataset construction pipeline based on image classification and character face detection using deep learning. Then, we designed a generative adversarial network (GAN) trained with two discriminators, one each for detailed and super-deformed images, and a single generator capable of synthesizing identical pairs of characters with different textural and geometrical appearances. As ornaments are an important element in character identification, we further introduced ornament augmentation to enable the generator to synthesize a variety of ornaments on the generated character faces. Finally, we constructed a loss function to project character illustrations provided by the user into the learned GAN latent space, from which an identical super-deformed version can be found. The experimental results show that, compared to baseline methods, the proposed method can successfully translate character illustrations into identical super-deformed versions. The code is available at the repository linked below.
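
The following toy PyTorch sketch shows the training signal described above: a single generator is scored by two discriminators, one per domain, so the same latent identity must look plausible both as a detailed face and as a super-deformed face. The network sizes, the domain-flag conditioning, and the loss form are assumptions for this sketch; the ornament augmentation and the latent-space projection are omitted.

    # Toy sketch: one generator, two domain discriminators.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    G = nn.Sequential(nn.Linear(128 + 1, 3 * 64 * 64), nn.Tanh())      # latent + domain flag -> toy image
    D_detail = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))  # detailed-domain critic
    D_chibi = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))   # super-deformed-domain critic

    def generate(z, domain_flag):
        # domain_flag: 0.0 -> detailed, 1.0 -> super-deformed; the same z keeps the identity.
        flag = torch.full((z.size(0), 1), domain_flag)
        return G(torch.cat([z, flag], dim=1)).view(-1, 3, 64, 64)

    def d_loss(D, real, fake):
        # Non-saturating GAN loss for one domain discriminator.
        return (F.softplus(-D(real)) + F.softplus(D(fake.detach()))).mean()

    def g_loss(fake_detail, fake_chibi):
        # The single generator must fool both discriminators at once.
        return (F.softplus(-D_detail(fake_detail)) + F.softplus(-D_chibi(fake_chibi))).mean()

    z = torch.randn(4, 128)
    real_detail = torch.rand(4, 3, 64, 64)
    print(d_loss(D_detail, real_detail, generate(z, 0.0)).item())
    print(g_loss(generate(z, 0.0), generate(z, 1.0)).item())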

Demo

The following visualization shows how the source image (left) is gradually translated into the target domain (right).

Source Code

https://github.com/STomoya/ChibiGAN

Reference

Tomoya Sawada, Marie Katsurai, “Illustrated Character Face Super-Deformation via Unsupervised Image-to-Image Translation,” Multimedia Systems 30, 63, 2024. DOI: 10.1007/s00530-023-01255-y.

SolutionTailor (2022)

Abstract

We develop SolutionTailor, a novel system that recommends papers providing diverse solutions for a specific research objective. The proposed system does not require any prior information from a user; it only requires the user to specify the target research field and enter a research abstract representing the user's interests. Our approach uses a neural language model to divide abstract sentences into "Background/Objective" and "Methodologies" and defines a new similarity measure between papers. Our current experiments indicate that, compared with a baseline system, the proposed system can recommend literature relevant to a specific objective beyond a query paper's citations.
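
A minimal sketch of the scoring idea is shown below, assuming a keyword-based sentence split and TF-IDF vectors in place of SolutionTailor's neural sentence classifier and language-model embeddings, and assuming a score that rewards a shared objective while discounting similar methodologies.

    # Minimal sketch: favor candidates with a similar Background/Objective and diverse Methodologies.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def split_abstract(sentences):
        # Hypothetical heuristic split; the actual system classifies each sentence with a language model.
        objective = [s for s in sentences if "we aim" in s.lower() or "problem" in s.lower()]
        method = [s for s in sentences if s not in objective]
        return " ".join(objective), " ".join(method)

    def score(query_sentences, candidate_sentences, vectorizer):
        q_obj, q_met = split_abstract(query_sentences)
        c_obj, c_met = split_abstract(candidate_sentences)
        v = vectorizer.transform([q_obj, c_obj, q_met, c_met])
        obj_sim = cosine_similarity(v[0], v[1])[0, 0]  # same research objective
        met_sim = cosine_similarity(v[2], v[3])[0, 0]  # different methodology preferred
        return obj_sim - met_sim                       # assumed combination rule

    corpus = ["we aim to recommend papers. we use a graph method.",
              "the problem is image search. we train a neural network."]
    vec = TfidfVectorizer().fit(corpus)
    print(score(corpus[0].split(". "), corpus[1].split(". "), vec))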

Demo video

Download

Reference

If you use our dataset, please refer to the following paper.

Tetsuya Takahashi and Marie Katsurai, “SolutionTailor: Scientific Paper Recommendation Based on Fine-Grained Abstract Analysis,” in Advances in Information Retrieval (ECIR 2022), Lecture Notes in Computer Science, vol. 13186, Springer, Cham, 2022.

Venue Relevance Finder (2021)

Abstract

We present a novel tool that finds the relevance between publication venues to foster opportunities for collaboration development. When a user inputs a publication venue name related to the user’s research field, our tool first shows several relevant publication venues using results of citation network analysis. After the user selects one of those, our tool shows the trend information for each venue as well as the common keywords between the two venues.
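
The two steps can be illustrated with the toy sketch below, in which simple citation counts between venues stand in for the tool's citation network analysis and keyword sets stand in for its keyword extraction; the data structure and function names are assumptions.

    # Toy sketch: rank related venues by cross-venue citation counts, then show shared keywords.
    from collections import Counter

    # Assumed input: per-paper records of (venue, cited venues, keywords).
    papers = [
        {"venue": "ICASSP", "cites": ["ICML", "InterSpeech"], "keywords": {"speech", "audio"}},
        {"venue": "ICML", "cites": ["NeurIPS", "ICASSP"], "keywords": {"learning", "audio"}},
    ]

    def relevant_venues(query_venue):
        # Step 1: venues most often citing, or cited by, papers of the query venue.
        counts = Counter()
        for p in papers:
            if p["venue"] == query_venue:
                counts.update(p["cites"])
            elif query_venue in p["cites"]:
                counts[p["venue"]] += 1
        return counts.most_common()

    def venue_keywords(venue):
        return set().union(*(p["keywords"] for p in papers if p["venue"] == venue))

    def common_keywords(venue_a, venue_b):
        # Step 2: keywords shared by the two selected venues.
        return venue_keywords(venue_a) & venue_keywords(venue_b)

    print(relevant_venues("ICASSP"))          # [('ICML', 2), ('InterSpeech', 1)]
    print(common_keywords("ICASSP", "ICML"))  # {'audio'}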

Demo video


Map Image Classification (2020)

Abstract

Map images have been published around the world; the management of map data, however, has been an open issue in several research fields. This paper explores an approach for classifying diverse map images by theme using map content features. Specifically, we present a novel strategy for preprocessing the text positioned inside map images, which is extracted using OCR. The activation of the textual feature-based model is combined with the visual features in an early-fusion manner. Finally, we train a classifier that predicts the class of the input map image. We have made our dataset available below to facilitate this new task.
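
The fusion step can be sketched as follows in PyTorch: the activation of a text branch fed with the OCR'd words is concatenated with CNN visual features before the final classifier. The layer sizes and the bag-of-words text input are assumptions for this sketch, not the configuration reported in the paper.

    # Minimal sketch of early fusion of OCR-text and visual features.
    import torch
    import torch.nn as nn

    class MapClassifier(nn.Module):
        def __init__(self, vocab_size=5000, num_classes=10):
            super().__init__()
            self.visual = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),     # (B, 32)
            )
            self.textual = nn.Sequential(
                nn.Linear(vocab_size, 64), nn.ReLU(),      # activation of the text branch
            )
            self.classifier = nn.Linear(32 + 64, num_classes)  # early fusion by concatenation

        def forward(self, image, ocr_bow):
            fused = torch.cat([self.visual(image), self.textual(ocr_bow)], dim=1)
            return self.classifier(fused)

    logits = MapClassifier()(torch.randn(2, 3, 224, 224), torch.rand(2, 5000))
    print(logits.shape)  # torch.Size([2, 10])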

Download

Reference

If you use our dataset, please refer to the following paper.

T. Sawada and M. Katsurai, “A Deep Multimodal Approach for Map Image Classification,” in 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 4457–4461, 2020.

TrendNets (2019)

Abstract

Mapping the knowledge structure from word co-occurrences in a collection of academic papers has been widely used to provide insight into the topic evolution in an arbitrary research field. TrendNets is a novel visualization approach that highlights the rapid changes in edge weights over time. Specifically, we formulated a new convex optimization framework that decomposes the matrix constructed from dynamic co-word networks into a smooth part and a sparse part: the former represents stationary research topics, while the latter corresponds to bursty research topics.
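
A minimal numpy sketch of such a decomposition is shown below: a matrix X built from the dynamic co-word networks (rows as time steps, columns as word pairs) is split into a temporally smooth part L and a sparse part S by alternating a gradient step on L with a soft-thresholding step on S. The penalty form and solver here are assumptions for illustration and differ from the exact TrendNets formulation.

    # Illustrative decomposition: X ~ L (smooth over time) + S (sparse bursts).
    import numpy as np

    def decompose(X, lam_smooth=1.0, lam_sparse=0.1, steps=200, lr=0.1):
        L = np.zeros_like(X)
        S = np.zeros_like(X)
        for _ in range(steps):
            # Gradient step on L for 0.5*||X - L - S||^2 + lam_smooth * sum_t ||L_t - L_{t-1}||^2.
            diff = np.diff(L, axis=0)
            smooth_grad = np.zeros_like(L)
            smooth_grad[:-1] -= 2 * lam_smooth * diff
            smooth_grad[1:] += 2 * lam_smooth * diff
            L -= lr * (-(X - L - S) + smooth_grad)
            # Exact update of S for the l1 penalty: soft-threshold the residual.
            R = X - L
            S = np.sign(R) * np.maximum(np.abs(R) - lam_sparse, 0.0)
        return L, S  # L: stationary topics, S: bursty topics

    T, P = 12, 30
    X = np.random.rand(T, P)
    X[6, :5] += 3.0                  # an injected burst at one time step
    L, S = decompose(X)
    print(np.count_nonzero(np.abs(S) > 1e-6))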

Download

Reference

If you use our code, please refer to the following paper.

M. Katsurai and S. Ono, “Mapping Emerging Research Trends from Dynamic Co-Word Networks via Sparse Representations,” Scientometrics, vol. 121, no. 3, pp. 1583–1598, 2019.

Emoji Sentiment Lexicon (2017)

Abstract

Emojis have been frequently used to express users’ sentiments, emotions, and feelings in text-based communication. We presented a simple and efficient method for automatically constructing an emoji sentiment lexicon with arbitrary sentiment categories. The proposed method extracts sentiment words from WordNet-Affect and calculates the co-occurrence frequency between the sentiment words and each emoji. Based on the ratio of the number of occurrences of each emoji among the sentiment categories, each emoji is assigned a multidimensional vector whose elements indicate the strength of the corresponding sentiment. In experiments conducted on a collection of tweets, we showed a high correlation between the conventional lexicon and our lexicon for three sentiment categories. We also showed the results for a new lexicon constructed with additional sentiment categories.
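
The counting scheme can be sketched in a few lines of Python; the toy sentiment word lists below stand in for WordNet-Affect, and whitespace tokenization stands in for proper tweet preprocessing.

    # Minimal sketch: co-occurrence counts between emojis and sentiment-category words,
    # normalized into a per-emoji sentiment-strength vector.
    from collections import defaultdict

    sentiment_words = {                       # assumed toy categories
        "joy": {"happy", "glad"},
        "sadness": {"sad", "cry"},
        "anger": {"angry", "mad"},
    }
    emojis = {"😄", "😢"}

    def build_lexicon(tweets):
        counts = defaultdict(lambda: defaultdict(int))
        for tweet in tweets:
            tokens = set(tweet.split())
            for e in emojis & tokens:
                for cat, words in sentiment_words.items():
                    counts[e][cat] += len(words & tokens)
        lexicon = {}
        for e, c in counts.items():
            total = sum(c.values()) or 1
            lexicon[e] = {cat: c[cat] / total for cat in sentiment_words}  # strength per category
        return lexicon

    tweets = ["so happy today 😄", "glad to see you 😄", "i want to cry 😢"]
    print(build_lexicon(tweets))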

Download

Reference

If you use our lexicons, please refer to the following paper.

M. Kimura and M. Katsurai, “Automatic Construction of an Emoji Sentiment Lexicon,” Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pp. 1033–1036, 2017.

Image Sentiment Analysis (2016)

Abstract

We propose a novel approach that exploits latent correlations among multiple views: visual and textual views, and a sentiment view constructed using SentiWordNet. In the proposed method, we find a latent embedding space in which correlations among the three views are maximized. The projected features in the latent space are used to train a sentiment classifier, which considers the complementary information from different views.
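
As a simplified illustration, the sketch below projects two views into a shared latent space with two-view CCA from scikit-learn and trains a classifier on the projected features; the actual method maximizes correlations among three views (visual, textual, and sentiment), and the random data here is only a placeholder.

    # Simplified two-view stand-in for the three-view latent correlation analysis.
    import numpy as np
    from sklearn.cross_decomposition import CCA
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 200
    visual = rng.normal(size=(n, 50))    # e.g. visual features
    textual = rng.normal(size=(n, 30))   # e.g. tag features
    labels = rng.integers(0, 2, size=n)  # positive / negative

    cca = CCA(n_components=10).fit(visual, textual)
    z_visual, z_textual = cca.transform(visual, textual)
    latent = np.hstack([z_visual, z_textual])   # complementary information from both views

    clf = LogisticRegression(max_iter=1000).fit(latent, labels)
    print(clf.score(latent, labels))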

Dataset description

To evaluate the performance of image sentiment classification, we collected a set of images from Flickr and Instagram and then prepared their sentiment labels via crowdsourcing. For each image, three workers were asked to provide a sentiment score on a discrete five-point scale labeled “highly positive,” “positive,” “neutral,” “negative,” and “highly negative.” The datasets with sentiment labels (the number of workers who selected each sentiment polarity) are available. Note that we divided the whole dataset into three batches for download.

Download

Flickr dataset

Instagram dataset

Reference

If you use our dataset, please refer to the following paper.

M. Katsurai and S. Satoh, “Image Sentiment Analysis Using Latent Correlations Among Visual, Textual, and Sentiment Views,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2837–2841, 2016.