Different Strokes for Robot Folks: Transformer-Based Image Generator Imitates Painters

Published: Oct 20, 2021 | Reading time: 2 min read

A neural network can make a photo resemble a painting via neural style transfer, but it can also learn to reproduce an image by applying brush strokes. A new method taught a system this painterly skill without any training data.

What’s new: Songhua Liu, Tianwei Lin, and colleagues at Baidu, Nanjing University, and Rutgers developed Paint Transformer, which learned to render pictures as paintings by reproducing paintings it generated randomly during training.

Key insight: A human painter generally starts with the background and adds details on top of it. A model can mimic this process by generating background strokes, generating further strokes over the top, and learning to reproduce these results. Dividing the resulting artwork into smaller pieces can enable the model to render finer details. Moreover, learning to reproduce randomly generated strokes is good training for reproducing non-random graphics like photos.

How it works: Paint Transformer paints eight strokes at a time. During training, it randomly generates an eight-stroke background and adds an eight-stroke foreground. Then it learns to minimize the difference between the background-plus-foreground image and its own work after adding eight strokes to the background.
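The self-training setup described above can be sketched in a few lines of plain Python. This is a toy illustration, not the authors' code: the seven-number stroke encoding, the rectangle renderer, the 16-pixel canvas, and every function name here are assumptions made for the example. It also pairs predicted and target strokes by index, which is a simplification. Only the eight-strokes-per-step figure and the two loss terms come from the article.

```python
import random

STROKES_PER_STEP = 8   # from the article: eight strokes at a time
SIZE = 16              # hypothetical canvas side, in pixels


def random_stroke(rng):
    """Hypothetical stroke encoding: cx, cy, w, h, r, g, b, each in [0, 1)."""
    return [rng.random() for _ in range(7)]


def render(canvas, strokes):
    """Stand-in renderer: paints each stroke as an axis-aligned rectangle, in order."""
    out = [row[:] for row in canvas]
    for cx, cy, w, h, r, g, b in strokes:
        x0, x1 = int((cx - w / 2) * SIZE), int((cx + w / 2) * SIZE)
        y0, y1 = int((cy - h / 2) * SIZE), int((cy + h / 2) * SIZE)
        for y in range(max(y0, 0), min(y1 + 1, SIZE)):
            for x in range(max(x0, 0), min(x1 + 1, SIZE)):
                out[y][x] = (r, g, b)
    return out


def make_sample(rng):
    """Random background plus random foreground strokes: no real paintings needed."""
    blank = [[(1.0, 1.0, 1.0)] * SIZE for _ in range(SIZE)]
    background = render(blank, [random_stroke(rng) for _ in range(STROKES_PER_STEP)])
    foreground = [random_stroke(rng) for _ in range(STROKES_PER_STEP)]
    target = render(background, foreground)
    return background, foreground, target


def pixel_loss(a, b):
    """Mean absolute difference between two canvases."""
    total = 0.0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += sum(abs(u - v) for u, v in zip(pa, pb))
    return total / (SIZE * SIZE * 3)


def stroke_loss(pred, target):
    """Mean absolute difference between stroke parameters, paired by index."""
    diffs = [abs(p - t) for sp, st in zip(pred, target) for p, t in zip(sp, st)]
    return sum(diffs) / len(diffs)


def training_loss(background, target, true_strokes, predicted_strokes):
    """Sum of (a) pixel difference and (b) stroke-parameter difference."""
    painted = render(background, predicted_strokes)
    return pixel_loss(painted, target) + stroke_loss(predicted_strokes, true_strokes)
```

In this sketch, a model that predicted exactly the random foreground strokes would score a loss of zero, and any deviation in position, size, or color raises it.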

During training, separate convolutional neural networks generated representations of background and background-plus-foreground paintings.
A transformer accepted the representations and computed the position, shape, and color of eight strokes required to minimize the difference between them.
The transformer sent those parameters to a linear model, the stroke renderer, which transformed a generic image of a stroke accordingly and laid the strokes over the background.
The system combined two loss terms: (a) the difference between the pixels in the randomly generated background-plus-foreground and the system’s output and (b) the difference between the randomly generated stroke parameters and those computed by the transformer.
At inference, the system minimized the difference between a photo and a blank canvas by adding eight strokes to the canvas. Then it divided the photo and canvas into quadrants and repeated the process for each quadrant, repeating the cycle four times. Finally, it assembled the output subdivisions into a finished painting.
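The coarse-to-fine inference pass can be sketched as a schedule of patches. The sketch below assumes one reading of the description: the whole canvas is painted first, and each of four subsequent cycles splits every patch into quadrants. The function names, the 256-pixel canvas, and that interpretation are assumptions for illustration. Only the eight-strokes-per-patch figure and the quadrant subdivision come from the article.

```python
STROKES_PER_PATCH = 8  # from the article: eight strokes per pass


def patches_at_level(level, size):
    """Boxes (x0, y0, x1, y1) tiling a size-by-size canvas into 2**level by 2**level patches."""
    n = 2 ** level
    step = size / n
    return [
        (col * step, row * step, (col + 1) * step, (row + 1) * step)
        for row in range(n)
        for col in range(n)
    ]


def paint_schedule(size=256, subdivisions=4):
    """Yield (level, box) in painting order: whole canvas first, then finer patches."""
    for level in range(subdivisions + 1):
        for box in patches_at_level(level, size):
            yield level, box


def stroke_budget(subdivisions=4):
    """Upper bound on strokes placed under the assumed schedule."""
    return sum(STROKES_PER_PATCH * 4 ** level for level in range(subdivisions + 1))
```

Under this reading, the schedule visits 1 + 4 + 16 + 64 + 256 = 341 patches, so the system would place at most 341 × 8 = 2,728 strokes per image. A different reading of "four times" would change these totals.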

Results: Qualitatively, Paint Transformer used fewer and bolder strokes than an optimization method, while a reinforcement learning approach produced output that looked overly similar to the input. Quantitatively, Paint Transformer trained faster than RL (3.79 hours versus 40 hours) and took less time at inference than either alternative (0.30 seconds versus 0.32 seconds for RL and 521.45 seconds for optimization).
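To put the reported timings in proportion, the short calculation below turns them into speedup ratios. The numbers are the ones quoted above. The dictionary layout and the `speedup` helper are just a convenient way to organize them for this example.

```python
# Timings as reported in the article.
train_hours = {"Paint Transformer": 3.79, "RL": 40.0}
infer_seconds = {"Paint Transformer": 0.30, "RL": 0.32, "optimization": 521.45}


def speedup(times, baseline, ours="Paint Transformer"):
    """How many times faster `ours` is than `baseline`."""
    return times[baseline] / times[ours]


print(f"Training vs. RL: {speedup(train_hours, 'RL'):.1f}x")
print(f"Inference vs. RL: {speedup(infer_seconds, 'RL'):.2f}x")
print(f"Inference vs. optimization: {speedup(infer_seconds, 'optimization'):.0f}x")
```

By these figures, training was roughly 10.6 times faster than RL. Inference was only about 1.07 times faster than RL but roughly 1,738 times faster than the optimization method.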

Why it matters: The system learned to paint without seeing any existing paintings, eliminating the need for matched pairs of photos and paintings, never mind tens of thousands or millions of them. This kind of approach might bear fruit in art forms from photo editing to 3D modeling.

We’re thinking: Hook this thing up to a robot holding a brush! We want to see what its output looks like in oils or acrylics.
