Mike Heavers

Training drawings on how to be photos with Machine Learning

In this tutorial we'll use the massive library of sketches from Google's Quickdraw databse and train them on photos from the Internet Archive, turning drawings of fish to fish-photo-hybrids.

TAGS: machine learning, tensorflow, imagemagick, montage, pix2pix, quickdraw

06/04/2018

In a previous post I talked about how to use ImageMagick to convert images into a format that can be used by Pix2Pix, a Tensorflow-based Machine Learning algorithm that can be trained to render photos of one style into photos based on another style, for example, grayscale images to color images. Black and white to color images with Pix2Pix

My first attempt at machine learning was to use images from Google’s Quick Draw Database, an open-source library of over 50 million crowd-sourced sketches of everyday objects:

Google Quickdraw Database

and try and convert them to illustrations of a style I found on the Internet Archive Flickr Set:

Internet Archive Flickr Set

The quick draw database can be found here, and you can search for the dataset you want (in my case fish). Download it, and you’ll see you have a very large ndjson file, where rather than images, we have an array of coordinates representing each brush stroke a person took to make their drawing:

json data for a quickdraw image

But what we want are images. I found a way to visualize this data by using the ndjson-cli tool from D3 creator Mike Bostock (found here), and some D3 sample code by Ian Johnson. Since D3 only understands json and not njdson, we first need to convert it. We do that by installing the njdson-cli , and converting our saved quickdraw dataset:


            npm install ndjson-cli -g
            cat [file].ndjson | ndjson-filter 'd.recognized == true | ndjson-reduce > [output].json

where [file] is the name of the quickdraw file you downloaded, and [output] is the name of the converted file you want saved to your hard drive.

For my purposes, I didn’t want a huge dataset, so I just took the first few hundred results from the json file, and deleted the rest.

Next, we want to visualize what we saved in the browser. You can copy the code here:

and replace the output.json with the name of your json file.

Then view it in your browser:

D3 Quickdraw Image Viewer

Now, to download this file as an image, we can just use our web inspector, and copy the SVG node into a blank file, and save it as [output.svg] (or whatever you want to call it.

copying svg code from chrome inspector

Now you can load this file in any program capable of viewing vector graphics, such as Illustrator:

Slicing images for use in Pix 2 Pix

You may need to do a few things to the file. My images were very small, and the stroke was very thick. So I had to select it, change the stroke to 1px, and then I resized the whole image so that each fish was about 256px in size, which is the size that Pix2Pix renders everything out as. Your images can be bigger, but it will just slow the processing time down — Pix2Pix (as far as I know) can only render out images at 256x256.

So now that we’ve done this, we need to slice this image up into individual images for each fish. So first I need to know how big each slice needs to be. In Illustrator, I went to the Settings > Guides > Grid and changed the Grid line every [blank] pixels until each grid square could accurately contain a single fish, without that drawing crossing over the gridline (in which case our drawing would be cropped off when we slice the image up.

Grid Line settings in Illustrator

Lastly, save the SVG as a GIF or JPG.

Now in order to programmatically slice up and save out all these images, we can use ImageMagick. Once it’s installed, run the following command from within the folder where you saved your image grid:

convert fish.gif -crop 30x19@ +repage +adjoin tiled/fish-%d.gif

Where fish.gif is the name of your file, 30x19 is the number of rows and columns in your image grid, tiled/ is the directory you want to save the new images into, and fish-%d.gif is the file name for each image (fish1.gif,fish2.gif, etc).

Once you’ve done this, you should resize the images. Paste this into the command line / terminal:


            for i in tiled/*.gif; do
                convert \
                    ${i} \
                    -resize 256x256 \
                    $(convert ${i} -format "sized/%t.%e" info:)
            done

where tiled/*.gif is the directory where you saved out all the images in the previous step, 256x256 is the image size, and sized/%t.%e is the directory you want the new, resized images saved into, with %t being the iterator (1,2,3,etc), and %e being the extension (.gif).

Now that we’ve got some images that we want to our machine learning model to convert, we need images to train our model on, so it knows what we want these drawings to end up looking like. I mentioned I’d be using the Internet Archive Flickr Set. If you head to that page, you can browse some really amazing illustrations of all sorts of things. I wanted to download some of the fish images, but I didn’t want to have to go one-by-one through them in flickr and manually download them. Luckily, there’s a number of Chrome Extensions which can automate image downloading.

The first one I could find that would actually work with Flickr was Image Downloader by Vlad Sabev.

Once it’s installed, you can browse the Internet Archive, and when you see some images you want, just click the extension icon, and you’ll be presented with a bunch of images you can download. Just select all the ones you want, and click download. Place these in a directory somewhere near the folder with all of our the drawings you just created in the previous steps.

Downloading images from Flickr with Image Downloader Chrome Extension

Now that we have these, we need to make them square as well, so that they can be processed by Pix2Pix. Again we can use ImageMagick.

In my case, all of my images were 640px wide, but varied in height. So I wanted to just increase the height to 640px on each image, but keep each image’s aspect ratio — sort of the equivalent of Photoshop’s Canvas Size functionality. In order to do that, I wanted to just fill the extra space created with the background color of the image. So first, for each image we need to sample the background color, as each image is different. This command will help with that:


            for i in *.jpg; do
            c=$(convert ${i} -format "%[pixel:p{1,1}]" info:)
            convert \
                ${i} \
                -gravity center \
                -background "$c" \
                -extent 640x640 \
                $(convert ${i} -format "fish-%t.%e" info:)
            done

where *.jpg assumes you are running the ImageMagick commands from the directory where you saved your Flickr images, c=$... will store the background color of each image in a variable called c , and pixel:p{1,1} tells ImageMagick that we want to get the color of the 1st pixel at the top left corner of our image.

Once we have that, we then run the convertcommand again, expanding the image to 640x640 to make our images square, and saving them out with a name of format fish-%t.%e (fish-1.jpg, fish-2.jpg, etc).

Now that these are squared, lets just size them down to 256x256 so they can be used in Pix2Pix:

for i in square/*.jpg; do


            convert \
                ${i} \
                -resize 256x256 \
                $(convert ${i} -format "sized/%t.%e" info:)
            done

where sized/ is the directory you want to save the resized images to, and square/ is the directory you ran the previous steps from.

Now we’ve got both source images and training images of size 256x256. Great! Pix2Pix will want us to combine these into a single side-by-side image, like so:

Combining drawings and photos for Pix2Pix processing

In order to achieve this, we can use ImageMagick’s montage command, which basically does the reverse of what we did early earlier — it takes sliced images, and combines them into a grid. For my part, I only saved out about 100 fish photos from the internet archive, so I’ll just create 100 montages:


            for i in {1..100}; do
                montage drawings/sized/fish-${i}.gif sized/fish-${i}.jpg -tile 2x1 -geometry 512x256+0+0 montage/fish-${i}.jpg
            done

where {1..100} is the range of items we want to loop through, drawings/sized/fish-${i}.gif refers to the directory holding our resized fish drawing images we created earlier, sized/fish-${i}.jpg refers to our resized fish illustrations from the Internet archive, tile 2x1 means that we want our images side by side (2 rows by 1 column), with a geometry 512x256+0+0 , meaning 256 pixels high, 512 pixels wide (256*2), with 0+0 padding around each image set (no horizontal or vertical padding). We’re saving everything out to the montage/ directory as fish-${i}.jpg (fish1.jpg, fish2.jpg, etc).

Once that’s done, we can finally train our model! You’ll need to download Pix2Pix, which has some very specific requirements. You can follow the Machine Learning for Artists Guide by Gene Kogan here for more details, but essentially you’ll need to:

Install CUDA by NVIDIA, which is a Computer Vision library for NVIDIA supported graphics cards
Install CUDNNN, a library for CUDA specifically designed for Deep Neural Nets.
Install Tensorflow, a Machine Learning library developed by engineers at Google.
Install Pix2Pix for tensorflow, which is a github repo that you can simple clone or download, and place into a directory.

Whew! Once that’s all up and running, we can train it on our model. You’ll want to make sure that your image folder containing all of the montage images we created is now inside the directory holding the pix2pix.py file from the github repo so that we can more easily reference it. From there, let’s start the training. Heads up, this will take awhile (around 3 hours for me for 100 images through 100 epochs on an older MacBook Pro).


            python pix2pix.py --mode train --input_dir fish/montage --output_dir fish/model --which_direction AtoB --max_epochs 100

Where fish/montage is the directory containing our montage images, fish/model is the directory where we want to save our trained model, and — which_direction AtoB means we’ll be converting the images on the left into images that look like the ones on the right. max_epochs means we’ll run each image through the model 100 times for training.

In reality, you’d probably want many more images for proper training, and more epochs as well, but this will provide us a good glimpse into the capabilities, and most likely give us some weird and fun results.

Tensorflow Model Training Progress

When the model is fully trained, we’ll want to prep some drawings from the quick draw database that we can through our model to see what it dreams up. To prep them, we’ll just need the drawings on the left, and instead of using training images on the right, we just want blank white space. I’ll grab the next batch of 100 images from my drawings directory, and use montage again, this time as follows:


            for i in {101..200}; do
                convert drawings/sized/fish-${i}.gif -gravity west -background white -type truecolor -extent 512x256 train/fish-${i}.jpg
            done

where 101-200 is the batch of images I want (since 1-100 were used in training my model, we won’t reuse them), drawings/sized/fish-${i}.gif represents the file location of each drawing (fish-1.gif, fish-2.gif, etc), gravity-west means we’re anchoring the fish on the left, and creating the white space on the right, truecolor converts the gif (a black and white indexed color image), into an RGB jpg, and train/fish-${i}.jpg is the directory we want to save our images out to.

Montage Images for Conversion in Pix2Pix

Now we can run these images through our model, and see what Pix2Pix comes up with:


                python pix2pix.py --mode test --input_dir fish/train --output_dir fish/gen --checkpoint fish/model

where, again, we’re in the directory containing pix2pix.py , and fish/train is where we saved all of our blank white montaged drawings out to, fish/gen is where we want to save our generated images, and fish/model is the directory containing our trained model from earlier.

A generated image set with the source image on the left, a reference photo at center, and the generated result on the right.

The result is predictably weird and pretty unpolished, but I think kind of cool. This was a fun first experiment for me, and I’m excited to play around more with Machine Learning and see what it’s capable of.

You can watch the video below as a companion to this article.

Mike Heavers

creative technologist | coder