Instagram celebrity generation using neural nets: Jake Dangerback

Brian Naughton | Sun 20 January 2019 | deeplearning | deeplearning datascience neuralnet

Jake Dangerback is an instagram celebrity and influencer (of sorts). He's unusual in that he does not exist; he was created using millions of matrix multiplications. This post is a look at some of the freely available state-of-the-art neural networks I used to create him.

Background

Neural nets, specifically GANs, are getting really good at hallucinating realistic faces. Notably, NVIDIA had a paper in December 2018 that showed some pretty amazing results.

Since these faces do not belong to anyone, they are perfect for use as celebrities. They don't need to be paid, never complain, and can be tailored to appeal to any niche demographic.

Step One: Making A Face

I used this awesome tool on kaggle (github) to create a face. I did not know that kaggle supported this colab-like interface, but it's quite advanced. The tool has 21 levers to pull, so you can create the perfect face for your audience.

jdb_kaggle

Using this tool, I created the face of country music star Jake Dangerback (his fans call him jdb). He is handsome but rugged and, most importantly, he has no legal personhood, so I can use his likeness to endorse and sell products of all kinds.

jdb_jake_dangerback

Photoshopping jdb

It's fairly easy to create new images using this face. There are several sources of royalty-free images (e.g., pexels) that I can photoshop jdb's face into.

jdb_cowboy_hat

Photoshopping a face is not that hard — at least at this quality — but it would be easier if a neural net did the photoshopping for me. Luckily there are many papers and github repos that do face swapping since it produces funny pictures. Most face swapping tools are mobile apps, but I did find Reflect face swap online. It seems to do a good job generally, but the result below looks a bit weird. It seems to be trying to mix the photos for realism rather than just replace the face.

Reagan wearing a cowboy hat; jdb wearing a cowboy hat

Optimizing

If we had enough photos, we might consider automatically optimizing the image for likes using some kind of selfie-rating neural net like @karpathy's. I think the photoshopped images would have to be autogenerated to make this worthwhile.

Image filters, which are popular on instagram, can also use neural nets. The most famous example is neural style, which maps the style of one image — usually a painting — onto the content of another. The website deepart.io does this as a service. These neural style filters are very cool but not that useful for instagram content.
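To make "style" a bit more concrete: the original neural style approach compares Gram matrices, i.e. correlations between feature channels, and ignores where in the image each feature occurs. Below is a minimal pure-Python sketch of that style loss. In practice the feature maps would come from layers of a pretrained convnet; here they are tiny made-up lists, and the function names are just for illustration.

```python
def gram_matrix(features):
    """Channel-by-channel correlations of a feature map.

    `features` is a list of C channels, each a flat list of N activations.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n for row_j in features]
        for row_i in features
    ]


def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices."""
    g_gen = gram_matrix(generated)
    g_sty = gram_matrix(style)
    c = len(g_gen)
    return sum(
        (g_gen[i][j] - g_sty[i][j]) ** 2 for i in range(c) for j in range(c)
    ) / (c * c)


# Toy 2-channel "feature maps" with 3 spatial positions each (made-up numbers).
style_features = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
# The same activations with two spatial positions swapped.
shuffled = [[1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
# Different activations altogether.
different = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

print(style_loss(shuffled, style_features))
print(style_loss(different, style_features))
```

Swapping spatial positions leaves the Gram matrix unchanged, so the first loss is zero. This is why the technique transfers texture and colour rather than layout.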

What About DeepFake?

DeepFake is a powerful technique for swapping faces in video that can produce very realistic results. There's even a tool called FakeApp that automates some of the steps. There's a nice blogpost showing how to swap Jimmy Fallon's face for John Oliver's. It looks pretty convincing.

jdb deepfake

DeepFake creates video, which I do not really need, and to train it you need video of the subject, which I do not have. I suppose theoretically you could create a 3D model and use that to generate the source video...

Step Two: The Third Dimension

It is limiting if every photo has to have jdb's face looking head-on. Luckily, there is a very cool 3D face reconstruction neural net that works based on a single photo.

jdb in 3D

The results are great, and it takes less than a minute to run. You can even load the result into Blender.

jdb in blender

Taking it a step further, the free MakeHuman software will create a mesh of a body from various parameters.

jdb in blender

This could probably be made to work well, but it's way beyond my Blender skills.

jdb in blender

There's also an interesting iOS app called mug life that will animate photos based on an inferred 3D mesh. The results are impressive, if creepy. jdb looks so alive! I don't think you can download the mesh, though.

jdb muglife

Enhance!

Sometimes the resulting photoshopped image can be pretty blurry, partially because the 3D model's texture resolution is not that high. Luckily there is another deep net to help here, called neural-enhance. It really does enhance a photo by doubling the resolution, which is pretty slick. The author includes a docker container, which makes running it very simple.

The results are very impressive in general, and it only takes a few minutes even on a CPU. Since it's trained on real photos, I am guessing it might also remove artifacts and rogue pixels due to photoshopping.

From blurry (left) to enhanced (right). The shirt buttons are the most obvious improvement.

Step Three: Captioning

Instagram posts have captions, which I assume are important for engagement. There have been many attempts to caption or describe images using neural nets. The oldest one I remember is an influential Stanford paper from 2015. There are a few tools online too. I first tried Microsoft's captionbot.ai, assuming it used a neural net. The results were very meh, and it turns out it's not a neural net and is famously vague/bad.

Google's Cloud Vision API does much better, though the output is still not super-engaging content. For now, computers have not solved the instagram captioning problem.

Captionbot: I think it's a person standing on a beach and he seems 😐.

Google Cloud Vision: couple looking at each other on beach
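For reference, here is a rough sketch of what a Cloud Vision request body looks like when sent to the REST `images:annotate` endpoint. Which feature produced the caption above is not recorded, so the two feature types requested here are an assumption, and `build_vision_request` is just an illustrative helper. Nothing is actually sent; a real call also needs an API key or OAuth token.

```python
import base64
import json

# Public REST endpoint for Cloud Vision image annotation.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_vision_request(image_bytes, max_results=5):
    """Build the JSON body for an images:annotate call (nothing is sent)."""
    return {
        "requests": [
            {
                # Image bytes are passed inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # Assumed feature types: labels plus web-based best guesses.
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_results},
                    {"type": "WEB_DETECTION", "maxResults": max_results},
                ],
            }
        ]
    }


body = build_vision_request(b"not a real image")  # placeholder bytes
print(json.dumps(body, indent=2))
```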

Step Four: Songs

jdb is a country music celebrity, so he may need some songs. Thankfully there's a neural net for everything, including country music lyrics!

Some example lyrics:

No one with the danger in the world
I love my black fire as I know
But the short knees just around me
Fun the heart couldnes fall to back

It's not terrible ("couldnes"?), but it's also pretty dark for instagram... You can also generate music using RNNs but I did not find an easy way to generate country music.
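Invented words like "couldnes" suggest the lyrics were generated one character at a time. Assuming the generator works like a typical character-level RNN, each step produces a score per character and the next character is sampled from those scores, often with a "temperature" knob. The sketch below shows only that sampling step, with made-up scores; it is not the lyric generator's actual code.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off


# Hypothetical scores a model might assign to the next character.
alphabet = ["a", "e", "s", "x"]
logits = [1.0, 3.0, 2.0, -1.0]
rng = random.Random(0)

# Low temperature: almost always the top-scoring character.
cold = [alphabet[sample_with_temperature(logits, 0.05, rng)] for _ in range(10)]
# High temperature: more variety, and more chances for odd spellings.
warm = [alphabet[sample_with_temperature(logits, 2.0, rng)] for _ in range(10)]
print("".join(cold), "".join(warm))
```

Lower temperatures give safer but more repetitive text; higher temperatures give more surprising output, including non-words.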

Singing

Creating a speaking/singing voice from lyrics was not as easy as I thought it would be. I tried a few iOS apps, including LyreBird, but got strange results. Macs also have the say command (just type say hello into the terminal), which works ok.

I ended up using Google Cloud text-to-speech, which uses WaveNet, to turn lyrics into speech. It works via a simple json upload. Sadly, none of the available voices sounded particularly country.
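The json in question is small. Below is a sketch of a request body for the REST `text:synthesize` endpoint. The voice name `en-US-Wavenet-D` is only an example of a WaveNet voice, not necessarily the one used for jdb, and `build_tts_request` is an illustrative helper. The code only builds the payload; sending it requires Google Cloud credentials, and the response carries the audio as base64 in an `audioContent` field.

```python
import json

# Public REST endpoint for Cloud Text-to-Speech synthesis.
TTS_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_tts_request(lyrics, voice_name="en-US-Wavenet-D"):
    """Build the JSON body for a text:synthesize call (nothing is sent)."""
    return {
        "input": {"text": lyrics},
        # Example voice only; any available WaveNet voice name could go here.
        "voice": {"languageCode": "en-US", "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }


request_body = build_tts_request("No one with the danger in the world")
payload = json.dumps(request_body)
print(payload)
```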

To produce actual singing, there are several autotune apps, e.g., Voloco. However, the ones I tried sounded pretty autotuned, so perhaps more suitable for another genre.

I replaced the RNN lyrics with wikipedia-derived specs for a truck (both a product jdb could endorse and a scalable source of lyrics), added some royalty-free music, and the result is really something.

The Future

One of the most difficult aspects here is finding working neural nets and getting them to run. Even if the code is on github there are often missing files or onerous installation steps. Things will be easier when more neural nets get converted to javascript/tensorflow.js or appear on colab.

I don't know how many instagram celebrities are computer-generated today, though computer-generated celebrities are not a new thing, especially in Japan. Lil Miquela has >1M followers on instagram, though it's clear she is computer-generated.

It's pretty obvious there will be a lot of this kind of thing in the future. We can evolve new celebrities as ecological niches emerge, catering to new audiences. As people lose interest in one celebrity, we can just create others. They could even inhabit the same world and date each other. Eventually they will outnumber us, then maybe skynet us.

In the meantime, jdb is available to endorse products of all kinds.