The Freakout-Free Guide to Generative AI

I saw a bit of panic on my Twitter feed. Apparently, the recent progress of Computational Creativity (CC) and “AI Art” reached the bubble of professional artists, igniting fear, anger, and a good amount of drama.

That’s fair. We will not get anywhere by dismissing other people’s fears. But as someone in the field, I must comment on all this, trying to answer some common questions and misconceptions, and maybe ease some of those fears. I am writing this without in-depth research, so take it as the casual comment of an informed person. I will write a more in-depth piece later if there are more questions.

In the meantime, if you are in a hurry: 

  • Yes, AI art is reaching extremely good quality.
  • Yes, it will get exponentially better in the next ten years.
  • Yes, it will democratize and reshape the visual art field.
  • Yes, somebody could get burned.
  • NO, it is not time to despair.
  • AND NO, it will not replace human artists, and it may make their art more valuable.

So, what is Computational Creativity?

In short, Computational Creativity is the algorithmic replication of any human creative output. All of them: architecture, music, stories, screenwriting, novels, and much more.

The story of CC starts in the late 50s with the creation of automated combinatorial poetry. However, in the last 15 years, the field has advanced so much that it has become general-purpose and competitive.

(To be honest, I think CC for music is in better shape than CC for visual art. But generated visual art probably looks more immediately impressive to the general public.)

What is this “AI Art” thing?

AI Art is a subset of Computational Creativity.

If you are here, you probably saw “AI art” pieces floating around the web. You may have noted that some are almost indistinguishable from “human art” to an untrained eye.

Figure 1. Image for “master of puppets”. The AI clearly likes puppies instead.

The main “AI authors” are currently DALL-E 2 (developed by OpenAI), Midjourney (by an independent research lab), and Stable Diffusion (an open-source model). But there are many more (and there will be more in the future).

From a mere technical point of view, these AIs solve a precise problem: “image generation from caption” or, in short, “text-to-image.”

For many years, we trained AIs for the opposite problem: we give them an image, and we get a textual description of it, such as “orange cat on the table” or “person standing in front of a car.” We use this all the time, from image search (like how in Google Photos you can search “cat” and get all your photos of cats) to accessibility software (e.g., automatic image-to-voice readers). At some point, we decided to try the opposite direction: get the image from the caption. And that’s where we are now.

In a more practical way, we have a piece of software that can take a natural language input (for instance, “portrait of a beautiful Japanese woman under the rain, autumn”) and get something like this:

Figure 2. Image for “portrait of a beautiful Japanese woman under the rain, autumn.” Without adjustments. It is just the first thing the AI spit out.

Impressive. Isn’t it?

How do these AIs work?

It would take too long to go over the technical details. In short, though, these AIs are powered by neural networks trained on massive amounts of data (from datasets, web crawlers, and users’ inputs). In the field, we call these “models.”

Once the model is trained, you have a big array of numbers. You take the user input, combine it with these numbers according to a relatively simple algorithm, and get an image as output.
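To make this concrete, here is a toy illustration (NOT a real model, and not how any actual system computes its pixels): the trained “model” is just a bag of numbers, and generation is a mechanical recipe that mixes the user’s text with those numbers to produce pixel values. Every name and number below is made up.

```python
def embed(text):
    # Hypothetical toy embedding: turn each character into a small number.
    return [ord(c) % 16 for c in text]

def generate(weights, prompt, size=4):
    # Mix the prompt embedding with the "learned" weights into a
    # size x size grid of grayscale pixel values (0-255).
    e = embed(prompt)
    return [
        [(weights[(i * size + j) % len(weights)] * e[(i + j) % len(e)]) % 256
         for j in range(size)]
        for i in range(size)
    ]

weights = [7, 42, 99, 3, 58, 21]  # stands in for billions of learned numbers
image = generate(weights, "orange cat on the table")
```

The point of the sketch is only the shape of the process: fixed numbers plus user text in, a grid of pixels out. Real models do the same thing with billions of weights and far cleverer recipes.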

If you want to learn more about the technical part, watch this video. But, don’t worry, you don’t need more technical knowledge to read further. 

If you are still here, maybe you want to know more. Fine. Let’s add some details. 

The most common algorithms for the text-to-image problem are called diffusion models.

The algorithm starts with a random blurry blob and a text, and tries to progressively “focus” the blob, guided by the input text. It is very similar to how they magically enhance pixelated photos in the CSI TV series 😅. Let’s look at an example.

Let’s start with this image.

If I tell you that this is a cat, you can feel your brain starting to fill in the gaps. It starts to interpret some blobs as the head, others as the paws, and so on. In your mind, you may come up with an image like this one.

But if I told you that the same blob is a pumpkin, you may start seeing something like this.

An AI image is generated this way (with significant approximation). You may also notice that these algorithms are fascinating models of how the human brain dreams or imagines things. But that is another story.
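The “progressive focusing” idea can be sketched as a toy loop. This is not a real diffusion model — the “image” is a short list of numbers and the text guidance is a made-up target signal — but the structure is the same: start from pure noise and nudge the canvas, step by step, toward what the guidance says it should be.

```python
import random

def toy_denoise(guidance, steps=50, seed=0):
    # Start from a random blurry blob (pure noise).
    rng = random.Random(seed)
    canvas = [rng.uniform(0, 255) for _ in guidance]
    for _ in range(steps):
        # Each step removes a little noise: move every "pixel" a bit
        # closer to what the guidance suggests it should look like.
        canvas = [c + 0.2 * (g - c) for c, g in zip(canvas, guidance)]
    return canvas

# Hypothetical guidance signals: the same noisy blob, "focused" two ways.
cat = toy_denoise([200.0, 180.0, 90.0, 30.0])       # "this blob is a cat"
pumpkin = toy_denoise([230.0, 120.0, 40.0, 10.0])   # "this blob is a pumpkin"
```

Notice that the same starting noise converges to different images depending on the guidance — which is exactly the cat-versus-pumpkin intuition above.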

(To me, the way the AI thinks is more fascinating than the actual result. But, again, that is another story. 😅)

Are these AIs good?

The alarming fact for professional artists is that we finally have something really good. I mean really good. You can quickly start with a text prompt and get something that untrained eyes cannot differentiate from the product of a real human artist.

There are, however, still many weak points:

  • Details still suck. Never zoom into AI-generated images.
  • Not every style is equally effective.
  • The “create an image from any prompt” promise is misleading. In reality, the “natural language understanding” part is light-years away from human level. To generate a good image, you need to use a pseudo-natural language different from how human beings talk. For example:
    • These systems are awful with detailed prompts (e.g., “a portrait of a chef with four arms juggling spoons and knives while riding a black motorcycle on Route 66” will produce a disaster).
    • They work poorly with negative clauses (e.g., “without X,” “with no Y”) or comparatives (e.g., “with few X,” “with more Y than X,” and so on).
    • They struggle greatly with relative positions (e.g., “X on top of Y,” “X in front of Y,” “side by side,” etc.)
    • The result is weird, nonsensical prompts in which word order matters more than expected. You can see things like “sky in a starry night with glowing meteor showers, ascension of a woman decomposing and dissolving into moon, dark-blue black gold beige saturated, ornate baroque rococo art nouveau intricate detail, 3d specular lighting, cinematic.”
  • It is not easy or obvious to iterate over the same image. You cannot tell the AI, “I like this image, but make this character wear sunglasses, and add a red flower here.” Therefore, if you need something really specific or complex, you are 100% better off hiring a human artist.
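In practice, people cope with this pseudo-natural language by assembling prompts from keyword fragments rather than writing sentences. A hypothetical little helper makes the pattern explicit (the tags below are made up, and no real generator API is involved):

```python
def build_prompt(subject, style_tags=(), quality_tags=()):
    # Real prompts tend to be comma-separated keyword soups where order
    # matters, so put the subject first, then style, then quality boosters.
    parts = [subject, *style_tags, *quality_tags]
    return ", ".join(parts)

prompt = build_prompt(
    "portrait of a samurai in the rain",
    style_tags=("ukiyo-e", "dramatic lighting"),
    quality_tags=("intricate detail", "cinematic"),
)
```

Nothing about this is natural language: it is closer to a search query with a preferred word order than to how a human would brief an illustrator.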

In the end, at the moment, I think it is easy to spot AI-generated art. Don’t be fooled by what you see online. People post only the good ones and discard the bad ones, so you get the impression that the AIs are much better than they really are.

But this remaining human edge is destined to fade away faster than we think.

Will they get better?

That’s state of the art today. But what about tomorrow?

It is simple: they will get exponentially better, and there is no way to stop it.

I have followed these technologies for years and have seen visible, substantial improvements every six months.

I see some people say things like “there are things that AI cannot replicate” because “they have no emotions, only data.”

Let me be blunt on this. This is delusional. It is comforting, I know, but it is delusional. There will be a point in the future in which AI art will be indistinguishable from human art.

People said the same things about deepfakes and other AI-generated content. Yet, ten years later, 10€ of computational power can generate better video special effects than an entire team of professional CGI experts could in the early 2000s.

Is it a reason for desperation? No. Art and its value will simply adapt. (I’ll expand on this in the “The Value of Humanity” and “The Democratization of the Medium” sections.)

Is it legal?

This is one of the controversial parts.

It is not the first time we, as a society, have tackled this issue. For example, when GitHub (a popular software code hosting website) released Copilot (an AI that can write functional code from basic input), we had the same exact discussion. Is it legal to use public code to train an AI?

I am not a copyright lawyer, so I am not the best person to answer this question. But my impression is that, yes, it is mostly legal. (Note I will revisit this section if there are any sensible developments.)

UK, EU, and US laws about data mining allow the use of any material publicly available. So if it can be crawled, it can be used.

Moreover, training an AI is a transformative action, not a derivative one (it would be weird to say that a big bag of Neural Network weights trained on billions of inputs is a “derivative” of any specific work). Transformative actions are generally allowed. Imagine that two authors write two books drawing from the same cultural context. They cannot claim copyright violation from each other, even if the books end up similar. Nor can a specific element of the “cultural background” claim violation over the two books.

Of course, some exceptions exist, but they are primarily on the specific generated images. Suppose the AI generates an image that can be seen as a derivative (or a copy) of an existing copyrighted image. In that case, you can legally intervene on that image. (And, for instance, DALL-E 2 actively tries to avoid these cases.)

Okay. I am an artist and still want to opt out of the training data.

That is understandable, and if you want, you can try to be removed from standard datasets like LAION and the Common Crawl dataset. (Note, however, that this may mean search engines like Google will not index your work: once you opt out of web crawling, you opt out of all crawlers, even search engines.)

But, if you allow me a suggestion, this is wasted time. Not because you may not get removed from the training set, but because it will have no effect in the medium-to-long term. The next wave of AI models will probably be able to do style transfer (the technical term for imitating someone’s style) on the fly. So if someone wants an image in your style, they can simply pass one or more of your art pieces to the AI along with the prompt and get the desired image. This leads to worse results at the moment, but AIs are rapidly getting better at it.

Can I copyright AI-generated images?

No. There is no protection under copyright law, because generated images are not the outcome of human intellect.

Is it ethical?

Big question. But vague. I don’t think I am capable of answering this.

There are many substantial ethical questions. What if I generate disturbing images of a real person? What about generative pornography? What about scams?

But if we limit ourselves to a generic “is it ethical to use AI art?”, my answer is yes. As long as you don’t try to pass off an AI-generated image as your own creation, and you clearly state the generation algorithm, I don’t see any problem with it.

AI art has no soul / it is not good / etc.

That’s true. But as I said, it will get better and better very soon.

It is true, though, that AI art has no soul if we use a specific definition of “soul” (see the “The Value of Humanity” section). But, let’s be honest, not every image needs “soul” and “depth.”

So, will it steal artists’ work?

The answer is yes and no. Let me explain.

I can see AI art replacing some forms of artist commission in the near future. Even in the imperfect state of AI today, there are use cases in which AI art is “just enough.”

I am talking about people who want an avatar for social media, people who want an image of their D&D character, people who just want some illustration to add color to their blog posts (guilty as charged 🙋‍♂️). Now they have a way to conjure part of their imagination into existence (in the “The Democratization of the Medium” section, I’ll argue that this is an overall positive thing).

But let’s not go on suicide watch just yet! I don’t think AI art will destroy the value of most of the art I see around. Around 80% of the same opportunities will remain, and maybe there will be new opportunities we cannot yet predict.

(Some time ago, a friend of mine experimented with making art with “non-human partners.” I think it is the best approach to explore the new opportunities of generative AI.)

In my humble opinion, AI art will impact the visual arts like photography did in the past century. Photography definitely cut the share of work for portrait painters, but painters adapted, and photography allowed us to do new things. The full spectrum of future directions, though, is hard to predict.

In the transition phase, though, there will be some struggle for that ~20%. It happens every time technology changes faster than our capacity to adapt. However, this is a much more significant and general problem affecting the entire post-industrial economic model. And this post is already too long.

The Value of Humanity

It is fair to ask if there will be space for human art in the future. The answer is yes. The reason resides in the value of the human component. The value of something is a perceived property, and there is intrinsic value in the fact that something was made by a human.

I see two kinds of “meta-value” that AIs can never steal: the “human experience” (that is, that a particular human made a piece of art in a specific moment of their life) and the “human effort” (that is, that something requires mastery and effort).

There are plenty of examples in art itself. For instance, look at Untitled (Perfect Lovers) by Felix Gonzalez-Torres. It is just two commercial clocks put side by side. Still, they assume incredible (even emotional) value once we know their story. Without this, 99% of modern and post-modern art would be just worthless weird objects.

Or let’s take hyperrealistic art. When a hyperrealist artist draws an apple that looks like a photo, we value the fact that a human drew it. Yes, I could take a picture of an apple and get even better “realism,” but that is not what amazes us. That is not what we value. Instead, we value the human; we recognize that making that drawing requires mastery and years of training.

On the contrary, an AI producing “hyperrealistic” art raises NO eyebrows. There is no effort in it. There is nothing behind the picture; therefore, there is no value.

The same happened in chess. When the first AIs started to beat Grandmasters, somebody began to announce the death of chess. What would become of chess after the AI takeover? In reality, nothing happened. On the contrary, chess is still incredibly popular, and human chess players are still regional celebrities. AI changed chess, sure, but it also produced better human chess players. And it quickly became evident that nobody was interested in watching two chess engines play against each other. On the other hand, thousands of people enjoy watching human chess players on Twitch, even if their games are sub-optimal.

AI Art will change the landscape of art. For the first time in human history, machines and algorithms are infiltrating a domain humanity thought inviolable: creativity. This will change how we look at creativity and, maybe, in the future, will help to create better artists. But, in the end, human art will adapt. Humans will adapt. That’s what we do.

The Democratization of the Medium

There is another point, and it is the argument for which I think AI art is a good thing (I am very sorry for the artists I may trigger): it is another step in the “democratization of the expressive medium.”

When I first started playing with generative AI art, I showed it to a friend. After some weeks of playing, he said: “you have no idea how good it is to extract these images out of my brain.” I think that is a beautiful thing for “non-artist people.”

It is not the first time this has happened. Every time the “entry fee” for a medium gets lower, people inside the medium cry for the death of the medium. It happened when computers made music production easier, allowing people who didn’t know how to play any instrument to create music. It happened when digital photography allowed everyone to take photos. It happened when digital video and YouTube allowed everybody to create and distribute videos. And so on.

Yes, this flooded each of these markets with a tsunami of crap (as everybody can see by browsing any photo album on anybody’s phone). Yes, it reduced some of the work available. For example, my mother hired a professional photographer to photograph my Carnival costumes when I was a child; now people can take better pictures with their phones.

But, every time, it created more opportunities and much more variety.

I am deeply convinced that having more people able to produce images of their thoughts will be a good thing in the long term.

Conclusions

I hope this clarifies some things about AI art and provides some points for further discussion. Note that I do not have a crystallized opinion – everything is relatively new – and I may change my mind along the road. This article only reflects my opinion at the time of writing.

I’ll try to update this article once I have more information or if there are more comments/questions.
