Cover image: Dall-E Mini, “Generative Art About Empathy In Blue Tones”, digital medium, 2022
From the bizarre juxtapositions of images created by Dall-E Mini to the NFT market: images generated by AI algorithms are increasingly becoming mainstream. At the same time, this close intersection between art and technology raises several questions.
Can a machine generate works of art autonomously? If so, what is the future of artistic production when it is no longer exclusive to humankind? What are the limits and risks, but also the potential, of this kind of art?
What is Generative Art?
Generative Art is a type of art, mostly visual, based on cooperation between a human being and an autonomous system. An "autonomous system" is, by definition, a piece of software, an algorithm, or an AI model capable of performing complex operations without the programmer's intervention.
Randomness is a fundamental property of Generative Art. Depending on the software, the autonomous system can produce a different, unique result every time it is run, or return a variable number of results in response to user input.
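How this plays out in practice can be shown with a minimal, hypothetical sketch in Python (the `generate_artwork` function and its character "palette" are invented for illustration): the rules are fixed in the code, but a random seed makes each run unique, while fixing the seed makes a piece reproducible.

```python
import random

def generate_artwork(seed=None, size=8):
    """Render a tiny ASCII 'artwork': the rules are fixed,
    but the random seed determines the final result."""
    rng = random.Random(seed)  # seeded generator -> reproducible output
    palette = " .:*#"          # the 'abstract elements' supplied by the programmer
    rows = ("".join(rng.choice(palette) for _ in range(size))
            for _ in range(size))
    return "\n".join(rows)

# The same seed always yields the same piece; omitting the seed
# yields a different piece on practically every run.
print(generate_artwork(seed=42))
```

The same principle scales up to real generative systems: the "artistic rules" live in the code, while the seed (or the user's prompt) introduces the variability that makes each output unique.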
The first experiments in Generative Art date back to the 1960s. Among the pioneers was Harold Cohen, who in the early 1970s began developing AARON, a stand-alone program capable of generating original drawings and abstract compositions. Cohen's works are now held in the Tate collection in London.
Another attribute of Generative Art, although one that is becoming less and less of a prerogative, is the repetition of patterns or abstract elements provided by the programmer and implemented within the software code.
The development of increasingly complex neural networks that operate on text-image association has led to generative models capable of creating increasingly realistic and accurate images. The best known example of this type of Generative Art is Dall-E.
Dall-E and CLIP: a revolution in image recognition
Dall-E is a multimodal neural network based on OpenAI's GPT-3 deep learning model. This system is capable of generating images from a textual description based on a dataset of text-image pairs.
The first version of Dall-E, launched in January 2021 and made available only to a small number of professionals in the field, represented a real breakthrough for this type of generative model, going beyond the innovations of GPT-3 itself.
Dall-E can indeed generate plausible images from a wide variety of sentences and textual prompts, even those with a composite linguistic structure. OpenAI's model has proved capable of understanding and rendering:
- The perspective structure of the image
- The inner and outer structure of an object
- Comparisons and sequences between different images
- The spatial-temporal location of objects.
The accuracy of the results processed by Dall-E proved to be the perfect area of application for another OpenAI solution: CLIP (Contrastive Language-Image Pre-training), an image classification and ranking neural network trained on the basis of text-image associations, such as captions found on the Internet.
Thanks to CLIP, which ranks the candidate images and returns only the best 32 per prompt, Dall-E was found to produce satisfactory images in most cases. However, the results are low-resolution and still show obvious limitations in handling certain logical associations between elements, such as their spatial location.
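The selection step can be sketched as follows - a toy re-ranking in plain Python where the embedding vectors and function names are stand-ins; the real CLIP produces the embeddings with learned text and image encoders:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_candidates(text_emb, image_embs, top_k=32):
    """CLIP-style re-ranking: score every generated image against the
    prompt embedding and keep the indices of the top_k best matches."""
    order = sorted(range(len(image_embs)),
                   key=lambda i: cosine(text_emb, image_embs[i]),
                   reverse=True)
    return order[:top_k]

# Toy example: candidate 1 matches the prompt embedding exactly.
print(rank_candidates([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]], top_k=2))
# -> [1, 2]
```

The design idea is that generation and selection are separate concerns: the generative model proposes many candidates, and a discriminative model filters them down to the ones that best match the prompt.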
Dall-E Mini conquers the Internet
In the art world, imitation is the sincerest form of flattery. OpenAI never published the code of DALL-E, but it only took a few months before a less refined version of the neural network appeared, based on the same principles of association and combination of images from a database of about 30 million elements.
Enter Dall-E Mini, by developer Boris Dayma, released on the open-source hosting platform Hugging Face. Available to everyone as a simple web app since the spring of 2022, Dall-E Mini has quickly become, according to Wired, "the Internet's favorite meme machine."
The ability to generate 9 low-resolution images from any prompt, even the most bizarre ones, sparked the imagination of users, who had fun creating funny and surreal combinations and sharing them on platforms such as Twitter and Reddit.
In just a few weeks, Dall-E Mini found itself processing about 50,000 images per day and attracted the attention of users normally uninterested in Artificial Intelligence, while providing experts with several insights into applying these technologies at a larger scale.
Generative Art: limits and self-impositions
The degree of popularity achieved by Dall-E Mini has immediately raised questions about the possible risks that may creep into Generative Art and its outputs, especially those depicting real people and things.
Images processed by Dall-E Mini have an unmistakable appearance: the outlines of subjects are often poorly defined or distorted, and human faces are almost always deformed to the point that they are no longer recognizable. In most cases, therefore, the artificial nature of the generated images is well understood by the user, so as to minimize the likelihood of generating deepfakes with malicious intent.
Nonetheless, the open-source nature of Dall-E Mini and the vast number of prompts entered by users soon shed light on the need to regulate the results generated by the neural network. Dall-E Mini blocks the most explicit or violent keywords - a system that, although still imperfect, allows developers to control the results returned to the end user.
On the other hand, as is the case with any Artificial Intelligence, within Dall-E and its Mini version lurk social biases common to the humans who developed these technologies.
OpenAI's neural network, for example, reflects the most superficial stereotypes about the food or population of a place with geographic prompts; Dall-E Mini, on the other hand, only returns images of men at the "doctor" prompt and women at the "nurse" prompt.
Returning to privacy: the possibility that Generative Art could jeopardize the safety of the people portrayed becomes all the more worrying given the advancement of neural networks, which can now return higher-quality results with more precise details than Dall-E.
Dall-E 2, the second generation of OpenAI's neural network unveiled in April 2022, seeks to reduce these kinds of risks by strengthening the system's filtering rules for training data and accepted keywords. The few professionals who have so far gained access to Dall-E 2 have to meet even stricter standards, at least while the capabilities and limitations of the new technology are still being tested.
Dall-E 2: towards a subscription-based model
As anticipated in the previous section, in a little over a year progress in Generative Art has been substantial, with Dall-E 2 able to generate even more realistic and accurate images at four times the resolution of the first generation.
The improvements in Dall-E 2 mainly focus on the combination of concepts, attributes, and art styles. The neural network can now make various changes to pre-existing images from a natural language description, adding or moving elements within a scene and creating variations from an original subject or artwork.
After an initial period of limited access, OpenAI is ready to release Dall-E 2 in beta to the first million users on the waiting list. Unlike with the first version, however, the research lab co-founded by Elon Musk (among others) and funded by Microsoft is set to adopt a subscription-based model structured on a credit basis.
Specifically, each user of the Dall-E 2 beta will receive a predefined number of credits (50 at sign-up and 15 each following month), each of which will equate to an image generated by the neural network. Once they run out of credits, users will be able to purchase a 115-credit bundle for $15.
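A quick sanity check on the figures above: the effective price per generation under the paid bundle works out to roughly 13 cents (a back-of-the-envelope sketch; OpenAI's actual billing terms may of course change):

```python
# Figures from the Dall-E 2 beta pricing described above.
FREE_CREDITS_AT_SIGNUP = 50
FREE_CREDITS_PER_MONTH = 15
BUNDLE_CREDITS = 115
BUNDLE_PRICE_USD = 15.0

price_per_credit = BUNDLE_PRICE_USD / BUNDLE_CREDITS
print(f"${price_per_credit:.3f} per paid generation")  # -> $0.130 per paid generation
```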
Generative Art: current and future applications
From the bizarre creations of Dall-E Mini, ironically shared on the Web, to actual works of art sold at auction for astronomical amounts of money, Generative Art has been reaching an increasingly large audience in recent years.
For the first time, clients will be able to use the generated images for commercial purposes as well as personal. Users on the waiting list, OpenAI explains, already plan to implement the images generated by Dall-E 2 in several types of projects, including some more traditional ones:
- Children's book illustrations
- Concept art and storyboards for video games and movies
- Moodboards for design consultancies.
One of the most fruitful commercial outlets for this type of "digital native" art, however, is undoubtedly the NFT market.
The images generated by neural networks - whether combined and reworked by multimedia artists or presented exactly as the algorithm produced them - can be uploaded to a blockchain and sold on marketplaces such as OpenSea, or on platforms for the independent management of one's own non-fungible tokens, such as our NFT Commerce.
On the other hand, the results obtained from neural networks such as Dall-E assume great importance not only for their aesthetic value, but also for their use in a variety of practical applications. It is precisely on image search and recognition that Google has focused its efforts, announcing the development of two AIs that function similarly to Dall-E, Imagen and Parti, neither of which has yet been shared with the public.
Generative Art (?)
The incursion of Artificial Intelligence has opened a chapter in art history that is still largely unwritten.
In past decades, Pop Art brought the seriality of industrial processes into the visual arts, while postmodernism untied the knots of mass society in an ironic game of combination. Even earlier, Dadaism countered creative intention with the playful randomness of free association.
From a cultural perspective, Generative Art adds another fundamental variable to this lineage: the autonomy of the tool from its author. This raises questions on several essential points.
Authorship of the artwork
Authorship is an open question in the contemporary art world. This is demonstrated by the recent lawsuit filed against Maurizio Cattelan by Daniel Druet, a sculptor who created some of the artist's most famous installations without ever appearing in the credits or catalogs.
If a work of visual art is generated by an AI, does the authorship belong to the AI, the professionals who developed it, or the digital artist who provided the prompt? Indeed, can a dataset of text-image associations be an adequate counterpart to the faculty of imagination?
The production of Generative Art itself also involves business models that are still being defined. The subscription-based model is currently the most widely used in content creation and distribution, but it is also the one that most limits the independence of the medium and the freedom of creators.
With a pen and a sheet of paper, an artist can freely create whatever they want: that is not the case when, to give voice to their creativity, the artist must pay monthly or per use to a Generative Art platform, which can moreover be restricted and censored by those who manage it.
Subscription models are complex to manage properly precisely because they involve a continuous exchange of value and freedom between the user and the company. We at Neosperience, having carried out projects on the subject with some of the most important companies nationally and internationally, offer our expertise on the subject through both business design work and the development of dedicated digital products.
Unbiased Artificial Intelligence
As we have seen, in order to enhance the potential of Generative Art, we need to make the best use of the specificity of this medium in all its fields of application. More than that, it is essential to design artificial intelligences in an empathetic way. Is it possible to untie our biases as human beings from the code that gives life to the Artificial Intelligences we are developing?
Achieving this goal requires a thorough understanding of the hybrid nature of Generative Art, which calls into question both culture and technology. It will therefore be necessary to bring data scientists and humanists together at the design stage, in order to provide AIs with datasets capable of producing results that are unbiased, yet accurate and representative at the same time.