Although artificial intelligence (AI) has already proven itself as the catalyst behind tech movements such as self-driving cars and large-scale data analysis, the technology has arguably become most famous through more everyday, democratised applications during its relatively short time in the mainstream.
Most people’s experience with this nascent area of tech has come through platforms such as ChatGPT, where text prompts return an array of AI-generated, text-based services – be it advice (on a practically limitless number of topics), content creation, smart contract drafting, and so on.
However, as previously explored here at dGen Network, text-to-image AI generators have also emerged as a substantial force in the ever-evolving Web3 space. As the name suggests, these systems take a user’s text prompt and produce novel images or art within a matter of seconds.
Text-to-Image AI Generative Art
Generative AI art comes to life thanks to machine-learning models (e.g. DALL-E, Midjourney, etc.) that are trained on billions of existing images from across the internet. After being fed such images, the models become accustomed to a vast array of artistic and image-based styles, patterns, and concepts, which they can then use to generate new art when given a text-based prompt.
The fundamental utility behind such tech is to allow users to instantly get ahold of particular images they’re after, regardless of how niche they may be. In turn, such services can be used by individuals for personal needs, as well as companies and corporate entities for branding, marketing, and social media purposes.
As previously explored, AI-generated artwork can also be useful for those in the digital art and NFT scene. Through custom-made algorithms, large collections of unique digital art pieces sharing similar characteristics can be generated efficiently. Beyond mass NFT collections, text-to-image AI-generated images can also be used for most other branding, marketing, and product purposes within Web3; however, such use involves several copyright and disclaimer requirements that must be fulfilled.
Although quite worrying for human labourers, another fundamental utility here is that both time and money (namely the wages of creative workers) can be saved through the deployment of text-to-image AI software.
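To make the idea of a "mass collection of unique pieces sharing similar characteristics" concrete, here is a minimal sketch. It is not an AI model and not any particular project's method: it simply samples unique combinations of traits and turns each one into a text prompt that could be passed to a text-to-image generator. The trait names, the `generate_collection` and `to_prompt` helpers, and the prompt wording are all hypothetical, chosen only for illustration.

```python
import itertools
import random

# Hypothetical trait layers, invented for this example.
TRAITS = {
    "background": ["space", "desert", "ocean"],
    "body": ["donkey", "robot", "cat"],
    "accessory": ["basketball", "sunglasses", "crown", "none"],
}

def generate_collection(traits, size, seed=0):
    """Return `size` unique trait combinations, sampled reproducibly."""
    names = list(traits)
    all_combos = list(itertools.product(*(traits[n] for n in names)))
    if size > len(all_combos):
        raise ValueError("requested more pieces than unique combinations exist")
    rng = random.Random(seed)
    chosen = rng.sample(all_combos, size)  # sampling without replacement
    return [dict(zip(names, combo)) for combo in chosen]

def to_prompt(piece):
    """Turn one trait combination into a text prompt (wording is illustrative)."""
    prompt = f"a 3D animated {piece['body']} in a {piece['background']} setting"
    if piece["accessory"] != "none":
        prompt += f" with a {piece['accessory']}"
    return prompt

collection = generate_collection(TRAITS, size=10)
for token_id, piece in enumerate(collection):
    print(token_id, to_prompt(piece))
```

Because the pieces are drawn without replacement from the full set of combinations, no two share the same traits, while the shared trait vocabulary keeps the collection visually related.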
Limitations
Of course, with AI-generated art unavoidably being made possible through leveraging the work of thousands (if not millions) of human creatives, issues of attribution can come into play here. This is because each AI-generated image doesn’t actually entail any new or novel elements of creativity; instead, it is a new recombination of pre-existing imagery and artworks.
Intuitively, what this leads to is artists having their work – or at least elements of it – used in content without their approval, or even their knowledge. An example of this issue arose back in January 2023, when Getty Images sued Stability AI – i.e. the company behind text-to-image platform Stable Diffusion – on the grounds of the company committing copyright infringement through copying and processing millions of its images without proper licensing. In response, Stability AI filed a motion to dismiss the case earlier this month.
In turn, what this also results in is a debate over the originality of AI-generated art, as some critics refuse to believe that it entails anything ‘new’. Whilst the domain undoubtedly draws on pre-made designs and artwork, some may argue against this point, as the tech is able to create incredibly niche images that quite probably have never been created before. For example, I asked DALL-E to create this image of a 3D animated donkey slam dunking a basketball in space, which it did within seconds – and to my knowledge, this specific image quite probably wasn’t already in existence.
There is also the issue of inaccuracies, as there is essentially no guarantee that each AI-generated ‘prediction’ correctly abides by the text prompt it’s been fed. This can come through information shortcomings (e.g. a lack of data surrounding a particular concept), a lack of precision in the prompt, algorithmic faults, and biases featured within the training content.
However, whilst such problems most certainly cause art-related inconveniences for users, it’s logical to assume that they don’t pose the same level of threat as the inaccuracies involved in text-based AI platforms and chatbots.
DALL-E
The leader of the text-to-image AI-generated art space in 2023 has arguably been DALL-E, which, like ChatGPT, is the brainchild of AI powerhouse OpenAI. DALL-E is a neural network-based model, and its original version combined two technologies: a Transformer architecture and a discrete variational autoencoder (dVAE).
In accordance with the aforementioned method of using existing images to create novel ones, DALL-E is trained on a massive data set of images paired with their corresponding textual descriptions. Through such a thorough training process, the platform is able to quickly generate high-quality and realistic images (to a standard that currently tops the AI-generated art scene).
After its first edition – DALL-E 1 that is – was announced back in January 2021, today’s DALL-E 2 came out in April 2022. Here, the original version used a discrete VAE (dVAE) in its training and image generation process, whilst DALL-E 2 uses a diffusion model that generates higher quality images.
As for its name, DALL-E is a combination of Pixar’s robot character WALL-E and the esteemed artist Salvador Dalí. Of course, connotations of futurism and the surreal, highly imaginative artistic style of Dalí provide the inspiration behind such a title.
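For readers curious what "diffusion" means in practice, below is a toy sketch of the noising step used in textbook diffusion models. It is not OpenAI's implementation and says nothing about how DALL-E 2 is actually built beyond the general idea: an image is gradually mixed with random noise, and a model is trained to predict that noise so it can be removed again. The linear noise schedule, its parameter values, and the three-number "image" are assumptions made for illustration. Here the true noise is handed back to the denoising step, standing in for what a trained network would have to estimate.

```python
import math
import random

def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear beta schedule."""
    product = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        product *= 1.0 - beta
    return product

def add_noise(x0, t, num_steps, rng):
    """Forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, with eps ~ N(0, 1)."""
    a = alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def remove_noise(xt, eps, t, num_steps):
    """Invert the forward step given the noise; a trained model would predict eps instead."""
    a = alpha_bar(t, num_steps)
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps)]

rng = random.Random(0)
pixels = [0.1, 0.5, 0.9]  # a tiny stand-in for an image
noisy, eps = add_noise(pixels, t=500, num_steps=1000, rng=rng)
restored = remove_noise(noisy, eps, t=500, num_steps=1000)
print("noisy:   ", noisy)
print("restored:", restored)
```

In a real system the noise is unknown at generation time, so the quality of the final image depends on how well the trained network estimates it, typically over many small denoising steps guided by the text prompt.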
Adobe Photoshop’s ‘Firefly’
With AI moving further into the mainstream by the day, it was only a matter of time until some of the world’s leading Web2 image editing software tools got in on the act. Most recently, Adobe Photoshop – an image editing tool used by over 90% of the world’s creative professionals (per Photo Stock Secrets) – joined the party by adding an AI-powered image generator to its industry-leading software offering.
Per Adobe, the AI addition – which is called Firefly – comes with the goal of allowing users to ‘dramatically’ accelerate how they edit their photos. To do so, the software allows users to add or delete elements of an image through the simple use of a text prompt, as well as automatically match the lighting and style of images with existing ones.
Similar to other AI-generated image platforms, Firefly was trained using a wealth of pre-existing images. These, however, came from the company’s vast collection of stock images, as well as openly licensed and public-domain content. Through this, Adobe hopes to avoid the aforementioned legal issues that platforms face when using the work of unspecified artists for training purposes.
Although not planned to be launched in full until the end of the year, Firefly was launched in web-only beta mode at the Adobe Summit back in March. Since then, Adobe has stated that over 70 million images were created in its first month of activity, making it one of the company’s most successful beta launches ever.
Looking Forwards
As is the case with most areas of tech, the future of text-to-image AI is hard to predict for those not directly developing or working within the space. In addition, its future may also depend on other coinciding technological developments, as well as any ethical guidelines that are created to ensure that the tech is used responsibly and to maximum benefit. However, as the space advances every day, several possibilities for its mass usage lie ahead…
The first relates to its realism capabilities, and how, with enough innovation, text-to-image AI may be able to create images that are indistinguishable from real photographs. After sufficiently rigorous training, models may also be able to proficiently replicate the work of esteemed artists, meaning that the distinct artistic styles of deceased artists could live on through the use of AI. Of course, such scenarios will also come with the consequence of impacting the workflow and/or roles of today’s professional photographers and artists.
With high level training in play, there’s also the possibility that text-to-image AI can be applied to specific domains, meaning industries such as fashion, product design, and architecture may be able to leverage the tech in order to streamline design processes.
Another possible use of text-to-image AI-generated images comes through multimodal understanding and content creation. Here, AI-generated images can be used in conjunction with other forms of AI-generated content – such as audio and video productions – to create seamless entertainment experiences made through the holistic use of AI. Intuitively, the advent of such AI-centric processes could revolutionise the content creation industry.
With the tech’s generative qualities, there is also ample room for human-AI co-creation. Here, further human creativity can be sparked and/or refined through the AI’s ability to generate novel visual suggestions, whilst any gaps in the creative process can also be filled.
In turn, text-to-image AI generators can become companions for human creatives, resulting in access to a larger creative scope, the eradication of creative block, shorter production processes, and ultimately, enhanced content outcomes.