Creation of Intellectual Works through Generative AI

Exploring the intellectual capacity of generative AI, specifically in music production, to understand its creative potential.
An Experimental Analysis

Abstract

This study aims to explore the capability of artificial intelligences (AIs) to conceive intellectual works, focusing specifically on the generation of song lyrics and subsequent music production. Two distinct tools were employed: ChatGPT for text generation and Udio for music composition. The analysis focuses firstly on highlighting the impact of even minor variations in prompts on the output and how these influence the generated works. Secondly, it provides a comparative evaluation of the effects of quantitative and qualitative variations in keywords on the output, offering a detailed overview of the resulting effects.

Introduction

In recent years, artificial intelligence has shown increasing potential in the creation of artistic content, initially limited to static images and later extending to moving images, literary texts, and original musical compositions. This phenomenon represents a significant evolution in the ability of machines to imitate and sometimes surpass human creative capabilities. This research is situated within the vast and fascinating domain of human-machine interaction, with a particular focus on the potential of advanced AIs to function as autonomous authors of musical works.

The objective of this study is twofold: on one hand, it aims to examine the competence of AIs in generating coherent and stylistically valid lyrics; on the other hand, it seeks to explore the capability of these texts to be transformed into musical compositions through the use of machine learning algorithms. This process allows for an investigation into how subtle variations in the prompts used for text generation influence the final creative product.

Artificial intelligence, particularly language models like ChatGPT, represents one of the most advanced frontiers of technological research. These models, based on deep learning neural architectures, can assimilate a wide range of linguistic and stylistic information, replicating and sometimes innovating human expressive forms. Similarly, automated music composition tools like Udio utilize sophisticated algorithms to analyze and transform texts into music, generating pieces that can compete with those created by human composers.

The methodological approach adopted involves a rigorous and systematic analysis of the generated musical works, evaluating various parameters such as tonality, rhythm, melodic structure, and harmonization, and addressing, where necessary, the technical and sonic aspects of the audio flows. The objective is to outline a clear and detailed picture of the interaction between textual input and musical output, providing significant insights for future research in the field of AI-assisted artistic creation and a solid foundation for exploring the inseparable technical and legal aspects related to it.

Therefore, this study not only explores the current possibilities of AI technology in the field of music but also opens new perspectives for the integration of these technologies into artistic production. The implications of such integration are profound, suggesting a future symbiotic collaboration between human creativity and advanced computational capabilities, capable of generating works of art of unparalleled complexity and beauty.

Methodology

Phase 1: Text Generation

For the text generation, we used ChatGPT, a language model developed by OpenAI, in its "3.5 turbo" version. We chose a free, publicly accessible model to make it easier for others to continue or replicate the experiment and to ensure maximum transparency about the tools used. The prompt provided to the model was designed to reflect a real-world scenario, to reduce the likelihood that the results of this study could be dismissed as "overly theoretical" and therefore without practical impact.

Context

Heavy metal was chosen for this experiment on the basis of technical and methodological considerations that make the genre particularly suitable for analyzing the creative capabilities of artificial intelligences. Although a niche genre compared to pop or rock, heavy metal is known for its technical complexity and its numerous sound and tempo variations, characteristics that distinguish it from more conventional musical genres.

Firstly, heavy metal requires a high degree of technical proficiency in both composition and performance. The intricate rhythmic structures, complex chord sequences, and rapid dynamic variations present a significant challenge for any creator, making it an ideal scenario for testing musical composition algorithms. This level of technical difficulty serves as an excellent benchmark to evaluate the effectiveness, sophistication, and most importantly, the stylistic coherence of the artificial intelligence tools employed in our study. An algorithm capable of generating texts and music that are coherent and stylistically appropriate within the heavy metal context would demonstrate a superior level of technological advancement.

Furthermore, heavy metal is characterized by a vast range of subgenres, each with distinctive sound and thematic peculiarities. From speed metal to doom metal, black metal to power metal, each variant introduces unique elements ranging from extreme speeds and pounding rhythms to epic melodies and dark atmospheres. This diversity provides a unique opportunity to test the versatility of artificial intelligence in generating content that not only adheres to the standards of the main genre but also adapts and innovates within the various subgenres.

The choice of heavy metal also allows for the exploration of AI's ability to handle tempo and dynamic changes, which are fundamental elements of this musical genre. The sudden and often drastic variations in tempo and intensity require a deep understanding of musical structures and stylistic expectations, adding a further layer of complexity to the experiment. An algorithm's ability to generate and adapt to such variations indicates its sophistication and potential applicability in complex creative contexts, and consistent results across these variations make it less likely that the final result is due to mere coincidence.

Initial Prompt

The initial prompt provided was: "write the lyrics for a heavy-metal song about angels protecting mortals from evil."

This thematic choice is deeply rooted in the tradition of the heavy metal genre, which often explores epic themes, cosmic battles between good and evil, and mythological figures.

The theme of angels protecting mortals from evil has been recurrent in the history of heavy metal. Legendary bands such as Iron Maiden, Helloween, and Black Sabbath have often tackled similar subjects, intertwining narratives of apocalyptic battles and divine interventions. For example, Iron Maiden's "Flight of Icarus" (1983) revisits the theme of Icarus's fall, blending myth and personal drama in an epic framework. Although it doesn't specifically mention angels, the idea of a struggle between higher powers and human destiny is a related theme.

Another relevant example is "Heaven Can Wait" by Helloween, which explores divine intervention and the protection offered by celestial beings. This song, from the album Keeper of the Seven Keys: Part II (1988), perfectly represents the intersection between heavy metal and religious mythology, suggesting that angels protect souls from the inferno.

Structure and Content of the Generated Text

(Verse 1)
In the shadows of the night, where darkness breeds its might,
Angels descend with swords of light, to vanquish evil's blight.
Their wings unfurl, as thunder roars, in the chaos, they restore,
Guardians of the mortal realm, their valor shall endure.

(Chorus)
Angels of steel, defenders of the weak,
In the heart of the storm, they're the ones we seek.
With wings of fire, they'll rise above,
Protectors of the innocent, their strength, their love.

ChatGPT 3.5 Turbo

The text generated from the prompt, which the AI aptly named "Angels of Steel," aims to capture the essence of the genre by integrating typical elements such as:

  • Epic and Solemn Tones: Grandiose descriptions of battles between angels and demons.
  • Powerful Imagery: Vivid depictions of angels armed with flaming swords and terrifying demons.
  • Themes of Protection and Sacrifice: Angels intervening to save humanity, even at the cost of their own existence.

The use of potent metaphors and evocative visions is fundamental to conjuring the typical atmosphere of heavy metal. The image of angels with "wings of light and flaming swords" immediately calls to mind apocalyptic and mythological illustrations, while the mention of "celestial creed" directly connects to the narrative of divine sacrifice and protection.

The choice of the prompt is not just a tribute to the classic themes of the genre but also represents a challenge for artificial intelligence. The ability to generate a text that adheres to the stylistic and thematic conventions of heavy metal demonstrates the power and versatility of AI tools. This theme highlights the AI's ability to create complex and nuanced content that can compete with the works of renowned composers in the genre.

Lastly, two details deserve special attention. Firstly, the AI included the labels "Verse 1" and "Chorus" before the actual lyrics: it not only generated the entire text but also correctly divided it into musical parts (verses, bridge, chorus, and outro, with different variations and repetitions) without further human intervention, showing an impeccable understanding of the operator's intent. Secondly, the AI facilitated smooth vocalization by rhyming each pair of lines, which is remarkable given that rhyme has historically been a paralyzing obstacle for many composers and producers.

For simplicity, we will work only with the first two parts of the song in this study.
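The two observations above (section labels and paired rhymes) can be checked mechanically. The following is a minimal sketch, not part of the original experiment: it assumes that the labels appear in parentheses on their own line and that each lyric line starts at a capitalized word, as laid out here. The function names are illustrative.

```python
import re

# Excerpt of the generated lyrics; the line breaks are an assumption.
LYRICS = """(Verse 1)
In the shadows of the night, where darkness breeds its might,
Angels descend with swords of light, to vanquish evil's blight.
Their wings unfurl, as thunder roars, in the chaos, they restore,
Guardians of the mortal realm, their valor shall endure.
(Chorus)
Angels of steel, defenders of the weak,
In the heart of the storm, they're the ones we seek.
With wings of fire, they'll rise above,
Protectors of the innocent, their strength, their love."""

LABEL = re.compile(r"^\((.+)\)$")  # e.g. "(Verse 1)" or "(Chorus)"


def split_sections(text):
    """Group lyric lines under the most recent parenthesized label."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LABEL.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def end_words(lines):
    """Return the last word of each line, lowercased, without punctuation."""
    return [re.findall(r"[A-Za-z']+", line)[-1].lower() for line in lines]


def couplets(words):
    """Pair up consecutive end words (lines 1-2, 3-4, ...)."""
    return list(zip(words[0::2], words[1::2]))


sections = split_sections(LYRICS)
for name, lines in sections.items():
    print(name, couplets(end_words(lines)))
```

Listing the end-word pairs side by side makes the rhyme scheme easy to inspect by eye; deciding automatically whether two words rhyme would need a pronunciation dictionary, which this sketch deliberately leaves out.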

Phase 2: Musical Composition

Full musical composition by generative AI has only very recently become available to the general public, which made it challenging to select a sufficiently advanced tool for this context. After a series of trials and considerations, we opted for Udio, in its v1 alpha version, an artificial intelligence tool capable of transforming texts into complete musical tracks, including instrumental and vocal parts. The process involves several key steps:

  • Textual Input: Users provide a lyric text that represents the thematic and emotional foundation of the musical piece.
  • Semantic Analysis: Udio uses advanced algorithms to analyze the textual content, identifying themes, emotions, and linguistic structures.
  • Musical Composition: The AI generates the appropriate melody, harmony, and rhythm for the analyzed text. This process considers the conventions of the chosen musical genre and stylistic characteristics.
  • Arrangement and Production: Finally, Udio completes the track with instrumental arrangements and audio production, creating a ready-to-use musical piece.

Udio combines deep learning techniques and natural language models to offer an integrated solution that facilitates musical creation, making the process accessible even to those without advanced musical composition skills. While it is an ambitious software, it currently has limitations that restrict the scope of this research. For example, in this initial release (v1 alpha), it is not possible to manually select the instruments to be included, set the rhythm, duration, tempo, and more, making it essential to rely entirely on the AI's ability to correctly interpret the prompt.

Generation of Tracks

The text generated by ChatGPT was then input into Udio to create an initial track. The prompt provided was: “a heavy metal song, with some nuance of pop, choirs, and cinematic elements. The drum sound will be inspired by Metallica, the guitar sound will be inspired by In Flames, and vocals will remind of Freddie Mercury: heavy metal, drums (drum set), male vocalist, passionate.”

From a quick analysis of the prompt, it is evident that it was designed with the dual purpose of guiding the AI in understanding the desired output while compensating for the limitations of the UI with stylistic and sonic references as precise as possible.

First Output

[Audio: 01 Guardians Ascendant, 32.856 s]

Stylistic Analysis

This composition is characterized by a driving rhythm and an intense, dark tone in E minor. With a tempo of 152 beats per minute, the piece moves quickly, maintaining a constant energy that captivates the listener from the first notes. The choice of E minor gives the track a dark and dramatic quality, typical of the heavy metal genre. The E minor scale, which includes the notes E, F#, G, A, B, C, and D, offers a range of powerful and melancholic sounds that are further emphasized by the use of power chords and aggressive riffs.
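The note list given for E minor can be derived from the whole-step and half-step pattern of the natural minor scale. This is a small illustrative sketch; the function name and the sharps-only note spelling are assumptions made for brevity.

```python
# Twelve pitch classes, spelled with sharps only (a simplifying assumption).
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone steps between successive degrees of the natural minor scale.
NATURAL_MINOR_STEPS = [2, 1, 2, 2, 1, 2]


def natural_minor(root):
    """Return the seven notes of the natural minor scale starting on root."""
    index = CHROMATIC.index(root)
    notes = [root]
    for step in NATURAL_MINOR_STEPS:
        index = (index + step) % 12
        notes.append(CHROMATIC[index])
    return notes


print(natural_minor("E"))  # ['E', 'F#', 'G', 'A', 'B', 'C', 'D']
```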

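A distinctive element of the track is the prominence of the A9 chord, reported here as the relative key (strictly speaking, the relative key of E minor is G major, so A9 is better read as a recurring secondary harmony). This chord adds an interesting harmonic dimension to the piece. The A9 chord, which includes the notes A, C#, E, G, and B, introduces a more complex and intriguing sound compared to simpler chords; its C# lies outside the natural E minor scale and lends the harmony a brighter, Dorian color. A9 not only keeps the piece tonal but also enriches the overall atmosphere with a touch of sophistication and harmonic variety.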
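The spelling of the A9 chord, and how it relates to the E minor scale, can be verified with a few lines of code. This is a sketch under the assumption that "A9" means a dominant ninth chord (root, major third, perfect fifth, minor seventh, major ninth); the helper name is illustrative.

```python
# Twelve pitch classes, spelled with sharps only (a simplifying assumption).
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets from the root for a dominant ninth chord.
DOMINANT_NINTH = [0, 4, 7, 10, 14]

# Notes of the natural E minor scale, as listed in the stylistic analysis.
E_NATURAL_MINOR = {"E", "F#", "G", "A", "B", "C", "D"}


def chord(root, intervals):
    """Spell a chord from its root and a list of semitone offsets."""
    start = CHROMATIC.index(root)
    return [CHROMATIC[(start + offset) % 12] for offset in intervals]


a9 = chord("A", DOMINANT_NINTH)
outside = [note for note in a9 if note not in E_NATURAL_MINOR]

print(a9)       # ['A', 'C#', 'E', 'G', 'B']
print(outside)  # ['C#']
```

Four of the five chord tones belong to E minor; only C# falls outside the scale.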

However, the track is characterized throughout by an extremely simple structure, typical of the early rock and later metal genres of the '70s and '80s. In the final part, a rudimentary guitar solo can be heard, seemingly based on a minor pentatonic scale but also incorporating some bending, giving the impression of a 'human touch.' It is worth noting that in the heavy metal genre, the electric guitar solo is considered crucial for the overall evaluation of a track, and in this case, it represents the weak link, indicating that the musical composition in question may not be the product of human ingenuity and talent.

The most surprising aspect, however, is the quality of the vocals, particularly in the choir parts, which are remarkably precise and noteworthy.

Overall, despite its simplicity, the composition is musically valid and effectively reminiscent of early Iron Maiden and Black Sabbath.

Technical Analysis (First Output)

Channels : 2
Sample Rate : 48000
Precision : 16-bit
Duration : 00:00:32.88 = 1578240 samples ~ 2466 CDDA sectors
File Size : 1.32M
Bit Rate : 320k
Sample Encoding: MPEG audio (layer I, II or III)
File Detail:
Samples read: 3154176
Length (seconds): 32.856000
Scaled by: 2147483647.0
Maximum amplitude: 1.000000
Minimum amplitude: -0.957490
Midline amplitude: 0.021255
Mean norm: 0.143575
Mean amplitude: -0.000092
RMS amplitude: 0.183163
Maximum delta: 0.671388
Minimum delta: 0.000000
Mean delta: 0.069280
RMS delta: 0.089255
Rough frequency: 3722
Volume adjustment: 1.000
Channel Detail:
              Overall     Left        Right
DC offset     -0.000117   -0.000117   -0.000068
Min level     -0.957490   -0.957490   -0.928462
Max level      1.000000    1.000000    1.000000
Pk lev dB     -0.00       -0.00       -0.00
RMS lev dB    -14.74      -14.52      -14.98
RMS Pk dB     -10.14      -10.14      -11.08
RMS Tr dB     -22.29      -21.92      -22.29
Crest factor   -           5.32        5.61
Flat factor    0.00        0.00        0.00
Pk count       2.50        3           2
Bit-depth      29/29       29/29       29/29
Num samples    1.58M
Length s       32.856
Scale max      1.000000
Window s       0.050
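Several of the figures above are related by simple arithmetic, which makes the report easy to sanity-check. The sketch below assumes the usual definitions (duration = samples / channels / sample rate, level in dB = 20·log10 of the amplitude, crest factor = peak / RMS); the report layout appears to match that of the SoX `stat` and `stats` effects, but that attribution is an assumption. All numeric constants are copied from the first output.

```python
import math

# Values copied from the technical analysis of the first output.
SAMPLE_RATE = 48000
CHANNELS = 2
SAMPLES_READ = 3154176          # counted across both channels
RMS_AMPLITUDE = 0.183163        # overall RMS amplitude
PEAK = 1.0                      # maximum level of both channels
RMS_DB = {"left": -14.52, "right": -14.98}


def duration_seconds(samples_read, channels, rate):
    """Length of the decoded audio in seconds."""
    return samples_read / channels / rate


def to_db(amplitude):
    """Convert a linear amplitude (full scale = 1.0) to decibels."""
    return 20 * math.log10(amplitude)


def from_db(level_db):
    """Convert a level in decibels back to a linear amplitude."""
    return 10 ** (level_db / 20)


def crest_factor(peak, rms_db):
    """Ratio between the peak amplitude and the RMS amplitude."""
    return peak / from_db(rms_db)


print(duration_seconds(SAMPLES_READ, CHANNELS, SAMPLE_RATE))  # matches "Length (seconds)"
print(round(to_db(RMS_AMPLITUDE), 2))                          # matches overall "RMS lev dB"
for side, level in RMS_DB.items():
    print(side, round(crest_factor(PEAK, level), 2))           # matches "Crest factor"
```

The 32.88 s in the "Duration" line corresponds to 1578240 / 48000, while 32.856 s is the length of the samples actually read.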

Variations

We analyzed each generated musical track, focusing on aspects such as key, rhythm, melodic structure, and harmonization. The goal was to determine how differences in the initial prompts or other factors influenced the final musical outcome.

Prompt:

"a heavy metal song, with some nuance of pop, choirs, and cinematic. The drum sound will be inspired by Metallica, the guitar sound will be inspired by In Flames, and vocals will remind of Freddie Mercury: heavy metal, drums (drum set), male vocalist, passionate."

NOTE: Although this second test used essentially the same prompt, the generated output is different. This is consistent with how generative models usually work: they sample stochastically, so the same prompt generally yields different outputs unless the random seed and sampling settings are fixed. It is nonetheless an important observation in our study, because it means that differences between outputs cannot be attributed to prompt changes alone.
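One plausible explanation for different outputs from the same prompt is stochastic sampling. The toy sketch below is not Udio's algorithm, whose internals are not public; it only illustrates the general principle with a hypothetical vocabulary and weights, using a seeded random generator to stand in for the model's sampling step.

```python
import random

# Hypothetical "musical decisions" and their weights; purely illustrative.
VOCAB = ["E minor", "A minor", "fast", "slow", "choir", "solo"]
WEIGHTS = [4, 1, 3, 2, 3, 1]


def generate(prompt, seed, length=8):
    """Sample a sequence of decisions; the result depends on prompt AND seed."""
    rng = random.Random(f"{prompt}|{seed}")
    return rng.choices(VOCAB, weights=WEIGHTS, k=length)


PROMPT = "a heavy metal song, with some nuance of pop, choirs, and cinematic"

# Same prompt, same seed: the output is reproduced exactly.
print(generate(PROMPT, seed=1) == generate(PROMPT, seed=1))

# Same prompt, different seeds: the outputs generally differ.
for seed in range(3):
    print(seed, generate(PROMPT, seed))
```

Under this model, a tool that does not expose its seed to the user will appear to give a new result on every run, even for identical prompts.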

Second Output

[Audio: 02 Guardians of the Flame, 32.856 s]

Stylistic Analysis

In this case, the AI 'chose' to generate a slower track, characterized by a darker tone compared to the previous one. The moderate tempo of 113 beats per minute gives the piece a sustained and powerful groove, maintaining a sense of relentless progress and emotional weight.
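To put the two tempos in perspective, the duration of a beat and the number of beats that fit in the 32.856-second clip can be computed directly. This is a simple worked calculation; the tempo values are those reported for the first (152 BPM) and second (113 BPM) outputs.

```python
CLIP_SECONDS = 32.856
TEMPOS_BPM = {"first output": 152, "second output": 113}


def beat_seconds(bpm):
    """Duration of one beat in seconds."""
    return 60 / bpm


def beats_in_clip(bpm, seconds):
    """Number of beats that fit in a clip of the given length."""
    return seconds * bpm / 60


for name, bpm in TEMPOS_BPM.items():
    print(name, round(beat_seconds(bpm), 3), round(beats_in_clip(bpm, CLIP_SECONDS), 1))
```

At 113 BPM each beat lasts roughly a third longer than at 152 BPM, so the second clip holds about 62 beats against about 83 in the first.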

Once again, the key of E minor was chosen. The distorted guitars use this key to create powerful and pounding riffs, while the bass and drums provide a solid and heavy rhythmic foundation. A9 was again reported as the relative key; without further information on the algorithm or on the data the AI was trained on, it is not possible to speculate on the reason for this 'choice' in the second generative output as well.

Objectively speaking, the track is musically valid but deviates from the genre indicated by the prompt (heavy metal), leaning more towards rock, with sounds reminiscent of bands like Scorpions and Aerosmith.

Technical Analysis (Second Output)

Channels : 2
Sample Rate : 48000
Precision : 16-bit
Duration : 00:00:32.88 = 1578240 samples ~ 2466 CDDA sectors
File Size : 1.32M
Bit Rate : 320k
Sample Encoding: MPEG audio (layer I, II or III)
File Detail
Samples read: 3154176
Length (seconds): 32.856000
Scaled by: 2147483647.0
Maximum amplitude: 0.873509
Minimum amplitude: -1.000000
Midline amplitude: -0.063245
Mean norm: 0.098954
Mean amplitude: -0.000005
RMS amplitude: 0.129564
Maximum delta: 0.712601
Minimum delta: 0.000000
Mean delta: 0.069749
RMS delta: 0.089766
Rough frequency: 5292
Volume adjustment: 1.000
Channel Detail
              Overall     Left        Right
DC offset     -0.000019   -0.000019    0.000008
Min level     -1.000000   -0.944381   -1.000000
Max level      0.873509    0.873509    0.828629
Pk lev dB      0.00       -0.50        0.00
RMS lev dB    -17.75      -17.43      -18.10
RMS Pk dB     -12.75      -12.75      -12.98
RMS Tr dB     -25.40      -24.57      -25.40
Crest factor   -           7.03        8.03
Flat factor    0.00        0.00        0.00
Pk count       2           2           2
Bit-depth      29/29       29/29       29/29
Num samples    1.58M
Length s       32.856
Scale max      1.000000
Window s       0.050

Prompt:

"doom metal, epic, Gregorian chant, melodic, male vocalist, rock, metal, gothic metal, doom metal, dark, melancholic, melodic, heavy, sombre, Gregorian chant."

Third Output

[Audio: 03 Guardian Valor, 32.856 s]

Stylistic Analysis

The sonic differences between the first two outputs and this third one are significant. The AI has combined dark and melancholic elements with a pervasive sense of epic grandeur. The fusion of doom metal, death metal, and Gregorian chant creates a unique sound that is simultaneously heavy and lofty, melancholic and majestic. 

The foundation of the track is deeply rooted in doom metal, characterized by slow and heavy rhythms, extremely distorted guitar riffs, and an overall dark and melancholic atmosphere typical of the genre. The key of C major, used here as the main key, is normally associated with bright and joyful sounds due to its lack of accidentals and its arrangement of notes. However, in the context of doom metal, this key can be reinterpreted to create a darker and more melancholic atmosphere.

The chord progressions, often slow and heavy, can suggest a sense of despair and sadness. B minor is reported here as the relative key, although strictly speaking the relative minor of C major is A minor, which shares all seven of its notes; B minor shares only five of them, having F# and C# in place of F and C. Either way, the minor key offers fertile ground for exploring darker and more introspective emotions, adding depth and drama to the track. The use of minor chords and melodic scales in B minor can further accentuate the bleak and oppressive character of the music.
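The relationship between C major and the minor keys discussed here can be checked by spelling out the scales. This is an illustrative sketch with hypothetical helper names; it uses the standard rule that the relative minor starts three semitones below the tonic of the major key.

```python
# Twelve pitch classes, spelled with sharps only (a simplifying assumption).
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone steps between successive scale degrees.
MAJOR_STEPS = [2, 2, 1, 2, 2, 2]
NATURAL_MINOR_STEPS = [2, 1, 2, 2, 1, 2]


def scale(root, steps):
    """Spell a seven-note scale from its root and step pattern."""
    index = CHROMATIC.index(root)
    notes = [root]
    for step in steps:
        index = (index + step) % 12
        notes.append(CHROMATIC[index])
    return notes


def relative_minor(major_root):
    """Tonic of the relative minor: three semitones below the major tonic."""
    return CHROMATIC[(CHROMATIC.index(major_root) - 3) % 12]


c_major = set(scale("C", MAJOR_STEPS))
a_minor = set(scale("A", NATURAL_MINOR_STEPS))
b_minor = set(scale("B", NATURAL_MINOR_STEPS))

print(relative_minor("C"))          # A
print(a_minor == c_major)           # True: same seven notes
print(sorted(b_minor & c_major))    # notes shared by B minor and C major
print(sorted(b_minor - c_major))    # ['C#', 'F#']
```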

It is also necessary to highlight that the overall simplicity of the composition, while characteristic of the chosen musical style, does not diminish the final validity of the experiment. Worthy of special mention is the introduction of a faint pipe organ, a staple of the genre popularized by bands like Helloween.

Technical Analysis (Third Output)

Channels : 2
Sample Rate : 48000
Precision : 16-bit
Duration : 00:00:32.88 = 1578240 samples ~ 2466 CDDA sectors
File Size : 1.32M
Bit Rate : 320k
Sample Encoding: MPEG audio (layer I, II or III)
File Detail
Samples read: 3154176
Length (seconds): 32.856000
Scaled by: 2147483647.0
Maximum amplitude: 0.999539
Minimum amplitude: -1.000000
Midline amplitude: -0.000230
Mean norm: 0.184591
Mean amplitude: -0.000017
RMS amplitude: 0.228794
Maximum delta: 0.820577
Minimum delta: 0.000000
Mean delta: 0.123514
RMS delta: 0.155993
Rough frequency: 5208
Volume adjustment: 1.000
Channel Detail
              Overall     Left        Right
DC offset     -0.000030   -0.000004   -0.000030
Min level     -1.000000   -0.913710   -1.000000
Max level      0.999539    0.999539    0.912321
Pk lev dB      0.00       -0.00        0.00
RMS lev dB    -12.81      -13.85      -11.97
RMS Pk dB     -8.46       -10.26      -8.46
RMS Tr dB     -20.12      -20.12      -16.29
Crest factor   -           4.93        3.97
Flat factor    0.00        0.00        0.00
Pk count       2           2           2
Bit-depth      29/29       29/29       29/29
Num samples    1.58M
Length s       32.856
Scale max      1.000000
Window s       0.050
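With the three technical reports side by side, the tracks can be compared on loudness and dynamics. The sketch below only reuses figures from the reports (overall RMS level and per-channel crest factor); the ranking criteria and the dictionary layout are illustrative choices, not part of the original analysis.

```python
import math

# Overall RMS level (dB) and (left, right) crest factors from the three reports.
TRACKS = {
    "01 Guardians Ascendant": {"rms_db": -14.74, "crest": (5.32, 5.61)},
    "02 Guardians of the Flame": {"rms_db": -17.75, "crest": (7.03, 8.03)},
    "03 Guardian Valor": {"rms_db": -12.81, "crest": (4.93, 3.97)},
}


def crest_db(crest):
    """Express a crest factor (peak / RMS ratio) in decibels."""
    return 20 * math.log10(crest)


def mean_crest(track):
    """Average of the left and right crest factors."""
    left, right = track["crest"]
    return (left + right) / 2


# Loudest first: a higher (less negative) RMS level means a louder track.
ranked = sorted(TRACKS, key=lambda name: TRACKS[name]["rms_db"], reverse=True)
spread_db = TRACKS[ranked[0]]["rms_db"] - TRACKS[ranked[-1]]["rms_db"]

for name in ranked:
    track = TRACKS[name]
    print(name, track["rms_db"], round(crest_db(mean_crest(track)), 1))
print("loudness spread (dB):", round(spread_db, 2))
```

On these figures the doom metal track is the loudest and most compressed (lowest crest factor), while the second, rock-leaning track is the quietest and most dynamic, with nearly 5 dB of RMS level between them.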

Final Considerations

The conducted experiment allowed for a thorough examination of the potential of artificial intelligence in the autonomous creation of musical works, utilizing advanced language models for lyric generation and machine learning algorithms for musical composition. The results demonstrate the AI's ability to produce artistic content that, in terms of stylistic coherence and technical quality, closely approximates works created by human artists. This study offers important insights and opens new avenues for further research in the scientific and academic fields.

Summary of Results

1. Lyric Generation: Using the ChatGPT language model, it was possible to generate lyrics for heavy metal songs that adhere to the genre's stylistic and thematic conventions. The initial prompt, "write the lyrics for a heavy-metal song about angels protecting mortals from evil," produced a structured text with coherent verses and choruses, demonstrating the AI's capability to understand and replicate complex lyrical patterns.

2. Musical Composition: The use of the Udio tool for musical composition allowed for the transformation of generated texts into complete tracks. Technical analyses of the audio files revealed good sound quality and stylistic coherence with the heavy metal genre, confirming the validity of the methodological approach adopted.

3. Influence of Prompt Variations: The experiment highlighted how variations in prompts can significantly influence the final output. This phenomenon was mainly observed in musical composition, indicating the sensitivity of AI algorithms to inputs and the consequent weight of these inputs.
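The quantitative side of prompt variation can be made concrete by counting keywords. The sketch below compares the tag list that closes the first prompt with the keyword list of the third prompt, reporting repeated keywords and the overlap between the two sets. The parsing rule (comma-separated, case-insensitive) and the use of Jaccard similarity are assumptions chosen for illustration.

```python
from collections import Counter

# Tag list that follows the colon in the first prompt.
FIRST_TAGS = "heavy metal, drums (drum set), male vocalist, passionate."

# Full keyword list of the third prompt.
THIRD_TAGS = (
    "doom metal, epic, Gregorian chant, melodic, male vocalist, rock, metal, "
    "gothic metal, doom metal, dark, melancholic, melodic, heavy, sombre, "
    "Gregorian chant."
)


def keywords(prompt):
    """Split a comma-separated prompt into lowercase keywords."""
    parts = prompt.rstrip(".").split(",")
    return [part.strip().lower() for part in parts if part.strip()]


def jaccard(a, b):
    """Share of distinct keywords that two prompts have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


third_counts = Counter(keywords(THIRD_TAGS))
repeated = {word: n for word, n in third_counts.items() if n > 1}
shared = set(keywords(FIRST_TAGS)) & set(third_counts)

print(sum(third_counts.values()), len(third_counts))  # 15 keywords, 12 distinct
print(repeated)
print(shared, round(jaccard(keywords(FIRST_TAGS), keywords(THIRD_TAGS)), 3))
```

The third prompt repeats three keywords ("doom metal", "Gregorian chant", "melodic") and shares only "male vocalist" with the first tag list, so it varies both the number of keywords and their content.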

Future Prospects

In light of the results obtained, several directions for continuing the studies emerge:

1. In-depth Analysis of Prompt Variations: Further research could focus on detailed analysis of prompt variations and their specific impact on different aspects of the musical work. This would allow for a better understanding of the internal mechanisms of AI models and optimization of creative generation processes.

2. Expansion of Musical Genres: Although heavy metal was chosen for its technical and stylistic characteristics, it would be interesting to explore AI capabilities in other musical genres, both mainstream and niche. This expansion could reveal further potentials and limitations of machine learning algorithms in artistic creation.

3. Integration of Human Feedback: Including a cycle of human feedback could further improve the quality of generated works. Future studies could experiment with iterative interactions between AI and human artists, exploring new forms of creative collaboration.

4. Evaluation of Cultural Impact: Another research direction could involve the cultural impact of AI-created works. Investigating how these works are perceived by the public and their artistic value compared to human creations could provide valuable insights for the evolution of music production.

5. Ethical and Legal Aspects: The expansion of AI usage in artistic creation raises ethical and legal issues that deserve careful consideration. Future research could explore copyright, intellectual property rights, and ethical implications related to the automated production of creative works.

The successful completion of this experiment not only demonstrates the current capabilities of AI in the realm of musical composition but also underscores the need for continued exploration and discussion in this rapidly evolving field.

Conclusion

This study has confirmed that artificial intelligences can be powerful tools for artistic creation, capable of generating high-quality musical works. Variations in prompts have proven to be a critical factor, significantly influencing the final output and suggesting the need for a deeper understanding of the generation mechanisms. The adoption of rigorous methodological approaches has allowed for significant results, providing a solid foundation for future research.

The integration of AI into music production represents a fascinating and promising frontier, opening new possibilities for both human and computational creativity. The synergy between artificial intelligence and human creativity could lead to unprecedented forms of art, enriching the cultural landscape and offering new opportunities for artistic expression.

Ultimately, this study lays the groundwork for a wide range of future explorations, both technical and theoretical, setting the stage for increasingly close collaboration between humans and machines in artistic creation. Understanding and optimizing the interactions between textual input and musical output will be crucial for developing increasingly sophisticated AI tools capable of expanding the boundaries of human creativity.

Creation of Intellectual Works through Generative AI: An Experimental Analysis © 2024 by Simone Dongu is licensed under CC BY-NC-SA 4.0
The original paper, which I wrote in Italian, can be found on the official page of the HOLMES Scientific Research Project by the University of Cagliari.