Alex Lavaee Apr 9, 2023

Statistical Analysis and Exploration of Generative AI in Graphic Memoirs

In "Artificial Intelligence in the Creative Industries: a Review," Anantrasirichai and Bull describe artificial intelligence (AI) as an algorithmic process that allows "a computer system to develop and emulate human-like behavior and hence make decisions similar to (or in some cases, better than) humans" (591). A large branch of AI is machine learning (ML), which "employs computational methods to 'learn' information directly from large amounts of example data without relying on a predetermined equation or model" (591). Among the most recent developments is generative AI, which uses algorithms to create text and image content; for text, accuracy is measured as the algorithm's ability to predict the next word or token given the preceding sequence. The combination of AI and statistical analysis is an exciting and rapidly evolving field, but its impact on literature has yet to be fully explored. In the first part of my research, I attempt to build a statistical tool for analyzing graphic memoirs, using Marjane Satrapi's The Complete Persepolis as a case study and closely examining the text and image content (features) of her memoir with machine learning techniques. I then follow my analysis with an experiment that uses generative AI to recreate text in the style of Satrapi in order to explore the capabilities of AI. While previous work describes the influence of Satrapi's memoir as a medium for promoting political and social activism against Iran's strict Islamic regime (Stromberg 91-119), none takes a statistical and machine learning approach to automatically identify unique features in the memoir or directly assesses what generative AI can do with it. My work also provides a baseline, as well as an open-source tool, for statistical methods of comparing different graphic memoirs.

Methods

Numerous features of a graphic memoir allow it to convey different messages. For instance, the average number of words per page, which can be extracted using optical character recognition (OCR) of the kind presented by Vetulani et al., can reveal much about an author's style: short and long sentences can have different effects on readers. The number of panels on a page allows an author to underscore a particular scene, and object recognition or shape detection algorithms make it possible to count the panels per page in a given graphic memoir. Sentiment analysis algorithms, including those built on large language models (LLMs, algorithms that can interpret and generate natural language), allow computers to recognize the connotation of words and phrases, so the ratio of positive, negative, and neutral sentiment may also be informative. Another important metric is the average ratio of words to images on each page of the memoir. Whether a memoir is illustrated in black and white or in color may also reveal much about the work. The author's background, including race, age, nationality, gender, and other personal information, could be indirectly related to the aforementioned statistics. Through statistical analysis of these features, I can gain greater insight into what makes Satrapi's work such an effective medium for political and social protest, and I can create a new form of computational literary analysis that can be combined with traditional analysis of text.

Findings

Object Detection for Counting Panels

Using traditional object detection algorithms and appropriate shape thresholding, it is possible to get an accurate count of the average number of panels per page in a given chapter of Satrapi's work. By observing the number of panels per page, the reader can understand and compare the presence, or lack thereof, of images in different memoirs. In the case of Satrapi's memoir, the strong presence of images aligns well with a study by psychologists Piotr Winkielman and Yekaterina Gogolushko, "Influence of Suboptimally and Optimally Presented Affective Pictures and Words on Consumption-Related Behavior," which reveals that a single image elicits a stronger emotional response than a single word. The sample page below shows the results of the object detection algorithm. In this case, the algorithm detects 6 panels, denoted by the green outlines in the right-most image:

Panel Detection
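
The counting step can be illustrated without any imaging library. The sketch below is a minimal stand-in and not the pipeline used in this project: it assumes the page has already been thresholded into a grid where 0 marks the gutter and 1 marks a panel, labels connected regions with a flood fill, and keeps only regions whose area reaches a hypothetical `min_area` threshold so that stray marks are not counted as panels.

```python
from collections import deque


def count_panels(grid, min_area=4):
    """Count 4-connected regions of 1s whose area is at least min_area."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    panels = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one region and measure its area.
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small regions are treated as noise (assumed threshold).
            if area >= min_area:
                panels += 1
    return panels


# Toy page: 0 = gutter, 1 = panel. Two large panels plus one stray speck.
page = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
]
print(count_panels(page))
```

On a real scan, the area threshold would need tuning per page size, since a threshold that is too low counts speech balloons and one that is too high misses small inset panels.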

Chapter vs Average Number of Panels Per Page

Furthermore, the distribution of the average number of panels per page across chapters shows that Satrapi typically averages no more than seven panels per page. This average can then be used to compare different memoirs.

Average Panels

Chapter vs Average Number of Words Per Page

Because The Complete Persepolis is a graphic memoir, its average word count per page (at most 200 words) is much lower than that of novels, which typically average far more. The average number of words per page can be used to contrast the writing styles of different graphic memoir authors and to weigh the advantages and disadvantages of denser texts. Finding the optimal word-count balance is a highly debated topic in literature. By repeating this analysis for multiple memoirs, it may be possible to find a general trend in word count.

Average Words
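
Once OCR has produced text for each page, the per-chapter statistic reduces to counting tokens and averaging. The following is a minimal sketch under stated assumptions: the `(chapter, text)` input shape, the chapter labels, and the simple regular-expression tokenizer are all hypothetical, and the sample strings are invented placeholders rather than quotations from the memoir.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed tokenizer: runs of letters and apostrophes count as one word.
WORD = re.compile(r"[A-Za-z']+")


def words_per_page(text):
    """Number of word tokens in the OCR text of one page."""
    return len(WORD.findall(text))


def chapter_averages(pages):
    """pages: iterable of (chapter, ocr_text); returns {chapter: mean words per page}."""
    counts = defaultdict(list)
    for chapter, text in pages:
        counts[chapter].append(words_per_page(text))
    return {chapter: mean(values) for chapter, values in counts.items()}


# Invented placeholder text for illustration only.
sample_pages = [
    ("Chapter A", "This is a short page."),
    ("Chapter A", "Another page with a few more words on it."),
    ("Chapter B", "One line."),
]
print(chapter_averages(sample_pages))
```

OCR noise (split words, misread lettering) inflates or deflates these counts, so the averages are best read as estimates.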

Word Frequencies

The frequency of certain words in a graphic memoir can also reveal more about the theme of a particular memoir. In the case of Satrapi's The Complete Persepolis, below is a visualization (word cloud) of the most common words in the entire memoir along with an accompanying table:

Word Cloud
Word Relative Frequency
one 1.0
know 0.98
see 0.839
going 0.826
time 0.805
mother 0.752
want 0.752
u 0.698
go 0.691
come 0.691
even 0.597
day 0.57
well 0.57
think 0.53
right 0.523
will 0.517
say 0.503
went 0.49
father 0.49
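
Because the top entry in the table is exactly 1.0, the values appear to be counts divided by the count of the most frequent word; that reading is my inference from the table rather than a documented step. The sketch below shows one way such relative frequencies could be computed, with a deliberately tiny, hypothetical stop-word list and an invented sample sentence.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "i", "to", "of", "in", "it", "is"}


def relative_frequencies(text, top_n=5):
    """Return (word, count / max_count) pairs for the top_n most common words."""
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]
    counts = Counter(tokens)
    if not counts:
        return []
    peak = max(counts.values())
    return [(word, round(n / peak, 3)) for word, n in counts.most_common(top_n)]


# Invented sample text for illustration only.
sample = "I know the time. I know the day. I want to know."
print(relative_frequencies(sample))
```

Normalizing by the peak count makes tables from memoirs of different lengths directly comparable, at the cost of hiding the absolute counts.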

As seen in the table above, "know" is among the most common words. It may appear frequently because it reflects Satrapi's continual quest for knowledge of the world around her through her childhood in Iran, teenage years in Austria, and early adulthood in Iran. Arguably, this statistic reveals Satrapi's focus on knowledge acquired through interpersonal relationships with family, friends, and acquaintances. These relationships lead Satrapi to develop a knack for social and political activism in order to inform individuals who hold stereotyped and misinformed views of her Iranian nationality. This raises the question of whether the knowledge we gain from other people is as reliable as the research we conduct ourselves. The broader implications of the social disillusionment Satrapi experiences through her encounters with the global community while at school in Austria can be related to the effects of social media today and to false narratives about demographic groups.

Chapter vs Word to Image Ratio

A widely discussed topic, introduced by Scott McCloud in the chapter "Show and Tell" of Understanding Comics: The Invisible Art, is finding the "perfect" balance between words and images. According to McCloud, the conventional view is that great works of art and literature are only possible when words and images are kept separate. In McCloud's account, people are often taught with picture books as children because they are "easier" and less nuanced than the more "real" novels read as adults. The art of comics, however, lies in finding the right balance of words and images. Specifically, Satrapi's memoir uses a word and picture combination known as interdependent, in which words and pictures go hand in hand to convey an idea that neither could convey alone. The graph below shows the word-to-image ratio in every chapter of The Complete Persepolis. The higher the ratio, the more words there are relative to images. By repeating this analysis, it is possible to compare how words and images interact across graphic memoirs and to study their relative effectiveness.

Average Ratio
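
The ratio can be made concrete with a short sketch. It assumes that "images" are counted as detected panels, that the per-page ratio is words divided by panels, and that pages with no detected panel are skipped; these choices, the input shape, and the sample counts are illustrative assumptions rather than a description of my exact computation.

```python
from collections import defaultdict
from statistics import mean


def word_to_image_ratio(word_count, panel_count):
    """Words per panel for one page; None when no panel was detected."""
    if panel_count == 0:
        return None
    return word_count / panel_count


def chapter_ratios(pages):
    """pages: iterable of (chapter, word_count, panel_count)."""
    ratios = defaultdict(list)
    for chapter, words, panels in pages:
        ratio = word_to_image_ratio(words, panels)
        if ratio is not None:
            ratios[chapter].append(ratio)
    return {chapter: mean(values) for chapter, values in ratios.items()}


# Invented counts for illustration only.
pages = [
    ("Chapter A", 120, 6),
    ("Chapter A", 90, 5),
    ("Chapter B", 150, 5),
    ("Chapter B", 40, 0),  # skipped: no panel detected
]
print(chapter_ratios(pages))
```

Averaging per-page ratios weights every page equally; dividing total words by total panels per chapter would instead weight busy pages more heavily, and the two can differ noticeably.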

Colors Present in Memoir

Another undervalued property of graphic memoirs is their color palette. Scott McCloud describes the effect of color in the chapter "A Word About Color" of Understanding Comics: The Invisible Art. According to McCloud, while color gives authors more freedom to explore expression in their images and "objectify their subjects" (189), "unfortunately, color is still an expensive option and has historically been in the hands of larger, more conservative publishers" (191). As a result, most comic artists are forced to experiment with shades of black and white. As McCloud explains, "the differences between black-and-white and color comics are vast and profound affecting every level of the reading experience" (192). He states that in "black and white, the ideas behind the art are communicated more directly [where] meaning transcends form [and] art approaches language" (192). Furthermore, McCloud states that flat colors "take on more significance...and through more expressive colors comics can become an intoxicating environment of sensations that only color can give" (192). Thus, color plays an important role in the meaning of graphic memoirs, but its cost can be prohibitive. Both cost and the desire to communicate her meaning more directly are potential reasons Satrapi uses a black and white color palette:

Primary Colors

rgb(20, 20, 20): 47%
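
A share like the one reported above for rgb(20, 20, 20) can be estimated by grouping similar pixels and counting them. The sketch below is one simple way to do this and is not necessarily the method behind the figure: it assumes the page is available as a list of RGB tuples, snaps each channel to the nearest multiple of a hypothetical `step`, and reports the most common buckets. The sample pixels are invented.

```python
from collections import Counter


def dominant_colors(pixels, step=20, top_n=3):
    """Snap each RGB pixel to a coarse grid and return (color, share) pairs."""

    def snap(value):
        # Nearest multiple of step, capped at the 8-bit maximum.
        return min(255, int(value / step + 0.5) * step)

    buckets = Counter(tuple(snap(channel) for channel in px) for px in pixels)
    total = sum(buckets.values())
    if total == 0:
        return []
    return [(color, n / total) for color, n in buckets.most_common(top_n)]


# Invented sample: three near-black pixels and one near-white pixel.
pixels = [(18, 22, 19), (21, 20, 25), (12, 15, 28), (250, 250, 250)]
print(dominant_colors(pixels))
```

A coarser `step` merges more shades into one bucket, which suits black and white artwork where scanning introduces many slightly different grays.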
 

Discussion

One interesting statistic to explore is the sentiment of a graphic memoir. Using state-of-the-art language models, it is possible to identify the connotation and tone of each page. As a starting point, sentiment can be placed into three broad categories: negative, neutral, and positive. It is possible to create datasets that roughly map common English words and their meanings to these categories. In the case of Satrapi's work, it is not surprising that the majority of pages are detected as negative, because the memoir centers on the difficulty of her life during the Iranian Revolution of 1979 and the Iran-Iraq War, her separation from her family, the loss of loved ones, and other trauma. However, this method still fails to capture the nuance of multiple sentiments being expressed on a single page. While it is possible to aggregate the ratio of positive, neutral, and negative sentiment, doing so makes the analysis difficult to communicate effectively. Hence, sentiment as a metric for comparing memoirs could be explored further in future work. The bar graph below shows the highest sentiment score for each page of the memoir:

Sentiment
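
The word-to-category mapping described above can be sketched with a small lexicon. This is a deliberately simplified stand-in for a language model, not the classifier used for the graph: the lexicon is a tiny hypothetical one, the tie rule (equal positive and negative counts give neutral) is an assumption, and the sample pages are invented.

```python
import re

# Tiny hypothetical lexicon for illustration; not a published resource.
LEXICON = {
    "war": "negative", "loss": "negative", "afraid": "negative", "prison": "negative",
    "love": "positive", "happy": "positive", "free": "positive", "proud": "positive",
}


def page_sentiment(text):
    """Label a page by comparing its positive and negative word counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(LEXICON.get(t) == "positive" for t in tokens)
    negative = sum(LEXICON.get(t) == "negative" for t in tokens)
    if positive == negative:
        return "neutral"
    return "positive" if positive > negative else "negative"


def sentiment_shares(pages):
    """Fraction of pages assigned to each of the three categories."""
    labels = [page_sentiment(page) for page in pages]
    if not labels:
        return {}
    return {
        label: labels.count(label) / len(labels)
        for label in ("negative", "neutral", "positive")
    }


# Invented sample pages for illustration only.
sample_pages = [
    "The war brought loss and we were afraid.",
    "We walked to school.",
    "I felt free and happy.",
    "Prison again.",
]
print(sentiment_shares(sample_pages))
```

The sketch also shows the limitation noted above: a page with both "love" and "war" collapses to a single label, so mixed sentiment is lost.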

AI-Generated Text Using GPT-2

As a final experiment, I explore the capabilities of generative AI by applying transfer learning to an LLM architecture known as GPT-2 (Generative Pre-trained Transformer 2). The model was fine-tuned (trained) on Satrapi's memoir and then tasked with completing sentences in a style similar to her memoir. An example of the results is shown below. In this example, the AI-generated text contains inconsistencies such as repetitive statements and logical fallacies. This initial case study suggests that generative AI is still a long way from effectively and convincingly generating text. In its simplest form, the algorithm generates text probabilistically based on the context of the previous words (tokens). In most cases, similar algorithms are found to regurgitate information they were trained on rather than generate unique information.

GPT2
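
The idea of generating text probabilistically from previous tokens can be shown with a far simpler model than GPT-2. The sketch below is a bigram model, which conditions on only one previous token, and is offered purely as an illustration rather than as the fine-tuning code for this experiment. The training corpus is an invented placeholder, and the function names and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def train_bigrams(text):
    """Map each token to the list of tokens that followed it in the corpus."""
    tokens = text.split()
    table = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        table[current].append(following)
    return table


def generate(table, start, length=10, seed=0):
    """Sample up to `length` tokens, each drawn from followers of the last one."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        choices = table.get(output[-1])
        if not choices:
            break  # no observed continuation for this token
        # Repeated followers appear more often in the list, so they are likelier.
        output.append(rng.choice(choices))
    return output


# Invented placeholder corpus for illustration only.
corpus = "i want to go home . i want to know why . i know why ."
model = train_bigrams(corpus)
sample = generate(model, "i", length=8)
print(" ".join(sample))
```

Every adjacent pair in the output already occurs in the corpus, which is a small-scale version of the regurgitation described above: the model can only recombine what it has seen.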

Conclusion

The intersection of AI and literature is an exciting and emerging field. Statistics and machine learning make it possible to explore human creativity and the effectiveness of literature. My work sets out a framework for future statistical tools that can be used to compare and contrast pieces of literature, especially graphic memoirs. My code is open-source and available for future use and modification in my GitHub repo.

(1893 Words)

Works Cited

Anantrasirichai, Nantheera, and David Bull. "Artificial Intelligence in the Creative Industries: a Review." The Artificial Intelligence Review, vol. 55, no. 1, 2022, pp. 589-656. https://doi.org/10.1007/s10462-021-10039-7

The following research paper provides a background on the current state of artificial intelligence (AI) in creative applications. It categorizes creative applications of AI into (i) content creation, (ii) information analysis, (iii) content enhancement and post production workflows, (iv) information extraction and enhancement, and (v) data compression. It is also important to note that the work also emphasizes the use of AI as a creative aid rather than a replacement for human creativity. The genre of this journal falls under the background source category of BEAT. I plan to use this source to highlight the current capabilities of AI and connect it to the subject matter of my research paper, computational creativity for graphic memoirs. (115 words)

McCloud, Scott. Understanding Comics. William Morrow Paperbacks, 1994.

McCloud's work examines the unique qualities of comics and presents a unified theory of how they work as an art form to create significant meaning. He covers everything from the physical construction of comics to the role of the reader, and provides a clear and engaging guide to this often misunderstood medium. McCloud is a highly accredited source that provides technical information pertaining to the functional relationships of words and images in comics. As a theory source in BEAT, this book will help in my analysis of specific techniques used in each graphic memoir and will also help me collect more informative statistics on the memoirs for my analysis of features. (111 words)

Satrapi, Marjane. The Complete Persepolis. Pantheon Books, 2007.

Persepolis is the story of Satrapi's unforgettable childhood and coming of age within a large and loving family in Tehran during the Islamic Revolution of 1979. The memoir describes her journey between Europe and Iran along with the violations of freedom and Satrapi's desire to pursue social activism. This memoir falls into one of three exhibit sources in BEAT that I will be using for the statistical analysis in my research. The work particularly appeals to me because it motivates me to explore the unique backgrounds of each author to compare and contrast their upbringing and how it translates to their writing. (102 words)

Stromberg, Fredrik. "Schemata in the Graphic Novel Persepolis." European Comic Art, vol. 13, no. 2, 2020, pp. 91-119. https://doi.org/10.3167/eca.2020.130205

This research paper describes the influence of Marjane Satrapi's The Complete Persepolis as a medium to promote political and social activism against Iran's strict Islamic regime. The paper mentions the influence of the memoir in opening up a new genre for artists and authors in the aftermath of the Green Movement and most recently the protests sparked by the death of Mahsa Amini. The argument source in the BEAT acronym also explains how Satrapi's work re-presents Iran, often misinterpreted with negative preconceptions by the Western reader through the weight of words and the representational burden of images. Through the context of Stromberg's work, I will gain a greater understanding of the global impact of Satrapi's work and connect it to my overarching analysis of what makes her memoir so emotionally compelling. (131 words)

Vetulani, Zygmunt, et al. "How to Improve Optical Character Recognition of Historical Finnish Newspapers Using Open Source Tesseract OCR Engine - Final Notes on Development and Evaluation." Human Language Technology. Challenges for Computer Science and Linguistics, vol. 12598, Springer International Publishing AG, Switzerland, 2020, pp. 17-30. Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-030-66527-2_2

The paper presented in this journal provides an interesting perspective into the state of converting paper content to digital formats using optical character recognition (OCR) technology. The particular case study describes the efforts that are being carried out in the National Library of Finland (NLF) using the open-source OCR engine, Tesseract. As more paper formats are scanned and digitized the possibilities for computational analysis of literature dramatically increase. The genre of this journal falls under the background source category of BEAT. This work provides important context since in my work I use a similar approach for converting a PDF of Satrapi's memoir to images and then use Google's OCR library to extract the text. (114 words)

Winkielman, Piotr, and Yekaterina Gogolushko. "Influence of Suboptimally and Optimally Presented Affective Pictures and Words on Consumption-Related Behavior." Frontiers in Psychology, vol. 8, 2018, p. 2261. https://doi.org/10.3389/fpsyg.2017.02261

This study measures the impact of two of the most common emotional stimuli, images and words. According to their research, a single picture has the power to sway people while a single word does not. The genre of this source falls under the background source category of BEAT. The goal of including the scientists' work is to provide additional context to the statistical analysis I will be attempting on Marjane Satrapi's The Complete Persepolis, specifically, measuring the average ratio of words to images per page. This source will allow me to better convey the significance of using object detection algorithms to count the panels and images in a graphic memoir. (110 words)