Generative AI and human understanding

Many years ago, BYTE magazine published an algorithm for something they called a “travesty generator”, which generated text of arbitrary length from a fixed corpus. It worked by choosing each new character according to how often it followed the preceding tuple of characters in the original; for example, asked for the letter that should come after the pair “th”, it would likely come up with “e” if the word “the” was common in the original.

The length of the tuple used to select the next letter could be altered. Lower numbers tended to match parts of words, so the output was “words” that more or less approximated English-sounding language but were nonsense. Higher numbers produced whole actual words, but sentences that didn’t make sense; as the number was increased further, sensible phrases would start to appear juxtaposed with each other. I implemented, modified, and used this mechanism in a number of pieces. One of these, “21 bits”, continuously increased this comprehensibility parameter (the tuple length) over the course of the piece. I had performers reading the resulting text, as well as a computer vocalizing it.
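A minimal sketch of the mechanism in Python (the function and parameter names here are my own, not the original BYTE listing): build a table mapping each tuple of characters to the characters that follow it in the corpus, then walk forward, sampling each next character in proportion to those counts. The `order` parameter is the tuple length discussed above.

```python
import random
from collections import defaultdict


def travesty(corpus, order, length, seed=None):
    """Generate `length` characters in the statistical style of `corpus`.

    For each `order`-character context in the corpus, record the characters
    that follow it (with repetition, so frequencies are preserved), then
    sample a continuation one character at a time.
    """
    rng = random.Random(seed)

    # Frequency table: context tuple -> list of characters that follow it.
    followers = defaultdict(list)
    for i in range(len(corpus) - order):
        context = corpus[i:i + order]
        followers[context].append(corpus[i + order])

    # Start from a context drawn from the corpus itself.
    start = rng.randrange(len(corpus) - order)
    context = corpus[start:start + order]
    out = list(context)

    for _ in range(length):
        choices = followers.get(context)
        if not choices:
            # Dead end (context appears only at the very end): restart.
            start = rng.randrange(len(corpus) - order)
            context = corpus[start:start + order]
            continue
        next_char = rng.choice(choices)
        out.append(next_char)
        # Slide the context window forward by one character.
        context = context[1:] + next_char

    return "".join(out)
```

With an order of 2, a context like “th” will most often be followed by “e” in English text, as in the example above; raising the order tends to reproduce whole words and, eventually, recognizable phrases set side by side, just as described.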

The point of the text was to play with the audience’s tendency to anthropomorphize what they were hearing. When listening to the generated nonsense, the human inclination was to read meaning into the result. Many people were surprised and amused at the salience of the vocalized output during a performance.

In general, a lot of my music took advantage of people’s inclination to read into the sound they were hearing, by selecting voice-like or physical-instrument-like sounds as material. Hearing something like a voice, for example, people tend to want to understand what it’s saying. Hearing a piano sound, they tend to visualize the instrument, so a dissonance is created when the piano sound has vibrato. I used things like this to create a rich context for simple, cheap audio.

This effect reminds me of the buzz about generative AI and “understanding”. If I understand how these systems work, they are a bigger, more sophisticated statistical approach to generating the “next” set of words from a fixed corpus (or, more accurately, from a neural network trained on a fixed corpus, so somewhat more indirect).

We humans have a tendency to attribute intention to animals, babies, computers. When a puppy does something smart, the puppy must be smart. When a baby smiles at me, it must be happy to see me. When a computer responds to a question with a well-formed answer, it must be intelligent. In my case, I leveraged that mechanism for artistic purposes. I’m sure others will find less benign reasons for doing it.