How Artificial Intelligence Helped Write This Award-Winning Song

Allison Parshall: Hey, I’m Allison Parshall, you’re listening to Science, Quickly. This week, we’re revisiting some of our favorite episodes – and honestly this one is one of my favorite things I’ve ever worked on. 

It’s the first in a three-part series on artificial intelligence making music. Together we’re going to hear a very unique song, and trace the technical revolution that made its creation possible.

And actually, it’s the perfect time to be coming back to this, because the consequences of that revolution are getting very real, very quickly. Just recently, a startup released a tool people are calling the “ChatGPT of music.” I’ve played around with it a bit, and honestly it’s left me … kind of speechless. I didn’t know that AI-generated audio could sound this polished.




Even having reported this series a year ago—even knowing something like this had to be coming—I still feel, just, so caught off guard. So I hope you enjoy the episode, and check out the rest of the series “AI Gets Musical” on scientificamerican.com.

[Clip: Show theme music]

Parshall: This is Scientific American’s Science, Quickly. I’m Allison Parshall.

I’m going to play you a song. And I’m willing to bet good money that you’ve never heard anything like it before.

[CLIP: Beginning of “Enter Demons & Gods,” by Yaboi Hanoi]

Parshall: So listening to this for the first time, I was intrigued and also baffled. I was bopping my head to a beat that sounded pretty familiar, but those notes didn’t sound familiar at all.

Are those notes that I could even play if I sat down at my piano?

[CLIP: Notes on piano] 

And the melody—I’m not even sure what instrument sounds like that.

As it turns out, no instrument sounds like that. But if you happen to be familiar with music from Thailand or with the country’s national sport, Muay Thai boxing, something about it may sound familiar.

[CLIP: Muay Thai boxing background sounds] 

Lamtharn “Hanoi” Hantrakul: For Thai listeners it’s, like, in this uncanny valley of familiarity but also foreignness. If it makes you wanna move, I have succeeded in connecting with you.

Parshall: That’s Lamtharn “Hanoi” Hantrakul. He’s a music technologist and the brains behind this wholly new kind of music.

He was born and raised in Bangkok. The nickname “Hanoi” is admittedly kind of confusing, given that it’s the capital of Vietnam—but his parents just loved the city so much that they named their kid after it.

Hanoi is Thai through and through. And musically, he’s known as Yaboi Hanoi, which kind of started off as a joke …

Hantrakul: This nerdy, like, music technologist trying to be, like, a—you know, like, “Yeah, I’m like a cool yaboi Hanoi.”

Parshall: But people really liked it, so it stuck. And it’s under that moniker that he created the piece you heard a minute ago with the help of a nonhuman assistant—machine learning, a type of artificial intelligence.

See, artificial intelligence algorithms are musicians now.

Welcome to part one of a three-part “Fascination” (that’s what we’re calling these Science, Quickly miniseries, FYI) on how artificial intelligence is getting deep into the world of music.

So about that whole AI is a musician now thing: I’m mostly kidding—these computer algorithms are really reflections of our own creativity and ingenuity, no matter how advanced.

They are given a bunch of data to learn from, and they can detect subtle patterns that can be used to make useful predictions, not just in music but in many artistic endeavors.

And they’re getting really good at it. In fact, these algorithms are advancing so fast that it’s beginning to feel like they’re doing way more than just pattern recognition.

ChatGPT, a large language model, can compose a poem about estate tax or methamphetamine. It can create a recipe for “French-style chicken thighs with carrots and cream.” Or it can just make your job application cover letters suck less.

Dall-E 2, a deep-learning model, can make fantastical art from simple language prompts such as “a bowl of soup that is a portal to another dimension in the style of Basquiat.”

And now people such as Hanoi are using machine learning to challenge the musical chokehold that Western Europe has had on basically all of us.

That brings us back to “Enter Demons & Gods”—Hanoi’s machine-learning-assisted composition that started off the episode. In 2022 he submitted it to an international competition called the AI Song Contest.   

He won.

[CLIP: Recording of award ceremony]

Announcer: That means with a combined total of 21.1 points from the voters and the jury…, Yaboi Hanoi is the winner of the AI Song Contest 2022!

[CLIP: Applause]

[CLIP: Hantrakul: Yeah, I’m, I’m just completely lost for words. I’m just so excited that the song spoke to so many people both at home in Thailand … and that it also spoke to the jury.]

Hantrakul: The fact that your ears are not used to it I think it really heightens that feeling for many, like, Western listeners where it’s, like, it feels out of this world because literally it is out of this world of equal-tempered tuning.

Parshall: He’s referring to 12-tone equal temperament tuning. Even if you haven’t heard that technical name before, you’ve definitely heard it in action—it describes the 12 repeating notes you can play on a piano and the notes underlying basically all of Western pop and classical music. See, you have an octave—where the higher note is exactly twice the frequency of the lower note …

[CLIP: Piano plays an octave of A at 440 and 880 hertz]

Western music divides that octave into 12 notes spread apart at equal ratios. 

[CLIP: Ascending chromatic A scale on the piano]
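The "equal ratios" idea can be made concrete with a little arithmetic. The following is a minimal sketch, not anything used in the episode: it assumes A4 at 440 hertz as the starting note, and the function name equal_tempered_scale is made up for illustration. Each step multiplies the frequency by the twelfth root of two, so twelve steps land exactly one octave up.

```python
A4_HZ = 440.0  # reference pitch mentioned in the episode


def equal_tempered_scale(base_hz, steps_per_octave=12):
    """Return the frequencies from base_hz up to one octave above it.

    Every neighboring pair of notes has the same frequency ratio,
    2 ** (1 / steps_per_octave), which is what "equal temperament" means.
    """
    ratio = 2 ** (1 / steps_per_octave)
    return [base_hz * ratio ** k for k in range(steps_per_octave + 1)]


scale = equal_tempered_scale(A4_HZ)
for k, hz in enumerate(scale):
    print(f"step {k:2d}: {hz:7.2f} Hz")
```

The last value printed is the octave, twice the starting frequency.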

Hanoi knows this tuning system well because his mom encouraged him to play the piano as a kid, like so many people around the world. 

Hantrakul: My mom once said, “If you understand how to play the piano, you’ll be able to understand every other instrument.” And I remember coming back to her after I finished my music major and said, “That’s actually only true because all of music has been written from the perspective of Western instruments.”

Parshall: It’s true. Those 12 notes of the piano can sound natural and normal if you grew up mostly listening to Western music.

Now hear me out; I’m going to geek out for a second. I want you to stay with me. The note on the piano called A4 sounds like this …

[CLIP: 440 Hz tone]

… typically is set at the frequency of 440 hertz, meaning the sound wave oscillates 440 times a second.

But there’s nothing special about that 440 number. It was only first standardized in the 1930s. And some orchestras, like the New York Philharmonic, tune to 442 Hz. That’s a small enough difference that our ears basically can’t hear it. But during the Baroque period, around the 1600s, that same note was set at 415 Hz. 

[CLIP: 415 Hz tone]
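To compare these reference pitches, musicians often measure intervals in cents, where 100 cents is one step between neighboring piano keys. The sketch below is an illustration only (the helper name cents_between is hypothetical). It shows that 442 Hz sits only a few cents above 440 Hz, while 415 Hz is about one piano key lower.

```python
import math


def cents_between(f_ref_hz, f_hz):
    """Size of the interval from f_ref_hz to f_hz, in cents.

    1200 cents make an octave, so 100 cents is one 12-tone
    equal-tempered semitone. Negative values mean f_hz is lower.
    """
    return 1200 * math.log2(f_hz / f_ref_hz)


modern_vs_442 = cents_between(440, 442)      # a small fraction of a semitone
modern_vs_baroque = cents_between(440, 415)  # roughly one semitone down

print(f"442 Hz vs 440 Hz: {modern_vs_442:+.1f} cents")
print(f"415 Hz vs 440 Hz: {modern_vs_baroque:+.1f} cents")
```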

All of this is to say that pitch is a spectrum, and there are literally infinite notes you could tune an instrument to play. There’s no real reason why we have to divide the octave into 12 notes rather than, say, 22 or five.

And there’s no real reason that we have to space those notes out evenly through the octave, either. There are plenty of musicians who argue that the 12 notes of the Western scale sound better when they’re not spread apart at equal ratios.

And finally, we don’t even have to base tuning systems around octaves at all. Sure, the laws of physics and the biology of our ears might predispose us to find the octave pleasing, but we human beings are creative and flexible creatures. 

So, given these infinite possibilities, it makes sense that cultures around the world choose to chop up musical space differently. Neither of the Thai fiddles that Hanoi learned to play when he was growing up in Bangkok …

Hantrakul: The saw u. The saw duang.

Parshall: … fit the 12-tone model. They’re closer to something that musicologists have called seven-tone equal temperament tuning. That fits seven notes in the same amount of musical space as Western music fits 12, though that’s definitely an oversimplification of the complexities of Thai tuning.
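Taking the idealized seven-tone model at face value (and, as noted, real Thai tuning is more complicated), a short sketch can show how far each of the seven notes falls from the nearest piano key. The function names here are hypothetical, and the numbers describe only the simplified model, not measurements of any actual instrument.

```python
def step_cents(steps_per_octave):
    """Size of one equal step, in cents, when an octave is split evenly."""
    return 1200 / steps_per_octave


def nearest_piano_error(cents):
    """Distance in cents from a pitch to the closest 12-tone piano note."""
    nearest = round(cents / 100) * 100
    return cents - nearest


# Positions of the seven notes above the starting note, in cents.
seven_tone = [k * step_cents(7) for k in range(7)]
errors = [nearest_piano_error(c) for c in seven_tone]

for degree, (c, err) in enumerate(zip(seven_tone, errors)):
    print(f"degree {degree}: {c:7.1f} cents, {err:+6.1f} cents from nearest piano key")
```

In this simplified model, some notes land more than 40 cents away from any piano key, which is close to halfway between two keys.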

Similarly, the pi nai, the oboe-like instrument played at Thai boxing matches, plays notes and intervals that can’t be mimicked on the piano.

In fact, if you take this pi nai trill …

[CLIP: Pi nai trill]  

Parshall: The closest notes I can play on the piano sound like this …

[CLIP: Pi nai piano approximation] 

Parshall: Those pitches are kind of close, but they’re no pi nai. Yet those are the pitches that our music technology is designed to work with.

Hantrakul: And I think that was really, you know, the starting point for me to think about, well, “What if the reverse was true? What if we could use technology to write music that is on the terms of music from Thailand?”

Parshall: That’s where machine learning comes in. See, there’s been a pretty big revolution in how it works with music.

For a long time, the music we made with AI was limited to just those 12 notes on the piano.

That’s because raw audio files themselves are massive, encoding tons of information. There are typically 44,000 samples every second for good quality recording—double that for a stereo recording that plays different tracks in each ear of your headphones.

So, let’s see, if you tried to have an AI chew through Led Zeppelin’s “Stairway to Heaven” …

[CLIP: Led Zeppelin’s Stairway to Heaven]

… it would have over 42 million data points to process. That’s just too much for an AI algorithm.
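The arithmetic behind that figure can be sketched as follows. Every input here is an assumption rather than something stated in the episode: a CD-style sample rate of 44,100 samples per second (the "44,000" above, unrounded), two stereo channels and a track length of about 8 minutes 2 seconds.

```python
SAMPLE_RATE_HZ = 44_100   # assumed CD-quality rate, samples per second per channel
CHANNELS = 2              # assumed stereo recording
DURATION_S = 8 * 60 + 2   # assumed track length of about 8:02


def total_samples(duration_s, sample_rate_hz=SAMPLE_RATE_HZ, channels=CHANNELS):
    """Number of individual amplitude values in an uncompressed recording."""
    return duration_s * sample_rate_hz * channels


total = total_samples(DURATION_S)
print(f"{total:,} data points")
```

Under these assumptions, the count comes out at a little over 42 million, consistent with the figure quoted above.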

So those older algorithms needed something simpler. All they could handle was a symbolic, text-based representation that gets the gist of the notes being played…

[CLIP: MIDI version of Led Zeppelin’s Stairway to Heaven]

… like notes written on a page to tell you how you might play a song on the piano. And that’s where we would run into our old foe, the limited 12-tone scale.

[CLIP: Reprise of the 12-tone approximation of the pi nai]

Hantrakul: The minute that your music goes off of this tuning system, there’s actually no way for these symbolic models to even, like, understand or comprehend some of these melodies.

Parshall: So the old symbolic models wouldn’t have known what to do with the pi nai. Luckily, our computing power is finally catching up to our audio ambitions.

Hanoi is one of the engineers who helped make this possible through his previous work at Google Magenta and current work at TikTok.

He’s developed machine-learning tools that allow you to take a melody being played on one instrument and, within seconds, transform it into another instrument.

Hantrakul: There were these really fun demos that I worked on when I was at Google where, you know, you could take Indian classical singing but then transform it as though it was being played by a saxophone or take the sound of birds and have that rerendered as a flute.

[CLIP: Birds re-rendered as a flute using Google’s Tone Transfer]

Parshall: And for Hanoi, these machine-learning tools were what allowed him to operate directly on recordings of the pi nai without having to filter it through the sieve of Western music notation.

Let’s hear that trill again:

[CLIP: Pi nai trill]

Parshall: That recording comes from the Thai musician Udomkiet Joey Phengaubon playing a standard Thai classical melody.

Hanoi fed this recording through one of the AI tools he helped develop, called Mawf. It extracted the unique characteristics of its tuning and timbre and rerendered it as a different instrument—first the saxophone …      

[CLIP: Pi nai as saxophone]

Parshall: Then the trumpet …

[CLIP: Pi Nai as trumpet]

Parshall: And then another Thai instrument for good measure.

[CLIP: Pi nai as khlui flute]

Hantrakul: It’s, like, five instruments, like, layered on top of each other with a lot of distortions. So it—who knows what it is anymore?

[CLIP: All instruments together; music reenters] 

Parshall: Hanoi named his piece “Enter Demons & Gods,” or in Thai …

Hantrakul: “Asura Deva Choom Noom.”

Parshall: It was inspired by a scene in Thai mythology where there’s a legendary clash between hundreds of gods and demons over an elixir of immortality.

[CLIP: Battle sound effects in the background]

In the end he created basically a Thai-Western fusion EDM track.

Hantrakul: I don’t wanna call it, like, cultural preservation. I like to call it, like, cultural reinvigoration. It’s incredibly liberating, I think. I almost feel like I’ve been speaking in English my entire life.

Parshall: And now that he’s speaking Thai, he doesn’t plan to stop.

Hantrakul: It was sort of this big bang moment for me that I should write more music like this. It’s those moments where a Thai passage that could never be rerendered beyond classical Thai music finally crosses this dimension into electronic music. To me, this is the definition of what AI can empower creative human beings to be able to do.

[CLIP: Music ends]

Parshall: The AI Song Contest judges and voting public loved Hanoi’s song. And the contest is growing each year with new musical artists that each use AI in a different way.

But this revolution in music AI that Hanoi took advantage of, from flattened, text-based representations to the lush world of raw audio, has far-reaching consequences.

In the next episode … 

[CLIP: Christine McLeavey: “Working with raw audio, the sky is the limit in terms of what you can create.”]

[CLIP: Shelly Palmer: “The amount of things that have to be true for this to be what it is are unbelievable, like, unbelievable.”]

Parshall: We’re going to hear how scary advanced these algorithms have gotten through a recent model created by Google that can take your written description of music and create an audio file at the click of a button.

Science, Quickly is produced by Jeff DelViscio, Tulika Bose and Kelso Harper. Our theme music was composed by Dominic Smith.

Don’t forget to subscribe to Science, Quickly wherever you get your podcasts. For more in-depth science news and features, go to ScientificAmerican.com.

For Scientific American’s Science, Quickly, I’m Allison Parshall.
