Allison Parshall: Hi there, I’m Allison Parshall, and you’re listening to Science, Quickly. This week, we’re revisiting a few of our favorite episodes, and honestly this one is one of my favorite things I’ve ever worked on.
It’s the first in a three-part series on artificial intelligence making music. Together we’re going to listen to a very unusual song and explore the technical revolution that made its creation possible.
And honestly, it’s the perfect time to be coming back to this, since the implications of that revolution are getting very real, very quickly. Just recently, a startup released a tool people are calling the “ChatGPT of music.” I’ve played around with it a bit, and honestly it’s left me … kind of speechless. I didn’t know that AI-generated audio could sound this polished.
Even having reported this series a year ago, even knowing something like this had to be coming, I still feel just so caught off guard. So I hope you enjoy the episode, and check out the rest of the series, “AI Gets Musical,” on scientificamerican.com.
[Clip: Show theme music]
Parshall: This is Scientific American’s Science, Quickly. I’m Allison Parshall.
I’m going to play you a song. And I’m willing to bet good money that you’ve never heard anything like it before.
[CLIP: Beginning of “Enter Demons & Gods,” by Yaboi Hanoi]
Parshall: So listening to this for the first time, I was intrigued and also baffled. I was bopping my head to a beat that sounded pretty familiar, but these notes didn’t sound familiar at all.
Are these notes that I could even play if I sat down at my piano?
[CLIP: Notes on piano]
And the melody: I’m not even sure what instrument sounds like that.
As it turns out, no instrument sounds like that. But if you’re familiar with music from Thailand or with the country’s national sport, Muay Thai boxing, something about it might sound familiar.
[CLIP: Muay Thai boxing background sounds]
Lamtharn “Hanoi” Hantrakul: For Thai listeners it’s, like, in this uncanny valley of familiarity but also foreignness. If it makes you wanna move, I really have succeeded in connecting with you.
Parshall: That’s Lamtharn “Hanoi” Hantrakul. He’s a music technologist and the brains behind this wholly new kind of music.
He was born and raised in Bangkok. The nickname “Hanoi” is actually kind of confusing, given that it’s the capital of Vietnam, but his parents just loved the city so much that they named their kid after it.
Hanoi is Thai through and through. And musically, he’s known as Yaboi Hanoi, which kind of started off as a joke …
Hantrakul: This nerdy, like, music technologist trying to be, like, a … you know, like, “Yeah, I’m like a cool yaboi Hanoi.”
Parshall: But people really liked it, so it stuck. And it’s under that moniker that he created the piece you heard a minute ago with the help of a nonhuman assistant: machine learning, a kind of artificial intelligence.
See, artificial intelligence algorithms are musicians now.
Welcome to part one of a three-part Fascination (that’s what we’re calling these Science, Quickly miniseries, FYI) on how artificial intelligence is getting deep into the world of music.
So about that whole “AI is a musician now” thing: I’m mostly kidding. These computer algorithms are really reflections of our own creativity and ingenuity, no matter how advanced.
They are given a bunch of data to learn from, and they can detect subtle patterns that can be used to make useful predictions, not just in music but in many creative endeavors.
And they’re getting really good at it. In fact, these algorithms are advancing so fast that it’s starting to feel like they’re doing way more than just pattern recognition.
ChatGPT, a large language model AI, can write a poem about estate tax or methamphetamine. It can produce a recipe for “French-style chicken thighs with carrots and cream.” Or it can just make your job application cover letters suck less.
DALL-E 2, a deep-learning model, can make fantastical artwork from simple language prompts such as “a bowl of soup that is a portal to another dimension in the style of Basquiat.”
And now people such as Hanoi are using machine learning to challenge the musical chokehold that Western Europe has had on basically all of us.
That brings us back to “Enter Demons & Gods,” Hanoi’s machine-learning-assisted composition that started off the episode. In 2022 he submitted it to an international competition called the AI Song Contest.
He won.
[CLIP: Recording of award ceremony]
Announcer: That means, with a combined total of 21.1 points from the voters and the jury … Yaboi Hanoi is the winner of the AI Song Contest 2022!
[CLIP: Applause]
[CLIP: Hantrakul: Yeah, I’m, I’m just completely lost for words. I’m just so excited that the song spoke to so many people both at home in Thailand … and that it also spoke to the jury.]
Hantrakul: The fact that your ears are not used to it, I think it really heightens that feeling for many, like, Western listeners where it’s, like, it feels out of this world because literally it is out of this world of equal-tempered tuning.
Parshall: He’s referring to 12-tone equal temperament tuning. Even if you haven’t heard that technical name before, you’ve definitely heard it in action: it describes the 12 repeating notes that you can play on a piano and the notes underlying basically all of Western pop and classical music. See, you have an octave, where the higher note is exactly twice the frequency of the lower note …
[CLIP: Piano plays an octave of A at 440 and 880 hertz]
Western music divides that octave into 12 notes spaced apart at equal ratios.
[CLIP: Ascending chromatic A scale on the piano]
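To make “equal ratios” concrete, here is a minimal Python sketch of the chromatic scale from A at 440 hertz up to A at 880 hertz. The function name and the choice of 440 Hz as the starting note are illustrative assumptions; the only rule it encodes is the one described here, that each step multiplies the frequency by the same factor and 12 steps double it.

```python
A4_HZ = 440.0  # lower A of the octave used in this sketch


def equal_tempered_scale(base_hz: float, divisions: int = 12) -> list[float]:
    """Return divisions + 1 frequencies, from base_hz up to one octave above it."""
    # Every step multiplies the frequency by the same ratio, so that
    # `divisions` steps together multiply it by exactly 2 (one octave).
    ratio = 2 ** (1 / divisions)
    return [base_hz * ratio ** step for step in range(divisions + 1)]


scale = equal_tempered_scale(A4_HZ)
for step, hz in enumerate(scale):
    print(f"step {step:2d}: {hz:7.2f} Hz")
```

The gaps between neighboring notes grow in hertz as the scale climbs, but the ratio between neighbors stays the same, which is what “equal” means in this tuning.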
Hanoi knows this tuning system well because his mom encouraged him to play the piano as a kid, like so many people around the world.
Hantrakul: My mom once said, “If you understand how to play the piano, you’ll be able to understand every other instrument.” And I remember coming back to her after I finished my music major and said, “That’s actually only true because all of music has been written from the perspective of Western instruments.”
Parshall: It’s true. These 12 notes of the piano can sound natural and normal if you grew up mostly listening to Western music.
Now hear me out; I’m going to geek out for a second. I need you to stick with me. The note on the piano called A4, which sounds like this …
[CLIP: 440 Hz tone]
… is typically set at the frequency of 440 hertz, which means the sound wave oscillates 440 times a second.
But there’s nothing special about that 440 number. It was only first standardized in the 1930s. And some orchestras, like the New York Philharmonic, tune to 442 Hz. That’s a small enough difference that our ears basically can’t hear it. But during the Baroque period, around the 1600s, that same note was set at 415 Hz.
[CLIP: 415 Hz tone]
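One rough way to compare these three tunings is in cents, a unit the episode does not use but that musicians commonly do: 1,200 cents make an octave, so 100 cents is one piano semitone. The sketch below is only an illustration under that assumption, and the function name is made up for the example.

```python
import math


def cents_between(reference_hz: float, other_hz: float) -> float:
    """Signed distance from reference_hz to other_hz in cents (1,200 per octave)."""
    return 1200 * math.log2(other_hz / reference_hz)


modern_gap = cents_between(440.0, 442.0)   # A4 at 442 Hz versus 440 Hz
baroque_gap = cents_between(440.0, 415.0)  # A4 at 415 Hz versus 440 Hz

print(f"442 Hz vs 440 Hz: {modern_gap:+.1f} cents")
print(f"415 Hz vs 440 Hz: {baroque_gap:+.1f} cents")
```

By this measure the 442 Hz tuning sits less than a tenth of a semitone above 440 Hz, while the 415 Hz tuning sits roughly a full semitone below it.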
All of this is to say that pitch is a spectrum, and there are literally infinite notes that you could tune an instrument to play. There’s no real reason why we have to divide the octave into 12 notes instead of, say, 22 or five.
And there’s no real reason that we have to space these notes out evenly through the octave, either. There are a lot of musicians who argue that the 12 notes of the Western scale sound better when they’re not spaced apart at equal ratios.
And finally, we don’t even have to base tuning systems around octaves at all. Sure, the laws of physics and the biology of our ears might predispose us to find the octave pleasing, but we human beings are creative and flexible creatures.
So, given these infinite possibilities, it makes sense that cultures around the world choose to cut up musical space differently. Neither of the Thai fiddles that Hanoi learned to play when he was growing up in Bangkok …
Hantrakul: The saw u. The saw duang …
Parshall: … fit the 12-tone model. They’re closer to something that musicologists have called seven-tone equal temperament tuning. That fits seven notes in the same amount of musical space as Western music fits 12, though that’s definitely an oversimplification of the complexities of Thai tuning.
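As a hedged illustration of why a seven-note division does not line up with piano keys, the Python sketch below places both scales on a shared ruler of cents (1,200 per octave, a unit assumed here for convenience) and measures how far each seven-tone step lands from the nearest piano note. It models only the idealized seven-tone scale, which, as noted, oversimplifies real Thai tuning; the names are invented for the example.

```python
def step_positions_cents(divisions: int) -> list[float]:
    """Positions of an equal division of the octave, in cents above the starting note."""
    return [1200 * step / divisions for step in range(divisions + 1)]


seven_tone = step_positions_cents(7)    # idealized seven-tone equal temperament
twelve_tone = step_positions_cents(12)  # piano-style 12-tone equal temperament

# For each seven-tone pitch, the distance to the closest piano pitch.
offsets = [min(abs(pitch - key) for key in twelve_tone) for pitch in seven_tone]

for pitch, offset in zip(seven_tone, offsets):
    print(f"{pitch:7.1f} cents -> {offset:4.1f} cents from the nearest piano note")
```

Only the starting note and the octave coincide; every step in between falls somewhere in the cracks between piano keys.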
Similarly, the pi nai, the oboe-like instrument played at Thai boxing matches, plays notes and intervals that can’t be mimicked on the piano.
In fact, if you take this pi nai trill …
[CLIP: Pi nai trill]
Parshall: … the closest notes I can play on the piano sound like this …
[CLIP: Pi nai piano approximation]
Parshall: These pitches are kind of close, but they’re no pi nai. Yet these are the pitches that our music technology is designed to work with.
Hantrakul: And I think that was really, you know, the starting point for me to say, well, “What if the reverse was true? What if we could use technology to write music that is on the terms of music from Thailand?”
Parshall: That’s where machine learning comes in. See, there’s been a pretty big revolution in the way it works with music.
For a long time, the music we made with AI was limited to just these 12 notes on the piano.
That’s because raw audio files themselves are huge, encoding tons of data. There are typically about 44,000 samples every second for a good-quality recording, and double that for a stereo recording that plays different tracks in each ear of your headphones.
So, let’s see, if you tried to have an AI chew through Led Zeppelin’s “Stairway to Heaven” …
[CLIP: Led Zeppelin’s “Stairway to Heaven”]
… it would have over 42 million data points to process. That’s just too much for an AI algorithm.
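The 42 million figure can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch, not the episode’s own calculation: it assumes the CD-standard rate of 44,100 samples per second (the “about 44,000” mentioned here), two stereo channels and a running time of roughly 8 minutes 2 seconds for the album recording.

```python
SAMPLE_RATE_HZ = 44_100         # assumption: CD-quality sample rate
CHANNELS = 2                    # stereo: one stream of samples per ear
DURATION_SECONDS = 8 * 60 + 2   # assumption: roughly 8:02 running time

# One data point per sample, per channel, for every second of the song.
total_samples = SAMPLE_RATE_HZ * CHANNELS * DURATION_SECONDS
print(f"{total_samples:,} samples")  # 42,512,400 samples
```

Under those assumptions the count comes out just above 42 million, in line with the figure quoted here.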
So these older algorithms needed something more efficient. All they could handle was a symbolic, text-based representation that gets the gist of the notes being played …
[CLIP: MIDI version of Led Zeppelin’s “Stairway to Heaven”]
… like notes written on a page to tell you how you could play a song on the piano. And that’s where we run into our old foe, the limited 12-tone scale.
[CLIP: Reprise of the 12-tone approximation of the pi nai]
Hantrakul: The minute that your music goes off of this tuning system, there’s actually no way for these symbolic models to even, like, understand or comprehend these kinds of melodies.
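A small Python sketch can show what gets lost when an off-grid pitch is forced into a symbolic note. It uses the standard MIDI convention that A4 is note number 69 and each number is one piano semitone; the example pitch (two steps up an idealized seven-tone scale from A4) and the function name are hypothetical, and the sketch considers plain note numbers only, ignoring extras such as pitch-bend messages.

```python
import math

A4_HZ = 440.0
A4_MIDI_NOTE = 69  # standard MIDI note number for A4


def to_midi_note(freq_hz: float) -> tuple[int, float]:
    """Snap a frequency to the nearest MIDI note; also return the cents discarded."""
    exact = A4_MIDI_NOTE + 12 * math.log2(freq_hz / A4_HZ)  # fractional note number
    nearest = round(exact)
    lost_cents = (exact - nearest) * 100  # 100 cents per semitone
    return nearest, lost_cents


# Hypothetical off-grid pitch: two steps above A4 in an idealized seven-tone scale.
off_grid_hz = A4_HZ * 2 ** (2 / 7)
note, lost = to_midi_note(off_grid_hz)
print(f"{off_grid_hz:.2f} Hz -> MIDI note {note}, discarding {lost:+.1f} cents")
```

The note number survives, but the part of the pitch that made it sound like something other than a piano key is exactly what the rounding throws away.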
Parshall: So the old symbolic models wouldn’t have known what to do with the pi nai. Fortunately, our computing power is finally catching up to our audio ambitions.
Hanoi is one of the engineers who helped make this possible through his previous work at Google Magenta and current work at TikTok.
He’s developed machine-learning tools that let you take a melody being played on one instrument and, within seconds, transform it into another instrument.
Hantrakul: There were these really fun demos that I worked on when I was at Google where, you know, you could take Indian classical singing but then transform it as if it was being played by a saxophone or take the sound of birds and have that rerendered as a flute.
[CLIP: Birds re-rendered as a flute using Google’s Tone Transfer]
Parshall: And for Hanoi, these machine-learning tools were what allowed him to operate directly on recordings of the pi nai without needing to filter them through the sieve of Western music notation.
Let’s hear that trill again:
[CLIP: Pi nai trill]
Parshall: That recording comes from the Thai musician Udomkiet Joey Phengaubon playing a traditional Thai classical melody.
Hanoi fed this recording through one of the AI tools he helped build, called Mawf. It extracted the unique characteristics of its tuning and timbre and rerendered it as a different instrument: first the saxophone …
[CLIP: Pi nai as saxophone]
Parshall: Then the trumpet …
[CLIP: Pi nai as trumpet]
Parshall: And then another Thai instrument for good measure.
[CLIP: Pi nai as khlui flute]
Hantrakul: It’s, like, five instruments, like, layered on top of each other with a lot of different distortions. So it … who knows what it is anymore?
[CLIP: All instruments together; music reenters]
Parshall: Hanoi named his portion “Enter Demons & Gods,” or in Thai …
Hantrakul: “Asura Deva Choom Noom.”
Parshall: It was inspired by a scene in Thai mythology where there’s a legendary clash between hundreds of gods and demons over an elixir of immortality.
[CLIP: Battle sound effects in the background]
In the end, he created basically a Thai-Western fusion EDM track.
Hantrakul: I don’t wanna call it, like, cultural preservation. I like to call it, like, cultural reinvigoration. It’s extremely freeing, I think. I almost feel like I’ve been speaking in English my whole life.
Parshall: And now that he’s speaking Thai, he doesn’t plan to stop.
Hantrakul: It was kind of this big bang moment for me that I should write more music like this. It’s these moments where a Thai passage that would never be rerendered beyond classical Thai music finally crosses this dimension into electronic music. To me, this is the definition of what AI can empower creative human beings to be able to do.
[CLIP: Music ends]
Parshall: The AI Song Contest judges and voting public loved Hanoi’s song. And the competition is growing every year with new musical artists that each use AI in a different way.
But this revolution in music AI that Hanoi took advantage of, from flattened, text-based representations to the rich world of raw audio, has far-reaching consequences.
In the next episode …
[CLIP: Christine McLeavey: “Working with raw audio, the sky is the limit in terms of what you can create.”]
[CLIP: Shelly Palmer: “The amount of things that have to be true for this to be what it is are unbelievable, like inconceivable.”]
Parshall: We’re going to hear how scarily advanced these algorithms have gotten through a recent model created by Google that can take your written description of music and create an audio file at the click of a button.
Science, Quickly is produced by Jeff DelViscio, Tulika Bose and Kelso Harper. Our theme music was composed by Dominic Smith.
Don’t forget to subscribe to Science, Quickly wherever you get your podcasts. For more in-depth science news and features, go to ScientificAmerican.com.
For Scientific American’s Science, Quickly, I’m Allison Parshall.