Scientists have built noise-canceling headphones that filter out specific types of sound, such as birds chirping or car horns blaring, in real time thanks to a deep learning artificial intelligence (AI) algorithm.
The system, which researchers at the University of Washington dub “semantic hearing,” streams all sounds captured by headphones to a smartphone, which cancels everything before letting wearers pick the specific types of audio they’d like to hear. They described the prototype in a paper published Oct. 29 in the ACM Digital Library.
Once sounds are streamed to the app, wearers can use voice commands, or the app itself, to choose which of 20 categories of sound the embedded deep learning algorithm lets through. These include sirens, baby cries, vacuum cleaners and bird chirps, among others. The researchers chose these 20 categories because they felt humans could distinguish between them with reasonable accuracy, according to the paper. The time delay for this entire process is under one-hundredth of a second.
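To put a delay of under one-hundredth of a second in perspective, the short Python sketch below works out how much audio fits inside a 10-millisecond budget. The sample rate (44,100 Hz), the chunk size (256 samples) and the processing times are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
def samples_in_budget(sample_rate_hz: int, budget_ms: int) -> int:
    """Whole audio samples that arrive during the latency budget."""
    return sample_rate_hz * budget_ms // 1000


def fits_budget(chunk_samples: int, sample_rate_hz: int,
                processing_ms: float, budget_ms: float = 10.0) -> bool:
    """True if buffering one chunk and then processing it stays under the budget."""
    # Time spent waiting for the chunk to fill before any processing can start.
    buffering_ms = chunk_samples * 1000 / sample_rate_hz
    return buffering_ms + processing_ms < budget_ms


# Assumed values for illustration only.
print(samples_in_budget(44_100, 10))                 # 441
print(fits_budget(256, 44_100, processing_ms=3.0))   # True  (about 8.8 ms in total)
print(fits_budget(256, 44_100, processing_ms=5.0))   # False (about 10.8 ms in total)
```

Under these assumptions, only a few hundred samples can be buffered before the budget runs out, which is why any system working in this range has to process audio in very small chunks.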
“Imagine being able to listen to the birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still being able to hear emergency sirens and car honks or being able to hear the alarm in the bedroom but not the traffic noise,” Shyam Gollakota, assistant professor in the Department of Computer Science and Engineering at the University of Washington, told Live Science in an email.
Deep learning is a form of machine learning in which a system is trained with data in a way that mimics how the human brain learns.
The deep learning algorithm was challenging to design, Gollakota said, because it needed to understand the different sounds in an environment, separate the target sounds from the interfering sounds, and preserve the directional cues for the target sound. The algorithm also needed all of this to happen within just a few milliseconds, so as not to cause lags for the wearer.
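Why preserving directional cues is tricky can be illustrated with a small Python sketch. The brain judges direction partly from how much louder a sound is in one ear than the other, so a filter that treats each ear differently would shift where a sound seems to come from. The sketch below shows one simple way around that: applying the same gain to both ears. It is an illustration under that assumption, not the researchers' neural network, and the function names and sample values are hypothetical.

```python
def apply_shared_gain(left, right, gains):
    """Scale both ear channels by the same per-sample gain."""
    if not (len(left) == len(right) == len(gains)):
        raise ValueError("left, right and gains must have the same length")
    out_left = [sample * g for sample, g in zip(left, gains)]
    out_right = [sample * g for sample, g in zip(right, gains)]
    return out_left, out_right


def level_ratio(left, right):
    """Left-to-right energy ratio, a crude stand-in for the loudness cue."""
    left_energy = sum(s * s for s in left)
    right_energy = sum(s * s for s in right)
    return left_energy / right_energy


# A made-up sound arriving from the left: twice the amplitude in the left ear.
left = [0.8, -0.6, 0.4, -0.2]
right = [0.4, -0.3, 0.2, -0.1]
gains = [1.0, 0.5, 0.25, 1.0]  # hypothetical output of a sound filter

out_left, out_right = apply_shared_gain(left, right, gains)
print(level_ratio(left, right), level_ratio(out_left, out_right))
```

Because each sample is scaled identically in both ears, the left ear stays twice as loud as the right after filtering, so the sound still appears to come from the left.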
His team first used recordings from AudioSet, a widely used database of sound recordings, and combined them with additional data from four separate audio databases. The team labeled these entries manually, then combined them to train the first neural network.
But this neural network was trained only on sample recordings, not real-world sound, which is messier and more difficult to process. So the team created a second neural network to generalize the algorithm it would eventually deploy. Its training data included more than 40 hours of ambient background noise, general noises you’d encounter in indoor and outdoor spaces, and recordings captured from more than 45 people wearing a variety of microphones.
They used a combination of the two datasets to train the second neural network so that it can distinguish between the target categories of sound in the real world, regardless of which headphones the user is wearing or the shape of their head. Even small differences in either may affect the way the headphones receive sound.
The researchers plan to commercialize this technology in the future and find a way to build headphones fitted with the software and hardware to perform the AI processing on the device.
“Semantic hearing is the first step towards creating intelligent hearables that can augment humans with capabilities that can achieve enhanced or even superhuman hearing,” Gollakota continued, which likely means amplifying quiet noises or allowing wearers to hear previously inaudible frequencies.
“In the industry we are seeing custom chips that are designed for deep learning integrated into wearable devices. So it is very likely that technology like this will be integrated into headsets and earbuds that we are using.”