
AI turns Roku remotes into karaoke machines

LLMBop


Welcome to Lowpass! This week: Volley CEO Max Child on the company’s use of AI to build better voice games.

How Volley turned Roku remotes into karaoke machines

The other day, a video went viral of TikTok star Kyle Thorn doing “blind” karaoke: With his back to the TV, he tried to recognize the karaoke versions of popular songs based on the music, and then sing along without taking a peek at the lyrics – getting many things wrong, to the delight of his peers.

A fun party game, for sure. But what was unusual about the video was that Thorn wasn’t singing into a traditional microphone. Instead, he was holding a Roku remote, and using the remote control’s built-in mic, which is generally used for voice search of movies and TV shows, to bellow along.

The unusual setup was enabled by Volley, a voice game app developed by the Bay Area-based startup of the same name. I recently chatted with Volley CEO Max Child about the new karaoke game, how advances in AI have improved voice games, and why developing games for smart TVs has been liberating for a company that got its start on smart speakers and displays.

Like Rock Band, without the expensive hardware. Volley’s karaoke app looks less like an old-school karaoke setup, and more like a rhythm game; cheesy videos of couples strolling through fields of flowers have been replaced by scrolling bars that help players with pitch and timing. “It's like the singing mode from Rock Band, without [the need to spend] a bunch of money [on] hardware and consoles,” Child said.

Players can use their Roku remote as a microphone, or instead opt for Volley’s mobile app if their Roku doesn’t have a mic built-in (cheaper and older Roku models generally don’t). However, repurposing the remote’s mic, which is optimized for simple voice commands, wasn’t easy, Child told me. “What's really hard is the quality of the audio,” he said. “The latency, background noise and crosstalk, all that kind of stuff.”

To account for the relatively low power of streaming device CPUs, Volley is offloading pitch matching and other resource-intensive tasks to the cloud. And to deal with the latency issues of this approach, it added several hundred milliseconds of what it calls “scoring forgiveness” – a kind of “assume you're doing it right until it’s obvious you’re not” approach that’s similar to how cloud gaming platforms work.
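
To make the idea concrete, here is a rough Python sketch of how optimistic scoring with a grace window could work. It is an illustration only, not Volley's implementation: the 300 ms window, the frame timestamps, and the `provisional_score` function are all assumptions, since Child only mentioned "several hundred milliseconds" of forgiveness.

```python
# Hypothetical sketch of "scoring forgiveness"; not Volley's actual code.
# Assumption: the song is split into frames, and a cloud service eventually
# returns an on-pitch / off-pitch verdict for each frame, with some delay.

FORGIVENESS_MS = 300  # assumed value; the article only says "several hundred milliseconds"


def provisional_score(now_ms, frame_times_ms, verdicts, forgiveness_ms=FORGIVENESS_MS):
    """Count the frames currently credited to the singer.

    verdicts maps a frame timestamp (ms) to True (on pitch) or False (off pitch)
    and only contains frames the cloud has already answered for.
    """
    hits = 0
    for t in frame_times_ms:
        if t > now_ms:
            continue  # frame has not been sung yet
        if t in verdicts:
            # The cloud has answered: use the real result.
            hits += 1 if verdicts[t] else 0
        elif now_ms - t <= forgiveness_ms:
            # No answer yet, but still inside the grace window:
            # assume the singer is doing it right.
            hits += 1
        # No answer and past the window: the frame earns no credit.
    return hits


frames = [0, 100, 200, 300, 400]
early = provisional_score(400, frames, {0: True, 100: False})
late = provisional_score(600, frames, {0: True, 100: False})
```

In this toy run, the three most recent frames are credited at 400 ms because their verdicts are still in flight; by 600 ms, the frame at 200 ms has aged out of the window without an answer and loses its credit. The trade-off is the same one cloud gaming makes: the display stays responsive, at the cost of occasionally having to take points back.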

Another challenge: Not everyone uses the mic as intended while playing the game. “We found kids just like to scream the words into the microphone,” Child told me. “They don't care at all about pitch matching, which is fine.” As a result, the company is now considering adding a game mode that doesn’t take pitch into account as much.

There’s no party mode metadata. Another challenge that the Volley team didn’t anticipate was the lack of metadata for their specific use case. Music tech developers can license metadata from a number of providers for a range of use cases, including the lyrics necessary to build a karaoke app.

“There's a lot of these corpuses of data in the music industry, and you have to license different pieces of the puzzle,” Child said. “But to get that perfect, millisecond-accurate pitch that you want, you end up having [to] create it yourself.” That’s especially true because Volley has to match the pitch to the karaoke versions of popular songs, which tend to be just different enough to make working with the data for the original recordings impossible.

Why it’s easier to develop for smart TVs. Volley’s karaoke game is part of a Roku app that also offers access to other games developed by the company, including Jeopardy and Song Quiz. Some of these games were first developed for smart speakers and displays, where the company got its start in 2017. 

More recently, Volley expanded to TV-based voice games, and Child told me that the leap to platforms like Roku and Fire TV was liberating. “It made it much easier,” he said. “We can take the audio stream directly to our server. We can do our own speech recognition. We can do pitch and frequency matching.”

All of that is not easily doable on a smart speaker, since those platforms tend to be a lot more locked down. Amazon and Google generally require developers for their respective ambient computing devices to use their built-in voice recognition technologies, limiting functionality to a tightly defined set of features. “They're running the show on that front,” Child said.

(…)
