Interpreting voice results for Android media apps in cars

Danny Preussler
Published in ProAndroidDev
3 min read · Jan 10, 2022

When working on the Android Auto integration for SoundCloud, I stumbled upon an interesting issue. When it comes to implementing voice actions, things you read in the documentation might differ a bit from reality.

Voice commands are critical while driving. You want drivers to keep their eyes on the road and their hands on the steering wheel instead of interacting with a screen. This is why, when building for Android Auto, Android Automotive OS, and Assistant Driving Mode, you have to support basic voice searches.

For a music app like ours, the idea is simple. A user can say something like:

“Play Moderat on SoundCloud”

Under the hood

The Assistant will break down our sentence for interpretation:

<Play> <Moderat> <on SoundCloud>

The verb (Play) and the app (SoundCloud) in the above command are meant for the system. It wakes up the named app (if it declared Android Auto support in its manifest) and, because we used the verb “Play”, calls MediaSessionCompat.Callback.onPlayFromSearch() with the remaining query (“Moderat”) as the argument.

This query of “what to play” can still mean different things, though. How do you know if Moderat is a band or a song? Don’t worry, the Assistant helps us here too. It already has some idea of what our query could mean: whether we asked for an artist, an album, or a specific track. But be aware that it behaves slightly differently than officially documented.

According to the documentation, we are supposed to write code like this:

// extras is the nullable Bundle passed to onPlayFromSearch()
val mediaFocus = extras?.getString(MediaStore.EXTRA_MEDIA_FOCUS)
if (mediaFocus == MediaStore.Audio.Artists.ENTRY_CONTENT_TYPE) {
    isArtistFocus = true
    artist = extras?.getString(MediaStore.EXTRA_MEDIA_ARTIST)
}

We are supposed to check for a bundle entry with the key MediaStore.EXTRA_MEDIA_FOCUS. This gives us a hint on how best to interpret the query. The value of the constant MediaStore.Audio.Artists.ENTRY_CONTENT_TYPE checked above is vnd.android.cursor.item/artist, and there are similar constants for albums and songs. In theory, all you have to do is match mediaFocus against one of those.

Unfortunately, this doesn't work as intended!

In reality, you will always get the value vnd.android.cursor.item/*. This does not even exist as a constant in the SDK, but it is exactly the value you currently get as the focus parameter from the Assistant. Even a Google search will not get you very far here, which is the main reason I’m writing this.
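To make the pitfall concrete, here is a minimal sketch of reading the focus value defensively, so that the wildcard is treated as "no hint" instead of silently failing every comparison. The enum and function names are hypothetical. The artist string is the one quoted above; the album and audio strings are assumed to be the values of MediaStore.Audio.Albums.ENTRY_CONTENT_TYPE and MediaStore.Audio.Media.ENTRY_CONTENT_TYPE, so verify them against your SDK. Plain strings are used instead of the SDK constants so the snippet runs outside of Android.

```kotlin
// Hypothetical type: the kinds of focus hints this sketch distinguishes.
enum class MediaFocus { ARTIST, ALBUM, TRACK, UNSPECIFIED }

// Maps the raw focus string to a hint. Anything that is not recognized,
// including null and the wildcard "vnd.android.cursor.item/*",
// ends up as UNSPECIFIED.
fun parseMediaFocus(focus: String?): MediaFocus = when (focus) {
    "vnd.android.cursor.item/artist" -> MediaFocus.ARTIST
    // Assumed values of the album and audio content type constants.
    "vnd.android.cursor.item/album" -> MediaFocus.ALBUM
    "vnd.android.cursor.item/audio" -> MediaFocus.TRACK
    else -> MediaFocus.UNSPECIFIED
}
```

With the value the Assistant currently sends, parseMediaFocus("vnd.android.cursor.item/*") returns MediaFocus.UNSPECIFIED, so an app that relies only on the focus never learns what the user asked for.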

Why? That is a question I cannot answer. Maybe it worked as documented at some point in the past. I opened an issue that will hopefully shed some light on this; feel free to star and/or follow it.

What now?

This doesn’t mean that we are completely lost though.
Let’s look at a more complex query:

“Play Live from Moderat on SoundCloud”

The system actually gives us some hints about that query. Let’s check the entries of the Bundle we got in our onPlayFromSearch callback.
Of course, we see what we just talked about, the odd media focus entry:
android.intent.extra.focus -> vnd.android.cursor.item/*

We also have access to the raw user query and the extracted search query:

android.intent.extra.user_query -> Play live from moderat on soundcloud

query -> live moderat

But the interesting part is the following:

android.intent.extra.artist -> Moderat
android.intent.extra.title -> Live

The Assistant figured out that one part is probably a band and one a title.
There are more intent-extras like this that can be used to find the right entity such as android.intent.extra.title, android.intent.extra.artist, android.intent.extra.album etc.
And these are actually also stated in the documentation:

The following extras are supported in Android Automotive OS and Android Auto:
EXTRA_MEDIA_ALBUM
EXTRA_MEDIA_ARTIST
EXTRA_MEDIA_GENRE
EXTRA_MEDIA_PLAYLIST
EXTRA_MEDIA_TITLE

As a result, just ignore the code sample in the documentation, ignore MediaStore.EXTRA_MEDIA_FOCUS, and use those values instead.

Just grab them all and then check which values are non-null:

// extras is nullable in onPlayFromSearch(), so use safe calls
val artist = extras?.getString(MediaStore.EXTRA_MEDIA_ARTIST)
val album = extras?.getString(MediaStore.EXTRA_MEDIA_ALBUM)
val title = extras?.getString(MediaStore.EXTRA_MEDIA_TITLE)

If the user asked for an album, you will probably have valid values in album and artist; otherwise, use artist and title, or simply the title.
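The decision described above could be sketched roughly like this. It is only one possible interpretation, not the code from our app: the VoiceSearch types and the function names are hypothetical, and the inputs are plain nullable strings so the logic can run and be tested without an Android Bundle.

```kotlin
// Hypothetical result types; the names are illustrative, not part of any SDK.
sealed class VoiceSearch {
    data class Album(val album: String, val artist: String?) : VoiceSearch()
    data class Track(val title: String, val artist: String?) : VoiceSearch()
    data class Artist(val artist: String) : VoiceSearch()
    data class FreeText(val query: String) : VoiceSearch()
}

// Treats blank strings like missing values.
fun String?.orNullIfBlank(): String? = this?.takeIf { it.isNotBlank() }

// Picks the most specific interpretation the extras allow:
// album first, then title, then artist, else the plain query.
fun interpretVoiceSearch(
    query: String?,
    artist: String?,
    album: String?,
    title: String?
): VoiceSearch {
    val cleanArtist = artist.orNullIfBlank()
    val cleanAlbum = album.orNullIfBlank()
    val cleanTitle = title.orNullIfBlank()
    return when {
        cleanAlbum != null -> VoiceSearch.Album(cleanAlbum, cleanArtist)
        cleanTitle != null -> VoiceSearch.Track(cleanTitle, cleanArtist)
        cleanArtist != null -> VoiceSearch.Artist(cleanArtist)
        else -> VoiceSearch.FreeText(query.orEmpty())
    }
}
```

For the “Play Live from Moderat on SoundCloud” example, calling interpretVoiceSearch("live moderat", "Moderat", null, "Live") yields a Track with title Live and artist Moderat. In the real callback you would pass in the three values read from the extras together with the query argument.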

Sum up

As said, it is unclear to me why the focus is not forwarded to apps as documented. Even the Universal Media Application from Google uses the focus parameter the way the documentation shows, which might mean this worked at some point.

In the end, remember to always test all parts of your implementation! Code that was implemented as stated in the documentation might still not actually work.

https://issuetracker.google.com/issues/212779546

Android @ Soundcloud, Google Developer Expert, Goth, Geek, writing about the daily crazy things in developer life with #Android and #Kotlin