The title is dense and the paper is short, but the demo is outstanding: https://huggingface.co/spaces/aiola/whisper-ner-v1. The sample audio is submitted with "entity labels" set to "football-club, football-player, referee", and WhisperNER returns Arsenal and Juventus tagged as football-club. They suggest "personal information" as a tag to try on audio.
Impressive, very impressive. I wonder if it could listen for credit cards or passwords.