Unable to PLAY audio using skainet speech recognition example
Posted: Wed Dec 31, 2025 7:40 am
I have cloned the latest skainet repository and am able to successfully build and run the examples/en_speech_commands_recognition project on an ESP32 Korvo-2 V3.1 board under esp-idf version 5.5.1. The wake word is detected correctly, and all English voice commands are recognized as expected. I can also trigger actions such as LEDs and servos based on those commands. It is actually very impressive!
The one thing I have not been able to get working is audio playback in response to a command.
The example includes a file called speech_commands_action.c, which contains functions intended to play audio when either the wake word is detected or a command is recognized. In particular, when a command ID is detected, it attempts to play the corresponding audio clip using playlist[command_id + 1]. Playback is initiated via esp_audio_play(), which ultimately calls bsp_audio_play() in the Korvo-2 bsp_board.c. That function, in turn, uses esp_codec_dev_write() from the esp_codec_dev component.
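To make that call chain concrete, here is a minimal host-side sketch of the layering as I understand it. Everything in it is a stand-in: the mock_* functions and the sink buffer are hypothetical names that only mimic the roles of esp_audio_play(), bsp_audio_play() and esp_codec_dev_write(), and their signatures are assumptions, not the real APIs. The point is only that a clip handed to the top layer should reach the codec write once, in order and in full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side stand-ins only. None of these are the real esp-skainet,
 * BSP, or esp_codec_dev functions; names and signatures are assumed. */
#define SINK_CAPACITY 4096

static uint8_t sink[SINK_CAPACITY]; /* bytes the mock "codec" received */
static size_t sink_len = 0;         /* number of valid bytes in sink */

/* Plays the role of esp_codec_dev_write(): accept one byte buffer. */
static int mock_codec_write(const void *data, int len)
{
    assert(data != NULL);
    if (len < 0 || sink_len + (size_t)len > SINK_CAPACITY) {
        return -1; /* reject writes that would overflow the sink */
    }
    memcpy(sink + sink_len, data, (size_t)len);
    sink_len += (size_t)len;
    return 0;
}

/* Plays the role of bsp_audio_play(): forward to the codec write. */
static int mock_bsp_audio_play(const int16_t *data, int length_bytes)
{
    return mock_codec_write(data, length_bytes);
}

/* Plays the role of esp_audio_play(): forward to the board layer. */
static int mock_audio_play(const int16_t *data, int length_bytes)
{
    return mock_bsp_audio_play(data, length_bytes);
}
```

Calling mock_audio_play() once with a short int16_t array should leave sink holding exactly those bytes, which is the behavior I would expect from the real chain as well.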
The issue I am seeing is that instead of playing the full audio file, it only plays the first ~100 ms and then loops repeatedly over the next ~100 ms segment.
To simplify debugging, I removed all speech recognition logic and made a single call to esp_board_init() and then a single call to esp_audio_play() directly from main(). The behavior is exactly the same.
I tried upgrading the esp_codec_dev component to the latest version, 1.5.4. However, any version newer than 1.1.0, which I am currently using, produces a runtime error: "E (758) i2c: CONFLICT! driver_ng is not allowed to be used with this old driver". This suggests that the newer component versions use the new I2C driver while the board code still uses the legacy one, which would explain why the component version is currently pinned to 1.1.0.
Has anyone successfully gotten the speech recognition example to play audio responses (not just detect commands), and if so, what am I missing here?