Skip to content

Conversation

aryanrahar
Copy link

Add early extension check (stdin “-” still supported).
On decode failure, print a concise hint with a one-line ffmpeg command.
Keep supported set consistent with usage text (flac, mp3, ogg, wav).
Print timings only if at least one file was processed successfully.
Improves UX for cases like #3375 without changing core behavior.

Manual tests:
samples/jfk.wav → transcribes + timings
bad.m4a → “unsupported extension … (flac, mp3, ogg, wav) + ffmpeg hint”, no timings
bad.mp3 (corrupt) → “failed to decode … (detected .mp3) + ffmpeg hint”, no timings
stdin (-f -) still works

@aryanrahar
Copy link
Author

????????????

@danbev
Copy link
Member

danbev commented Aug 24, 2025

@aryanrahar Can you take a look at the commit and update it, as it is a little difficult to review at the moment. It looks like some formatting has been applied perhaps.

@aryanrahar aryanrahar force-pushed the fix/cli-better-audio-errors branch from 90a7428 to 6734aac Compare August 24, 2025 18:44
@aryanrahar
Copy link
Author

aryanrahar commented Aug 24, 2025

@aryanrahar Can you take a look at the commit and update it, as it is a little difficult to review at the moment. It looks like some formatting has been applied perhaps.
Thanks for the heads-up — I’ve removed the accidental formatting and force-pushed a minimal diff.
Changes only:
Add #include and
Early extension check (skips -/stdin) — supported: flac, mp3, ogg, wav
Clearer decode-failed message with a one-line ffmpeg hint
Print timings only after a successful run (processed_any guard)
When you have a moment, could you please take another look? I’m eager to keep working on this project and can turn around any feedback quickly.

@danbev
Copy link
Member

danbev commented Aug 27, 2025

Perhaps the extension validation could be extracted into a separate function to not "interfer" with the normal code path. For example, something like this:

diff --git a/examples/cli/cli.cpp b/examples/cli/cli.cpp
index a87dab1c..7a7ba113 100644
--- a/examples/cli/cli.cpp
+++ b/examples/cli/cli.cpp
@@ -1,5 +1,5 @@
 #include <filesystem>
-#include <algorithm>   
+#include <algorithm>
 #include "common.h"
 #include "common-whisper.h"
 
@@ -128,6 +128,38 @@ static char * requires_value_error(const std::string & arg) {
     exit(0);
 }
 
+static bool validate_audio_extension(const std::string & fname_inp) {
+    if (fname_inp == "-") {
+        return true; // stdin is always valid
+    }
+
+    std::string ext;
+    try {
+        ext = std::filesystem::path(fname_inp).extension().string();
+        std::transform(ext.begin(), ext.end(), ext.begin(),
+                       [](unsigned char c){ return std::tolower(c); });
+    } catch (...) {
+        // ignore; let the decoder try
+        return true;
+    }
+
+    auto ext_supported = [](const std::string &e) {
+        // keep in sync with usage: "supported audio formats: flac, mp3, ogg, wav"
+        return e == ".wav" || e == ".mp3" || e == ".flac" || e == ".ogg";
+    };
+
+    if (!ext.empty() && !ext_supported(ext)) {
+        fprintf(stderr,
+            "error: unsupported audio extension '%s' for '%s'.\n"
+            "supported: flac, mp3, ogg, wav.\n"
+            "hint: convert with ffmpeg, e.g.:\n"
+            "      ffmpeg -i \"%s\" -ar 16000 -ac 1 -c:a pcm_s16le out.wav\n",
+            ext.c_str(), fname_inp.c_str(), fname_inp.c_str());
+        return false;
+    }
+    return true;
+}
+
 static bool whisper_params_parse(int argc, char ** argv, whisper_params & params) {
     for (int i = 1; i < argc; i++) {
         std::string arg = argv[i];
@@ -1108,43 +1140,20 @@ int main(int argc, char ** argv) {
         std::vector<float> pcmf32;               // mono-channel F32 PCM
         std::vector<std::vector<float>> pcmf32s; // stereo-channel F32 PCM
 
-        std::string ext;
-if (fname_inp != "-") {
-    try {
-        ext = std::filesystem::path(fname_inp).extension().string();
-        std::transform(ext.begin(), ext.end(), ext.begin(),
-                       [](unsigned char c){ return std::tolower(c); });
-    } catch (...) {
-        // ignore; let the decoder try
-    }
-}
-
-auto ext_supported = [](const std::string &e) {
-    // keep in sync with usage: "supported audio formats: flac, mp3, ogg, wav"
-    return e == ".wav" || e == ".mp3" || e == ".flac" || e == ".ogg";
-};
-
-if (fname_inp != "-" && !ext.empty() && !ext_supported(ext)) {
-    fprintf(stderr,
-        "error: unsupported audio extension '%s' for '%s'.\n"
-        "supported: flac, mp3, ogg, wav.\n"
-        "hint: convert with ffmpeg, e.g.:\n"
-        "      ffmpeg -i \"%s\" -ar 16000 -ac 1 -c:a pcm_s16le out.wav\n",
-        ext.c_str(), fname_inp.c_str(), fname_inp.c_str());
-    continue;
-}
+        if (!validate_audio_extension(fname_inp)) {
+            continue;
+        }
 
-// Try to read/decode the audio. If it fails, give an actionable hint.
-if (!::read_audio_data(fname_inp, pcmf32, pcmf32s, params.diarize)) {
-    std::string det = ext.empty() ? "" : (" (detected extension: " + ext + ")");
-    fprintf(stderr,
-        "error: failed to decode audio from '%s'%s.\n"
-        "Make sure the file is not corrupted and has one of: flac, mp3, ogg, wav.\n"
-        "If you still hit this, convert to a standard WAV with:\n"
-        "  ffmpeg -i \"%s\" -ar 16000 -ac 1 -c:a pcm_s16le out.wav\n",
-        fname_inp.c_str(), det.c_str(), fname_inp.c_str());
-    continue;
-}
+        // Try to read/decode the audio. If it fails, give an actionable hint.
+        if (!::read_audio_data(fname_inp, pcmf32, pcmf32s, params.diarize)) {
+            fprintf(stderr,
+                "error: failed to decode audio from '%s'.\n"
+                "Make sure the file is not corrupted and has one of: flac, mp3, ogg, wav.\n"
+                "If you still hit this, convert to a standard WAV with:\n"
+                "  ffmpeg -i \"%s\" -ar 16000 -ac 1 -c:a pcm_s16le out.wav\n",
+                fname_inp.c_str(), fname_inp.c_str());
+            continue;
+        }
 
         if (!whisper_is_multilingual(ctx)) {
             if (params.language != "en" || params.translate) {

@aryanrahar aryanrahar force-pushed the fix/cli-better-audio-errors branch from 6734aac to a892322 Compare September 3, 2025 13:57
@aryanrahar
Copy link
Author

Perhaps the extension validation could be extracted into a separate function to not "interfer" with the normal code path. For example, something like this:

diff --git a/examples/cli/cli.cpp b/examples/cli/cli.cpp
index a87dab1c..7a7ba113 100644
--- a/examples/cli/cli.cpp
+++ b/examples/cli/cli.cpp
@@ -1,5 +1,5 @@
 #include <filesystem>
-#include <algorithm>   
+#include <algorithm>
 #include "common.h"
 #include "common-whisper.h"
 
@@ -128,6 +128,38 @@ static char * requires_value_error(const std::string & arg) {
     exit(0);
 }
 
+static bool validate_audio_extension(const std::string & fname_inp) {
+    if (fname_inp == "-") {
+        return true; // stdin is always valid
+    }
+
+    std::string ext;
+    try {
+        ext = std::filesystem::path(fname_inp).extension().string();
+        std::transform(ext.begin(), ext.end(), ext.begin(),
+                       [](unsigned char c){ return std::tolower(c); });
+    } catch (...) {
+        // ignore; let the decoder try
+        return true;
+    }
+
+    auto ext_supported = [](const std::string &e) {
+        // keep in sync with usage: "supported audio formats: flac, mp3, ogg, wav"
+        return e == ".wav" || e == ".mp3" || e == ".flac" || e == ".ogg";
+    };
+
+    if (!ext.empty() && !ext_supported(ext)) {
+        fprintf(stderr,
+            "error: unsupported audio extension '%s' for '%s'.\n"
+            "supported: flac, mp3, ogg, wav.\n"
+            "hint: convert with ffmpeg, e.g.:\n"
+            "      ffmpeg -i \"%s\" -ar 16000 -ac 1 -c:a pcm_s16le out.wav\n",
+            ext.c_str(), fname_inp.c_str(), fname_inp.c_str());
+        return false;
+    }
+    return true;
+}
+
 static bool whisper_params_parse(int argc, char ** argv, whisper_params & params) {
     for (int i = 1; i < argc; i++) {
         std::string arg = argv[i];
@@ -1108,43 +1140,20 @@ int main(int argc, char ** argv) {
         std::vector<float> pcmf32;               // mono-channel F32 PCM
         std::vector<std::vector<float>> pcmf32s; // stereo-channel F32 PCM
 
-        std::string ext;
-if (fname_inp != "-") {
-    try {
-        ext = std::filesystem::path(fname_inp).extension().string();
-        std::transform(ext.begin(), ext.end(), ext.begin(),
-                       [](unsigned char c){ return std::tolower(c); });
-    } catch (...) {
-        // ignore; let the decoder try
-    }
-}
-
-auto ext_supported = [](const std::string &e) {
-    // keep in sync with usage: "supported audio formats: flac, mp3, ogg, wav"
-    return e == ".wav" || e == ".mp3" || e == ".flac" || e == ".ogg";
-};
-
-if (fname_inp != "-" && !ext.empty() && !ext_supported(ext)) {
-    fprintf(stderr,
-        "error: unsupported audio extension '%s' for '%s'.\n"
-        "supported: flac, mp3, ogg, wav.\n"
-        "hint: convert with ffmpeg, e.g.:\n"
-        "      ffmpeg -i \"%s\" -ar 16000 -ac 1 -c:a pcm_s16le out.wav\n",
-        ext.c_str(), fname_inp.c_str(), fname_inp.c_str());
-    continue;
-}
+        if (!validate_audio_extension(fname_inp)) {
+            continue;
+        }
 
-// Try to read/decode the audio. If it fails, give an actionable hint.
-if (!::read_audio_data(fname_inp, pcmf32, pcmf32s, params.diarize)) {
-    std::string det = ext.empty() ? "" : (" (detected extension: " + ext + ")");
-    fprintf(stderr,
-        "error: failed to decode audio from '%s'%s.\n"
-        "Make sure the file is not corrupted and has one of: flac, mp3, ogg, wav.\n"
-        "If you still hit this, convert to a standard WAV with:\n"
-        "  ffmpeg -i \"%s\" -ar 16000 -ac 1 -c:a pcm_s16le out.wav\n",
-        fname_inp.c_str(), det.c_str(), fname_inp.c_str());
-    continue;
-}
+        // Try to read/decode the audio. If it fails, give an actionable hint.
+        if (!::read_audio_data(fname_inp, pcmf32, pcmf32s, params.diarize)) {
+            fprintf(stderr,
+                "error: failed to decode audio from '%s'.\n"
+                "Make sure the file is not corrupted and has one of: flac, mp3, ogg, wav.\n"
+                "If you still hit this, convert to a standard WAV with:\n"
+                "  ffmpeg -i \"%s\" -ar 16000 -ac 1 -c:a pcm_s16le out.wav\n",
+                fname_inp.c_str(), fname_inp.c_str());
+            continue;
+        }
 
         if (!whisper_is_multilingual(ctx)) {
             if (params.language != "en" || params.translate) {

I have refactored the extension check into validate_audio_extension(), simplified the decode-failed message, and guarded timings with processed_any. The diff is now minimal and free of formatting changes. When you have a moment, could you take another look? I’m excited to keep contributing and can iterate quickly on any feedback!!!!

@aryanrahar
Copy link
Author

??

@aryanrahar
Copy link
Author

can you approve the workflows?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants