the example is unable to run on iphone 11 pro. #10

woodymoo · 2024-02-02T10:48:31Z

The example is unable to run on an iPhone 11 Pro.
(It runs fine on a Mac with an M1 Max.)

The following is a screenshot from the iPhone 11 Pro, using the base model.

Debug log:
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length: 2 Input Token: 50359
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] -0.125732 | 0.048828 | 0 | 0 | 0
[WhisperKit] 0.308350 | -0.556641 | 0 | 0 | 1
[WhisperKit] 0.000000 | 0.000000 | 1 | 0 | 2
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 3
[WhisperKit] [0.00 --> 14.90]
[WhisperKit] ---- Transcription Timings ----
[WhisperKit] Audio Load: 0.00 ms / 1 runs ( 0.00 ms/run) 0.00%
[WhisperKit] Audio Processing: 0.41 ms / 1 runs ( 0.41 ms/run) 0.03%
[WhisperKit] Mels: 57.57 ms / 1 runs ( 57.57 ms/run) 3.96%
[WhisperKit] Encoding: 1171.59 ms / 1 runs ( 1171.59 ms/run) 80.56%
[WhisperKit] Matrices Init: 5.36 ms / 1 runs ( 5.36 ms/run) 0.37%
[WhisperKit] Prefill: 0.49 ms / 1 runs ( 0.49 ms/run) 0.03%
[WhisperKit] Decoding: 208.06 ms / 4 runs ( 52.01 ms/run) 14.31%
[WhisperKit] Non-inference: 7.49 ms / 4 runs ( 1.87 ms/run) 0.52%
[WhisperKit] - Sampling: 4.13 ms / 4 runs ( 1.03 ms/run) 0.28%
[WhisperKit] - Kv Caching: 3.91 ms / 4 runs ( 0.98 ms/run) 0.27%
[WhisperKit] - Windowing: 0.08 ms / 1 runs ( 0.08 ms/run) 0.01%
[WhisperKit] Fallbacks: 122.98 ms / 0 runs ( 0.00 ms/run) 8.46%
[WhisperKit] Decoding Full Loop: 1448.16 ms / 4 runs ( 362.04 ms/run) 99.57%
[WhisperKit] -------------------------------
[WhisperKit] Model Load Time: 6.60 seconds
[WhisperKit] Inference Duration: 1.45 seconds
[WhisperKit] - Decoding Loop: 1.45 seconds
[WhisperKit] Time to first token: 1.30 seconds
[WhisperKit] Total Tokens: 5
[WhisperKit] Tokens per Second: 2.76 tok/s
[WhisperKit] Real Time Factor: 0.10
[WhisperKit] Fallbacks: 0.0
[WhisperKit] [0.00 --> 14.90] <|endoftext|>

atiorh · 2024-02-02T17:22:26Z

Thank you for the report! TestFlight feedback also showed that the iPhone 11 and iPhone XS (the two oldest devices still supported by iOS 17) are consistently having issues. My current suspicion is that this is due to the Neural Engine specifically. If you are able to rebuild the app after changing the compute units to GPU and test it on your 11 Pro, it would be much appreciated :)
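
For anyone trying this, a minimal sketch of keeping a Core ML model off the Neural Engine is shown below. It uses only Core ML's `MLModelConfiguration`. How WhisperAX forwards this setting to WhisperKit is not shown in this thread, so the wiring into the app is left out; the helper name `gpuOnlyConfiguration` and the `modelURL` mentioned in the comment are hypothetical.

```swift
import CoreML

/// Hypothetical helper: returns a Core ML configuration that keeps the model
/// off the Neural Engine by restricting it to CPU and GPU.
func gpuOnlyConfiguration() -> MLModelConfiguration {
    let config = MLModelConfiguration()
    // .cpuAndGPU excludes the Neural Engine; .all (the default) allows it.
    config.computeUnits = .cpuAndGPU
    return config
}

let config = gpuOnlyConfiguration()
// Pass `config` when loading a compiled model, for example:
// try MLModel(contentsOf: modelURL, configuration: config)
```

If transcription works with this setting on an affected device, that would support the Neural Engine suspicion; if it still fails, the cause is likely elsewhere.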

atiorh · 2024-02-02T17:25:13Z

Looking a bit deeper into your debug logs, the model forward passes seem to be successful, but the model outputs are likely to be corrupt if you are getting <|endoftext|> right away.

ZachNagengast · 2024-02-02T18:00:08Z

Yes, one thing that stands out to me here is that the cache length is only 2. Can you post the full debug logs from the beginning of the loop all the way to the end? That way we can see what tokens are coming through. Typically, for base you would want to see:

0: <|startoftranscript|>
1: <|en|> (or whichever language option is set)
2: <|transcribe|> OR <|translate|>
3: <|0.00|> OR <|notimestamps|>
4..<n: (predicted transcript)
n: <|endoftext|>
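
The expected layout can be checked mechanically. The sketch below is not WhisperKit code: `hasExpectedLayout` is a hypothetical helper that compares decoded token strings against the layout listed above, and the two sample sequences are made up for illustration. Token spellings follow the prefill prompt logged elsewhere in this thread.

```swift
import Foundation

/// Hypothetical helper (not part of WhisperKit): checks that a decoded token
/// sequence starts with the four-token prompt described above, contains at
/// least one transcript token, and ends with <|endoftext|>.
func hasExpectedLayout(_ tokens: [String]) -> Bool {
    // Prompt (4 tokens) + at least one transcript token + <|endoftext|>.
    guard tokens.count >= 6 else { return false }
    let isLanguage = tokens[1].hasPrefix("<|") && tokens[1].hasSuffix("|>")
    let isTask = ["<|transcribe|>", "<|translate|>"].contains(tokens[2])
    let isTimestampMode = ["<|0.00|>", "<|notimestamps|>"].contains(tokens[3])
    return tokens[0] == "<|startoftranscript|>"
        && isLanguage && isTask && isTimestampMode
        && tokens.last == "<|endoftext|>"
}

// Illustrative inputs, not taken from a real run.
let healthy = ["<|startoftranscript|>", "<|en|>", "<|transcribe|>", "<|0.00|>",
               " Hello", "<|endoftext|>"]
let truncated = ["<|startoftranscript|>", "<|en|>", "<|transcribe|>", "<|0.00|>",
                 "<|endoftext|>"]

print(hasExpectedLayout(healthy))    // true
print(hasExpectedLayout(truncated))  // false: no transcript tokens before <|endoftext|>
```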

woodymoo · 2024-02-03T18:07:20Z

I tried the base model with the iOS app whisperBoard, which worked fine.
But the WhisperAX example failed. Here are the log and error messages:

doUnloadModel:options:qos:error:: model=_ANEModel: { modelURL=file:///var/mobile/Containers/Data/Application/F857AEFB-FBFC-4530-825D-81DA7772EC2D/Library/Caches/com.moody.whisperkit.WhisperAX/com.apple.e5rt.e5bundlecache/21C66/55FDBDC681E9BEC3377E76410A13B6A36D117901B2EACD6CCEC8EF6477BBCE68/F154AEEA75FF5C72ECAEBA3C899508F053C1576E8D7761AF31DBC258BF1ABFC3.bundle/H12.bundle/main/main_eir/ : sourceURL= (null) : key={"isegment":5,"inputs":{"encoder_output_embeds_eir":{"shape":[1500,1,1,512,1]}},"outputs":{"key_cast_fp16":{"shape":[1500,1,1,512,1]},"value_cast_fp16":{"shape":[1500,1,1,512,1]}}} : identifierSource=0 : cacheURLIdentifier=C231E55A5F6BD6E18E60C70EF622BACAF5215E7A45E150A265BFE09C379D6CA7_2393CF36BE99B057765D2CBDB5257F2EF24668A91C43BF3732D9D8E6D0053626 : string_id=0x70ac000000264af7 : program=_ANEProgramForEvaluation: { programHandle=4755813074256 : intermediateBufferHandle=4755813104416 : queueDepth=127 } : state=3 : programHandle=4755813074256 : intermediateBufferHandle=4755813104416 : queueDepth=127 : attr={
ANEFModelDescription = {
ANEFModelInput16KAlignmentArray = (
);
ANEFModelOutput16KAlignmentArray = (
);
ANEFModelProcedures = (
{
ANEFModelInputSymbolIndexArray = (
0
);
ANEFModelOutputSymbolIndexArray = (
0,
1
);
ANEFModelProcedureID = 0;
}
);
kANEFModelInputSymbolsArrayKey = (
"encoder_output_embeds_eir"
);
kANEFModelOutputSymbolsArrayKey = (
"key_cast_fp16@output",
"value_cast_fp16@output"
);
kANEFModelProcedureNameToIDMapKey = {
"net_5" = 0;
};
};
NetworkStatusList = (
{
LiveInputList = (
{
BatchStride = 1540096;
Batches = 1;
Channels = 512;
Depth = 1;
DepthStride = 1540096;
Height = 1;
Interleave = 1;
Name = "encoder_output_embeds_eir";
PlaneCount = 512;
PlaneStride = 3008;
RowStride = 3008;
Symbol = "encoder_output_embeds_eir";
Type = Float16;
Width = 1500;
}
);
LiveOutputList = (
{
BatchStride = 1540096;
Batches = 1;
Channels = 512;
Depth = 1;
DepthStride = 1540096;
Height = 1;
Interleave = 1;
Name = "key_cast_fp16@output";
PlaneCount = 512;
PlaneStride = 3008;
RowStride = 3008;
Symbol = "key_cast_fp16@output";
Type = Float16;
Width = 1500;
},
{
BatchStride = 1540096;
Batches = 1;
Channels = 512;
Depth = 1;
DepthStride = 1540096;
Height = 1;
Interleave = 1;
Name = "value_cast_fp16@output";
PlaneCount = 512;
PlaneStride = 3008;
RowStride = 3008;
Symbol = "value_cast_fp16@output";
Type = Float16;
Width = 1500;
}
);
Name = "net_5";
}
);
} : perfStatsMask=0} was not loaded by the client.
[WhisperKit] Loaded text decoder
[WhisperKit] Loading models from /var/mobile/Containers/Data/Application/F857AEFB-FBFC-4530-825D-81DA7772EC2D/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base with prewarmMode: false
[WhisperKit] Loading feature extractor
[WhisperKit] Loaded feature extractor
[WhisperKit] Loading audio encoder

[WhisperKit] Loaded audio encoder
[WhisperKit] Loading text decoder
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
[WhisperKit] Loaded text decoder
[WhisperKit] Loading tokenizer for base

[WhisperKit] Loaded tokenizer
[WhisperKit] Loaded models for whisper size: base

tcp_input [C2.1.1:3] flags=[R.] seq=67868, ack=3630046153, win=32950 state=LAST_ACK rcv_nxt=67868, snd_una=3630046129
[WhisperKit] Current audio size: 32000 samples, most recent buffer: 1600 samples, most recent energy: (0.14581555, 0.0028157167, 0.009438675, 1.1093871e-06)

[WhisperKit] Current audio size: 64000 samples, most recent buffer: 1600 samples, most recent energy: (0.07724396, 0.0031911524, 0.009436812, 1.4101643e-06)

[WhisperKit] Current audio size: 96000 samples, most recent buffer: 1600 samples, most recent energy: (0.48100042, 0.03933922, 0.13287143, 2.9166695e-05)

[WhisperKit] Current audio size: 128000 samples, most recent buffer: 1600 samples, most recent energy: (0.293728, 0.011141126, 0.041470844, 5.8486767e-06)

[WhisperKit] Current audio size: 160000 samples, most recent buffer: 1600 samples, most recent energy: (0.3946247, 0.022941826, 0.073072806, 3.928726e-06)

[WhisperKit] Current audio size: 192000 samples, most recent buffer: 1600 samples, most recent energy: (0.054507904, 0.0029965553, 0.0102628, 8.92207e-07)

[WhisperKit] Decoder init time: 0.006031990051269531
[WhisperKit] Prefill time: 0.00011301040649414062
[WhisperKit] Prefill prompt: ["<|startoftranscript|>", "<|en|>", "<|transcribe|>", "<|0.00|>"]
[WhisperKit] Decoding Seek: 0
[WhisperKit] Decoding 0.0s - 13.3s
[WhisperKit] Decoding with tempeartures [0.0, 0.2, 0.4, 0.5996]
[WhisperKit] Decoding Temperature: 0.0
[WhisperKit] Running main loop for a maximum of 224 iterations, starting at index 0
[WhisperKit] Forcing token 50258 at index 0 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length: 0 Input Token: 50258
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] 0.000000 | 0.000000 | 1 | 0 | 0
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 1
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 2
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 3
[WhisperKit] tokenIndex: 0, token: 50362, word: <|nocaptions|>
[WhisperKit] Forcing token 50259 at index 1 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length: 1 Input Token: 50259
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] -0.125732 | 0.048828 | 0 | 0 | 0
[WhisperKit] 0.000000 | 0.000000 | 1 | 0 | 1
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 2
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 3
[WhisperKit] tokenIndex: 1, token: 50359, word: <|transcribe|>
[WhisperKit] Forcing token 50359 at index 2 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length: 2 Input Token: 50359
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] -0.125732 | 0.048828 | 0 | 0 | 0
[WhisperKit] 0.308350 | -0.556641 | 0 | 0 | 1
[WhisperKit] 0.000000 | 0.000000 | 1 | 0 | 2
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 3
[WhisperKit] Fallback #1.0 (logProbThreshold)
[WhisperKit] Decoding Temperature: 0.2
[WhisperKit] Running main loop for a maximum of 224 iterations, starting at index 0
[WhisperKit] Forcing token 50258 at index 0 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length: 0 Input Token: 50258
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] -0.125732 | 0.048828 | 1 | 0 | 0
[WhisperKit] 0.308350 | -0.556641 | 0 | -10000 | 1
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 2
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 3
[WhisperKit] tokenIndex: 0, token: 50362, word: <|nocaptions|>
[WhisperKit] Forcing token 50259 at index 1 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length: 1 Input Token: 50259
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] -0.125732 | 0.048828 | 0 | 0 | 0
[WhisperKit] 0.308350 | -0.556641 | 1 | 0 | 1
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 2
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 3
[WhisperKit] tokenIndex: 1, token: 50359, word: <|transcribe|>
[WhisperKit] Forcing token 50359 at index 2 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length: 2 Input Token: 50359
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] -0.125732 | 0.048828 | 0 | 0 | 0
[WhisperKit] 0.308350 | -0.556641 | 0 | 0 | 1
[WhisperKit] 0.000000 | 0.000000 | 1 | 0 | 2
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 3
[WhisperKit] [0.00 --> 13.30] <|startoftranscript|><|en|><|transcribe|><|0.00|><|endoftext|>
[WhisperKit] ---- Transcription Timings ----
[WhisperKit] Audio Load: 0.00 ms / 1 runs ( 0.00 ms/run) 0.00%
[WhisperKit] Audio Processing: 0.60 ms / 1 runs ( 0.60 ms/run) 0.03%
[WhisperKit] Mels: 135.32 ms / 1 runs ( 135.32 ms/run) 6.90%
[WhisperKit] Encoding: 939.16 ms / 1 runs ( 939.16 ms/run) 47.92%
[WhisperKit] Matrices Init: 6.03 ms / 1 runs ( 6.03 ms/run) 0.31%
[WhisperKit] Prefill: 0.11 ms / 1 runs ( 0.11 ms/run) 0.01%
[WhisperKit] Decoding: 855.78 ms / 4 runs ( 213.94 ms/run) 43.66%
[WhisperKit] Non-inference: 17.11 ms / 4 runs ( 4.28 ms/run) 0.87%
[WhisperKit] - Sampling: 10.92 ms / 4 runs ( 2.73 ms/run) 0.56%
[WhisperKit] - Kv Caching: 4.92 ms / 4 runs ( 1.23 ms/run) 0.25%
[WhisperKit] - Windowing: 0.15 ms / 1 runs ( 0.15 ms/run) 0.01%
[WhisperKit] Fallbacks: 773.91 ms / 0 runs ( 0.00 ms/run) 39.48%
[WhisperKit] Decoding Full Loop: 1953.54 ms / 4 runs ( 488.39 ms/run) 99.67%
[WhisperKit] -------------------------------
[WhisperKit] Model Load Time: 3.55 seconds
[WhisperKit] Inference Duration: 1.96 seconds
[WhisperKit] - Decoding Loop: 1.95 seconds
[WhisperKit] Time to first token: 1.79 seconds
[WhisperKit] Total Tokens: 9
[WhisperKit] Tokens per Second: 2.05 tok/s
[WhisperKit] Real Time Factor: 0.15
[WhisperKit] Fallbacks: 0.0
[WhisperKit] [0.00 --> 13.30] <|startoftranscript|><|en|><|transcribe|><|0.00|><|endoftext|>
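
The stride values in the ANE dump above are consistent with each row being padded up to a multiple of 64 bytes: a Float16 row of width 1500 occupies 3000 bytes, and the reported `RowStride` is 3008. The sketch below only reproduces that arithmetic. That the Neural Engine pads rows this way, and that this relates to the `Validation failure` lines, is an inference from the log rather than something confirmed in this thread. `alignedStride` is a hypothetical helper.

```swift
/// Hypothetical helper: rounds a byte count up to the next multiple of `alignment`.
func alignedStride(bytes: Int, alignment: Int = 64) -> Int {
    return (bytes + alignment - 1) / alignment * alignment
}

let float16Size = 2                                        // "format size 2 bytes"
let rowStride = alignedStride(bytes: 1500 * float16Size)   // Width = 1500
let batchStride = rowStride * 512                          // Channels = 512

print(rowStride, batchStride)  // 3008 1540096
```

Both results match the `RowStride` and `BatchStride` values reported for `encoder_output_embeds_eir` in the dump.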

ZachNagengast · 2024-02-06T05:22:06Z

Here is the problem:

[WhisperKit] Cache Length: 2 Input Token: 50359
[WhisperKit] Key Cache | Val Cache | Update Mask | Decoder Mask | Position
[WhisperKit] -0.125732 | 0.048828 | 0 | 0 | 0
[WhisperKit] 0.308350 | -0.556641 | 0 | 0 | 1
[WhisperKit] 0.000000 | 0.000000 | 1 | 0 | 2
[WhisperKit] 0.000000 | 0.000000 | 0 | -10000 | 3
[WhisperKit] Fallback #1.0 (logProbThreshold)
[WhisperKit] Decoding Temperature: 0.2

The loop is hitting its log prob threshold before the end of text, which is likely part of the early stopping mechanism in the app. Normally that is not an issue so early in decoding, but perhaps on the iPhone 11 it is.

To see whether this solves the issue, try editing line 929 of ContentView.swift in the example app and adding a logProbThreshold parameter like this:

        let options = DecodingOptions(
            verbose: false,
            task: task,
            language: languageCode,
            temperatureFallbackCount: 3, // limit fallbacks for realtime
            sampleLength: Int(sampleLength), // reduced sample length for realtime
            usePrefillPrompt: enablePromptPrefill,
            usePrefillCache: enableCachePrefill,
            skipSpecialTokens: !enableSpecialCharacters,
            withoutTimestamps: !enableTimestamps,
            clipTimestamps: seekClip,
            logProbThreshold: 0 //<-------- add this line
        )
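
To make the threshold's role concrete, here is a minimal sketch of the kind of check that triggers a temperature fallback in Whisper-style decoders: the average log probability of the decoded tokens is compared against a threshold. This illustrates the general technique and is an assumption, not WhisperKit's actual code; `needsFallback` and the sample values are hypothetical.

```swift
import Foundation

/// Hypothetical sketch of a Whisper-style fallback check (not WhisperKit's
/// implementation): fall back when the average token log probability drops
/// below the threshold. A nil threshold disables the check.
func needsFallback(tokenLogProbs: [Float], logProbThreshold: Float?) -> Bool {
    guard let threshold = logProbThreshold, !tokenLogProbs.isEmpty else {
        return false
    }
    let average = tokenLogProbs.reduce(0, +) / Float(tokenLogProbs.count)
    return average < threshold
}

// Made-up log probabilities for illustration.
let confident: [Float] = [-0.1, -0.2, -0.3]   // average is about -0.2
let uncertain: [Float] = [-1.5, -2.0, -2.5]   // average is -2.0

print(needsFallback(tokenLogProbs: confident, logProbThreshold: -1.0))  // false
print(needsFallback(tokenLogProbs: uncertain, logProbThreshold: -1.0))  // true
print(needsFallback(tokenLogProbs: uncertain, logProbThreshold: nil))   // false
```

Under this comparison, a threshold of 0 would trigger a fallback for any negative average, so whether `logProbThreshold: 0` relaxes or tightens the check depends on how WhisperKit applies the value, which this thread does not show.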

Will attempt to reproduce this in the meantime, thanks for the report 👍

woodymoo · 2024-02-08T10:40:55Z

I am sorry, I still get the same error.

ZachNagengast · 2024-02-16T22:04:24Z

@woodymoo Checking in on this: it appears to be a device-specific bug. If anyone can reproduce this consistently on an iPhone 11 Pro, please post here with any other info.

cyrilzakka · 2024-02-26T07:09:38Z

This feels related, but why does the TestFlight version run faster than the Xcode one? Is it a Debug vs. Release build difference, or further optimizations? Transcription appears with a 10-second delay with Xcode (same model size).

iamyoungjo · 2024-05-23T08:51:40Z

No solution yet? I am experiencing the same issue on iPad8,8. I tried the "logProbThreshold: 0" tip above, but no luck.

[WhisperKit] Running on iPad8,8
[WhisperKit] Loading models from /var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base.en with prewarmMode: true
[WhisperKit] Loading feature extractor
[WhisperKit] Loaded feature extractor
[WhisperKit] Loading audio encoder

[WhisperKit] Loaded audio encoder
[WhisperKit] Loading text decoder
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
doUnloadModel:options:qos:error:: model=_ANEModel: { modelURL=file:///var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Library/Caches/PG.AutoDoc/com.apple.e5rt.e5bundlecache/21E236/FC9E22833F2E559D5C91AFDFA46A92DD405A337FA2FEEEDE3D7B1FB0404809A9/2AFB581F639499133FE0F915E992E19CD88B045C19D172DD4F1DAA30EA54D09D.bundle/H11G.bundle/main/main_eir/ : sourceURL= (null) : key={"isegment":2,"inputs":{"denom_15_cast_fp16_ctx_tx_default__2":{"shape":[1,1,1,1,1]},"zero_mean_15_cast_fp16_ctx_tx_default__2":{"shape":[1,1,1,512,1]},"encoder_output_embeds_eir":{"shape":[1500,1,1,512,1]}},"outputs":{"query_11_cast_fp16":{"shape":[1,1,1,512,1]},"key_11_cast_fp16":{"shape":[1500,1,1,512,1]},"value_11_cast_fp16":{"shape":[1500,1,1,512,1]}}} : identifierSource=0 : cacheURLIdentifier=35CA4612A76C1BF9B086EADFED6E5004AD9F6DBC52F394148B6617D7FFCA8A4F_1E58A2423EF9EAAC837AC75C1D878F925524A1ACFF44DBDFB0EA68EEC3613D0B : string_id=0x00000000 : program=_ANEProgramForEvaluation: { programHandle=42126760470 : intermediateBufferHandle=42126814457 : queueDepth=127 } : state=3 : programHandle=42126760470 : intermediateBufferHandle=42126814457 : queueDepth=127 : attr={
    ANEFModelDescription =     {
        ANEFModelInput16KAlignmentArray =         (
            1,
            1,
            1
        );
        ANEFModelOutput16KAlignmentArray =         (
            1,
            1,
            1
        );
        ANEFModelProcedures =         (
                        {
                ANEFModelInputSymbolIndexArray =                 (
                    0,
                    1,
                    2
                );
                ANEFModelOutputSymbolIndexArray =                 (
                    0,
                    1,
                    2
                );
                ANEFModelProcedureID = 0;
            }
        );
        kANEFModelInputSymbolsArrayKey =         (
            242597c4eae0168622c8ad87213eca2b,
            cd370f0cd20ef8258ccb4fd6beb1d568,
            "encoder_output_embeds_eir"
        );
        kANEFModelOutputSymbolsArrayKey =         (
            "key_11_cast_fp16@output",
            "query_11_cast_fp16@output",
            "value_11_cast_fp16@output"
        );
        kANEFModelProcedureNameToIDMapKey =         {
            "net_2" = 0;
        };
    };
    NetworkStatusList =     (
                {
            LiveInputList =             (
                                {
                    BatchStride = 64;
                    Batches = 1;
                    Channels = 1;
                    Depth = 1;
                    DepthStride = 64;
                    Height = 1;
                    Interleave = 1;
                    Name = 242597c4eae0168622c8ad87213eca2b;
                    PlaneCount = 1;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = 242597c4eae0168622c8ad87213eca2b;
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 32768;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 32768;
                    Height = 1;
                    Interleave = 1;
                    Name = cd370f0cd20ef8258ccb4fd6beb1d568;
                    PlaneCount = 512;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = cd370f0cd20ef8258ccb4fd6beb1d568;
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "encoder_output_embeds_eir";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "encoder_output_embeds_eir";
                    Type = Float16;
                    Width = 1500;
                }
            );
            LiveOutputList =             (
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "key_11_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "key_11_cast_fp16@output";
                    Type = Float16;
                    Width = 1500;
                },
                                {
                    BatchStride = 32768;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 32768;
                    Height = 1;
                    Interleave = 1;
                    Name = "query_11_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = "query_11_cast_fp16@output";
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "value_11_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "value_11_cast_fp16@output";
                    Type = Float16;
                    Width = 1500;
                }
            );
            Name = "net_2";
        }
    );
} : perfStatsMask=0}  was not loaded by the client.
doUnloadModel:options:qos:error:: model=_ANEModel: { modelURL=file:///var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Library/Caches/PG.AutoDoc/com.apple.e5rt.e5bundlecache/21E236/FC9E22833F2E559D5C91AFDFA46A92DD405A337FA2FEEEDE3D7B1FB0404809A9/2AFB581F639499133FE0F915E992E19CD88B045C19D172DD4F1DAA30EA54D09D.bundle/H11G.bundle/main/main_eir/ : sourceURL= (null) : key={"isegment":1,"inputs":{"zero_mean_9_cast_fp16_ctx_tx_default__1":{"shape":[1,1,1,512,1]},"encoder_output_embeds_eir":{"shape":[1500,1,1,512,1]},"denom_9_cast_fp16_ctx_tx_default__1":{"shape":[1,1,1,1,1]}},"outputs":{"key_7_cast_fp16":{"shape":[1500,1,1,512,1]},"value_7_cast_fp16":{"shape":[1500,1,1,512,1]},"query_7_cast_fp16":{"shape":[1,1,1,512,1]}}} : identifierSource=0 : cacheURLIdentifier=35CA4612A76C1BF9B086EADFED6E5004AD9F6DBC52F394148B6617D7FFCA8A4F_DB53ACF841ED8B23A05AE898F9B7413FCEA9552468D5A4F5E21F5093C216CABC : string_id=0x00000000 : program=_ANEProgramForEvaluation: { programHandle=42119499582 : intermediateBufferHandle=42119552918 : queueDepth=127 } : state=3 : programHandle=42119499582 : intermediateBufferHandle=42119552918 : queueDepth=127 : attr={
    ANEFModelDescription =     {
        ANEFModelInput16KAlignmentArray =         (
            1,
            1,
            1
        );
        ANEFModelOutput16KAlignmentArray =         (
            1,
            1,
            1
        );
        ANEFModelProcedures =         (
                        {
                ANEFModelInputSymbolIndexArray =                 (
                    0,
                    1,
                    2
                );
                ANEFModelOutputSymbolIndexArray =                 (
                    0,
                    1,
                    2
                );
                ANEFModelProcedureID = 0;
            }
        );
        kANEFModelInputSymbolsArrayKey =         (
            055d8d371a86daad353b3fde75fdd997,
            d9a0409d949391bc8fa8e96671e9c79b,
            "encoder_output_embeds_eir"
        );
        kANEFModelOutputSymbolsArrayKey =         (
            "key_7_cast_fp16@output",
            "query_7_cast_fp16@output",
            "value_7_cast_fp16@output"
        );
        kANEFModelProcedureNameToIDMapKey =         {
            "net_1" = 0;
        };
    };
    NetworkStatusList =     (
                {
            LiveInputList =             (
                                {
                    BatchStride = 64;
                    Batches = 1;
                    Channels = 1;
                    Depth = 1;
                    DepthStride = 64;
                    Height = 1;
                    Interleave = 1;
                    Name = 055d8d371a86daad353b3fde75fdd997;
                    PlaneCount = 1;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = 055d8d371a86daad353b3fde75fdd997;
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 32768;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 32768;
                    Height = 1;
                    Interleave = 1;
                    Name = d9a0409d949391bc8fa8e96671e9c79b;
                    PlaneCount = 512;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = d9a0409d949391bc8fa8e96671e9c79b;
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "encoder_output_embeds_eir";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "encoder_output_embeds_eir";
                    Type = Float16;
                    Width = 1500;
                }
            );
            LiveOutputList =             (
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "key_7_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "key_7_cast_fp16@output";
                    Type = Float16;
                    Width = 1500;
                },
                                {
                    BatchStride = 32768;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 32768;
                    Height = 1;
                    Interleave = 1;
                    Name = "query_7_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 64;
                    RowStride = 64;
                    Symbol = "query_7_cast_fp16@output";
                    Type = Float16;
                    Width = 1;
                },
                                {
                    BatchStride = 1540096;
                    Batches = 1;
                    Channels = 512;
                    Depth = 1;
                    DepthStride = 1540096;
                    Height = 1;
                    Interleave = 1;
                    Name = "value_7_cast_fp16@output";
                    PlaneCount = 512;
                    PlaneStride = 3008;
                    RowStride = 3008;
                    Symbol = "value_7_cast_fp16@output";
                    Type = Float16;
                    Width = 1500;
                }
            );
            Name = "net_1";
        }
    );
} : perfStatsMask=0}  was not loaded by the client.
[WhisperKit] Loaded text decoder
[WhisperKit] Loading models from /var/mobile/Containers/Data/Application/826DAB14-7584-42C5-995D-DB610EC2F560/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base.en with prewarmMode: false
[WhisperKit] Loading feature extractor
[WhisperKit] Loaded feature extractor
[WhisperKit] Loading audio encoder
[WhisperKit] Loaded audio encoder
[WhisperKit] Loading text decoder
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
Validation failure: Invalid input tensor channel 1 and format size 2 bytes, must be aligned on 64 bytes
[WhisperKit] Loaded text decoder
[WhisperKit] Loading tokenizer for base.en
[WhisperKit] Loaded tokenizer
[WhisperKit] Loaded models for whisper size: base.en

[WhisperKit] Current audio size: 32000 samples, most recent buffer: 1600 samples, most recent energy: (0.063490346, 0.00058010797, 0.0021451442, 1.3345561e-07)

[WhisperKit] Current audio size: 64000 samples, most recent buffer: 1600 samples, most recent energy: (0.038249217, 0.0006892931, 0.0022904596, 7.053459e-07)
[WhisperKit] Decoder init time: 0.012899041175842285
[WhisperKit] Prefill time: 0.0006909370422363281
[WhisperKit] Prefill prompt: ["<|startoftranscript|>", "<|0.00|>"]
[WhisperKit] Decoding Seek: 0
[WhisperKit] Current audio size: 96000 samples, most recent buffer: 1600 samples, most recent energy: (0.0035468133, 0.0005462394, 0.0016842313, 7.582712e-07)

[WhisperKit] Decoding 0.0s - 5.1s
[WhisperKit] Decoding with tempeartures [0.0, 0.2, 0.4, 0.5996, 0.8, 1.0]
[WhisperKit] Decoding Temperature: 0.0
[WhisperKit] Running main loop for a maximum of 223 iterations, starting at index 0
[WhisperKit] Forcing token 50257 at index 0 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length:  0 Input Token: 50257
[WhisperKit] Key Cache | Val Cache | Align Cache | Update Mask | Decoder Mask | Position
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           1 |            0 | 0
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 1
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 2
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 3
[WhisperKit] Current audio size: 128000 samples, most recent buffer: 1600 samples, most recent energy: (0.009261748, 0.00054546853, 0.0017254573, 3.4167897e-08)
[WhisperKit] tokenIndex: 0, token: 50361, word: <|nocaptions|>
[WhisperKit] Forcing token 50363 at index 1 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length:  1 Input Token: 50363
[WhisperKit] Key Cache | Val Cache | Align Cache | Update Mask | Decoder Mask | Position
[WhisperKit]  0.476074 | -0.014931 |  0.000000 |           0 |            0 | 0
[WhisperKit]  0.000000 |  0.000000 |  0.024368 |           1 |            0 | 1
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 2
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 3
[WhisperKit] tokenIndex: 1, token: 357, word:  (
[WhisperKit] Early stopping
[WhisperKit] Fallback #1.0 (logProbThreshold)
[WhisperKit] Decoding Temperature: 0.2
[WhisperKit] Running main loop for a maximum of 223 iterations, starting at index 0
[WhisperKit] Forcing token 50257 at index 0 from initial prompt
[WhisperKit] --------------- DECODER INPUTS DEBUG ---------------
[WhisperKit] Cache Length:  0 Input Token: 50257
[WhisperKit] Key Cache | Val Cache | Align Cache | Update Mask | Decoder Mask | Position
[WhisperKit]  0.476074 | -0.014931 |  0.000000 |           1 |            0 | 0
[WhisperKit]  0.482666 |  0.555664 |  0.024368 |           0 |       -10000 | 1
[WhisperKit]  0.000000 |  0.000000 |  0.032623 |           0 |       -10000 | 2
[WhisperKit]  0.000000 |  0.000000 |  0.000000 |           0 |       -10000 | 3

ZachNagengast added the "bug" and "help wanted" labels on Feb 16, 2024

iamyoungjo mentioned this issue on May 23, 2024: Experiencing crash on iPad8,8. #144 (Open)
