Check for llama_get_logits_ith() errors #7448

Open · wants to merge 9 commits into master
Conversation

jart (Contributor) commented May 21, 2024

Embedding models like BERT don't have logits. This caused the llamafile software to crash for users who tried to run inference on mxbai-embed-large-v1. This change potentially helps prevent the server from crashing. Since it is possible for this function to fail, having callers check the result is a good idea from a defensive-coding standpoint. The older exception code has also been refactored, since it's no longer needed.
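For illustration, a minimal sketch of the kind of defensive check this change calls for; the helper name get_checked_logits and its parameters are assumptions for the example, not code from the PR:

    #include <cstdio>
    #include "llama.h"

    // Sketch only: llama_get_logits_ith() can fail (embedding-only models such as
    // BERT have no logits), so check the result instead of dereferencing it blindly.
    static int get_checked_logits(llama_context * ctx, int i_batch, float ** out_logits) {
        float * logits = llama_get_logits_ith(ctx, i_batch);
        if (logits == NULL) {
            std::fprintf(stderr, "%s: no logits for batch index %d\n", __func__, i_batch);
            return 1; // report the failure instead of crashing
        }
        *out_logits = logits;
        return 0;
    }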

@compilade self-requested a review May 21, 2024 21:40
@github-actions bot added the android and examples labels May 21, 2024
Outdated review threads (resolved): llama.cpp (4), examples/perplexity/perplexity.cpp, common/sampling.cpp (2)
jart and others added 6 commits May 21, 2024 16:19, each co-authored by compilade <git@compilade.net>
jart (Contributor, Author) commented May 21, 2024

Thanks for the detailed review!

github-actions bot commented May 22, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 564 iterations 🚀

Details (performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8252.3ms p(95)=19822.05ms fails=, finish reason: stop=512 truncated=52
  • Prompt processing (pp): avg=88.16tk/s p(95)=353.3tk/s
  • Token generation (tg): avg=47.58tk/s p(95)=48.04tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=failhouse commit=a94895217c675e6d5705acc46bc314f1f242438c

[Benchmark charts: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 564 iterations. Panels: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing. Raw Mermaid xychart data not reproduced.]
jart (Contributor, Author) commented May 22, 2024
@compilade I'm reasonably certain it's passing and I've addressed your comments, if you want to take a look.

@@ -457,7 +463,9 @@ int main(int argc, char ** argv) {
                continue;
            }

-           llama_sampling_sample(drafts[s].ctx_sampling, ctx_dft, NULL, drafts[s].i_batch_dft);
+           if (llama_sampling_sample(drafts[s].ctx_sampling, ctx_dft, NULL, drafts[s].i_batch_dft) == -1) {
+               return -1;
Collaborator:

It's returning -1 here while it's returning 1 in other places for the same kind of check in the same file. Why?

jart (Contributor, Author):

It returns 1 from main() functions, which has the same effect as exit(1).

Collaborator:

Right. But it's -1 (negative one) here, so the exit code will be 255.
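
To make the exit-status point concrete (an illustrative snippet, not code from the PR): on POSIX systems the shell sees only the low 8 bits of the value returned from main(), so return -1 is reported as status 255.

    #include <cstdio>

    int main() {
        std::puts("exiting with -1");
        return -1; // `echo $?` afterwards prints 255, not -1
    }

Hence the suggestion to return 1 here, matching the other checks in the file.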

@@ -44,6 +44,7 @@ static const char * sample(struct llama_sampling_context * ctx_sampling,
                           struct llama_context * ctx_llama,
                           int * n_past) {
    const llama_token id = llama_sampling_sample(ctx_sampling, ctx_llama, NULL);
+   GGML_ASSERT(id != -1);
Collaborator:

Sometimes it's a return 1, other times it's an assertion, or an exception.
Which to use when? Should a single way be chosen?
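
For reference, a minimal sketch of the three styles being weighed here. The helper names, include paths, and surrounding context are assumptions for the example, not the PR's code:

    #include <cstdio>
    #include <stdexcept>
    #include "sampling.h"   // common/sampling.h: llama_sampling_sample(), llama_sampling_context
    #include "ggml.h"       // GGML_ASSERT

    // 1) Error code: example programs return 1 from main() so the shell sees the failure.
    static int sample_or_fail(llama_sampling_context * ctx_sampling, llama_context * ctx) {
        const llama_token id = llama_sampling_sample(ctx_sampling, ctx, NULL);
        if (id == -1) {
            std::fprintf(stderr, "sampling failed\n");
            return 1;
        }
        return 0;
    }

    // 2) Assertion: for invariants that should never fail; GGML_ASSERT aborts the
    //    whole process with a diagnostic when the condition is false.
    static llama_token sample_or_abort(llama_sampling_context * ctx_sampling, llama_context * ctx) {
        const llama_token id = llama_sampling_sample(ctx_sampling, ctx, NULL);
        GGML_ASSERT(id != -1);
        return id;
    }

    // 3) Exception: long-running code such as the server can catch the error and
    //    keep handling other requests.
    static llama_token sample_or_throw(llama_sampling_context * ctx_sampling, llama_context * ctx) {
        const llama_token id = llama_sampling_sample(ctx_sampling, ctx, NULL);
        if (id == -1) {
            throw std::runtime_error("llama_sampling_sample failed");
        }
        return id;
    }

Which style fits depends on whether the caller can recover: the examples usually cannot, while the server usually can.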

Outdated review thread (resolved): examples/server/server.cpp
@mofosyne added the review complexity : medium label May 22, 2024
Co-authored-by: compilade <git@compilade.net>
Labels: android, examples, review complexity : medium, server
3 participants