Check for llama_get_logits_ith() errors #7448

Open · wants to merge 9 commits into master
Conversation

jart (Contributor) commented May 21, 2024

Embedding models like BERT don't have logits. This caused the llamafile software to crash for users who tried to run inference on mxbai-embed-large-v1. This change potentially helps prevent the server from crashing. Since it is possible for this function to fail, having callers check the result is a good idea from a defensive-coding standpoint. The older exception code has also been refactored, since it's no longer needed.
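For illustration, a minimal sketch of the kind of defensive check this change calls for; the helper name get_checked_logits and its parameters are assumptions for the example, not code from the PR:

    #include <cstdio>
    #include "llama.h"

    // Sketch only: llama_get_logits_ith() can fail (embedding-only models such as
    // BERT have no logits), so check the result instead of dereferencing it blindly.
    static int get_checked_logits(llama_context * ctx, int i_batch, float ** out_logits) {
        float * logits = llama_get_logits_ith(ctx, i_batch);
        if (logits == NULL) {
            std::fprintf(stderr, "%s: no logits for batch index %d\n", __func__, i_batch);
            return 1; // report the failure instead of crashing
        }
        *out_logits = logits;
        return 0;
    }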

@compilade self-requested a review May 21, 2024 21:40
@github-actions bot added the android and examples labels May 21, 2024
Outdated review threads (resolved): llama.cpp (4), examples/perplexity/perplexity.cpp, common/sampling.cpp (2)
jart and others added 6 commits May 21, 2024 16:19, each co-authored by compilade <git@compilade.net>
jart (Contributor, Author) commented May 21, 2024

Thanks for the detailed review!

github-actions bot commented May 22, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 564 iterations 🚀

Details (performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8252.3ms p(95)=19822.05ms fails=, finish reason: stop=512 truncated=52
  • Prompt processing (pp): avg=88.16tk/s p(95)=353.3tk/s
  • Token generation (tg): avg=47.58tk/s p(95)=48.04tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=failhouse commit=a94895217c675e6d5705acc46bc314f1f242438c

[Benchmark charts: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 564 iterations. Panels: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing. Raw Mermaid xychart data not reproduced.]
jart (Contributor, Author) commented May 22, 2024
@compilade I'm reasonably certain it's passing and I've addressed your comments, if you want to take a look.

@@ -457,7 +463,9 @@ int main(int argc, char ** argv) {
                continue;
            }

-           llama_sampling_sample(drafts[s].ctx_sampling, ctx_dft, NULL, drafts[s].i_batch_dft);
+           if (llama_sampling_sample(drafts[s].ctx_sampling, ctx_dft, NULL, drafts[s].i_batch_dft) == -1) {
+               return -1;
Collaborator:

It's returning -1 here while it's returning 1 in other places for the same kind of check in the same file. Why?

jart (Contributor, Author):

It returns 1 from main() functions, which has the same effect as exit(1).

Collaborator:

Right. But it's -1 (negative one) here, so the exit code will be 255.
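
To make the exit-status point concrete (an illustrative snippet, not code from the PR): on POSIX systems the shell sees only the low 8 bits of the value returned from main(), so return -1 is reported as status 255.

    #include <cstdio>

    int main() {
        std::puts("exiting with -1");
        return -1; // `echo $?` afterwards prints 255, not -1
    }

Hence the suggestion to return 1 here, matching the other checks in the file.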

@@ -44,6 +44,7 @@ static const char * sample(struct llama_sampling_context * ctx_sampling,
                           struct llama_context * ctx_llama,
                           int * n_past) {
    const llama_token id = llama_sampling_sample(ctx_sampling, ctx_llama, NULL);
+   GGML_ASSERT(id != -1);
Collaborator:

Sometimes it's a return 1, other times it's an assertion, or an exception.
Which to use when? Should a single way be chosen?
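
For reference, a minimal sketch of the three styles being weighed here. The helper names, include paths, and surrounding context are assumptions for the example, not the PR's code:

    #include <cstdio>
    #include <stdexcept>
    #include "sampling.h"   // common/sampling.h: llama_sampling_sample(), llama_sampling_context
    #include "ggml.h"       // GGML_ASSERT

    // 1) Error code: example programs return 1 from main() so the shell sees the failure.
    static int sample_or_fail(llama_sampling_context * ctx_sampling, llama_context * ctx) {
        const llama_token id = llama_sampling_sample(ctx_sampling, ctx, NULL);
        if (id == -1) {
            std::fprintf(stderr, "sampling failed\n");
            return 1;
        }
        return 0;
    }

    // 2) Assertion: for invariants that should never fail; GGML_ASSERT aborts the
    //    whole process with a diagnostic when the condition is false.
    static llama_token sample_or_abort(llama_sampling_context * ctx_sampling, llama_context * ctx) {
        const llama_token id = llama_sampling_sample(ctx_sampling, ctx, NULL);
        GGML_ASSERT(id != -1);
        return id;
    }

    // 3) Exception: long-running code such as the server can catch the error and
    //    keep handling other requests.
    static llama_token sample_or_throw(llama_sampling_context * ctx_sampling, llama_context * ctx) {
        const llama_token id = llama_sampling_sample(ctx_sampling, ctx, NULL);
        if (id == -1) {
            throw std::runtime_error("llama_sampling_sample failed");
        }
        return id;
    }

Which style fits depends on whether the caller can recover: the examples usually cannot, while the server usually can.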

Outdated review thread (resolved): examples/server/server.cpp
@mofosyne added the review complexity : medium label May 22, 2024
Co-authored-by: compilade <git@compilade.net>
Labels: android, examples, review complexity : medium, server
3 participants