Like most of the code uploaded by Google developers, your model tuning code that uses the Stack Overflow data fails miserably, giving the errors below.
{
"summary": "Found 7 errors in your file. See 'errors' field for specific details.\nValidated 4000 examples for tokenization. Found 7 examples where either 'input_text' or 'output_text' exceeds the model token limits. See 'tokenization_issues' field for some specific examples.\nValidated 1000 examples for RAI. Found 43 examples that has RAI issues. See 'rai_issues' field for some specific examples.\n",
"max_user_input_token_length": 8177,
"tokenization_issues": [
"Row: 122. Token limit exceeded for 'input_text' [tokens: 15851|limit: 8192] or 'output_text' [tokens: 24|limit: 1024]",
"Row: 362. Token limit exceeded for 'input_text' [tokens: 13474|limit: 8192] or 'output_text' [tokens: 19|limit: 1024]",
"Row: 391. Token limit exceeded for 'input_text' [tokens: 10643|limit: 8192] or 'output_text' [tokens: 34|limit: 1024]",
"Row: 528. Token limit exceeded for 'input_text' [tokens: 9351|limit: 8192] or 'output_text' [tokens: 17|limit: 1024]",
"Row: 840. Token limit exceeded for 'input_text' [tokens: 16309|limit: 8192] or 'output_text' [tokens: 33|limit: 1024]",
"Row: 868. Token limit exceeded for 'input_text' [tokens: 20337|limit: 8192] or 'output_text' [tokens: 51|limit: 1024]",
"Row: 1535. Token limit exceeded for 'input_text' [tokens: 8969|limit: 8192] or 'output_text' [tokens: 26|limit: 1024]"
],
"rai_issues": [
"Row: 15. RAI violation. High scores for categories Finance",
"Row: 46. RAI violation. High scores for categories Finance",
"Row: 275. RAI violation. High scores for categories Finance",
"Row: 401. RAI violation. High scores for categories Finance",
"Row: 444. RAI violation. High scores for categories Health",
"Row: 503. RAI violation. High scores for categories Finance",
"Row: 558. RAI violation. High scores for categories Finance",
"Row: 571. RAI violation. High scores for categories Health",
"Row: 848. RAI violation. High scores for categories Finance",
"Row: 934. RAI violation. High scores for categories Finance",
"... there are more cases ..."
],
"errors": [
"Row: 122. exceeds token limit",
"Row: 362. exceeds token limit",
"Row: 391. exceeds token limit",
"Row: 528. exceeds token limit",
"Row: 840. exceeds token limit",
"Row: 868. exceeds token limit",
"Row: 1535. exceeds token limit"
],
"max_user_output_token_length": 79
}
Understood. I am new to this repo but an LLM enthusiast. I can try to reproduce and triage this if you share the specific use case and the exact code you ran. Here to help.
I ran into similar rai_issues even with private data. Samples were flagged when they contained a person's name or asked about going to a specific bank's website.
The errors went away once I removed those samples from my JSONL file. So, unless these examples are crucial, you could try removing them.
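For anyone who wants to script that removal, here is a minimal sketch. It assumes the validation report has the shape pasted above ("errors" and "rai_issues" lists of strings starting with "Row: N.") and that row numbers are 0-based indices into the JSONL file. The report does not state the indexing, so check it against your data and adjust `first_row` if needed. `flagged_rows` and `drop_rows` are hypothetical helper names, not part of any Google SDK.

```python
import json
import re

# Matches the "Row: 122." prefix used by the report entries shown in this issue.
ROW_PATTERN = re.compile(r"^Row: (\d+)\.")


def flagged_rows(report, fields=("errors", "rai_issues")):
    """Collect the row numbers mentioned in the given report fields."""
    rows = set()
    for field in fields:
        for message in report.get(field, []):
            match = ROW_PATTERN.match(message)
            if match:  # skips entries such as "... there are more cases ..."
                rows.add(int(match.group(1)))
    return rows


def drop_rows(jsonl_lines, rows_to_drop, first_row=0):
    """Return the JSONL lines whose row number is not flagged.

    first_row=0 is an assumption about how the report numbers rows.
    """
    return [
        line
        for index, line in enumerate(jsonl_lines, start=first_row)
        if index not in rows_to_drop
    ]


# Tiny in-memory demo with made-up examples instead of real files.
report = {
    "errors": ["Row: 1. exceeds token limit"],
    "rai_issues": [
        "Row: 3. RAI violation. High scores for categories Finance",
        "... there are more cases ...",
    ],
}
lines = [
    json.dumps({"input_text": f"question {i}", "output_text": f"title {i}"})
    for i in range(5)
]
kept = drop_rows(lines, flagged_rows(report))
print(len(kept))  # 3: rows 1 and 3 were dropped
```

Note that the "rai_issues" list in the report is truncated, so a single pass only removes the rows that are actually listed. You would likely need to re-run validation and repeat until the report comes back clean.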