
[BUG] Problem with escaped characters in LLM answers #73

Open
oderwat opened this issue Apr 24, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@oderwat

oderwat commented Apr 24, 2023

Describe the bug

When asking to write a program, the LLM may output escaped characters like the following sequence:

//>  fmt
//> .
//> Print
//> f
//> ("%
//> d
//> \
//> n
//> ",
//> i
//> )
//>

This will end up in the frontend as an actual line feed, which makes the generated code invalid.
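To illustrate the failure mode, here is a small JavaScript sketch (hypothetical, not the project's actual code): joining the streamed tokens above yields a Go snippet containing a literal backslash-n, and a naive unescape pass collapses it into a real line feed.

```javascript
// Sketch of the failure mode described above (hypothetical pipeline
// step). Joining the streamed tokens yields Go source that contains a
// LITERAL backslash followed by 'n'; a naive unescape pass turns that
// two-character sequence into a real line feed and breaks the code.
const tokens = ['fmt', '.', 'Print', 'f', '("%', 'd', '\\', 'n', '",', ' i', ')'];
const joined = tokens.join('');              // fmt.Printf("%d\n", i)  -- backslash + n
const naive = joined.replace(/\\n/g, '\n');  // backslash-n collapsed to an actual LF

console.log(joined.includes('\n'));  // false: still valid Go source
console.log(naive.includes('\n'));   // true: the string literal is now split
```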

To Reproduce

Using alpaca-native-13B-ggml.bin

Prompt: Write a Go program that counts from 1 to 10

The prompt may need to be repeated to get a version where the LLM answers with something using fmt.Printf().

Expected behavior

The output should be correctly escaped for such code usage. I guess it is not easy to detect whether the model output needs to be taken literally, as it looks as if it outputs '\n' for line feeds anyway. But maybe there is some solution I do not see.

Even if not perfect, it might be better to escape things whenever the LLM output contains double quotes?

@oderwat oderwat added the bug Something isn't working label Apr 24, 2023
@ItsPi3141
Owner

I understand the problem here. However, most of the time the AI isn't used to generate code, and it rarely uses line breaks inside strings in the code it does generate. It would also be very hard to recognize whether a quote belongs to a string in a programming language or to other text (e.g. a book, news article, etc.).

@oderwat
Author

oderwat commented Apr 24, 2023

Well, I use AI (GPT-4) about 80% of the time to reason about or generate code, and it would be wonderful to be able to do that with actual private data (which is also why I don't want anything sent to Duck Duck Go without being asked first).

But I already tested and found that llama.cpp does not output line feeds in the shell as \n, while it outputs escaped strings like the \n inside the code perfectly fine. See this example generated with chat_mac_x64:

> write a go program that outputs the numbers 1 to 5 together with their written names
func main() {
    for i := 1; i <= 5; i++ {
        fmt.Printf("%d - %s\n", i, english[i-1])
    }
}


It seems as if you escape the shell output in a way that does not let the frontend distinguish an actual LF from the two-character \n sequence. I understand that it is sometimes mind-twisting to get all the escapes right; I guess you can fix this by looking into your pipeline processing and adjusting the escapes used.
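One way the two cases could be kept distinct (a sketch only, assuming chunks cross the backend/frontend boundary as plain strings; this is not the project's actual pipeline) is to JSON-encode each chunk before sending it and decode it on the other side, so a literal backslash-n and a real line feed survive as different values:

```javascript
// Hypothetical encode/decode pair for the backend -> frontend hop.
// JSON.stringify escapes a real LF to the two characters \n and a
// literal backslash to \\, so the two cases remain distinguishable,
// and JSON.parse restores the exact original string losslessly.
function encodeChunk(chunk) {
  return JSON.stringify(chunk);
}
function decodeChunk(encoded) {
  return JSON.parse(encoded);
}

const codeChunk = 'fmt.Printf("%d\\n", i)'; // contains a literal \n
const textChunk = 'line one\nline two';     // contains a real line feed

console.log(decodeChunk(encodeChunk(codeChunk)) === codeChunk); // true
console.log(decodeChunk(encodeChunk(textChunk)) === textChunk); // true
```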

Likewise, I may look into it myself if you can't find a solution; I am pretty confident that this can be solved.

@ItsPi3141
Owner

ItsPi3141 commented Apr 24, 2023

Ok, thanks for letting me know! I did not know that. I'll look into this and see what I can do.
