I found a flaw: "Bot is cooking for too long." #139

Open
yoobaring opened this issue Nov 3, 2023 · 21 comments
Labels: bug, enhancement

Comments

@yoobaring

yoobaring commented Nov 3, 2023

image

Hi @n4ze3m, I have run into an issue: the bot stays on "Bot is cooking" for too long. It happens with documents that have many pages; with documents of only a few pages I haven't seen it. I hope this can be resolved. Please note that I'm running my tests on Railway.

@chalitbkb

chalitbkb commented Nov 3, 2023

> Hi @n4ze3m, I have run into an issue: the bot stays on "Bot is cooking" for too long. It happens with documents that have many pages; with documents of only a few pages I haven't seen it. I hope this can be resolved. Please note that I'm running my tests on Railway.

This is the same issue I've encountered as well. I've opened a ticket to report this problem before. I believe it won't be long before it gets fixed. Let's wait for the next update...

@n4ze3m
Owner

n4ze3m commented Nov 3, 2023

Which data source is causing the issue, the docx or the pdf? I know this issue occurs on Railway and works fine locally. I will look into a solution.

@yoobaring
Author

> Which data source is causing the issue, the docx or the pdf? I know this issue occurs on Railway and works fine locally. I will look into a solution.

Both

@n4ze3m n4ze3m mentioned this issue Nov 18, 2023
@n4ze3m
Owner

n4ze3m commented Nov 18, 2023

I have updated the Railway template, which may fix the file processing issues.

@yoobaring
Author

yoobaring commented Nov 18, 2023

@n4ze3m I waited 3 hours and still got the same problem. Nothing has changed, so the problem has not been resolved. My file has approximately 500-1000 pages. How many pages did the document you tested have? Please try a document with 500-1000 pages or more and you will encounter this problem.

image

@n4ze3m
Owner

n4ze3m commented Nov 18, 2023

Is this issue happening on Railway or locally?

For Railway, I think you need to reinstall the Railway template. The old one doesn't have a Docker mount, which may be causing the issue.

@yoobaring
Author

> Is this issue happening on Railway or locally?
>
> For Railway, I think you need to reinstall the Railway template. The old one doesn't have a Docker mount, which may be causing the issue.

Railway.

I tried it, and your latest version is 1.4.1.

@n4ze3m
Owner

n4ze3m commented Nov 19, 2023

Hello, can you reinstall your railway app? The latest update has mounted an upload folder, preventing the deletion of uploaded files.

Railway template: https://railway.app/template/TXdjD7

I have tested a 758-page PDF, approximately 17 MB, using Cohere embedding, and it's working without any issue.

PDF I tested: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf

image

@yoobaring
Author

yoobaring commented Nov 19, 2023

> Hello, can you reinstall your railway app? The latest update has mounted an upload folder, preventing the deletion of uploaded files.
>
> Railway template: https://railway.app/template/TXdjD7
>
> I have tested a 758-page PDF, approximately 17 MB, using Cohere embedding, and it's working without any issue.
>
> PDF I tested: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf

@n4ze3m

Please carefully watch the video. Do not fast forward or skip, as there are explanations that you need to read.

https://streamable.com/afx0tc

I have tested on the "Railway" again, and it seems that I am still encountering the same issues. Here are my observations:

  1. When testing with the "Cohere API," there are no issues when using files with the .pdf extension. However, problems arise when working with files in the .docx format.

  2. Testing with the "OpenAI API" reveals problems with files that have multiple pages, including the file you provided me for testing.

@n4ze3m
Owner

n4ze3m commented Nov 20, 2023

> When testing with the "Cohere API," there are no issues when using files with the .pdf extension. However, problems arise when working with files in the .docx format.

I will look into it. I think the issue is with the DOCX loader.

> Testing with the "OpenAI API" reveals problems with files that have multiple pages, including the file you provided me for testing.

I will test with the OpenAI API, as I think the issue may be caused by a rate limit. I will look into it.

Currently, you cannot delete a data source while it is processing. I will update the error label.
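
To illustrate the rate-limit hypothesis above: one common way to keep a large document from stalling is to embed chunks in batches and retry a rate-limited batch with exponential backoff. This is a minimal sketch, not the project's actual code. `RateLimitError`, `embed_with_backoff`, and the `embed_batch` callable are hypothetical names standing in for whatever the embedding provider's client exposes.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's 'too many requests' error."""


def embed_with_backoff(embed_batch, chunks, batch_size=100, max_retries=5,
                       base_delay=1.0, sleep=time.sleep):
    """Embed chunks in batches, retrying a batch when it is rate limited."""
    vectors = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_batch(batch))
                break  # this batch succeeded, move on to the next one
            except RateLimitError:
                if attempt == max_retries:
                    raise  # give up so the job can be marked as failed
                # Exponential backoff with jitter: about 1s, 2s, 4s, ...
                sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
    return vectors
```

Re-raising after the last retry matters here: a job that fails loudly can be shown as failed, whereas a swallowed error leaves the data source looking like it is still processing.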

n4ze3m added the bug and enhancement labels Nov 20, 2023
@n4ze3m n4ze3m mentioned this issue Nov 23, 2023
@n4ze3m
Owner

n4ze3m commented Nov 23, 2023

Hello,

I have released a new update which addresses the issue with the docx loader. This update has been tested on a 700+ page docx document on Railway using the text-embedding-ada-002 model.

The processing time for the file is approximately 2-3 minutes.

Test docx link: https://docs.google.com/document/d/18-ETRBO4yRpRl3nF68P8vTbunlBgdy_t/edit?usp=sharing&ouid=108531690400573042017&rtpof=true&sd=true

demo.mp4

@yoobaring
Author

yoobaring commented Nov 24, 2023

Hello,

> I have released a new update which addresses the issue with the docx loader. This update has been tested on a 700+ page docx document on Railway using the text-embedding-ada-002 model.
>
> The processing time for the file is approximately 2-3 minutes.
>
> Test docx link: https://docs.google.com/document/d/18-ETRBO4yRpRl3nF68P8vTbunlBgdy_t/edit?usp=sharing&ouid=108531690400573042017&rtpof=true&sd=true
>
> demo.mp4

@n4ze3m
No more words from now on. I've been waiting for 1-2 hours, and the problem remains the same. I feel so frustrated, haha :)

image

@n4ze3m
Owner

n4ze3m commented Nov 24, 2023

:| same docs ??

@yoobaring
Author

yoobaring commented Nov 24, 2023

> :| same docs ??

Yes, I have tried the OpenAI API, Cohere API, Jina API, and Llama API, but the problem persists.

image

@n4ze3m
Owner

n4ze3m commented Nov 24, 2023

I'm sorry, I don't fully understand what's happening. If you are using Railway, I highly recommend deleting the existing application and creating a new one from the latest template. I have tested it on a new Railway application.

@yoobaring
Author

> I'm sorry, I don't fully understand what's happening. If you are using Railway, I highly recommend deleting the existing application and creating a new one from the latest template. I have tested it on a new Railway application.

Is it necessary to delete the database on Supabase? I've tried reinstalling the app excluding Supabase and reinstalling it from scratch, but it still doesn't work. Do I need to delete the database to start fresh?

@n4ze3m
Owner

n4ze3m commented Nov 24, 2023

No. Make sure your database has enough space. Embeddings take up a lot of space.

I just tested the application on Railway, and it works perfectly for me. Here is the uncut version:

brave_Bf0jqbXYDB.mp4
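
For a rough sense of the database space mentioned above, the raw vector data can be estimated as chunks × dimensions × bytes per value. The sketch below uses the 758-page PDF from earlier in this thread and the 1536 dimensions of text-embedding-ada-002. The figure of 3 chunks per page and the 4-byte single-precision floats are assumptions for illustration; the real chunking and storage format may differ.

```python
def embedding_storage_mb(pages, chunks_per_page, dimensions, bytes_per_float=4):
    """Estimate raw vector storage in MiB, ignoring text, metadata and indexes."""
    chunks = pages * chunks_per_page
    return chunks * dimensions * bytes_per_float / (1024 * 1024)


# 758 pages, 3 chunks per page (assumed), 1536-dimensional vectors
estimate = embedding_storage_mb(pages=758, chunks_per_page=3, dimensions=1536)
print(f"{estimate:.1f} MB")  # prints "13.3 MB"
```

This counts only the vectors. The stored chunk text, metadata, and any vector index add to it, so a small free-tier database can fill up after a few large documents.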

@yoobaring
Author

> No. Make sure your database has enough space. Embeddings take up a lot of space.
>
> I just tested the application on Railway, and it works perfectly for me. Here is the uncut version:
>
> brave_Bf0jqbXYDB.mp4

image

@yoobaring
Author

@n4ze3m
Alright, I'm going to try applying with all new accounts this time, and we'll see how it goes.

@yoobaring
Author

@n4ze3m
I have retested, and unfortunately, I still encounter the same issue. I am frustrated with the persistent problem that has not been fully resolved. I hope it can be addressed soon. I am unsure of the root cause of this issue and feel genuinely discouraged.

@oleg-schmidt

@yoobaring: I've run into the same issue multiple times while testing on Railway and similar services. While everything worked fine in my local environment, large files caused this issue on cloud services. In the end it was simply a matter of scaling: make sure your runtime environment has at least 4 GB of RAM and 4 dedicated CPUs. To fix the issue temporarily, go into the database table that contains the latest file references and remove the one on which your bot is hanging.
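
The temporary fix described above could look roughly like the following. This is a sketch under stated assumptions, not the project's schema: the `data_source` table, its `status` column, and the `PROCESSING` value are hypothetical, and an in-memory SQLite database stands in for the real database so the example runs on its own. Check the actual table and column names before deleting anything.

```python
import sqlite3


def remove_stuck_sources(conn, stuck_status="PROCESSING"):
    """Delete data sources stuck in processing; return the removed rows."""
    rows = conn.execute(
        "SELECT id, name FROM data_source WHERE status = ?", (stuck_status,)
    ).fetchall()
    conn.execute("DELETE FROM data_source WHERE status = ?", (stuck_status,))
    conn.commit()
    return rows


# Hypothetical table with one finished and one stuck file reference.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_source (id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO data_source (name, status) VALUES (?, ?)",
    [("small.pdf", "FINISHED"), ("big.docx", "PROCESSING")],
)
removed = remove_stuck_sources(conn)
print(removed)  # prints [(2, 'big.docx')]
```

Selecting the rows before deleting them leaves a record of what was removed, which helps when re-uploading the file after the environment has been scaled up.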
