I get this error when I open my local server: Cannot read properties of undefined (reading 'text') #461
🤖 Hey @Darrenf040, great to see you back! Hope you're doing well. The error "Cannot read properties of undefined (reading 'text')" typically occurs when the code is trying to access a property of an object that hasn't been defined. In the context of the gpt4-pdf-chatbot-langchain, this could be due to several reasons:
If you have checked all these points and the error still persists, you might want to

Regarding the structure of the object that the 'text' property is being read from: it is being read from the 'parsed' object. The 'parsed' object is the result of the 'pdf' function being called with 'raw' as an argument. The 'pdf' function is imported from the 'pdf-parse/lib/pdf-parse.js' module. The 'text' property contains the content of the PDF document, and the 'numpages' property contains the number of pages in the PDF document. Here is the relevant code:

    const parsed = await pdf(raw);
    return [
      new Document({
        pageContent: parsed.text,
        metadata: {
          ...metadata,
          pdf_numpages: parsed.numpages,
        },
      }),
    ];

You can find this in the utils/customPDFLoader.ts file.

I also found some similar issues in the repository that might help you:
Please check these issues and see if the solutions provided there can help you resolve your problem.
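A minimal sketch of guarding against this error, assuming the parse result can come back undefined or without a 'text' field. The `ParsedPdf` interface and the `requireParsedText` helper are hypothetical names for illustration; they are not part of the repository or of pdf-parse.

```typescript
// Hypothetical shape of the parser result; only the two fields mentioned
// above ('text' and 'numpages') are modeled here.
interface ParsedPdf {
  text: string;
  numpages: number;
}

// Hypothetical helper: fail with a descriptive message that names the file,
// instead of the generic "Cannot read properties of undefined (reading 'text')".
function requireParsedText(parsed: ParsedPdf | undefined, source: string): ParsedPdf {
  if (parsed === undefined || typeof parsed.text !== "string") {
    throw new Error(`No text could be extracted from ${source}`);
  }
  return parsed;
}
```

Calling such a check right after the parse step would at least point at the file that produced no text, which makes the failing input easier to find.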
|
How do I check the Pinecone and LangChain versions? |
My model name is 'gpt-3.5-turbo', since I don't have GPT-4. |
Hi @Darrenf040, I have the same error. Look at the end of the error message:
for me it says to look in ingest-data.ts, line 52. This can help you find which data is missing.
|
I believe there are now type errors when ingesting using newest Pinecone types. I think they want you to now convert the embeddings into vectors and upsert in the new way? But again, this has conflicts with the "makechain" script. Here's the new upsert Pinecone wants you to use: "https://docs.pinecone.io/docs/upsert-data" LMK if you made any progress on cleaning up the types. I'm also incredibly stuck! |
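For anyone trying the newer upsert style, here is a rough sketch of shaping embeddings into records before upserting. The record fields (`id`, `values`, `metadata`) follow the format described in the Pinecone upsert docs linked above; the `toRecords` helper, the id scheme, and storing the chunk under a "text" metadata key are assumptions for illustration. No client call is shown.

```typescript
// Record shape for an upsert: an id, the embedding values, optional metadata.
interface VectorRecord {
  id: string;
  values: number[];
  metadata?: Record<string, string | number | boolean>;
}

// Hypothetical helper: pair each text chunk with its embedding.
// Assumes texts[i] corresponds to embeddings[i].
function toRecords(texts: string[], embeddings: number[][], idPrefix: string): VectorRecord[] {
  if (texts.length !== embeddings.length) {
    throw new Error("texts and embeddings must have the same length");
  }
  return texts.map((text, i) => {
    const values = embeddings[i];
    if (values === undefined) {
      throw new Error(`missing embedding for chunk ${i}`);
    }
    // The "text" metadata key is an assumption; it should match whatever
    // key the retrieval side reads the chunk text from.
    return { id: `${idPrefix}-${i}`, values, metadata: { text } };
  });
}
```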
Make sure you are using a pod-based Pinecone index. The serverless index doesn't work. |
I'm stuck on this as well. I have a forked repo with an extended feature set at https://github.com/anandaworldwide/ananda-library-chatbot and it is failing with the error "Cannot read properties of undefined (reading 'text')." I tried upgrading to langchain 0.1.30 but that didn't help and caused other issues from breaking changes. What is the "text" textKey parameter here? ChatGPT suggested changing it to pageContent, which is a field in my document data, but on smaller datasets it is finding the content using "text". I haven't located API docs to explain it.
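One way to picture textKey, as a simplified model and not LangChain's actual implementation: it names the metadata field that holds each chunk's text, and a match whose metadata is missing is the kind of input that could produce "Cannot read properties of undefined (reading 'text')". The `Match` interface and `pageContentsFromMatches` function below are hypothetical.

```typescript
// Hypothetical, simplified stand-in for a vector-store query match.
interface Match {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

// Read the stored text from metadata[textKey]. Matches with no metadata, or
// with no string under that key, are skipped instead of causing a TypeError.
function pageContentsFromMatches(matches: Match[], textKey: string = "text"): string[] {
  const contents: string[] = [];
  for (const match of matches) {
    const value = match.metadata?.[textKey];
    if (typeof value === "string") {
      contents.push(value);
    }
  }
  return contents;
}
```

Under this model, changing textKey to "pageContent" would only help if the vectors were stored with their text under that key in the first place.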
|
I've given up on serverless Pinecone with this project because there's unresolved type errors between Pinecone serverless docs and Langchain.JS that I cannot figure out. From what I've read about this project, pod-based storage is the way to go since serverless Pinecone is still experimental. |
I did successfully upsert pdfs in serverless using typescript but when I tried to search it threw errors with the makeChain function for me. |
Yes I'm using the free pod-based. Thanks for textKey info!
On Thu, Mar 28, 2024 at 6:45 PM richard-aoede wrote:
textKey is the actual text being stored as metadata within Pinecone, I believe.
|
I just figured out how to reproduce the error and how to fix it. I noticed that sometimes when I get the error, I also get a JavaScript heap out of memory error as a secondary exception. In the test I just did, however, I only got the primary error. But expanding the JavaScript memory allocation solves it! The change to a line in package.json that fixes it for me: Failure: And here's a failure that includes the JavaScript heap out of memory error:
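The package.json line itself did not survive in this comment. For reference, Node's heap ceiling is commonly raised with the `--max-old-space-size` flag (a value in megabytes), for example via the `NODE_OPTIONS` environment variable; whether that is the exact change made here is an assumption. A small sketch for checking which limit the running process actually has (`heapLimitMb` is a hypothetical helper name):

```typescript
import { getHeapStatistics } from "node:v8";

// Report the V8 heap size limit of the current process, in megabytes.
// Running this before and after changing the flag shows whether the new
// setting actually reached the process.
function heapLimitMb(): number {
  return Math.round(getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMb()} MB`);
```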
|
I found the above sometimes still failed. So I upgraded langchain to 0.1.30 (and had to adapt the code a bit). But that didn't do it, so I upgraded @pinecone-database/pinecone to 1.1.3, and now it seems to work. It was never a problem when I processed only 4000 PDF files. The problem only came up when I processed my full set of 6000. So I'm guessing there was a memory leak in pinecone that got resolved in a later version. (Tho I'm still verifying things... vector count is 1/2 of before so perhaps I'm not processing as much as I think.) |
@mowliv That's awesome that you got a whopping 6000 PDFs! I've got 52 PDFs ranging from 50 to 300 pages. I'm getting really low scores when trying to search. In the examples I saw, they get scores up to 0.80. I'm only getting less than 0.10 on my best-scoring contexts. Please let me know if you have any more updates! I'll try to share mine on this thread, or we can continue through DMs.
|
Oh and here's my snippet but it's pretty messy lol. Note that I borrowed someone else's idea of having multiple pdfs from a flat directory (ex: "farm_animals/chickens.pdf", "farm_animals/cows.pdf", "home_animals/dogs.pdf", "home_animals/cats.pdf"). USING SERVERLESS pinecone.
|
I set up everything like the documentation said, and when I run 'npm run dev' to start my local server, the UI gives me this error when I try to chat with the bot: Cannot read properties of undefined (reading 'text').