I am trying to ingest some PDF data using the ingest.ts code below:
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { CustomPDFLoader } from '@/utils/customPDFLoader';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';

/* Name of directory to retrieve your files from */
const filePath = 'new docs';

export const run = async () => {
  try {
    /* Load raw docs from all files in the directory */
    const directoryLoader = new DirectoryLoader(filePath, {
      '.pdf': (path) => new CustomPDFLoader(path),
    });
    const rawDocs = await directoryLoader.load();

    // Extract the file name using a regular expression and update metadata
    const processedDocs = rawDocs.map((doc) => {
      const fileName = doc.metadata.source.match(/[^\\\/]+$/)?.[0] || doc.metadata.source;
      const modifiedMetadata = { ...doc.metadata, source: fileName };
      return { ...doc, metadata: modifiedMetadata };
    });

    /* Split text into chunks */
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });
    const docs = await textSplitter.splitDocuments(processedDocs);
    console.log('split docs', docs);

    console.log('creating vector store...');
    /* Create and store the embeddings in the vectorStore */
    const embeddings = new OpenAIEmbeddings();
    const index = pinecone.Index(PINECONE_INDEX_NAME); // Change to your own index name

    // Embed the PDF documents
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      namespace: PINECONE_NAME_SPACE,
      textKey: 'text',
    });
  } catch (error) {
    console.log('error', error);
    throw new Error('Failed to ingest your data');
  }
};

(async () => {
  await run();
  console.log('ingestion complete');
})();
I get this error when I run npm run ingest:
creating vector store...
error [ErrorWithoutStackTrace: PineconeClient: Error calling upsert: ErrorWithoutStackTrace: PineconeClient: Error calling upsertRaw: FetchError: The request failed and the interceptors did not return an alternative response]
c:\Users\Zara\Desktop\devnewgit\chatbot\scripts\ingest-data.ts:51
throw new Error('Failed to ingest your data');
^
Error: Failed to ingest your data
at run (c:\Users\Zara\Desktop\devnewgit\chatbot\scripts\ingest-data.ts:51:11)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at (c:\Users\Zara\Desktop\devnewgit\chatbot\scripts\ingest-data.ts:56:3)
Node.js v18.16.0
The error you're encountering during the upsert operation with the PineconeClient suggests a few potential causes: incorrect Pinecone configuration, network problems, issues with the data being upserted, or Pinecone service availability. Here's a focused approach to troubleshooting and resolving the error:
Verify Pinecone Configuration: Ensure that your Pinecone API key, index name, and any other configuration details are correct. Mistakes in these settings are common sources of errors.
Check Network Connectivity: Network issues can prevent successful communication with Pinecone's servers. Ensure your network connection is stable and check for any firewall or VPN that might block the connection.
Review Data Being Upserted: The format and size of the data you're trying to upsert should meet Pinecone's requirements. Oversized data or incorrect formats can cause failures.
Pinecone Service Status: Occasionally, the problem might be on Pinecone's end. Check their status page for any service disruptions or maintenance activities.
If the issue persists after these steps, review the specific configuration of your Pinecone client and the data you're attempting to upsert for more clues. Exploring discussions and issues in the repository related to similar errors may also surface solutions specific to your scenario.
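One cheap way to act on the first suggestion is to fail fast on missing configuration before any network call, so a bad .env produces a readable message instead of the opaque "Error calling upsertRaw: FetchError" above. A minimal sketch — the variable names PINECONE_API_KEY, PINECONE_ENVIRONMENT, and PINECONE_INDEX_NAME follow the usual template convention and are assumptions; match them to your own .env file:

```typescript
// Hypothetical pre-flight check for ingest-data.ts: returns the names of
// any required Pinecone settings that are missing or blank.
function checkPineconeEnv(env: Record<string, string | undefined>): string[] {
  const required = ['PINECONE_API_KEY', 'PINECONE_ENVIRONMENT', 'PINECONE_INDEX_NAME'];
  // Keep every required key that is absent or contains only whitespace
  return required.filter((key) => !env[key] || env[key]!.trim() === '');
}

// Example use at the top of run():
//   const missing = checkPineconeEnv(process.env);
//   if (missing.length > 0) {
//     throw new Error(`Missing Pinecone config values: ${missing.join(', ')}`);
//   }
```

This does not rule out network or service-side problems, but it cleanly separates "my keys never loaded" from "the request itself failed", which are the two most common readings of this FetchError.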