Failed to ingest your data #466

Open
MuhammadIshaq-AI opened this issue May 6, 2024 · 1 comment
Labels
kind: bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@MuhammadIshaq-AI

I am trying to ingest some PDF data using the ingest.ts code below:

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { CustomPDFLoader } from '@/utils/customPDFLoader';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';

/* Name of directory to retrieve your files from */
const filePath = 'new docs';

export const run = async () => {
  try {
    /* Load raw docs from all files in the directory */
    const directoryLoader = new DirectoryLoader(filePath, {
      '.pdf': (path) => new CustomPDFLoader(path),
    });

    const rawDocs = await directoryLoader.load();

    // Extracting the file name using regular expressions and updating metadata
    const processedDocs = rawDocs.map(doc => {
      const fileName = doc.metadata.source.match(/[^\\\/]+$/)?.[0] || doc.metadata.source;
      const modifiedMetadata = { ...doc.metadata, source: fileName };
      return { ...doc, metadata: modifiedMetadata };
    });

    /* Split text into chunks */
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });

    const docs = await textSplitter.splitDocuments(processedDocs);
    console.log('split docs', docs);

    console.log('creating vector store...');
    /* Create and store the embeddings in the vectorStore */
    const embeddings = new OpenAIEmbeddings();
    const index = pinecone.Index(PINECONE_INDEX_NAME); // Change to your own index name

    // Embed the PDF documents
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      namespace: PINECONE_NAME_SPACE,
      textKey: 'text',
    });

  } catch (error) {
    console.log('error', error);
    throw new Error('Failed to ingest your data');
  }
};

(async () => {
  await run();
  console.log('ingestion complete');
})();

I get this error when I run npm run ingest:

creating vector store...
error [ErrorWithoutStackTrace: PineconeClient: Error calling upsert: ErrorWithoutStackTrace: PineconeClient: Error calling upsertRaw: FetchError: The request failed and the interceptors did not return an alternative response]
c:\Users\Zara\Desktop\devnewgit\chatbot\scripts\ingest-data.ts:51
throw new Error('Failed to ingest your data');
^

Error: Failed to ingest your data
at run (c:\Users\Zara\Desktop\devnewgit\chatbot\scripts\ingest-data.ts:51:11)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at (c:\Users\Zara\Desktop\devnewgit\chatbot\scripts\ingest-data.ts:56:3)

Node.js v18.16.0


dosubot bot commented May 6, 2024

The FetchError raised during the PineconeClient upsert indicates that the request to Pinecone failed. The likely causes are a misconfigured client, a network problem, a problem with the data being sent, or a Pinecone outage. To narrow it down:

  1. Verify the Pinecone configuration: Confirm that the API key, index name, and other client settings (such as the environment or namespace) are correct. Mistakes in these settings are a common source of errors.

  2. Check network connectivity: Network problems can prevent communication with Pinecone's servers. Make sure the connection is stable and that no firewall or VPN is blocking the request.

  3. Review the data being upserted: The vectors must match the dimension of the index, and each request must stay within Pinecone's size limits. Oversized or malformed payloads can cause the upsert to fail.

  4. Check Pinecone's service status: Occasionally the problem is on Pinecone's end. Their status page lists any service disruptions or maintenance.
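For points 2 and 3, one way to make a failing upsert easier to diagnose is to send the documents in smaller batches and retry each batch on transient failures. The following is a minimal sketch, not part of LangChain or the Pinecone client: toBatches and withRetry are hypothetical helper names, and the attempt count and delay are assumptions to adjust.

```typescript
// Hypothetical helpers for illustration; names and defaults are assumptions.

/** Split an array into consecutive batches of at most `size` items. */
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/** Run `task`, retrying with exponential backoff each time it throws. */
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the last underlying error unchanged.
  throw lastError;
}
```

In the ingest script, this could replace the single PineconeStore.fromDocuments(docs, ...) call with a loop over toBatches(docs, 100) that wraps each per-batch call in withRetry. The batch size of 100 is an assumption; check Pinecone's current limits. If one batch keeps failing while the others succeed, the problem is more likely in that batch's data than in the network or configuration.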

If the issue persists after these steps, look more closely at the Pinecone client configuration and at the data being upserted. Searching this repository's issues and discussions for the same FetchError may also turn up a fix specific to your setup.
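To get more clues from a failed run, it can help to check the configuration up front and to keep the underlying error attached to the generic one. The script above logs the original error and then throws a new Error that no longer carries it. The following is a minimal sketch under stated assumptions: missingEnvVars, IngestError, and causeChain are hypothetical names, and the environment variable names in the usage note are assumptions about how the project is configured.

```typescript
// Hypothetical helpers for illustration; none of these come from LangChain
// or the Pinecone client.

/** Return the names of required variables that are unset or blank in `env`. */
function missingEnvVars(
  required: string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => !env[name]?.trim());
}

/** Generic ingest failure that keeps the underlying error as `cause`. */
class IngestError extends Error {
  readonly cause: unknown;

  constructor(message: string, cause: unknown) {
    super(message);
    this.name = 'IngestError';
    this.cause = cause;
  }
}

/** Collect the message of an error and of each nested cause, outermost first. */
function causeChain(error: unknown): string[] {
  const messages: string[] = [];
  let current: unknown = error;
  while (current instanceof Error) {
    messages.push(current.message);
    current = (current as { cause?: unknown }).cause;
  }
  return messages;
}
```

Before creating the vector store, the script could call missingEnvVars(['PINECONE_API_KEY', 'PINECONE_INDEX_NAME', 'OPENAI_API_KEY'], process.env) and stop early if the result is not empty. In the catch block, throwing new IngestError('Failed to ingest your data', error) instead of a plain Error keeps the FetchError reachable from the final exception.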


@dosubot dosubot bot added the kind: bug Related to a bug, vulnerability, unexpected error with an existing feature label May 6, 2024