Set source name (via Metadata) from data when using document loaders #2361

Open
jonhilt opened this issue May 8, 2024 · 1 comment
Labels
question Further information is requested

Comments

jonhilt commented May 8, 2024

Is there a way to override the source when using the new document store feature (or document loaders in general)?

Take, for example, the JSON lines loader.

It would be great if there were a way to use a field from the JSON data to set the source.

I tried this…

image

But it just comes through as a hardcoded string…

image

If this isn't possible, I wonder what the best alternative is.

In this specific use case I'm trying to take a batch of HTML pages, scraped from a site that requires authentication, and upload them as documents with each document's source set to its URL.

I figured I could save the HTML to a JSON file and upload it that way, but would need to set the source.
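To make the idea concrete, here is a minimal sketch of how the scraped pages could be written out as a JSON Lines file, one object per line, with each page's URL stored alongside its HTML. The field names `url` and `html`, the function name `write_pages_jsonl`, and the example URLs are all assumptions for illustration; nothing here is required by the loader, and whether the loader can map a field such as `url` onto the source metadata is exactly the open question.

```python
import json
from pathlib import Path


def write_pages_jsonl(pages, out_path):
    """Write one JSON object per line, keeping each page's URL next to its HTML."""
    out_path = Path(out_path)
    with out_path.open("w", encoding="utf-8") as fh:
        for url, html in pages:
            # "url" and "html" are hypothetical field names chosen for this sketch.
            # json.dumps escapes embedded newlines, so each record stays on one line.
            fh.write(json.dumps({"url": url, "html": html}) + "\n")
    return out_path


if __name__ == "__main__":
    # Placeholder data standing in for pages scraped after logging in.
    scraped = [
        ("https://example.com/docs/intro", "<h1>Intro</h1>"),
        ("https://example.com/docs/setup", "<h1>Setup</h1>"),
    ]
    write_pages_jsonl(scraped, "pages.jsonl")
```

Keeping the URL inside each record means the file carries everything needed to set the source later, whichever mechanism ends up doing it.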

I believe I can't use Cheerio etc. because I need to log in to the website before scraping it (it's my own site).

@HenryHengZJ
Contributor

You can try creating a new JSONL file with just the source content in it.
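One possible reading of this suggestion (an interpretation, not something confirmed in the thread) is to split the data into one JSONL file per page, each holding only that page's content, so that a hardcoded source value can be set per upload. A rough sketch under that assumption follows; `split_by_source`, the `url` and `html` keys, and the `page-0000.jsonl` naming scheme are all hypothetical.

```python
import json
from pathlib import Path


def split_by_source(jsonl_path, out_dir, source_key="url", content_key="html"):
    """Write one content-only JSONL file per record; return {file name: source}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    with Path(jsonl_path).open(encoding="utf-8") as fh:
        for index, line in enumerate(fh):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Hypothetical naming scheme: one file per input line.
            name = f"page-{index:04d}.jsonl"
            # The new file holds only the content; the source is tracked in the manifest.
            (out_dir / name).write_text(
                json.dumps({content_key: record[content_key]}) + "\n",
                encoding="utf-8",
            )
            manifest[name] = record[source_key]
    return manifest
```

The returned mapping records which URL belongs to which file, so the source can be filled in by hand (or by a script) when each file is uploaded.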

HenryHengZJ added the question label on May 13, 2024