Commit

[BUG] Introduce parser supplier support in FileSystemDocumentLoader#loadDocuments langchain4j#1026
KaisNeffati committed Apr 28, 2024
1 parent 608f55b commit 154782e
Showing 3 changed files with 278 additions and 14 deletions.
10 changes: 9 additions & 1 deletion docs/docs/tutorials/7-rag.md
@@ -181,12 +181,20 @@ List<Document> documents = FileSystemDocumentLoader.loadDocuments("/home/langcha
List<Document> documents = FileSystemDocumentLoader.loadDocumentsRecursively("/home/langchain4j", new TextDocumentParser());
```

The `FileSystemDocumentLoader` supports stateful parsers (e.g., `ApacheTikaDocumentParser`), so they can be used across multiple documents. Users can specify a `Supplier` for on-demand parser instantiation, ensuring that each document is processed with a fresh instance of the parser.

Here is an example:
```java
// Load a single document
Document document = FileSystemDocumentLoader.loadDocument("/home/langchain4j/file.txt", ApacheTikaDocumentParser::new);
```
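
To illustrate why a fresh instance per document matters, here is a minimal, self-contained sketch. It does not use the LangChain4j classes: `Parser`, `CountingParser`, and `parseAll` are hypothetical stand-ins invented for this illustration, and the real loader may be implemented differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration only; these are NOT the LangChain4j types.
class SupplierParsingSketch {

    interface Parser {
        String parse(String rawText);
    }

    // A deliberately stateful parser: it remembers how many inputs it has parsed.
    static class CountingParser implements Parser {
        private int parsedCount = 0;

        @Override
        public String parse(String rawText) {
            parsedCount++;
            return parsedCount + ":" + rawText.trim();
        }
    }

    // Asks the supplier for a parser once per input, so a supplier such as
    // CountingParser::new yields a fresh, state-free instance every time.
    static List<String> parseAll(List<String> rawTexts, Supplier<Parser> parserSupplier) {
        List<String> results = new ArrayList<>();
        for (String rawText : rawTexts) {
            Parser parser = parserSupplier.get();
            results.add(parser.parse(rawText));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(" first ", " second ");

        // Fresh instance per input: the counter restarts at 1 each time.
        System.out.println(parseAll(inputs, CountingParser::new));

        // Shared instance: state carries over from one input to the next.
        CountingParser shared = new CountingParser();
        System.out.println(parseAll(inputs, () -> shared));
    }
}
```

With a constructor reference as the supplier, no state carries over between inputs. A supplier that returns the same object each time reintroduces the shared state.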


You can also load documents without explicitly specifying a `DocumentParser`.
In this case, a default `DocumentParser` will be used.
The default one is loaded through SPI (e.g. from `langchain4j-document-parser-apache-tika` or `langchain4j-easy-rag`).
If no `DocumentParser`s are found through SPI, a `TextDocumentParser` is used as a fallback.
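
The general "look up through SPI, otherwise fall back" pattern described above can be sketched with the JDK's `java.util.ServiceLoader`. This is only an illustration under stated assumptions: `Parser`, `PlainTextParser`, and `defaultParser` are hypothetical names, and picking the first provider found is an assumed policy, not necessarily how LangChain4j chooses among several providers.

```java
import java.util.ServiceLoader;

// Hypothetical stand-ins for illustration only; these are NOT the LangChain4j types.
class DefaultParserSketch {

    interface Parser {
        String name();
    }

    // Plays the role of the plain-text fallback.
    static class PlainTextParser implements Parser {
        @Override
        public String name() {
            return "plain-text";
        }
    }

    // Looks for a Parser provider registered through the standard SPI mechanism
    // (META-INF/services); if none is registered, falls back to PlainTextParser.
    // Taking the first provider found is an assumption made for this sketch.
    static Parser defaultParser() {
        return ServiceLoader.load(Parser.class)
                .findFirst()
                .orElseGet(PlainTextParser::new);
    }

    public static void main(String[] args) {
        // No provider is registered in this sketch, so the fallback is chosen.
        System.out.println(defaultParser().name());
    }
}
```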


### Document Transformer
`DocumentTransformer` implementations can perform a variety of tasks such as transforming documents,
cleaning them, filtering, enriching, etc.