You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Can you combine pymupdf's pdf4llm.to_markdown() to make the parsed pdf more hierarchical (for example, use ("##", "Header 1") to represent the first-level heading, ("###", "Header 2") represents the second-level heading, ("####", "Header 3") represents the third-level heading, etc.), so that langchain can be used to parse using the MarkdownHeaderTextSplitter() method.
link: https://python.langchain.com/docs/modules/data_connection/document_transformers/markdown_header_metadata/
The text was updated successfully, but these errors were encountered:
Description
Can you combine pymupdf's pdf4llm.to_markdown() to make the parsed pdf more hierarchical (for example, use ("##", "Header 1") to represent the first-level heading, ("###", "Header 2") represents the second-level heading, ("####", "Header 3") represents the third-level heading, etc.), so that langchain can be used to parse using the MarkdownHeaderTextSplitter() method.
link: https://python.langchain.com/docs/modules/data_connection/document_transformers/markdown_header_metadata/
The text was updated successfully, but these errors were encountered: