feat: update notion extractor #3898

Open
wants to merge 1 commit into
base: main

Conversation

badbye
Contributor

@badbye badbye commented Apr 26, 2024

Description

see this post: #3883

This PR combines all the blocks of a Notion page into a single document. Headings are converted to Markdown style, so that users can apply a custom splitter to split the document into chunks.
For example, \n## can be used to split by h2 (this may not work if the page contains a code block with comments in it).
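
The idea can be illustrated with a minimal sketch. This is not the code in this PR: the `(block_type, text)` tuple shape and the function names `blocks_to_markdown` and `split_by_h2` are assumptions made for illustration, and the real Notion API returns much richer JSON per block.

```python
# Hypothetical, simplified block shape: (block_type, text) pairs.
# Only the block type names mirror Notion's; everything else is illustrative.
HEADING_PREFIX = {"heading_1": "# ", "heading_2": "## ", "heading_3": "### "}


def blocks_to_markdown(blocks):
    """Join all blocks of one page into a single Markdown string."""
    lines = []
    for block_type, text in blocks:
        # Headings get a Markdown prefix; other blocks are kept as plain text.
        lines.append(HEADING_PREFIX.get(block_type, "") + text)
    return "\n".join(lines)


def split_by_h2(document):
    """Naive splitter: cut before every line that starts with '## '."""
    parts = document.split("\n## ")
    # Restore the marker that str.split consumed on all but the first part.
    return [parts[0]] + ["## " + part for part in parts[1:]]


blocks = [
    ("heading_1", "Guide"),
    ("paragraph", "Intro text."),
    ("heading_2", "Install"),
    ("paragraph", "Run the installer."),
    ("heading_2", "Usage"),
    ("paragraph", "Call the API."),
]
document = blocks_to_markdown(blocks)
chunks = split_by_h2(document)
```

With this input, `chunks` holds one entry per h2 section plus the leading h1 part. The same naive split also cuts inside a code block whose comment line starts with `## `, which is the caveat mentioned above.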

In my experience, too many chunks with too little content in each result in poor performance. This PR could improve that.

Type of Change

  • Improvement, including but not limited to code refactoring, performance optimization, and UI/UX improvement

How Has This Been Tested?

A test script was added.

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods
  • optional I have added tests that prove my fix is effective or that my feature works
  • optional New and existing unit tests pass locally with my changes

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. 📚 feat:datasource Data sources like web, Notion, Logseq, Lark, Docs labels Apr 26, 2024
@JohnJyong JohnJyong self-requested a review April 26, 2024 12:38
@JohnJyong
Contributor

Adding more splitter rules for users to choose from is on our roadmap, because no rule is better than another; each one just fits certain types of questions better.

@badbye
Contributor Author

badbye commented Apr 29, 2024

Adding more splitter rules for users to choose from is on our roadmap, because no rule is better than another; each one just fits certain types of questions better.

Totally agree. This PR is actually not about the splitter. It just combines the blocks into a single document, so that you can then use a custom splitter to split the page.
Do you mean you will not consider it until more split rules are added?
