The current implementation of Notion page import uses the Notion API to fetch page content. The API returns multiple blocks; the definition of a block can be found here.
The problem is that each block is treated as a separate document, which is then split into chunks, and an embedding is calculated for each chunk.
However, a block can be either a heading or a paragraph, regardless of its length. As a result, each chunk tends to be very short due to the small size of the original document. If each individual chunk lacks sufficient information, the performance of RAG will unsurprisingly be poor.
My suggestion is to merge all the blocks into one document and then use the chunk method set by the user to split it into chunks. This way, it is no different from uploading a whole document, with each chunk being long enough to contain sufficient information.
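To make the difference concrete, here is a minimal sketch comparing the two strategies. It assumes each block follows the Notion API block object shape: a `type` key plus a same-named object holding a `rich_text` array whose items carry `plain_text`. Nested child blocks are ignored. The `chunk` function is a hypothetical stand-in for whatever chunk method the user has configured (here, fixed-size character windows with overlap). Joining blocks with `"\n"` is also an assumption.

```python
from typing import Any

# Hypothetical sample blocks in the shape returned by the Notion API.
SAMPLE_BLOCKS: list[dict[str, Any]] = [
    {"type": "heading_2", "heading_2": {"rich_text": [{"plain_text": "Setup"}]}},
    {"type": "paragraph", "paragraph": {"rich_text": [
        {"plain_text": "Install the package "},
        {"plain_text": "and configure the API key."},
    ]}},
    {"type": "paragraph", "paragraph": {"rich_text": [{"plain_text": "Then run the importer."}]}},
]


def block_text(block: dict[str, Any]) -> str:
    """Concatenate the plain_text of a block's rich_text items (empty if none)."""
    payload = block.get(block.get("type", ""), {})
    return "".join(part.get("plain_text", "") for part in payload.get("rich_text", []))


def chunk(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Hypothetical stand-in for the user's chunk method: overlapping fixed-size windows."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    # The last window always reaches the end of the text.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def chunk_per_block(blocks: list[dict[str, Any]]) -> list[str]:
    """Current behaviour: every block is its own document and is chunked separately."""
    chunks: list[str] = []
    for block in blocks:
        chunks.extend(chunk(block_text(block)))
    return chunks


def chunk_merged(blocks: list[dict[str, Any]]) -> list[str]:
    """Suggested behaviour: merge all block texts into one document, then chunk it."""
    document = "\n".join(t for t in (block_text(b) for b in blocks) if t)
    return chunk(document)


if __name__ == "__main__":
    print([len(c) for c in chunk_per_block(SAMPLE_BLOCKS)])  # [5, 46, 22]
    print([len(c) for c in chunk_merged(SAMPLE_BLOCKS)])     # [75]
```

With per-block chunking, the heading "Setup" becomes a 5-character chunk of its own. In the merged version, it stays attached to the paragraphs it introduces. On a real page, the merged document is split only where the configured chunk size requires it.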