The chunker in `CodeParser` is not structure-aware. We should use something based on tree-sitter that produces an AST of the code structure (this works even for Markdown). Markdown is an especially useful case because it can serve as an intermediate stage when chunking PDF docs (i.e. PDF -> Markdown with headers -> structure-aware chunks).
Structure-aware chunking is good to have in general. For example, in a Markdown doc it is best to avoid splitting a logically coherent section across chunks, as long as chunk size limits and overlap params are respected.
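To make the idea concrete, here is a minimal sketch of header-aware Markdown chunking. It is not the `CodeParser` implementation and does not use tree-sitter: it assumes ATX-style headers (`#` through `######`) and detects them with a regex, and the names `split_sections`, `chunk_markdown`, and `max_chars` are hypothetical. Overlap is left out to keep it short.

```python
import re
from typing import List

# Assumption: sections are delimited by ATX headers ("# ", "## ", ...).
HEADER_RE = re.compile(r"^#{1,6}\s")


def split_sections(markdown: str) -> List[str]:
    """Split a Markdown string into sections, each starting at a header line."""
    sections: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines():
        if HEADER_RE.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    # Drop empty sections (e.g. blank lines before the first header).
    return [s for s in sections if s]


def chunk_markdown(markdown: str, max_chars: int = 200) -> List[str]:
    """Pack whole sections into chunks of at most max_chars characters.

    A section longer than max_chars falls back to paragraph-level pieces.
    A single paragraph longer than max_chars is still emitted as-is.
    """
    chunks: List[str] = []
    buf = ""
    for section in split_sections(markdown):
        if len(section) <= max_chars:
            pieces = [section]
        else:
            pieces = section.split("\n\n")
        for piece in pieces:
            candidate = piece if not buf else buf + "\n\n" + piece
            if len(candidate) <= max_chars:
                buf = candidate
            else:
                if buf:
                    chunks.append(buf)
                buf = piece
    if buf:
        chunks.append(buf)
    return chunks


doc = "# A\n\nalpha\n\n# B\n\nbeta\n"
for chunk in chunk_markdown(doc, max_chars=12):
    print(repr(chunk))
```

A regex like this will mistake a `#` comment inside a fenced code block for a header, which is one reason a real parser that yields a syntax tree would be the more robust basis.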