Replies: 6 comments
-
🤖 Hello, Thank you for your detailed question. Based on the information provided, it seems like you have a good understanding of how the HierarchicalNodeParser works. The HierarchicalNodeParser is designed to split a document into a recursive hierarchy of nodes. When the retriever is called, it retrieves all the leaf nodes associated with a given second-level node. The response synthesizer then acts across all these leaf nodes, compacting or refining them based on the response_mode. If you want to change this behavior, you can modify the response synthesizer's configuration. I hope this answers your question. If you have any further questions or need clarification, please don't hesitate to ask. If you want to continue the conversation, start your reply with @dosu-bot.
-
@mphipps2 I'm not 100% sure I understand the issue. The ref_doc_id is set to the parent so that the retriever can find and merge sibling nodes. Then, these nodes are given to the response synthesizer. The default response synthesis mode is compact, which packs as many retrieved chunks as possible into each LLM call. If you want to make one LLM call per node, you can set the response_mode to accumulate.
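The difference between the two synthesis strategies can be sketched in plain Python. This is a toy model of the behavior, not the library's code; the character-count `context_window` and the call counters are illustrative:

```python
def compact_calls(node_texts: list[str], context_window: int) -> int:
    """'compact'-style synthesis: pack as many chunks as fit into each LLM call."""
    calls, current = 0, 0
    for text in node_texts:
        if current and current + len(text) > context_window:
            calls += 1  # flush the current batch as one LLM call
            current = 0
        current += len(text)
    return calls + (1 if current else 0)


def accumulate_calls(node_texts: list[str]) -> int:
    """'accumulate'-style synthesis: one LLM call per retrieved node."""
    return len(node_texts)
```

So with four retrieved chunks that jointly fit in two context windows, the compact-style strategy makes two calls while the accumulate-style strategy makes four.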
-
@logan-markewich Hmm, ok, I see what's going on. When I use the HierarchicalNodeParser with the VectorStoreIndex and AutoMergingRetriever as done in the documentation, I do in fact get the expected behavior. My problem comes from trying to combine the DocumentSummaryIndex with the HierarchicalNodeParser/AutoMergingRetriever. The idea here was to use an LLM to generate summaries and questions from my leaf nodes, embed the summaries, return the top-k summaries using DocumentSummaryIndexEmbeddingRetriever, and pass those to the AutoMergingRetriever, which passes its final merged nodes to the query engine. This seems to work well, except that my summaries are being generated from the second-level nodes rather than the leaf nodes. This is the issue I was describing above: creating my leaf nodes through the HierarchicalNodeParser sets each leaf's ref_doc_id to its parent node, which causes the DocumentSummaryIndex to summarize those second-level parent nodes rather than the leaves. I can hack together a solution to this, but is there anything I'm missing here that would let DocumentSummaryIndex and HierarchicalNodeParser/AutoMergingRetriever play together better?
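The mismatch can be seen with a toy grouping step. This is illustrative only, not DocumentSummaryIndex internals; the dict-based nodes and the grouping function are hypothetical, mirroring how a hierarchical parser points each leaf's ref_doc_id at its parent chunk:

```python
from collections import defaultdict


def group_for_summary(nodes: list[dict]) -> dict[str, list[str]]:
    """A summary index groups nodes by ref_doc_id, producing one summary per group."""
    groups: dict[str, list[str]] = defaultdict(list)
    for node in nodes:
        groups[node["ref_doc_id"]].append(node["text"])
    return dict(groups)


# Leaves produced by a hierarchical parser point at their *parent* chunk,
# so grouping by ref_doc_id yields one summary per second-level node,
# not one summary per leaf.
leaves = [
    {"text": "leaf-a1", "ref_doc_id": "parent-a"},
    {"text": "leaf-a2", "ref_doc_id": "parent-a"},
    {"text": "leaf-b1", "ref_doc_id": "parent-b"},
]
```

Grouping the three leaves yields only two summary groups, keyed by the parent ids, which is exactly the "summaries land on second-level nodes" behavior described.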
-
@mphipps2 can you give some code to replicate the issue? From node parsing to retrieval? I can try reproducing and seeing what's up
-
Sure, and thanks for looking into this ...
-
Just to add to my last post ... I guess the issue is that ref_doc_id serves different purposes in DocumentSummaryIndex and AutoMergingRetriever. In DocumentSummaryIndex, it's used by the ResponseSynthesizer to deal with the context-window constraint, while in AutoMergingRetriever, it's used to link and merge child and parent nodes. This then causes tension when the two are chained together.
-
Question Validation
Question
I'm using the HierarchicalNodeParser as shown in this documentation: https://gpt-index.readthedocs.io/en/stable/examples/retrievers/auto_merging_retriever.html
But I noticed that each leaf node's ref_doc_id is being set to the Node ID of its parent node (the second-level node). So when the retriever is called, the response synthesizer acts across all the leaf chunks for a given second-level node. For example, if a given second-level node has 4 children, those children would be compacted or refined based on the response_mode being used by the response_synthesizer, meaning we end up with one response back from our LLM instead of 4. As I understand it, this effectively turns the second-level nodes into the new leaves. Was this the intended effect?
To prevent this from happening, I have to hack together something like this:
node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=[2048,512,512])