Creating your first QA Pipeline Tutorial and Use of Embedders #7558

greghobby · 2024-04-18T11:04:30Z


I have used Haystack versions before v2 quite a bit in professional projects and am only now starting to get into v2.0.

In the very first tutorial, two separate embedder objects are created: one for the documents as they are indexed (doc_embedder) and one for the query (text_embedder). In the tutorial, these two objects are created with the same model. There is a comment that says you need to use the same model for each variable. This makes 100% sense to me: the query and document embeddings have to come from the same model to be comparable at retrieval time.

So why then are two objects created? I would intuitively expect to have only one object and to reuse it for both tasks, to cut down on the possibility of making a needless mistake by having mismatched embeddings.

julian-risch · 2024-04-22T12:02:05Z

Maintainer

Hey @greghobby, yes, we thought about this a lot and discussed it in the team when developing Haystack 2.0. In a nutshell, the reason is that Haystack 1 used the retriever in indexing pipelines, or whenever embeddings were created for documents: document_store.update_embeddings(retriever=dense_retriever). That is counter-intuitive, though, because no retrieval is done at this point. Why would we need a retriever?
Another point is that there are scenarios like dense passage retrieval (DPR) where two different models are used for embedding the documents and the query (deepset/gbert-base-germandpr-ctx_encoder and deepset/gbert-base-germandpr-question_encoder, for example).
Haystack 2.0 follows the paradigm of having each component do only one thing, which helps with reusability and keeps the implementation of each component simpler. In addition, a component instance can only be used in one pipeline. That's why we decided to have a separate DocumentEmbedder component, which is often used in indexing pipelines, and a TextEmbedder component, which is often used in query pipelines.
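
To make the reasoning above concrete, here is a minimal, library-free sketch. It does not use Haystack's API: ToyDocumentEmbedder, ToyTextEmbedder, ToyPipeline, toy_embed and the model name "toy-model" are all hypothetical stand-ins, and the "embedding" is just a hash. It only illustrates two ideas from this answer: defining the model name once keeps the two embedders consistent, and a component instance that belongs to one pipeline cannot be added to a second one.

```python
from __future__ import annotations

import hashlib


def toy_embed(model: str, text: str, dim: int = 4) -> list[float]:
    """Hypothetical stand-in for a real model: a deterministic hash-based vector."""
    digest = hashlib.sha256(f"{model}:{text}".encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dim]]


class ToyDocumentEmbedder:
    """Does one thing: attaches an embedding to each document (indexing side)."""

    def __init__(self, model: str) -> None:
        self.model = model

    def run(self, documents: list[dict]) -> dict:
        for doc in documents:
            doc["embedding"] = toy_embed(self.model, doc["content"])
        return {"documents": documents}


class ToyTextEmbedder:
    """Does one thing: embeds a single query string (query side)."""

    def __init__(self, model: str) -> None:
        self.model = model

    def run(self, text: str) -> dict:
        return {"embedding": toy_embed(self.model, text)}


class ToyPipeline:
    """Assumed rule for this sketch: an instance may belong to one pipeline only."""

    def __init__(self) -> None:
        self.components: dict[str, object] = {}

    def add_component(self, name: str, component: object) -> None:
        if getattr(component, "_owner", None) is not None:
            raise ValueError(f"'{name}' already belongs to another pipeline")
        component._owner = self
        self.components[name] = component


# Define the model name once so the two embedders cannot drift apart.
MODEL = "toy-model"
doc_embedder = ToyDocumentEmbedder(model=MODEL)
text_embedder = ToyTextEmbedder(model=MODEL)

indexing = ToyPipeline()
indexing.add_component("doc_embedder", doc_embedder)

query = ToyPipeline()
query.add_component("text_embedder", text_embedder)

docs = doc_embedder.run([{"content": "Berlin is the capital of Germany."}])["documents"]
query_vec = text_embedder.run("Berlin is the capital of Germany.")["embedding"]
```

Because both embedders share MODEL, the same text maps to the same vector on the indexing and query sides. In a DPR-style setup you would instead pass two different model names on purpose, which a single shared object could not express.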
