feat: Adding RAG_FAQ.MD to retrieval-augmented-generation #655

Open · wants to merge 7 commits into main

Conversation

@sermolin commented May 3, 2024

Description

Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Follow the CONTRIBUTING Guide.
  • You are listed as the author in your notebook or README file.
    • Your account is listed in CODEOWNERS for the file(s).
  • Ensure your Pull Request title follows the Conventional Commits specification (https://www.conventionalcommits.org/).
  • Ensure the tests and linter pass (Run nox -s format from the repository root to format).
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@sermolin sermolin requested a review from a team as a code owner May 3, 2024 22:19

google-cla bot commented May 3, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@holtskinner (Collaborator) left a comment


See comments

@holtskinner (Collaborator):

Since most of these are related to the intro_multimodal_rag notebook, can you add these as a section at the end of that notebook?

@lavinigam-gcp What do you think?

@sermolin (Author):

Reasons for keeping it as a standalone doc:

  • The doc will grow as more questions come up. *.MD files don't need to go through as rigorous a pull/merge process as *.ipynb files, so maintenance would be easier.
  • From a customer's standpoint, it might be more useful to refer to one RAG Q&A doc than to look through every notebook in the directory.

That being said, it is your repo; I'm happy to "disagree and commit". It is more important to get the content in.

@holtskinner (Collaborator):

I understand your reasoning. I think the questions specific to one notebook should live in the notebook itself, or maybe in an FAQ.md in that specific directory, or as more detailed explanations in the main content of the notebook.

The rest of the general RAG FAQ should really be turned into official documentation.

Most questions apply to
https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/intro_multimodal_rag.ipynb

Q: Why do we need both variables: text_embedding_page (the embedding of the entire page) and text_embedding_chunk (the embedding of each text chunk)?
@holtskinner (Collaborator):

Use code font when referring to variables/methods

gemini/use-cases/retrieval-augmented-generation/RAG_FAQ.MD (Outdated)
Q: Why do we need both variables: `text_embedding_page` (the embedding of the entire page) and `text_embedding_chunk` (the embedding of each text chunk)?
A: We use the chunk embeddings for most of the downstream tasks in the notebook. Including both is for demonstration purposes: you can either embed the whole page or divide it further into smaller chunks. The choice depends on (1) the LLM token limit and (2) how you want your data to be structured and searched. For example, Gemini 1.0 has an 8k-token limit, so a single page (or even multiple pages) fits easily, and with Gemini 1.5 you can probably send a whole document in one go. One rationale for chunking is to avoid too much noise when searching for something specific. Hope that clarifies.
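
A minimal sketch of the distinction, assuming the Vertex AI text-embedding SDK; the model name and the naive character-based chunker are illustrative, not the notebook's exact code:

```python
# Sketch: one embedding for a whole page vs. one per chunk.
# Assumption: vertexai.init() has already been called; model name is illustrative.
from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")

def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Naive fixed-size character chunking, for illustration only."""
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

page_text = "...full text of one PDF page..."

# Coarse-grained: embed the entire page at once.
text_embedding_page = model.get_embeddings([page_text])[0].values

# Fine-grained: embed each chunk separately, so a match points at a
# specific passage instead of a whole page.
text_embedding_chunks = [
    model.get_embeddings([c])[0].values for c in chunk_text(page_text)
]
```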

Q. The accompanying presentation (and https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro-textemb-vectorsearch.ipynb) talks about the ANN method of finding the closest embedding match, but the code uses cosine similarity. Should we even mention ANN for this notebook?
@holtskinner (Collaborator):
Turn that notebook link into a hyperlink.

A: Yes. In the notebook we use simple cosine similarity, but in the real world at large scale you would want something like a managed ANN service, as described in the presentation.
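
For concreteness, a brute-force version of that similarity search in plain NumPy; at scale, this exhaustive loop is what a managed ANN index would replace:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_emb: np.ndarray, doc_embs: list[np.ndarray], k: int = 3) -> list[int]:
    """Score every document embedding against the query, return the k best indices."""
    scores = [cosine_similarity(query_emb, d) for d in doc_embs]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
```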

Q. The cell which calls get_gemini_response() generates different responses if I run it several times, even with temperature=0 and the same 'query' and 'context' variables. Is this expected? Can anything be done to make it generate the same response every time?
A. Yeah, that behavior is expected. Temp=0 doesn't solve it, and I am not sure if there is any plan on the roadmap to address it. You can, however, make prompts do that magic by forcing structure such as JSON.
@holtskinner (Collaborator):
Make this more formal: instead of "Yeah" put "Yes", and fully spell out the variable name. See the comment above about code formatting.
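
For reference, a sketch of the structured-output workaround the answer alludes to; the model name is an illustrative assumption, and temperature=0 narrows but does not eliminate run-to-run variation:

```python
from vertexai.generative_models import GenerationConfig, GenerativeModel

model = GenerativeModel("gemini-1.0-pro")  # illustrative model name

# Forcing a rigid JSON shape in the prompt constrains the output space,
# which makes repeated runs far more consistent, even if not bit-identical.
prompt = (
    "Answer using ONLY the context below.\n"
    'Reply as JSON: {"answer": "<string>", "source_page": <int>}\n\n'
    "Context: ...\n"
    "Question: ..."
)

response = model.generate_content(
    prompt,
    generation_config=GenerationConfig(temperature=0.0),
)
print(response.text)
```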

A. The PDF is split just to show that you can read multiple PDFs with the same logic. That's the only reason.

Q. For text embeddings, we are splitting the docs into smaller "chunks". Are we doing the same for images?
A. No we are not chuncking images. We have two kinds of image embeddings - one uses ‘multimodal-embeddings’ where we send image directly and the API returns embeddings.
@holtskinner (Collaborator):
Suggested change
A. No we are not chuncking images. We have two kinds of image embeddings - one uses multimodal-embeddings where we send image directly and the API returns embeddings.
A. No we are not chunking images. We have two kinds of image embeddings - one uses `multimodal-embeddings` where we send the image directly and the API returns embeddings.


Q. For text embeddings, we are splitting the docs into smaller "chunks". Are we doing the same for images?
A. No, we are not chunking images. We have two kinds of image embeddings - one uses `multimodal-embeddings`, where we send the image directly and the API returns embeddings.
The second method is - we send an image to Gemini - get a description of it as text - and send that text to the `text-embeddings` model and get embeddings back. In the latter case, we send the whole text as is. No chunking.
@holtskinner (Collaborator):
Turn into a bullet point list.

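As an aside, a sketch of the two image-embedding paths described in that answer; the Vertex AI model names are assumptions and the image path is a placeholder:

```python
from pathlib import Path

from vertexai.generative_models import GenerativeModel, Part
from vertexai.language_models import TextEmbeddingModel
from vertexai.vision_models import Image, MultiModalEmbeddingModel

image_path = "figure.png"  # placeholder: an image extracted from the PDF

# Path 1: send the image straight to the multimodal embedding model.
mm_model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
image_embedding = mm_model.get_embeddings(
    image=Image.load_from_file(image_path)
).image_embedding

# Path 2: have Gemini describe the image, then embed the full description
# (no chunking) with the text-embedding model.
gemini = GenerativeModel("gemini-1.0-pro-vision")
description = gemini.generate_content(
    [
        "Describe this image in detail.",
        Part.from_data(Path(image_path).read_bytes(), mime_type="image/png"),
    ]
).text
text_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
description_embedding = text_model.get_embeddings([description])[0].values
```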

Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or “trouble-ticket” use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?
@holtskinner (Collaborator):
Change quote marks to straight quotes " instead of curly quotes “”

Suggested change
Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or “trouble-ticket” use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?
Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or "trouble-ticket" use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?
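
For illustration, a hypothetical summarize-then-embed helper for the preprocessing this question describes; the model names are assumptions, and any accuracy gain should be validated against embedding the raw text on your own corpus:

```python
from vertexai.generative_models import GenerativeModel
from vertexai.language_models import TextEmbeddingModel

gemini = GenerativeModel("gemini-1.0-pro")  # illustrative model name
embedder = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")

def embed_with_summary(raw_ticket_text: str) -> list[float]:
    """Summarize a noisy support ticket, then embed the cleaned-up summary."""
    summary = gemini.generate_content(
        "Rewrite the following support ticket as a short, well-structured "
        "summary. Keep all product names, error codes, and steps:\n\n"
        + raw_ticket_text
    ).text
    return embedder.get_embeddings([summary])[0].values
```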

@holtskinner holtskinner changed the title Adding RAG_FAQ.MD to retrieval-augmented-generation feat: Adding RAG_FAQ.MD to retrieval-augmented-generation May 7, 2024
Co-authored-by: Holt Skinner <13262395+holtskinner@users.noreply.github.com>
@holtskinner (Collaborator) commented:

I added the spell checker; please fix the identified errors.
