feat: Adding RAG_FAQ.MD to retrieval-augmented-generation #655

Open · wants to merge 7 commits into main

Conversation

@sermolin commented May 3, 2024

Description

Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Follow the CONTRIBUTING Guide.
  • You are listed as the author in your notebook or README file.
    • Your account is listed in CODEOWNERS for the file(s).
  • Ensure your Pull Request title follows the Conventional Commits specification (https://www.conventionalcommits.org/).
  • Ensure the tests and linter pass (Run nox -s format from the repository root to format).
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@sermolin sermolin requested a review from a team as a code owner May 3, 2024 22:19

google-cla bot commented May 3, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@holtskinner (Collaborator) left a comment


See comments

@holtskinner (Collaborator):

Since most of these are related to the intro_multimodal_rag notebook, can you add these as a section at the end of that notebook?

@lavinigam-gcp What do you think?

@sermolin (Author):

Reasons for keeping it as a standalone doc:

  • The doc will grow as more questions come up. *.MD files don't need to go through as rigorous a pull/merge process as *.ipynb files, so maintenance would be easier.
  • From a customer's standpoint, it might be more useful to refer to one RAG Q&A doc than to look through every notebook in the directory.

That being said, it is your repo; I'm happy to "disagree and commit". It is more important to get the content in.

@holtskinner (Collaborator):

I understand your reasoning. I think the questions specific to one notebook should live in the notebook itself, or maybe in an FAQ.md in that specific directory, or as more detailed explanations in the main content of the notebook.

The rest of the general RAG FAQ should really be turned into official documentation.

Most questions apply to
https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/intro_multimodal_rag.ipynb

Q: Why do we need both variables: text_embedding_page (the embedding of the entire page) and text_embedding_chunk (the embedding of each text chunk)?
@holtskinner (Collaborator):

Use code font when referring to variables/methods

gemini/use-cases/retrieval-augmented-generation/RAG_FAQ.MD (Outdated)
Q: Why do we need both variables: `text_embedding_page` (the embedding of the entire page) and `text_embedding_chunk` (the embedding of each text chunk)?
A: We use the chunk embeddings for most of the downstream tasks in the notebook. Including both is for demonstration purposes: you can either embed the whole page or divide it further into smaller chunks. The choice depends on (1) the LLM token limit and (2) how you want your data to be structured and searched. For example, Gemini 1.0 has an 8k-token limit, so a single page (or even multiple pages) fits easily, and with Gemini 1.5 you can probably send a whole document in one go. One rationale for chunking is to avoid too much noise when searching for something specific. Hope that clarifies.
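
A minimal sketch of the distinction, assuming the Vertex AI text-embedding SDK; the model name and the naive character-based chunker are illustrative, not the notebook's exact code:

```python
# Sketch: one embedding for a whole page vs. one per chunk.
# Assumption: vertexai.init() has already been called; model name is illustrative.
from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")

def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Naive fixed-size character chunking, for illustration only."""
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

page_text = "...full text of one PDF page..."

# Coarse-grained: embed the entire page at once.
text_embedding_page = model.get_embeddings([page_text])[0].values

# Fine-grained: embed each chunk separately, so a match points at a
# specific passage instead of a whole page.
text_embedding_chunks = [
    model.get_embeddings([c])[0].values for c in chunk_text(page_text)
]
```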

Q. The accompanying presentation (and https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro-textemb-vectorsearch.ipynb) talks about the ANN method of finding the closest embedding match, but the code uses cosine similarity. Should we even mention ANN for this notebook?
@holtskinner (Collaborator):
Turn that notebook link into a hyperlink.

A: Yes. In the notebook we use simple cosine similarity, but in the real world at large scale you would want something like a managed ANN service, as described in the presentation.
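
For concreteness, a brute-force version of that similarity search in plain NumPy; at scale, this exhaustive loop is what a managed ANN index would replace:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_emb: np.ndarray, doc_embs: list[np.ndarray], k: int = 3) -> list[int]:
    """Score every document embedding against the query, return the k best indices."""
    scores = [cosine_similarity(query_emb, d) for d in doc_embs]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
```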

Q. The cell which calls get_gemini_response() generates different responses if I run it several times, even with temperature=0 and the same 'query' and 'context' variables. Is this expected? Can anything be done to make it generate the same response every time?
A. Yeah, that behavior is expected. Temp=0 doesn't solve it, and I am not sure if there is any plan on the roadmap to address it. You can, however, make prompts do that magic by forcing structure such as JSON.
@holtskinner (Collaborator):
Make this more formal: instead of "Yeah" put "Yes", and fully spell out the variable name. See the comment above about code formatting.
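
For reference, a sketch of the structured-output workaround the answer alludes to; the model name is an illustrative assumption, and temperature=0 narrows but does not eliminate run-to-run variation:

```python
from vertexai.generative_models import GenerationConfig, GenerativeModel

model = GenerativeModel("gemini-1.0-pro")  # illustrative model name

# Forcing a rigid JSON shape in the prompt constrains the output space,
# which makes repeated runs far more consistent, even if not bit-identical.
prompt = (
    "Answer using ONLY the context below.\n"
    'Reply as JSON: {"answer": "<string>", "source_page": <int>}\n\n'
    "Context: ...\n"
    "Question: ..."
)

response = model.generate_content(
    prompt,
    generation_config=GenerationConfig(temperature=0.0),
)
print(response.text)
```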

A. The PDF is split just to show that you can read multiple PDFs with the same logic. That's the only reason.

Q. For text embeddings, we are splitting the docs into smaller "chunks". Are we doing the same for images?
A. No we are not chuncking images. We have two kinds of image embeddings - one uses ‘multimodal-embeddings’ where we send image directly and the API returns embeddings.
@holtskinner (Collaborator):
Suggested change
A. No we are not chuncking images. We have two kinds of image embeddings - one uses multimodal-embeddings where we send image directly and the API returns embeddings.
A. No we are not chunking images. We have two kinds of image embeddings - one uses `multimodal-embeddings` where we send the image directly and the API returns embeddings.


Q. For text embeddings, we are splitting the docs into smaller "chunks". Are we doing the same for images?
A. No, we are not chunking images. We have two kinds of image embeddings - one uses `multimodal-embeddings`, where we send the image directly and the API returns embeddings.
The second method is - we send an image to Gemini - get a description of it as text - and send that text to the `text-embeddings` model and get embeddings back. In the latter case, we send the whole text as is. No chunking.
@holtskinner (Collaborator):
Turn into a bullet point list.

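As an aside, a sketch of the two image-embedding paths described in that answer; the Vertex AI model names are assumptions and the image path is a placeholder:

```python
from pathlib import Path

from vertexai.generative_models import GenerativeModel, Part
from vertexai.language_models import TextEmbeddingModel
from vertexai.vision_models import Image, MultiModalEmbeddingModel

image_path = "figure.png"  # placeholder: an image extracted from the PDF

# Path 1: send the image straight to the multimodal embedding model.
mm_model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
image_embedding = mm_model.get_embeddings(
    image=Image.load_from_file(image_path)
).image_embedding

# Path 2: have Gemini describe the image, then embed the full description
# (no chunking) with the text-embedding model.
gemini = GenerativeModel("gemini-1.0-pro-vision")
description = gemini.generate_content(
    [
        "Describe this image in detail.",
        Part.from_data(Path(image_path).read_bytes(), mime_type="image/png"),
    ]
).text
text_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
description_embedding = text_model.get_embeddings([description])[0].values
```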

Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or “trouble-ticket” use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?
@holtskinner (Collaborator):
Change quote marks to straight quotes " instead of curly quotes “”

Suggested change
Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or “trouble-ticket” use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?
Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or "trouble-ticket" use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?
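
For illustration, a hypothetical summarize-then-embed helper for the preprocessing this question describes; the model names are assumptions, and any accuracy gain should be validated against embedding the raw text on your own corpus:

```python
from vertexai.generative_models import GenerativeModel
from vertexai.language_models import TextEmbeddingModel

gemini = GenerativeModel("gemini-1.0-pro")  # illustrative model name
embedder = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")

def embed_with_summary(raw_ticket_text: str) -> list[float]:
    """Summarize a noisy support ticket, then embed the cleaned-up summary."""
    summary = gemini.generate_content(
        "Rewrite the following support ticket as a short, well-structured "
        "summary. Keep all product names, error codes, and steps:\n\n"
        + raw_ticket_text
    ).text
    return embedder.get_embeddings([summary])[0].values
```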

@holtskinner holtskinner changed the title Adding RAG_FAQ.MD to retrieval-augmented-generation feat: Adding RAG_FAQ.MD to retrieval-augmented-generation May 7, 2024
Co-authored-by: Holt Skinner <13262395+holtskinner@users.noreply.github.com>
@holtskinner (Collaborator) commented:

I added the spell checker; please fix the identified errors.
