
experiment with thought process prompt #22

Open
thiswillbeyourgithub opened this issue Jun 14, 2023 · 1 comment

@thiswillbeyourgithub

Hi,

Just wanted to flag that, in my opinion, there are promising ways to produce much better document summaries.

This is more costly as it uses langchain chains, but I think the added value is tremendous.

To me this is the kind of feature that would make me pay for that service.

This could also nicely handle comments with a little adjustment: a prompt that extracts the new information from the comments (summarized opinions, new facts, etc.), as discussed in #5.
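For reference, here is a rough sketch of what such a comment-extraction prompt could look like. This is just a plain string template as a starting point; the field names (`existing_summary`, `comments`) are illustrative, not from any existing code:

```python
# Hypothetical prompt for the comment-handling idea from #5: pull out only
# what the comments add on top of the article. Field names are illustrative.
COMMENT_PROMPT = """The article has been summarized as:
'''
{existing_summary}
'''

Below are reader comments. Extract only the NEW information: new facts,
corrections, and a short summary of the opinions expressed. Answer as
logically indented markdown bullet points.
'''
{comments}
'''
NEW INFORMATION AS MARKDOWN BULLET POINTS:"""

# Example of filling the template before sending it to the model:
example = COMMENT_PROMPT.format(
    existing_summary="- main point of the article",
    comments="I disagree because the benchmark used an older model.",
)
```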

I did a quick try earlier today and found it very promising. I'm insanely busy atm, so I thought you might be interested in the raw code directly. The idea is to ask the model to summarize not the key facts but the author's reasoning, paragraph by logically indented paragraph, in markdown. Here's a quick proof of concept (just add your API key and pass a txt file as argument; also note that for testing I shortened the input via [:1000]):

```python
# source https://python.langchain.com/en/latest/modules/chains/index_examples/summarize.html

from pathlib import Path
import os
from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.docstore.document import Document
from langchain.chains.summarize import load_summarize_chain

assert Path("API_KEY.txt").exists(), "No api key found"
os.environ["OPENAI_API_KEY"] = Path("API_KEY.txt").read_text().strip()

llm = ChatOpenAI(
        model_name="gpt-3.5-turbo",
        temperature=0,
        verbose=True,
        )

text_splitter = CharacterTextSplitter()

def load_doc(path):
    assert Path(path).exists(), f"file not found: '{path}'"
    with open(path) as f:
        content = f.read()[:1000]  # shortened input for testing
    texts = text_splitter.split_text(content)
    if len(texts) > 5:
        ans = input(f"Number of text splits: '{len(texts)}'. Continue? (y/n)\n>")
        if ans != "y":
            raise SystemExit("Quitting")
    return [Document(page_content=t) for t in texts]


prompt_template = """Write a very concise summary of the author's reasoning paragraph by paragraph as logically indented markdown bullet points:

'''
{text}
'''

CONCISE SUMMARY AS LOGICALLY INDENTED MARKDOWN BULLET POINTS:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

refine_template = """Your job is to continue a summary of a long text as logically indented markdown bullet points of the author's reasoning.
We have provided an existing summary up to this point:
'''
{existing_answer}
'''

You have to continue the summary by adding the bullet points of the following part of the article (only if relevant; stay concise and avoid making explicit what is already implied by the previous bullet points):
'''
{text}
'''
Given this new section of the document, refine the summary as logically indented markdown bullet points. If the new section adds nothing, simply return the original summary."""
refine_prompt = PromptTemplate(
    input_variables=["existing_answer", "text"],
    template=refine_template,
)

if __name__ == "__main__":
    import sys
    docs = load_doc(sys.argv[-1])
    chain = load_summarize_chain(
        llm,
        chain_type="refine",
        return_intermediate_steps=True,
        question_prompt=PROMPT,
        refine_prompt=refine_prompt,
    )
    out = chain({"input_documents": docs}, return_only_outputs=True)

    for bulletpoint in out["output_text"].split("\n"):
        print(bulletpoint)

    print("Opening console.")
    import code; code.interact(local=locals())
```
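In case the "refine" chain type is unfamiliar: the first chunk is summarized with the question prompt, and each later chunk is fed back in together with the summary so far. A minimal pure-Python sketch of that iteration, with a stub in place of the model (illustrative only, not the library's actual internals):

```python
# Sketch of the "refine" summarization loop with a stub LLM (no API calls).

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: just report the prompt length.
    return f"summary({len(prompt)} chars of prompt)"

def refine_summarize(docs, llm=fake_llm):
    # First chunk: produce an initial summary from the question prompt.
    summary = llm(f"Summarize as indented bullet points:\n{docs[0]}")
    # Remaining chunks: feed the existing summary plus the new chunk back in,
    # which is what the refine_prompt above does with {existing_answer}/{text}.
    for doc in docs[1:]:
        summary = llm(
            f"Existing summary:\n{summary}\n\nNew section:\n{doc}\n"
            "Refine the bullet points, or return the summary unchanged."
        )
    return summary

result = refine_summarize(["chunk one", "chunk two", "chunk three"])
```

The upside is that the summary stays globally coherent across chunks; the downside is that the calls are strictly sequential, so it is slower than map-reduce.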

Thoughts?

@thiswillbeyourgithub
Author

Btw, I implemented all of this in my own CLI project that does RAG as well as summaries: https://github.com/thiswillbeyourgithub/DocToolsLLM/
