prompt-eng

An experiment in recursive prompt engineering using GPT and SQuAD

This is an experiment in recursive prompt engineering. There has been a lot of recent discussion about the rising value of "prompt engineering" as a skill[1]. I wanted to know: is GPT any good at prompt engineering? If we give it some facts and a target answer, can it generate a prompt that, when posed back to itself, elicits the correct response? Once this feedback loop is established, we should be able to tune the model to generate better prompts, i.e. prompts that more reliably lead to correct answers.

My fuzzy intuition is that this kind of training matters for any multi-layered system in which we expect LLMs to direct other LLMs: the LLMs doing the directing need to be good prompt engineers.

My approach uses the SQuAD dataset[2], a large collection of Wikipedia passages, human-written reading-comprehension questions about those passages, and human-curated answers to those questions. I give GPT the passages and the answers and ask it to come up with questions that it predicts will elicit the correct answers. Then I give it the passages again, along with its newly generated questions, and ask it to answer them. Finally, I compare each of its answers to the ground-truth answer.[3] In effect I'm asking: is GPT as good at writing questions (prompts) as the humans who wrote the original questions (prompts)?
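The generate-then-answer loop can be sketched roughly as follows. The model call is stubbed out here so the flow can be exercised offline; in the real scripts it would be an API call to GPT, and the prompt wording and function names below are illustrative, not necessarily what the repo uses.

```python
# Sketch of the round trip: (passage, gold answer) -> generated question,
# then (passage, generated question) -> model answer, which is compared
# to the gold answer. `ask_model` is any callable from prompt to completion.

def generate_question(ask_model, passage: str, answer: str) -> str:
    """Ask the model to write a question whose answer is `answer`."""
    prompt = (
        f"Passage:\n{passage}\n\n"
        f"Write a question about the passage whose answer is exactly: {answer}"
    )
    return ask_model(prompt)

def answer_question(ask_model, passage: str, question: str) -> str:
    """Ask the model to answer the (self-generated) question."""
    prompt = f"Passage:\n{passage}\n\nQuestion: {question}\nAnswer briefly:"
    return ask_model(prompt)

def round_trip(ask_model, passage: str, gold_answer: str) -> tuple[str, str]:
    """Run one generate-then-answer cycle; return (question, model_answer)."""
    question = generate_question(ask_model, passage, gold_answer)
    model_answer = answer_question(ask_model, passage, question)
    return question, model_answer

# Toy stand-in for GPT, just to show the loop running end to end.
def toy_model(prompt: str) -> str:
    if "Write a question" in prompt:
        return "In what year was the university founded?"
    return "1850"

q, a = round_trip(toy_model, "The university was founded in 1850.", "1850")
```

With the toy model the round trip succeeds (`a` matches the gold answer); with a real model the comparison in the last step is where the interesting failures show up.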

To use:

  1. Run initialize_dataset.py to download the SQuAD dataset and load it into a PostgreSQL database
  2. Run main.py to iterate over the dataset and populate the database with a new prompt and a new answer for each question. The SQL query has a LIMIT of 5; raise it if you like, but don't forget you're paying for tokens. Maybe I'll make all of this more easily configurable at some point.

Footnotes:

[1] https://www.forbes.com/sites/craigsmith/2023/04/05/mom-dad-i-want-to-be-a-prompt-engineer/?sh=8b843559c8ef

[2] https://rajpurkar.github.io/SQuAD-explorer/

[3] Well, not exactly, because that would require some kind of fuzzy matching to check that two answers are equivalent even when they are not an exact string match. But eyeballing the results, the model tends to get this right.
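For what it's worth, the fuzzy matching alluded to above could be automated with the normalization and token-level F1 used by the official SQuAD evaluation script. This is not part of the repo, just a minimal sketch of one way to do it:

```python
# SQuAD-style answer comparison: lowercase, strip punctuation and articles,
# then score token overlap with F1 rather than demanding an exact string match.
import string
import re
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def answer_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted answer and the ground truth."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this metric "The Eiffel Tower" and "Eiffel Tower" score 1.0, so answers that differ only in articles, case, or punctuation count as correct.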
