prompt-eng

An experiment in recursive prompt engineering using GPT and SQuAD

This is an experiment in recursive prompt engineering. There has been a lot of recent discussion about the rising value of "prompt engineering" as a skill[1]. I wanted to know: is GPT any good at prompt engineering? If we give it some facts and a target answer, can it generate a prompt that, when posed back to itself, elicits the correct response? Once this feedback loop is established, we should be able to tune the model to generate better prompts, i.e. prompts that more reliably lead to correct answers.

My fuzzy intuition is that this kind of training matters for any multi-layered system in which we expect LLMs to direct other LLMs: the LLMs doing the directing need to be good prompt engineers.

My approach uses the SQuAD dataset[2], a large collection of Wikipedia passages, human-written reading-comprehension questions about those passages, and human-curated answers to those questions. I give GPT the passages and the answers and ask it to come up with questions that it predicts will elicit the correct answers. Then I give it the passages again, along with its newly generated questions, and ask it to answer them. Finally, I compare each of its answers to the ground-truth answer.[3] In effect I'm asking: is GPT as good at writing questions (prompts) as the humans who wrote the original questions (prompts)?
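The generate-then-answer loop can be sketched roughly as follows. The model call is stubbed out here so the flow can be exercised offline; in the real scripts it would be an API call to GPT, and the prompt wording and function names below are illustrative, not necessarily what the repo uses.

```python
# Sketch of the round trip: (passage, gold answer) -> generated question,
# then (passage, generated question) -> model answer, which is compared
# to the gold answer. `ask_model` is any callable from prompt to completion.

def generate_question(ask_model, passage: str, answer: str) -> str:
    """Ask the model to write a question whose answer is `answer`."""
    prompt = (
        f"Passage:\n{passage}\n\n"
        f"Write a question about the passage whose answer is exactly: {answer}"
    )
    return ask_model(prompt)

def answer_question(ask_model, passage: str, question: str) -> str:
    """Ask the model to answer the (self-generated) question."""
    prompt = f"Passage:\n{passage}\n\nQuestion: {question}\nAnswer briefly:"
    return ask_model(prompt)

def round_trip(ask_model, passage: str, gold_answer: str) -> tuple[str, str]:
    """Run one generate-then-answer cycle; return (question, model_answer)."""
    question = generate_question(ask_model, passage, gold_answer)
    model_answer = answer_question(ask_model, passage, question)
    return question, model_answer

# Toy stand-in for GPT, just to show the loop running end to end.
def toy_model(prompt: str) -> str:
    if "Write a question" in prompt:
        return "In what year was the university founded?"
    return "1850"

q, a = round_trip(toy_model, "The university was founded in 1850.", "1850")
```

With the toy model the round trip succeeds (`a` matches the gold answer); with a real model the comparison in the last step is where the interesting failures show up.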

To use:

  1. Run initialize_dataset.py to download the SQuAD dataset and load it into a PostgreSQL database
  2. Run main.py to iterate over the dataset and populate the database with a new prompt and a new answer for each question. The SQL query has a LIMIT of 5; raise it if you like, but don't forget you're paying for tokens. Maybe I'll make all of this more easily configurable at some point.

Footnotes:

[1] https://www.forbes.com/sites/craigsmith/2023/04/05/mom-dad-i-want-to-be-a-prompt-engineer/?sh=8b843559c8ef

[2] https://rajpurkar.github.io/SQuAD-explorer/

[3] Well, not exactly, because that would require some kind of fuzzy matching to check that two answers are equivalent even when they are not an exact string match. But eyeballing the results, the model tends to get this right.
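For what it's worth, the fuzzy matching alluded to above could be automated with the normalization and token-level F1 used by the official SQuAD evaluation script. This is not part of the repo, just a minimal sketch of one way to do it:

```python
# SQuAD-style answer comparison: lowercase, strip punctuation and articles,
# then score token overlap with F1 rather than demanding an exact string match.
import string
import re
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def answer_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted answer and the ground truth."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this metric "The Eiffel Tower" and "Eiffel Tower" score 1.0, so answers that differ only in articles, case, or punctuation count as correct.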
