CrafterGPT

Overview

CrafterGPT is an experiment that uses language models to play Crafter, a procedurally generated 2D survival game inspired by Minecraft.
CrafterGPT consists of two agents:
1. A language model with only "think step-by-step" prompting.
2. A language model fine-tuned on an expert human dataset, also with "think step-by-step" prompting.
Both agents use the Llama2-7b model.
Since the Crafter environment, which exposes an OpenAI Gym interface, returns each observation as a 2D image, the SmartPlay library was used to provide a textual description of the observation.
The experiment shows that fine-tuning language models on human datasets can easily lead to overfitting, which compromises their performance.
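
To illustrate what a textual description of an observation can look like, here is a minimal sketch. It is not SmartPlay's implementation: the tile IDs, the `TILE_NAMES` mapping, and the `describe_view` function are hypothetical, and the sketch assumes the observation is already available as a small grid of numeric tile IDs.

```python
# Hypothetical mapping from numeric tile IDs to names; the vocabularies
# actually used by Crafter and SmartPlay may differ.
TILE_NAMES = {0: "grass", 1: "tree", 2: "water", 3: "stone"}


def describe_view(grid, player_row, player_col):
    """Describe the nearest tile of each type relative to the player."""
    nearest = {}  # name -> (distance, row offset, column offset)
    for r, row in enumerate(grid):
        for c, tile_id in enumerate(row):
            if (r, c) == (player_row, player_col):
                continue  # skip the tile the player stands on
            name = TILE_NAMES.get(tile_id, "unknown")
            dist = abs(r - player_row) + abs(c - player_col)
            if name not in nearest or dist < nearest[name][0]:
                nearest[name] = (dist, r - player_row, c - player_col)

    lines = []
    for name, (_, dr, dc) in sorted(nearest.items()):
        parts = []
        if dr:
            parts.append(f"{abs(dr)} {'south' if dr > 0 else 'north'}")
        if dc:
            parts.append(f"{abs(dc)} {'east' if dc > 0 else 'west'}")
        lines.append(f"- {name}: {' and '.join(parts)}")
    return "You see:\n" + "\n".join(lines)


# Example: a 3x3 view with the player in the centre.
view = [
    [0, 1, 0],
    [0, 0, 2],
    [3, 0, 0],
]
print(describe_view(view, 1, 1))
```

A description like this can be placed directly into the prompt, so the language model never has to process the image itself.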

Prompt-Engineering Only

The agent with "think step-by-step" prompting scored an average reward of 0.9 across 10 random seeds.
The agent displayed some level of reasoning, although the limited capabilities of the Llama2-7b model often led to hallucination during the "think step-by-step" process.
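
A prompting loop of this kind needs two pieces: a prompt that asks the model to reason before answering, and a parser that extracts a valid action from the reply. The sketch below shows one possible shape under stated assumptions. The prompt wording, the `build_prompt` and `parse_action` helpers, and the `Action: <name>` reply convention are hypothetical and not taken from the notebooks; `ACTIONS` lists only a subset of Crafter's action names for illustration.

```python
import re

# A subset of Crafter's action names, used here for illustration only.
ACTIONS = ["noop", "move_left", "move_right", "move_up", "move_down", "do", "sleep"]


def build_prompt(observation_text):
    """Build a hypothetical 'think step-by-step' prompt (not the repository's wording)."""
    return (
        "You are playing Crafter, a 2D survival game.\n"
        f"Observation:\n{observation_text}\n"
        f"Valid actions: {', '.join(ACTIONS)}\n"
        "Think step-by-step, then finish with a line of the form 'Action: <name>'."
    )


def parse_action(reply, default="noop"):
    """Return the last valid 'Action: <name>' found in the reply, else the default."""
    matches = re.findall(r"Action:\s*([a-z_]+)", reply)
    for name in reversed(matches):
        if name in ACTIONS:
            return name
    return default  # fall back when the model hallucinates an invalid action


reply = "I have no wood. There is a tree to the north.\nAction: move_up"
print(parse_action(reply))
```

Falling back to a default action is one way to keep an episode running when the reply contains no valid action, which matters for a small model that hallucinates often.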

Fine-Tuning and Prompt Engineering

The expert human dataset available for the Crafter environment was utilized for fine-tuning the language model.
Since the human dataset only contained numeric observations of the Crafter environment, custom code was implemented to generate a textual representation of each observation.
Afterwards, the Llama2-7b model was fine-tuned on the human dataset using supervised fine-tuning.
Due to the small size of the model, supervised fine-tuning easily led to overfitting: the agent would output the same action regardless of the observation.
Thus, in this experiment, fine-tuning a smaller language model for a specific agentic task did not prove feasible.
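
The failure mode described above, where the agent outputs the same action regardless of the observation, can be checked directly on a recorded trajectory. The following is a minimal sketch, not code from the notebooks: it assumes a trajectory is available as a list of action names, and the `looks_collapsed` helper and its 0.95 threshold are arbitrary, hypothetical choices.

```python
from collections import Counter


def dominant_action_share(actions):
    """Return (most common action, fraction of steps on which it was chosen)."""
    if not actions:
        raise ValueError("empty trajectory")
    action, count = Counter(actions).most_common(1)[0]
    return action, count / len(actions)


def looks_collapsed(actions, threshold=0.95):
    """Flag a trajectory dominated by one action; the threshold is an assumption."""
    _, share = dominant_action_share(actions)
    return share >= threshold


collapsed = ["noop"] * 100
varied = ["move_up", "do", "move_left", "do"] * 25
print(looks_collapsed(collapsed), looks_collapsed(varied))  # True False
```

Running a check like this on held-out observations during fine-tuning, rather than only after evaluation in the environment, would surface the collapse earlier.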

Reward Log

Prompt-Engineering Only:

Rewards (Random Seeds)
3.1
1.1
0.1
0.1
1.1
0.1
0.1
0.1
2.1
1.1
0.9 (avg)

Fine-Tuned:

Rewards (Random Seeds)
-0.9
-0.9
-0.9
-0.9
0.1
-0.9
-0.9
-0.9
-0.9
-0.9
-0.8 (avg)
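
The averages in the two logs above can be reproduced from the per-seed rewards. The snippet below copies the logged values; the `summarize` helper is a hypothetical convenience and not part of the repository.

```python
from statistics import mean

# Per-seed rewards copied from the reward logs above.
prompt_only = [3.1, 1.1, 0.1, 0.1, 1.1, 0.1, 0.1, 0.1, 2.1, 1.1]
fine_tuned = [-0.9, -0.9, -0.9, -0.9, 0.1, -0.9, -0.9, -0.9, -0.9, -0.9]


def summarize(rewards):
    """Return the rounded mean plus the minimum and maximum reward."""
    return {"mean": round(mean(rewards), 2), "min": min(rewards), "max": max(rewards)}


print(summarize(prompt_only))  # {'mean': 0.9, 'min': 0.1, 'max': 3.1}
print(summarize(fine_tuned))   # {'mean': -0.8, 'min': -0.9, 'max': 0.1}
```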

File Structure

CrafterGPT_SFT_Data_Engineering.ipynb: Colab notebook for generating the textual training dataset from the Crafter expert human dataset.
CrafterGPT_SFT_Fine_Tuning.ipynb: Colab notebook for fine-tuning the Llama2-7b model on the training dataset.
CrafterGPT_Step_By_Step_Prompt_Engineering.ipynb: Colab notebook for running the prompt-engineering-only agent on the Crafter environment.
CrafterGPT_Step_By_Step_Prompt_Engineering_With_Fine_Tuned_Model.ipynb: Colab notebook for running the fine-tuned agent on the Crafter environment.

HuggingFace Repositories

techandy42/llama-2-7b-craftergpt-v1.1: Fine-tuned Llama2-7b model.
techandy42/CrafterGPT-Training-Dataset: Textual training dataset.
