
Semantic Memorization

Additional information and the project workstream can be found in the Notion project.

Motivation

Memorization refers to language models' tendency to sometimes output entire training sequences verbatim. This phenomenon is not deeply understood but has implications for safely deploying language models. In particular, it is vital to minimize a model’s memorization of sensitive datapoints such as those containing personally identifiable information (PII) and trade secrets.

This project aims to challenge the traditional definition of memorization. We believe it captures the spirit of the problem but is too broad. For example, the k-elicitable definition (Carlini et al., 2022) treats highly repetitive text, code, and sequences with only a single true continuation as memorized and thus undesirable. We conjecture that traditional memorization definitions capture too many of these benign memorizations and therefore don't accurately reflect undesirable memorization.


Archetypal examples of sequences from The Pile “memorized” by GPT-2, even though GPT-2 was not trained on The Pile. This implies either that there is training-set overlap, or that there are sequences most competent language models could predict without seeing them during training. (Carlini et al., 2022)

Potential Research/Paper Contributions

  • We want to develop a robust taxonomy of types of memorization as well as the ability to analyze memorization across these categories. This may involve developing some metric for how likely a sequence is to be memorized, mapping a model's activations to memorization type, or another approach.
  • A definition of memorization that better captures adverse/harmful memorizations while minimizing the inclusion of spurious/benign memorizations is an essential step in measuring this problem and taking action toward mitigating it.
  • Can we assign a probability to whether a particular sequence will be memorized? This, coupled with a taxonomy, may help us begin to understand why LLMs memorize some data and not others.
  • Can we develop a classifier that can filter out benign memorizations? This would allow us to measure harmful memorizations more precisely.

Datasets

We’re currently analyzing the data memorized by the Pythia models as part of the EleutherAI paper Emergent and Predictable Memorization in Large Language Models. Reading that paper gives a better understanding of where the data came from and what it means. The datasets can be found on Hugging Face: EleutherAI/pythia-memorized-evals · Datasets at Hugging Face

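One way to pull the memorized-sequence data locally is to clone the dataset repository from the Hugging Face Hub with git and Git LFS. This is only a generic Hub workflow sketch, not a step the project mandates; you can also browse the dataset directly on the Hub.

```bash
# Clone the memorized-sequence evals from the Hugging Face Hub.
# Requires git-lfs (https://git-lfs.com) so large files are fetched correctly.
git lfs install
git clone https://huggingface.co/datasets/EleutherAI/pythia-memorized-evals
```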

Background

Having a basic grasp of the existing literature and problem area will be helpful for contributing to this project. You don’t need a super deep understanding, and there are opportunities to contribute across different levels of experience. Please add any papers/articles you think are relevant, and leave comments on existing ones.

Development Setup

  1. Set up your Python (3.11.4) environment via Conda
  2. Run apt-get install -y openjdk-11-jdk to install the JDK for PySpark if you're on Ubuntu; otherwise, use the appropriate package manager
  3. Install Python packages via pip install -r requirements.txt (a combined sketch of these steps follows the list)
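On Ubuntu, the steps above might look roughly like the following. This is only a sketch: the environment name semantic-memorization is an arbitrary placeholder, not something the repository prescribes.

```bash
# Create and activate a Conda environment pinned to the expected Python version.
# The environment name "semantic-memorization" is a placeholder.
conda create -n semantic-memorization python=3.11.4 -y
conda activate semantic-memorization

# Install a JDK for PySpark (Ubuntu shown; use your platform's package manager otherwise).
sudo apt-get update
sudo apt-get install -y openjdk-11-jdk

# Install the project's Python dependencies.
pip install -r requirements.txt
```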

Running the metric pipeline

  1. Run python calculate_metrics.py
  2. To monitor the status of Spark jobs, go to http://localhost:4040/jobs/. Don't forget to port-forward 4040 if necessary (see the sketch after this list)
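If the pipeline runs on a remote machine, one way to reach the Spark UI is an SSH local port forward. This is a generic sketch rather than a project-specific requirement; user and remote-host are placeholders for your own credentials and host.

```bash
# Start the metric pipeline on the machine with the environment set up.
python calculate_metrics.py

# From your local machine, forward the Spark UI port if the job runs remotely
# ("user" and "remote-host" are placeholders).
ssh -N -L 4040:localhost:4040 user@remote-host

# Then open the Spark jobs page in a local browser:
#   http://localhost:4040/jobs/
```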
