
LoTR: Low Tensor Rank Adaptation of Large Language Models


Overview

This repository is the original implementation of LoTR (arXiv:2402.01376), a novel approach to parameter-efficient fine-tuning of LLMs which represents the gradient update to the parameters in the form of a tensor decomposition. The low-rank adapter of each layer is constructed as a product of three matrices, and the tensor structure arises from sharing the left and right multipliers of this product among layers. Simultaneous compression of a sequence of layers with a low-rank tensor representation allows LoTR to achieve even better parameter efficiency than LoRA, especially for deep models. Moreover, the core tensor does not depend on the original weight dimension and can be made arbitrarily small, which allows for extremely cheap and fast downstream fine-tuning.

@misc{bershatsky2024lotr,
  title         = {{LoTR}: Low Tensor Rank Weight Adaptation},
  author        = {Daniel Bershatsky and Daria Cherniuk and Talgat Daulbaev and Aleksandr Mikhalev and Ivan Oseledets},
  year          = {2024},
  eprint        = {2402.01376},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
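
The adapter structure described above can be illustrated with a minimal PyTorch sketch. The class and attribute names below are hypothetical and do not reproduce the actual implementation in this repository.

import torch

# For a stack of `num_layers` weight matrices of shape (d_out, d_in), the
# update to layer l is dW_l = A @ C_l @ B, where the left factor A of shape
# (d_out, r) and the right factor B of shape (r, d_in) are shared among all
# layers, and only the small (r, r) core C_l is layer-specific.
class LoTRAdapter(torch.nn.Module):
    def __init__(self, num_layers: int, d_out: int, d_in: int, rank: int):
        super().__init__()
        self.lhs = torch.nn.Parameter(torch.randn(d_out, rank) * 1e-2)
        self.rhs = torch.nn.Parameter(torch.randn(rank, d_in) * 1e-2)
        # Cores start at zero so that fine-tuning begins from the original
        # (frozen) weights.
        self.cores = torch.nn.Parameter(torch.zeros(num_layers, rank, rank))

    def delta(self, layer: int) -> torch.Tensor:
        """Return the low-rank weight update dW_l = A @ C_l @ B."""
        return self.lhs @ self.cores[layer] @ self.rhs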

Experiments

Logging Files

We assume that all raw experiment results (first of all, logging files) are located in the log directory. The high-level structure of this directory should reflect the experimental setup, so a path relative to this directory should have the following structure.

<dataset>/<model>/<method>/<param1>/<param2>/.../<seed>/<tfevents-file>

The model segment precedes the method segment since the number of different models is usually smaller than the number of methods, and the training pipeline is usually parameterized by model first and then by method. All floating-point parameters should be written in scientific notation to ensure that no significant digits are lost. The last directory is the random seed used to run an experiment.
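
For instance, a single run could be stored under a path like the one below (all segment values here are hypothetical).

cola/roberta-base/lotr/1.0e-04/8/42/<tfevents-file>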

Note that the requirements above are a necessity, since there is no full-featured machine learning experiment management software available.

Conversion to Arrow Parquet

TensorBoard tfevents-files are quite large and take a noticeably long time to read and load, so we convert them to parquet-files with the following command.

python -m lotr.tb2parquet log/glue data/glue.parquet \
    --names model method task lr rank seed

Now, one can read a single parquet-file with all time series as follows.

import pandas as pd
df = pd.read_parquet('data/glue.parquet')

To be more specific, a 20 MB tfevents-file is converted to a 200 KB parquet-file.
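
Assuming the resulting table has one column per --names argument along with the logged scalar values, individual runs can be selected with ordinary pandas filtering (a sketch; the actual column names may differ).

import pandas as pd

df = pd.read_parquet('data/glue.parquet')
print(df.columns)  # inspect which columns are actually present

# Hypothetical filter: all seeds of a single LoTR run on one task.
subset = df[(df['method'] == 'lotr') & (df['task'] == 'cola')]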