
Interoperability and GPT-NeoX #1058

Open · StellaAthena opened this issue Oct 12, 2023 · 2 comments
Labels: documentation, question

@StellaAthena (Member) commented Oct 12, 2023

With the increasing interest in using this library to train models originally trained by others (#896 #994 #1051 #1057), I think it's worth thinking more carefully about how we organize and present this information. Right now we have an "exporting to HF" section of our README and a bunch of scripts in the /tools/ckpts/ directory that are not mentioned in the main README.

It appears that we have "generic" neox_to_hf and hf_to_neox scripts (the quotes are because they will break if the model architecture differs too much from the original NeoX architecture) and then model-specific conversion scripts that offer more robust coverage for particular architectures.
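To make the failure mode concrete: at its core such a script is a state-dict key remapping, and any parameter it has no rule for is where the "generic" path breaks. A minimal sketch of the idea, where the key names and patterns are illustrative assumptions rather than the actual ones in /tools/ckpts/:

```python
import re

# Hypothetical mapping from NeoX-style keys to HF GPTNeoX-style keys;
# real conversion scripts also reshape/transpose some tensors, omitted here.
KEY_PATTERNS = [
    (r"^word_embeddings\.weight$", "gpt_neox.embed_in.weight"),
    (r"^layers\.(\d+)\.attention\.query_key_value\.(weight|bias)$",
     r"gpt_neox.layers.\1.attention.query_key_value.\2"),
    (r"^final_layer_norm\.(weight|bias)$", r"gpt_neox.final_layer_norm.\1"),
]

def convert_state_dict(neox_sd: dict) -> dict:
    hf_sd = {}
    for key, tensor in neox_sd.items():
        for pattern, replacement in KEY_PATTERNS:
            if re.match(pattern, key):
                hf_sd[re.sub(pattern, replacement, key)] = tensor
                break
        else:
            # A key with no mapping is exactly where a "generic" script
            # falls over on an unfamiliar architecture.
            raise KeyError(f"no mapping for {key}")
    return hf_sd
```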

In the long run, I don't know if it is feasible to add a new script for every new HF class. Due to HF's disinterest in enforcing consistency across classes, we would likely need to add a new script for each new model class (which often means "for each new model").

I haven't actually done the work of porting a model over, so I'm quite curious how much work it is in the eyes of people who have (@haileyschoelkopf @Quentin-Anthony @zphang are people I know have done this). Does it make more sense to offer a tutorial on model adaptation, to help people write their own scripts? Or do we want to keep adding new adaptation scripts every time a new model comes out?

A related question is whether we want to make HF the first-class entry point to this library. Typically, people release weights both in their training library's format and in the HF-ported format. Right now we support a mix of original weights and HF weights, depending on the model. In general I think it makes sense to prioritize HF weights, as that's how future finetunes of models are likely to be released.
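To illustrate what making HF the entry point buys us: every model and finetune published on the Hub loads through the same two calls, e.g. for Pythia:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# HF-format weights; any finetune released on the Hub resolves the same
# way, which is the appeal of treating HF as the first-class format.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
```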

This is not the only question about how we want to handle interoperability. I believe @LouisCastricato has finished or nearly finished adapting trlX to work with this library, and we have an adapter script for the eval harness (which probably needs to be updated with V2 coming out, @lintangsutawika @haileyschoelkopf). We can also think about whether we want to support interoperability with ggml or other libraries directly.
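For reference, once a checkpoint is exported to HF format it can go straight through the harness; roughly like the following, assuming the post-refactor (V2) API keeps a simple_evaluate entry point:

```python
import lm_eval

# Evaluate an HF-format checkpoint on one task; the model name and task
# choice here are just examples.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["lambada_openai"],
)
print(results["results"])
```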

I see interoperability as an important feature for staying relevant in an increasingly diverse and interconnected ecosystem. Making a great tool will get you committed users, but making an accessible tool is how you build a large user base. While this library started out focused on enabling EleutherAI's research, I think that with our strong support for various launchers and hardware we are very well positioned to be a go-to library for many people if we want to be. A dedicated push in that direction would require more user support, and possibly hiring engineers specifically to maintain the library.

Anyways, what do people think about all of this?

@StellaAthena added the documentation and question labels and removed the feature request label on Oct 12, 2023
@lintangsutawika (Contributor) commented

I've only made rudimentary efforts toward a generalizable neox-hf conversion script, for the SparseGPT project, to allow evaluating sparsified models with a wider choice of activation functions. I think a more serious effort would require porting parts of the transformers library into easier-to-maintain blocks of modules (which would probably live in the neox repo).

On LM-Eval: yes, it will need to be updated to work with big-refactor.
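Roughly what I mean by easier-to-maintain blocks: a registry of per-architecture converters behind one interface, rather than standalone scripts. A sketch with hypothetical names:

```python
from typing import Callable, Dict

# Hypothetical registry: supporting a new HF model class means adding an
# entry here, not writing a whole new script.
CONVERTERS: Dict[str, Callable[[dict], dict]] = {}

def register(architecture: str):
    def wrap(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        CONVERTERS[architecture] = fn
        return fn
    return wrap

@register("gpt_neox")
def convert_gpt_neox(state_dict: dict) -> dict:
    # Illustrative only: a real converter would remap keys and reshape
    # tensors for this specific architecture.
    return {f"gpt_neox.{k}": v for k, v in state_dict.items()}

def convert(architecture: str, state_dict: dict) -> dict:
    if architecture not in CONVERTERS:
        raise ValueError(f"no converter registered for {architecture}")
    return CONVERTERS[architecture](state_dict)
```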

@Drzhivago264 commented Nov 11, 2023

Hello,
Thank you for your effort.
I would like to ask: if it is infeasible to port all HF models to the NeoX format, would it be much easier to publish the weights of the existing models in the NeoX format?
For example, I am trying to find the correct weights for the GPT-J-6B and Pythia models, but I don't know where to download the correct versions of them.
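So far I have only found the HF-format weights on the Hub, which can be fetched like this (if I understand correctly, the intermediate training checkpoints are exposed as revisions on the same repos):

```python
from huggingface_hub import snapshot_download

# Downloads the HF-format Pythia weights to the local cache; pass a
# revision such as "step3000" for an intermediate checkpoint (assuming
# the Pythia repos expose checkpoints as revisions).
path = snapshot_download("EleutherAI/pythia-6.9b")
```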

In addition, there are multiple config files in the neox repo. I wonder whether the 6.7B configuration matches GPT-J? It would be great to have a site that shares the weights corresponding to each sample config for fine-tuning.

Also, my machine is not exactly modern, so I have to play around with ZeRO stage 2. I believe there is no problem in mixing multiple optimization stages.
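For context, this is the shape of the zero_optimization block I am using, shown as a Python dict mirroring the repo's .yml config syntax (the values are illustrative, not a recommendation):

```python
zero_config = {
    "zero_optimization": {
        "stage": 2,  # ZeRO stage 2: partition optimizer state and gradients
        "allgather_partitions": True,
        "overlap_comm": True,
        "reduce_scatter": True,
    },
}
```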

In addition, is there any way to see the estimated time to finish training with a given config? I find the info log a little confusing.
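For now I estimate it by hand from the per-iteration timing the log does print:

```python
# Back-of-the-envelope ETA from numbers in the training log: current
# iteration, total train_iters, and elapsed time per iteration (ms).
def eta_hours(current_iter: int, train_iters: int, ms_per_iter: float) -> float:
    remaining = train_iters - current_iter
    return remaining * ms_per_iter / 1000 / 3600

# e.g. 10,000 of 143,000 iterations done at 950 ms/iter:
print(f"{eta_hours(10_000, 143_000, 950.0):.1f} hours remaining")
```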

Thank you for your support.
*Edit: I have been able to convert the Pythia checkpoints to NeoX format.
