Practical scope requirements #1

Open
StoneCypher opened this issue Apr 24, 2023 · 3 comments

Comments

@StoneCypher

How much code in micro-language Foo do you actually need to train one of these?

@minosvasilias
Owner

The dataset used for the provided weights was 60k rows. Each scraped script is split into individual functions, which is an easy and reliable way to chunk the code, so one function = one dataset entry.

In practice, this resulted in 762 repositories being parsed for the training data; see godot_dodo_4x_60k_repos.json.
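
For illustration, a minimal sketch of that function-level splitting in Python might look like the code below. This is not the actual godot_dodo scraper; the regex, the example file path, and the row schema are all assumptions.

```python
import json
import re

# Matches the start of a top-level GDScript function declaration.
# Assumption: functions begin at column 0 with "func" or "static func".
FUNC_RE = re.compile(r"^(?:static\s+)?func\s+\w+", re.MULTILINE)

def split_into_functions(source: str) -> list[str]:
    """Split a script into chunks, one chunk per function definition."""
    starts = [m.start() for m in FUNC_RE.finditer(source)]
    chunks = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(source)
        chunks.append(source[start:end].rstrip())
    return chunks

def build_rows(path: str) -> list[dict]:
    """One function = one dataset entry (hypothetical row schema)."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    return [{"file": path, "code": chunk} for chunk in split_into_functions(source)]

if __name__ == "__main__":
    # "player.gd" is a placeholder path, not a file from the dataset.
    print(json.dumps(build_rows("player.gd"), indent=2))
```

The real dataset rows presumably carry more fields than this (e.g. an instruction paired with each function); the sketch only shows the chunking step described above.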

@StoneCypher
Author

If you were asked to stick your thumb in the air and guess, what would you expect the lower bound for practical success to be?

My language is nowhere near that common.

@minosvasilias
Owner

I would say the lower bound of dataset sizes I've seen for LLaMA finetunes in general (not code-specific) sits around 15-20k rows.

I personally trained a 20k-row 7B model initially to judge whether or not this project was worth pursuing, but don't have any evaluations for that one. Still, it showed good enough results to continue, so that would be the sort of minimum I'd be looking at.
