Implement prompt/generation alignment #531
Conversation
I think this is the right general direction.
Could you illustrate this? I had a PR opened (can't find it right now) where I iterated once over the vocabulary to find the overlapping tokens.
Making up a fake example: my prompt is "Good mor". Let's say there is a token for "mor" and it is the last one of the prompt. We would want token alignment to replace "mor" with "morning". However, if the token "ning" does not exist by itself, then there is nothing in the `states_to_token_maps` that we could use to complete the word. I was then thinking that a solution could be to create at initialization a mapping that contains information about both characters and tokens (so we would have some states with no tokens leading to them that would be used for the token alignment).
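A minimal sketch of this example (hypothetical toy vocabulary, not the outlines API), assuming we can scan the tokenizer's vocabulary for tokens that extend the removed text:

```python
# Hypothetical illustration of the "Good mor" example, not outlines code.
vocabulary = {"Good": 0, " mor": 1, " morning": 2, " more": 3}

prompt_tokens = ["Good", " mor"]  # " mor" is the crossing token
removed = prompt_tokens.pop()     # drop it so alignment can extend it

# Allow any token whose text starts with the removed text.
candidates = [t for t in vocabulary if t.startswith(removed)]
print(candidates)
# [' mor', ' morning', ' more'] -- the model may regenerate " mor" as-is
# or produce a longer token such as " morning" directly.
```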
How about looping over the entire vocabulary and storing the tokens that accept the end of the prompt as a prefix? I haven't taken the time to think about the constrained case yet.
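A sketch of that precomputation (hypothetical helper, assuming the vocabulary is available as a text-to-id dict): a single loop over the vocabulary at initialization buckets every token under each of its prefixes, so the candidates for any trailing fragment become a dictionary lookup at generation time.

```python
from collections import defaultdict

def build_prefix_index(vocabulary: dict[str, int]) -> dict[str, list[int]]:
    # One pass over the vocabulary: record each token id under every
    # prefix of its text, so no rescanning is needed per prompt.
    index: dict[str, list[int]] = defaultdict(list)
    for text, token_id in vocabulary.items():
        for i in range(1, len(text) + 1):
            index[text[:i]].append(token_id)
    return index

# index[" mor"] would then list the ids of " mor", " morning", " more", ...
```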
I had not realized that I could walk the vocabulary like that.
Yes, I think that's the right approach. There's some stuff to figure out in terms of design, but otherwise looks good.
Force-pushed from 01bfc21 to 4aa74f2
I'll write unit tests next if you think having those separate functions is the right design.
I have made several comments on the overall design, but nothing that would dramatically affect your implementation. You can start implementing tests.
Hi there, I am just a user here who is looking forward to this change. However, I noticed that there is an error when the model is running on a GPU. I think it could be fixed by passing the device in these two statements (at least, this fixes it for me).
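A hedged sketch of that fix (the variable names are hypothetical; the point is only that new tensors must be created on `model.device` rather than the CPU default):

```python
import torch

def to_model_device(aligned_token_ids: list[int], model):
    # Building the tensors on the model's device avoids a device-mismatch
    # RuntimeError when the model runs on a GPU. Names are illustrative.
    token_ids = torch.tensor([aligned_token_ids], device=model.device)
    attention_mask = torch.ones_like(token_ids)
    return token_ids, attention_mask
```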
We're getting really close. There are a few design changes remaining, and mostly we should have comprehensive tests before merging.
Force-pushed from 6bb90f8 to 29853ec
I rebased your branch on `main`.
Is this still something we want to work on?
Yes! I'm currently thinking about how we could integrate that into the logits processors, since most integrations are going to use this :)
Opening this PR to discuss the implementation of token alignment (#161)
This is not intended to be merged; I was just wondering whether you think this is a promising direction to look into.
The idea is to modify the `self.states_to_token_maps` of the FSM associated with each prompt, creating a new state to accommodate the fact that a token has been removed compared to when this map was built during FSM initialization (a rough sketch follows below).

Disadvantage:
Advantage:
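A rough sketch of the idea described above (hypothetical code, not the actual PR implementation), assuming `states_to_token_maps` maps a state to its `{token_id: next_state}` transitions and that the transitions for tokens extending the removed token's text have been computed separately:

```python
def add_alignment_state(
    states_to_token_maps: dict[int, dict[int, int]],
    start_state: int,
    extension_transitions: dict[int, int],
) -> int:
    # Graft one extra state onto the map. It keeps the original start
    # transitions (the model may regenerate the removed token as-is) and
    # adds transitions for tokens that extend the removed token's text,
    # each pointing at the state reached once the extra characters have
    # been consumed.
    new_state = max(states_to_token_maps) + 1
    states_to_token_maps[new_state] = {
        **states_to_token_maps[start_state],
        **extension_transitions,
    }
    return new_state
```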