Hi~ Thanks a lot for the new version of the code, which has made the framework much easier to understand. But I noticed that some details have also changed, e.g., the tokenizer part:
old version:

```python
def tokenizer_X_token(prompt, tokenizer, X_token_index, return_tensors=None):
    prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split(f'<{X_INDEX_TOKEN[X_token_index].lower()}>')]
    ...
```
new version:

```python
def tokenizer_image_token(prompt, tokenizer, image_token_index=IMAGE_TOKEN_INDEX, return_tensors=None):
    prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split('<image>')]
    ...
```
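To make the comparison concrete, here is a minimal, self-contained sketch of what a `tokenizer_image_token`-style helper does after splitting on the placeholder: it interleaves a sentinel index wherever `<image>` appeared. This is illustrative only; `IMAGE_TOKEN_INDEX = -200` follows LLaVA-style conventions, and the toy tokenizer below is a stand-in for the real HuggingFace tokenizer.

```python
# Assumption: -200 is the sentinel used in LLaVA-style codebases; check
# your repo's constants module for the actual value.
IMAGE_TOKEN_INDEX = -200

def toy_tokenizer(text):
    # Stand-in for the real tokenizer: map each word to a fake id in [0, 1000).
    return [hash(w) % 1000 for w in text.split()]

def tokenizer_image_token_sketch(prompt, tokenize=toy_tokenizer,
                                 image_token_index=IMAGE_TOKEN_INDEX):
    # Split the prompt on the <image> placeholder and tokenize each text chunk.
    chunks = [tokenize(chunk) for chunk in prompt.split('<image>')]
    input_ids = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            # Re-insert the sentinel index where <image> used to be.
            input_ids.append(image_token_index)
        input_ids.extend(chunk)
    return input_ids

ids = tokenizer_image_token_sketch("describe this <image> briefly")
# Exactly one sentinel per <image> occurrence survives in input_ids.
print(ids.count(IMAGE_TOKEN_INDEX))
```

The old `tokenizer_X_token` differs only in deriving the placeholder string from `X_INDEX_TOKEN[X_token_index]`, so per-modality placeholders (`<video>`, etc.) map to per-modality sentinel indices instead of a single one.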
Should I worry about any performance degradation? Specifically:

- It looks like video and image are now treated the same way?
- The original training samples include symbols like `<image>\n` and `\n<video>`?
In fact, I am trying to finetune with new modalities like audio and depth, so is there any conflict with the current version (besides the LanguageBind part)?
Thank you so much~☺