dvlab-research / LongLoRA Public

Issues: dvlab-research/LongLoRA

39 Open 124 Closed

Issues list

What training set is used to obtain “Model with context extension via improved LoRA fine-tuning” (LoRA+)?

#184 opened Apr 22, 2024 by ZackZikaiXiao

How were the questions and answers for long context (LongAlpaca) created?

#183 opened Mar 4, 2024 by ddoyles

When I set per_device_train_batch_size=2, the S2-Attn would not shift as expected

#182 opened Mar 1, 2024 by linhaojia13

HF models missing rope scaling in the config

#181 opened Feb 29, 2024 by hsiehjackson

Machine doesn't have Flash Attention installed

#180 opened Feb 27, 2024 by huilong-chen

global_step file

#179 opened Feb 21, 2024 by xxcoco763

Regarding the results in Table 8 and Table 14

#177 opened Feb 4, 2024 by Statisticss

About the different datasets and corresponding models

#176 opened Feb 2, 2024 by Statisticss

Memory usage "too small" for 7B Llama-2

#174 opened Jan 24, 2024 by Linohong

Training an LLM with shifted sparse attention from scratch?

#173 opened Jan 24, 2024 by we1k

merge_lora_weights_and_save_hf_model.py Error while deserializing header: HeaderTooLarge

#172 opened Jan 23, 2024 by Spongeorge

Distributed inference issue

#171 opened Jan 22, 2024 by yixliu1

For the evaluation results in the paper, is the attention used at inference shifted sparse attention or full attention?

#170 opened Jan 19, 2024 by zhangxiann

Is it possible to increase the context length of phi-2 using LongLora? If yes, what changes need to be done to support it?

#169 opened Jan 18, 2024 by dbanka

The loss value is too unstable during supervised fine-tuning of the 7b-100k-ft model

#168 opened Jan 18, 2024 by seanxuu

streaming llm problem

#167 opened Jan 18, 2024 by seanxuu

How can I use the Llama-2-7b-longlora-100k-ft model correctly?

#166 opened Jan 18, 2024 by seanxuu

Bug report: RuntimeError: probability tensor contains either inf, nan or element < 0

#165 opened Jan 18, 2024 by seanxuu

Can LongLoRA be combined with YaRN?

#164 opened Jan 1, 2024 by DevNullx64

GPU memory allocation during inference

#163 opened Dec 28, 2023 by xxcoco763

Adapting to new models

#162 opened Dec 24, 2023 by epinnock

How can training of the embed and norm layers be included in LoRA training?

#161 opened Dec 22, 2023 by Zheng-Jay

What llama attn replacement to use for SFT-based inference?

#159 opened Dec 15, 2023 by spring1915

With no error reported, LongAlpaca-7B responds only to the first paragraph of the text

#158 opened Dec 14, 2023 by waleyW

Configs in inference.py necessary for context length expansion in model serving?

#157 opened Dec 13, 2023 by spring1915
