Replies: 1 comment
-
Wouldn't introducing randomness potentially make the training process more unstable and affect convergence?
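To make the concern a bit more concrete: even the deterministic part of the generation path (temperature scaling) already changes the loss values and gradient magnitudes, and the random part (the `torch.multinomial` draw) would add per-batch noise on top of that. A hypothetical toy experiment (not from the book, dummy logits only):

```python
import torch

# Probe how temperature scaling alone changes the loss and its gradients.
torch.manual_seed(123)
logits = torch.randn(8, 50257, requires_grad=True)   # dummy (batch*seq_len, vocab_size) logits
targets = torch.randint(0, 50257, (8,))

for temperature in (1.0, 0.5, 2.0):
    loss = torch.nn.functional.cross_entropy(logits / temperature, targets)
    (grad,) = torch.autograd.grad(loss, logits)
    print(f"T={temperature}: loss={loss.item():.3f}, grad norm={grad.norm().item():.5f}")
```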
-
The loss calculation uses the raw logits:

```python
def calc_loss_batch(input_batch, target_batch, model, device):
    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
    logits = model(input_batch)
    loss = torch.nn.functional.cross_entropy(logits.flatten(0, 1), target_batch.flatten())
    return loss
```
whereas the final text generation with temperature, top_k, and sampling uses:

```python
def generate(model, idx, max_new_tokens, context_size, temperature, top_k=None):
    ...
    logits = model(idx_cond)
    ...
```
The two differ in how randomness is applied to the final logits, essentially creating two distinct paths: one for training and one for using the model.
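For reference, the per-step logit handling inside `generate` can be summarized roughly like this (a sketch only; `sample_next_token` is a name I made up to isolate the relevant lines, and the book's version keeps this inline in the generation loop):

```python
import torch

def sample_next_token(logits, temperature=0.0, top_k=None):
    # logits: (batch, vocab_size) scores for the next token, i.e. model(idx_cond)[:, -1, :]
    if top_k is not None:
        # Top-k filtering: everything outside the top-k gets a -inf logit.
        top_logits, _ = torch.topk(logits, top_k)
        logits = torch.where(
            logits < top_logits[:, -1:],
            torch.tensor(float("-inf"), device=logits.device),
            logits,
        )
    if temperature > 0.0:
        # Temperature scaling followed by sampling -- the only random step.
        probs = torch.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1)
    # Greedy decoding when temperature is 0.
    return torch.argmax(logits, dim=-1, keepdim=True)
```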
I am wondering whether incorporating the (modified) logit calculation from text generation into the training loss could benefit the model's performance.
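A minimal sketch of what that could look like (hypothetical; the name `calc_loss_batch_generation_style` and its signature are mine, not from the book), carrying the temperature scaling over into the loss:

```python
import torch

def calc_loss_batch_generation_style(input_batch, target_batch, model, device,
                                      temperature=1.0):
    # Hypothetical variant of calc_loss_batch that applies the generation-time
    # temperature scaling to the logits before computing cross-entropy.
    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
    logits = model(input_batch) / temperature          # (batch, seq_len, vocab_size)
    return torch.nn.functional.cross_entropy(
        logits.flatten(0, 1), target_batch.flatten()
    )
```

Note that the sampling step (`torch.multinomial`) cannot be carried over, since sampling is not differentiable, and reusing the top-k mask would make the loss infinite whenever a target token falls outside the top-k; so only the temperature scaling transfers cleanly.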