Replies: 1 comment
-
Wouldn't introducing randomness potentially make the training process more unstable and affect convergence?
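To make the concern a bit more concrete: even the deterministic part of the generation path (temperature scaling) already changes the loss values and gradient magnitudes, and the random part (the `torch.multinomial` draw) would add per-batch noise on top of that. A hypothetical toy experiment (not from the book, dummy logits only):

```python
import torch

# Probe how temperature scaling alone changes the loss and its gradients.
torch.manual_seed(123)
logits = torch.randn(8, 50257, requires_grad=True)   # dummy (batch*seq_len, vocab_size) logits
targets = torch.randint(0, 50257, (8,))

for temperature in (1.0, 0.5, 2.0):
    loss = torch.nn.functional.cross_entropy(logits / temperature, targets)
    (grad,) = torch.autograd.grad(loss, logits)
    print(f"T={temperature}: loss={loss.item():.3f}, grad norm={grad.norm().item():.5f}")
```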
-
The loss calculation uses the raw logits:

```python
def calc_loss_batch(input_batch, target_batch, model, device):
    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
    logits = model(input_batch)
    loss = torch.nn.functional.cross_entropy(logits.flatten(0, 1), target_batch.flatten())
    return loss
```
whereas the final text generation with temperature, top_k, and sampling uses:

```python
def generate(model, idx, max_new_tokens, context_size, temperature, top_k=None):
    ...
    logits = model(idx_cond)
    ...
```
The two differ in how randomness is applied to the final logits, essentially creating two distinct paths: one for training and one for using the model.
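For reference, the per-step logit handling inside `generate` can be summarized roughly like this (a sketch only; `sample_next_token` is a name I made up to isolate the relevant lines, and the book's version keeps this inline in the generation loop):

```python
import torch

def sample_next_token(logits, temperature=0.0, top_k=None):
    # logits: (batch, vocab_size) scores for the next token, i.e. model(idx_cond)[:, -1, :]
    if top_k is not None:
        # Top-k filtering: everything outside the top-k gets a -inf logit.
        top_logits, _ = torch.topk(logits, top_k)
        logits = torch.where(
            logits < top_logits[:, -1:],
            torch.tensor(float("-inf"), device=logits.device),
            logits,
        )
    if temperature > 0.0:
        # Temperature scaling followed by sampling -- the only random step.
        probs = torch.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1)
    # Greedy decoding when temperature is 0.
    return torch.argmax(logits, dim=-1, keepdim=True)
```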
I am wondering whether incorporating the (modified) logit calculation from text generation into the training loss could benefit the model's performance.
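A minimal sketch of what that could look like (hypothetical; the name `calc_loss_batch_generation_style` and its signature are mine, not from the book), carrying the temperature scaling over into the loss:

```python
import torch

def calc_loss_batch_generation_style(input_batch, target_batch, model, device,
                                      temperature=1.0):
    # Hypothetical variant of calc_loss_batch that applies the generation-time
    # temperature scaling to the logits before computing cross-entropy.
    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
    logits = model(input_batch) / temperature          # (batch, seq_len, vocab_size)
    return torch.nn.functional.cross_entropy(
        logits.flatten(0, 1), target_batch.flatten()
    )
```

Note that the sampling step (`torch.multinomial`) cannot be carried over, since sampling is not differentiable, and reusing the top-k mask would make the loss infinite whenever a target token falls outside the top-k; so only the temperature scaling transfers cleanly.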