[Feature]: Deleting steps up to a given point #7357

Open
BramVanroy opened this issue Apr 11, 2024 · 3 comments
Labels
a:app Area: Frontend/Backend a:cli Area: Client c:stitch ty:feature_request type of the issue is a feature request

Comments

@BramVanroy
BramVanroy commented Apr 11, 2024

Description

In HPC environments we often have fixed job allocations (e.g. 10 days). A checkpoint save may be triggered at 9 days and 20 hours (e.g. at step 100). After saving, logging simply continues until the 10 days are over (e.g. until step 105). We then start a new job from the latest saved checkpoint at step 100. This leads to a discrepancy: wandb has already logged up to step 105, but we restart from step 100, so the graph gets messed up a bit. It would therefore be incredibly useful to have tooling that removes data points in a given range of steps.

Suggested Solution

A CLI command that, given a project name, a run, and a range of steps, removes all data points within that range. From the outside looking in, this seems straightforward, but I'm sure there are technical reasons why it is more difficult to do.
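
To make the request concrete, here is a minimal sketch of the filtering such a command would need to perform. It is plain Python over an in-memory list of records and does not call wandb at all; the function name `drop_step_range` and the record layout (a dict with a `step` key) are assumptions made for illustration only, not anything wandb provides.

```python
from typing import Any, Dict, List


def drop_step_range(
    history: List[Dict[str, Any]], first_step: int, last_step: int
) -> List[Dict[str, Any]]:
    """Return the records whose step lies outside [first_step, last_step].

    Hypothetical helper: the record layout is an assumption, not wandb's format.
    """
    if first_step > last_step:
        raise ValueError("first_step must not be greater than last_step")
    return [
        record
        for record in history
        if not (first_step <= record["step"] <= last_step)
    ]


# Steps 1..105 were logged, but the last checkpoint was written at step 100.
history = [{"step": step, "loss": 1.0 / step} for step in range(1, 106)]

# Remove everything logged after the checkpoint, so a resumed job can
# continue logging from step 101 without overlapping points.
cleaned = drop_step_range(history, first_step=101, last_step=105)
```

With the numbers from the description, this leaves the history ending at step 100, which matches the checkpoint the new job resumes from.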

@ArtsiomWB
Contributor

Hi @BramVanroy! Thank you for writing in! I will go ahead and submit the feature request for you to our engineering team.

@kptkin kptkin added a:cli Area: Client a:app Area: Frontend/Backend c:stitch ty:feature_request type of the issue is a feature request labels Apr 12, 2024
@sephmard
Contributor

Hi @BramVanroy, please find more discussion on the feature backlog here: #7078 (comment)

In short, this is on our radar as a high-value feature. We already have one part complete, but have some remaining work to implement.

@BramVanroy
Author

@sephmard That's looking great! Any chance that this will also be controllable by environment variables? That would make life easier in HPC environments, where passing around env vars might be easier than setting variables in scripts. Perhaps WANDB_RESUME can be updated to align with this behavior, or new WANDB_RESUME_RUN and WANDB_RESUME_STEP variables can be added?
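
As a rough sketch of how a training script could consume such variables: `WANDB_RESUME_RUN` and `WANDB_RESUME_STEP` are only the names proposed in this comment, not settings that wandb is known to read, and `read_resume_target` is a hypothetical helper. The snippet only parses and validates the values; what to do with them is left to the caller.

```python
import os
from typing import Mapping, Optional, Tuple


def read_resume_target(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, int]]:
    """Read the proposed (hypothetical) resume variables from the environment.

    Returns (run_id, step) if both are set, or None if neither is set.
    """
    run_id = env.get("WANDB_RESUME_RUN")    # proposed name, not a real wandb setting
    raw_step = env.get("WANDB_RESUME_STEP")  # proposed name, not a real wandb setting

    if run_id is None and raw_step is None:
        return None
    if run_id is None or raw_step is None:
        raise ValueError(
            "WANDB_RESUME_RUN and WANDB_RESUME_STEP must be set together"
        )

    step = int(raw_step)  # raises ValueError if the value is not an integer
    if step < 0:
        raise ValueError("WANDB_RESUME_STEP must be non-negative")
    return run_id, step
```

In a job script this would pair with something like `export WANDB_RESUME_RUN=<run id>` and `export WANDB_RESUME_STEP=100` before launching training, so the resume point can be changed without touching the Python code.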

4 participants