[CLI]: Large numbers in parameter section of sweep config breaks sweep by repeating runs #7350

Open · HannesStagge opened this issue on Apr 10, 2024 · 2 comments
Labels: c:sweeps (Component: Sweeps)

@HannesStagge

Describe the bug

I encountered some repeated runs when running grid hyperparameter studies. See here for another example.
@luisbergua and @QianqianF, this might be of interest to you too.

I tracked it down to large numbers written in exponential notation (values from 1 up to 1e15) in the parameters section of my sweep config.

The config that reproduces the error:

from pathlib import Path

# Assumed: this_file is the path of the running script, so the sweep is named after it.
this_file = Path(__file__)

sweep_config = {
    "method": "grid",
    "name": this_file.stem,
    "metric": {"goal": "minimize", "name": "test/loss"},
    "parameters": {
        "param_A": {"values": [1, 3, 4, 5]},
        "param_B": {
            "values": [
                1, 10, 100, 1e3, 1e4, 1e5, 1e6, 1e7,
                1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
            ]
        },
    },
}

as opposed to this config, which works fine:

sweep_config = {
    "method": "grid",
    "name": this_file.stem,
    "metric": {"goal": "minimize", "name": "test/loss"},
    "parameters": {
        "param_A": {"values": [1, 3, 4, 5]},
        "param_B": {
            "values": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
        },
    },
}
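Besides their size, the two values lists differ in type: in the faulty config every entry from 1e3 on is a Python float, while the working config contains only ints. The check below only demonstrates that difference; nothing in this thread establishes whether the int/float mix, the magnitude, or the exponent notation is what triggers the repeated runs. The integer list at the end is a hypothetical workaround, untested against the sweep backend.

```python
# Values copied from the two sweep configs above.
faulty_values = [1, 10, 100, 1e3, 1e4, 1e5, 1e6, 1e7,
                 1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15]
working_values = list(range(16))


def value_types(values):
    """Return the sorted list of distinct Python type names in a values list."""
    return sorted({type(v).__name__ for v in values})


print(value_types(faulty_values))   # ['float', 'int']
print(value_types(working_values))  # ['int']

# Hypothetical workaround (untested with W&B sweeps): the same magnitudes
# written as exact integers, so the list has a single type.
int_values = [10**k for k in range(16)]
print(value_types(int_values))                        # ['int']
print(int_values == [int(v) for v in faulty_values])  # True
```

Since every power of ten up to 1e15 is exactly representable as a float, the integer list holds the same values; re-running the sweep with it could help narrow down whether the float type plays a role.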

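Parallel coordinates plot for the faulty case:
[image: parallel coordinates plot, faulty config]
Note the missing runs for param_A = 3, 4 and 5.

Parallel coordinates plot for the correct case:
[image: parallel coordinates plot, working config]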
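To quantify the missing and repeated runs visible in the faulty plot, one option is to compare the finished runs' configs against the full grid. A minimal sketch in plain Python, assuming the run configs have already been collected as a list of dicts (for example via the W&B public API; that step is not shown). grid_coverage is a hypothetical helper and the sample data is made up.

```python
from collections import Counter
from itertools import product


def grid_coverage(param_values, run_configs):
    """Compare finished runs against the full grid of a grid sweep.

    param_values: dict of parameter name -> list of values, as in the sweep config.
    run_configs: list of per-run config dicts (hypothetical input).
    Returns (missing, duplicates): grid points never run, and grid points
    that were run more than once. Points are tuples ordered by sorted name.
    """
    names = sorted(param_values)
    expected = set(product(*(param_values[n] for n in names)))
    observed = Counter(tuple(cfg[n] for n in names) for cfg in run_configs)
    missing = sorted(expected - set(observed))
    duplicates = sorted(point for point, count in observed.items() if count > 1)
    return missing, duplicates


# Made-up example: param_A=1 runs twice per param_B value, param_A=5 never runs.
params = {"param_A": [1, 3, 4, 5], "param_B": [1, 10, 100]}
runs = [{"param_A": a, "param_B": b} for a in [1, 1, 3, 4] for b in [1, 10, 100]]
missing, duplicates = grid_coverage(params, runs)
print(missing)     # [(5, 1), (5, 10), (5, 100)]
print(duplicates)  # [(1, 1), (1, 10), (1, 100)]
```

For the configs in this issue the full grid has 4 × 16 = 64 points, so a sweep that behaves correctly should leave both lists empty after its 64 runs.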

Here's the complete code for the faulty and the correct case.

Additional Files

No response

Environment

WandB version: 0.16.2

OS: Windows-10-10.0.22631-SP0

Python version: 3.10.12

Versions of relevant libraries:
torch: 2.0.0
lightning: 2.0.9

Additional Context

I am running the agents on multiple cores with multiprocessing, but in my experience that does not affect the behavior.

@thanos-wandb (Contributor)

Hi @HannesStagge, thank you so much for the detailed report. We're investigating this further and will get back to you with any updates.

@thanos-wandb (Contributor)

@HannesStagge, we've reproduced this issue on our end and reported it to our engineering teams. We will keep you posted with any updates. Thanks for reporting!

@kptkin kptkin added the c:sweeps Component: Sweeps label Apr 12, 2024