
Random number generation not generating random data unless maxValue is specified or is implied from other options #259

Open
ronanstokes-db opened this issue Mar 29, 2024 · 0 comments
Labels
bug Something isn't working


ronanstokes-db commented Mar 29, 2024

Expected Behavior

Setting the option random=True on a column should generate random values for that column, whether or not an upper bound is specified.

Current Behavior

This currently works only if an upper bound (i.e., a maximum value) is specified for the column, either explicitly or implicitly.
The values and uniqueValues options both compute an upper bound implicitly.

The workaround in the current release is to always specify an upper bound, either explicitly with the maxValue option or implicitly with an option such as uniqueValues or values that computes an upper bound for the range of values produced.
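The rule described above can be illustrated with a small, self-contained Python sketch. It is not dbldatagen's implementation: implied_range and random_column are hypothetical helpers, and the bound arithmetic (a default minimum of 0, a step of 1 for uniqueValues, and values treated as an index into the list) is an assumption made for illustration. It shows why a column that sets none of these options, such as customer_id2 in the reproduction below, has no range to draw random values from.

```python
import random
from typing import List, Optional, Sequence, Tuple


def implied_range(
    min_value: int = 0,
    max_value: Optional[int] = None,
    values: Optional[Sequence] = None,
    unique_values: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Return the inclusive (low, high) range implied by the options, or None.

    Hypothetical illustration, not dbldatagen's code. Assumes a default
    minimum of 0 and a step of 1 between generated values.
    """
    if max_value is not None:
        # Explicit upper bound (like maxValue)
        return (min_value, max_value)
    if values is not None:
        # Like values: the random base value is an index into the list
        return (0, len(values) - 1)
    if unique_values is not None:
        # Like uniqueValues: that many consecutive integers from min_value
        return (min_value, min_value + unique_values - 1)
    # No explicit or implied upper bound: the case reported in this issue
    return None


def random_column(rows: int, seed: int = 42, **options) -> Optional[List[int]]:
    """Draw `rows` uniform random values from the implied range, or None if there is none."""
    bounds = implied_range(**options)
    if bounds is None:
        return None
    low, high = bounds
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(rows)]


print(random_column(5, max_value=10))      # like code6: maxValue=10
print(random_column(5, unique_values=50))  # like code7: uniqueValues=50
print(random_column(5))                    # None: no range, like customer_id2
```

Under these assumptions, code6 and code7 each get a range from an explicit or implied bound, while a column with only random=True gets none, mirroring the behavior described in this report.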

Steps to Reproduce (for bugs)

The following code generates random data correctly for all columns marked as random except customer_id2, the only column with no explicit or implied upper bound:

import dbldatagen as dg
from pyspark.sql.types import IntegerType, StringType

# `spark` is the active SparkSession (predefined in Databricks notebooks)
testDataSpec = (
    dg.DataGenerator(spark, name="test_data_set1", rows=10000, partitions=4)
    .withIdOutput()
    .withColumn("customer_id", "long", minValue=100, maxValue=2147483647, random=True)
    # Bug: no upper bound is specified or implied, so values are not random
    .withColumn("customer_id2", "long", random=True)
    .withColumn("code1", IntegerType(), minValue=100, maxValue=200, random=True)
    .withColumn("code2", "integer", minValue=0, maxValue=10, random=True)
    .withColumn("code3", StringType(), values=["online", "offline", "unknown"], random=True)
    .withColumn(
        "code4", StringType(), values=["a", "b", "c"], random=True, percentNulls=0.05
    )
    .withColumn(
        "code5", "string", values=["a", "b", "c"], random=True, weights=[9, 1, 1]
    )
    .withColumn("code6", "integer", maxValue=10, random=True)
    .withColumn("code7", "integer", uniqueValues=50, random=True)
)

df = testDataSpec.build()
df.show()

Context

Your Environment

  • dbldatagen version used:
  • Databricks Runtime version:
  • Cloud environment used:
ronanstokes-db changed the title from "Random number generation not executing unless both minValue and maxValue is specified" to "Random number generation not generating random data unless maxValue is specified or is implied from other options" Mar 29, 2024
ronanstokes-db self-assigned this Mar 29, 2024
ronanstokes-db added the bug (Something isn't working) label Mar 29, 2024