Running data generation in a python function #232

Thanks for your question, Pankaj.

Does the data for each date depend on the previous dates' data? Rather than looping on the driver side, why not make the function a Spark UDF or a pandas UDF (pandas UDFs are generally more efficient) that generates the data from its inputs?

UDFs and pandas UDFs can take multiple inputs. To generate multiple outputs, you could generate a single JSON-valued field and then extract its elements into separate columns.
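As a rough sketch of that idea (not code from this thread): the column names `event_date`, `store_id`, and `units`, the helper names, and the generation logic are all made up for illustration. It uses the standard Spark `from_json` function to unpack the JSON, rather than the Databricks-specific syntax described in the linked documentation.

```python
import json

import pandas as pd


def generate_rows(event_date: pd.Series, store_id: pd.Series) -> pd.Series:
    """Plain pandas function: returns one JSON string per input row."""
    out = []
    for d, s in zip(event_date, store_id):
        # Hypothetical generation logic; replace with your own.
        record = {
            "event_date": str(d),
            "store_id": int(s),
            "units": (int(s) * 7) % 50,
        }
        out.append(json.dumps(record))
    return pd.Series(out)


def build_dataframe(spark):
    """Wrap generate_rows as a pandas UDF and unpack its JSON into columns."""
    # Imported here so the pandas function above can be tested without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Series-to-Series type hints make this a scalar pandas UDF.
    generate_udf = F.pandas_udf(generate_rows, returnType=StringType())

    # Illustrative input: one row per (date, store) to generate data for.
    inputs = spark.createDataFrame(
        [("2024-01-01", 1), ("2024-01-02", 2)],
        ["event_date", "store_id"],
    )

    # Multiple inputs go in; a single JSON-valued column comes out.
    with_json = inputs.withColumn(
        "payload", generate_udf("event_date", "store_id")
    )

    # Parse the JSON and promote its fields to top-level columns.
    schema = "event_date STRING, store_id INT, units INT"
    return with_json.withColumn(
        "parsed", F.from_json("payload", schema)
    ).select("parsed.*")
```

Calling `build_dataframe(spark)` with an active `SparkSession` should give a DataFrame with one column per JSON field. Because the generation runs inside the UDF, the work is distributed across executors instead of looping on the driver.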

The following documentation describes generating Pandas UDFs

The following documentation describes extracting individual JSON fields into separate columns:

https://docs.databricks.com/en/optimizations/semi-structured.…

Answer selected by ronanstokes-db