
Upgrade this lib to be compatible with Spark Connect #255

Open
MrPowers opened this issue Mar 6, 2024 · 1 comment
MrPowers commented Mar 6, 2024

Expected Behavior

This library works the same with Spark Connect.

Current Behavior

This library uses sparkSession.sparkContext, which doesn't work with Spark Connect. Here is an example:

if sparkSession.sparkContext is not None:

This particular check might actually work because the exception would be caught, but you get the idea.
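For context, here is a minimal sketch of the failure mode. The server URL is illustrative, and the exact exception raised when touching sparkContext depends on the PySpark version:

```python
from pyspark.sql import SparkSession

# Build a Spark Connect session instead of a classic session
# (assumes a Spark Connect server is reachable at this URL).
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# DataFrame operations work as usual over Connect...
spark.range(5).show()

# ...but there is no driver-side SparkContext, so any code path that
# reaches for it fails (the exact error type varies by PySpark version).
try:
    parallelism = spark.sparkContext.defaultParallelism
except Exception as exc:
    print(f"sparkContext is unavailable under Spark Connect: {exc}")
```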

Steps to Reproduce (for bugs)

Run the test suite with Spark Connect enabled and fix all issues.

@ronanstokes-db ronanstokes-db self-assigned this Mar 15, 2024
ronanstokes-db (Contributor) commented
We recently released an update to handle situations where the Spark context is not available to query things like the default parallelism. That release should address this issue.

In general, the way to safeguard against this is to explicitly specify the number of partitions when generating the specification for your dataset. This avoids the query against the sparkContext; see the sketch below.
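A minimal sketch of that pattern, assuming dbldatagen is installed and `spark` is an existing session (column names, row count, and partition count are illustrative):

```python
import dbldatagen as dg

# Passing partitions explicitly means the generator does not need to query
# sparkContext.defaultParallelism to choose a partition count.
dataspec = (
    dg.DataGenerator(spark, name="example_data", rows=100000, partitions=8)
    .withColumn("customer_id", "long", minValue=1, maxValue=1000000)
    .withColumn("amount", "double", minValue=1.0, maxValue=500.0, random=True)
)

df = dataspec.build()
df.show(5)
```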

While we have not tested against Spark Connect, we have tested against other environments where no sparkContext is available.
