DBX example (Python quickstart) coverage test won't run due to dependency issues #835

Open
seboktamas opened this issue Aug 18, 2023 · 0 comments


Expected Behavior

Executing the code from https://dbx.readthedocs.io/en/latest/guides/python/python_quickstart/ should work.

Current Behavior

Running `pytest tests/unit --cov` raises an exception: AttributeError: 'DataFrame' object has no attribute 'iteritems'
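To check quickly whether an environment is exposed to this error, the installed pandas version can be inspected. The sketch below is only an illustration: the helper names `pandas_has_iteritems` and `installed_pandas_version` are made up for this example, and it relies on the fact that `DataFrame.iteritems` was removed in pandas 2.0.

```python
from importlib import metadata
from typing import Optional


def pandas_has_iteritems(pandas_version: str) -> bool:
    """Return True if this pandas version still ships DataFrame.iteritems.

    Assumption: the method was removed in pandas 2.0, so only the major
    version number is compared.
    """
    major = int(pandas_version.split(".")[0])
    return major < 2


def installed_pandas_version() -> Optional[str]:
    """Return the installed pandas version, or None if pandas is missing."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_pandas_version()
    if version is None:
        print("pandas is not installed")
    elif pandas_has_iteritems(version):
        print(f"pandas {version} still provides DataFrame.iteritems")
    else:
        print(f"pandas {version} no longer provides DataFrame.iteritems")
```

If the last message is printed, a PySpark release that still calls `iteritems` will fail as described here.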

Steps to Reproduce (for bugs)

Follow the instructions and execute the code from https://dbx.readthedocs.io/en/latest/guides/python/python_quickstart/

Context

This happens because the pyspark version is pinned while the pandas version is not. In pandas, `iteritems` was deprecated and then removed (in pandas 2.0).
Upgrading pyspark (and delta-spark) to the latest version fixes the issue, but first I had to fix another problem:
Because the example pins the Python version to 3.9 and my environment has Python 3.11, I got the following error: Python in worker has different version 3.11 than that in driver 3.9, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

After setting the worker version to Python 3.9, it worked. (The documentation should include a note that the worker's Python version also needs to match.)
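One possible way to keep the driver and worker interpreters in sync is to point both environment variables named in the error message at the interpreter that runs the tests. This is a minimal sketch, not the fix used in the quickstart: the helper name `pin_pyspark_python` is hypothetical, and calling it from something like `tests/conftest.py` before any Spark session is created is an assumption.

```python
import os
import sys
from typing import Dict


def pin_pyspark_python(interpreter: str = sys.executable) -> Dict[str, str]:
    """Point PySpark's driver and workers at the same Python interpreter.

    Must run before a SparkSession is created; values set afterwards are
    not picked up by an already running session.
    """
    os.environ["PYSPARK_PYTHON"] = interpreter
    os.environ["PYSPARK_DRIVER_PYTHON"] = interpreter
    return {
        "PYSPARK_PYTHON": os.environ["PYSPARK_PYTHON"],
        "PYSPARK_DRIVER_PYTHON": os.environ["PYSPARK_DRIVER_PYTHON"],
    }


if __name__ == "__main__":
    # By default, use the interpreter that is running this script.
    print(pin_pyspark_python())
```

Exporting the two variables in the shell before running pytest has the same effect.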

Your Environment

platform darwin -- Python 3.9.17

  • dbx version used: 0.8.18
  • Databricks Runtime version: not applicable