population density downloading; attributes besides 'total' breaks when creating parquet files #145

Open
wagnerfe opened this issue Aug 1, 2022 · 0 comments

wagnerfe commented Aug 1, 2022

hello hello,
querying population density data (only tested for Germany) works fine when choosing 'total' as the category.
However, when choosing a different category (e.g. 'women'), the downloaded files show up as .csv in the tmp folder, but the code breaks when creating the parquet files. Error message:

Exception occurred during processing of request from ('127.0.0.1', 33952)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/local/lib/python3.10/site-packages/pyspark/accumulators.py", line 262, in handle
    poll(accum_updates)
  File "/usr/local/lib/python3.10/site-packages/pyspark/accumulators.py", line 235, in poll
    if func():
  File "/usr/local/lib/python3.10/site-packages/pyspark/accumulators.py", line 239, in accum_updates
    num_updates = read_int(self.rfile)
  File "/usr/local/lib/python3.10/site-packages/pyspark/serializers.py", line 564, in read_int
    raise EOFError
EOFError
----------------------------------------
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/py4j/clientserver.py", line 480, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python3.10/site-packages/py4j/clientserver.py", line 503, in send_command
    raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
Traceback (most recent call last):
  File "/opt/app/pipelines/population-density/src/main.py", line 21, in <module>
    Processor.start(files, output_dir, updated_date)
  File "/opt/app/pipelines/population-density/src/Processor.py", line 70, in start
    df.write.mode("overwrite").parquet(f"{output_dir}{updated_date}_result.parquet")
  File "/usr/local/lib/python3.10/site-packages/pyspark/sql/readwriter.py", line 885, in parquet
    self._jwrite.parquet(path)
  File "/usr/local/lib/python3.10/site-packages/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/usr/local/lib/python3.10/site-packages/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.10/site-packages/py4j/protocol.py", line 334, in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling o84.parquet
ERROR: 1

Any idea how I can fix this? (I work on macOS Monterey with an Intel chip and only need the parquet files.)
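
Since the .csv files do land in the tmp folder, one possible first check is whether the non-'total' files look different from the 'total' ones before Spark touches them. The following is only a rough diagnostic sketch using the Python standard library. The helper names `summarize_csvs` and `differing_headers` are made up for illustration and are not part of the pipeline, and the sketch does not explain the Py4J error itself, whose cause is unknown here.

```python
import csv
from pathlib import Path


def summarize_csvs(folder):
    """Return {file name: (header columns, data row count)} for each CSV in folder."""
    summary = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])  # empty list if the file has no rows at all
            row_count = sum(1 for _ in reader)  # remaining rows are data rows
        summary[path.name] = (header, row_count)
    return summary


def differing_headers(summary):
    """Group file names by header so that mismatched schemas stand out."""
    groups = {}
    for name, (header, _) in summary.items():
        groups.setdefault(tuple(header), []).append(name)
    return groups
```

Calling `summarize_csvs` on the tmp folder and then `differing_headers` on the result would show whether all files share one header. More than one group, or an unexpectedly large row count for one category, would at least narrow down where to look next.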

Thank you so much for any help, and for this really awesome project in general!
