Crashes with memory leak, seems to be deadlock related #1550

Open
3 tasks done
jeff-hykin opened this issue Feb 20, 2024 · 1 comment

Current Behaviour

Reading a 1-column, 7-row file causes a total lockup. (It also happens with a bigger file; I shrank it down to this minimal case.)

I think this could be different from this issue and this issue

Here is the CLI output:

importing
reading data
loading as csv
generating report: './main/inputs.ignore.report.html'
Summarize dataset:   0%|                                                                                                        | 0/5 [00:00<?, ?it/s]
zsh: killed     ydata ./main/inputs.ignore.csv
/opt/homebrew/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Expected Behaviour

Generate the html report

Data Description

Here's the contents of the CSV.

NOTE1: Removing even 1 row makes the freeze/hang go away.

NOTE2: Despite this "fragility", the behavior is consistent: the report always succeeds with 1 row removed and always hangs when all rows are present.

data
1.8979166666666665
1.8770833333333332
1696285500.0
1.8
1.8010416666666667
1.8114583333333334
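
For anyone triaging, a stdlib-only sketch like the one below makes the shape of the data easier to see: five of the six values sit near 1.8, while the third one is several orders of magnitude larger. Whether that outlier is what triggers the hang is an assumption, not something this report confirms. `CSV_TEXT` and `load_column` are names made up for the illustration.

```python
import csv
import io
import statistics

# The exact CSV contents from this report (header plus six values).
CSV_TEXT = """data
1.8979166666666665
1.8770833333333332
1696285500.0
1.8
1.8010416666666667
1.8114583333333334
"""


def load_column(text, column="data"):
    """Parse one numeric column out of CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row[column]) for row in reader]


values = load_column(CSV_TEXT)
median = statistics.median(values)
largest = max(values)

print(f"values: {len(values)}")
print(f"median: {median}")
print(f"max:    {largest}")
# A very large ratio means one value dominates the range of the column.
print(f"max / median: {largest / median:.3g}")
```

If the outlier does matter, replacing `1696285500.0` with a value near 1.8 and re-running the report would be a quick way to check.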

Code that reproduces the bug

#!/usr/bin/env python3
# pip install ydata-profiling
print(f'''importing''')
import os
import sys
from io import StringIO

import pandas as pd
from ydata_profiling import ProfileReport

print(f'''reading data''')
filepath = sys.argv[1]
with open(filepath,'r') as f:
    output = f.read()

kwargs = dict(sep=",")
if output.startswith("#"):
    kwargs["comment"] = "#"

if output.count('\t') > output.count(','):
    kwargs["sep"] = "\t"

# Use StringIO to create a file-like object from the string
print(f'''loading as csv''')
# Pass the detected separator/comment options (they were built above but never used)
df = pd.read_csv(StringIO(output), **kwargs)
profile = ProfileReport(df, title="Profiling Report")
# Strip only the final extension so the report lands next to the input file
# (this also avoids writing to "/" when the path has no directory part)
new_path_base = os.path.splitext(filepath)[0]

report_path = f"{new_path_base}.report.html"
print(f'''generating report: {repr(report_path)}''')
profile.to_file(report_path)
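
Because the hang causes a total lock up, it may help to run the script in a child process with a wall-clock limit while investigating. This is a minimal sketch, not part of the original reproduction: `run_with_timeout` is a hypothetical helper, and the two demo commands are stand-ins so the sketch runs without ydata-profiling installed.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv as a child process.

    Returns the child's exit code, or None if it exceeded the time limit
    (subprocess.run kills the child before raising TimeoutExpired).
    """
    try:
        completed = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


# Stand-in for a run that finishes normally.
quick = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_seconds=30)

# Stand-in for a run that hangs: it sleeps longer than the limit allows.
slow = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(60)"], timeout_seconds=1
)

print("quick exit code:", quick)
print("slow exit code (None means timed out):", slow)
```

To try it on the real case, pass `["ydata", "./main/inputs.ignore.csv"]` (the command shown in the CLI output) instead of the stand-ins. Note that this only bounds run time; it does not cap memory use.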

pandas-profiling version

v4.6.4

Dependencies

# NOTE: Python 3.11.3
aiohttp==3.8.4
aiosignal==1.3.1
alabaster==0.7.13
annotated-types==0.6.0
ansi2html==1.8.0
anyio==4.1.0
appdirs==1.4.4
appnope==0.1.3
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
astor==0.8.1
asttokens==2.2.1
async-lru==2.0.4
async-timeout==4.0.2
attrdict==2.0.1
attrs==23.1.0
Babel==2.13.0
backcall==0.2.0
beautifulsoup4==4.12.2
bidict==0.22.1
bleach==6.1.0
blissful-basics==0.2.36
CacheControl==0.12.14
cachy==0.3.0
category-encoders==2.6.2
certifi==2023.7.22
cffi==1.15.1
charset-normalizer==3.2.0
cleo==1.0.0a5
click==8.1.7
cloudpickle==2.2.1
colorama==0.4.6
comm==0.1.4
contourpy==1.1.0
cool-cache==0.3.6
crashtest==0.3.1
cycler==0.11.0
Cython==3.0.0
dacite==1.8.1
dash==2.12.1
dash-bootstrap-components==1.5.0
dash-core-components==2.0.0
dash-html-components==2.0.0
dash-table==5.0.0
dask==2023.5.0
deap==1.4.1
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.14
deprecation==2.1.0
distlib==0.3.7
distributed==2023.5.0
docstring-to-markdown==0.12
docutils==0.18.1
dulwich==0.20.50
engineering-notation==0.10.0
et-xmlfile==1.1.0
executing==1.2.0
-e git+ssh://git@github.com/jeff-hykin/ez_yaml.git@e08a2f7abfeac5ad8af1d23b68b99ae3525c93f4#egg=ez_yaml&subdirectory=main
fastjsonschema==2.18.0
file-system-py==0.0.11
filelock==3.12.2
Flask==2.2.5
Flask-Cors==4.0.0
fonttools==4.42.0
fqdn==1.5.1
frozenlist==1.3.3
fsspec==2023.10.0
gensim==4.3.2
gym==0.22.0
gym-notices==0.0.8
html5lib==1.1
htmlmin==0.1.12
idna==3.4
ImageHash==4.3.1
imagesize==1.4.1
imbalanced-learn==0.11.0
importlib-metadata==4.13.0
importlib-resources==6.0.1
informative-iterator==2.1.1
ipykernel==6.26.0
ipython==8.12.2
ipython-genutils==0.2.0
ipywidgets==7.6.5
isoduration==20.11.0
itsdangerous==2.1.2
jaraco.classes==3.3.0
jedi==0.19.0
Jinja2==3.1.2
joblib==1.3.2
-e git+ssh://git@github.com/jeff-hykin/json_fix.git@6303ca934b25bf72bec82b0f5ca1d282f4566543#egg=json_fix&subdirectory=main
json5==0.9.14
jsonpointer==2.4
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
jupyter-events==0.9.0
jupyter-lsp==2.2.0
jupyter_client==8.6.0
jupyter_core==5.5.0
jupyter_server==2.10.1
jupyter_server_terminals==0.4.4
jupyterlab==4.0.9
jupyterlab-widgets==3.0.8
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.2
kaleido==0.2.1
keyring==24.2.0
kiwisolver==1.4.4
libsvm==3.23.0.4
llvmlite==0.40.1
locket==1.0.0
lockfile==0.12.2
MarkupSafe==2.1.3
matplotlib==3.7.2
matplotlib-inline==0.1.6
mistune==3.0.2
mne==1.5.1
more-itertools==10.1.0
mpmath==1.3.0
msgpack==1.0.5
multidict==6.0.4
multimethod==1.10
nbclient==0.9.0
nbconvert==7.11.0
nbformat==5.9.2
nest-asyncio==1.5.7
networkx==3.1
notebook==7.0.6
notebook_shim==0.2.3
numba==0.57.1
numpy==1.24.4
-e git+https://github.com/TAMU-Robomasters/cv_main@298ae48927c2d6644b4b3834039548c3c7694cc0#egg=opencv&subdirectory=repos/open_cv/modules/python/package
openpyxl==3.1.2
orjson==3.9.5
overrides==7.4.0
packaging==23.1
pandas==2.0.3
pandasgui==0.2.14
pandocfilters==1.5.0
parso==0.8.3
partd==1.4.1
patsy==0.5.3
pexpect==4.8.0
phik==0.12.3
pickleshare==0.7.5
Pillow==10.0.0
pkginfo==1.9.6
pkgutil_resolve_name==1.3.10
platformdirs==2.6.2
plotly==5.16.1
plotly-utils @ git+https://github.com/SengerM/plotly_utils@5f7e724d16d3ce7aa8282613220474bd2fcb90e5
pluggy==1.0.0
pmdarima==2.0.3
poetry==1.2.1
poetry-core==1.2.0
poetry-plugin-export==1.1.2
pooch==1.8.0
pretty-errors==1.2.25
prometheus-client==0.19.0
prompt-toolkit==3.0.39
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==14.0.1
pycairo==1.23.0
pycparser==2.21
pydantic==2.5.3
pydantic_core==2.14.6
Pygments==2.16.1
PyGObject==3.44.1
pylev==1.4.0
pynput==1.7.6
pyobjc-core==10.0
pyobjc-framework-ApplicationServices==10.0
pyobjc-framework-Cocoa==10.0
pyobjc-framework-Quartz==10.0
pyod==1.1.0
pyparsing==3.0.9
Pypubsub==4.0.3
PyQt5==5.15.10
PyQt5-Qt5==5.15.11
PyQt5-sip==12.13.0
PyQtWebEngine==5.15.6
PyQtWebEngine-Qt5==5.15.11
python-dateutil==2.8.2
python-engineio==4.4.1
python-json-logger==2.0.7
python-lsp-jsonrpc==1.0.0
python-lsp-server==1.7.3
python-socketio==5.8.0
pytz==2023.3
pytz-deprecation-shim==0.1.0.post0
PyWavelets==1.5.0
PyYAML==6.0.1
pyzmq==25.1.1
qgrid==1.3.1
qtstylish==0.1.5
quik-config==1.7.7
referencing==0.30.2
requests==2.31.0
requests-toolbelt==0.9.1
retrying==1.3.4
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.9.2
rpy2==3.5.12
schemdraw==0.15
scikit-base==0.5.1
scikit-learn==1.1.3
scikit-plot==0.3.7
scipy==1.10.1
seaborn==0.11.2
Send2Trash==1.8.2
shellingham==1.5.3
silver-spectacle==0.8.0
simplejson==3.19.2
six==1.16.0
sktime==0.21.1
slick-siphon==0.1.2
smart-open==6.4.0
sniffio==1.3.0
snowballstemmer==2.2.0
sortedcontainers==2.4.0
soupsieve==2.5
Sphinx==7.2.6
sphinx-rtd-theme==1.3.0
sphinxcontrib-applehelp==1.0.7
sphinxcontrib-devhelp==1.0.5
sphinxcontrib-htmlhelp==2.0.4
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.6
sphinxcontrib-serializinghtml==1.1.9
stack-data==0.6.2
statsmodels==0.14.0
stopit==1.1.2
super-hash==1.2.8
sympy==1.12
tabloo==0.1.0
tangled-up-in-unicode==0.2.0
tbats==1.1.3
tblib==2.0.0
telegram-notifier==0.3
telepy-notify==0.2.1
tenacity==8.2.3
terminado==0.18.0
threadpoolctl==3.2.0
tinycss2==1.2.1
toml==0.10.2
tomlkit==0.12.1
toolz==0.12.0
torch==2.1.1
tornado==6.3.3
TPOT==0.12.1
tqdm==4.66.1
trace-updater==0.0.9.1
traitlets==5.9.0
-e git+ssh://git@github.com/ioerger2/transit2.git@5bcebc1d742b61b2def33d9611a9179f3e71fd9a#egg=transit2
tsdownsample==0.1.2
typeguard==4.1.5
types-python-dateutil==2.8.19.14
typing_extensions==4.7.1
tzdata==2023.3
tzlocal==4.3
ujson==5.7.0
update-checker==0.18.0
uri-template==1.3.0
urllib3==1.26.16
virtualenv==20.21.1
visions==0.7.5
wcwidth==0.2.6
webcolors==1.13
webencodings==0.5.1
websocket-client==1.6.4
Werkzeug==2.2.3
widgetsnbextension==3.5.2
wordcloud==1.9.2
wrapt==1.15.0
wurlitzer==3.0.3
wxPython==4.2.1
xattr==0.9.9
xgboost==1.7.6
xxhash==3.3.0
yarl==1.9.2
ydata-profiling==4.6.4
yellowbrick==1.5
zict==3.0.0
zipp==3.16.2
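
The full freeze above is long, so a shorter list can be easier to compare across machines. A small sketch using only the standard library; the choice of packages in `PACKAGES` is a guess at what might be relevant, not a confirmed list, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed-relevant subset of the freeze above; adjust as needed.
PACKAGES = ["ydata-profiling", "pandas", "numpy", "matplotlib", "visions"]

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```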

OS

macOS 12.6 (Monterey), Apple Silicon

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.

@jeff-hykin (author) commented:

(Also, I know it's dumb to read a whole file and then use StringIO; the code was simplified for the issue.)
