Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filepaths to pretty_errors' stacktraces; pretty_errors is installed by PL #9167

Closed
wants to merge 3 commits into from

Conversation

akoumpa
Copy link
Collaborator

@akoumpa akoumpa commented May 10, 2024

PL installs pretty_errors which alters sys.excepthook from sys.__excepthook__ to a text formatting function the pretty_errors library defines. However, the format used is incomplete as it misses filepaths, which hinders debugging.

This PR includes pretty_errors and configures it so that we get filepaths in the stacktraces, for example:

Before:

-------------------------------------------------------------------------------
megatron_gpt_finetuning.py 18 <module>
from nemo.utils import logging

__init__.py 31 <module>
from nemo.utils.lightning_logger_patch import add_memory_handlers_to_pl_logger

lightning_logger_patch.py 19 <module>
q()

NameError:
name 'q' is not defined

After:

*******************************************************************************
18 <module> ...examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py"/opt/NeMo/examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py", line 18
> from nemo.utils import logging

31 <module> /opt/NeMo/nemo/utils/__init__.py
"/opt/NeMo/nemo/utils/__init__.py", line 31
> from nemo.utils.lightning_logger_patch import add_memory_handlers_to_pl_logger...

19 <module> /opt/NeMo/nemo/utils/lightning_logger_patch.py
"/opt/NeMo/nemo/utils/lightning_logger_patch.py", line 19

  import logging as _logging
  from logging.handlers import MemoryHandler

  import pytorch_lightning as pl
> q()

  HANDLERS = {}

_logging: <module 'logging' from '/usr/lib/python3.10/logging/__init__.py'>
MemoryHandler: <class 'logging.handlers.MemoryHandler'>
pl: <module 'pytorch_lightning' from '/usr/local/lib/python3.10/dist-p... [105]
NameError:
name 'q' is not defined

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

…ed by PL

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the akoumparouli/default_traceback branch from f19bd54 to 4f423f9 Compare May 10, 2024 22:03
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
@akoumpa akoumpa marked this pull request as ready for review May 10, 2024 22:04
@akoumpa akoumpa self-assigned this May 10, 2024
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
nemo/__init__.py Dismissed Show dismissed Hide dismissed
@akoumpa
Copy link
Collaborator Author

akoumpa commented May 15, 2024

Example list of synonym files in NeMo

akoumparouli@1604ab7-lcedt:~/fix_ci/NeMo$ find ./ | grep -i '.py' | rev | cut -d'/' -f1 | rev | sort | uniq -c | sort -nr | head -n 20
235 init.py
20 utils.py
7 get_data.py
6 base.py
4 util.py
4 transformer.py
4 rnnt.py
4 helpers.py
4 features.py
4 dataset.py
4 ctc.py
3 waveglow.py
3 transformer_utils.py
3 tokenizer_utils.py
3 tacotron2.py
3 ssl_tts.py
3 spectrogram_enhancer.py
3 radtts.py
3 mixer_tts.py
3 import_datasets.py

Copy link
Collaborator

@ericharper ericharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

Copy link
Collaborator

@titu1994 titu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nicer to move this to nemo.utils.logging. doing stuff inside init which usually doesn't get checked often is not a good pattern.

We almost always import logging so you can do a global var check if you already set it once and if so ignore otherwise do this setup in side the logging.py file

@titu1994
Copy link
Collaborator

Also, it's a core change so whether here or logging, it needs sign-offs from @nithinraok @ericharper @blisc @yaoyu-33

@akoumpa
Copy link
Collaborator Author

akoumpa commented May 15, 2024

@titu1994 no problem, I will move this to nemo/utils/logging.

@titu1994
Copy link
Collaborator

Cool thanks ! Also for some reason the speech text we disabled failed so rebase latest main changes.

@akoumpa
Copy link
Collaborator Author

akoumpa commented May 15, 2024

Ok, false alarm everyone.

On the container I tested on Friday it used torchmetrics v1.4.0, which was installing pretty-errors (https://github.com/Lightning-AI/torchmetrics/blob/v1.4.0/requirements/base.txt#L9)

Ten hours ago, it was moved to a debug requirement https://github.com/Lightning-AI/torchmetrics/pull/2527/files and now the default stack trace is used.

As a result, this is no longer required and I will be closing this PR.

Apologies for pulling everyone to review.

If any take-away from this PR, I would keep perhaps that adding a test to confirm software dependencies altering global behaviors would be a good idea to avoid having them go unnoticed (TODO).

@akoumpa akoumpa closed this May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants