Adding recipe for Listenable Maps for Audio Classifiers #2538

fpaissan · 2024-05-02T20:31:34Z

What does this PR do?

Implements L-MAC (Listenable Maps for Audio Classifiers), the method proposed in the ICML'24 paper https://arxiv.org/abs/2403.13086.
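For readers new to L-MAC, the core idea can be pictured as a masked-in / masked-out objective on a magnitude spectrogram: the part of the input kept by the mask should still be classified as the original class, and the part it discards should not. The snippet is a minimal NumPy sketch under stated assumptions, not the recipe's code. In particular, `lmac_style_loss`, the weights `alpha`, `beta` and `lam`, the use of cross-entropy for both terms, and the L1 mask penalty are illustrative choices, and the mask is passed in directly instead of being produced by a trained decoder. See the paper and the recipe for the exact loss.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over a 1-D vector of class logits.
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def cross_entropy(logits, label):
    # Negative log-likelihood of `label` under the classifier's logits.
    return -log_softmax(logits)[label]


def lmac_style_loss(classifier, spec, mask, label, alpha=1.0, beta=1.0, lam=1e-3):
    """Hypothetical masked-in / masked-out objective (illustrative only).

    classifier: callable mapping a (time, freq) array to class logits
    spec:       magnitude spectrogram, shape (time, freq)
    mask:       interpretation mask in [0, 1], same shape as `spec`
    """
    masked_in = mask * spec            # what the interpretation keeps
    masked_out = (1.0 - mask) * spec   # what the interpretation discards
    # The masked-in part should still be classified as `label`...
    loss_in = cross_entropy(classifier(masked_in), label)
    # ...while the masked-out part should not (hence the minus sign below).
    loss_out = cross_entropy(classifier(masked_out), label)
    # Assumed L1 penalty: prefer masks that keep as little input as possible.
    sparsity = np.abs(mask).mean()
    return alpha * loss_in - beta * loss_out + lam * sparsity


# Toy usage with a random linear "classifier" over 3 classes.
rng = np.random.default_rng(0)
weights = rng.standard_normal((3, 8 * 16))


def toy_classifier(spec):
    return weights @ spec.reshape(-1)


spec = np.abs(rng.standard_normal((8, 16)))
mask = rng.uniform(size=(8, 16))
print(lmac_style_loss(toy_classifier, spec, mask, label=1))
```

Because the mask in this sketch is real-valued and applied to the magnitude, multiplying the complex STFT by the same mask would keep the original phase, which is one way such a map can be turned back into audio and listened to.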
Refactors the existing L2I and PIQ recipes to unify the interpretability approaches under a single structure. This makes it easier to share code among the interpretability pipelines and reduces the room for error when comparing the techniques.
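The benefit of a single structure can be pictured with a hypothetical sketch: the steps every method shares (running the classifier, applying the interpretation, evaluating it) live in one base class, and each method supplies only its own interpretation step. The names `InterpreterBase`, `compute_mask`, `explain` and `logit_drop`, and the two toy subclasses, are illustrative assumptions and not the classes in this PR. Modelling every method's output as a mask is also a simplification (L2I, for instance, works through NMF components).

```python
from abc import ABC, abstractmethod

import numpy as np


class InterpreterBase(ABC):
    """Hypothetical shared pipeline for post-hoc audio interpreters."""

    def __init__(self, classifier):
        # `classifier` maps a (time, freq) spectrogram to class logits.
        self.classifier = classifier

    @abstractmethod
    def compute_mask(self, spec, label):
        """Method-specific step: return a mask shaped like `spec`."""

    def explain(self, spec):
        # Shared steps: pick the predicted class, ask the subclass for a
        # mask, clip it to [0, 1], and apply it to the input.
        label = int(np.argmax(self.classifier(spec)))
        mask = np.clip(self.compute_mask(spec, label), 0.0, 1.0)
        return label, mask * spec

    def logit_drop(self, spec):
        # Shared evaluation: change in the predicted-class logit when only the
        # interpretation is kept. One implementation for every method keeps
        # the comparison consistent.
        label, interpretation = self.explain(spec)
        return self.classifier(spec)[label] - self.classifier(interpretation)[label]


class KeepAllInterpreter(InterpreterBase):
    # Trivial baseline that keeps the whole input; a sanity check only.
    def compute_mask(self, spec, label):
        return np.ones_like(spec)


class TopEnergyInterpreter(InterpreterBase):
    # Toy method: keep the time-frequency bins above the median energy.
    def compute_mask(self, spec, label):
        return (spec > np.median(spec)).astype(spec.dtype)


# Toy usage: both methods are evaluated by the same shared code.
rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 3 * 5))


def toy_classifier(spec):
    return weights @ spec.reshape(-1)


spec = np.abs(rng.standard_normal((3, 5)))
for method in (KeepAllInterpreter(toy_classifier), TopEnergyInterpreter(toy_classifier)):
    print(type(method).__name__, method.logit_drop(spec))
```

Adding a new method then means writing one `compute_mask`, while the evaluation code it is compared with stays identical.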

Before submitting

Did you read the contributor guideline?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified
Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
Review the self-review checklist to ensure the code is ready for review

* Skip lazy imports when the caller is inspect.py
  This avoids having certain inspect functions import our lazy modules when we don't want them to. `getframeinfo` in particular appears to do it, and this gets called by PyTorch at some point. IPython might also be doing it but autocomplete still seems to work. This does not appear to break anything. Added test for hyperpyyaml to ensure we're not breaking that.
* SSL_Semantic_Token _ new PR (speechbrain#2509)
* remove unnecassry files and move to dasb
* remove extra recepie from test
* update ljspeech qunatization recepie
* add discrete_ssl and remove extra files
* fix precommit
* update kmeans and add tokeizer for postprocessing
* fix precommit
* Update discrete_ssl.py
* fix clone warning
---------
Co-authored-by: Mirco Ravanelli <mirco.ravanelli@gmail.com>
* _ensure_module Raises docstring
* Expose `ensure_module` so that docs get generated for it
  This is already an internal class anyway, and this is safe to call.
* Update actions/setup-python
* Use `uv` in test CI + merge some dep installs
  The consequence is faster dependency installation. Merging some of the dependency installs helps avoid some packages being reinstalled from one line to the next. Additionally, CPU versions are specified when relevant, to avoid downloading CUDA stuff the CI can't use anyway.
* Use `uv` in doc CI + merge some dep installs
  Similar rationale as for the test CI
* Parallelize doc generation with Sphinx
  This does not affect the entire doc generation process but should allow some minor multithreading even with the 2-core CI workers.
* Enable `uv` caching on the test CI
* Enable `uv` caching on the docs CI
* CTC-only training recipes for LibriSpeech (code from Samsung AI Cambridge) (speechbrain#2290)
  CTC-only pre-training of conformer and branchformer.
---------
Co-authored-by: Shucong Zhang/Embedded AI /SRUK/Engineer/Samsung Electronics <s1.zhang@sruk-ccn4.eu.corp.samsungelectronics.net>
Co-authored-by: Adel Moumen <adelmoumen.pro@gmail.com>
Co-authored-by: Adel Moumen <88119391+Adel-Moumen@users.noreply.github.com>
Co-authored-by: Parcollet Titouan <titouan.parcollet@univ-avignon.fr>
* Update CommonVoice transformer recipes (code from Samsung AI Center Cambridge) (speechbrain#2465)
* Update CV transformer recipes to match latest results with conformer.
---------
Co-authored-by: Titouan Parcollet/Embedded AI /SRUK/Engineer/Samsung Electronics <t.parcollet@sruk-ccn4.eu.corp.samsungelectronics.net>
Co-authored-by: Mirco Ravanelli <mirco.ravanelli@gmail.com>
Co-authored-by: Adel Moumen <adelmoumen.pro@gmail.com>
* Whisper improvements: flash attention, KV caching, lang_id, translation, training... (speechbrain#2450)
  Whisper improvements:
  - flash attention
  - kv caching
  - lang identifaction
  - translation
  - finetuning amelioration
  ... and more ...
* Update README.md
* precommit
* update zed download link (speechbrain#2514)
* `RelPosEncXL` refactor and precision fixes (speechbrain#2498)
* Add `RelPosEncXL.make_pe`, rework precision handling
* Rework RelPosEncXL output dtype selection
* Fix in-place input normalization when using `sentence`/`speaker` norm (speechbrain#2504)
* fix LOCAL_RANK to be RANK in if_main_process (speechbrain#2506)
* Fix Separation and Enhancement recipes behavior when NaN encountered (speechbrain#2524)
* Fix Separation and Enhancement recipes behavior when NaN encountered
* Formatting using precommit hooks
* Lock torch version in requirements.txt (speechbrain#2528)
* Fix compatibility for torchaudio versions without `.io` (speechbrain#2532)
  This avoids having the Python interpreter attempt to resolve the type annotation directly.
* fix docstrings
* consistency tests - classification
* consistency tests - classification
* consistency tests - interpret
* default to no wham
* fix after tests pass
* fix after tests pass
* tests after that
* fix consistency
---------
Co-authored-by: asu <sdelang@sdelang.fr>
Co-authored-by: Pooneh Mousavi <moosavi.pooneh@gmail.com>
Co-authored-by: Mirco Ravanelli <mirco.ravanelli@gmail.com>
Co-authored-by: shucongzhang <104781888+shucongzhang@users.noreply.github.com>
Co-authored-by: Shucong Zhang/Embedded AI /SRUK/Engineer/Samsung Electronics <s1.zhang@sruk-ccn4.eu.corp.samsungelectronics.net>
Co-authored-by: Adel Moumen <adelmoumen.pro@gmail.com>
Co-authored-by: Adel Moumen <88119391+Adel-Moumen@users.noreply.github.com>
Co-authored-by: Parcollet Titouan <titouan.parcollet@univ-avignon.fr>
Co-authored-by: Parcollet Titouan <parcollet.titouan@gmail.com>
Co-authored-by: Titouan Parcollet/Embedded AI /SRUK/Engineer/Samsung Electronics <t.parcollet@sruk-ccn4.eu.corp.samsungelectronics.net>
Co-authored-by: Yingzhi WANG <41187612+BenoitWang@users.noreply.github.com>
Co-authored-by: Peter Plantinga <plantinga.peter@protonmail.com>
Co-authored-by: Séverin <123748182+SevKod@users.noreply.github.com>


fpaissan and others added 30 commits December 11, 2023 16:49

add NMF image logging for debug · 32c31fc
fix bug in viz L2I · 7a5a9c8
log the number of finetuning masks · 00dfdbe
lower crosscor thr · 8288675
fix acc · 3e541de
align L2I debugging w/ PIQ script · aff2f07
fixed accuracy computation for L2I · e3b981a
L2I with variable number of components (K=200) · 23b542e
debugging l2i... · da12c72
update hparams · f4fc9a9
fixed oracle source · 024f64c
fixed wrong sources and running finetuning experiments.. · 3b3a8c4
add AST as classifier · ec01553
hparams ast -- still not converging · 7ea4972
add ast augmentation · d0dc205
synced merge · d96bffd
update training script after merge · 69cc6e7
with augmentations is better · 58117ab
just pushing hparams · 1dea3fc
classification with CE · 68d0d8e
conv2d fix for CE · 1935a7b
playing with AST augmentation · eb120c8
fixed thresholding · 728fb0b
starting to experiment with no wham noise stuff · 1fe07e4
add wham noise option in classifier training, dot prod correlation in finetuning · ec23e86
single mask training · 891d469
added zero grad · 8f0b0c9
added the entropy loss · 99feb50
implemented a psi function for cnn14 · 2c617e0
Update README.md · 57d0327

fpaissan and others added 26 commits April 29, 2024 22:28

removing model wrapper as it is not needed · c23db68
fix ID samples · e9ee9d6
fix linters · 7963ea9
model finetuning test · 1b7f7f6
pretrained_PIQ -> pretrained_interpreter · 05c59bd
update README.md · ebfbc7d
added README instructions for training with WHAM! · 09c3e8a
removing the dataset tag on experiment name · a7cc35d
Merge branch 'develop' of github.com:speechbrain/speechbrain into develop · 88d24b4
added wham hparams to vit.yaml · 0ded7fe
Merge branch 'refactor_lmac' of github.com:fpaissan/audio_interpretability into refactor_lmac · 75c47f1
added focalnet wham hyperparams · c4cdf7d
Merge branch 'develop' of github.com:speechbrain/speechbrain into develop · d0fcc0b
Merge branch 'develop' of github.com:fpaissan/audio_interpretability into refactor_lmac · c155871
add eval info · 9125eee
add automatic wham download · 8edcde5
additional instructions on README · 108fc92
Merge branch 'refactor_lmac' of github.com:fpaissan/audio_interpretability into refactor_lmac · 4659e24
wham prepare uses explicit parameters · 0aef616
Merge branch 'refactor_lmac' of github.com:fpaissan/audio_interpretability into refactor_lmac · 2f4d174
wham docstrings · 56552b4
edited the instructions on different contamination types · a5b0962
Merge branch 'refactor_lmac' of github.com:fpaissan/audio_interpretability into refactor_lmac · c5b29a7
removing the table · 20b6771
removing the table · cff6d77

fpaissan requested review from mravanelli and ycemsubakan May 2, 2024 20:31

fpaissan marked this pull request as draft May 2, 2024 20:33

revert changes to gitignore · 5e66454
