dynamically set data name for auxiliary asr tasks #5697

wietsedv · 2024-03-08T14:50:17Z

What?

Auxiliary ASR data tags caused an error because they were all given the data name "text", which is already used for the regular ASR output. After this change, the data name is taken from the argument.

Before this change, I received the error below when running the fleurs recipe. After the change, I can run it successfully without making any adaptations to the recipe.

Why?

The asr.sh script of the asr1 task accepts a --auxiliary_data_tags argument to define auxiliary text data inputs. Specifically, the fleurs example uses it for an auxiliary language identification task. Currently this argument is broken because the data name is hardcoded to "text" instead of the intended data name. The "text" data name is already used for the ASR output text, so training fails with a complaint about the duplicated data name:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/espnet2/bin/asr_train.py", line 23, in <module>
    main()
  File "/workspace/espnet2/bin/asr_train.py", line 19, in main
    ASRTask.main(cmd=cmd)
  File "/workspace/espnet2/tasks/abs_task.py", line 1120, in main
    cls.main_worker(args)
  File "/workspace/espnet2/tasks/abs_task.py", line 1368, in main_worker
    train_iter_factory = cls.build_iter_factory(
  File "/workspace/espnet2/tasks/abs_task.py", line 1585, in build_iter_factory
    return cls.build_sequence_iter_factory(
  File "/workspace/espnet2/tasks/abs_task.py", line 1617, in build_sequence_iter_factory
    dataset = ESPnetDataset(
  File "/workspace/espnet2/train/dataset.py", line 462, in __init__
    raise RuntimeError(f'"{name}" is duplicated for data-key')
RuntimeError: "text" is duplicated for data-key
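
For illustration, the duplicate-name check behind this error can be sketched as follows. This is a simplified stand-in and not the actual ESPnetDataset code: the function name register_data_keys, the file paths, and the auxiliary tag name "lid" are hypothetical and chosen only to show why a hardcoded "text" name collides with the ASR output while a name taken from the tag does not.

```python
from typing import Dict, List, Tuple


def register_data_keys(
    path_name_type_list: List[Tuple[str, str, str]],
) -> Dict[str, Tuple[str, str]]:
    """Map each data name to (path, type); reject a name that appears twice."""
    registered: Dict[str, Tuple[str, str]] = {}
    for path, name, data_type in path_name_type_list:
        if name in registered:
            raise RuntimeError(f'"{name}" is duplicated for data-key')
        registered[name] = (path, data_type)
    return registered


# Hypothetical paths and tag name, for illustration only.
base = [
    ("dump/train/wav.scp", "speech", "sound"),
    ("dump/train/text", "text", "text"),
]
# Before the fix: the auxiliary file is also registered under the name "text".
before_fix = base + [("dump/train/lid", "text", "text")]
# After the fix: the data name is taken from the auxiliary tag itself.
after_fix = base + [("dump/train/lid", "lid", "text")]

try:
    register_data_keys(before_fix)
except RuntimeError as err:
    print(err)  # same message format as in the traceback above

print(sorted(register_data_keys(after_fix)))
```

In this sketch, only the middle field of each entry (the data name) differs between the two cases, which mirrors the one-field change made to the option string in asr.sh.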

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.10%. Comparing base (d004740) to head (a9ddbc6).

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #5697       +/-   ##
===========================================
+ Coverage   23.30%   70.10%   +46.80%     
===========================================
  Files         746      746               
  Lines       69369    69369               
===========================================
+ Hits        16163    48634    +32471     
+ Misses      53206    20735    -32471

Flag	Coverage Δ
test_configuration_espnet2	`∅ <ø> (∅)`
test_integration_espnet1	`62.92% <ø> (ø)`
test_python_espnet1	`18.32% <ø> (ø)`
test_python_espnet2	`52.05% <ø> (?)`


sw005320 · 2024-03-10T02:11:57Z

@wanchichen, can you confirm it?

wanchichen · 2024-03-17T03:16:16Z

Thanks @wietsedv, this change should be correct. I also think another line may need to be added to do the same for the validation set.
Maybe in line 1394?

_opts+="--valid_data_path_and_name_and_type ${_asr_train_dir}/${aux_dset},${aux_dset},text "

sw005320 · 2024-03-28T11:37:26Z

@wietsedv, can you confirm @wanchichen's suggestion?

dynamically set data name for auxiliary asr tasks

a9ddbc6

mergify bot added the ESPnet2 label Mar 8, 2024

sw005320 added the Bugfix and ASR (Automatic speech recognition) labels Mar 17, 2024

sw005320 added this to the v.202405 milestone Mar 28, 2024
