
AWQ fails on ONNX model when a MatMul node's input is a model input/initializer #1571

Open
jstoecker opened this issue Jan 25, 2024 · 1 comment
jstoecker commented Jan 25, 2024

Hello,

The awq_quantize function collects the names of the input tensors to each MatMul node and later looks each name up in output_name_to_node to find the parent node that produces that tensor. This assumes every such tensor is the output of some node in the graph, which won't be the case when the tensor is a model input or an initializer. I noticed this when experimenting with a toy model:

[image: toy model in which a MatMul consumes the model input directly]
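
For reference, something like the following sketch reproduces the shape of that toy model: a single MatMul whose activation input is the graph input itself (the file name and tensor names here are illustrative, not the exact ones from the screenshot):

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

# The weight is an initializer; the MatMul's first input is the graph input
# itself, so no node in the graph produces it.
weight = numpy_helper.from_array(np.random.rand(16, 16).astype(np.float32), name="weight")
matmul = helper.make_node("MatMul", ["input", "weight"], ["output"], name="matmul")
graph = helper.make_graph(
    [matmul],
    "toy",
    inputs=[helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 16])],
    outputs=[helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 16])],
    initializer=[weight],
)
onnx.save(helper.make_model(graph), "toy_matmul.onnx")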

Error message:

2024-01-24 17:25:43 [ERROR] Unexpected exception KeyError('input') happened during tuning.
Traceback (most recent call last):
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\quantization.py", line 234, in fit
    strategy.traverse()
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\strategy\auto.py", line 140, in traverse
    super().traverse()
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\strategy\strategy.py", line 505, in traverse
    q_model = self.adaptor.quantize(copy.deepcopy(tune_cfg), self.model, self.calib_dataloader, self.q_func)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\utils\utility.py", line 304, in fi
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\adaptor\onnxrt.py", line 1925, in quantize
    tmp_model = awq_quantize(
                ^^^^^^^^^^^^^
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\adaptor\ox_utils\weight_only.py", line 783, in awq_quantize
    parent = model.output_name_to_node[input_name]
             ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'input'

I can work around this by inserting an Identity node between any model inputs and the MatMul layers that consume the input tensor directly:

[image: the same model with an Identity node inserted between the input and the MatMul]
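
In plain onnx, that rewrite looks roughly like this sketch (names are illustrative; a real pass would also deduplicate proxies when several MatMuls share one input). Inserting the Identity at the front of the node list keeps the graph topologically sorted:

import onnx
from onnx import helper

model = onnx.load("toy_matmul.onnx")
graph_inputs = {i.name for i in model.graph.input}
for node in list(model.graph.node):
    if node.op_type == "MatMul" and node.input[0] in graph_inputs:
        src = node.input[0]
        proxy = src + "_identity"
        # The Identity node now "produces" the tensor the MatMul consumes,
        # so the tensor shows up in output_name_to_node.
        model.graph.node.insert(0, helper.make_node("Identity", [src], [proxy], name=proxy))
        node.input[0] = proxy
onnx.save(model, "toy_matmul_patched.onnx")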

I suspect the following edit to awq_quantize would also work (for model inputs at least, but probably not initializers):

for node in model.nodes():
    if (
        node.op_type in ["MatMul"]
        and weight_config.get(node.name, {}) != "fp32"
        and weight_config.get(node.name, {}).get("algorithm", "AWQ") == "AWQ"
+       and node.input[0] not in model.input()
    ):
        output_names.append(node.input[0])
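
To cover initializers as well, the guard could perhaps be extended along these lines. This is only a sketch: it assumes the INC wrapper exposes the underlying ModelProto as model.model, and the set names are hypothetical:

# Hypothetical extension of the guard above; `model.model` is assumed to be
# the onnx ModelProto held by the ONNXModel wrapper.
graph = model.model.graph
graph_inputs = {i.name for i in graph.input}
graph_initializers = {init.name for init in graph.initializer}

for node in model.nodes():
    if (
        node.op_type in ["MatMul"]
        and weight_config.get(node.name, {}) != "fp32"
        and weight_config.get(node.name, {}).get("algorithm", "AWQ") == "AWQ"
        and node.input[0] not in graph_inputs
        and node.input[0] not in graph_initializers
    ):
        output_names.append(node.input[0])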

I considered opening a PR, but I'm not sure what the preferred solution is, plus I see some refactoring for AWQ/GPTQ in the new 3.x API. I'm also unfamiliar with the tests. :)

yiliu30 (Collaborator) commented Jan 27, 2024

Hi @jstoecker, thanks for raising this issue; your enhancements are very welcome!
As the 3.x API is still under development and subject to change, I suggest basing the fix on the master branch and asking the ORT owners (@mengniwang95, @yuwenzho) to review the PR.
