
AWQ fails on ONNX model when a MatMul node's input is a model input/initializer #1571

Open
jstoecker opened this issue Jan 25, 2024 · 1 comment
jstoecker commented Jan 25, 2024

Hello,

The awq_quantize function collects the names of the input tensors to each MatMul node and later looks each name up in output_name_to_node to find the parent node that produces that tensor. This assumes every such tensor is the output of some node in the graph, which won't be the case when the tensor is a model input or an initializer. I noticed this when experimenting with a toy model:

[image: toy model in which a MatMul consumes the model input directly]
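
For reference, something like the following sketch reproduces the shape of that toy model: a single MatMul whose activation input is the graph input itself (the file name and tensor names here are illustrative, not the exact ones from the screenshot):

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

# The weight is an initializer; the MatMul's first input is the graph input
# itself, so no node in the graph produces it.
weight = numpy_helper.from_array(np.random.rand(16, 16).astype(np.float32), name="weight")
matmul = helper.make_node("MatMul", ["input", "weight"], ["output"], name="matmul")
graph = helper.make_graph(
    [matmul],
    "toy",
    inputs=[helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 16])],
    outputs=[helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 16])],
    initializer=[weight],
)
onnx.save(helper.make_model(graph), "toy_matmul.onnx")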

Error message:

2024-01-24 17:25:43 [ERROR] Unexpected exception KeyError('input') happened during tuning.
Traceback (most recent call last):
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\quantization.py", line 234, in fit
    strategy.traverse()
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\strategy\auto.py", line 140, in traverse
    super().traverse()
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\strategy\strategy.py", line 505, in traverse
    q_model = self.adaptor.quantize(copy.deepcopy(tune_cfg), self.model, self.calib_dataloader, self.q_func)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\utils\utility.py", line 304, in fi
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\adaptor\onnxrt.py", line 1925, in quantize
    tmp_model = awq_quantize(
                ^^^^^^^^^^^^^
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\adaptor\ox_utils\weight_only.py", line 783, in awq_quantize
    parent = model.output_name_to_node[input_name]
             ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'input'

I can work around this by inserting an Identity node between any model inputs and the MatMul layers that consume the input tensor directly:

[image: the same model with an Identity node inserted between the input and the MatMul]
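
In plain onnx, that rewrite looks roughly like this sketch (names are illustrative; a real pass would also deduplicate proxies when several MatMuls share one input). Inserting the Identity at the front of the node list keeps the graph topologically sorted:

import onnx
from onnx import helper

model = onnx.load("toy_matmul.onnx")
graph_inputs = {i.name for i in model.graph.input}
for node in list(model.graph.node):
    if node.op_type == "MatMul" and node.input[0] in graph_inputs:
        src = node.input[0]
        proxy = src + "_identity"
        # The Identity node now "produces" the tensor the MatMul consumes,
        # so the tensor shows up in output_name_to_node.
        model.graph.node.insert(0, helper.make_node("Identity", [src], [proxy], name=proxy))
        node.input[0] = proxy
onnx.save(model, "toy_matmul_patched.onnx")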

I suspect the following edit to awq_quantize would also work (for model inputs at least, but probably not initializers):

for node in model.nodes():
    if (
        node.op_type in ["MatMul"]
        and weight_config.get(node.name, {}) != "fp32"
        and weight_config.get(node.name, {}).get("algorithm", "AWQ") == "AWQ"
+       and node.input[0] not in model.input()
    ):
        output_names.append(node.input[0])
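
To cover initializers as well, the guard could perhaps be extended along these lines. This is only a sketch: it assumes the INC wrapper exposes the underlying ModelProto as model.model, and the set names are hypothetical:

# Hypothetical extension of the guard above; `model.model` is assumed to be
# the onnx ModelProto held by the ONNXModel wrapper.
graph = model.model.graph
graph_inputs = {i.name for i in graph.input}
graph_initializers = {init.name for init in graph.initializer}

for node in model.nodes():
    if (
        node.op_type in ["MatMul"]
        and weight_config.get(node.name, {}) != "fp32"
        and weight_config.get(node.name, {}).get("algorithm", "AWQ") == "AWQ"
        and node.input[0] not in graph_inputs
        and node.input[0] not in graph_initializers
    ):
        output_names.append(node.input[0])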

I considered opening a PR, but I'm not sure what the preferred solution is, plus I see some refactoring for AWQ/GPTQ in the new 3.x API. I'm also unfamiliar with the tests. :)

yiliu30 (Collaborator) commented Jan 27, 2024

Hi @jstoecker, thanks for raising this issue; your enhancements are very welcome!
As the 3.x API is still under development and subject to change, I suggest basing the fix on the master branch and asking the ORT owners (@mengniwang95, @yuwenzho) to review the PR.
