Hello,

The awq_quantize function collects the names of input tensors to each MatMul node, and later looks up the parent node that produces the named tensor. This assumes the tensors are outputs of nodes in the model, which won't be the case for model inputs or initializers. I noticed this when experimenting with a toy model.

Error message:
2024-01-24 17:25:43 [ERROR] Unexpected exception KeyError('input') happened during tuning.
Traceback (most recent call last):
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\quantization.py", line 234, in fit
    strategy.traverse()
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\strategy\auto.py", line 140, in traverse
    super().traverse()
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\strategy\strategy.py", line 505, in traverse
    q_model = self.adaptor.quantize(copy.deepcopy(tune_cfg), self.model, self.calib_dataloader, self.q_func)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\utils\utility.py", line 304, in fi
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\adaptor\onnxrt.py", line 1925, in quantize
    tmp_model = awq_quantize(
                ^^^^^^^^^^^^^
  File "C:\Users\justoeck\Miniconda3\envs\pytorch2\Lib\site-packages\neural_compressor\adaptor\ox_utils\weight_only.py", line 783, in awq_quantize
    parent = model.output_name_to_node[input_name]
             ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'input'
I can work around this by inserting an Identity node between any model input and the MatMul layers that consume the input tensor directly.
I suspect the following edit to awq_quantize would also work (for model inputs at least, but probably not initializers):

  for node in model.nodes():
      if (
          node.op_type in ["MatMul"]
          and weight_config.get(node.name, {}) != "fp32"
          and weight_config.get(node.name, {}).get("algorithm", "AWQ") == "AWQ"
+         and node.input[0] not in model.input()
      ):
          output_names.append(node.input[0])
I considered opening a PR, but I'm not sure what the preferred solution is, plus I see some refactoring for AWQ/GPTQ in the new 3.x API. I'm also unfamiliar with the tests. :)
Hi @jstoecker, thanks for raising this issue, and your enhancements are very welcome!
As the 3.x API is still under development and subject to change, I suggest you fix it based on the master branch and ask the ORT owners (@mengniwang95, @yuwenzho) to review the PR.