PostTrainingQuantConfig(quant_level='auto', device='npu', backend="onnxrt_dml_ep") produces fp32 ops. #1580

kleiti · 2024-01-26T11:44:09Z

The PostTrainingQuantConfig below produces fp32 ops for NPU with Neural Compressor 2.4.1. Models with int8 and fp16 ops would be preferable for NPU.

from neural_compressor import PostTrainingQuantConfig

conf = PostTrainingQuantConfig(
    quant_level='auto',
    device='npu',
    backend="onnxrt_dml_ep",
    quant_format="QOperator",
    approach="static",
    excluded_precisions=['bf16'],
)

mengniwang95 · 2024-02-19T07:41:37Z

Hi @kleiti, the onnxrt_dml_ep backend is experimental and currently supports int8 quantization only for MatMul. We will enhance its functionality later.
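
To see which ops actually ended up quantized, it may help to count the op types in the saved model. The following is a minimal sketch, not part of Neural Compressor: the helper names are hypothetical, the list of "quantized" op types is a heuristic and not exhaustive, and `summarize_onnx_file` assumes the `onnx` package is installed and that the quantized model was saved to an ONNX file.

```python
from collections import Counter

# Heuristic, non-exhaustive set of op types that indicate integer quantization.
# This list is an assumption for illustration, not taken from the library.
QUANT_OPS = {
    "MatMulInteger", "ConvInteger", "QGemm", "QAttention",
    "QuantizeLinear", "DequantizeLinear", "DynamicQuantizeLinear",
}


def is_quantized_op(op_type):
    """Return True if the op type looks like a quantized operator."""
    return op_type.startswith("QLinear") or op_type in QUANT_OPS


def summarize_ops(op_types):
    """Split op-type counts into (quantized, other) Counters."""
    quantized, other = Counter(), Counter()
    for op in op_types:
        target = quantized if is_quantized_op(op) else other
        target[op] += 1
    return quantized, other


def summarize_onnx_file(path):
    """Summarize the top-level graph of an ONNX file (subgraphs are not visited)."""
    import onnx  # imported lazily so the helpers above work without it

    model = onnx.load(path)
    return summarize_ops(node.op_type for node in model.graph.node)
```

If the second counter still contains plain `MatMul` or `Conv` entries, those nodes were left unquantized; the op type alone does not tell fp32 from fp16, so check the tensor data types for that.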

chensuyue assigned mengniwang95 Jan 30, 2024
