I found the code below in quantize.py. It seems that the kernels packed into quantization_code only support running on a GPU.
Is there any suggestion for deploying the model on an NPU?
Would it be possible to provide the source code behind quantization_code, so that I can try to rewrite it to support running on an NPU device?
class Kernel:
    def __init__(self, code: bytes, function_names: List[str]):
        self.code = code
        self._function_names = function_names
        self._cmodule = LazyKernelCModule(self.code)
        for name in self._function_names:
            setattr(self, name, KernelFunction(self._cmodule, name))

quantization_code = "XXXX"
kernels = Kernel(
    bz2.decompress(base64.b64decode(quantization_code)),
    [
        "int4WeightCompression",
        "int4WeightExtractionFloat",
        "int4WeightExtractionHalf",
        "int8WeightExtractionFloat",
        "int8WeightExtractionHalf",
    ],
)
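To make the request more concrete, here is a minimal, device-independent sketch of what I guess the kernels named above compute. This is only an assumption on my side, since the real source is hidden inside quantization_code: I assume one scale per weight row, that int8 extraction is a plain multiply by that scale, and that int4 packs two signed 4-bit values per byte with the first value in the high nibble. The helper names (to_signed, pack_int4, extract_int8, extract_int4) are hypothetical and do not come from quantize.py.

```python
from typing import List


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def pack_int4(values: List[int]) -> List[int]:
    """Pack pairs of signed 4-bit values into bytes.

    Assumed layout (not confirmed by the source): the first value of each
    pair goes into the high nibble, the second into the low nibble.
    """
    if len(values) % 2 != 0:
        raise ValueError("need an even number of 4-bit values")
    return [((hi & 0xF) << 4) | (lo & 0xF)
            for hi, lo in zip(values[0::2], values[1::2])]


def extract_int8(rows: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Dequantize signed int8 weights using one scale per row (assumed scheme)."""
    return [[w * s for w in row] for row, s in zip(rows, scales)]


def extract_int4(rows: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Unpack each byte into two signed 4-bit weights, then apply the row scale."""
    result = []
    for row, s in zip(rows, scales):
        out = []
        for byte in row:
            out.append(to_signed(byte >> 4, 4) * s)  # high nibble first
            out.append(to_signed(byte, 4) * s)       # then low nibble
        result.append(out)
    return result


packed = pack_int4([-8, 7, 3, -1])
print(packed)
print(extract_int4([packed], [0.5]))
print(extract_int8([[-128, 127]], [0.25]))
```

If the real kernels follow this layout, the same shifts, masks, and multiplications could be written with ordinary tensor operations, so they would not depend on CUDA. If the layout differs (for example, nibble order or scale shape), the sketch would need to be adjusted, which is why I am asking for the original code.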