how to transfer chatglm2-6b int4 model to npu device #649

Open · woaipichuli opened this issue Jan 2, 2024 · 0 comments

woaipichuli commented Jan 2, 2024
Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I found the code below in quantize.py; it seems that quantization_code only supports running on a GPU.
Is there any suggestion for deploying the model on an NPU?
Would it be possible to provide the source behind quantization_code so that I can rewrite it to support running on an NPU device?

import base64
import bz2
from typing import List

from cpm_kernels.kernels.base import LazyKernelCModule, KernelFunction


class Kernel:
    def __init__(self, code: bytes, function_names: List[str]):
        self.code = code
        self._function_names = function_names
        # LazyKernelCModule loads the embedded, precompiled CUDA kernel module,
        # so this path only works on a CUDA GPU.
        self._cmodule = LazyKernelCModule(self.code)

        # Expose each compiled kernel as an attribute, e.g. kernels.int4WeightExtractionHalf(...)
        for name in self._function_names:
            setattr(self, name, KernelFunction(self._cmodule, name))


quantization_code = "XXXX"  # base64-encoded, bz2-compressed kernel binary (truncated here)

kernels = Kernel(
    bz2.decompress(base64.b64decode(quantization_code)),
    [
        "int4WeightCompression",
        "int4WeightExtractionFloat",
        "int4WeightExtractionHalf",
        "int8WeightExtractionFloat",
        "int8WeightExtractionHalf",
    ],
)
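For reference, here is a rough pure-PyTorch sketch of what the int4 weight extraction might look like without the precompiled CUDA kernels. The nibble packing order (high nibble first, two values per int8 byte) and the per-row scale layout are my assumptions and would need to be checked against the real int4WeightCompression kernel:

import torch

def extract_int4_to_half(weight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # weight: [rows, cols // 2] int8, each byte assumed to pack two signed
    #         4-bit values (high nibble first) -- this packing order is a guess.
    # scale:  [rows] per-row dequantization scale (half or float).
    assert weight.dtype == torch.int8
    u = weight.view(torch.uint8)                 # reinterpret the raw bytes
    high = (u >> 4).to(torch.int16)              # upper 4 bits of each byte
    low = (u & 0x0F).to(torch.int16)             # lower 4 bits of each byte
    # sign-extend the 4-bit two's-complement values
    high = torch.where(high >= 8, high - 16, high)
    low = torch.where(low >= 8, low - 16, low)
    # interleave so each packed byte expands to two adjacent columns
    unpacked = torch.stack((high, low), dim=-1).reshape(weight.shape[0], -1)
    return unpacked.to(scale.dtype) * scale[:, None]

As far as I understand, something like this would run on whatever device the tensors live on (for example an Ascend device via torch_npu), but a dequantize-then-matmul path without a fused kernel will be much slower than the CUDA kernels above.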