how to transfer chatglm2-6b int4 model to npu device #649

Open · woaipichuli opened this issue Jan 2, 2024 · 0 comments

woaipichuli commented Jan 2, 2024
Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I found the code below in quantize.py; it seems that quantization_code only supports running on a GPU.
Is there any suggestion for deploying the model on an NPU?
Would it be possible to provide the source behind quantization_code so that I can rewrite it to support running on an NPU device?

import base64
import bz2
from typing import List

from cpm_kernels.kernels.base import LazyKernelCModule, KernelFunction


class Kernel:
    def __init__(self, code: bytes, function_names: List[str]):
        self.code = code
        self._function_names = function_names
        # LazyKernelCModule loads the embedded, precompiled CUDA kernel module,
        # so this path only works on a CUDA GPU.
        self._cmodule = LazyKernelCModule(self.code)

        # Expose each compiled kernel as an attribute, e.g. kernels.int4WeightExtractionHalf(...)
        for name in self._function_names:
            setattr(self, name, KernelFunction(self._cmodule, name))


quantization_code = "XXXX"  # base64-encoded, bz2-compressed kernel binary (truncated here)

kernels = Kernel(
    bz2.decompress(base64.b64decode(quantization_code)),
    [
        "int4WeightCompression",
        "int4WeightExtractionFloat",
        "int4WeightExtractionHalf",
        "int8WeightExtractionFloat",
        "int8WeightExtractionHalf",
    ],
)
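For reference, here is a rough pure-PyTorch sketch of what the int4 weight extraction might look like without the precompiled CUDA kernels. The nibble packing order (high nibble first, two values per int8 byte) and the per-row scale layout are my assumptions and would need to be checked against the real int4WeightCompression kernel:

import torch

def extract_int4_to_half(weight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # weight: [rows, cols // 2] int8, each byte assumed to pack two signed
    #         4-bit values (high nibble first) -- this packing order is a guess.
    # scale:  [rows] per-row dequantization scale (half or float).
    assert weight.dtype == torch.int8
    u = weight.view(torch.uint8)                 # reinterpret the raw bytes
    high = (u >> 4).to(torch.int16)              # upper 4 bits of each byte
    low = (u & 0x0F).to(torch.int16)             # lower 4 bits of each byte
    # sign-extend the 4-bit two's-complement values
    high = torch.where(high >= 8, high - 16, high)
    low = torch.where(low >= 8, low - 16, low)
    # interleave so each packed byte expands to two adjacent columns
    unpacked = torch.stack((high, low), dim=-1).reshape(weight.shape[0], -1)
    return unpacked.to(scale.dtype) * scale[:, None]

As far as I understand, something like this would run on whatever device the tensors live on (for example an Ascend device via torch_npu), but a dequantize-then-matmul path without a fused kernel will be much slower than the CUDA kernels above.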