
[BUG] ValueError: [quantize] The last dimension of the matrix needs to be divisible by the quantization group size 64. #1033

Open
Blaizzy opened this issue Apr 25, 2024 · 7 comments

Comments

@Blaizzy

Blaizzy commented Apr 25, 2024

Describe the bug
When I try to quantize a VLM that uses SigLIP, it throws a ValueError because the model has an intermediate size of 4304, which is not divisible by 64 or 128.

To Reproduce

Include code snippet

pip install -U mlx-vlm

python -m mlx_vlm.convert \
    --hf-path qnguyen3/nanoLLaVA \
    -q

Expected behavior
Successfully quantize the model.

Desktop (please complete the following information):

  • OS Version: macOS 14.4.1
  • Version: 0.11.1

Additional context
Add any other context about the problem here.

Traceback

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/convert.py", line 62, in <module>
    main()
  File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/convert.py", line 58, in main
    convert(**vars(args))
  File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/utils.py", line 540, in convert
    weights, config = quantize_model(model, config, q_group_size, q_bits)
  File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/utils.py", line 452, in quantize_model
    nn.quantize(model, q_group_size, q_bits, class_predicate=class_predicate)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 51, in quantize
    leaves = tree_map_with_path(_maybe_quantize, leaves, is_leaf=Module.is_module)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
    return {
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
    k: tree_map_with_path(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
    return {
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
    k: tree_map_with_path(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
    return {
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
    k: tree_map_with_path(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
    return {
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
    k: tree_map_with_path(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
    return {
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
    k: tree_map_with_path(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 87, in tree_map_with_path
    return TreeType(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 88, in <genexpr>
    tree_map_with_path(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
    return {
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
    k: tree_map_with_path(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
    return {
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
    k: tree_map_with_path(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 83, in tree_map_with_path
    return fn(path, tree, *rest)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 42, in _maybe_quantize
    return QuantizedLinear.from_linear(m, group_size, bits)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 226, in from_linear
    ql = cls(input_dims, output_dims, False, group_size, bits)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 185, in __init__
    self.weight, self.scales, self.biases = mx.quantize(weight, group_size, bits)
ValueError: [quantize] The last dimension of the matrix needs to be divisible by the quantization group size 64. However the provided  matrix has shape (1152,4304)
@awni
Member

awni commented Apr 25, 2024

It's not a bug. At the risk of being redundant, the last dimension of the matrix has to be divisible by the quantization group size. For size 4304 there is no supported group size that divides it (none of 32, 64, 128).

It's not on our roadmap to support irregular sizes... but we can leave this issue open to help prioritize if it's something we should consider in the future.
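For concreteness, the arithmetic behind this can be checked in plain Python (this snippet is illustrative only; it does not use MLX):

```python
# SigLIP's intermediate size is 4304: the last dimension of the
# (1152, 4304) weight matrix from the traceback above.
intermediate_size = 4304

# None of the supported group sizes divide it evenly.
for group_size in (32, 64, 128):
    remainder = intermediate_size % group_size
    print(f"group size {group_size}: remainder {remainder}")

# A group size of 16 would divide it: 4304 = 16 * 269.
assert intermediate_size % 16 == 0
```

Running it shows remainders of 16, 16, and 80 for group sizes 32, 64, and 128 respectively, which is why every supported group size fails.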

@s-smits

s-smits commented Apr 25, 2024

It can be divided by 16; would supporting that be complicated to implement?

@Blaizzy
Author

Blaizzy commented Apr 25, 2024

> It's not a bug. At the risk of being redundant, the last dimension of the matrix has to be divisible by the quantization group size. For size 4304 there is no supported group size that divides it (none of 32, 64, 128).
>
> It's not on our roadmap to support irregular sizes... but we can leave this issue open to help prioritize if it's something we should consider in the future.

Yes, it's not a bug. It's more of a feature request / clarification, because no SigLIP-based VLM can be quantized due to this, which includes Idefics 2, nanoLLaVA, and DeepSeek-VL.

@Blaizzy
Author

Blaizzy commented Apr 25, 2024

Is there a way in MLX to skip a particular target layer or block in the model?

I mean a specific layer or block, not all layers of the same type the way class_predicate does.

@awni
Member

awni commented Apr 25, 2024

You can use class_predicate for that. Just put the condition you want in the predicate. For example, if you want to skip weights of a certain shape:

class_predicate = lambda p, m: isinstance(m, nn.Linear) and m.weight.shape != (x, y)
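To show how such a predicate behaves, here is a minimal, self-contained sketch. It uses a dummy Linear stand-in (not mlx.nn) so the shape-filtering logic can be demonstrated in isolation; the shape (1152, 4304) is the problematic SigLIP matrix from the traceback, and x, y above are placeholders for whatever shape you want to skip:

```python
from types import SimpleNamespace

class Linear:
    """Dummy stand-in for mlx.nn.Linear, holding only a weight shape."""
    def __init__(self, shape):
        self.weight = SimpleNamespace(shape=shape)

# Quantize only Linear layers whose weight shape is NOT (1152, 4304).
class_predicate = lambda p, m: (
    isinstance(m, Linear) and m.weight.shape != (1152, 4304)
)

# A regular layer passes the predicate and would be quantized.
assert class_predicate("mlp.fc1", Linear((4096, 4096)))
# The irregular SigLIP layer fails the predicate and is skipped.
assert not class_predicate("vision.fc2", Linear((1152, 4304)))
```

With the real API, this predicate is passed as the class_predicate argument to nn.quantize, which calls it with each module's path and the module itself.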

@Blaizzy
Author

Blaizzy commented Apr 25, 2024

Thank you very much, I will give it a try ASAP!

@Blaizzy
Author

Blaizzy commented Apr 25, 2024

It works wonders! 💯

Also found a better way, skipping the entire block:

class_predicate = lambda p, m: isinstance(m, nn.Linear) and p.split('.')[0] != "vision_tower"
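The path-based variant can be sketched the same way; again with a dummy Linear stand-in rather than mlx.nn, and with module paths made up for illustration:

```python
class Linear:
    """Dummy stand-in for mlx.nn.Linear."""
    pass

# Quantize every Linear layer except those under the "vision_tower" block,
# identified by the first component of the module's dotted path.
class_predicate = lambda p, m: (
    isinstance(m, Linear) and p.split(".")[0] != "vision_tower"
)

# Language-model layers pass the predicate and get quantized.
assert class_predicate("language_model.layers.0.mlp", Linear())
# Anything under the vision tower is skipped wholesale.
assert not class_predicate("vision_tower.blocks.0.attn", Linear())
```

Skipping by path prefix is coarser than matching shapes, but it keeps the whole vision tower in full precision with one condition instead of enumerating every irregular layer.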
