
Result of Flexible-input-shape Model is NAN #2166

Open
jmcc113 opened this issue Mar 12, 2024 · 4 comments
Labels
bug Unexpected behaviour that should be corrected (type) Flexible Shape PyTorch (traced)

Comments

@jmcc113

jmcc113 commented Mar 12, 2024

🐞Describing the bug

When I use EnumeratedShapes or RangeDim to generate a flexible-input-shape model for inference, the result is all NaN.
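For reference, the EnumeratedShapes variant is a sketch along these lines (the shape pairs here are illustrative; the reproduction script below uses the RangeDim form):

import numpy as np
import coremltools as ct

# Sketch of the EnumeratedShapes variant; (batch, sequence_length) pairs
# the converted model should accept are illustrative values.
enumerated_shape = ct.EnumeratedShapes(shapes=[[1, 128], [2, 256], [8, 512]],
                                       default=[1, 128])
# Passed to ct.convert the same way as the RangeDim-based ct.Shape in the
# reproduction script below, e.g.:
# ct.TensorType(shape=enumerated_shape, dtype=np.int32, name="input_ids")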

Stack Trace

/opt/homebrew/anaconda3/envs/bce/bin/python /Users/jinmuchuan/projects/BCEmbedding/model.py 
torch.int32
When both 'convert_to' and 'minimum_deployment_target' not specified, 'convert_to' is set to "mlprogram" and 'minimum_deployment_targer' is set to ct.target.iOS15 (which is same as ct.target.macOS12). Note: the model will not run on systems older than iOS15/macOS12/watchOS8/tvOS15. In order to make your model run on older system, please set the 'minimum_deployment_target' to iOS14/iOS13. Details please see the link: https://coremltools.readme.io/docs/unified-conversion-api#target-conversion-formats
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:   0%|          | 0/672 [00:00<?, ? ops/s]Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops: 100%|█████████▉| 670/672 [00:00<00:00, 5151.90 ops/s]
Running MIL frontend_pytorch pipeline: 100%|██████████| 5/5 [00:00<00:00, 458.38 passes/s]
Running MIL default pipeline:   0%|          | 0/71 [00:00<?, ? passes/s]/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '1617', of the source model, has been renamed to 'var_1617' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline:  59%|█████▉    | 42/71 [00:00<00:00, 135.52 passes/s]/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
  return input_var.val.astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|██████████| 71/71 [00:13<00:00,  5.44 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████| 12/12 [00:00<00:00, 480.47 passes/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
{'output': array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32), 'var_1617': array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)}

Process finished with exit code 0

To Reproduce

import numpy as np
import torch
import coremltools as ct
from transformers import AutoModel, AutoTokenizer

sentences = ['sentence_0', 'sentence_1']

tokenizer = AutoTokenizer.from_pretrained('maidalun1020/bce-embedding-base_v1')
model = AutoModel.from_pretrained('maidalun1020/bce-embedding-base_v1', return_dict=False)

device = 'cpu'  # use 'cpu' if no GPU is available
model.to(device)
example_input = torch.randint(0, 10, size=(1, 128)).type(torch.int32)
print(example_input.dtype)

traced_script_module = torch.jit.trace(model.eval(), (example_input, example_input))


input_shape = ct.Shape(shape=(ct.RangeDim(lower_bound=1, upper_bound=8),
                              ct.RangeDim(lower_bound=1, upper_bound=512)))

mlmodel = ct.convert(traced_script_module,
                     inputs=[ct.TensorType(shape=input_shape, dtype=np.int32, name="input_ids"),
                             ct.TensorType(shape=input_shape, dtype=np.int32, name="attention_mask")],
                     outputs=[ct.TensorType(dtype=np.float32, name="output"),
                              ct.TensorType(dtype=np.float32, name="1617")]
                     )

mlmodel.save('embed.mlpackage')

inputs = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="np")
inputs_on_device = {k: v.astype(np.int32) for k, v in inputs.items()}

out_dict = mlmodel.predict(inputs_on_device)
print(out_dict)
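A sanity check one could append to the script above (a hypothetical addition, not part of the original): compare the Core ML predictions against the traced PyTorch model on the same inputs.

# Hypothetical sanity check: the Core ML output above is all NaN, while
# the traced PyTorch model is expected to produce finite values.
torch_out = traced_script_module(torch.from_numpy(inputs_on_device['input_ids']),
                                 torch.from_numpy(inputs_on_device['attention_mask']))
print(np.isnan(out_dict['output']).all())      # True -> the bug
print(torch.isnan(torch_out[0]).any().item())  # expected False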

System environment (please complete the following information):

  • coremltools version: 7.1
  • OS (e.g. MacOS version or Linux type): macOS 13.3.1 (a)
  • Any other relevant version information (e.g. PyTorch or TensorFlow version): torch==2.1.0
@jmcc113 jmcc113 added the bug Unexpected behaviour that should be corrected (type) label Mar 12, 2024
@TobyRoseman
Collaborator

Loading an untrusted PyTorch model is a security risk, so I'm unable to reproduce your results. It would be great if you could give us a minimal example (i.e. one which doesn't require loading an external model).

Does the output match if you convert with fixed shapes?
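For example, a sketch reusing the names from your script:

# Sketch: fixed-shape conversion for comparison; the shape matches the
# tracing example in the reproduction script.
mlmodel_fixed = ct.convert(
    traced_script_module,
    inputs=[ct.TensorType(shape=(1, 128), dtype=np.int32, name="input_ids"),
            ct.TensorType(shape=(1, 128), dtype=np.int32, name="attention_mask")],
    outputs=[ct.TensorType(dtype=np.float32, name="output"),
             ct.TensorType(dtype=np.float32, name="1617")],
)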

@jmcc113
Author

jmcc113 commented Mar 13, 2024

Loading an untrusted PyTorch model is a security risk, so I'm unable to reproduce your results. It would be great if you could give us a minimal example (i.e. one which doesn't require loading an external model).

Does the output match if you convert with fixed shapes?

This model is from Hugging Face. I'm not sure which layer causes this bug, so it's difficult for me to construct a minimal example.
But the output of the fixed-shape model is correct.

@TobyRoseman
Collaborator

I'm not sure which layer causes this bug, so it's difficult for me to construct a minimal example.

I completely understand. Unfortunately, without a minimal example, it's difficult for me to help you.

Since the fixed shape works, the issue is almost certainly related to flexible shapes. For debugging purposes, there are a few more things you could try (a sketch follows the list).

1 - Verify that the traced PyTorch model still works for shapes within the range of the flexible shape but different than the shapes it was traced on.

2 - See if the model converts correctly with a fixed input_ids shape but a flexible attention_mask shape.

3 - See if the model converts correctly with a fixed attention_mask shape but a flexible input_ids shape.
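A sketch of what those experiments could look like, reusing the names from the reproduction script (shapes are illustrative):

# 1 - Run the traced model on an in-range shape different from the traced one.
probe = torch.randint(0, 10, size=(2, 64)).type(torch.int32)
print(traced_script_module(probe, probe))

# 2 - Fixed input_ids shape, flexible attention_mask shape.
flexible = ct.Shape(shape=(ct.RangeDim(lower_bound=1, upper_bound=8),
                           ct.RangeDim(lower_bound=1, upper_bound=512)))
mlmodel_mixed = ct.convert(
    traced_script_module,
    inputs=[ct.TensorType(shape=(1, 128), dtype=np.int32, name="input_ids"),
            ct.TensorType(shape=flexible, dtype=np.int32, name="attention_mask")])

# 3 - Swap the two: flexible input_ids shape, fixed attention_mask shape.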

@jmcc113
Author

jmcc113 commented Mar 15, 2024

1 - Verify that the traced PyTorch model still works for shapes within the range of the flexible shape but different than the shapes it was traced on.

2 - See if the model converts correctly with a fixed input_ids shape but a flexible attention_mask shape.

3 - See if the model converts correctly with a fixed attention_mask shape but a flexible input_ids shape.

  1. The traced model works well for shapes different from the ones it was traced on.
  2. When running the model converted with a fixed input_ids shape but a flexible attention_mask shape, I get an error:
Traceback (most recent call last):
  File "/Users/jinmuchuan/projects/BCEmbedding/model.py", line 48, in <module>
    out_dict = mlmodel.predict(inputs_on_device)
  File "/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/models/model.py", line 596, in predict
    return MLModel._get_predictions(self.__proxy__, verify_and_convert_input_dict, data)
  File "/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/models/model.py", line 648, in _get_predictions
    return proxy.predict(data)
RuntimeError: {
    NSLocalizedDescription = "Failed to build the model execution plan using a model architecture file '/private/var/folders/sn/xnnh_7q94y9fx18c0g26rt716qppp1/T/tmpf0e8pl49.mlmodelc/model.mil' with error code: -7.";
}
  3. When running the model converted with a fixed attention_mask shape but a flexible input_ids shape, I get NaN.
