BeamSearch op returning wrong results on CUDA execution provider when sequence is used as input_ids #20667
Labels
- ep:CUDA (issues related to the CUDA execution provider)
- model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.)
Describe the issue
The com.microsoft::BeamSearch op outputs wrong values when the following conditions are satisfied:
Debugging with DEBUG_GENERATION enabled suggests the problem lies in the copy of the sequences tensor into input_ids when feeding the decoder graph.
The copy is done from host to device:
onnxruntime/onnxruntime/contrib_ops/cpu/transformers/subgraph_t5_decoder.cc
Line 197 in 737eb48
but the sequences span already points to GPU memory, so this should be a device-to-device copy instead.
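The intended fix can be sketched as follows. This is a minimal, hypothetical CUDA illustration of the copy-direction issue, not the actual code in subgraph_t5_decoder.cc; the function and parameter names are invented for clarity:

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: feed the decoder's input_ids from the beam-search
// `sequences` buffer. On the CUDA execution provider, BOTH pointers refer
// to device memory.
void feed_input_ids(int32_t* input_ids_dev, const int32_t* sequences_dev,
                    size_t num_elements) {
  // BUG (what the report describes): the source is treated as host memory.
  // cudaMemcpy(input_ids_dev, sequences_dev,
  //            num_elements * sizeof(int32_t), cudaMemcpyHostToDevice);

  // Fix: since `sequences` already lives on the GPU, the copy must be
  // device-to-device.
  cudaMemcpy(input_ids_dev, sequences_dev,
             num_elements * sizeof(int32_t), cudaMemcpyDeviceToDevice);
}
```

Passing the wrong `cudaMemcpyKind` makes the runtime interpret a device address as a host address, so the destination receives garbage rather than the accumulated sequence, which matches the wrong beam-search outputs observed on the CUDA EP.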
To reproduce
The following script creates a dummy model and tests it with both the CPU and CUDA execution providers:
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04.5 LTS
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
737eb48
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.8