
[BUG] Running operations over concat output rewrites its values #20606

Open
SolomidHero opened this issue May 8, 2024 · 0 comments
Labels
core runtime issues related to core runtime

Comments


SolomidHero commented May 8, 2024

Describe the issue

I found that the following code in an ONNX model (exported from PyTorch) produces an incorrect result:

# inside `def forward()`
  ...
  a = self.encoder(input_seq) # (1, T, C)
  spk_emb = ... # (1, 32)

  # broadcast the speaker embedding over time and append it channel-wise
  a = self.cat([a, spk_emb[None].expand(1, a.shape[1], -1)], dim=-1) # (1, T, C+32)

  down_b = self.b_predictor(a) # (1, T, 16)
  down_c = self.c_predictor(a) # (1, T, 16)

  a = self.cat([a, down_b, down_c], dim=-1) # (1, T, C+64)

  return a

I expect the returned result to have identical vector values across dimension 1 (time) for channels [C, C+32), i.e.:

result[:, i, C:C+32] == result[:, j, C:C+32] # forall i, j
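This invariant is easy to check numerically; here is a minimal NumPy sketch with a synthetic stand-in for the model output (the values of T and C are arbitrary, not from the real model):

```python
import numpy as np

# Synthetic stand-in for the model output: encoder features (C channels)
# concatenated with a time-constant 32-dim speaker embedding.
T, C = 4, 8
enc = np.random.randn(1, T, C)
spk = np.broadcast_to(np.random.randn(1, 1, 32), (1, T, 32))
result = np.concatenate([enc, spk], axis=-1)  # (1, T, C + 32)

# channels [C, C+32) must be identical at every time step
ok = bool((result[:, :, C:C + 32] == result[:, :1, C:C + 32]).all())
print(ok)  # True for this synthetic tensor
```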

Like this:
[screenshot: expected output, speaker-embedding channels identical at every time step]

But I get this result:
[screenshot: actual output, those channels differ across time steps]

I apply my model to a large input of 4000 frames, split into chunks of 500. My model gives me an incorrect result every time after the first pass.

My additional observations:

Observation 1: passing the same input_feed (chunk size of 500 frames) gives the same result starting from the 2nd pass, but the 1st pass differs from the others. More precisely:

result[:, i] == result[:, i+500] # forall i > 500
result[:, i, C+32:] == result[:, i+500, C+32:] # forall i

[screenshots: first-pass output vs. subsequent identical passes]

Observation 2: the following version of the code gives me a correct model with the expected output.

# inside `def forward()`
  ...
  a = self.encoder(input_seq) # (1, T, C)
  spk_emb = ... # (1, 32)

  spk_emb_expanded = spk_emb[None].expand(1, a.shape[1], -1) # (1, T, 32)
  a_w_emb = self.cat([a, spk_emb_expanded], dim=-1)

  down_b = self.b_predictor(a_w_emb) # (1, T, 16)
  down_c = self.c_predictor(a_w_emb) # (1, T, 16)

  a = self.cat([a, spk_emb_expanded, down_b, down_c], dim=-1)

  return a

Observation 3: removing either group of operations (1 or 2) gives me a correct result (I use randn_like in place of that group for down_b or down_c).

So, in short, my main problem is described by a single picture:
[screenshot 2024-05-08 at 19:45:45: the two concat flows side by side]
The outputs obviously must be the same, so I guess the result is somehow rewritten by Ops.1 and Ops.2.
Can you please point out which operation can cause such errors? Is it really a bug in session.run()?
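For reference, concatenation in NumPy (and, per the ONNX Concat operator spec, in a conforming runtime) materializes a fresh tensor, so a downstream op should never be able to rewrite its values in place. A quick sanity check of that semantics:

```python
import numpy as np

a = np.ones((1, 3, 2))
b = np.zeros((1, 3, 2))
out = np.concatenate([a, b], axis=-1)  # copies a and b into a new buffer

a[...] = 5.0                  # mutate an input *after* the concat
print(out[0, 0].tolist())     # [1.0, 1.0, 0.0, 0.0] -- output is unaffected
```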

To reproduce

Here I attached 2 models: ok.onnx and notok.onnx. You can simply load them and feed np.random.randn tensors as inputs.
issue_models.zip

Urgency

I found a workaround with less clean code, but THIS BEHAVIOUR IS REALLY WEIRD!

Platform

Mac

OS Version

12.3.1

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.3

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@edgchen1 edgchen1 added the core runtime issues related to core runtime label May 14, 2024