
[BUG] Running operations over concat output rewrites its values #20606

Open
SolomidHero opened this issue May 8, 2024 · 0 comments
Labels
core runtime issues related to core runtime

Comments


SolomidHero commented May 8, 2024

Describe the issue

I found that the following code in an ONNX model (exported from PyTorch) produces an incorrect result:

# inside `def forward()`
  ...
  a = self.encoder(input_seq) # (1, T, C)
  spk_emb = ... # (1, 32)

  # broadcast the speaker embedding over time and append it channel-wise
  a = self.cat([a, spk_emb[None].expand(1, a.shape[1], -1)], dim=-1) # (1, T, C+32)

  down_b = self.b_predictor(a) # (1, T, 16)
  down_c = self.c_predictor(a) # (1, T, 16)

  a = self.cat([a, down_b, down_c], dim=-1) # (1, T, C+64)

  return a

I expect the returned result to have identical vector values across dimension 1 (time) for channels [C, C+32), i.e.:

result[:, i, C:C+32] == result[:, j, C:C+32] # forall i, j
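This invariant is easy to check numerically; here is a minimal NumPy sketch with a synthetic stand-in for the model output (the values of T and C are arbitrary, not from the real model):

```python
import numpy as np

# Synthetic stand-in for the model output: encoder features (C channels)
# concatenated with a time-constant 32-dim speaker embedding.
T, C = 4, 8
enc = np.random.randn(1, T, C)
spk = np.broadcast_to(np.random.randn(1, 1, 32), (1, T, 32))
result = np.concatenate([enc, spk], axis=-1)  # (1, T, C + 32)

# channels [C, C+32) must be identical at every time step
ok = bool((result[:, :, C:C + 32] == result[:, :1, C:C + 32]).all())
print(ok)  # True for this synthetic tensor
```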

Like this:
[screenshot: expected output, speaker-embedding channels identical at every time step]

But I get this result:
[screenshot: actual output, those channels differ across time steps]

I apply my model to a large input of 4000 frames, split into chunks of 500. My model gives me an incorrect result every time after the first pass.

My additional observations:

Observation 1: passing the same input_feed (chunk size of 500 frames) gives the same result starting from the 2nd pass, but the 1st pass differs from the others. More precisely:

result[:, i] == result[:, i+500] # forall i > 500
result[:, i, C+32:] == result[:, i+500, C+32:] # forall i

[screenshots: first-pass output vs. subsequent identical passes]

Observation 2: the following version of the code gives me a correct model with the expected output.

# inside `def forward()`
  ...
  a = self.encoder(input_seq) # (1, T, C)
  spk_emb = ... # (1, 32)

  spk_emb_expanded = spk_emb[None].expand(1, a.shape[1], -1) # (1, T, 32)
  a_w_emb = self.cat([a, spk_emb_expanded], dim=-1)

  down_b = self.b_predictor(a_w_emb) # (1, T, 16)
  down_c = self.c_predictor(a_w_emb) # (1, T, 16)

  a = self.cat([a, spk_emb_expanded, down_b, down_c], dim=-1)

  return a

Observation 3: removing either group of operations (1 or 2) gives me a correct result (I use randn_like in place of that group for down_b or down_c).

So, in short, my main problem is described by a single picture:
[screenshot 2024-05-08 at 19:45:45: the two concat flows side by side]
The outputs obviously must be the same, so I guess the result is somehow rewritten by Ops.1 and Ops.2.
Can you please point out which operation can cause such errors? Is it really a bug in session.run()?
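For reference, concatenation in NumPy (and, per the ONNX Concat operator spec, in a conforming runtime) materializes a fresh tensor, so a downstream op should never be able to rewrite its values in place. A quick sanity check of that semantics:

```python
import numpy as np

a = np.ones((1, 3, 2))
b = np.zeros((1, 3, 2))
out = np.concatenate([a, b], axis=-1)  # copies a and b into a new buffer

a[...] = 5.0                  # mutate an input *after* the concat
print(out[0, 0].tolist())     # [1.0, 1.0, 0.0, 0.0] -- output is unaffected
```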

To reproduce

Here I attached 2 models: ok.onnx and notok.onnx. You can simply load them and feed np.random.randn tensors as inputs.
issue_models.zip

Urgency

I found a workaround with less clean code, but THIS BEHAVIOUR IS REALLY WEIRD!

Platform

Mac

OS Version

12.3.1

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.3

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@edgchen1 edgchen1 added the core runtime issues related to core runtime label May 14, 2024