-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore the "Fold overflow intrinsics with select instructions" #521
Conversation
Benchmark results:
|
Hi @akiramenai , what do you think of the score? They don't look good to me. For example, we got "Worst -19.108" for "Size All M3B3", I guess this means my patch caused a code size bloating. As for Cycles, the mean value is pretty small. Do they match your expectations? |
The numbers are expected. Perhaps except for size regressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please navigate me to the code which guarantees that if glue contradicts to data dependency a copy of a node is inserted? We need to make sure that testing is sufficient.
You also mentioned that there are cases with suboptimal codegen, could you add them to the regression tests as well?
15e141b
to
947a805
Compare
04483fc
to
e601836
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the approach with a pseudo flag looks ok to me, but the tests imply that it needs to be addressed in some other places as well.
I expect the flag to be pure pseudo (with no exposure to our asm) and carry semantic of irreversibility in Select and Br which is to be documented in appropriate places with examples. Maybe there is a better way to get the same behaviour w.o. introducing new a pseudo flag [@vladimirradosavljevic perhaps you have some ideas on that]
Hi @akiramenai , thanks for the comments, I will spend more time thinking about how to implement a better one w.o. this new pseudo flag. A possible idea is to lower |
e601836
to
49d1f70
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vladimirradosavljevic could you also have a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no support and tests for optimizations where we are manipulating/folding with select instructions (e.g. like in EraVMOptimizeSelectPreRA
). Please add check not to do this optimization for .of
if we need to reverse condition, and add tests to check these optimization where we can do it and where we can't do it.
As I mentioned in the comment, problem with ties is not only present with SELrrr instruction. Please look at the following test:
define i256 @test(i256 %a, i256 %b, i256 %x, i256 %y) {
entry:
%res1 = call {i256, i1} @llvm.uadd.with.overflow.i256(i256 %x, i256 %y)
%overflow = extractvalue {i256, i1} %res1, 1
%selected = select i1 %overflow, i256 %a, i256 12345
ret i256 %selected
}
Following assembly is generated with this patch:
test: ; @test
; %bb.0: ; %entry
add! r3, r4, r2
add 12345, r0, r1
ret
In this case, we have:
%5:gr256 = SELrir killed %0:gr256(tied-def 0), i256 12345, i256 8, implicit $flags
And as you can see, first register is tied with the output and later in the expand select we would run into the same problem as with SELrrr (please note in the generated ASM we have 1 instruction after expansion, but that is the problem in ShouldInverse
where we don't have inverse condition for .of
, so I added a comment to add assert so we can detect these cases).
In order to fix this, we need to redesign how we are tying register in select instruction:
- Remove all tie constraints from select instructions from
EraVMInstrInfo.td
and move tie handling toEraVMTieSelectOperands
pass. Create a separate PR for this to check the numbers, since I expect we will get some differences, as now we would only tie killed input registers with output. - Include changes from 1. here, and for all SELr variants add
let hasPostISelHook = 1
inEraVMInstrInfo.td
. InEraVMTargetLowering::AdjustInstrPostInstrSelection
for all SELr variants with.of
condition, mark output withIsEarlyClobber
flag (callsetIsEarlyClobber()
) to prevent regalloc from assigning the same physical register for input and output. InEraVMTieSelectOperands
, enable removing ofIsEarlyClobber
for all select instructions, not only forSELcrr
andSELrcr
. - When this is addressed, move setting of
IsEarlyClobber
fromEraVMBytesToCells.cpp
(e91f94b) toEraVMTargetLowering::AdjustInstrPostInstrSelection
for all select instructions that have stack and code operands in which register is used (you can look at the positions where register should be used inEraVMBytesToCells::convertStackMachineInstr
andEraVMBytesToCells::convertCodeAddressMachineInstr
). This is to improve readability and maintainability, because we will have only one place where we are addingIsEarlyClobber
for functional reasons.
Hi @vladimirradosavljevic , if we take out the COND_OF part, I think your above comments can be treated as a new design of how we tie the operands and set the early-clobber attribute. Therefor I create a separate PR #553 to address them (without the COND_OF), would you please give it a look? |
We are doing redesign to address problem where we don't have reverse condition for |
49d1f70
to
0ea3921
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
994aa65
to
8a8cd8f
Compare
@TerryGuo Could you please address this?
Generated assembly:
As you can see, this optimization generated add with |
8a8cd8f
to
aba8e41
Compare
I searched the EraVM folder and updated the code to quit when need to inverse the |
aba8e41
to
ab6acbe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ab6acbe
to
620915d
Compare
After folding, the select instructions will use overflow condition code which is LT actually, this will give us shorter code sequence normally. Please be noted that GE can only be used as reversal code for overflow LT caused by usubo, but not for uaddo and umulo. Co-authored-by: Alan Li <alan.li@me.com> PR: #521, Issue: #491.
620915d
to
f32aeb5
Compare
Restore this reverted patch along with two changes:
Code Review Checklist
Purpose
Ticket Number
Requirements
Implementation
Logic Errors and Bugs
code does not behave as intended?
that could break the code?
Error Handling and Logging
be added or removed?
written in a way that allows for easy
debugging?
Maintainability
Dependencies
Security
Performance
system performance?
performance of the code significantly?
Testing and Testability
that should be tested in addition?
Readability
smaller methods?
different function, method or variable names?
file/folder/package?
restructured to have a more intuitive control
flow?
better?
understandable?
Documentation
Best Practices
Experts' Opinion
expert or a usability expert, should look over
the code before it can be accepted?