Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore the "Fold overflow intrinsics with select instructions" #521

Merged
merged 2 commits into from
May 28, 2024

Conversation

TerryGuo
Copy link
Collaborator

@TerryGuo TerryGuo commented May 6, 2024

Restore this reverted patch along with two changes:

Code Review Checklist

Purpose

Ticket Number

Requirements

  • Have the requirements been met?
  • Have stakeholder(s) approved the change?

Implementation

  • Does this code change accomplish what it is supposed to do?
  • Can this solution be simplified?
  • Does this change add unwanted compile-time or run-time dependencies?
  • Could an additional framework, API, library, or service improve the solution?
  • Could we reuse part of LLVM instead of implementing the patch or a part of it?
  • Is the code at the right abstraction level?
  • Is the code modular enough?
  • Can a better solution be found in terms of maintainability, readability, performance, or security?
  • Does similar functionality already exist in the codebase? If yes, why isn’t it reused?
  • Are there any best practices, design patterns or language-specific patterns that could substantially improve this code?

Logic Errors and Bugs

  • Can you think of any use case in which the
    code does not behave as intended?
  • Can you think of any inputs or external events
    that could break the code?

Error Handling and Logging

  • Is error handling done the correct way?
  • Should any logging or debugging information
    be added or removed?
  • Are error messages user-friendly?
  • Are there enough log events and are they
    written in a way that allows for easy
    debugging?

Maintainability

  • Is the code easy to read?
  • Is the code not repeated (DRY Principle)?
  • Is the code method/class not too long?

Dependencies

  • Were updates to documentation, configuration, or readme files made as required by this change?
  • Are there any potential impacts on other parts of the system or backward compatibility?

Security

  • Does the code introduce any security vulnerabilities?

Performance

  • Do you think this code change decreases
    system performance?
  • Do you see any potential to improve the
    performance of the code significantly?

Testing and Testability

  • Is the code testable?
  • Have automated tests been added, or have related ones been updated to cover the change?
    • For changes to mutable state
  • Do tests reasonably cover the code change (unit/integration/system tests)?
    • Line Coverage
    • Region Coverage
    • Branch Coverage
  • Are there some test cases, input or edge cases
    that should be tested in addition?

Readability

  • Is the code easy to understand?
  • Which parts were confusing to you and why?
  • Can the readability of the code be improved by
    smaller methods?
  • Can the readability of the code be improved by
    different function, method or variable names?
  • Is the code located in the right
    file/folder/package?
  • Do you think certain methods should be
    restructured to have a more intuitive control
    flow?
  • Is the data flow understandable?
  • Are there redundant or outdated comments?
  • Could some comments convey the message
    better?
  • Would more comments make the code more
    understandable?
  • Could some comments be removed by making the code itself more readable?
  • Is there any commented-out code?

Documentation

  • Is there sufficient documentation?
  • Is the ReadMe.md file up to date?

Best Practices

  • Follow Single Responsibility principle?
  • Are different errors handled correctly?
  • Are errors and warnings logged?
  • Magic values avoided?
  • No unnecessary comments?
  • Minimal nesting used?

Experts' Opinion

  • Do you think a specific expert, like a security
    expert or a usability expert, should look over
    the code before it can be accepted?
  • Will this code change impact different teams, and should they review the change as well?

Copy link

github-actions bot commented May 6, 2024

Benchmark results:

╔═╡ Size (-%) ╞════════════════╡ All M3B3 ╞═╗
║ Mean                                0.008 ║
║ Best                               13.333 ║
║ Worst                              -1.357 ║
║ Total                              -0.005 ║
╠═╡ Cycles (-%) ╞══════════════╡ All M3B3 ╞═╣
║ Mean                                0.006 ║
║ Best                               30.769 ║
║ Worst                               0.000 ║
║ Total                               0.014 ║
╠═╡ Ergs (-%) ╞════════════════╡ All M3B3 ╞═╣
║ Mean                                0.005 ║
║ Best                               20.339 ║
║ Worst                              -0.120 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞════════════════╡ All MzB3 ╞═╗
║ Mean                                0.010 ║
║ Best                               13.333 ║
║ Worst                              -0.076 ║
║ Total                               0.011 ║
╠═╡ Cycles (-%) ╞══════════════╡ All MzB3 ╞═╣
║ Mean                                0.006 ║
║ Best                               30.769 ║
║ Worst                               0.000 ║
║ Total                               0.012 ║
╠═╡ Ergs (-%) ╞════════════════╡ All MzB3 ╞═╣
║ Mean                                0.005 ║
║ Best                               20.339 ║
║ Worst                              -0.007 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞═════╡ EVMInterpreter M3B3 ╞═╗
║ Mean                                0.000 ║
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                                 NaN ║
╠═╡ Cycles (-%) ╞═══╡ EVMInterpreter M3B3 ╞═╣
║ Mean                                0.000 ║
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Ergs (-%) ╞═════╡ EVMInterpreter M3B3 ╞═╣
║ Mean                                0.000 ║
║ Best                                0.004 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Ergs/gas ╞══════╡ EVMInterpreter M3B3 ╞═╣
║ ADD                                69.333 ║
║ MUL                                40.400 ║
║ SUB                                69.333 ║
║ DIV                                48.800 ║
║ SDIV                               65.600 ║
║ MOD                                47.600 ║
║ SMOD                               63.200 ║
║ ADDMOD                             39.625 ║
║ MULMOD                             36.625 ║
║ EXP                                 9.667 ║
║ SIGNEXTEND                         46.400 ║
║ LT                                 73.333 ║
║ GT                                 73.333 ║
║ SLT                                97.333 ║
║ SGT                                95.333 ║
║ EQ                                 73.333 ║
║ ISZERO                             65.000 ║
║ AND                                67.333 ║
║ OR                                 69.333 ║
║ XOR                                69.333 ║
║ NOT                                61.000 ║
║ BYTE                               77.333 ║
║ SHL                                75.333 ║
║ SHR                                73.333 ║
║ SAR                                91.333 ║
║ SGT                                95.333 ║
║ SHA3                               28.519 ║
║ ADDRESS                            93.781 ║
║ BALANCE                            73.230 ║
║ ORIGIN                           1389.719 ║
║ CALLER                             93.781 ║
║ CALLVALUE                          93.781 ║
║ CALLDATALOAD                       63.333 ║
║ CALLDATASIZE                       94.000 ║
║ CALLDATACOPY                       71.533 ║
║ CODESIZE                           91.000 ║
║ CODECOPY                          145.574 ║
║ GASPRICE                         1392.438 ║
║ EXTCODESIZE                         5.149 ║
║ EXTCODECOPY                         4.667 ║
║ RETURNDATASIZE                     89.000 ║
║ RETURNDATACOPY                     55.889 ║
║ EXTCODEHASH                         9.087 ║
║ BLOCKHASH                         244.700 ║
║ COINBASE                         1392.719 ║
║ TIMESTAMP                        1389.719 ║
║ NUMBER                           1389.719 ║
║ PREVRANDAO                       1389.719 ║
║ GASLIMIT                         1395.719 ║
║ CHAINID                          1389.719 ║
║ SELFBALANCE                       641.312 ║
║ BASEFEE                          1386.719 ║
║ POP                                81.500 ║
║ MLOAD                              80.186 ║
║ MSTORE                             82.069 ║
║ MSTORE8                            89.912 ║
║ SLOAD                              27.212 ║
║ SSTORE                              9.099 ║
║ JUMP                               37.333 ║
║ JUMPI                              32.818 ║
║ PC                                 94.281 ║
║ MSIZE                             103.781 ║
║ GAS                                88.281 ║
║ JUMPDEST                          127.562 ║
║ PUSH0                              91.281 ║
║ PUSH1                              78.854 ║
║ PUSH2                              96.854 ║
║ PUSH4                             158.854 ║
║ PUSH5                             180.854 ║
║ PUSH6                             202.854 ║
║ PUSH7                             224.854 ║
║ PUSH8                             246.854 ║
║ PUSH9                             268.854 ║
║ PUSH10                            290.854 ║
║ PUSH11                            312.854 ║
║ PUSH12                            332.854 ║
║ PUSH13                            356.854 ║
║ PUSH14                            378.854 ║
║ PUSH15                            400.854 ║
║ PUSH16                            422.854 ║
║ PUSH17                            444.854 ║
║ PUSH18                            466.854 ║
║ PUSH19                            488.854 ║
║ PUSH20                            510.854 ║
║ PUSH21                            532.854 ║
║ PUSH22                            554.854 ║
║ PUSH23                            576.854 ║
║ PUSH24                            598.854 ║
║ PUSH25                            620.854 ║
║ PUSH26                            642.854 ║
║ PUSH27                            664.854 ║
║ PUSH28                            686.854 ║
║ PUSH29                            708.854 ║
║ PUSH30                            730.854 ║
║ PUSH31                            752.854 ║
║ PUSH32                            774.854 ║
║ DUP1                               63.000 ║
║ DUP2                               67.000 ║
║ DUP3                               67.000 ║
║ DUP4                               67.000 ║
║ DUP5                               67.000 ║
║ DUP6                               67.000 ║
║ DUP7                               67.000 ║
║ DUP8                               67.000 ║
║ DUP9                               67.000 ║
║ DUP10                              67.000 ║
║ DUP11                              67.000 ║
║ DUP12                              67.000 ║
║ DUP13                              65.000 ║
║ DUP14                              67.000 ║
║ DUP15                              67.000 ║
║ DUP16                              67.000 ║
║ SWAP1                              67.667 ║
║ SWAP2                              67.667 ║
║ SWAP3                              67.667 ║
║ SWAP4                              67.667 ║
║ SWAP5                              67.667 ║
║ SWAP6                              67.667 ║
║ SWAP7                              67.667 ║
║ SWAP8                              67.667 ║
║ SWAP9                              67.667 ║
║ SWAP10                             67.667 ║
║ SWAP11                             67.667 ║
║ SWAP12                             67.667 ║
║ SWAP13                             67.667 ║
║ SWAP14                             67.667 ║
║ SWAP15                             65.667 ║
║ SWAP16                             67.667 ║
║ CALL                               56.371 ║
║ STATICCALL                         53.248 ║
║ DELEGATECALL                       52.286 ║
║ CREATE                              5.046 ║
║ CREATE2                             7.067 ║
║ RETURN                              1.000 ║
║ REVERT                              1.000 ║
╠═╡ Ergs/gas (-%) ╞═╡ EVMInterpreter M3B3 ╞═╣
║ ORIGIN                              0.018 ║
║ GASPRICE                            0.018 ║
║ COINBASE                            0.018 ║
║ TIMESTAMP                           0.018 ║
║ NUMBER                              0.018 ║
║ PREVRANDAO                          0.018 ║
║ GASLIMIT                            0.018 ║
║ CHAINID                             0.018 ║
║ BASEFEE                             0.018 ║
║ CREATE                              0.005 ║
║ CREATE2                             0.005 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞═════╡ EVMInterpreter MzB3 ╞═╗
║ Mean                                0.000 ║
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                                 NaN ║
╠═╡ Cycles (-%) ╞═══╡ EVMInterpreter MzB3 ╞═╣
║ Mean                                0.000 ║
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Ergs (-%) ╞═════╡ EVMInterpreter MzB3 ╞═╣
║ Mean                                0.000 ║
║ Best                                0.004 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞════════╡ Precompiles M3B3 ╞═╗
║ Mean                               -0.214 ║
║ Best                                0.000 ║
║ Worst                              -1.357 ║
║ Total                              -0.611 ║
╠═╡ Cycles (-%) ╞══════╡ Precompiles M3B3 ╞═╣
║ Mean                                0.168 ║
║ Best                                4.039 ║
║ Worst                               0.000 ║
║ Total                               0.281 ║
╠═╡ Ergs (-%) ╞════════╡ Precompiles M3B3 ╞═╣
║ Mean                                0.158 ║
║ Best                                4.039 ║
║ Worst                              -0.120 ║
║ Total                               0.003 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞════════╡ Precompiles MzB3 ╞═╗
║ Mean                                0.710 ║
║ Best                                4.401 ║
║ Worst                               0.000 ║
║ Total                               1.094 ║
╠═╡ Cycles (-%) ╞══════╡ Precompiles MzB3 ╞═╣
║ Mean                                0.111 ║
║ Best                                2.610 ║
║ Worst                               0.000 ║
║ Total                               0.239 ║
╠═╡ Ergs (-%) ╞════════╡ Precompiles MzB3 ╞═╣
║ Mean                                0.111 ║
║ Best                                2.425 ║
║ Worst                               0.000 ║
║ Total                               0.003 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞══════════╡ Real life M3B3 ╞═╗
║ Mean                                0.019 ║
║ Best                                0.246 ║
║ Worst                               0.000 ║
║ Total                               0.030 ║
╠═╡ Cycles (-%) ╞════════╡ Real life M3B3 ╞═╣
║ Mean                                0.034 ║
║ Best                                1.015 ║
║ Worst                               0.000 ║
║ Total                               0.124 ║
╠═╡ Ergs (-%) ╞══════════╡ Real life M3B3 ╞═╣
║ Mean                                0.015 ║
║ Best                                0.341 ║
║ Worst                               0.000 ║
║ Total                               0.036 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞══════════╡ Real life MzB3 ╞═╗
║ Mean                                0.027 ║
║ Best                                0.466 ║
║ Worst                              -0.076 ║
║ Total                               0.042 ║
╠═╡ Cycles (-%) ╞════════╡ Real life MzB3 ╞═╣
║ Mean                                0.035 ║
║ Best                                0.977 ║
║ Worst                               0.000 ║
║ Total                               0.114 ║
╠═╡ Ergs (-%) ╞══════════╡ Real life MzB3 ╞═╣
║ Mean                                0.019 ║
║ Best                                0.494 ║
║ Worst                              -0.007 ║
║ Total                               0.038 ║
╚═══════════════════════════════════════════╝

@TerryGuo
Copy link
Collaborator Author

TerryGuo commented May 7, 2024

Hi @akiramenai , what do you think of the score? They don't look good to me. For example, we got "Worst -19.108" for "Size All M3B3", I guess this means my patch caused a code size bloating. As for Cycles, the mean value is pretty small. Do they match your expectations?

@akiramenai
Copy link
Collaborator

akiramenai commented May 7, 2024

The numbers are expected. Perhaps except for size regressions.
A similar logic for branches expected to give us more.

Copy link
Collaborator

@akiramenai akiramenai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please navigate me to the code which guarantees that if glue contradicts to data dependency a copy of a node is inserted? We need to make sure that testing is sufficient.
You also mentioned that there are cases with suboptimal codegen, could you add them to the regression tests as well?

llvm/lib/Target/EraVM/EraVMISelLowering.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/EraVM/EraVMISelLowering.cpp Outdated Show resolved Hide resolved
llvm/test/CodeGen/EraVM/overflow.ll Outdated Show resolved Hide resolved
@TerryGuo TerryGuo force-pushed the terry_fold_overflow_select_v4 branch 3 times, most recently from 15e141b to 947a805 Compare May 8, 2024 00:37
@TerryGuo TerryGuo force-pushed the terry_fold_overflow_select_v4 branch 2 times, most recently from 04483fc to e601836 Compare May 13, 2024 02:47
Copy link
Collaborator

@akiramenai akiramenai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the approach with a pseudo flag looks ok to me, but the tests imply that it needs to be addressed in some other places as well.
I expect the flag to be pure pseudo (with no exposure to our asm) and carry semantic of irreversibility in Select and Br which is to be documented in appropriate places with examples. Maybe there is a better way to get the same behaviour w.o. introducing new a pseudo flag [@vladimirradosavljevic perhaps you have some ideas on that]

llvm/lib/Target/EraVM/AsmParser/EraVMAsmParser.cpp Outdated Show resolved Hide resolved
@TerryGuo
Copy link
Collaborator Author

Overall the approach with a pseudo flag looks ok to me, but the tests imply that it needs to be addressed in some other places as well. I expect the flag to be pure pseudo (with no exposure to our asm) and carry semantic of irreversibility in Select and Br which is to be documented in appropriate places with examples. Maybe there is a better way to get the same behaviour w.o. introducing new a pseudo flag [@vladimirradosavljevic perhaps you have some ideas on that]

Hi @akiramenai , thanks for the comments, I will spend more time thinking about how to implement a better one w.o. this new pseudo flag. A possible idea is to lower oveflow instrinsic + selection directly to conditional moves.
Hi @vladimirradosavljevic , if you have any thoughts, please share. Thanks in advance.

Copy link
Collaborator

@akiramenai akiramenai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladimirradosavljevic could you also have a look?

llvm/lib/Target/EraVM/EraVMISelLowering.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/EraVM/EraVMInstrFormats.td Show resolved Hide resolved
llvm/lib/Target/EraVM/EraVMInstrInfo.td Outdated Show resolved Hide resolved
Copy link
Contributor

@vladimirradosavljevic vladimirradosavljevic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no support and tests for optimizations where we are manipulating/folding with select instructions (e.g. like in EraVMOptimizeSelectPreRA). Please add check not to do this optimization for .of if we need to reverse condition, and add tests to check these optimization where we can do it and where we can't do it.

As I mentioned in the comment, problem with ties is not only present with SELrrr instruction. Please look at the following test:

define i256 @test(i256 %a, i256 %b, i256 %x, i256 %y) {
entry:
  %res1 = call {i256, i1} @llvm.uadd.with.overflow.i256(i256 %x, i256 %y)
  %overflow = extractvalue {i256, i1} %res1, 1
  %selected = select i1 %overflow, i256 %a, i256 12345
  ret i256 %selected
}

Following assembly is generated with this patch:

test:                                   ; @test
; %bb.0:                                ; %entry
    add!    r3, r4, r2
    add     12345, r0, r1
    ret

In this case, we have:

%5:gr256 = SELrir killed %0:gr256(tied-def 0), i256 12345, i256 8, implicit $flags

And as you can see, first register is tied with the output and later in the expand select we would run into the same problem as with SELrrr (please note in the generated ASM we have 1 instruction after expansion, but that is the problem in ShouldInverse where we don't have inverse condition for .of, so I added a comment to add assert so we can detect these cases).

In order to fix this, we need to redesign how we are tying register in select instruction:

  1. Remove all tie constraints from select instructions from EraVMInstrInfo.td and move tie handling to EraVMTieSelectOperands pass. Create a separate PR for this to check the numbers, since I expect we will get some differences, as now we would only tie killed input registers with output.
  2. Include changes from 1. here, and for all SELr variants add let hasPostISelHook = 1 in EraVMInstrInfo.td. In EraVMTargetLowering::AdjustInstrPostInstrSelection for all SELr variants with .of condition, mark output with IsEarlyClobber flag (call setIsEarlyClobber()) to prevent regalloc from assigning the same physical register for input and output. In EraVMTieSelectOperands, enable removing of IsEarlyClobber for all select instructions, not only for SELcrr and SELrcr.
  3. When this is addressed, move setting of IsEarlyClobber from EraVMBytesToCells.cpp (e91f94b) to EraVMTargetLowering::AdjustInstrPostInstrSelection for all select instructions that have stack and code operands in which register is used (you can look at the positions where register should be used in EraVMBytesToCells::convertStackMachineInstr and EraVMBytesToCells::convertCodeAddressMachineInstr). This is to improve readability and maintainability, because we will have only one place where we are adding IsEarlyClobber for functional reasons.

llvm/lib/Target/EraVM/EraVMInstrInfo.td Show resolved Hide resolved
llvm/lib/Target/EraVM/EraVMInstrInfo.td Show resolved Hide resolved
llvm/lib/Target/EraVM/EraVMInstrInfo.td Outdated Show resolved Hide resolved
llvm/lib/Target/EraVM/EraVMExpandSelect.cpp Show resolved Hide resolved
llvm/lib/Target/EraVM/EraVMTargetMachine.cpp Outdated Show resolved Hide resolved
llvm/test/CodeGen/EraVM/overflow.ll Show resolved Hide resolved
@TerryGuo
Copy link
Collaborator Author

In order to fix this, we need to redesign how we are tying register in select instruction:

  1. Remove all tie constraints from select instructions from EraVMInstrInfo.td and move tie handling to EraVMTieSelectOperands pass. Create a separate PR for this to check the numbers, since I expect we will get some differences, as now we would only tie killed input registers with output.
  2. Include changes from 1. here, and for all SELr variants add let hasPostISelHook = 1 in EraVMInstrInfo.td. In EraVMTargetLowering::AdjustInstrPostInstrSelection for all SELr variants with .of condition, mark output with IsEarlyClobber flag (call setIsEarlyClobber()) to prevent regalloc from assigning the same physical register for input and output. In EraVMTieSelectOperands, enable removing of IsEarlyClobber for all select instructions, not only for SELcrr and SELrcr.
  3. When this is addressed, move setting of IsEarlyClobber from EraVMBytesToCells.cpp (e91f94b) to EraVMTargetLowering::AdjustInstrPostInstrSelection for all select instructions that have stack and code operands in which register is used (you can look at the positions where register should be used in EraVMBytesToCells::convertStackMachineInstr and EraVMBytesToCells::convertCodeAddressMachineInstr). This is to improve readability and maintainability, because we will have only one place where we are adding IsEarlyClobber for functional reasons.

Hi @vladimirradosavljevic , if we take out the COND_OF part, I think your above comments can be treated as a new design of how we tie the operands and set the early-clobber attribute. Therefor I create a separate PR #553 to address them (without the COND_OF), would you please give it a look?

@vladimirradosavljevic
Copy link
Contributor

In order to fix this, we need to redesign how we are tying register in select instruction:

  1. Remove all tie constraints from select instructions from EraVMInstrInfo.td and move tie handling to EraVMTieSelectOperands pass. Create a separate PR for this to check the numbers, since I expect we will get some differences, as now we would only tie killed input registers with output.
  2. Include changes from 1. here, and for all SELr variants add let hasPostISelHook = 1 in EraVMInstrInfo.td. In EraVMTargetLowering::AdjustInstrPostInstrSelection for all SELr variants with .of condition, mark output with IsEarlyClobber flag (call setIsEarlyClobber()) to prevent regalloc from assigning the same physical register for input and output. In EraVMTieSelectOperands, enable removing of IsEarlyClobber for all select instructions, not only for SELcrr and SELrcr.
  3. When this is addressed, move setting of IsEarlyClobber from EraVMBytesToCells.cpp (e91f94b) to EraVMTargetLowering::AdjustInstrPostInstrSelection for all select instructions that have stack and code operands in which register is used (you can look at the positions where register should be used in EraVMBytesToCells::convertStackMachineInstr and EraVMBytesToCells::convertCodeAddressMachineInstr). This is to improve readability and maintainability, because we will have only one place where we are adding IsEarlyClobber for functional reasons.

Hi @vladimirradosavljevic , if we take out the COND_OF part, I think your above comments can be treated as a new design of how we tie the operands and set the early-clobber attribute. Therefor I create a separate PR #553 to address them (without the COND_OF), would you please give it a look?

We are doing redesign to address problem where we don't have reverse condition for .of and we are doing ties on the first register operand. 3rd point is just a refactoring to have only one place where we add earlyclobber flag. I would focus more on the 2nd point and other comments and do 3rd point when everything else is addressed.

@TerryGuo TerryGuo force-pushed the terry_fold_overflow_select_v4 branch from 49d1f70 to 0ea3921 Compare May 21, 2024 11:49
Copy link

github-actions bot commented May 21, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@vladimirradosavljevic
Copy link
Contributor

There is no support and tests for optimizations where we are manipulating/folding with select instructions (e.g. like in EraVMOptimizeSelectPreRA). Please add check not to do this optimization for .of if we need to reverse condition, and add tests to check these optimization where we can do it and where we can't do it.

@TerryGuo Could you please address this?
In optimizations where we are manipulating/folding with select instructions such as EraVMOptimizeSelectPreRA we need to disable them if we need to revert condition for .of. Please take a look at the following example:

define i256 @test1(i256 %a, i256 %b, i256 %x, i256 %y) {
entry:
  %add = add i256 %a, %b
  %res = call {i256, i1} @llvm.uadd.with.overflow.i256(i256 %x, i256 %y)
  %overflow = extractvalue {i256, i1} %res, 1
  %selected = select i1 %overflow, i256 %a, i256 %add
  ret i256 %selected
}

Generated assembly:

test1:                                  ; @test1
; %bb.0:                                ; %entry
	add!	r3, r4, r3
	add	r1, r2, r1
	ret

As you can see, this optimization generated add with COND_NONE since there is no reverse condition for .of. We should disable this and optimizations that can emit reverse condition in case of .of.

@TerryGuo
Copy link
Collaborator Author

I searched the EraVM folder and updated the code to quit when need to inverse the COND_OF. The corresponding tests are added.

@TerryGuo TerryGuo force-pushed the terry_fold_overflow_select_v4 branch from aba8e41 to ab6acbe Compare May 27, 2024 09:49
Copy link
Collaborator

@akiramenai akiramenai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @TerryGuo. It's always a pleasure to read your documentation. I added a few comments, please address them, then I approve.
Also please mention @lialan as a co-author in the commit message.

llvm/test/CodeGen/EraVM/overflow.ll Outdated Show resolved Hide resolved
llvm/test/CodeGen/EraVM/overflow.ll Outdated Show resolved Hide resolved
llvm/lib/Target/EraVM/EraVMISelLowering.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/EraVM/EraVMInstrFormats.td Outdated Show resolved Hide resolved
llvm/lib/Target/EraVM/EraVMInstrInfo.td Outdated Show resolved Hide resolved
llvm/lib/Target/EraVM/EraVMTieSelectOperands.cpp Outdated Show resolved Hide resolved
@TerryGuo TerryGuo force-pushed the terry_fold_overflow_select_v4 branch from ab6acbe to 620915d Compare May 28, 2024 13:46
TerryGuo and others added 2 commits May 28, 2024 21:30
Add tests for folding select with overflow intrinsics where we shouldn't
attempt to inverse overflow LT caused by uaddo and umulo.

Co-authored-by: Alan Li <alan.li@me.com>
PR: #521, Issue: #491.
After folding, the select instructions will use overflow condition code
which is LT actually, this will give us shorter code sequence normally.
Please be noted that GE can only be used as reversal code for overflow
LT caused by usubo, but not for uaddo and umulo.

Co-authored-by: Alan Li <alan.li@me.com>
PR: #521, Issue: #491.
@akiramenai akiramenai force-pushed the terry_fold_overflow_select_v4 branch from 620915d to f32aeb5 Compare May 28, 2024 19:57
@akiramenai akiramenai merged commit 21eaf8c into main May 28, 2024
4 of 5 checks passed
@akiramenai akiramenai deleted the terry_fold_overflow_select_v4 branch May 28, 2024 19:58
akiramenai pushed a commit that referenced this pull request May 28, 2024
Add tests for folding select with overflow intrinsics where we shouldn't
attempt to inverse overflow LT caused by uaddo and umulo.

Co-authored-by: Alan Li <alan.li@me.com>
PR: #521, Issue: #491.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants