Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to make Mythril analyze all functions to their regular end at RETURN/STOP? #1777

Open
gsalzer opened this issue Jun 13, 2023 · 26 comments
Open

Comments

@gsalzer
Copy link
Contributor

gsalzer commented Jun 13, 2023

I try to understand how Mythril/Laser traverses the execution traces, with the aim to let it reach the regular end of as many functions as possible. For testing, I added a module Function in a file mythril/analysis/module/modules/function.py, see below for the code. Its purpose is to report the name of the current function when reaching its regular end (RETURN). I run it as

myth analyze -f ShieldController.hex -m Function

where ShieldController.hex is the contract creation code at https://etherscan.io/address/0x860b3913e248e6ba352120d550567379cb48fdd6#code.

The expected output are the functions listed by solc --optimize --hashes ShieldController.sol for contract ShieldController. However, the call above just reports the constructor and a function transferFrom(address,address,uint256), which does not occur in ShieldController, but seems to be a function in a contract deployed by ShieldController.

Which settings are needed for Mythril/Laser (like max-depth, loop-bound, call-depth-limit, transaction-count and the search strategy) such that it analyzes the main branches of all (or most) functions until their final RETURN? In the intended application, reaching RETURN will trigger checks for that function.


Contents of mythril/analysis/module/modules/function.py:

from mythril.analysis.report import Issue
from mythril.analysis.module.base import DetectionModule

class Function(DetectionModule):
    name = "Show functions visited"
    swc_id = "0"
    description = "Show functions visited"
    pre_hooks = ["RETURN"]

    def _execute(self, state):
        issue = Issue(
            contract=state.environment.active_account.contract_name,
            function_name=state.environment.active_function_name,
            bytecode=state.environment.code.bytecode,
            title="function visited",
            address=0,
            swc_id=Function.swc_id,
            severity="Low",
            description_head="function visited",
            description_tail="",
            gas_used=(state.mstate.min_gas_used, state.mstate.max_gas_used),
        )
        return [issue]

detector = Function()

In mythril/analysis/module/loader.py, I added

from mythril.analysis.module.modules.function import Function
...
    def _register_mythril_modules(self):
        self._modules.extend([ ..., Function()])
@gsalzer
Copy link
Contributor Author

gsalzer commented Jun 13, 2023

Here are the functions that I want Mythril to report:

$ solc --optimize --hashes ShieldController.sol
...
======= ShieldController.sol:ShieldController =======
Function signatures:
38af3eed: beneficiary()
75b4d78c: bonus()
dc070657: changeBeneficiary(address)
c1ff808d: changeBonus(uint256)
1e761681: changeDepositAmt(uint256)
decc1a86: changeRefFee(uint256)
5b1dbd80: claimTokens(address,address,address)
baef5dfd: createShield(string,string,address,address,address,address,uint256[],address[])
10adc3f5: deleteShield(address,uint256)
b5677b42: depositAmt()
c25dd27a: emitAction(address,address,address,address,uint256,uint256,bool)
82e2fadb: getBalances(address,uint256,uint256)
08abe957: getShields()
aa6ca808: getTokens()
0c340a24: governor()
80d85911: initialize(uint256,uint256,uint256)
fc6b3204: isGov()
1c74a301: receiveOwnership()
ec2e0ab3: refFee()
2fce9884: shieldMapping(address)
f2fde38b: transferOwnership(address)
...

@norhh
Copy link
Collaborator

norhh commented Jun 18, 2023

Mythril visits all functions. You can prefer using bfs as a strategy to make it prioritise the breadth over depth. If your goal is to simply visit all functions, you can just stick to --transaction-count 1. You can ignore the rest, as they are made to be large enough for most contracts. loop-bound and call-depth-limit might only be needed to be increased in exceptional circumstances.

@gsalzer
Copy link
Contributor Author

gsalzer commented Jun 18, 2023

@norhh Thanks for your suggestion. However, the option strategy doesn't make a difference. After maybe a minute Mythril stops, reporting just the constructor and transferFrom, but not a single of the 21 functions that really are in the code.

./myth analyze -f ShieldController.hex -m Function --strategy bfs
==== function visited ====
SWC ID: 0
Severity: Low
Contract: MAIN
Function name: constructor
PC address: 0
Estimated Gas Usage: 1521 - 3920
function visited

--------------------

==== function visited ====
SWC ID: 0
Severity: Low
Contract: MAIN
Function name: gasprice_bit_ether(int128) or transferFrom(address,address,uint256)
PC address: 0
Estimated Gas Usage: 2742 - 4155
function visited

--------------------

--transaction-count doesn't change anything.

@gsalzer
Copy link
Contributor Author

gsalzer commented Jun 18, 2023

./myth version
Mythril version v0.23.23

@norhh
Copy link
Collaborator

norhh commented Jun 19, 2023

You should add STOP into your pre_hooks, as most functions end with either STOP or REVERT. The regular end for most functions is STOP rather than RETURN

@gsalzer
Copy link
Contributor Author

gsalzer commented Jun 19, 2023

@norhh Thanks for the hint regarding STOP, for functions not returning a value. Adding STOP to the pre-hooks lets Mythril find seven more functions. But the remaining 14 functions have a return value, and these functions do terminate with RETURN, like simple getters as bonus().

Adding REVERT yields most other functions, but still not bonus(). But this doesn't help, if I want to perform a check at the regular end of a function.

Might it be the case that Mythril stops its analysis when encountering a state similar to ones seen before? Actually, the state at the end of a getter is not really the same as at the end of other getters, as they return different storage values. Is there a way to let Mythril also explore such parts of the search space?

@gsalzer gsalzer changed the title How to make Mythril visit all functions? How to make Mythril analyze all functions to their regular end at RETURN/STOP? Jun 19, 2023
@norhh
Copy link
Collaborator

norhh commented Jun 19, 2023

Screenshot 2023-06-19 at 17 18 33

Simple getters such as bonus() do not have a RETURN or STOP. They just put the corresponding return value at the top of the stack. A RETURN is much more expensive and heavyweight compared to this operation. From the way mythril computes the function name, the RETURN will happen outside the zone of what Mythril considers the code as the getter function. The state.environment.active_function_name will be the library part of the code which it seems to consider as fallback.

@norhh
Copy link
Collaborator

norhh commented Jun 19, 2023

A quick fix for fixing the getter issue will be to solve for the first 4 bytes of calldata to get the function signature. I'll push that fix today sometime.

@gsalzer
Copy link
Contributor Author

gsalzer commented Jun 19, 2023

Simple getters such as bonus() do not have a RETURN or STOP.

Of course they do, how would the EVM otherwise return control to the calling context, with a return value? Below is the sequence of instructions for bonus(); it ends with RETURN.

000eb: PUSH4 0x75b4d78c ; bonus()
000f0: EQ
000f1: PUSH3 0x0002a8
000f5: JUMPI

002a8: JUMPDEST ; bonus()
002a9: CALLVALUE
002aa: DUP1
002ab: ISZERO
002ac: PUSH3 0x0002b5
002b0: JUMPI

002b1: PUSH1 0x00
002b3: DUP1
002b4: REVERT

002b5: JUMPDEST ; bonus() non-payable
002b6: POP
002b7: PUSH3 0x0002c0
002bb: PUSH1 0x34
002bd: SLOAD    ; S[0x34] 0x0002c0
002be: DUP2     ; 0x0002c0 S[0x34] 0x0002c0
002bf: JUMP

002c0: JUMPDEST ; S[0x34] 0x0002c0
002c1: PUSH1 0x40
002c3: MLOAD    ; fmp S[0x34] 0x0002c0
002c4: SWAP1
002c5: DUP2     ; fmp S[0x34] fmp 0x0002c0
002c6: MSTORE   ; fmp 0x0002c0 # M[fmp] = S[0x34]
002c7: PUSH1 0x20
002c9: ADD      ; fmp+0x20 0x0002c0 # M[fmp] = S[0x34]
002ca: PUSH3 0x00017b
002ce: JUMP

0017b: JUMPDEST
0017c: PUSH1 0x40 
0017e: MLOAD    ; fmp fmp+0x20 0x0002c0 # M[fmp] = S[0x34]
0017f: DUP1     ; fmp fmp fmp+0x20 0x0002c0 # M[fmp] = S[0x34]
00180: SWAP2    ; fmp+0x20 fmp fmp 0x0002c0 # M[fmp] = S[0x34]
00181: SUB      ; 0x20 fmp 0x0002c0 # M[fmp] = S[0x34]  
00182: SWAP1    ; fmp 0x20 0x0002c0 # M[fmp] = S[0x34]
00183: RETURN

@norhh
Copy link
Collaborator

norhh commented Jun 19, 2023

The RETURN for such functions happen outside the zone of what Mythril considers as the function bonus(). Such cases are mapped as unknown and Mythril outputs them as fallback.

@norhh
Copy link
Collaborator

norhh commented Jun 19, 2023

The bytecode maps the final RETURN which is generic to contract ShieldController is Governable part of the code which other functions can JUMP to and reuse. That's a reason why, sometimes, source code mapping might look like it's broken when bugs map to such locations.
So the easiest and an accurate solution is solving for the calldata when such information is required.

@gsalzer
Copy link
Contributor Author

gsalzer commented Jun 19, 2023

Thanks for the explanations, I think I now understand.
It would be nice if Mythril provided reliable information on such basic information as the function name. To me, this seems essential. May I add this to the wish list for Mythril?

@norhh
Copy link
Collaborator

norhh commented Jun 19, 2023

Nevermind, Actually Mythril seems to detect getters at the return by matching the JUMPI address. There is some other issue with it.

@gsalzer gsalzer closed this as completed Jun 19, 2023
@norhh norhh reopened this Jun 19, 2023
@norhh
Copy link
Collaborator

norhh commented Jun 19, 2023

Commenting out:

  1. address token = address( new ArmorToken(_oracle, _name, _symbol) );
  2. arTokens.push(token); -> because token is defined in the above commented line.
    will net all the functions with just a single transaction.
    There is some bug with exploration due to the above lines.

@norhh
Copy link
Collaborator

norhh commented Jun 19, 2023

@gsalzer , can you try it with the latest commit?

@gsalzer
Copy link
Contributor Author

gsalzer commented Jun 20, 2023

@norhh With STOP and RETURN as the pre-hooks, ./myth analyze -f ShieldController.hex -m Function --transaction-count 1 (also tried --strategy bfs, same result), returns the function names

constructor
controller() <== not quite clear where this comes from
fallback

beneficiary()
depositAmt()
getBalances(address,uint256,uint256)
getShields()
getTokens()
governor()
initialize(uint256,uint256,uint256)
isGov()
refFee()
shieldMapping(address)

but the following ones are still missing:

75b4d78c: bonus()
dc070657: changeBeneficiary(address)
c1ff808d: changeBonus(uint256)
1e761681: changeDepositAmt(uint256)
decc1a86: changeRefFee(uint256)
5b1dbd80: claimTokens(address,address,address)
baef5dfd: createShield(string,string,address,address,address,address,uint256[],address[])
10adc3f5: deleteShield(address,uint256)
c25dd27a: emitAction(address,address,address,address,uint256,uint256,bool)
1c74a301: receiveOwnership()
f2fde38b: transferOwnership(address)

@norhh
Copy link
Collaborator

norhh commented Jun 20, 2023

You can increase the transaction count to 3 to cover more functions (20 in total), but 4 of the functions aren't covered like bonus()

@gsalzer
Copy link
Contributor Author

gsalzer commented Jun 21, 2023

@norhh I'm puzzled. Doesn't transaction-count specify the number of successive transactions issued to change the state (and trigger e.g. a vulnerability)? If there are more functions reported for a higher transaction count, then this seems to be rather an incidental by-product and not really expected behavior. Moreover, so far I understood laser as a general-purpose engine for symbolic execution, taint analysis etc. Taken together, Mythril should cover all functions within one transaction, otherwise there is a bug.
Or is your point that Mythril actually reaches all the RETURNs and STOPs, but it is just the reporting of the function name that fails? In this case, I'd appreciate it if you could propose an expression that would return, at the RETURN/STOP, the first four bytes of the call data. Hooking the CALLDATA instruction and storing the four bytes for later is probably not such a good idea, as there may be nested calls. Maybe using the path condition and a call to the Z3 solver?

@norhh
Copy link
Collaborator

norhh commented Jun 21, 2023

Let's take a function:

function withdrawMoney() {
    require(withdraw_flag == 1)
    transferEverything(address(msg.sender))
}

It's not possible to hit STOP for such functions in a single transaction, because a previous transaction has to set the withdraw_flag to 1.
Another example would be:

function kill() {
    require(kill_switch == 1)
    selfdestruct(address(msg.sender))
}

Where the transaction only ends with SELFDESTRUCT when the kill_switch is triggered in a previous transaction.

Also, the issue with bonus() and controller() is that they have the same function JUMPDEST, which Mythril seems to get confused about for some reason.

@gsalzer
Copy link
Contributor Author

gsalzer commented Jun 23, 2023

Thanks for the explanation. It is definitely a valid argument in the case of source code/bytecode where Mythril may take the state of the contract, after initialization, into account.

But in the case of --bin-runtime, the withdraw_flag and the kill_switch should/must be handled symbolically (there is no state), so Mythril should reach the RETURN/STOP instruction while exploring the first transaction. However, the result is still the same as if analyzing creation bytecode, with the majority of functions missing.

@norhh
Copy link
Collaborator

norhh commented Jun 23, 2023

You'll have to use --unconstrained-storage for it to happen in such a way.

@gsalzer
Copy link
Contributor Author

gsalzer commented Jun 23, 2023

Thanks regarding --unconstrained-storage. Now Mythril identifies almost all functions. Functions still missing:

75b4d78c: bonus()
baef5dfd: createShield(string,string,address,address,address,address,uint256[],address[])
c25dd27a: emitAction(address,address,address,address,uint256,uint256,bool)

Functions reported even though they are not there:

0x00000001 account_info_rotate_tine(uint256)
0xf77c4791 controller()
fallback

Do you have an explanation why Mythril reports 0x00000001 account_info_rotate_tine(uint256), a function that isn't anywhere in the contract? controller() occurs at least in a contract deployed by the main contract, but 0x00000001 appears nowhere at all.

@norhh
Copy link
Collaborator

norhh commented Jun 23, 2023

Yes, there was the issue of bonus() being confused with controller() due to issue of them having same jumpdest.
There should be some other similar bug for the rest.

@aj3423
Copy link

aj3423 commented Jul 13, 2023

So the easiest and an accurate solution is solving for the calldata when such information is required.

How to add a constraint to the calldata? For example constrain the 4-bytes-function-signature == 0x12345678 , I tried something like this but it doesn't work.

constraints.append(
    tx.call_data[0:4] == symbol_factory.BitVecVal(0xa9059cbb, 32)
)

@norhh
Copy link
Collaborator

norhh commented Jul 13, 2023

Hi @aj3423, this is an example of how it is done:

constraint = Or(
constraint, calldata[i] == symbol_factory.BitVecVal(func_hash[i], 8)
)

@aj3423
Copy link

aj3423 commented Jul 13, 2023

@norhh Thanks, this is really helpful. Btw, maybe this loop can be optimised, currently it's:

for i in range(FUNCTION_HASH_BYTE_LENGTH):
constraint = Bool(False)
for func_hash in func_hashes:
if func_hash == -1:
# Fallback function without calldata
constraint = Or(constraint, calldata.size < 4)
elif func_hash == -2:
# Receive function
constraint = Or(constraint, calldata.size == 0)
else:
constraint = Or(
constraint, calldata[i] == symbol_factory.BitVecVal(func_hash[i], 8)
)
constraints.append(constraint)

It always generates 4 constraints even with a transaction sequence like [[-1]]:

[Or(False, 4 > 2_calldatasize), Or(False, 4 > 2_calldatasize), Or(False, 4 > 2_calldatasize), Or(False, 4 > 2_calldatasize)]

I think it should be:

    constraint = Bool(False)
    for func_hash in func_hashes:
        if func_hash == -1:
            # Fallback function without calldata
            constraint = Or(constraint, calldata.size < 4)
        elif func_hash == -2:
            # Receive function
            constraint = Or(constraint, calldata.size == 0)
        else:
            for i in range(FUNCTION_HASH_BYTE_LENGTH):
                constraint = Or(
                    constraint, calldata[i] == symbol_factory.BitVecVal(func_hash[i], 8)
                )
    constraints.append(constraint)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants