Skip to content

Tree-of-AST: Python Code Analysis framework inspired by ToT (Tree-of-Thoughts) of Deliberate Code analysing states using ToT-based ways.

Notifications You must be signed in to change notification settings

retr0reg/Tree-of-AST

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tree-of-AST

Tree-of-AST is a static-pseudocode analysing approach inspired by Tree of Thoughts: Deliberate Problem Solving with Large Language Models that can be used in both security-analysis and jobs debugging tracing backs, However, in Tree-of-AST we are creating a Tree-of-AST embed with both code generated by AST tools and LMs's thought. With a basic input point, we can use evaluation on both tree node probability and creating a sub-tree of LM's response

Chain-Of-thought && backgrounds

Inspired by Tree of Thoughts: Deliberate Problem Solving with Large Language Models, A solution combining the tree of Aspartate transferase and the tree of Chain-Of-thought. However, we are not using chain-of-thought but creating a Chain-of-attack in favor of construction of ToTs, including heuristically evaluating states;

AUTOGDB

ToT

Construction of ToT can be summarized by four questions mentioned in the paper:

  1. How to decompose the intermediate process into thought steps;
  2. How to generate potential thoughts from each state;
  3. How to heuristically evaluate states;
  4. What search algorithm to use.

Focused on the job of analyzing via AST, in this case specifically analyzing possible vulnerabilities, we can conclude that:

  1. Information will given ( Potential Vulnerability: Tools like bandit, Tree construction: AST )
  2. To generate possible K, choosing from Sampling, and Proposing, we will use Proposing since Proposing works better when the thought space is more constrained (e.g. each thought is just a word or a line), so proposing different thoughts in the same context avoids duplication.
  3. Choosing from Vote and Value or using together, we choose COMBINING TWO BOTH. Since vulnerability analyzing jobs require both parallel judgment and future possible prediction, the final call will be considered with weight of 8:2 in 10.
  4. Depth-first search (DFS) since we are actually conducting analysis on an existing possible vulnerability.

The whole workflow should be like:

1. Static vulnerability existence detection (Bandit for instance)
2. AST Tree building from and analyzing (Finding xreferences and input)
3. Eval Tree nodes reversely using `Vote` and `Value` of external LMs
    * If node score > threshold, mark the current node and restart on previous doubtful node using BHF.
    * If no node score > threshold Until the final node is reached (t > T), mark as none.

4. Finally, export the marked branch, in further discussions, we will call it "Tree-of-AST"

Exploitations

For exploitations, we will use another approach based on ToT. To begin with, we will start on from tail-to-Head travaling the Tree-of-AST in order to proceed inverse operation of the Vuln-input. For each node we met in the original Tree-of-AST, we will call it a layer; For each layer we met, we will use LMs ( no-shot or few shot ) to generate a part of exploit, these exploits will be embed in this layer of Tree-of-AST.

Same as ToT goes, we will generate multiple nodes of exploits in a layer, however, we will only evaluate them in vote, value with lookahead simulations in order to avoid issues such as unsure type, after this round of travel. Thus the weight should be 2:8 now

In the exploitation process, we might face the possibility that we can insert arbitrary since in this layer this part have not encountered any limitation, in this case, we will leave this part with <arbitrary> tag to avoid future asserts since we are traveling from Tail-To-Head.

Additionally, if <arbitrary> encounted unspecificed assert, such as len(), we can modify it by adding a limitation property <arbitrary limit='length < 7'>.

Or we can re-travel the tree from head-to-tail, using ToT to reconstruct arbitrary

import os

class UserInput:
   def __init__(self,input) -> None:
       self.input = input
       
   def __setattr__(self, name, value):
       print(f'[!] {name} now is {value}')
       super().__setattr__(name, value)

def vuln(input=input('Your address: ')):
   # ANALYSIS-TOP: [<arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>]

   user_input = UserInput(input=input).input

   # ANALYSIS: tytytyty<arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>

   if 'tytytyty' not in user_input[:7]:
       exit(0)
   else:
       user_input = user_input[7:]

   # ANALYSIS: <arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>
   user_input = user_input.split(',')

   # ANALYSIS UNSURE TYPE, lookahead simulations
   # ANALYSIS: [<arbitrary>, <arbitrary>, <arbitrary>retr0reglove<COMMAND>]
   user_input = user_input[3]
   
   # ANALYSIS: <arbitrary>retr0reglove<COMMAND>,
   #           retr0reg<arbitrary>love<COMMAND>
   if not 'retr0reg' in user_input:
       exit()

   # ANALYSIS-TAIL: <arbitrary>love<COMMAND>
   user_input = user_input.split('love')[1] 
   
   return os.system(user_input)

vuln()

Thought To Reality

Setting up AST nods

Think might be easier than doing, but in this case both is hard. To turn our ideal model in to reality acutally took lots of efforts. To begin with, we started on buiding a source-to-sink alike framework using Python AST with reverse-tracking. However, the AST do not support direct reverse-traveling, thus we will have to assign each node with a fwd, bck and id similar to glibc bidirectional linked list in dynamic memory allocation; while adding parents and children to remain the tree property of our AST.

class ToAfy(ast.NodeVisitor):
    
    """ Adding father and children attribution to every node so we can reverse travel the AST """
    
    def __init__(self, target_function=None, target_variable=None):
        self.parent = None
        self.last_id = 0
        self.last_node = None
        self.target_function = target_function
        self.target_variable = target_variable
        self.vulnerable_note = None

        
    def generic_visit(self, node):
        node.node_id = self.last_id
        self.last_id += 1

        # print(f'setting {node.id}')
        
        if self.last_node is not None:
            self.last_node.fwd = node
            node.bck = self.last_node
        else:
            node.bck = None
            
        node.fwd = None
        self.last_node = node
        # if node.bck and self.last_node:
        #     print(f'{node.id}: last_node\'s fwd {self.last_node.id}, note\'s bck {node.bck.id}')
        
        if not hasattr(node, 'parent'):
            node.parent = self.parent
        if not hasattr(node, 'children'):
            node.children = []

Additionally, we will have to locate the node that's representing our vulnerable code segment. this usually takes other round of positive directional traveling, nevertheless, if we sets the fwd, bck, id while try locating that vulnerability sink, we can manage to use only once of positive directional traveling of the AST.

    if isinstance(node, ast.Call):
            full_function_name = get_full_function_name(node.func)
            if full_function_name == self.target_function:
                for arg in node.args:
                    if isinstance(arg, ast.Name) and arg.id == self.target_variable:
                        # VULN
                        self.vulnerable_note = node

Finally wrap up setting parent nodes.

        self.parent = node
        current_last_node = self.last_node
        for child in ast.iter_child_nodes(node):
            node.children.append(child)
            self.visit(child)
        self.last_node = current_last_node 
        ## RESTORING self.last_node

About

Tree-of-AST: Python Code Analysis framework inspired by ToT (Tree-of-Thoughts) of Deliberate Code analysing states using ToT-based ways.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published