tofti/python-id3-trees

python-trees

A Python implementation of ID3 classification trees. ID3 is a machine learning algorithm for building classification trees, developed by Ross Quinlan around 1986.

ID3 is a greedy, recursive algorithm that partitions a data set on the attribute that maximizes information gain. The information gain of an attribute A is the difference between the entropy of a data set S and the size-weighted average entropy of the subsets S' of S produced by splitting on A.
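The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than the code in id3.py: the function names (`entropy`, `information_gain`, `id3`), the nested-dict tree representation, and the toy rows are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(rows, target):
    """Shannon entropy (in bits) of the target attribute over rows."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of rows minus the size-weighted entropy of its partitions."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row)
    weighted = sum(
        len(part) / len(rows) * entropy(part, target)
        for part in partitions.values()
    )
    return entropy(rows, target) - weighted


def id3(rows, attributes, target):
    """Greedy recursive build: a leaf label, or {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    # Stop when the subset is pure or no attributes remain (majority label).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree


# Hypothetical toy data, not taken from the repository's resources.
rows = [
    {"Windy": "no", "Humid": "no", "Play": "yes"},
    {"Windy": "no", "Humid": "yes", "Play": "yes"},
    {"Windy": "yes", "Humid": "no", "Play": "no"},
    {"Windy": "yes", "Humid": "yes", "Play": "no"},
]
tree = id3(rows, ["Windy", "Humid"], "Play")
```

In this toy set, splitting on `Windy` separates the labels perfectly while `Humid` tells us nothing, so the greedy step picks `Windy` at the root.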

This implementation was informed by Dr. Lutz Hamel's notes here. A widely cited text on decision trees is Machine Learning, by Tom Mitchell; you can find pages relevant to ID3 here.

There are also some readable notes on information gain from the University of Washington here.

Running the code

Run the code with the python interpreter:

python id3.py ./resources/<config.cfg>

Here config.cfg is a plain text configuration file. Its content is a Python literal (abstract syntax tree) representing a dict with the following fields:

{
    'data_file' : '\\resources\\tennis.csv',
    'data_project_columns' : ['Outlook', 'Temperature', 'Humidity', 'Windy', 'PlayTennis'],
    'target_attribute' : 'PlayTennis'
}

You have to specify:

  • the relative path to the CSV data_file
  • which columns to project from the file (useful if you have a large input file and are only interested in a subset of columns)
  • the target attribute that you want to predict
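As a rough sketch of how such a file could be read, the snippet below parses the dict literal with `ast.literal_eval` and projects the requested columns with `csv.DictReader`. Whether id3.py does it this way is an assumption; `load_config`, `project_columns`, and the in-memory sample CSV are hypothetical names and data used only for illustration.

```python
import ast
import csv
import io


def load_config(text):
    """Parse config text as a Python dict literal without executing code."""
    config = ast.literal_eval(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a dict literal")
    for key in ("data_file", "data_project_columns", "target_attribute"):
        if key not in config:
            raise ValueError("missing config field: " + key)
    return config


def project_columns(csv_file, columns):
    """Read CSV rows from a file object, keeping only the requested columns."""
    reader = csv.DictReader(csv_file)
    return [{c: row[c] for c in columns} for row in reader]


config_text = """{ 'data_file' : 'example.csv',
  'data_project_columns' : ['Outlook', 'PlayTennis'],
  'target_attribute' : 'PlayTennis' }"""
config = load_config(config_text)

# Hypothetical in-memory CSV standing in for the real data file.
sample = io.StringIO("Day,Outlook,PlayTennis\n1,Sunny,No\n2,Overcast,Yes\n")
projected = project_columns(sample, config["data_project_columns"])
```

`ast.literal_eval` accepts only literals (dicts, lists, strings, numbers and so on), so a config file cannot run arbitrary code the way `eval` would allow.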

Docker

FROM python:3.6.8-alpine

WORKDIR /usr/src/app
RUN apk add --no-cache git && git clone https://github.com/tofti/python-id3-trees.git

WORKDIR /usr/src/app/python-id3-trees

ENTRYPOINT [ "python", "id3.py" ]

To run the built-in examples, build an image from the Dockerfile above, tag it tofti-id3-trees, and then run:

docker run tofti-id3-trees ./resources/tennis.cfg

Or run your own example after creating a config file and a CSV data file:

docker run -v "<localpath>:/<dockerpath>" tofti-id3-trees /<dockerpath>/config.cfg

For example:

docker run -v "/c/Users/tofti/dvol/id3:/data" tofti-id3-trees /data/credithistory_test.cfg

Examples

  1. tennis.cfg is the 'Play Tennis' example from Machine Learning, by Tom Mitchell, also used by Dr. Lutz Hamel in his lecture notes, both referenced above.
  2. credithistory.cfg is the credit risk assessment example from Artificial Intelligence: Structures and Strategies for Complex Problem Solving (6th Edition), Luger; see Table 10.1 and Figure 10.14 (full text was available online as of 11/19/2017).

Results

[Image: results]

TODO

  • Add code to classify data.
  • Add code to prune rules (C4.5 modifications)
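For the first TODO item, classification could look roughly like the sketch below. It assumes a tree stored as nested dicts of the form `{attribute: {value: subtree_or_label}}`; that representation, the `classify` function, and the example tree are hypothetical and do not describe the structure id3.py builds.

```python
def classify(tree, record, default=None):
    """Follow attribute tests from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))   # the single attribute tested at this node
        branches = tree[attribute]
        value = record.get(attribute)
        if value not in branches:
            return default             # attribute value never seen in training
        tree = branches[value]
    return tree


# Hypothetical tree, written by hand for illustration only.
example_tree = {
    "Outlook": {
        "Sunny": {"Windy": {"yes": "no", "no": "yes"}},
        "Overcast": "yes",
    }
}

label = classify(example_tree, {"Outlook": "Sunny", "Windy": "yes"})
```

Returning a caller-supplied `default` for unseen attribute values is one design choice among several; falling back to the majority label at the current node is another common option.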
