MICO: Mutual Information and Conic Optimization for feature selection

MICO is a Python package that implements a feature selection method based on conic optimization and mutual information (MI) measures [1]. The approach measures the relevance and redundancy of features using MI, and then formulates feature selection as a pure-binary quadratic optimization problem, which is solved heuristically by an efficient randomization algorithm via semidefinite programming [2]. The optimization software Colin [6] is used to solve the underlying conic optimization problems.
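The randomization step can be illustrated with a minimal sketch of random-hyperplane rounding in the style of [2]. This is not MICO's implementation: the toy problem uses ±1 variables rather than the formulation in [1], the semidefinite relaxation is not actually solved, and the function name `randomized_rounding` and the hand-built matrix `X` are hypothetical stand-ins.

```python
import numpy as np


def randomized_rounding(Q, X, n_samples=100, seed=0):
    """Round a PSD matrix X (unit diagonal) to a vector in {-1, +1}^n.

    Q is the symmetric matrix of the objective max x^T Q x. X stands in
    for the solution of the SDP relaxation, which is NOT solved here.
    """
    rng = np.random.default_rng(seed)
    # Factor X = V V^T via an eigendecomposition; clip tiny negative eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(X)
    V = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    best_x, best_val = None, -np.inf
    for _ in range(n_samples):
        # A random hyperplane through the origin splits the rows of V in two.
        r = rng.standard_normal(V.shape[1])
        x = np.where(V @ r >= 0, 1, -1)
        val = float(x @ Q @ x)
        # Keep the best rounded candidate seen so far.
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Toy objective: variables 0 and 1 prefer equal signs, variable 2 the opposite.
Q = np.array([[0.0, 1.0, -1.0],
              [1.0, 0.0, -1.0],
              [-1.0, -1.0, 0.0]])
# Assumed stand-in for an SDP solution: rank-one PSD matrix with unit diagonal.
v = np.array([1.0, 1.0, -1.0])
X = np.outer(v, v)
x, val = randomized_rounding(Q, X)
```

Sampling several hyperplanes and keeping the best candidate is what makes the method a heuristic: each draw is cheap, and the quality of the result depends on how informative the relaxed solution is.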

This package

  • implements three methods for feature selection:
    • MICO : Conic Optimization approach
    • MIFS : Forward Selection approach
    • MIBS : Backward Selection approach
  • supports three different MI measures:
    • JMI : Joint Mutual Information [3]
    • JMIM : Joint Mutual Information Maximisation [4]
    • MRMR : Max-Relevance Min-Redundancy [5]
  • generates feature importance scores for all selected features.
  • provides scikit-learn compatible APIs.
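To make the relevance and redundancy idea concrete, here is a minimal sketch of the MRMR criterion from [5] for discrete data: a candidate's MI with the target minus its mean MI with the features already selected. The helper names `discrete_mi` and `mrmr_score` are hypothetical, and the plug-in MI estimate used here is an assumption for illustration; the package may estimate MI differently.

```python
import numpy as np


def discrete_mi(a, b):
    """Plug-in mutual information (in nats) between two discrete 1-D arrays."""
    a_vals, a_idx = np.unique(np.asarray(a).ravel(), return_inverse=True)
    b_vals, b_idx = np.unique(np.asarray(b).ravel(), return_inverse=True)
    # Build the joint distribution from co-occurrence counts.
    joint = np.zeros((a_vals.size, b_vals.size))
    np.add.at(joint, (a_idx, b_idx), 1.0)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # skip empty cells to avoid log(0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))


def mrmr_score(X, y, candidate, selected):
    """Relevance of `candidate` to y minus its mean redundancy with `selected`."""
    relevance = discrete_mi(X[:, candidate], y)
    if not selected:
        return relevance
    redundancy = np.mean([discrete_mi(X[:, candidate], X[:, s]) for s in selected])
    return relevance - float(redundancy)


# Toy data: features 0 and 1 both copy y; feature 2 is independent of y.
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])
X = np.column_stack([y, y, np.array([0, 1, 0, 1, 0, 1, 0, 1])])

first = mrmr_score(X, y, candidate=0, selected=[])       # relevance only
duplicate = mrmr_score(X, y, candidate=1, selected=[0])  # fully redundant
```

In this toy data the duplicated feature is as relevant as the first one, but once the first is selected its redundancy cancels its relevance, which is the behaviour the criterion is designed to produce.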

Installation

  1. Download the Colin distribution from http://www.colinopt.org/downloads.php and unpack it into a directory of your choice (<CLNHOME>). Then install the Colin package:
cd <CLNHOME>/python
pip install -r requirements.txt
python setup.py install
  2. To install the MICO package from source, use:
pip install -r requirements.txt
python setup.py install

or

pip install colin-mico

To install the development version, you may use:

pip install --upgrade git+https://github.com/jupiters1117/mico

Usage

This package provides scikit-learn compatible APIs:

  • fit(X, y)
  • transform(X)
  • fit_transform(X, y)

Examples

The following example illustrates the use of the package:

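import pandas as pd
from sklearn.datasets import load_breast_cancer
# Assumption: the class is exported from the top-level `mico` package.
from mico import MutualInformationConicOptimization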

# Prepare data.
data = load_breast_cancer()
y = data.target
X = pd.DataFrame(data.data, columns=data.feature_names)

# Perform feature selection.
mico = MutualInformationConicOptimization(verbose=1, categorical=True)
mico.fit(X, y)

# Print the selected features.
print("Selected features: {}".format(mico.get_support()))

# Print the feature importance scores.
print("Feature importance scores: {}".format(mico.feature_importances_))

# Call transform() on X.
X_transformed = mico.transform(X)

Documentation

User guide, examples, and API are available here.

References

[1] T Naghibi, S Hoffmann, and B Pfister, "A semidefinite programming based search strategy for feature selection with mutual information measure", IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8), pp. 1529-1541, 2015. [Pre-print]
[2] M Goemans and D Williamson, "Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming", Journal of the ACM, 42(6), pp. 1115-1145, 1995. [Pre-print]
[3] H Yang and J Moody, "Data Visualization and Feature Selection: New Algorithms for Nongaussian Data", NIPS 1999. [Pre-print]
[4] M Bennasar, Y Hicks, and R Setchi, "Feature selection using Joint Mutual Information Maximisation", Expert Systems with Applications, 42(22), pp. 8520-8532, 2015. [Pre-print]
[5] H Peng, F Long, and C Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy", IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), pp. 1226-1238, 2005. [Pre-print]
[6] Colin: Conic-form Linear Optimizer (www.colinopt.org).

Credits

  • KuoLing Huang, 2019-present

Licensing

MICO is 3-clause BSD licensed.

Note

MICO is heavily inspired by MIFS: Parallelized Mutual Information based Feature Selection module by Daniel Homola.