Universal rank-order method

Highly vectorized Python implementation of a universal rank-order method to analyze noisy data. The method is valuable for

  • nonlinear regression with heavy-tailed noise, and for
  • distinguishing between deterministic chaos and stochastic noise.

The method is briefly outlined in the section nonlinear regression and detailed in the publication

G. Ierley and A. Kostinski. Universal rank-order transform to extract signals from noisy data. Phys. Rev. X 9, 031039 (2019) (arXiv version)

Nonlinear regression

The aim of nonlinear regression is to extract parameters from data through a mathematical model. Concretely, the regression using the rank-order method requires

  • data points (xk, yik), with ns sampling locations xk and nr repeated observations yik at each location, and
  • a model function f(x, ξ) with adjustable parameters ξ = (ξ1, ..., ξnp).

These ingredients are illustrated in Fig. 1 for certain values of the parameters ξ. From the residuals rik = yik − f(xk, ξ), the method estimates the remaining signal content with a universal transform. The optimal parameters ξ* are then found by numerically minimizing the signal content in the residuals.

Figure 1: Ingredients for nonlinear regression
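
As a minimal sketch of these ingredients (the array shapes, the exponential model, and all names are illustrative assumptions, not part of the package):

import numpy as np

rng = np.random.default_rng(seed=0)

# model function f(x, ξ) with adjustable parameters ξ = (amplitude, rate)
def f(x, amplitude, rate):
    return amplitude * np.exp(-rate * x)

# ns = 40 sampling locations and nr = 25 observations per location,
# simulated with heavy-tailed (Cauchy) noise
x = np.linspace(0.0, 5.0, 40)
y = f(x, 2.0, 0.8) + 0.1 * rng.standard_cauchy(size=(25, 40))

# residual matrix r[i, k] = y[i, k] - f(x[k], ξ)
def residuals(xi):
    return y - f(x, *xi)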

Universal transform

The universal transform consists of the following steps as exemplified in Fig. 2.

  1. The transform starts with the residual matrix rik. The residuals are ranked in increasing order within each row separately (blue shading), and the ranks form the entries of the rank matrix Rik. In other words, the element rik has rank Rik in the ith row of the residual matrix.

  2. From the rank matrix Rik, the occurrences of each rank value are counted for each column separately (orange shading) and summarized in the population matrix Prk. That is, the rank r appears Prk times in the kth column of the rank matrix.

  3. The population matrix Prk is partitioned once in the vertical and once in the horizontal direction, between the jth and (j+1)th row and the kth and (k+1)th column, respectively (green cross). From the partition the element

     Qjk = ⟨P⟩ul + ⟨P⟩lr − ⟨P⟩ur − ⟨P⟩ll

     follows, where the angle brackets ⟨·⟩ denote the mean over the highlighted matrix elements in the four quadrants created by the partition (ul: upper left, lr: lower right, ur: upper right, ll: lower left).
  4. Collecting Qjk for all possible partitions yields the Q matrix. Finally, the signal content in the residuals is measured by the root mean square (rms) of the elements Qjk:

     Qrms = √⟨Qjk²⟩
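
As a concrete illustration, the following is a minimal numpy sketch of the transform (a direct implementation for clarity with 0-based indices; the package itself relies on the faster algorithm referenced in the implementation section):

import numpy as np

def q_rms(residuals):
    """Signal content of an (nr, ns) residual matrix; illustrative sketch."""
    nr, ns = residuals.shape

    # step 1: rank the residuals within each row (ordinal ranking, 0-based)
    ranks = np.argsort(np.argsort(residuals, axis=1, kind='stable'), axis=1)

    # step 2: population matrix P[r, k] = occurrences of rank r in column k
    pop = np.zeros((ns, ns))
    cols = np.broadcast_to(np.arange(ns), (nr, ns))
    np.add.at(pop, (ranks, cols), 1)

    # steps 3 and 4: quadrant means for every partition via 2d cumulative sums
    s = np.cumsum(np.cumsum(pop, axis=0), axis=1)
    j = np.arange(1, ns)[:, None]  # partition after the j-th row
    k = np.arange(1, ns)[None, :]  # partition after the k-th column
    ul = s[:-1, :-1]               # sums over the upper-left quadrants
    ur = s[:-1, -1:] - ul
    ll = s[-1:, :-1] - ul
    lr = s[-1, -1] - ul - ur - ll
    q = (ul / (j * k) + lr / ((ns - j) * (ns - k))
         - ur / (j * (ns - k)) - ll / ((ns - j) * k))

    return np.sqrt(np.mean(q ** 2))

The cumulative sums provide the quadrant sums for all partitions at once, so the cost stays at O(ns²) instead of O(ns⁴) for a naive loop over partitions.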

The ranking makes the method robust to outliers, as ranks are insensitive to the precise residual values. Thus the universal regression method is valuable for data with heavy-tailed noise, where it outperforms the least-squares method.

Figure 2: Steps of the universal transform

Implementation

As every regression analysis requires many evaluations of the universal transform, the speed of the transform is crucial. It is improved by using vectorized numpy functions and an algorithm with better time complexity, as presented in the publication

D. Kestner, G. Ierley and A. Kostinski. A fast algorithm for computing a matrix transform used to detect trends in noisy data. Comput. Phys. Commun. 254, 107382 (2020) (arXiv version)

Installation

The package can be installed and updated via the package manager pip with either of the following commands.

# install package
pip install git+https://github.com/samuehae/rankorder.git

# install package with additional dependencies for running examples
pip install git+https://github.com/samuehae/rankorder.git#egg=rankorder[examples]

The usage of the package is demonstrated with the Python scripts included in the folder examples.

The authors of the publication Comput. Phys. Commun. 254, 107382 provide implementations for Julia and Matlab.

Practical challenges and solutions

This section discusses practical challenges of the universal regression method and suggests potential solutions. Most challenges are related to the discrete nature of the method.

Additive offsets

The universal method ignores additive offsets in the data, since the ranking step discards any information about the actual values. As a consequence, the model function can and should neglect additive offsets (see examples/hyperbola_fit.py).

If desired, additive offsets can be retrieved from the residuals after fitting, for example as the mean ⟨rik⟩. This choice follows from the unweighted least-squares method with a constant model function f(x, ξ) = ξ.
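
As a one-line sketch (r_opt is a hypothetical name for the residual matrix at the optimal parameters ξ*):

import numpy as np

# additive offset as the mean residual; nanmean also ignores
# observations masked with numpy.nan (see missing data below)
offset = np.nanmean(r_opt)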

Missing data

In practice, data sets might have missing data points, which should be excluded from the regression analysis. Such points (xk, yik) can simply be masked by setting the value yik to numpy.nan (see examples/hyperbola_fit.py).
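
For example (indices arbitrary, observation array y as in the sketch above):

import numpy as np

y = y.astype(float)  # nan requires floating-point values
y[2, 7] = np.nan     # exclude the observation at repetition 2, location 7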

Numerical minimization

The rough structure of the cost function Qrms versus the parameters ξ may trap the optimization routine in local minima (see examples/biexponential_fit.py). This challenge may be addressed with the following suggestions.

  • Try derivative-free minimization routines, for example Nelder-Mead (scipy.optimize.minimize) or basin-hopping (scipy.optimize.basinhopping); see the sketch after this list.

  • Tune the arguments of the minimization routines, such as the initial guess and the tolerances.

  • Plot the cost function Qrms versus the parameters ξ or a subset thereof. This may help to shift the initial guess closer to the global minimum.

  • Reduce the number of free parameters np if possible, for example by fixing parameters through separate measurements or by using a simpler model function.

  • To smooth the cost function, increase the number of repetitions nr by collecting or simulating observations. To simulate observations, copy the measured observations and add weak normal noise. An appropriate choice for the standard deviation is a fraction of the uncertainty inherent to the data.
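
As a sketch of the first suggestion, reusing the hypothetical q_rms and residuals functions from the sketches above:

from scipy.optimize import minimize

# derivative-free Nelder-Mead minimization of the signal content
result = minimize(lambda xi: q_rms(residuals(xi)),
                  x0=[1.5, 0.5], method='Nelder-Mead')
xi_opt = result.x  # estimated parameters (amplitude, rate)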

Random noise

Data with less random noise exhibits a sharper minimum in the root mean square Qrms versus the fit parameters ξ around their optimal values ξ* (see examples/hyperbola_fit.py).

Accordingly, it is challenging to find the minimum for data with a small noise level. Pre-whitening the data can thus help to find the global minimum. A suitable choice is to add normal noise with a standard deviation that is a fraction of the uncertainty inherent to the data. The normal noise can be created with the function numpy.random.normal.
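
For example (sigma, the uncertainty inherent to the data, and the fraction 0.1 are assumptions of this sketch):

import numpy as np

# pre-whiten: add weak normal noise to the observations before fitting
y_white = y + np.random.normal(scale=0.1 * sigma, size=y.shape)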

Ties in ranking

During the ranking step, the same residual value may appear several times, especially for data with little noise, where the residuals can become close to zero. Such ties can be treated in different ways (see the function rankorder.fitting.q_rms_fit):

  • Ordinal ranking breaks ties according to the location of the elements in the residual matrix. This method might introduce biases, which can be mitigated as follows.

  • Pre-whitening the data with random noise avoids ties in the first place and randomizes any remaining ties during ordinal ranking. Pre-whitening is detailed in the section random noise.

  • Random ranking resolves ties by randomly shuffling the order of the tied elements. A side effect is that the minimization needs to cope with a fluctuating cost function (root mean square Qrms).
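
The following sketch contrasts the two tie-breaking strategies with scipy.stats.rankdata (the array values are an arbitrary example; how the package resolves ties internally may differ):

import numpy as np
from scipy.stats import rankdata

r = np.array([0.3, 0.1, 0.3, 0.2])

# ordinal ranking: ties broken by position in the array
rankdata(r, method='ordinal')  # array([3., 1., 4., 2.])

# random ranking: rank a random permutation and undo the shuffling,
# so ties are broken randomly instead of by position
rng = np.random.default_rng()
perm = rng.permutation(r.size)
ranks = np.empty(r.size, dtype=int)
ranks[perm] = rankdata(r[perm], method='ordinal')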