
Online statistics implementations, including average, variance and standard deviation; exponentially weighted versions as well.

tupol/online-stats

online-stats

Scope

Naive implementation of a few online statistical algorithms.

It is intended as a tool for stateful streaming computations, where statistics are updated one value at a time without storing the full data set.

Algos covered so far:

  • the first four statistical moments and the features derived from them:
    • average
    • variance and standard deviation
    • skewness
    • kurtosis
  • covariance
  • exponentially weighted moving averages and variance

Algos to be researched:

  • exponentially weighted moving skewness
  • exponentially weighted moving kurtosis

For production applications, a more formal and mature library such as Apache Commons Math is probably a better choice. The results of this library are, however, tested against it.

Description

The main concepts introduced in this library are the Stats, EWeightedStats (exponentially weighted stats), VectorStats and Covariance. Each of them can be composed using either the append or the |+| functions.

For example, if we have a sequence of numbers, we can compute the statistics like this:

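  // Build statistics for each batch by folding from the empty element
  val xs1 = Seq(1.0, 3.0)
  val stats1: Stats = xs1.foldLeft(Stats.Nil)((s, x) => s |+| x)
  val xs2 = Seq(5.0, 7.0)
  val stats2: Stats = xs2.foldLeft(Stats.Nil)((s, x) => s |+| x)
  // Combine the statistics of the two batches
  val totalStats = stats1 |+| stats2
  // Append a single new value to existing statistics
  val newStats = totalStats |+| 4.0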

The Stats type together with the |+| operation forms a monoid: |+| is associative and has an identity (unit) element, Stats.Nil.

The |+| operation is also commutative, which makes it appealing for distributed computing as well, since partial results can be combined in any order.
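To illustrate why such statistics can be merged, here is a minimal, self-contained sketch. It does not use this library: `SimpleStats` is a hypothetical stand-in that tracks only the count, the mean and the sum of squared deviations, and it assumes the population variance and the usual pairwise merge formula. The library's own Stats type may be implemented differently.

```scala
// Hypothetical, simplified stand-in for a mergeable statistics type.
// It is NOT the library's Stats; names and fields are assumptions.
final case class SimpleStats(n: Long, mean: Double, m2: Double) {

  // Population variance; 0.0 for an empty instance, by convention here.
  def variance: Double = if (n == 0) 0.0 else m2 / n

  // Merge two summaries using the pairwise update of the mean
  // and of the sum of squared deviations (m2).
  def |+|(that: SimpleStats): SimpleStats =
    if (n == 0) that
    else if (that.n == 0) this
    else {
      val total = n + that.n
      val delta = that.mean - mean
      SimpleStats(
        total,
        mean + delta * that.n / total,
        m2 + that.m2 + delta * delta * n * that.n / total
      )
    }

  // Appending one value is a merge with a one-element summary.
  def |+|(x: Double): SimpleStats = this |+| SimpleStats(1L, x, 0.0)
}

object SimpleStats {
  // Identity element: merging with it leaves the other side unchanged.
  val Nil: SimpleStats = SimpleStats(0L, 0.0, 0.0)
}
```

With this sketch, folding Seq(1.0, 3.0) and Seq(5.0, 7.0) separately and merging the two results gives the same count, mean and variance as folding all four values in one pass.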

Same goes for VectorStats and Covariance.
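The same merging idea carries over to covariance. The sketch below is again self-contained and hypothetical: `SimpleCovariance` is not the library's Covariance type, and it assumes the population covariance and a pairwise merge of the co-moment.

```scala
// Hypothetical sketch of a mergeable covariance summary.
// It is NOT the library's Covariance; names and fields are assumptions.
final case class SimpleCovariance(n: Long, meanX: Double, meanY: Double, c: Double) {

  // Population covariance; 0.0 for an empty instance, by convention here.
  def covariance: Double = if (n == 0) 0.0 else c / n

  // Merge two summaries: the co-moment c is the sum of
  // (x - meanX) * (y - meanY) over all pairs seen so far.
  def |+|(that: SimpleCovariance): SimpleCovariance =
    if (n == 0) that
    else if (that.n == 0) this
    else {
      val total = n + that.n
      val dx = that.meanX - meanX
      val dy = that.meanY - meanY
      SimpleCovariance(
        total,
        meanX + dx * that.n / total,
        meanY + dy * that.n / total,
        c + that.c + dx * dy * n * that.n / total
      )
    }

  // Appending one (x, y) pair is a merge with a one-element summary.
  def |+|(xy: (Double, Double)): SimpleCovariance =
    this |+| SimpleCovariance(1L, xy._1, xy._2, 0.0)
}

object SimpleCovariance {
  val Nil: SimpleCovariance = SimpleCovariance(0L, 0.0, 0.0, 0.0)
}
```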

EWeightedStats is an exception for now, as two EWeightedStats instances cannot be composed. However, |+| does work between an EWeightedStats instance and a Double.
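As a rough illustration of appending a single value to exponentially weighted statistics, here is a self-contained sketch. `SimpleEwStats` and its `alpha` smoothing factor are hypothetical, and the update rule is one common incremental formulation, assumed here for illustration; the library's EWeightedStats may use a different formula or initialisation.

```scala
// Hypothetical sketch of exponentially weighted mean and variance.
// It is NOT the library's EWeightedStats; alpha (0 < alpha <= 1) is an
// assumed smoothing factor, where larger values favour recent data.
final case class SimpleEwStats(alpha: Double, count: Long, mean: Double, variance: Double) {

  // Append one value. The first value initialises the mean;
  // later values move the mean by alpha times the deviation.
  def |+|(x: Double): SimpleEwStats =
    if (count == 0) copy(count = 1L, mean = x, variance = 0.0)
    else {
      val delta = x - mean
      copy(
        count = count + 1,
        mean = mean + alpha * delta,
        variance = (1 - alpha) * (variance + alpha * delta * delta)
      )
    }
}

object SimpleEwStats {
  // Empty statistics for a given smoothing factor.
  def empty(alpha: Double): SimpleEwStats = SimpleEwStats(alpha, 0L, 0.0, 0.0)
}
```

Because each update weights values by how recently they arrived, the result depends on the order of the values, which is one reason merging two such summaries is less straightforward than merging plain statistics.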

Complexity

| Feature                         | Space Complexity (O) | Time Complexity (O) |
|---------------------------------|----------------------|---------------------|
| Count, Sum, Min, Max            | O(1) (1 * MU)        | O(1)                |
| Average                         | O(1) (2 * MU)        | O(1)                |
| Variance, Standard deviation    | O(1) (3 * MU)        | O(1)                |
| Skewness                        | O(1) (4 * MU)        | O(1)                |
| Kurtosis                        | O(1) (5 * MU)        | O(1)                |
| Exponentially weighted average  | O(1) (2 * MU)        | O(1)                |
| Exponentially weighted variance | O(1) (2 * MU)        | O(1)                |

MU: Memory Unit, e.g. Int: 4 bytes, Double: 8 bytes

Demos and Examples

The streaming-anomalies-demos project was created to explore and demonstrate some basic use cases for the online-stats library.
