Global alignment kernel returns NaN for all timeseries #510

Open
NAThompson opened this issue Feb 28, 2024 · 1 comment

Comments

@NAThompson

Describe the bug

`gak(x, y)` returns NaN for all `x`, `y`.

To Reproduce

```python
import random
from math import pi as π

import numpy
from tslearn.metrics import sigma_gak, gak


def test_reproduce():
    f0 = 20e9  # 20 GHz
    period = 1/f0
    ω0 = 2*π*f0
    # works at length 405; dies at 410:
    waveforms = numpy.empty(shape=(5, 410))
    times = numpy.linspace(-period/2, period/2, waveforms.shape[1])
    # Randomly shifted and stretched tanh steps, each with a small constant offset
    for i in range(waveforms.shape[0]):
        φ = random.gauss(0.0, 0.5)
        k = 3 + random.uniform(-0.5, 0.5)
        waveforms[i, :] = 0.5*(numpy.tanh(k*ω0*times + φ) + 1) + random.uniform(-0.05, 0.05)

    # Reference waveform, also used to estimate the kernel bandwidth
    typical_values = 0.5*(numpy.tanh(3*ω0*times) + 1)
    σ = sigma_gak(typical_values)
    scores = numpy.empty(waveforms.shape[0])
    for i in range(waveforms.shape[0]):
        scores[i] = gak(typical_values, waveforms[i, :], sigma=σ)

    print(scores)  # reported: every entry is nan
    # gak of a series with itself should be 1.0, but this is also reported as nan
    print(gak(waveforms[0, :], waveforms[0, :], sigma=σ))


if __name__ == "__main__":
    test_reproduce()
```

Expected behavior

The computation should not return NaN; maybe it needs to be stabilized with the log-sum-exp method?
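A minimal sketch of the log-sum-exp idea, assuming the local kernel exp(-d/(2σ²)) / (2 - exp(-d/(2σ²))) on squared Euclidean distances d (assumed here to match the one tslearn's `gak` uses). `log_unnormalized_gak` and `log_space_gak` are hypothetical helpers, not part of tslearn's API. The recursion is accumulated in log space, so no intermediate value overflows, and the normalization becomes a subtraction of logs. The pure-Python double loop is O(n·m) and much slower than a vectorized or compiled implementation; it only illustrates the technique.

```python
import math

import numpy as np


def _as_2d(s):
    """View a series as (length, dim); 1-D input becomes a single channel."""
    s = np.asarray(s, dtype=float)
    return s.reshape(-1, 1) if s.ndim == 1 else s


def _logsumexp3(a, b, c):
    """log(exp(a) + exp(b) + exp(c)) computed without overflow."""
    m = max(a, b, c)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(a - m) + math.exp(b - m) + math.exp(c - m))


def log_unnormalized_gak(s1, s2, sigma=1.0):
    """Hypothetical helper: log of the unnormalized GAK, accumulated in log space."""
    x, y = _as_2d(s1), _as_2d(s2)
    # Squared Euclidean distance between every pair of frames
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # Log of the (assumed) local kernel exp(a) / (2 - exp(a)), with a = -d / (2 sigma^2)
    a = -sq_dist / (2.0 * sigma ** 2)
    log_k = (a - np.log(2.0 - np.exp(a))).tolist()
    n, m = len(log_k), len(log_k[0])
    # log_cum[i][j]: log of the sum over all alignments of the first i and j frames
    log_cum = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    log_cum[0][0] = 0.0
    for i in range(n):
        for j in range(m):
            log_cum[i + 1][j + 1] = log_k[i][j] + _logsumexp3(
                log_cum[i][j + 1], log_cum[i + 1][j], log_cum[i][j]
            )
    return log_cum[n][m]


def log_space_gak(s1, s2, sigma=1.0):
    """Hypothetical helper: normalized GAK as exp(log K12 - (log K11 + log K22) / 2)."""
    log_k12 = log_unnormalized_gak(s1, s2, sigma)
    log_k11 = log_unnormalized_gak(s1, s1, sigma)
    log_k22 = log_unnormalized_gak(s2, s2, sigma)
    return math.exp(log_k12 - 0.5 * (log_k11 + log_k22))


# A constant series of length 410 overflows in linear space but not in log space
print(log_space_gak(np.ones(410), np.ones(410)))  # 1.0
```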

Environment:

  • OS: macOS (ARM)
  • tslearn version: 0.6.3

NAThompson added the bug label on Feb 28, 2024
YannCabanes removed the bug label on Mar 20, 2024

YannCabanes (Contributor) commented Mar 20, 2024

Hello @NAThompson,
Thank you for your issue.
Indeed, `gak` can overflow when it is run on long time series with very little variation.
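We have:

`gak(s1, s2) = unnormalized_gak(s1, s2) / sqrt(unnormalized_gak(s1, s1) * unnormalized_gak(s2, s2))`

so once the unnormalized values overflow to inf, the ratio becomes inf / inf, which evaluates to NaN.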
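A plain-float sketch of how this normalization behaves when its inputs overflow. `normalize` and `normalize_split` are hypothetical helpers, not tslearn functions; their arguments stand in for the three unnormalized kernel values.

```python
import math


def normalize(k12, k11, k22):
    """Normalization from the formula above: k12 / sqrt(k11 * k22)."""
    return k12 / math.sqrt(k11 * k22)


def normalize_split(k12, k11, k22):
    """Same quantity, taking each square root before multiplying."""
    return k12 / (math.sqrt(k11) * math.sqrt(k22))


# Every unnormalized value has overflowed: inf / sqrt(inf * inf) = inf / inf
print(normalize(math.inf, math.inf, math.inf))  # nan

# Finite inputs whose product overflows: 1e200 * 1e200 = inf, so the ratio collapses
print(normalize(1e200, 1e200, 1e200))  # 0.0

# Splitting the square root keeps this denominator finite (result is about 1.0)
print(normalize_split(1e200, 1e200, 1e200))
```

Splitting the square root only helps while each unnormalized value is still finite; once one of them is itself inf, both variants return NaN, which is why a log-space computation is the more robust fix.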
For example, if `s1` is a constant time series of length n (the worst possible scenario), every warping path matches `s1` with itself perfectly and contributes exactly 1, so `unnormalized_gak(s1, s1)` returns the number of possible warping paths.
This number equals the number of paths through a grid of n rows and n columns from the top-left cell to the bottom-right cell when the only allowed moves are right, down, or diagonally down-right (the central Delannoy number D(n - 1)).
When `s1` is a constant time series, `unnormalized_gak(s1, s1)` returns a finite float when n <= 405 and returns inf when n >= 406.
If `s1` and `s2` are both constant time series of length n, the product `unnormalized_gak(s1, s1) * unnormalized_gak(s2, s2)` is finite when n <= 204 and overflows to inf when n >= 205.
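The path counting above can be checked with exact integer arithmetic. `count_warping_paths` is a hypothetical helper (not part of tslearn) that fills the grid with the right/down/diagonal recurrence and compares the counts with the largest finite double, `sys.float_info.max`. Since Python compares integers and floats exactly, this reproduces the 405/406 and 204/205 thresholds without any rounding in the count itself.

```python
import sys


def count_warping_paths(n, m=None):
    """Number of paths from the top-left to the bottom-right cell of an n x m grid
    using only right, down and diagonal (down-right) moves."""
    m = n if m is None else m
    # First row: only rightward moves, so exactly one path reaches each cell
    prev = [1] * m
    for _ in range(1, n):
        # First column: only downward moves, so again exactly one path
        row = [1] + [0] * (m - 1)
        for j in range(1, m):
            # Arrive from above, from the left, or diagonally
            row[j] = prev[j] + row[j - 1] + prev[j - 1]
        prev = row
    return prev[m - 1]


FLOAT_MAX = sys.float_info.max

print(count_warping_paths(3))                      # 13
print(count_warping_paths(405) <= FLOAT_MAX)       # True
print(count_warping_paths(406) <= FLOAT_MAX)       # False
print(count_warping_paths(204) ** 2 <= FLOAT_MAX)  # True
print(count_warping_paths(205) ** 2 <= FLOAT_MAX)  # False
```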
