Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculation in katz_fd seems to be incorrect #34

Open
Khaktos opened this issue Feb 13, 2024 · 1 comment
Open

Calculation in katz_fd seems to be incorrect #34

Khaktos opened this issue Feb 13, 2024 · 1 comment

Comments

@Khaktos
Copy link

Khaktos commented Feb 13, 2024

The current implementation of katz_fd in antropy/fractal.py looks like the following:

def katz_fd(x, axis=-1):
  x = np.asarray(x)
  dists = np.abs(np.diff(x, axis=axis))
  ll = dists.sum(axis=axis)
  ln = np.log10(ll / dists.mean(axis=axis))
  aux_d = x - np.take(x, indices=[0], axis=axis)
  d = np.max(np.abs(aux_d), axis=axis)
  kfd = np.squeeze(ln / (ln + np.log10(d / ll)))
  if not kfd.ndim:
      kfd = kfd.item()
  return kfd

However, both in the documentation and in the original paper the distance (dist and d) is defined as the Euclidean distance between the given points, which is calculated differently.

A BASIC implementation was also provided in the Appendix of the original paper:
image

The important differences are in lines 130 and 150. (Note: the exponential operator ^ probably missing due to scanning/printing issues)

With this and by looking at another implementation in MATLAB, I think the above code for katz_fd should be changed to represent the original definition. However, I do not know if this change would break existing works using the current (apparently incorrect) functionality, so I propose the following change:

  • Do not remove the current implementation, but include a flag to switch between the "legacy" version and the new version.
  • Add a warning if the legacy version is used
  • Change the dists and d calculations to something like the following (Note: this was only tested with single channel data)
ind = np.arange(len(x)-1)
A = np.stack((ind,x[:-1]))
B = np.stack((ind+1,x[1:]))
dists = np.linalg.norm(B-A,axis=0)

ind = np.arange(len(x))
A = np.stack((ind,x))
first = np.reshape([0,x[0]],(2,1))
aux_d = np.linalg.norm(A-first,axis=0)
d = np.max(aux_d)

If there are any comments or suggestions, please let me know.
(Also this is my first contribution to an open source project, so I am a bit unsure about the etiquette/conventions)

@raphaelvallat
Copy link
Owner

Hi @Khaktos,

Thank you for the deep dive and catching this issue ! My opinion is that if the current implementation is incorrect, then we should just go ahead and release a new version of antropy with the correction. Users can decide to downgrade (or not upgrade) their version if they really want to maintain to the old behavior.

If that sounds ok, could you please submit a PR, ideally including unit tests against the Matlab implementation (assuming that it is correct)?

Thanks again,
Raphael

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants