You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
while closely reflecting the original paper, may be reflecting a issue and inconsistency of that paper. $$ABOF=VAR\left(\frac{\langle AB,AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)$$
uses the product of the squared weights for normalization, which puts an extreme high emphasis on distance (hence causing a high similarity to kNN).
But if you then look how the equation is expanded, another factor is added: $$=\frac{\sum_B\sum_C \left(\frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)^2}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}} - \left(\frac{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}}\right)^2$$
(clearly resembling $E[X^2]-E[X]^2$, with each point weighted by $\frac{1}{||AB||\cdot ||AC||}$, which would match the text).
Note that this would simplify to having even the cubic(!) distances in there.
The IMHO correct version would be the following: $$ABOF=VAR_{weighted}\left(\frac{\langle AB,AC \rangle}{||AB|| \cdot ||AC||}\right)$$$$=\frac{\sum_B\sum_C \left(\frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB|| \cdot ||AC||}\right)^2}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}} - \left(\frac{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB|| \cdot ||AC||}}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}}\right)^2$$
which then can further be simplified to: $$=\frac{\sum_B\sum_C \left(\frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)^2}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}} - \left(\frac{\sum_B\sum_C \frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}}\right)^2$$
We now got the squares in the equation seen before, one coming from the weighting scheme and one from the angle.
Note that this computation is likely to produce numerical instabilities, though, and I would rather use $1/\sqrt{||AB|| \cdot ||AC||}$ for weighting (which uses the geometric mean distance, instead of the product of distances).
Hence,
there may be an issue in the original ABOD paper
the "Var" is not the standard variance call, but a weighted variance, as the expansion of the equation shows. The current PyOD implementation does not implement this:
it may be good to allow the user to choose among the two "angle" computations (with the extra square, and the standard definition of angle), and three weighting schemes ((1) unweighted variance, as currently implemented in PyOD, (2) variance weighted with 1/(d1 * d2) as in the ABOD paper, and (3) with the geometric mean distance to the neighbors, 1/sqrt(d1 * d2), which makes the weighting weaker)
the code needs to be carefully optimized for numerial issues, in particular as some of these values can become very small fast due to products of two powers of three.
Note that also, the "norm" function in the top snipped involves a square root that is then followed by a square.
Now I argue that the most meaningful variant is using standard angles is very simple. It is the only version that is invariant to scaling the data set with a constant. If we multiply all lengths by 2, $$\left(\frac{\langle 2AB,2AC \rangle}{||2AB||^2 \cdot ||2AC||^2}\right)=\left(\frac{4\langle AB,AC \rangle}{4||AB||^2 \cdot 4||AC||^2}\right)=\frac{1}{4} \left(\frac{\langle AB,AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)$$ Removing the extra squares - using the standard angle - resolve this and makes the angle invariant to scaling with a constant.
The text was updated successfully, but these errors were encountered:
kno10
changed the title
ABOD weighting schemes
ABOD weighting schemes / not matching the definition?
Aug 16, 2023
The ABOD implementation of PyOD tends to perform very similar to kNN, because the scoring is dominated by the squared (!) distances.
The following lines:
pyod/pyod/models/abod.py
Lines 50 to 53 in 6c77e27
while closely reflecting the original paper, may be reflecting a issue and inconsistency of that paper.
$$ABOF=VAR\left(\frac{\langle AB,AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)$$
$$=\frac{\sum_B\sum_C \left(\frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)^2}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}} - \left(\frac{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}}\right)^2$$ $E[X^2]-E[X]^2$ , with each point weighted by $\frac{1}{||AB||\cdot ||AC||}$ , which would match the text).
uses the product of the squared weights for normalization, which puts an extreme high emphasis on distance (hence causing a high similarity to kNN).
But if you then look how the equation is expanded, another factor is added:
(clearly resembling
Note that this would simplify to having even the cubic(!) distances in there.
The IMHO correct version would be the following:
$$ABOF=VAR_{weighted}\left(\frac{\langle AB,AC \rangle}{||AB|| \cdot ||AC||}\right)$$ $$=\frac{\sum_B\sum_C \left(\frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB|| \cdot ||AC||}\right)^2}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}} - \left(\frac{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB|| \cdot ||AC||}}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}}\right)^2$$
$$=\frac{\sum_B\sum_C \left(\frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)^2}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}} - \left(\frac{\sum_B\sum_C \frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}}\right)^2$$ $1/\sqrt{||AB|| \cdot ||AC||}$ for weighting (which uses the geometric mean distance, instead of the product of distances).
which then can further be simplified to:
We now got the squares in the equation seen before, one coming from the weighting scheme and one from the angle.
Note that this computation is likely to produce numerical instabilities, though, and I would rather use
Hence,
pyod/pyod/models/abod.py
Line 89 in 6c77e27
Note that also, the "norm" function in the top snipped involves a square root that is then followed by a square.
Now I argue that the most meaningful variant is using standard angles is very simple. It is the only version that is invariant to scaling the data set with a constant. If we multiply all lengths by 2,
$$\left(\frac{\langle 2AB,2AC \rangle}{||2AB||^2 \cdot ||2AC||^2}\right)=\left(\frac{4\langle AB,AC \rangle}{4||AB||^2 \cdot 4||AC||^2}\right)=\frac{1}{4} \left(\frac{\langle AB,AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)$$ Removing the extra squares - using the standard angle - resolve this and makes the angle invariant to scaling with a constant.
The text was updated successfully, but these errors were encountered: