ABOD weighting schemes / not matching the definition? #522

kno10 · 2023-08-16T09:38:19Z

The ABOD implementation of PyOD tends to perform very similar to kNN, because the scoring is dominated by the squared (!) distances.

The following lines:

pyod/pyod/models/abod.py

Lines 50 to 53 in 6c77e27

    
           # wcos = (<a_curr, b_curr>/((|a_curr|*|b_curr|)^2) 
        
           wcos = np.dot(a_curr, b_curr) / ( 
        
                   np.linalg.norm(a_curr, 2) ** 2) / ( 
        
                          np.linalg.norm(b_curr, 2) ** 2)

while closely reflecting the original paper, may be reflecting a issue and inconsistency of that paper.
$$ABOF=VAR\left(\frac{\langle AB,AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)$$
uses the product of the squared weights for normalization, which puts an extreme high emphasis on distance (hence causing a high similarity to kNN).
But if you then look how the equation is expanded, another factor is added:
$$=\frac{\sum_B\sum_C \left(\frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)^2}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}} - \left(\frac{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}}\right)^2$$
(clearly resembling $E[X^2]-E[X]^2$, with each point weighted by $\frac{1}{||AB||\cdot ||AC||}$, which would match the text).
Note that this would simplify to having even the cubic(!) distances in there.

The IMHO correct version would be the following:
$$ABOF=VAR_{weighted}\left(\frac{\langle AB,AC \rangle}{||AB|| \cdot ||AC||}\right)$$ $$=\frac{\sum_B\sum_C \left(\frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB|| \cdot ||AC||}\right)^2}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}} - \left(\frac{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}\cdot\frac{\langle AB, AC \rangle}{||AB|| \cdot ||AC||}}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}}\right)^2$$
which then can further be simplified to:
$$=\frac{\sum_B\sum_C \left(\frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)^2}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}} - \left(\frac{\sum_B\sum_C \frac{\langle AB, AC \rangle}{||AB||^2 \cdot ||AC||^2}}{\sum_B\sum_C \frac{1}{||AB||\cdot ||AC||}}\right)^2$$
We now got the squares in the equation seen before, one coming from the weighting scheme and one from the angle.
Note that this computation is likely to produce numerical instabilities, though, and I would rather use $1/\sqrt{||AB|| \cdot ||AC||}$ for weighting (which uses the geometric mean distance, instead of the product of distances).

Hence,

there may be an issue in the original ABOD paper
the "Var" is not the standard variance call, but a weighted variance, as the expansion of the equation shows. The current PyOD implementation does not implement this:

pyod/pyod/models/abod.py

Line 89 in 6c77e27

return np.var(wcos_list)
it may be good to allow the user to choose among the two "angle" computations (with the extra square, and the standard definition of angle), and three weighting schemes ((1) unweighted variance, as currently implemented in PyOD, (2) variance weighted with 1/(d1 * d2) as in the ABOD paper, and (3) with the geometric mean distance to the neighbors, 1/sqrt(d1 * d2), which makes the weighting weaker)
the code needs to be carefully optimized for numerial issues, in particular as some of these values can become very small fast due to products of two powers of three.

Note that also, the "norm" function in the top snipped involves a square root that is then followed by a square.

Now I argue that the most meaningful variant is using standard angles is very simple. It is the only version that is invariant to scaling the data set with a constant. If we multiply all lengths by 2,
$$\left(\frac{\langle 2AB,2AC \rangle}{||2AB||^2 \cdot ||2AC||^2}\right)=\left(\frac{4\langle AB,AC \rangle}{4||AB||^2 \cdot 4||AC||^2}\right)=\frac{1}{4} \left(\frac{\langle AB,AC \rangle}{||AB||^2 \cdot ||AC||^2}\right)$$ Removing the extra squares - using the standard angle - resolve this and makes the angle invariant to scaling with a constant.

yzhao062 · 2023-08-28T18:30:16Z

This is deeply appreciated! Since I am just moving between states, will need to get back on this once settling down :)

kno10 changed the title ~~ABOD weighting schemes~~ ABOD weighting schemes / not matching the definition? Aug 16, 2023

yzhao062 self-assigned this Aug 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ABOD weighting schemes / not matching the definition? #522

ABOD weighting schemes / not matching the definition? #522

kno10 commented Aug 16, 2023 •

edited

yzhao062 commented Aug 28, 2023

ABOD weighting schemes / not matching the definition? #522

ABOD weighting schemes / not matching the definition? #522

Comments

kno10 commented Aug 16, 2023 • edited

yzhao062 commented Aug 28, 2023

kno10 commented Aug 16, 2023 •

edited