Horribly slow on PyPy #44

Open
ovanes opened this issue Jun 2, 2018 · 2 comments

ovanes commented Jun 2, 2018

I ran the benchmark below and found that tdigest is roughly 7-8x slower on PyPy than on CPython.

# -*- coding: utf-8 -*-
from __future__ import print_function

import sys
from tdigest import TDigest
from numpy.random import randint, random
from time import time

def make_tdigest(items):
    result = TDigest()
    for _ in range(items):
        result.update(random())
    return result


def make_tdigest2(items):
    result = TDigest()
    result.batch_update(random(items))
    return result


def tdigests(count, factory):
    # Yield `count` records, each holding a digest built from 500 samples.
    for i in range(1, count + 1):
        if i % 100 == 0:
            print('generated items:', i)
        yield dict(timestamp=randint(1, 15), tdigest=factory(500))


if __name__ == '__main__':
    print('running test in', sys.version)

    print('generating tdigests in batch')
    start = time()
    result = list(tdigests(100, make_tdigest2))
    elapsed = time() - start
    print('generating tdigests took:', elapsed)
    print('----------')

    print('generating tdigests one by one')
    start = time()
    # Bind to `result`, not `tdigests`, so the generator function is not shadowed.
    result = list(tdigests(100, make_tdigest))
    elapsed = time() - start
    print('generating tdigests took:', elapsed)
    print('----------')
==========
PyPy
running test in 2.7.13 (0e7ea4fe15e82d5124e805e2e4a37cae1a402d4b, Jan 06 2018, 12:46:49)
[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]
generating tdigests in batch
generated items: 100
generating tdigests took: 32.5672068596
----------
generating tdigests one by one
generated items: 100
generating tdigests took: 17.4209430218
----------

==================
CPython
running test in 2.7.14 (default, Mar  9 2018, 23:57:12) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]
generating tdigests in batch
generated items: 100
generating tdigests took: 4.16117596626
----------
generating tdigests one by one
generated items: 100
generating tdigests took: 2.38711595535
----------

I repeated the test in the official PyPy Docker container with PyPy 6.0.0 (the Python 3 compatible build) and got the same outcome: https://hub.docker.com/_/pypy/

Any ideas?

@CamDavidsonPilon
Owner

No ideas, unfortunately. I don't know much about PyPy.

@ovanes
Author

ovanes commented Jun 2, 2018

I created an issue in PyPy's repo:
https://bitbucket.org/pypy/pypy/issues/2845/tdigest-with-pypy-is-8x-slower-than-with

The first explanation (not yet profiled or verified) is that AccumulationTree uses Cython, which is known to be slower on PyPy.
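
Since that explanation has not been profiled, one way to check it might look like the sketch below. It uses only the standard-library cProfile and pstats modules. The names profile_call and stand_in_workload are hypothetical helpers introduced here for illustration, and the stand-in workload exists only so the snippet runs without tdigest installed; in a real check it would be replaced by make_tdigest or make_tdigest2 from the benchmark above.

```python
from __future__ import print_function

import cProfile
import pstats

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report).

    Hypothetical helper, not part of tdigest or PyPy.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the 15 entries with the highest cumulative time.
    stats.sort_stats('cumulative').print_stats(15)
    return result, stream.getvalue()


def stand_in_workload(items):
    # Placeholder so the sketch runs on its own;
    # swap in make_tdigest or make_tdigest2 from the benchmark.
    return sum(x * x for x in range(items))


if __name__ == '__main__':
    _, report = profile_call(stand_in_workload, 500)
    print(report)
```

Running the same script under both interpreters and comparing the top cumulative entries should show whether the time concentrates in the AccumulationTree calls. Two caveats: time spent inside compiled extension code is reported per call rather than broken down further, and the profiler's own overhead may differ between CPython and PyPy, so the reports are a rough guide only.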
