Efficiently Loading a Large Time Series Dataset into KairosDB #647

Open
AKheli opened this issue Sep 5, 2021 · 4 comments
AKheli commented Sep 5, 2021

I am trying to load 100 billion multi-dimensional time series datapoints into KairosDB from a CSV file with the following format:

timestamp value_1 value_2 .... value_n

I tried to find a fast loading method in the official documentation; here is how I am currently doing the insertion (my codebase is in Python):

import json
from datetime import datetime

import requests
from tqdm import tqdm

i = 0
with open(args.file, "r") as f, tqdm(total=int(rows)) as pbar:
    while i < rows:
        data = []
        batch_size = 65000 // column   # rows per batch, so each POST carries ~65000 datapoints
        while i < rows and batch_size > 0:
            batch_size -= 1
            i += 1
            values = f.readline().rstrip("\n").split(" ")
            # convert the row's timestamp to milliseconds since the epoch
            t = int((get_datetime(values[0])[0] - datetime(1970, 1, 1)).total_seconds() * 1000)
            # one metric entry per column, each holding a single datapoint
            for j in range(column):
                data.append({
                    "name": "master.data",
                    "datapoints": [[t, values[j + 1]]],
                    "tags": {
                        "dim": "dim" + str(j)
                    }
                })
        r = requests.post("http://localhost:8080/api/v1/datapoints", data=json.dumps(data))
        pbar.update(65000 // column)

As the code above shows, I read the dataset CSV file, prepare batches of roughly 65,000 data points, and send each batch to KairosDB with requests.post.
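For reference, each POST body produced by this loop is a JSON array with one single-datapoint metric entry per value, so a row with 100 columns becomes 100 separate entries. The timestamps and values below are made up, but the shape matches the snippet above:

[
    {"name": "master.data", "datapoints": [[1630800000000, "0.42"]], "tags": {"dim": "dim0"}},
    {"name": "master.data", "datapoints": [[1630800000000, "1.87"]], "tags": {"dim": "dim1"}}
]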

However, this method is not very efficient. I am trying to load 100 billion data points, and loading only 3 million rows of 100 columns each has already been running for 29 hours, with an estimated 991 hours remaining:

[screenshot of the tqdm progress bar]

I am certain there is a better way to load the dataset into KairosDB. Any suggestions for a faster loading method would be appreciated.
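One likely bottleneck in the snippet above is that every value becomes its own metric entry with a single datapoint, so the JSON payload is far larger than it needs to be, and each batch opens a fresh HTTP connection. Below is a minimal sketch of one way to reduce that overhead; it reuses args.file, get_datetime, the metric name, and the tags from the snippet above, while ROWS_PER_BATCH, N_COLUMNS, and post_batch are illustrative names introduced here, and the gzip upload path should be checked against the KairosDB REST documentation for the version in use:

import gzip
import json
from datetime import datetime

import requests

KAIROS_URL = "http://localhost:8080/api/v1/datapoints"   # same endpoint as above
ROWS_PER_BATCH = 650    # hypothetical batch size (~65000 datapoints for 100 columns); tune as needed
N_COLUMNS = 100         # number of value columns per row

session = requests.Session()   # reuse one HTTP connection instead of reconnecting per batch


def post_batch(rows):
    """rows is a list of (timestamp_ms, [value_1, ..., value_n]) tuples."""
    # One metric entry per dimension tag, each carrying every timestamp in the
    # batch, instead of one entry per individual value as in the snippet above.
    data = [
        {
            "name": "master.data",
            "tags": {"dim": "dim" + str(j)},
            "datapoints": [[t, float(values[j])] for t, values in rows],
        }
        for j in range(N_COLUMNS)
    ]
    body = gzip.compress(json.dumps(data).encode("utf-8"))
    # KairosDB's REST docs describe gzip-compressed uploads with this content
    # type; confirm against the server version you are running.
    r = session.post(KAIROS_URL, data=body,
                     headers={"Content-Type": "application/gzip"})
    r.raise_for_status()   # fail loudly if the server rejects a batch


with open(args.file, "r") as f:
    batch = []
    for line in f:
        fields = line.rstrip("\n").split(" ")
        # same timestamp conversion as in the original snippet
        t = int((get_datetime(fields[0])[0] - datetime(1970, 1, 1)).total_seconds() * 1000)
        batch.append((t, fields[1:]))
        if len(batch) == ROWS_PER_BATCH:
            post_batch(batch)
            batch = []
    if batch:
        post_batch(batch)

Beyond the payload shape, a single Python process is often CPU-bound on parsing and JSON encoding, so splitting the file and running several such loaders in parallel against the same endpoint is also worth measuring.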

biswaKL commented Sep 6, 2021 via email

biswaKL commented Sep 6, 2021

Is it possible for you to share the CSV file?

brianhks (Member) commented
I posted a comment on the forum in response to this; that should be sufficient.

AKheli commented Sep 15, 2021 via email
