Efficiently Loading a Large Time Series Dataset into KairosDB #647
I am trying to load 100 billion multi-dimensional time series datapoints into KairosDB from a CSV file with the following format:

timestamp value_1 value_2 ... value_n

I tried to find a fast loading method in the official documentation; here is how I am currently doing the insertion (my codebase is in Python):
import json
from datetime import datetime

import requests
from tqdm import tqdm

# args.file, rows, column and get_datetime() are defined earlier in the script.
f = open(args.file, "r")

# Insert
i = 0
with tqdm(total=int(rows)) as pbar:
    while i < rows:
        data = []
        # Each request carries ~65000 datapoints, i.e. 65000 // column rows.
        batch_size = 65000 // column
        while i < rows and batch_size > 0:
            batch_size -= 1
            i += 1
            values = f.readline().rstrip("\n").split(" ")
            # Convert the timestamp to epoch milliseconds, as KairosDB expects.
            t = int((get_datetime(values[0])[0]
                     - datetime(1970, 1, 1)).total_seconds() * 1000)
            # One metric object per dimension, each with a single datapoint.
            for j in range(column):
                data.append({
                    "name": "master.data",
                    "datapoints": [[t, values[j + 1]]],
                    "tags": {"dim": "dim" + str(j)}
                })
        r = requests.post("http://localhost:8080/api/v1/datapoints",
                          data=json.dumps(data))
        pbar.update(65000 // column)
As the code above shows, my code reads the dataset CSV file, prepares batches of 65,000 datapoints, and sends them with requests.post.

However, this method is not very efficient. I am trying to load 100 billion datapoints, and it is taking far longer than expected: loading just 3 million rows of 100 columns each has been running for 29 hours, with an estimated 991 hours left to finish.

I am certain there is a better way to load the dataset into KairosDB. Any suggestions for a faster loading method?
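Two generic changes usually speed up a bulk load like this considerably: group every value that shares a metric name and tag set into a single metric object per request, so each batch carries one "name"/"tags" header per dimension instead of one per datapoint, and keep several requests in flight at once instead of posting batches serially. Below is a minimal sketch of both ideas against the same /api/v1/datapoints endpoint used above; the helper names (post_batch, load_file, to_epoch_ms, BATCH_ROWS, N_WORKERS) are illustrative, not part of the original script:

import json
import threading
from concurrent.futures import ThreadPoolExecutor

import requests

KAIROS_URL = "http://localhost:8080/api/v1/datapoints"
BATCH_ROWS = 650   # rows per request (~65000 datapoints at 100 columns); tune this
N_WORKERS = 8      # concurrent in-flight requests; also worth tuning

_tls = threading.local()

def _session():
    # One Session per worker thread: keep-alive connections without
    # sharing a single Session object across threads.
    if not hasattr(_tls, "s"):
        _tls.s = requests.Session()
    return _tls.s

def post_batch(rows, n_columns):
    # rows: list of (epoch_ms, [value_1 ... value_n]) tuples.
    # Build ONE metric object per dimension holding all of the batch's
    # datapoints for that dimension, rather than one object per datapoint.
    payload = [
        {
            "name": "master.data",
            "tags": {"dim": "dim" + str(j)},
            "datapoints": [[t, vals[j]] for t, vals in rows],
        }
        for j in range(n_columns)
    ]
    r = _session().post(KAIROS_URL, data=json.dumps(payload))
    r.raise_for_status()

def load_file(path, n_columns, to_epoch_ms):
    # to_epoch_ms: the caller's timestamp parser, e.g. the
    # get_datetime-based conversion from the script above.
    batch = []
    with open(path) as f, ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            batch.append((to_epoch_ms(fields[0]), fields[1:n_columns + 1]))
            if len(batch) == BATCH_ROWS:
                pool.submit(post_batch, batch, n_columns)
                batch = []
        if batch:
            pool.submit(post_batch, batch, n_columns)
    # Note: the executor's submission queue is unbounded, so a full
    # 100-billion-point run should also cap the number of queued batches.

If HTTP and JSON overhead still dominate after these changes, it may be worth checking whether the KairosDB version in use accepts gzipped JSON bodies on the REST endpoint or offers the telnet-style line protocol (put/putm); both are documented by the project as lower-overhead ingestion paths, but verify them against the docs for your version.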
Comments

Biswajit Sahu: Can you please share the CSV?

Is it possible for you to share the CSV file?

Brian Hawkins: I posted a comment on the forum in response to this; that should be sufficient.
Abdelouahab Khelifati: Hello, thanks for your response! Please find my CSV file at the following link: <https://drive.google.com/file/d/11ECeoGj2g-qxFrpSktoy1VQnoi63fMRf/view?usp=sharing>. Yours sincerely, Abdel