Replies: 19 comments 69 replies
-
What is your use case? How often do you need to add and/or update documents? How critical is it to have your data indexed as quickly as possible? What is an acceptable duration between the time a document is sent to Meilisearch and the time it is searchable, and why?

We are a marketing automation solution looking to move our mail search to Meilisearch. Some of our customers also use us for transactional email analytics (AWS SES, Mailgun...), and many of them send tens of thousands of emails daily. The goal is to let them search their mail, which is why indexing has to be fast. That hasn't been the case. We have also noticed that tasks get stuck and do not process at all: I just checked, and I have tasks that have been stuck in the queue for four days. I added a test document, and it has still not been indexed after more than 50 minutes.
PS: you may need to paginate the task APIs.
-
The version of Meilisearch you are using / How do you host Meilisearch? Is it on a Cloud provider? If yes, which one? / If you send your documents by batch, how big are these batches?

Dataset information
- Contact: 100k records
- Booking: 56k records
- Invoices: 235k records
- Emails: 416k records
The language of the dataset. The settings of your index(es):

What is your use case: How often do you need to add and/or update documents? Which type is it: Both. Documents are created and changed by user actions, and the data is then typically needed in subsequent processes.

How critical is it to have your data indexed as quickly as possible? Critical. Users often create records such as a customer contact, then search for that contact to add it to other records such as orders. If the contact is unavailable because it is still being indexed, it impacts the user experience.

What is an acceptable duration between the time a document is sent to Meilisearch and the time it is searchable? A few seconds. When users manage a larger CRM-type dataset, search becomes critical to navigation and to linking resources together. Say you create a contact, then an order: if you search for the contact to link the order and it is not found, or you leave the contact page and a delay prevents you from finding that contact until it is indexed, the experience is jarring and reflects badly on the search, as users perceive this as "I can't find things."

Misc: PHP SDK, via Laravel. The Laravel integration uses Laravel Scout, which includes no support for batching and simply pushes an update each time an object is updated. This suits most use cases, so it is understandable, and it works for us. We previously used Algolia, but as our dataset grew (like most) it became unviable.
Having battled with index performance for a long time, we are eagerly awaiting the next release. However, even on a large server the performance can be quite slow (updates, I imagine, are still processed one index at a time), which would need to be addressed lest we end up with multiple Meilisearch instances with one index per instance. Lastly, with complete ignorance of how your internal queue/batching is planned: it would be beneficial if index delete commands were actionable immediately, rather than processing old tasks before clearing the index. The use case: if the queue falls behind with smaller updates (ours was/is weeks behind), we could delete the index out of hours, recreate it, and batch-update it rapidly. As it stands, if we issued a delete command it could be weeks before the index is deleted, unless we physically delete the index on disk.
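Since Scout-style integrations push one update per object, one common workaround (sketched below in Python; the class name and `flush_fn` hook are hypothetical, not part of any SDK) is to buffer per-object updates and flush them to Meilisearch in batches, deduplicating by primary key so only the latest version of each document is sent:

```python
import time


class DocumentBuffer:
    """Collect per-object updates and flush them in batches.

    `flush_fn` is whatever actually sends the batch to Meilisearch,
    e.g. a wrapper around the SDK's update-documents call (assumed here).
    """

    def __init__(self, flush_fn, max_docs=1000, max_age_seconds=5.0,
                 clock=time.monotonic):
        self.flush_fn = flush_fn
        self.max_docs = max_docs
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self._docs = {}      # keyed by primary key: later updates win
        self._oldest = None  # timestamp of the oldest buffered update

    def add(self, doc, primary_key="id"):
        if self._oldest is None:
            self._oldest = self.clock()
        self._docs[doc[primary_key]] = doc
        # Flush when the buffer is full or the oldest update is too old.
        if (len(self._docs) >= self.max_docs
                or self.clock() - self._oldest >= self.max_age_seconds):
            self.flush()

    def flush(self):
        if self._docs:
            self.flush_fn(list(self._docs.values()))
            self._docs = {}
            self._oldest = None
```

Usage would be `buf = DocumentBuffer(lambda docs: index.update_documents(docs))` and `buf.add(doc)` on every model change, with a final `buf.flush()` on shutdown; this turns thousands of single-document tasks into a handful of batched ones.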
-
I'm testing and running Meilisearch on an ECS s6.xlarge.4 instance: 4 CPUs, 16 GB RAM. It looks like (just a guess): I restarted Meilisearch and updated one document. I can't believe it. Meilisearch version: v0.25.2. Index settings:
Document example:
The total is 1,364,100 documents, imported 10,000 per batch.
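Splitting a large import into fixed-size batches like this is simple to do client-side; a minimal sketch (the `chunked` helper is not part of any SDK, just plain Python):

```python
def chunked(documents, batch_size=10_000):
    """Yield successive batches of at most `batch_size` documents."""
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# Each batch would then be sent as one task, e.g. (assuming an SDK
# index object):  for batch in chunked(docs): index.add_documents(batch)
```

Smaller batches produce more tasks (more per-task overhead), while very large batches increase memory pressure during indexing, so the batch size is worth tuning for the machine.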
-
I have an issue where updating a document takes a long time to take effect: with 627,786 documents, updating a single document takes about 1 minute. I have around 15 million documents on the prod server; at that rate, updating a single document would take about 30 minutes! My Meilisearch version is:
I have 627786 documents:
as you can see the
After sending this payload, I received this response:
When I checked the task status:
As you can see, it took almost one minute to complete the update. This way I can't depend on Meilisearch 100% because of the update delay. Is there any environment variable I can set to make document updates apply instantly?
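There is no environment variable that makes updates synchronous; document updates are asynchronous tasks, so the usual pattern is to poll the task until it reaches a terminal status. A generic polling sketch (the `get_task` callable stands in for the SDK/HTTP call that fetches a task, e.g. GET /tasks/:uid; the terminal status names `succeeded`/`failed` match recent Meilisearch versions):

```python
import time


def wait_for_task(get_task, uid, timeout=60.0, poll_interval=0.5,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll a task until it reaches a terminal status or the timeout expires.

    `get_task(uid)` must return the task payload as a dict with a
    'status' key; everything else here is generic polling logic.
    """
    deadline = clock() + timeout
    while True:
        task = get_task(uid)
        if task["status"] in ("succeeded", "failed"):
            return task
        if clock() >= deadline:
            raise TimeoutError(
                f"task {uid} still {task['status']} after {timeout}s")
        sleep(poll_interval)
```

This at least bounds how long the application waits and surfaces stuck tasks as explicit timeouts instead of silent delays.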
-
Indexing speed problems: it takes ~25 seconds to index 5,000 documents on a 6-core CPU. [2022-03-09T06:00:53Z INFO meilisearch_lib::index::updates] document addition done: DocumentAdditionResult { indexed_documents: 5000, number_of_documents: 95000 }
-
Update: after adjusting the searchable attributes it became much quicker; hope this helps.

Hello guys, I found that updating documents in Meilisearch is really slow. Is there a quicker way to do this? Would Mongo be more suitable, or should I delete the doc first and then add a new one? This is my machine configuration: I use the latest Meilisearch Docker image, getmeili/meilisearch:latest. There are almost 600,000 docs in Meilisearch, and it took 180 seconds to update 22 docs. Here is the update log (task status queried every 10 seconds):

# update code
index.update_documents_in_batches(data, 1000)

INFO:root:====================================================================================================
INFO:root:Begining Procssing comment
INFO:root:Total 22 items to update...
INFO:root:Total 1 tasks, processing the 1 task.
INFO:root:{'uid': 2052, 'indexUid': 'users_cangdian', 'status': 'enqueued', 'type': 'documentPartial', 'details': {'receivedDocuments': 22, 'indexedDocuments': None}, 'duration': None, 'enqueuedAt': '2022-03-13T10:13:34.531720510Z', 'startedAt': None, 'finishedAt': None}
INFO:root:Total 1 tasks, processing the 1 task.
INFO:root:{'uid': 2052, 'indexUid': 'users_cangdian', 'status': 'processing', 'type': 'documentPartial', 'details': {'receivedDocuments': 22, 'indexedDocuments': None}, 'duration': None, 'enqueuedAt': '2022-03-13T10:13:34.531720510Z', 'startedAt': '2022-03-13T10:13:34.536439205Z', 'finishedAt': None}
[... the same 'processing' status line repeats every 10 seconds until the task finishes, ~180 seconds in total ...]
INFO:root:Successfully Procssed comment
INFO:root:====================================================================================================
-
Wow! Indexing with 0.26.1 seems so much faster than 0.25. It finished indexing ~8.6 million documents in under 16 hours on a 48 GB / 2-CPU VPS. Previously I'd move the index over to a beefier VPS to do the indexing, and it still took longer. I've enabled autobatching, and I load in ~50k records per batch when building the index. Great work, guys!

{"databaseSize"=>39974383616,
"lastUpdate"=>"2022-03-19T00:02:24.969411905Z",
"indexes"=>
{"books"=>
{"numberOfDocuments"=>8591382,
"isIndexing"=>false,
"fieldDistribution"=>
{"author"=>8591382,
"id"=>8591382,
"series"=>8591382,
"title"=>8591382,
"work_no"=>8591382}}}}
-
I can confirm that indexing is faster in the latest version with auto-batching turned on, but it is still about 5x slower than Typesense, which basically does it in real time (we use both in production).
-
This is an update to my previous replies. This is a customer service app running in production. It adds 1-3 documents within a 5-minute interval. The queue is being processed about 2 days later (and the delay is getting bigger). Each update now takes around ~530 seconds (nearly 9 minutes). I am pretty sure the documents are very small. The batching-related plugins are not stable enough to get started with; I will see what is needed to implement that myself.
-
This is a further update to my previous replies. I have tried the following settings. Version: 0.27 rc3. Tweaks:
Results: Log:
-
@vprelovac Edit: see the comments below. This article was measuring indexing speed in the wrong way, and the Meilisearch time is wrong.
-
Hello everyone watching this issue! We have just released v0.29.0rc1, a release candidate of v0.29.0 🔥 Binaries are attached to the release, or you can use the Docker image:

docker run -it --rm \
  -p 7700:7700 \
  getmeili/meilisearch:v0.29.0rc1

Let us know about any bugs or feedback! 😄 It would be really helpful. FYI, the official v0.29.0 release will be available on 3rd October.
-
Hello everyone here!
-
Indexing is fast enough, but it crashes after running for a while. I have an instance on my VPS with docker-compose:

meilisearch:
  container_name: "chii-base-meilisearch"
  image: "getmeili/meilisearch:v0.30.5"
  command: meilisearch --env production
  restart: always
  environment:
    MEILI_ENABLE_METRICS_ROUTE: "true" # yes, I know it's not working
    MEILI_MASTER_KEY: "..."
    MEILI_LOG_LEVEL: "WARN"

This has happened on 0.28, 0.29, and now v0.30.5. My use case is searching about 400k documents, and indexing speed is not important. The data file was about 25 GB before the crash; after removing the old data and re-adding all documents, the data file takes only 8.2 GB.
Technical information
Meilisearch version: happens in v0.28, v0.29, and v0.30.5.
Additional context: x64, Ubuntu 20.04, 4 cores / 8 GB RAM, swap off.

How often do you need to add and/or update documents? Which type is it?
Fewer than 50 adds per day; about 10k updates per day (about 1.3/s). Previously I sent payloads (updates and additions) one by one; these days I send them in batches of 1k. And this happens whether I send payloads to Meilisearch in batches or not (data queues up waiting to flush to Meilisearch). It crashes after running for a while (weeks or months) without any useful logging, and the SDK doesn't return any error.

Dataset information
My documents look like this:

type subjectIndex struct {
ID uint32 `json:"id"`
Summary string `json:"summary"`
Tag []string `json:"tag,omitempty" filterable:"true"`
Name []string `json:"name"`
Date int `json:"date,omitempty" filterable:"true" sortable:"true"`
Score float64 `json:"score" filterable:"true" sortable:"true"`
PageRank float64 `json:"page_rank" sortable:"true"`
Heat uint32 `json:"heat" sortable:"true"`
Rank uint32 `json:"rank" filterable:"true" sortable:"true"`
Platform uint16 `json:"platform,omitempty"`
Type uint8 `json:"type" filterable:"true"`
NSFW bool `json:"nsfw" filterable:"true"`
}

setting:
{
"displayedAttributes": ["*"],
"searchableAttributes": ["name", "summary", "tag", "type", "id"],
"filterableAttributes": ["date", "nsfw", "rank", "score", "tag", "type"],
"sortableAttributes": ["date", "heat", "page_rank", "rank", "score"],
"rankingRules": [
"exactness",
"words",
"typo",
"proximity",
"attribute",
"sort",
"id:asc",
"rank:asc",
"score:desc",
"nsfw:asc"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null,
"typoTolerance": {
"enabled": true,
"minWordSizeForTypos": {
"oneTypo": 5,
"twoTypos": 9
},
"disableOnWords": [],
"disableOnAttributes": []
},
"faceting": {
"maxValuesPerFacet": 100
},
"pagination": {
"maxTotalHits": 1000
}
}

stats:
{
"numberOfDocuments": 413702,
"isIndexing": false,
"fieldDistribution": {
"date": 345888,
"heat": 413702,
"id": 413702,
"name": 413702,
"nsfw": 413702,
"page_rank": 413702,
"platform": 285143,
"rank": 413702,
"score": 413702,
"summary": 413702,
"tag": 230165,
"type": 413702
}
}

I'm running Meilisearch in Docker, so dockerd restarts it after it crashes. There is high CPU usage and high IO read, but not high memory usage, so I don't think it was killed by the OS. I also have a Prometheus exporter exporting some data; hopefully it's useful. You can see that Meilisearch's "enqueued" and "processing" task uids keep increasing, but it doesn't finish any task. This happened on 9/26, 10/20, 12/7, and today; Meilisearch didn't give any useful logging in the previous crashes either. I have already removed my old data file; if you need it, I can only share it the next time this happens.
-
Meilisearch version: 1.0.2. Runs on: a VirtualBox VM with 10 GB RAM and 8 cores of a Ryzen 3700X. The document structure is like this:
The problem:
What are those tasks waiting for? I've tried small chunks, large chunks, manual chunking, and auto-batching; the behavior is the same. Indexing works for the first portion only, and then for all the rest.
-
I have a question: when I add some documents to an index, does Meilisearch reindex the whole index from scratch?
-
Meilisearch Version
Machine Details
Batch Size

Dataset Information
An example of data we index is as follows:

Blocks
Wallets
Index settings for the
{
"displayedAttributes": [
"*"
],
"searchableAttributes": [
"id"
],
"filterableAttributes": [],
"sortableAttributes": [
"timestamp"
],
"rankingRules": [
"typo",
"words",
"proximity",
"attribute",
"sort",
"exactness"
],
"stopWords": [],
"synonyms": {},
"distinctAttribute": null,
"typoTolerance": {
"enabled": true,
"minWordSizeForTypos": {
"oneTypo": 5,
"twoTypos": 9
},
"disableOnWords": [
"id"
],
"disableOnAttributes": []
},
"faceting": {
"maxValuesPerFacet": 100
},
"pagination": {
"maxTotalHits": 1000
}
}

Usecase
Issue Description
What we noticed is that the data import is fine in terms of speed; after a couple of hours the data is available and ready to go. The issue we run into is when we try to add new documents to the existing index, mainly the one containing
The question here is whether these times are expected, or whether Meilisearch should be able to handle faster additions/updates with a change in configuration. Another thing we noticed is that adding new documents only ever utilises a single CPU core/thread. As a result, both server instances need around the same time to add documents to the index, even though one has 32 cores while the other has only 4. Is this also expected with the way Meilisearch handles indexes, or can we somehow make it use more of the available resources to increase the speed? Looking forward to a reply to better understand the limitations we may be running into.
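One way to keep incremental additions small, whatever the server's core count, is to send only documents that actually changed since the last sync rather than re-pushing the whole export. A sketch of that idea (plain Python, not an SDK feature; the fingerprint store here is an in-memory dict, but it could equally be a database table):

```python
import hashlib
import json


def changed_documents(docs, fingerprints):
    """Return only the documents whose content changed since the last sync.

    `fingerprints` maps primary key -> content hash from the previous run
    and is updated in place. Sending fewer documents keeps each Meilisearch
    task small, which matters when updates are processed serially.
    """
    to_send = []
    for doc in docs:
        digest = hashlib.sha256(
            json.dumps(doc, sort_keys=True).encode()).hexdigest()
        if fingerprints.get(doc["id"]) != digest:
            fingerprints[doc["id"]] = digest
            to_send.append(doc)
    return to_send
```

This does not change how many cores Meilisearch uses for a given task, but it reduces the volume of work queued in the first place.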
-
I'm seeing an unexplained anomaly using Meilisearch 1.3.1: after indexing, the first search request is about 100 times slower. After that, search is super fast again. Log output added below:
-
Closing this discussion. We made huge improvements with v1.6.0 and v1.7.0. We recommend that anyone encountering indexing issues upgrade their Meilisearch instance to the latest version. If that is not enough, please open an issue directly in the Meilisearch repository with your indexing time and expectations: https://github.com/meilisearch/meilisearch/issues
-
Hello everyone 👋
If you are here, it probably means you have run into issues during indexing with Meilisearch: document addition might be really slow or might even have led to a crash due to memory consumption.
The whole Meilisearch team is really sorry for the inconvenience. Rest assured, we are always working on making our search engine better on these points.
Before posting your problem, please read the whole post.
Current issues
We have currently identified 3 types of issues during indexing:
Current solutions to fix indexing issues
Here are some solutions we have already implemented and documented to fix them.
Please let us know about your experience with this experimental feature in this discussion; it would definitely help us improve our search engine!
If you still have indexing issues
If, after testing all the previous points, you still have an issue (bad performance or a crash), please let us know about your use case in this discussion.
We need the following information to process your feedback as efficiently as possible.
Technical information
Replace http://127.0.0.1:7700 with the server address of your Meilisearch instance if you don't run Meilisearch locally.
The specs of your machine (number of cores, RAM, distribution, etc.)
How do you host Meilisearch? Is it on a Cloud provider? If yes, which one?
Is your Meilisearch running with Kubernetes? Is your Meilisearch running in a Docker container?
If you send your documents by batch, how big are these batches?
Dataset information
If possible, provide your dataset. We completely understand you cannot share your data publicly, but you can still send it to me in private by email. Rest assured we will use it only for test purposes and will delete it right after the tests.
If you cannot share your dataset, please let us know about the following points.
The size of your dataset and the number of documents.
For example, the movies.json dataset we provide in the documentation has a size of 9.1 MB.
The composition of your documents: the number of fields per document and the number of words per field.
Ex:
This dataset of 6 documents contains between 3 and 4 fields per document and fewer than 10 words per field.
The language of the dataset. For example, Chinese is slower to tokenize than Latin languages.
The settings of your index(es):
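To report the composition numbers asked for above (number of documents, fields per document, words per field), a quick sketch like this could compute them from a list of JSON documents loaded as Python dicts (the function name and the chosen summary keys are just for illustration):

```python
def dataset_composition(documents):
    """Summarise fields per document and words per field for a report."""
    fields_per_doc = [len(doc) for doc in documents]
    # Count whitespace-separated words in each field's string form.
    words_per_field = [
        len(str(value).split())
        for doc in documents
        for value in doc.values()
    ]
    return {
        "documents": len(documents),
        "min_fields": min(fields_per_doc),
        "max_fields": max(fields_per_doc),
        "max_words_per_field": max(words_per_field),
    }
```

Run over the real dataset, the output gives exactly the kind of "between 3 and 4 fields per document, fewer than 10 words per field" summary the template asks for.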
Your use case
What is your use case:
How often do you need to add or/and update documents? Which type is it:
How critical is it to have your data indexed as quickly as possible?
What is an acceptable duration between the time a document is sent to Meilisearch and the time it is searchable?
Most of all, regarding the last answers: why?
Misc
Thanks for reading this, and most of all, thanks for your time for your feedback 🙂