feat: upload all fields to qdrant #2947

Open: wants to merge 1 commit into base: main

9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,12 @@
+## 0.13.6-dev0
+
+### Enhancements
+* **Upload all element fields to Qdrant** Qdrant accepts any JSON-representable content as a point's "payload", so all element fields are now uploaded to the destination as-is. The previous approach flattened the element fields and also stored a JSON-serialized copy of the element, which was redundant.
+
+### Features
+
+### Fixes
+
 ## 0.13.5

 ### Enhancements
2 changes: 1 addition & 1 deletion unstructured/__version__.py
@@ -1 +1 @@
-__version__ = "0.13.5"  # pragma: no cover
+__version__ = "0.13.6-dev0"  # pragma: no cover
14 changes: 3 additions & 11 deletions unstructured/ingest/connector/qdrant.py
@@ -1,4 +1,3 @@
-import json
 import multiprocessing as mp
 import typing as t
 import uuid
@@ -16,7 +15,6 @@
 )
 from unstructured.ingest.logger import logger
 from unstructured.ingest.utils.data_prep import chunk_generator
-from unstructured.staging.base import flatten_dict
 from unstructured.utils import requires_dependencies

 if t.TYPE_CHECKING:
@@ -133,13 +131,7 @@ def normalize_dict(self, element_dict: dict) -> dict:
         return {
             "id": str(uuid.uuid4()),
             "vector": element_dict.pop("embeddings", {}),
-            "payload": {
-                "text": element_dict.pop("text", None),
-                "element_serialized": json.dumps(element_dict),
-                **flatten_dict(
-                    element_dict,
-                    separator="-",
-                    flatten_lists=True,
-                ),
-            },
+            # In Qdrant, "payload" can be any information that can be represented using JSON.
+            # https://qdrant.tech/documentation/concepts/payload/#payload
+            "payload": element_dict,
         }
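
For review purposes, here is a minimal stand-alone sketch of what the simplified `normalize_dict` would return. It is written as a free function (no `self`) so it runs without the connector class, and the sample element is hypothetical, not taken from this PR.

```python
import uuid


def normalize_dict(element_dict: dict) -> dict:
    """Stand-alone copy of the new method body, for illustration only."""
    return {
        "id": str(uuid.uuid4()),
        # Dict entries are evaluated in order, so "embeddings" is popped
        # before the "payload" entry reads element_dict.
        "vector": element_dict.pop("embeddings", {}),
        # Everything left in the element is uploaded as the payload.
        "payload": element_dict,
    }


# Hypothetical element dict (assumed shape, not real pipeline output).
element = {
    "type": "NarrativeText",
    "text": "Hello",
    "embeddings": [0.1, 0.2],
    "metadata": {"filename": "a.pdf", "languages": ["eng"]},
}

point = normalize_dict(element)
print(point["vector"])
print(point["payload"])
```

Two things worth noting in review: the payload is the same object as the input dict (which is mutated by `pop`), and nested fields such as `metadata` stay nested rather than being flattened into `-`-separated keys as before. Any downstream filter that relied on the old flattened keys or on `element_serialized` would need to be checked.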