BUG: Creating batches with more than one class breaks on v4 #1005

Open
glesperance opened this issue Apr 10, 2024 · 3 comments
glesperance commented Apr 10, 2024

The only way to insert objects with v4 is using collection.batch and its associated _ContextManagerWrapper.
Unfortunately, this breaks when two collection batches are open at the same time: the underlying grpcio runs into resource contention, producing the exceptions listed below.

The outcome is unpredictable because the exceptions are only printed, not raised. Sometimes the batches hang indefinitely, sometimes they complete fully, and sometimes, as shown below, they complete only partially.
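The "printed but not raised" behaviour matches how asyncio treats exceptions thrown inside event-loop callbacks: they go to the loop's exception handler (which logs "Exception in callback ..." by default) rather than propagating to the calling code. The sketch below reproduces that with plain asyncio only; `failing_callback` is a hypothetical stand-in and no grpcio or weaviate code is involved.

```python
import asyncio


def failing_callback():
    # Hypothetical stand-in for a library callback that fails inside the loop.
    raise BlockingIOError(35, "Resource temporarily unavailable")


async def main(captured):
    loop = asyncio.get_running_loop()
    # Collect callback failures in a list instead of letting asyncio log them.
    loop.set_exception_handler(lambda loop, context: captured.append(context))
    loop.call_soon(failing_callback)
    await asyncio.sleep(0)  # yield once so the scheduled callback gets to run
    return "main finished normally"


captured = []
result = asyncio.run(main(captured))
print(result)                    # main() was never interrupted
print(captured[0]["message"])    # the report asyncio would otherwise have logged
print(repr(captured[0]["exception"]))
```

Because the caller never sees the exception, the only reliable signal after such a run is to inspect the data that actually arrived.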

With v3 it used to be possible to insert, in batches, objects from different classes. It would be great to continue supporting this use case.

import weaviate
import weaviate.classes as wvc
import weaviate.util

print("Weaviate version:", weaviate.__version__)

client = weaviate.connect_to_local()

client.collections.delete(["A", "B"])
client.collections.create("A")
client.collections.create("B")

a_collection = client.collections.get("A")
b_collection = client.collections.get("B")

print({
  "A": a_collection.aggregate.over_all(),
  "B": b_collection.aggregate.over_all()
})
print("-" * 80)


a_collection.config.add_property(wvc.config.Property(name="a_name", data_type=wvc.config.DataType.TEXT))
a_collection.config.add_reference(wvc.config.ReferenceProperty(name="bRef", target_collection="B"))

b_collection.config.add_property(wvc.config.Property(name="b_name", data_type=wvc.config.DataType.TEXT))
b_collection.config.add_reference(wvc.config.ReferenceProperty(name="aRef", target_collection="A"))

with a_collection.batch.dynamic() as a_batch:
  with b_collection.batch.dynamic() as b_batch:
    for i in range(1000):
      a_batch.add_object(properties={"a_name": "test"}, 
                          uuid=weaviate.util.generate_uuid5(i))
      b_batch.add_object(properties={"b_name": "test"}, 
                          uuid=weaviate.util.generate_uuid5(i))
      

print("-" * 80)
print({
  "A": a_collection.aggregate.over_all(),
  "B": b_collection.aggregate.over_all()
})

Output:

Weaviate version: 4.5.5
{'A': AggregateReturn(properties={}, total_count=0), 'B': AggregateReturn(properties={}, total_count=0)}
--------------------------------------------------------------------------------
Exception in callback PollerCompletionQueue._handle_events(<_UnixSelecto...e debug=False>)()
handle: <Handle PollerCompletionQueue._handle_events(<_UnixSelecto...e debug=False>)()>
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.12_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi", line 147, in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [Errno 35] Resource temporarily unavailable
Exception in callback PollerCompletionQueue._handle_events(<_UnixSelecto...e debug=False>)()
handle: <Handle PollerCompletionQueue._handle_events(<_UnixSelecto...e debug=False>)()>
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.12_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi", line 147, in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [Errno 35] Resource temporarily unavailable
Exception in callback PollerCompletionQueue._handle_events(<_UnixSelecto...e debug=False>)()
handle: <Handle PollerCompletionQueue._handle_events(<_UnixSelecto...e debug=False>)()>
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.12_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi", line 147, in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [Errno 35] Resource temporarily unavailable
Exception in callback PollerCompletionQueue._handle_events(<_UnixSelecto...e debug=False>)()
handle: <Handle PollerCompletionQueue._handle_events(<_UnixSelecto...e debug=False>)()>
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.12_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi", line 147, in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [Errno 35] Resource temporarily unavailable
--------------------------------------------------------------------------------
{'A': AggregateReturn(properties={}, total_count=1000), 'B': AggregateReturn(properties={}, total_count=805)}
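Since nothing is raised, a partial import like the 805 of 1000 above is easy to miss. A minimal sketch of a post-import check follows; `check_counts` is a hypothetical helper (not part of the weaviate client), and it relies only on `aggregate.over_all().total_count`, the same call used in the script above.

```python
def check_counts(collections, expected):
    """Return {name: (actual, expected)} for every collection whose count is off.

    `collections` maps a label to a collection handle such as the ones
    returned by client.collections.get(...). Hypothetical helper.
    """
    mismatches = {}
    for name, collection in collections.items():
        actual = collection.aggregate.over_all().total_count
        if actual != expected:
            mismatches[name] = (actual, expected)
    return mismatches


# Intended use after the batches have closed (needs a running Weaviate instance):
# problems = check_counts({"A": a_collection, "B": b_collection}, expected=1000)
# if problems:
#     raise RuntimeError(f"incomplete import: {problems}")
```

An empty result means every collection holds the expected number of objects.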
glesperance changed the title from "BUG: Creating batches with more than one class break on v4" to "BUG: Creating batches with more than one class breaks on v4" on Apr 10, 2024
tsmith023 commented Apr 11, 2024

Hi @glesperance, thanks for raising this one! We did not intend for collection-level batches to be run simultaneously. What you're seeing is the underlying multi-threaded algorithms colliding, since each collection-level batch handles its own separate resources internally.

> The only way to insert objects with v4 is using collection.batch and its associated _ContextManagerWrapper.

This is not true. To accomplish your use-case, you should use client-level batching instead like so:

with client.batch.dynamic() as batch:
    # a single client-level batch serves both collections
    for i in range(1000):
        batch.add_object(
            collection=a_collection.name,
            properties={"a_name": "test"},
            uuid=weaviate.util.generate_uuid5(i)
        )
        batch.add_object(
            collection=b_collection.name,
            properties={"b_name": "test"},
            uuid=weaviate.util.generate_uuid5(i)
        )

as described here. Cheers 😁
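For completeness, here is a hedged sketch of the client-level approach wrapped in a function that also reports failures. `insert_pairs` is a hypothetical helper name; `uuid.uuid5` is a stdlib stand-in for `weaviate.util.generate_uuid5` so the block has no third-party imports (the generated IDs are not guaranteed to match); and reading `client.batch.failed_objects` after the context exits is an assumption about the v4 client that should be checked against the installed version.

```python
import uuid


def insert_pairs(client, n=1000):
    """Add n objects to each of "A" and "B" through one client-level batch."""
    with client.batch.dynamic() as batch:
        for i in range(n):
            # stdlib stand-in for weaviate.util.generate_uuid5(i)
            object_id = str(uuid.uuid5(uuid.NAMESPACE_DNS, str(i)))
            batch.add_object(
                collection="A", properties={"a_name": "test"}, uuid=object_id
            )
            batch.add_object(
                collection="B", properties={"b_name": "test"}, uuid=object_id
            )
    # Assumption: the client exposes failed imports here once the batch closes.
    return client.batch.failed_objects
```

Called as `insert_pairs(client)` with the client from the original script, a non-empty return value would indicate objects that did not make it in.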

@glesperance
Author

I certainly missed that part of the documentation. Thanks for pointing this out. Do we want to leave this open for the nested collection batch edge case?

tsmith023 commented Apr 11, 2024

Yes, I will look into throwing an exception if multiple batches run in a nested setup. Thanks for the report!
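Until the client raises on its own, a caller-side guard can turn the silent failure into a loud one. This is only a sketch of the idea, not how the library does or will implement it: `exclusive_batch`, `NestedBatchError`, and the module-level lock are all hypothetical names, and `open_batch` is assumed to be a zero-argument callable returning a context manager, such as `a_collection.batch.dynamic`.

```python
import threading
from contextlib import contextmanager

# Hypothetical process-wide guard; not part of the weaviate client.
_batch_lock = threading.Lock()


class NestedBatchError(RuntimeError):
    """Raised when a batch is opened while another one is still active."""


@contextmanager
def exclusive_batch(open_batch):
    # Refuse immediately instead of waiting, so nesting fails fast.
    if not _batch_lock.acquire(blocking=False):
        raise NestedBatchError(
            "another batch is already running; "
            "use client.batch to write to several collections"
        )
    try:
        with open_batch() as batch:
            yield batch
    finally:
        _batch_lock.release()
```

With this wrapper, `with exclusive_batch(a_collection.batch.dynamic) as a_batch:` works as before, while opening a second `exclusive_batch(...)` inside it raises `NestedBatchError`. Note that a plain lock also rejects batches started concurrently from other threads, which may or may not be desired.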
