bug(python): offset overflow when issing table update #1291

Open
alexkohler opened this issue May 10, 2024 · 4 comments
Labels
bug Something isn't working

@alexkohler
Contributor

LanceDB version

0.6.8

What happened?

I seem to be running into an offset overflow when issuing an update spanning my entire table:

>>> import lancedb
>>> import os
>>> os.environ["RUST_BACKTRACE"] = "1"  # environment values must be strings
>>> db = lancedb.connect("/data/lance_local")
>>> table = db.open_table("05-10-testing")
>>> print(len(table))
92787275
>>> table.update(where="test_train_split = 'test'", values={"test_train_split": 'TEST'})

thread 'lance_background_thread' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-50.0.0/src/transform/utils.rs:42:56:
offset overflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/akohler/.local/lib/python3.10/site-packages/lancedb/table.py", line 1600, in update
    self._dataset_mut.update(values_sql, where)
  File "/home/akohler/.local/lib/python3.10/site-packages/lance/dataset.py", line 995, in update
    self._ds.update(updates, where)
OSError: LanceError(IO): Execution error: External error: Execution error: ExecNode(Take): thread panicked: task 1103859 panicked, /home/runner/work/lance/lance/rust/lance-datafusion/src/chunker.rs:58:46
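
For context on the panic message: in arrow-data, `offset overflow` is raised while appending variable-length values (such as strings) to an array with 32-bit offsets, once the running byte total would pass `i32::MAX` (2,147,483,647). The thread does not establish which column or code path hit this, so the following is only a rough back-of-the-envelope sketch, assuming that all 92,787,275 values of one string column end up in a single 32-bit-offset array:

```python
# Back-of-the-envelope check for Arrow 32-bit offset overflow.
# Assumption (not confirmed in this issue): every value of one string
# column is gathered into a single Utf8 array with int32 offsets.
# Arrow's large_string type uses int64 offsets and would not hit this limit.

I32_MAX = 2**31 - 1  # largest end offset an int32-offset array can store


def max_avg_bytes_per_value(num_rows: int) -> float:
    """Average value size in bytes above which num_rows values overflow."""
    return I32_MAX / num_rows


def first_overflow_index(value_lengths):
    """Index of the first value whose end offset passes I32_MAX, else None."""
    offset = 0
    for i, length in enumerate(value_lengths):
        offset += length
        if offset > I32_MAX:
            return i
    return None


rows = 92_787_275  # len(table) from the session above
print(f"{max_avg_bytes_per_value(rows):.2f} bytes per value on average")
```

Under that assumption, an average of roughly 23 bytes per value would already overflow. If the update also reads the table's other columns (the table has 14), the overflowing array could belong to any wide string column rather than `test_train_split`.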

Is there any guidance on how to work around this, for example by applying my updates in smaller batches? Happy to provide additional info.
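
One way to split the update, sketched under assumptions that do not come from this report: the table has a hypothetical integer column `id` whose values lie in `[0, stop)`, and each chunk is applied with the same `table.update(where=..., values=...)` call used above. Whether smaller updates actually avoid the panic is untested, and each call commits its own table version.

```python
# Hypothetical sketch: run one large update as several range-restricted updates.
# Assumptions: `key_column` is an integer column (e.g. a hypothetical "id")
# with values in [start, stop), and `table.update(where=..., values=...)`
# behaves like the LanceDB call in the session above.


def batched_predicates(base_where, key_column, stop, step, start=0):
    """Yield SQL filters that AND base_where with consecutive key ranges."""
    for lo in range(start, stop, step):
        hi = min(lo + step, stop)
        yield f"({base_where}) AND {key_column} >= {lo} AND {key_column} < {hi}"


def update_in_batches(table, base_where, values, key_column, stop, step, start=0):
    """Issue one table.update per key range; return the number of calls made."""
    calls = 0
    for where in batched_predicates(base_where, key_column, stop, step, start):
        table.update(where=where, values=values)
        calls += 1
    return calls
```

For the update in this report, that would be `update_in_batches(table, "test_train_split = 'test'", {"test_train_split": "TEST"}, "id", 92_787_275, 10_000_000)`, again assuming a hypothetical `id` column running from 0 to the row count.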

Are there known steps to reproduce?

No response

@alexkohler alexkohler added the bug Something isn't working label May 10, 2024
@alexkohler alexkohler changed the title from "bug(python): offset overflow when issing tbale update" to "bug(python): offset overflow when issing table update" May 10, 2024
@wjones127
Contributor

It's unclear where this is coming from.

I tried to reproduce this in Lance, but it worked fine. Are there any important details I might be missing? Here's my script:

import lance
import pyarrow as pa
import random
import string
import tqdm

def rand_string(n):
    # Random lowercase alphanumeric string of length n
    return ''.join(random.choices(string.ascii_lowercase + string.digits, k=n))

# Create a batch with 100MB of string data 
data = pa.table({
    "text": pa.array([rand_string(100 * 1024) for _ in range(1024)]),
})

# Write over 5GB of data
for _ in tqdm.tqdm(range(500)):
    ds = lance.write_dataset(data, "test", mode="append")

# Try running an update query
ds.update(updates={"text": "'hello'"}, where="text = '{}'".format(data['text'][0].as_py()))

@alexkohler
Contributor Author

Hm, my table is pretty wide (14 columns). Would that potentially come into play when walking through the pages during the update?

@wjones127
Copy link
Contributor

It's possible it has something to do with that. Could you share the operations you used to write your table? That would help me figure out how to reproduce this.

@alexkohler
Copy link
Contributor Author

This table has grown a lot over time (and hence has a lot of versions, although I've periodically cleaned those up using cleanup_old_versions(...)). Also possibly relevant: the column where I ran into this is the only one I added using add_columns and then backfilled with an update call. Unfortunately I can't share the table, but beyond that I'm happy to help debug this in any way I can!
