Running MERGE INTO with more than one WHEN condition fails if the number of columns in the target table is > 321 #10294

Open
andreaschiappacasse opened this issue May 9, 2024 · 2 comments
Labels
AWS bug Something isn't working

andreaschiappacasse commented May 9, 2024

Apache Iceberg version

None

Query engine

Athena (engine v3)

Please describe the bug 🐞

Hello everyone, today my team ran into a very strange bug using Iceberg via Athena. I'll describe the steps we used to reproduce the error below:

1. We create an Iceberg table with an "id" column and 321 other columns with random string names (322 columns in total). In the example below we use awswrangler to create the table, but the same happens when the table is created using Athena directly.

import awswrangler as wr
import pandas as pd
import random, string

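NUM_COLS = 322  # total column count: "id" plus 321 randomly named columns

def get_random_string(length):
    """Return a random lowercase ASCII string of the given length."""
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(length))

# A single row whose values equal the column names, just so the table has data
columns = ['id'] + [get_random_string(5) for _ in range(NUM_COLS - 1)]
data = pd.DataFrame(data=[columns], columns=columns)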


wr.athena.to_iceberg(
    data,
    workgroup="my-workgroup",
    database="my_database",
    table="iceberg_limits_322",
    table_location="s3://my_bucket/iceberg_limits",
)

2. We then run the following query in Athena to insert a random value:

MERGE INTO my_database.iceberg_limits_322 as existing 
using (
	SELECT 'something' as id
) as new on existing.id = new.id
WHEN NOT MATCHED
THEN INSERT (id) VALUES (new.id)
WHEN MATCHED THEN DELETE

3. The query fails with the following error:

[ErrorCode: INTERNAL_ERROR_QUERY_ENGINE] Amazon Athena experienced an internal error while executing this query. Please contact AWS support for further assistance. You will not be charged for this query. We apologize for the inconvenience.

Note that the error only occurs when the MERGE INTO query contains more than one WHEN clause. With a single WHEN clause (either only inserting or only deleting records) everything works fine, and the table can be used normally.
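To compare the single-WHEN and multi-WHEN cases side by side, the three query variants can be generated from the same template. This is only a sketch: `build_merge_sql` and the variant names are hypothetical helpers introduced here, and the code only builds the SQL text; submitting it to Athena is left to whatever client you already use.

```python
def build_merge_sql(table: str, clauses: list[str]) -> str:
    """Build the MERGE INTO statement from the report with the given WHEN clauses."""
    if not clauses:
        raise ValueError("MERGE needs at least one WHEN clause")
    head = (
        f"MERGE INTO {table} AS existing\n"
        "USING (SELECT 'something' AS id) AS new\n"
        "ON existing.id = new.id\n"
    )
    return head + "\n".join(clauses)


# The two clauses used in the report, kept in the same order.
INSERT_CLAUSE = "WHEN NOT MATCHED THEN INSERT (id) VALUES (new.id)"
DELETE_CLAUSE = "WHEN MATCHED THEN DELETE"

TABLE = "my_database.iceberg_limits_322"

# Hypothetical variant names; only the last one matches the failing query above.
variants = {
    "insert_only": build_merge_sql(TABLE, [INSERT_CLAUSE]),
    "delete_only": build_merge_sql(TABLE, [DELETE_CLAUSE]),
    "insert_and_delete": build_merge_sql(TABLE, [INSERT_CLAUSE, DELETE_CLAUSE]),
}

for name, sql in variants.items():
    print(f"-- {name}\n{sql}\n")
```

Running each variant against the same table should make it easy to confirm which combinations trigger the internal error.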

We can replicate this behaviour on multiple AWS accounts and with different tables/databases/s3 locations.

After trying different numbers of columns, we consistently found that 321 is the maximum number of columns the table can have. Tables with 321 columns or fewer work fine.
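For anyone who wants to locate the threshold on their own account, a bisection over the column count is one way to do it. The sketch below is not how we necessarily searched; it assumes that failures are monotonic (once a width fails, every larger width fails too) and takes a hypothetical `works(num_cols)` callback that would create a table of that width, run the MERGE, and return whether it succeeded. A stand-in callback is used here so the example runs without AWS access.

```python
from typing import Callable


def max_working_columns(works: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest n in [lo, hi) for which works(n) is True.

    Assumes works(lo) is True, works(hi) is False, and that failures are
    monotonic in the column count.
    """
    if not works(lo) or works(hi):
        raise ValueError("need works(lo) == True and works(hi) == False")
    # Invariant: works(lo) is True and works(hi) is False.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(mid):
            lo = mid
        else:
            hi = mid
    return lo


def fake_probe(num_cols: int) -> bool:
    """Stand-in for the real check; hard-codes the threshold reported above."""
    return num_cols <= 321


limit = max_working_columns(fake_probe, lo=2, hi=1000)
print(limit)  # 321 with this stand-in probe
```

With a real probe, each call costs one table creation and one query, so the search needs roughly ten probes for a range of 1000 columns.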

andreaschiappacasse added the bug (Something isn't working) label May 9, 2024
andreaschiappacasse changed the title from "Running MERGE INTO with more than one condition fails if number of columns is > 321" to "Running MERGE INTO with more than one WHEN condition fails if the number of columns in the target table is > 321" May 10, 2024
andreaschiappacasse (Author) commented

Possibly something similar to trinodb/trino#15848?

nastra added the AWS label May 11, 2024
andreaschiappacasse (Author) commented

Update: it seems that even a MERGE INTO with a single WHEN NOT MATCHED THEN INSERT clause fails, provided the table is wide enough (in our case 633 columns).
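To reproduce at other widths (for example 633 columns) without awswrangler, the table can also be created from Athena with generated DDL. The sketch below only builds the statement text. The helper name `build_create_table_sql`, the sequential `col_NNNN` column names (used instead of random strings so names cannot collide), and the table name and S3 location for the 633-column case are all assumptions made for illustration.

```python
def build_create_table_sql(database: str, table: str, location: str, num_cols: int) -> str:
    """Build an Athena CREATE TABLE statement for an Iceberg table with num_cols string columns."""
    if num_cols < 1:
        raise ValueError("num_cols must be at least 1 (the id column)")
    # "id" plus num_cols - 1 sequentially named columns, all of type string.
    cols = ["id string"] + [f"col_{i:04d} string" for i in range(1, num_cols)]
    column_list = ",\n  ".join(cols)
    return (
        f"CREATE TABLE {database}.{table} (\n  {column_list}\n)\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


# Hypothetical table name and location for the 633-column case.
ddl = build_create_table_sql(
    database="my_database",
    table="iceberg_limits_633",
    location="s3://my_bucket/iceberg_limits_633",
    num_cols=633,
)
print(ddl[:200])
```

Changing `num_cols` gives a table of any width, which can then be tested with either the single-clause or the two-clause MERGE.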
