[Bug]: Loading a bz2-compressed file in the TKE environment runs for almost an hour without succeeding #16250

Open
1 task done
heni02 opened this issue May 20, 2024 · 7 comments
Labels
kind/bug Something isn't working severity/s0 Extreme impact: Cause the application to break down and seriously affect the use


heni02 commented May 20, 2024

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch Name

main

Commit ID

6ff549f

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

job: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9154938692
[screenshot: 企业微信截图_af96b052-9ab3-4a63-857a-6b3cae91c033]

mo log:
https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22XGF%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240520%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221716190082822%22,%22to%22:%221716193620430%22%7D%7D%7D&schemaVersion=1&orgId=1

profile:
2024-05-20_16_00_31.zip

Expected Behavior

No response

Steps to Reproduce

ddl:
create table table_100_columns(
clo1 tinyint,
clo2 smallint,
clo3 int,
clo4 bigint,
clo5 tinyint unsigned,
clo6 smallint unsigned,
clo7 int unsigned,
clo8 bigint unsigned,
col9 float,
col10 double,
col11 varchar(255),
col12 Date,
col13 DateTime,
col14 timestamp,
col15 bool,
col16 decimal(5,2),
col17 text,
col18 varchar(255),
col19 varchar(255),
col20 varchar(255),
col21 varchar(255),
col22 varchar(255),
col23 varchar(255),
col24 varchar(255),
col25 varchar(255),
col26 varchar(255),
col27 varchar(255),
col28 varchar(255),
col29 varchar(255),
col30 varchar(255),
col31 varchar(255),
col32 varchar(255),
col33 varchar(255),
col34 varchar(255),
col35 varchar(255),
col36 varchar(255),
col37 varchar(255),
col38 varchar(255),
col39 varchar(255),
col40 varchar(255),
col41 varchar(255),
col42 varchar(255),
col43 varchar(255),
col44 varchar(255),
col45 varchar(255),
col46 varchar(255),
col47 varchar(255),
col48 varchar(255),
col49 varchar(255),
col50 varchar(255),
col51 varchar(255),
col52 varchar(255),
col53 varchar(255),
col54 varchar(255),
col55 varchar(255),
col56 varchar(255),
col57 varchar(255),
col58 varchar(255),
col59 varchar(255),
col60 varchar(255),
col61 varchar(255),
col62 varchar(255),
col63 varchar(255),
col64 varchar(255),
col65 varchar(255),
col66 varchar(255),
col67 varchar(255),
col68 varchar(255),
col69 varchar(255),
col70 varchar(255),
col71 varchar(255),
col72 varchar(255),
col73 varchar(255),
col74 varchar(255),
col75 varchar(255),
col76 varchar(255),
col77 varchar(255),
col78 varchar(255),
col79 varchar(255),
col80 varchar(255),
col81 varchar(255),
col82 varchar(255),
col83 varchar(255),
col84 varchar(255),
col85 varchar(255),
col86 varchar(255),
col87 varchar(255),
col88 varchar(255),
col89 varchar(255),
col90 varchar(255),
col91 varchar(255),
col92 varchar(255),
col93 varchar(255),
col94 varchar(255),
col95 varchar(255),
col96 varchar(255),
col97 varchar(255),
col98 varchar(255),
col99 varchar(255),
col100 varchar(255)
);
load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com','access_key_id'='***','secret_access_key'='***','bucket'='mo-load-guangzhou-1308875761', 'filepath'='compressed_file/100000000_100_columns_load_data.csv.bz2', 'compression'='bz2'} into table test.table_100_columns fields terminated by ',' lines terminated by '\n' parallel 'true';

Additional information

No response

@heni02 heni02 added kind/bug Something isn't working severity/s0 Extreme impact: Cause the application to break down and seriously affect the use labels May 20, 2024
@heni02 heni02 added this to the 1.2.1 milestone May 20, 2024

jensenojs commented May 22, 2024

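Judging from the logs, there is a checksum problem.

[image]

I tried this myself with the test file from the PR below: importing a file compressed with bzip2 fails, while importing one created with tar -j works fine.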
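The checksum problem described above can be checked offline before the file is uploaded. The following is a minimal sketch, not part of MatrixOne: it uses Python's standard `bz2` module, and the helper name `check_bz2` is hypothetical. It decompresses the file in chunks so that even a very large archive is never held in memory, and reports whether the stream is invalid or truncated.

```python
import bz2


def check_bz2(path, chunk_size=1 << 20):
    """Stream-decompress `path` and return (ok, detail).

    Hypothetical helper for illustration only.
    """
    total = 0
    try:
        with bz2.open(path, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError) as exc:
        # OSError: invalid bzip2 data, including CRC mismatches
        #          (also raised if the file cannot be opened).
        # EOFError: the file ends before the end-of-stream marker.
        name = type(exc).__name__
        return False, f"{name} after {total} decompressed bytes: {exc}"
    return True, f"{total} decompressed bytes"
```

Calling `check_bz2("100000000_100_columns_load_data.csv.bz2")` on a local copy of the file would show whether the archive itself is damaged, independently of the load path.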


heni02 commented May 22, 2024

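I compressed a 100,000-row file with the bzip2 command, and the load succeeded.
[screenshot: 企业微信截图_bff8cde2-3096-4592-8ae3-476e3ae90e17]
[screenshot: 企业微信截图_ef133b3b-0f3f-41ff-80a2-eb5adf4e5778]
The 100-million-row file was probably compressed incorrectly; I will recompress it and try again.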
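One way to rule out a bad archive when recompressing is to verify the round trip immediately afterwards. The sketch below is an assumption about how that could be done, not the procedure used in this issue: it uses only the Python standard library, and the names `sha256_of` and `recompress_and_verify` are hypothetical. It compresses the CSV in streaming fashion, then compares the SHA-256 of the original with the SHA-256 of the decompressed output.

```python
import bz2
import hashlib
import shutil


def sha256_of(fileobj, chunk_size=1 << 20):
    """Hash a binary file object chunk by chunk."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def recompress_and_verify(csv_path, bz2_path):
    """Compress csv_path to bz2_path; return True if the round trip matches."""
    # Stream the CSV into a bzip2 file without loading it into memory.
    with open(csv_path, "rb") as src, bz2.open(bz2_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Compare the original bytes with what the archive decompresses to.
    with open(csv_path, "rb") as src:
        expected = sha256_of(src)
    with bz2.open(bz2_path, "rb") as packed:
        actual = sha256_of(packed)
    return expected == actual
```

If the function returns `True`, the archive is a faithful bzip2 encoding of the CSV, so a subsequent load failure would point at the load path rather than at the file.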


heni02 commented May 22, 2024

The log reports the error bzip2 data invalid, but the frontend never returns an error. This needs to be fixed.
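The behavior asked for here, where a decompression failure in a worker reaches the caller instead of only appearing in the log, can be illustrated in general terms. The sketch below is not MatrixOne's implementation (which is written in Go); it is a Python analogy under stated assumptions: each element of `parts` is treated as an independent bzip2 stream, and `LoadError`, `decode_part` and `parallel_load` are hypothetical names.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor, as_completed


class LoadError(Exception):
    """Hypothetical error type surfaced to the client."""


def decode_part(part):
    """Decompress one independent bzip2 stream and count its rows."""
    return bz2.decompress(part).count(b"\n")


def parallel_load(parts, workers=4):
    """Decode all parts in parallel; raise LoadError on the first failure."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(decode_part, p) for p in parts]
        total = 0
        try:
            for fut in as_completed(futures):
                # result() re-raises any exception from the worker here,
                # so the failure cannot stay hidden in a log line.
                total += fut.result()
        except (OSError, ValueError) as exc:
            for f in futures:
                f.cancel()  # skip work that has not started yet
            raise LoadError(f"load failed: {exc}") from exc
    return total
```

The design point is that the caller waits on every worker result and converts the first failure into an error it returns, rather than letting the statement hang while the workers have already given up.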

@heni02 heni02 assigned jensenojs and unassigned heni02 May 22, 2024
@jensenojs

Same as above.

@aressu1985 aressu1985 modified the milestones: 1.2.1, 1.3.0 May 30, 2024
@jensenojs

Same as above.

@jensenojs

Same as above.

@jensenojs

No progress.
