[Bug] [Sink] Hive insert error: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs:/db/xx/dt_mon=xxxx/qwxsadas12321321.parquet. Column: [xxx], Expected: decimal(12,2), Found: FIXED_LEN_BYTE_ARRAY #6750
Comments
Please provide the schema of the Oracle table and the Hive table.

Hive table below; the Oracle table has the same columns.

It's better to provide the definition of the "zhanbi" field in the Oracle table. It seems like there might be an issue with the type conversion of the Oracle connector.

But it works well on SeaTunnel 1.5.7, so I don't know what causes this. Oracle below:
Because of the scale of the field (see Lines 85 to 95 in cd4b30b), the file is written as decimal(38,18):

❯ parq ./T_836191201227964417_6d287c425d_0_1_0.parquet -s
# Schema
<pyarrow._parquet.ParquetSchema object at 0x7ff161507b40>
required group field_id=-1 SeaTunnelRecord {
  optional fixed_len_byte_array(16) field_id=-1 f (Decimal(precision=38, scale=18));
}

You can use SQL to cast the field:

source {
  Jdbc {
    result_table_name = tbl
    driver = oracle.jdbc.driver.OracleDriver
    url = "jdbc:oracle:thin:@localhost:49161/xe"
    user = xxxxx
    password = xxx
    query = "select F from tbl"
    properties {
      database.oracle.jdbc.timezoneAsRegion = "false"
    }
  }
}

transform {
  sql {
    source_table_name = tbl
    result_table_name = t_tbl
    query = "select cast(F as decimal(12,2)) as F1 from tbl"
  }
}

sink {
  LocalFile {
    source_table_name = t_tbl
    path = "/tmp/hive/warehouse/test3"
    file_format_type = "parquet"
  }
}
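To confirm the cast took effect, the written file's schema can be inspected the same way as the parq check above. A minimal sketch, assuming pyarrow is installed; the output file name is hypothetical:

import pyarrow.parquet as pq

# Read only the schema of the file the LocalFile sink wrote.
schema = pq.read_schema("/tmp/hive/warehouse/test3/<output-file>.parquet")
print(schema)
# F1 should now report decimal128(12, 2) instead of decimal128(38, 18).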
This issue has been solved in #5872. If you are considering an upgrade, you can also use the latest version.
Great, the problem is solved, which means that I can directly use decimal(38,18) or decimal(38,0) to process numbers of any precision in the future.
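One caveat worth illustrating (this is standard SQL/Parquet decimal semantics, not SeaTunnel-specific): decimal(38,s) holds at most 38 significant digits in total, so the integer headroom depends on the scale chosen. A small Python sketch:

from decimal import Decimal

# decimal(38,18): at most 38-18 = 20 integer digits plus 18 fractional digits.
max_38_18 = Decimal("9" * 20 + "." + "9" * 18)

# decimal(38,0): at most 38 integer digits and no fractional part.
max_38_0 = Decimal("9" * 38)

print(max_38_18)  # 99999999999999999999.999999999999999999
print(max_38_0)   # 99999999999999999999999999999999999999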
@hailin0 please close this issue. |
Search before asking
What happened
Inserting data works normally, and the partition information displays correctly. However, when I execute a simple SELECT, an error occurs.
org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs://xxxx/user/hive/warehouse/xxxx.db/xxxx/dt_mon=2024-04/xxxx.parquet. Column: [xxx], Expected: decimal(12,2), Found: FIXED_LEN_BYTE_ARRAY
But this did not happen in the earlier version, seatunnel-1.5.7.
Can the behavior of an earlier version such as seatunnel-1.5.7 be restored to solve the Parquet partition field problem?
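The mismatch can also be reproduced outside SeaTunnel. A minimal sketch, assuming pyarrow is available; the column name follows the "zhanbi" field discussed above and the file path is made up:

import decimal
import pyarrow as pa
import pyarrow.parquet as pq

# Parquet stores decimal(38,18) physically as FIXED_LEN_BYTE_ARRAY(16).
table = pa.table({"zhanbi": pa.array([decimal.Decimal("0.25")],
                                     type=pa.decimal128(38, 18))})
pq.write_table(table, "/tmp/mismatch.parquet")
print(pq.read_schema("/tmp/mismatch.parquet"))

# A reader that expects decimal(12,2) for this column, as the Hive table DDL
# declares, cannot convert the 16-byte physical values, which is why Spark
# raises "Parquet column cannot be converted ... Found: FIXED_LEN_BYTE_ARRAY".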
SeaTunnel Version
2.3.4
SeaTunnel Config
Running Command
sh /data/seatunnel/seatunnel-2.3.4/bin/start-seatunnel-spark-3-connector-v2.sh \
  --master yarn \
  --deploy-mode cluster \
  --queue xxxx \
  --executor-instances 2 \
  --executor-cores 6 \
  --executor-memory 6g \
  --name "h010-xxxxx" \
  --config /data/ghyworkbase/seatunnel/H02-01-ODS_CONF-2.3.4/h010-xxxxx.conf
Error Exception
Zeta or Flink or Spark Version
spark-3.3.0
Java or Scala Version
/jdk/jdk1.8.0_341
Screenshots
Are you willing to submit PR?
Code of Conduct