
[Bug] [Sink] Bug Hive insert error org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs:/db/xx/dt_mon=xxxx/qwxsadas12321321.parquet. Column: [xxx ], Expected: decimal(12,2), Found: FIXED_LEN_BYTE_ARRAY #6750

AdkinsHan opened this issue Apr 24, 2024 · 8 comments

@AdkinsHan

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

  • Inserting data completes normally and the partition information displays correctly. However, when I execute a simple SELECT, an error occurs.

  • org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs://xxxx/user/hive/warehouse/xxxx.db/xxxx/dt_mon=2024-04/xxxx.parquet. Column: [xxx], Expected: decimal(12,2), Found: FIXED_LEN_BYTE_ARRAY

  • This did not happen in the earlier version, seatunnel-1.5.7.

  • Can you restore the behavior of an earlier version such as seatunnel-1.5.7 to solve the parquet partition field problem?

   Hive {
     sql = """
     insert overwrite table datadwd.xxx partition(dt_year)
     select * from xxxx
     """
   }

SeaTunnel Version

2.3.4

SeaTunnel Config

env {
  execution.parallelism = 4
  job.mode = "BATCH"
  # job.name has no effect on the Spark engine; set the job name in the shell script instead (dropped here)
  spark.sql.catalogImplementation = "hive"
}

source {
  Jdbc {
    # in version 2.3.4 this option is `query`
    url = "jdbc:oracle:thin:@xxxxxx:1521:xxxxDB"
    driver = "oracle.jdbc.OracleDriver"
    user = "xxxx"
    password = "xxxx"
    query = """
        select xxxx from dual
    """
    result_table_name = "dddd_mm"
  }
}

transform {

}

sink {
   Hive {
     # standard form: table_name and metastore_uri are required
     table_name = "data.dddd_mm"
     metastore_uri = "thrift://xxxx:9083"
   }
}

Running Command

sh /data/seatunnel/seatunnel-2.3.4/bin/start-seatunnel-spark-3-connector-v2.sh \
  --master yarn \
  --deploy-mode cluster \
  --queue xxxx \
  --executor-instances 2 \
  --executor-cores 6 \
  --executor-memory 6g \
  --name "h010-xxxxx" \
  --config /data/ghyworkbase/seatunnel/H02-01-ODS_CONF-2.3.4/h010-xxxxx.conf

Error Exception

org.apache.kyuubi.KyuubiSQLException: org.apache.kyuubi.KyuubiSQLException: Error operating ExecuteStatement: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 847.0 failed 4 times, most recent failure: Lost task 3.3 in stage 847.0 (TID 14569) (datalake136.ghy.com.cn executor 1): org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs://xxxx/user/hive/warehouse/xxx.db/xxx/dt_mon=2024-04/xx.parquet. Column: [xx], Expected: decimal(12,2), Found: FIXED_LEN_BYTE_ARRAY
    at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:706)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:278)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator. ...

Zeta or Flink or Spark Version

spark-3.3.0

Java or Scala Version

/jdk/jdk1.8.0_341

Screenshots


Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

AdkinsHan added the bug label on Apr 24, 2024
@CheneyYin
Contributor

Please provide the schemas of the Oracle table and the Hive table.

@AdkinsHan
Author

AdkinsHan commented Apr 26, 2024

Hive table below; the Oracle table has the same columns.
-- drop table if exists ghydata.ods_webcsmsgl_ecsms_visit_info_mi;
CREATE TABLE IF NOT EXISTS ghydata.ods_webcsmsgl_ecsms_visit_info_mi
(
    isvisit   string         COMMENT '',
    clwz      string         COMMENT '',
    content   string         COMMENT '',
    plan      string         COMMENT ' ',
    cuser     string         COMMENT 'creator',
    cdate     timestamp      COMMENT 'creation time',
    zhanbi    decimal(12, 2) COMMENT 'proportion',
    entertime timestamp      COMMENT 'enter time',
    leavetime timestamp      COMMENT 'leave time'
)
COMMENT 'SFA-'
PARTITIONED BY (dt_mon string COMMENT 'monthly snapshot incremental table')
STORED AS PARQUET;

@CheneyYin
Contributor

It's better to provide the definition of the "zhanbi" field in the Oracle table. It seems like there might be an issue with the type conversion of the Oracle connector.

@AdkinsHan
Author

AdkinsHan commented Apr 26, 2024

But it works well on seatunnel-1.5.7, so I don't know what causes this.

Oracle definition below:

Column Name | # | Type | Type Mod | Not Null | Default | Comment
ZHANBI | 11 | NUMBER(12,2) | [NULL] | false | [NULL] | proportion

@CheneyYin
Contributor

case ORACLE_NUMBER:
    // NUMBER columns with scale 0 map to integral or unscaled decimal types
    if (scale == 0) {
        if (precision <= 9) {
            return BasicType.INT_TYPE;
        } else if (precision <= 18) {
            return BasicType.LONG_TYPE;
        } else if (precision <= 38) {
            return new DecimalType(38, 0);
        }
    }
    // any NUMBER with a non-zero scale falls through to the widest decimal
    return new DecimalType(38, 18);

Because the scale of the field ZHANBI is 2 (non-zero), none of the scale == 0 branches apply, so the field type is set to DecimalType(38, 18). A precision-38 decimal needs 16 bytes in Parquet, which is why the file schema below shows fixed_len_byte_array(16) rather than the layout Spark expects for the Hive column decimal(12, 2).

❯ parq ./T_836191201227964417_6d287c425d_0_1_0.parquet -s

 # Schema 
 <pyarrow._parquet.ParquetSchema object at 0x7ff161507b40>
required group field_id=-1 SeaTunnelRecord {
  optional fixed_len_byte_array(16) field_id=-1 f (Decimal(precision=38, scale=18));
}

You can use the sql transform to cast the field to decimal(12, 2), like this:

source {
  Jdbc {
    result_table_name = tbl
    driver = oracle.jdbc.driver.OracleDriver
    url = "jdbc:oracle:thin:@localhost:49161/xe"
    user = xxxxx
    password = xxx
    query = "select F from tbl"
    properties {
       database.oracle.jdbc.timezoneAsRegion = "false"
    }
  }
}

transform {
  sql {
    source_table_name = tbl
    result_table_name = t_tbl
    query = "select cast(F as decimal(12,2)) as F1 from tbl"
  }
}

sink {
  LocalFile {
    source_table_name = t_tbl
    path = "/tmp/hive/warehouse/test3"
    file_format_type = "parquet"
  }
}
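
For the reporter's setup, the same cast can be wired in front of the Hive sink. This is only a sketch under assumptions: the column list is taken from the Hive DDL above and presumes the Oracle query exposes the same columns, the intermediate table name dddd_mm_cast is invented for illustration, table_name and metastore_uri are copied from the config in the report, and it assumes the Hive sink accepts source_table_name the same way the LocalFile sink does in the example above.

transform {
  sql {
    source_table_name = "dddd_mm"
    result_table_name = "dddd_mm_cast"
    # cast only the decimal column; all other columns pass through unchanged
    query = """
      select isvisit, clwz, content, plan, cuser, cdate,
             cast(zhanbi as decimal(12,2)) as zhanbi,
             entertime, leavetime, dt_mon
      from dddd_mm
    """
  }
}

sink {
  Hive {
    source_table_name = "dddd_mm_cast"
    table_name = "data.dddd_mm"
    metastore_uri = "thrift://xxxx:9083"
  }
}

If the Oracle table has other NUMBER columns with a non-zero scale, cast each of them the same way; otherwise they will also land in the file as Decimal(38, 18), as shown above.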

@CheneyYin
Contributor

CheneyYin commented Apr 26, 2024

This issue has been solved by #5872. If you are able to upgrade, you can use the latest version 2.3.5, and then you don't need the sql transform.

@AdkinsHan
Author

Great, the problem is solved, which means that I can directly use decimal(38,18) or decimal(38,0) to handle numbers of any precision in the future.

@CheneyYin
Contributor

@hailin0 please close this issue.
