
[Bug] [S3File] [zeta-local] Error writing to S3File in version 2.3.4: java.lang.IllegalStateException: Connection pool shut down #6678

Closed
2 of 3 tasks
LeonYoah opened this issue Apr 10, 2024 · 20 comments · Fixed by #6717

@LeonYoah
Contributor

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

When a jdbc -> S3File job runs in local mode, the error occurs sporadically (some runs succeed, some fail); in cluster mode there is no problem:
image

I searched for this aws-sdk error ("Connection pool shut down"); a related issue is awslabs/amazon-sqs-java-messaging-lib#96.
That issue points to connection pools being shared across threads, so my initial guess is that the [aggregate commit] runs after [S3File] finishes its [sink] step: the [sink] closes the filesystem, so the [rename] inside [commit] fails because the connection pool is already shut down (a minimal sketch of this suspicion follows the attachment below). But then why is local mode flaky while cluster mode is fine?
I also tried local mode on 2.3.3, and the problem does not occur there.
error.txt
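
For illustration, a minimal, self-contained sketch of the suspected failure mode, assuming only that both the sink and the aggregated committer resolve the same s3a URI through Hadoop's FileSystem.get(); the class name, bucket, and paths are illustrative:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical reproduction of the suspicion above: FileSystem.get() caches
// instances by (scheme, authority, ugi), so the sink and the committer share
// one S3A client unless fs.<scheme>.impl.disable.cache is set to true.
public class SharedFsCacheDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        URI uri = URI.create("s3a://xugurtp/");

        FileSystem sinkFs = FileSystem.get(uri, conf);      // sink's handle
        FileSystem committerFs = FileSystem.get(uri, conf); // same cached object

        sinkFs.close(); // the sink finishes and closes "its" filesystem ...

        // ... and the committer's rename then fails with
        // java.lang.IllegalStateException: Connection pool shut down,
        // because the shared S3A HTTP client was already shut down.
        committerFs.rename(
                new Path("s3a://xugurtp/tmp/part-0.json"),
                new Path("s3a://xugurtp/out/part-0.json"));
    }
}
```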

SeaTunnel Version

2.3.4

SeaTunnel Config

```
env {
  execution.parallelism = 1
  job.mode = "BATCH"
}

source {
  Jdbc {
    result_table_name = "TEST_100W"
    query = "select * from SYSDBA.TEST_100W limit 10"
    fetch_size = 5000
    table_path = "SYSDBA.TEST_100W"
    driver = "com.mysql.cj.jdbc.Driver"
    url = "jdbc:mysql://10.28.23.xxx:3306/test"
    user = "test"
    password = "xxx"
  }
}

sink {
  S3File {
    source_table_name = "aa"
    path = "/xugurtp/seatunnel/tmp/6af80b38f3434aceb573cc65b9cd12216a/3918"
    bucket = "s3a://xugurtp"
    fs.s3a.endpoint = "http://10.28.23.xxx:9010"
    fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
    access_key = "xxxx"
    secret_key = "xxxxxxx"
    custom_filename = true
    file_name_expression = "output_params"
    file_format_type = "json"
    is_enable_transaction = false
    # sink_columns = ["ID"]
  }
}
```

Running Command

./bin/seatunnel.sh  -e local --config job/s3_sink.conf

Error Exception

2024-04-10 14:02:28,387 ERROR [.c.FileSinkAggregatedCommitter] [hz.main.generic-operation.thread-43] - commit aggregatedCommitInfo error, aggregatedCommitInfo = FileAggregatedCommitInfo(transactionMap={/tmp/seatunnel/seatunnel/830321799827816449/3eff006528/T_830321799827816449_3eff006528_0_1={/tmp/seatunnel/seatunnel/830321799827816449/3eff006528/T_830321799827816449_3eff006528_0_1/NON_PARTITION/output_params_0.json=/xugurtp/seatunnel/tmp/6af80b38f3434aceb573cc65b9cd12216a/3918/output_params_0.json}}, partitionDirAndValuesMap={}) 
java.lang.IllegalStateException: Connection pool shut down
        at com.amazonaws.thirdparty.apache.http.util.Asserts.check(Asserts.java:34) ~[aws-java-sdk-bundle-1.11.271.jar:?]
        at com.amazonaws.thirdparty.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:184) ~[aws-java-sdk-bundle-1.11.271.jar:?]
        at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:251) ~[aws-java-sdk-bundle-1.11.271.jar:?]

Zeta or Flink or Spark Version

Zeta (local mode)

Java or Scala Version

1.8

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@LeonYoah LeonYoah added the bug label Apr 10, 2024
@LeonYoah
Contributor Author

In addition, there is nothing wrong when debugging in IDEA; the error only happens on the server.

@LeonYoah
Contributor Author

@ruanwenjun this may be related to your issue #5903, about whether the checkpoint uses HDFS or a cached filesystem. I saw that you submitted #6039 and handled the cache problem there, but the checkpoint side did not, in this class: org.apache.seatunnel.engine.checkpoint.storage.hdfs.common.HdfsConfiguration
image

@LeonYoah
Contributor Author

And in this code (two screenshots attached): sometimes hadoopConf's getSchema method returns s3n instead of s3a, which means the s3a FileSystem the connector uses actually stays cached. A sketch of why the schema matters is below.
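
A minimal sketch of the consequence, assuming (as the screenshots suggest) that the Hadoop cache-disable key is derived from S3Conf's schema; the method and class names here are illustrative, not the exact SeaTunnel code:

```java
import org.apache.hadoop.conf.Configuration;

// If getSchema() returns "s3n", the cache is disabled for the wrong
// filesystem: fs.s3n.impl.disable.cache gets set, but the job actually
// talks to s3a, so the s3a FileSystem instance stays shared in the cache.
public class CacheKeyDemo {
    static Configuration buildHadoopConf(String schema) {
        Configuration hadoopConf = new Configuration();
        hadoopConf.setBoolean(String.format("fs.%s.impl.disable.cache", schema), true);
        return hadoopConf;
    }

    public static void main(String[] args) {
        Configuration conf = buildHadoopConf("s3n"); // the buggy case
        System.out.println(conf.get("fs.s3n.impl.disable.cache")); // true
        System.out.println(conf.get("fs.s3a.impl.disable.cache")); // null: still cached
    }
}
```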

@LeonYoah
Contributor Author

LeonYoah commented Apr 15, 2024

The s3n appears because the S3Conf obtained by AggregatedCommit does not go through this buildWithConfig method and instead uses DEFAULT_SCHEMA. Debugging suggests it is related to the DAG:

buildWithConfig method screenshot:

image

Screenshots of the S3Conf initialized from the DAG:

image

image

Screenshot of the hadoopConf and schema obtained by AggregatedCommit:

image

So far I have found two solutions (both sketched in code below):

1. Change DEFAULT_SCHEMA to s3a:

image

2. Set the cache-disable property in the configuration file:

image

But I am not familiar with the DAG and AggregatedCommit code, so I need help from the maintainers to find the root cause!
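
A code sketch of the two workarounds above, under the assumption that S3Conf keeps a DEFAULT_SCHEMA constant and builds a Hadoop Configuration from it; the names are illustrative:

```java
import org.apache.hadoop.conf.Configuration;

public class S3ConfWorkarounds {
    // Workaround 1: default the schema to "s3a" (instead of "s3n"), so any
    // code path that skips buildWithConfig() no longer picks the wrong scheme.
    private static final String DEFAULT_SCHEMA = "s3a";

    static String schemaOrDefault(String schemaFromConfig) {
        return schemaFromConfig != null ? schemaFromConfig : DEFAULT_SCHEMA;
    }

    // Workaround 2: force-disable the FileSystem cache for s3a in the Hadoop
    // configuration, so the committer never receives an already-closed
    // shared instance.
    static Configuration withCacheDisabled(Configuration hadoopConf) {
        hadoopConf.setBoolean("fs.s3a.impl.disable.cache", true);
        return hadoopConf;
    }
}
```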

@LeonYoah LeonYoah changed the title [Bug] [S3File] [zeta-local] There is an occasional problem with S3File writing. An error was reported [Bug] [S3File] [zeta-local] Write times wrong version 2.3.4 S3File: Java lang. An IllegalStateException: Connection pool shut down Apr 15, 2024
@LeonYoah LeonYoah changed the title [Bug] [S3File] [zeta-local] Write times wrong version 2.3.4 S3File: Java lang. An IllegalStateException: Connection pool shut down [Bug] [S3File] [zeta-local] Error writing to S3File in version 2.3.4:: Java lang. An IllegalStateException: Connection pool shut down Apr 15, 2024
@LeonYoah
Contributor Author

@EricJoy2048 I see that you have been working on the multi-table feature #6698 of the S3File connector recently. Have you ever seen the schema in the Hadoop conf obtained by the multi-table path be the default s3n instead of the s3a specified in the configuration file?

@EricJoy2048
Member

I'll look at that as soon as I can

@EricJoy2048
Member

  1. SeaTunnel Engine (Zeta) uses the HDFS API to write checkpoints to a FileSystem. The config is at $SEATUNNEL_HOME/conf/seatunnel.yaml, and the code is in the seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common package. I do not know the content of this config file in your environment, but I think we need to disable the cache in checkpoint storage too (a sketch follows below).
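
A minimal sketch of what "disable the cache in checkpoint storage" could look like in that package's configuration builder; this is illustrative, not the merged fix:

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointStorageConfSketch {
    static Configuration buildConfiguration(String schema) {
        Configuration conf = new Configuration();
        // Without this, the checkpoint writer and the file connector can share
        // one cached FileSystem; closing it in one place breaks the other.
        conf.setBoolean(String.format("fs.%s.impl.disable.cache", schema), true);
        return conf;
    }
}
```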

@EricJoy2048
Member

  2. I can show you how the FileSinkAggregatedCommitter is initialized:
(three screenshots)

@EricJoy2048
Member

image

@EricJoy2048
Member

> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached

Is this reproducible?

@LeonYoah
Contributor Author

> Is this reproducible?

Yes, I can now reproduce it 100% of the time. I downloaded the official SeaTunnel 2.3.4 package and plugins, deployed them on a server, and ran in local mode; the problem always appears there. But it cannot be reproduced when debugging locally in IDEA.

@EricJoy2048
Member

> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached

Can you try remote debugging?

@LeonYoah
Contributor Author

LeonYoah commented Apr 16, 2024

> Can you try remote debugging?

Yes, remote debugging shows that s3n is actually being passed:
image

@EricJoy2048
Member

> I noticed that this code is actually fine: calling the buildWithConfig method assigns the schema as s3a, but the key point is that when the whole sink is passed downstream to the multi-table sink, a deserialization step is performed.

I understand now. It is really a matter of serialization and deserialization of static variables. I don't think we should define static variables in classes that need to be serialized and deserialized (a sketch of the non-static shape is below). Can you put up a PR to fix this?
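
A sketch of the recommended shape, assuming an S3Conf-like class; the class and field names are illustrative. Per-instance state lives in non-static fields so it travels with the serialized object:

```java
import java.io.Serializable;

public class S3ConfFixed implements Serializable {
    private static final long serialVersionUID = 1L;

    // Instance field, not static: it is serialized with the object, so the
    // downstream multi-table sink deserializes "s3a" instead of falling
    // back to a class-level default.
    private String schema = "s3a";

    public String getSchema() {
        return schema;
    }

    public void setSchema(String schema) {
        this.schema = schema;
    }
}
```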

@LeonYoah
Contributor Author

I seem to have spotted the problem. This code is actually fine: calling the buildWithConfig method assigns [SCHEMA] to "s3a". But when the whole [sink] is passed downstream to [multiTableSink], it is deserialized, and deserialization re-initializes static state. Since the member variables of the whole S3Conf class are [static], including [SCHEMA], the [multiTableSink] ends up with the default value of [SCHEMA], which is "s3n". A self-contained demo of this behavior is below.

image
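
A demo of the root cause described above: static fields are not part of Java serialization, so a value assigned by buildWithConfig is lost once the object crosses a serialization boundary. Here the class re-initialization on the remote side is simulated by resetting the static field in the same JVM:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StaticSerializationDemo implements Serializable {
    private static final long serialVersionUID = 1L;

    static String SCHEMA = "s3n"; // default, like S3Conf's DEFAULT_SCHEMA

    public static void main(String[] args) throws Exception {
        SCHEMA = "s3a"; // buildWithConfig() sets the correct schema ...

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new StaticSerializationDemo()); // SCHEMA is not written
        }

        SCHEMA = "s3n"; // ... but the receiving JVM starts from the class default

        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            in.readObject(); // restores instance state only
        }

        System.out.println(SCHEMA); // prints "s3n": the static value was lost
    }
}
```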

@LeonYoah
Contributor Author

> Can you put up a PR to fix this?

OK, I am willing to submit the PR. I have already drafted a first version that removes the [static] modifiers, but I see that you submitted a change to the S3 connector, so my submission may cause a conflict.

@EricJoy2048
Member

> OK, I am willing to submit the PR. I have already drafted a first version that removes the [static] modifiers, but I see that you submitted a change to the S3 connector, so my submission may cause a conflict.

Don't worry, I will resolve the conflict after your PR is merged.

@EricJoy2048
Member

Related PR: #6698

@EricJoy2048 EricJoy2048 reopened this Apr 16, 2024
@EricJoy2048
Member

You can add "close #6678" to the PR description to ensure that the issue is automatically closed when the PR merges.

@LeonYoah
Contributor Author

> You can add "close #6678" to the PR description to ensure that the issue is automatically closed when the PR merges.

Ok
