[Bug] Spark jar task failed to run #1114

Open
2 of 3 tasks
Narcasserun opened this issue Sep 1, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@Narcasserun

Search before asking

  • I have searched the issues and found no similar issues.

What happened

After configuring the cluster components, I ran a Spark jar task and it failed to run.

What you expected to happen

(screenshot; image not included)

How to reproduce

I ran a SparkPi task with an argument of 10 or 100. The ApplicationMaster treats that argument as the driver host (note "Failed to connect to driver at 10:0" in the log):

Application application_1693541457708_0007 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1693541457708_0007_000001 exited with exitCode: 10
Failing this attempt.Diagnostics: [2023-09-01 15:56:35.718]Exception from container-launch.
Container id: container_e130_1693541457708_0007_01_000001
Exit code: 10
[2023-09-01 15:56:35.719]Container exited with a non-zero exit code 10. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
etrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:579)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:434)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:256)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:766)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:764)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:787)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
23/09/01 15:56:35 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
23/09/01 15:56:35 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
23/09/01 15:56:35 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://lcc-ambari-server01:8020/user/admin/.sparkStaging/application_1693541457708_0007
23/09/01 15:56:35 INFO util.ShutdownHookManager: Shutdown hook called
For more detailed output, check the application tracking page: http://lcc-ambari-server01:8188/applicationhistory/app/application_1693541457708_0007 Then click on links to logs of each attempt.
. Failing the application.

Anything else

No response

Version

master

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Narcasserun added the bug label on Sep 1, 2023
@Narcasserun
Author

@vainhope @mortalYoung

@vainhope
Collaborator

vainhope commented Sep 4, 2023

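From the logs, the Spark task's ApplicationMaster could not connect to the driver, so the task failed.
Could you check whether there is a network connectivity problem?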
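
One quick way to rule out a plain network problem is a TCP reachability check, run from the node where the ApplicationMaster container was launched. This is only a minimal Python sketch: `can_connect` is a hypothetical helper, and the host and port in the usage comment are placeholders, not values taken from this issue.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Usage (placeholder address, substitute the real driver host and port):
#   can_connect("driver-host.example", 35000)
```

If this returns True for the real driver address while the ApplicationMaster still logs a different address, the network is probably not the cause.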

@Narcasserun
Author

Narcasserun commented Sep 5, 2023

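It takes the input argument of the Spark jar task in Taier and uses it as the driver's host, with 0 as the port. I tried several different Spark jar tasks and they all hit the same problem. @vainhope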
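
To illustrate how a bare argument could end up as "driver at 10:0", here is a minimal Python sketch. It assumes the ApplicationMaster reads its first user argument as `host:port`, splits on the last colon, and falls back to port 0 when no colon is present. That behavior is inferred from the log line above, not confirmed against the Spark source, and `parse_host_port` is a hypothetical name.

```python
from typing import Tuple


def parse_host_port(host_port: str) -> Tuple[str, int]:
    """Split 'host:port' on the last colon; default the port to 0 if absent."""
    idx = host_port.rfind(":")
    if idx == -1:
        # No colon: the whole string is taken as the host (assumed behavior).
        return host_port, 0
    return host_port[:idx].strip(), int(host_port[idx + 1:].strip())


# The SparkPi argument "10" has no colon, so it becomes host "10", port 0.
print(parse_host_port("10"))              # ('10', 0)
print(parse_host_port("10.0.0.5:35000"))  # ('10.0.0.5', 35000)
```

Under that assumption, the task's own argument is landing in the position where the ApplicationMaster expects the driver address, which would explain why every Spark jar task fails the same way regardless of network state.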
