[Bug] [OSS Checkpoint] OSS checkpoint not working #6779

Open
2 of 3 tasks
shawyb opened this issue Apr 30, 2024 · 2 comments

shawyb commented Apr 30, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

I use Aliyun OSS to store checkpoints with the configuration below. Storage works: I can see the checkpoint files in the OSS bucket. I deployed the SeaTunnel server using Docker. When I restart the container, the real-time synchronization tasks I had previously created disappear, and the server does not reload them from the checkpoints. Does SeaTunnel have the ability to reload historical tasks from a checkpoint?

seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 600000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: oss
          namespace: /tmp/seatunnel/checkpoint_snapshot
          oss.bucket: oss://xxx
          fs.oss.accessKeyId: xxx
          fs.oss.accessKeySecret: xxx
          fs.oss.endpoint: oss-cn-hangzhou.aliyuncs.com

SeaTunnel Version

2.3.3

SeaTunnel Config

seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 600000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: oss
          namespace: /tmp/seatunnel/checkpoint_snapshot
          oss.bucket: oss://xxx
          fs.oss.accessKeyId: xxx
          fs.oss.accessKeySecret: xxx
          fs.oss.endpoint: oss-cn-hangzhou.aliyuncs.com

Running Command

run

Error Exception

ERROR org.apache.seatunnel.engine.server.operation.GetJobStatusOperation - [localhost]:5801 [seatunnel] [5.1] null
java.lang.NullPointerException: null
    at org.apache.seatunnel.engine.server.operation.GetJobStatusOperation.run(GetJobStatusOperation.java:81) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:189) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:273) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:248) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:213) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.run(OperationExecutorImpl.java:411) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.runOrExecute(OperationExecutorImpl.java:438) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvokeLocal(Invocation.java:601) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:580) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke0(Invocation.java:541) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:241) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.spi.impl.operationservice.impl.InvocationBuilderImpl.invoke(InvocationBuilderImpl.java:61) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.client.impl.protocol.task.AbstractInvocationMessageTask.processInternal(AbstractInvocationMessageTask.java:38) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.client.impl.protocol.task.AbstractAsyncMessageTask.processMessage(AbstractAsyncMessageTask.java:71) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.initializeAndProcessMessage(AbstractMessageTask.java:153) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.run(AbstractMessageTask.java:116) ~[seatunnel-starter.jar:2.3.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_261]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_261]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) ~[seatunnel-starter.jar:2.3.3]
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) ~[seatunnel-starter.jar:2.3.3]

Zeta or Flink or Spark Version

zeta 2.3.3

Java or Scala Version

1.8

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

shawyb added the bug label Apr 30, 2024
xinfeingxia85 commented

I also encountered this issue, but now I've been utilizing Ali OSS HDFS to store my checkpoints successfully. I suggest you consider testing this solution!

shawyb commented May 8, 2024

> I also encountered this issue, but now I've been utilizing Ali OSS HDFS to store my checkpoints successfully. I suggest you consider testing this solution!

Storing the checkpoint works for me, but if I restart Docker the checkpoint is not read and all the tasks are lost.
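
The symptom described in this thread (checkpoints are written to OSS, but the jobs themselves are gone after the container restarts) may be related to a setting that is separate from checkpoint storage. As far as the SeaTunnel Engine deployment documentation describes it, the Zeta engine keeps job state in Hazelcast IMaps, and those are persisted only if a map store is configured in hazelcast.yaml. The fragment below is a minimal sketch of such a setting, not a confirmed fix for this issue. The key names and the factory class follow the IMap persistence section of that documentation as understood here and should be verified against the docs for 2.3.3. The namespace and clusterName values are placeholders, and the OSS values simply mirror the ones from the checkpoint config above.

```yaml
# hazelcast.yaml (sketch only; verify every key against the docs for your version)
hazelcast:
  map:
    engine*:
      map-store:
        enabled: true
        initial-mode: EAGER
        factory-class-name: org.apache.seatunnel.engine.server.persistence.FileMapStoreFactory
        properties:
          type: hdfs
          namespace: /tmp/seatunnel/imap     # placeholder path, not from the issue
          clusterName: seatunnel-cluster     # placeholder, must match your cluster name
          storage.type: oss
          oss.bucket: oss://xxx
          fs.oss.accessKeyId: xxx
          fs.oss.accessKeySecret: xxx
          fs.oss.endpoint: oss-cn-hangzhou.aliyuncs.com
```

If this assumption holds, the checkpoint storage block in seatunnel.yaml stays as it is; the map store only adds persistence for the job metadata that the server would need in order to find and restore jobs after a full restart.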
