Workflow hangs, unable to proceed and mark completed when a sub dag failed to resolve the output parameter #12869

Open
3 of 4 tasks
tczhao opened this issue Apr 2, 2024 · 0 comments · May be fixed by #12991
Assignees
  • isubasinghe
Labels
  • area/controller (Controller issues, panics)
  • area/templates/dag
  • P1 (High priority. All bugs with >=5 thumbs up that aren’t P0, plus: Any other bugs deemed high priority)
  • type/bug

Comments

@tczhao
Member

tczhao commented Apr 2, 2024

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

The workflow should be marked Errored/Failed when an inner DAG template fails to resolve its output parameter. Instead, it hangs and is never marked completed.


Version

latest

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: loop-test-
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
        - name: print-json-entry-print-exitcode
          template: print-json-entry-print-exitcode
          arguments:
            parameters:
            - name: index
              value: '0'
        - name: call-access-aggregate-output
          depends: "print-json-entry-print-exitcode"
          template: access-aggregate-output
          arguments:
            parameters:
            - name: aggregate-results
              value: '{{tasks.print-json-entry-print-exitcode.outputs.parameters.exit-code}}'
  - name: print-json-entry-print-exitcode
    inputs:
      parameters:
        - name: index
    outputs:
      parameters:
        - name: exit-code
          valueFrom:
            parameter: "{{tasks.print-exitcode.outputs.result}}"
    dag:
      tasks:
        - name: print-json-entry
          template: print-json-entry
          arguments:
            parameters:
            - name: index
              value: '{{inputs.parameters.index}}'
        - name: print-exitcode
          depends: "print-json-entry.Failed"
          template: print-exitcode
          arguments:
            parameters:
            - name: exitcode
              value: '{{tasks.print-json-entry.exitCode}}'
  - name: print-json-entry
    inputs:
      parameters:
      - name: index
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo intentional failure; exit {{inputs.parameters.index}}"]
  - name: access-aggregate-output
    inputs:
      parameters:
      - name: aggregate-results
        value: 'no-value'
    script:
      image: alpine:latest
      command: [sh]
      source: |
        echo 'inputs.parameters.aggregate-results: "{{inputs.parameters.aggregate-results}}"'
  - name: print-exitcode
    inputs:
      parameters:
      - name: exitcode
        value: ''
    script:
      image: alpine:latest
      command: [sh]
      source: |
        echo '{{inputs.parameters.exitcode}}'
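
A note on the reproduction above: `index` is '0', so `print-json-entry` runs `exit 0` and would be expected to succeed. In that case `print-exitcode`, which depends on `print-json-entry.Failed`, never runs, and `{{tasks.print-exitcode.outputs.result}}` has nothing to resolve. As a possible stopgap until the controller is fixed, the inner DAG's output parameter could declare a `valueFrom.default`. This is only a sketch: it assumes the controller falls back to `default` when the `parameter` reference cannot be resolved, which has not been verified against this bug, and the `'unknown'` value is a placeholder chosen for illustration.

```yaml
  # Inner DAG template from the reproduction; only the outputs block changes.
  - name: print-json-entry-print-exitcode
    inputs:
      parameters:
        - name: index
    outputs:
      parameters:
        - name: exit-code
          valueFrom:
            parameter: "{{tasks.print-exitcode.outputs.result}}"
            # Assumption: used as a fallback when print-exitcode did not run.
            # 'unknown' is a placeholder value, not part of the original report.
            default: 'unknown'
    dag:
      tasks:
        - name: print-json-entry
          template: print-json-entry
          arguments:
            parameters:
            - name: index
              value: '{{inputs.parameters.index}}'
        - name: print-exitcode
          depends: "print-json-entry.Failed"
          template: print-exitcode
          arguments:
            parameters:
            - name: exitcode
              value: '{{tasks.print-json-entry.exitCode}}'
```

If the assumption holds, `call-access-aggregate-output` would receive `unknown` instead of waiting on an unresolved parameter. This would only sidestep the symptom; the workflow hanging rather than being marked Errored/Failed remains the bug reported here.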

Logs from the workflow controller

GitHub says the comment is too long, but you can submit the workflow and reproduce it.
Here are the last few lines from the controller log:


time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Getting the template by name: whalesay" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (whalesay)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (whalesay)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (whalesay)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Getting the template by name: whalesay" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (whalesay)"
time="2024-04-02T04:30:00.001Z" level=info msg="delightful-poochenheimer is suspended, skipping execution" namespace=argo workflow=delightful-poochenheimer
time="2024-04-02T04:30:00.004Z" level=debug msg="Patch cronworkflows 200"
time="2024-04-02T04:30:00.004Z" level=debug msg="Patch cronworkflows 200"
time="2024-04-02T04:30:00.112Z" level=info msg="cleaning up pod" action=killContainers key=argo/loop-test-pwszv-print-json-entry-2612912699/killContainers
time="2024-04-02T04:30:00.273Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=286082 namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="task result:\n&WorkflowTaskResult{ObjectMeta:{loop-test-pwszv-2612912699  argo  8c15dd8c-4999-4df5-a2f9-d2e96e14f732 286074 2 2024-04-02 04:29:53 +0000 UTC <nil> <nil> map[workflows.argoproj.io/report-outputs-completed:true workflows.argoproj.io/workflow:loop-test-pwszv] map[] [{argoproj.io/v1alpha1 Workflow loop-test-pwszv 5e453b4e-ca11-4530-b2ba-8d2e28a2072f <nil> <nil>}] [] [{argoexec Update argoproj.io/v1alpha1 2024-04-02 04:29:56 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:labels\":{\".\":{},\"f:workflows.argoproj.io/report-outputs-completed\":{},\"f:workflows.argoproj.io/workflow\":{}},\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"5e453b4e-ca11-4530-b2ba-8d2e28a2072f\\\"}\":{}}},\"f:outputs\":{\".\":{},\"f:artifacts\":{}}} }]},NodeResult:NodeResult{Phase:,Message:,Outputs:&Outputs{Parameters:[]Parameter{},Artifacts:[]Artifact{Artifact{Name:main-logs,Path:,Mode:nil,From:,ArtifactLocation:ArtifactLocation{ArchiveLogs:nil,S3:&S3Artifact{S3Bucket:S3Bucket{Endpoint:,Bucket:,Region:,Insecure:nil,AccessKeySecret:nil,SecretKeySecret:nil,RoleARN:,UseSDKCreds:false,CreateBucketIfNotPresent:nil,EncryptionOptions:nil,CASecret:nil,},Key:loop-test-pwszv/loop-test-pwszv-print-json-entry-2612912699/main.log,},Git:nil,HTTP:nil,Artifactory:nil,HDFS:nil,Raw:nil,OSS:nil,GCS:nil,Azure:nil,},GlobalName:,Archive:nil,Optional:false,SubPath:,RecurseMode:false,FromExpression:,ArtifactGC:nil,Deleted:false,},},Result:nil,ExitCode:nil,},Progress:,},}" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="task result name:\nloop-test-pwszv-2612912699" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="Marking task result complete loop-test-pwszv-2612912699" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=info msg="task-result changed" namespace=argo nodeID=loop-test-pwszv-2612912699 workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="Skipping artifact GC" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="Evaluating node loop-test-pwszv: template: *v1alpha1.WorkflowStep (main), boundaryID: " namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.WorkflowStep (main)"
time="2024-04-02T04:30:00.274Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.WorkflowStep (main)"
time="2024-04-02T04:30:00.274Z" level=debug msg="Getting the template by name: main" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.WorkflowStep (main)"
time="2024-04-02T04:30:00.274Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-04-02T04:30:00.274Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-04-02T04:30:00.274Z" level=debug msg="Executing node loop-test-pwszv of DAG is Running" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.DAGTask (access-aggregate-output)"
time="2024-04-02T04:30:00.274Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.DAGTask (access-aggregate-output)"
time="2024-04-02T04:30:00.274Z" level=debug msg="Getting the template by name: access-aggregate-output" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.DAGTask (access-aggregate-output)"
time="2024-04-02T04:30:00.274Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-04-02T04:30:00.274Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-04-02T04:30:00.274Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-04-02T04:30:00.274Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=info msg=reconcileAgentPod namespace=argo workflow=loop-test-pwszv


Logs from your workflow's wait container

```text
kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
```
@tczhao tczhao added type/bug area/controller Controller issues, panics labels Apr 2, 2024
@tczhao tczhao changed the title from "Workflow hangs, fails to mark completed when a sub dag failed to resolve the output parameter" to "Workflow hangs, unable to proceed to mark completed when a sub dag failed to resolve the output parameter" Apr 2, 2024
@tczhao tczhao changed the title from "Workflow hangs, unable to proceed to mark completed when a sub dag failed to resolve the output parameter" to "Workflow hangs, unable to proceed and mark completed when a sub dag failed to resolve the output parameter" Apr 2, 2024
@isubasinghe isubasinghe self-assigned this Apr 2, 2024
@agilgur5 agilgur5 added area/templates/dag P1 High priority. All bugs with >=5 thumbs up that aren’t P0, plus: Any other bugs deemed high priority labels Apr 14, 2024

3 participants