A child job step run on a remote node reports _Killed_ but Job reports _Succeeded_ #9093

Open
ajxb opened this issue May 1, 2024 · 0 comments

Describe the bug

I have a job that is set to execute locally. As part of its workflow it executes some commands, then hands over to a job reference step that uses the referenced job's own node filter. The referenced job is designed to restart other nodes based on a parameter.

The referenced job itself references a common job that we use to fetch packages from an artefact repository.

When we run this job it completes successfully, and I've verified that all the steps were executed. However, the job execution reports that the common referenced job was Killed.

[Screenshot: execution page showing the common referenced job step reported as Killed]

This output is misleading and is going to confuse some of my users to the point where I'm reluctant to upgrade.

This problem became apparent in Rundeck 4.17.2; it is not evident in Rundeck 4.17.1. I suspect it might be related to changes made in PR #8494.

I have put together a simplified, if rather contrived, example of what I'm seeing in the To Reproduce section below.

My Rundeck detail

  • Rundeck version: 4.17.2
  • install type: rpm
  • OS Name/version: Oracle Linux 8
  • DB Type/version: mysql (mariadb)

To Reproduce

  1. Create a new project with Default Node Executor set to Local, and an additional Stub node source. The project properties file can be seen below:

    project.retry-counter=3
    resources.source.2.config.tags=stub
    project.later.executions.enable=false
    resources.source.2.config.count=1
    project.name=bug-example
    project.jobs.gui.groupExpandLevel=1
    project.execution.history.cleanup.batch=500
    project.disable.executions=false
    project.execution.history.cleanup.retention.minimum=50
    project.ssh-authentication=privateKey
    resources.source.2.type=stub
    project.nodeCache.enabled=true
    project.execution.history.cleanup.retention.days=60
    project.later.schedule.enable=false
    project.description=
    project.nodeCache.firstLoadSynch=true
    project.later.schedule.disable=false
    service.NodeExecutor.default.provider=local
    project.execution.history.cleanup.schedule=0 0 0 1/1 * ? *
    project.disable.schedule=false
    project.later.executions.disable=false
    resources.source.2.config.prefix=node
    project.label=bug-example
    resources.source.1.type=local
    service.FileCopier.default.provider=sshj-scp
    resources.source.2.config.delay=0
    project.output.allowUnsanitized=false
    project.execution.history.cleanup.enabled=false
  2. Create a common job called fetch package:

    - defaultTab: nodes
      description: ''
      executionEnabled: true
      id: b7e60fdf-79e5-4b1e-8651-113a50421018
      loglevel: INFO
      multipleExecutions: true
      name: fetch package
      nodeFilterEditable: false
      nodefilters:
        dispatch:
          excludePrecedence: true
          keepgoing: true
          rankOrder: ascending
          successOnEmptyNodeFilter: false
          threadcount: 1
        filter: ''
      nodesSelectedByDefault: true
      plugins:
        ExecutionLifecycle: null
      scheduleEnabled: true
      sequence:
        commands:
        - description: Fetch Package
          exec: echo "fetch package"
        keepgoing: false
        strategy: node-first
      uuid: b7e60fdf-79e5-4b1e-8651-113a50421018
  3. Create a second job called restart, which calls fetch package during execution:

    - defaultTab: nodes
      description: ''
      executionEnabled: true
      id: dff91746-f17a-4647-8c4a-391d54c4156d
      loglevel: INFO
      multipleExecutions: true
      name: restart
      nodeFilterEditable: false
      nodefilters:
        dispatch:
          excludePrecedence: true
          keepgoing: true
          rankOrder: ascending
          successOnEmptyNodeFilter: false
          threadcount: 35
        filter: ${option.Node}
      nodesSelectedByDefault: true
      plugins:
        ExecutionLifecycle: null
      scheduleEnabled: true
      sequence:
        commands:
        - description: Get restart scripts
          jobref:
            group: ''
            name: fetch package
            nodeStep: 'true'
            nodefilters:
              filter: '"${node.name}"'
            useName: 'true'
            uuid: b7e60fdf-79e5-4b1e-8651-113a50421018
        - description: Restart service
          exec: echo "restart"
        keepgoing: false
        strategy: node-first
      timeout: 20m
      uuid: dff91746-f17a-4647-8c4a-391d54c4156d
  4. Create the top-level job deploy that runs some steps on localhost, then executes restart on other nodes:

    - defaultTab: nodes
      description: ''
      executionEnabled: true
      id: 60546626-1044-4a10-a3a5-3bfc9b51c73c
      loglevel: INFO
      multipleExecutions: true
      name: deploy
      nodeFilterEditable: false
      plugins:
        ExecutionLifecycle: null
      scheduleEnabled: true
      sequence:
        commands:
        - description: Deploy applications
          exec: echo "do something on localhost"
          plugins:
            LogFilter: []
        - description: Restart - Pool 0
          jobref:
            args: -Node node-0
            failOnDisable: true
            group: ''
            name: restart
            nodeStep: 'true'
            uuid: dff91746-f17a-4647-8c4a-391d54c4156d
        keepgoing: false
        strategy: node-first
      timeout: 20m
      uuid: 60546626-1044-4a10-a3a5-3bfc9b51c73c
  5. Run the top-level job and observe the resulting execution page.
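
For anyone tracing the repro, the nesting set up by the three jobs above can be sketched roughly as follows. This is only an illustrative model in Python: the job names, UUIDs and echo commands are taken from the definitions above, while the `JOBS` dict layout and the `walk_refs` helper are hypothetical and are not part of Rundeck.

```python
# Illustrative model only: names, UUIDs and commands come from the job
# definitions above; the dict layout and walk_refs() are hypothetical.
JOBS = {
    "60546626-1044-4a10-a3a5-3bfc9b51c73c": {
        "name": "deploy",
        "steps": [
            {"exec": 'echo "do something on localhost"'},
            {"jobref": "dff91746-f17a-4647-8c4a-391d54c4156d"},
        ],
    },
    "dff91746-f17a-4647-8c4a-391d54c4156d": {
        "name": "restart",
        "steps": [
            {"jobref": "b7e60fdf-79e5-4b1e-8651-113a50421018"},
            {"exec": 'echo "restart"'},
        ],
    },
    "b7e60fdf-79e5-4b1e-8651-113a50421018": {
        "name": "fetch package",
        "steps": [{"exec": 'echo "fetch package"'}],
    },
}


def walk_refs(job_id, depth=0, lines=None):
    """Return an indented outline of a job and every job it references."""
    if lines is None:
        lines = []
    job = JOBS[job_id]
    lines.append("  " * depth + job["name"])
    for step in job["steps"]:
        if "jobref" in step:
            # A job reference step: descend into the referenced job.
            walk_refs(step["jobref"], depth + 1, lines)
        else:
            # A plain command step: record it one level below its job.
            lines.append("  " * (depth + 1) + step["exec"])
    return lines


outline = walk_refs("60546626-1044-4a10-a3a5-3bfc9b51c73c")
print("\n".join(outline))
```

The outline shows fetch package sitting two job references below deploy, which is the common referenced job that the execution page reports as Killed.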

Expected behavior

All job steps report as OK once the job completes execution, similar to the result we see for Rundeck 4.17.1:

[Screenshot: Rundeck 4.17.1 execution page with all job steps reported as OK]

Desktop

  • OS: Kubuntu 22.04
  • Browser: Chrome, Firefox, others not tested
  • Version:
    • Chrome: 124.0.6367.118 (Official Build) (64-bit)
    • Firefox: 125.0 (64-bit)

Additional context

N/A
