
[BUG] Scale down sorts by obsolete status but stops/removes containers in the reverse order #11781

Open
Adam-Carr opened this issue May 1, 2024 · 0 comments

Description

This was initially logged in #11460, but only part of it was fixed. The fix applied in #11473 does prevent scale-down from recreating a container incorrectly; however, it didn't fix the incorrect container ordering used to decide which containers should be removed when scaling down.

If I scale up to 2 instances and then scale down to 1, the newest container is the one removed instead of the oldest. This is a problem if you scale up with no-recreate to deploy a new version of an image and then scale down to get rid of the old one: the newest container is removed, so you're left with the old version of the image running.

The old sort logic before #11473:

sort.Slice(containers, func(i, j int) bool {
    return containers[i].Created < containers[j].Created
})

The new sort logic after #11473:

sort.Slice(containers, func(i, j int) bool {
    // select obsolete containers first, so they get removed as we scale down
    if obsolete, _ := mustRecreate(service, containers[i], recreate); obsolete {
        // i is obsolete, so must be first in the list
        return true
    }
    if obsolete, _ := mustRecreate(service, containers[j], recreate); obsolete {
        // j is obsolete, so must be first in the list
        return false
    }

    // For up-to-date containers, sort by container number to preserve low-values in container numbers
    ni, erri := strconv.Atoi(containers[i].Labels[api.ContainerNumberLabel])
    nj, errj := strconv.Atoi(containers[j].Labels[api.ContainerNumberLabel])
    if erri == nil && errj == nil {
        return ni < nj
    }

    // If we don't get a container number (?) just sort by creation date
    return containers[i].Created < containers[j].Created
})

The loop that follows:

for i, container := range containers {
    if i >= expected {
        // Scale Down
        container := container
        traceOpts := append(tracing.ServiceOptions(service), tracing.ContainerOptions(container)...)
        eg.Go(tracing.SpanWrapFuncForErrGroup(ctx, "service/scale/down", traceOpts, func(ctx context.Context) error {
            return c.service.stopAndRemoveContainer(ctx, container, timeout, false)
        }))
        continue
    }
...

So additional logic was added to check which containers are obsolete, which is great, along with the fallback to creation date. But the order in which we remove them is still wrong: obsolete/old containers are placed first in the list, yet when looping to scale down, the containers at the front of the list are treated as valid and kept, while those further down the list are removed instead, I believe.

Steps To Reproduce

  1. Create a compose file with a single service and no scale options set, then run compose up to launch the container.
  2. Scale up with docker compose up -d --scale (service-name)=2 --no-recreate, which launches a second container for the same service.
  3. Once the second container is confirmed to be running, scale back down to 1 instance with docker compose up -d --scale (service-name)=1 --no-recreate.
  4. The newer container is removed, when the older container is the one that should have been removed.

Even if the image used for container 2 is newer, container 2 is still removed first, leaving the older image running in container 1.

Compose Version

Docker Compose version v2.27.0

Docker Environment

Client: Docker Engine - Community
 Version:    26.1.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 27
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.5.0-1019-azure
 Operating System: Ubuntu 22.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.742GiB
 Name: dravubnt02
 ID: e55ebd04-5ca1-46e0-9c6f-2036e43445ae
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Anything else?

No response
