[ECS] [Feature Request]: with longer stoptimeout, need "Force Stop" to kill rogue tasks/containers #2320

Open
ashish-logmaster opened this issue Apr 1, 2024 · 0 comments
Labels
ECS Amazon Elastic Container Service Proposed Community submitted issue

@ashish-logmaster

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
The new StopTimeout, which allows us to set the interval between SIGTERM and SIGKILL to longer than 2 minutes, is great.
We set it to 24 hours for our batch jobs, and it works well.
However, we have run into cases where we know the containers have received a SIGTERM and started draining (Desired State: Stopped), but, judging from their logs, a container has entered a busy-wait loop.
In these cases we want to "force stop" the container/task.
It would be nice to have this feature in the ECS GUI under "ECS->->Tasks" as "Stop Selected Forced".
Right now, the way we deal with this problem is with automation that uses SSM to kill the offending containers with the "docker kill" command.
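For reference, here is a minimal sketch of a container definition with the 24-hour grace period described above, assuming the EC2 launch type and boto3. `stopTimeout` is given in seconds in the container definition. The family, container name, image and memory values are hypothetical placeholders, and the 24-hour value is simply what this issue reports using.

```python
from __future__ import annotations

import json

# 24 hours expressed in seconds; stopTimeout is specified in seconds.
STOP_TIMEOUT_SECONDS = 24 * 60 * 60


def batch_container_definition(name: str, image: str, memory_mib: int) -> dict:
    """Build a container definition with a long SIGTERM-to-SIGKILL grace period.

    name, image and memory_mib are hypothetical placeholders.
    """
    return {
        "name": name,
        "image": image,
        "memory": memory_mib,
        "essential": True,
        # How long ECS waits after SIGTERM before sending SIGKILL.
        "stopTimeout": STOP_TIMEOUT_SECONDS,
    }


def register(family: str, container: dict) -> str:
    """Register an EC2 task definition (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    ecs = boto3.client("ecs")
    resp = ecs.register_task_definition(
        family=family,
        requiresCompatibilities=["EC2"],
        containerDefinitions=[container],
    )
    return resp["taskDefinition"]["taskDefinitionArn"]


if __name__ == "__main__":
    # Print the definition instead of registering it.
    print(json.dumps(batch_container_definition("worker", "example/batch:latest", 2048), indent=2))
```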
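The SSM part of that workaround can be sketched as follows, assuming boto3, Linux container instances with the SSM agent, and the `AWS-RunShellScript` SSM document. `docker kill` sends SIGKILL by default, so it skips whatever remains of the stop timeout. The instance and container IDs passed in are hypothetical.

```python
from __future__ import annotations

import shlex


def docker_kill_commands(container_ids: list[str]) -> list[str]:
    """Build one `docker kill` shell command per Docker container ID."""
    # shlex.quote guards against unexpected characters in the IDs.
    return [f"docker kill {shlex.quote(cid)}" for cid in container_ids]


def force_kill_on_instance(instance_id: str, container_ids: list[str]) -> str:
    """Run `docker kill` on one EC2 container instance via SSM Run Command.

    Requires boto3, AWS credentials, and the SSM agent on the instance.
    Returns the SSM command ID so the result can be checked later.
    """
    import boto3  # imported lazily so the command builder works without boto3

    ssm = boto3.client("ssm")
    resp = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": docker_kill_commands(container_ids)},
        Comment="Force-stop ECS containers stuck after SIGTERM",
    )
    return resp["Command"]["CommandId"]
```

Sending all commands for one host in a single `send_command` call keeps it to one SSM invocation per instance rather than one per container.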
Reference: https://aws.amazon.com/blogs/containers/improvements-to-amazon-ecs-task-launch-behavior-when-tasks-have-prolonged-shutdown/

Which service(s) is this request for?
ECS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Help us and other customers better manage rogue tasks that need to be force-killed before "stopTimeout" has expired.

Are you currently working around this issue?
How are you currently solving this problem?
We built a helper program in Python that finds tasks where desiredStatus == "STOPPED" and lastStatus == "RUNNING", finds the underlying containers and EC2 instances, and then uses SSM to run "docker kill" on each server.
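A helper along those lines might look roughly like the following sketch; it is not the author's actual program. It assumes boto3, the EC2 launch type, and a cluster name supplied by the caller. It pages through `list_tasks` with `desiredStatus="STOPPED"`, keeps tasks whose `lastStatus` is still `RUNNING`, and maps each one to its EC2 instance ID and the Docker container IDs (`runtimeId`) of its still-running containers.

```python
from __future__ import annotations

from collections import defaultdict


def rogue_containers(tasks: list[dict]) -> dict[str, list[str]]:
    """Map containerInstanceArn -> Docker runtime IDs of stuck containers.

    `tasks` is the "tasks" list returned by ecs.describe_tasks. A task counts
    as stuck when ECS wants it STOPPED but still reports it as RUNNING.
    """
    stuck: dict[str, list[str]] = defaultdict(list)
    for task in tasks:
        if task.get("desiredStatus") != "STOPPED" or task.get("lastStatus") != "RUNNING":
            continue
        instance_arn = task.get("containerInstanceArn")
        if instance_arn is None:
            # Fargate tasks have no container instance to run SSM against.
            continue
        for container in task.get("containers", []):
            runtime_id = container.get("runtimeId")
            # Only containers that are still running need to be killed.
            if runtime_id and container.get("lastStatus") == "RUNNING":
                stuck[instance_arn].append(runtime_id)
    return dict(stuck)


def stuck_containers_by_ec2_instance(cluster: str) -> dict[str, list[str]]:
    """Return {EC2 instance ID: [runtime IDs]} for stuck tasks in `cluster`.

    Requires boto3 and AWS credentials with ecs:ListTasks, ecs:DescribeTasks
    and ecs:DescribeContainerInstances permissions.
    """
    import boto3  # imported lazily so rogue_containers works without boto3

    ecs = boto3.client("ecs")
    task_arns: list[str] = []
    for page in ecs.get_paginator("list_tasks").paginate(
        cluster=cluster, desiredStatus="STOPPED"
    ):
        task_arns.extend(page["taskArns"])

    tasks: list[dict] = []
    # describe_tasks accepts at most 100 task ARNs per call.
    for i in range(0, len(task_arns), 100):
        resp = ecs.describe_tasks(cluster=cluster, tasks=task_arns[i : i + 100])
        tasks.extend(resp["tasks"])

    by_instance_arn = rogue_containers(tasks)
    instance_arns = list(by_instance_arn)
    result: dict[str, list[str]] = {}
    # Translate container instance ARNs into EC2 instance IDs (100 per call).
    for i in range(0, len(instance_arns), 100):
        resp = ecs.describe_container_instances(
            cluster=cluster, containerInstances=instance_arns[i : i + 100]
        )
        for ci in resp["containerInstances"]:
            result[ci["ec2InstanceId"]] = by_instance_arn[ci["containerInstanceArn"]]
    return result
```

Keeping the filtering in a pure function makes it easy to test against saved `describe_tasks` output; the resulting per-instance mapping can then drive an SSM Run Command that runs `docker kill` on each host.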

Additional context
If you want to discuss this issue, you can reach out to me at ashish.desai@zillasecurity.com

@ashish-logmaster ashish-logmaster added the Proposed Community submitted issue label Apr 1, 2024
@herrhound herrhound added the ECS Amazon Elastic Container Service label Apr 29, 2024