Mangle 3.0 Stability - Cassandra DB goes down suddenly #105

Open
Anvesh42 opened this issue Mar 15, 2022 · 6 comments

@Anvesh42

Anvesh42 commented Mar 15, 2022

Environment: OpenShift v4.6.36
Kubernetes Version: v1.19.0
Mangle Version: 3.0
Issue:

  1. The Cassandra DB goes down, and the resulting failed connections cause the mangle POD to retry the Cassandra DB repeatedly.
  2. The Mangle product UI is unavailable for the entire duration.

Interim Solution Being Followed:

  1. Restart Cassandra POD
  2. Restart mangle POD
  3. Increase the resource limits on the cassandra statefulset template, as recommended by the mangle team during the working session.

Previous:

- resources:
    limits:
      cpu: '1'
      memory: 8Gi
    requests:
      cpu: '500m'
      memory: 2Gi

Current:

- resources:
    limits:
      cpu: '2'
      memory: 8Gi
    requests:
      cpu: '1'
      memory: 4Gi
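
A minimal sketch of one cause worth ruling out, assuming the Mangle Cassandra image uses the stock cassandra-env.sh heap calculation (the thread does not confirm the base image or Cassandra version). When MAX_HEAP_SIZE is unset, that script sizes the heap as max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)), reading RAM with `free -m`, which inside a container typically reports the node's memory rather than the pod limit. On a node with 64 GB of RAM this gives max(1024, 8192) = 8192 MB of heap, which alone fills the 8Gi limit and leaves nothing for off-heap memory, so the container would be OOM-killed. If `kubectl describe pod` shows `OOMKilled` as the last termination reason, pinning the heap explicitly and setting requests equal to limits (Guaranteed QoS, which makes the pod less likely to be evicted under node memory pressure) may help. The values below are illustrative assumptions, not Mangle-team recommendations, and would be merged into the existing cassandra container spec:

```yaml
# Hypothetical settings for the cassandra container in the StatefulSet.
# All values are illustrative assumptions; tune them for your workload.
resources:
  limits:
    cpu: '2'
    memory: 8Gi
  requests:
    cpu: '2'          # requests == limits on every container gives Guaranteed QoS
    memory: 8Gi
env:
  # cassandra-env.sh expects both variables to be set together.
  - name: MAX_HEAP_SIZE
    value: 4G         # leaves headroom under the 8Gi limit for off-heap memory
  - name: HEAP_NEWSIZE
    value: 800M
```

Keeping MAX_HEAP_SIZE well below the memory limit leaves room for off-heap structures and other native JVM memory, which count against the container limit too.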

Frequency of this issue: once every few weeks, typically every 7-8 weeks, though it may also be random.

Logs:

  1. Please find attached the logs from the mangle & cassandra PODs, captured during the most recent downtime in the last week of February 2022.

cassandra_pod_failure_0227.txt
mangle_pod_failure_0227.txt

Deployment Templates:

  1. Please find attached the cassandra statefulset template and the mangle deployment template:
    cassandra_statefulset_template.txt
    mangle_deployment_template.txt
@rpraveen-vmware
Contributor

Hi @Anvesh42
Please let us know how stable the cassandra pod is after increasing the resource limits.

@Anvesh42
Author

@rpraveen-vmware I have increased the Cassandra resources as discussed during our session. I will monitor it for a few days to observe its stability.
Thanks!

@Anvesh42
Author

Anvesh42 commented Apr 18, 2022

@ashrimalivmware @rpraveen-vmware Even after increasing the resource limits (as stated above), the cassandra POD still goes down.
Attaching the latest log:
cassandra_04182022.txt

@rpraveen-vmware
Contributor

@Anvesh42
How often does the cassandra pod go down now, with the increased resource limits?
cc: @ashrimalivmware

@Anvesh42
Author

@ashrimalivmware @rpraveen-vmware Could you please share the Dockerfiles for mangle & Cassandra that were used to build these standard images?

Regarding Cassandra POD stability, I am exploring options to find and improve a possible solution.

Thanks
Anvesh

@Anvesh42
Author

@ashrimalivmware @rpraveen-vmware

Following up on my previous query in this thread, we would like some insight into modifications we can make to prevent the cassandra POD from going down so frequently. Please let us know. Details are provided below.

Cassandra POD resources & ENV values:

[Screenshot: Cassandra POD resources & ENV values]

Latest Cassandra Failure Log:

cassandra-0-1214.log

We also observe that the standard cassandra.yaml provided by VMware doesn't define a liveness probe. Could that be one of the reasons?
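
A liveness probe would let Kubernetes restart the Cassandra container automatically (step 1 of the interim workaround) instead of waiting for a manual restart, although it would not address whatever is causing the failures, and the mangle POD may still need to reconnect afterwards. A minimal sketch for the cassandra container in the StatefulSet, assuming the image listens on Cassandra's default CQL port 9042 and ships `nodetool` on the PATH; the timings are guesses that should be tuned so that slow startup or long GC pauses do not cause restart loops:

```yaml
# Hypothetical probes for the cassandra container in the StatefulSet.
# Port 9042 (default CQL native transport) and all timings are assumptions.
livenessProbe:
  tcpSocket:
    port: 9042              # restart the container if CQL stops accepting connections
  initialDelaySeconds: 120  # give Cassandra time to start before the first check
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 5       # about 2.5 minutes of consecutive failures before a restart
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - nodetool status     # should fail while the local JMX endpoint is unreachable
  initialDelaySeconds: 60
  periodSeconds: 15
  timeoutSeconds: 10
  failureThreshold: 3
```

A TCP check is used for liveness because it is cheap; running `nodetool` (a JVM process) on every liveness check is heavier and more likely to time out on a loaded node.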

Thanks
Anvesh
