
Memory usage detection #1734

Open
petrovicboban opened this issue Feb 22, 2018 · 6 comments

@petrovicboban

Hi,
in our case, Singularity 0.18.2 can't detect memory usage for slaves:

[screenshot]

but it can overall, for cluster:

[screenshot]

Also, in request view, it shows 0 for memory usage:

[screenshot]

but on task level, that's not the case:

[screenshot]

@ssalinas
Member

ssalinas commented Mar 6, 2018

It may depend on the isolators you have configured. Do you have either the cgroups/mem or posix/mem isolators configured for your mesos slaves?
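
As a side note, the agent exposes its configuration over HTTP, so one quick way to check is to read the isolation flag from the agent's /flags endpoint. A rough Python sketch (agent-host below is a placeholder for one of your slaves):

    import json
    import urllib.request

    agent = "http://agent-host:5051"  # placeholder: one of your mesos slaves

    # the agent's /flags endpoint returns {"flags": {...}} with its startup flags
    with urllib.request.urlopen(agent + "/flags") as resp:
        flags = json.load(resp)["flags"]

    # prints e.g. "cgroups/cpu,cgroups/mem" or "posix/cpu,posix/mem"
    print(flags.get("isolation"))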

@petrovicboban
Author

It's posix. But how does it detect CPU usage? It's the posix isolator for CPU too.

@ssalinas
Member

ssalinas commented Mar 23, 2018

That slave memory view is based on adding up task usages, so if tasks aren't showing it, the slaves will not show it.

Mesos is the entity collecting the actual metric values in this case, not Singularity. It will collect them differently based on how each isolator is implemented.

If you hit an endpoint like {hostname}:5051/monitor/statistics on one of your mesos slaves/agents, do you see memory statistics reported? For example, with our slaves we get back a list of objects like:

{
    "executor_id": "{id}",
    "executor_name": "",
    "framework_id": "{id}",
    "source": "{task id}",
    "statistics": {
      "cpus_limit": 1.1,
      "cpus_system_time_secs": 17.9,
      "cpus_user_time_secs": 140.66,
      "mem_anon_bytes": 714723328,
      "mem_cache_bytes": 2695168,
      "mem_critical_pressure_counter": 0,
      "mem_file_bytes": 2695168,
      "mem_limit_bytes": 1314914304,
      "mem_low_pressure_counter": 0,
      "mem_mapped_file_bytes": 106496,
      "mem_medium_pressure_counter": 0,
      "mem_rss_bytes": 714723328,
      "mem_swap_bytes": 0,
      "mem_total_bytes": 741773312,
      "mem_unevictable_bytes": 0,
      "timestamp": 1521811482.55977
    }
  }

That endpoint on the mesos slave is what Singularity is polling to get usage statistics. If memory is not being reported there, either you are on an older mesos slave version or your isolator does not collect those metrics, in which case the feature will not function.
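
For illustration, what the poller does boils down to something like the following Python sketch (agent-host is a placeholder; the real implementation is Java inside Singularity, this only shows the idea):

    import json
    import urllib.request

    agent = "http://agent-host:5051"  # placeholder slave hostname

    # /monitor/statistics returns a list of per-executor objects like the one above
    with urllib.request.urlopen(agent + "/monitor/statistics") as resp:
        executors = json.load(resp)

    total_rss = 0
    for entry in executors:
        stats = entry["statistics"]
        # posix/mem reports only a subset of the cgroups/mem fields, so a
        # missing key here is exactly the symptom described in this issue
        total_rss += stats.get("mem_rss_bytes", 0)

    print("slave memory usage (rss bytes):", total_rss)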

@petrovicboban
Author

petrovicboban commented Mar 23, 2018

This is what our mesos slaves return:

    {
        "executor_id": "kg45",
        "executor_name": "",
        "framework_id": "Singularity",
        "source": "test_template_test_job_2-test_job_2_deploy_19-1519399684082-1-db07-DEFAULT",
        "statistics": {
            "cpus_limit": 0.2,
            "cpus_system_time_secs": 1324.95,
            "cpus_user_time_secs": 1780.07,
            "mem_limit_bytes": 201326592,
            "mem_rss_bytes": 596295680,
            "timestamp": 1521822348.65732
        }
    }

Much less than yours, so I guess it's because of the posix isolator. Mesos itself is not too old (1.1).

@ssalinas
Member

Ok, I'll leave this open so we can implement a version that works with the smaller subset of metrics.

ssalinas reopened this Mar 23, 2018
@felixgborrego

We ran into this issue too, and worked around it by using mem_limit_bytes instead of mem_total_bytes.

Not particularly proud of the hack, but it still gives us useful information. You can see the change at:
https://github.com/HubSpot/Singularity/compare/master...Nitro:fix-memory-cgroup?expand=1 (open to sending a PR)
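
For anyone skimming, the substitution amounts to roughly this sketch (the linked branch is the actual change; this only illustrates the idea in Python):

    def mem_total_bytes(stats):
        # prefer mem_total_bytes (reported by cgroups/mem); when it is absent,
        # as with the posix/mem output above, fall back to mem_limit_bytes
        return stats.get("mem_total_bytes", stats.get("mem_limit_bytes", 0))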
