Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add metrics for debuginfo #1704

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ksankeerth
Copy link

@ksankeerth ksankeerth commented Sep 12, 2022

Hi Parca team,
I tried to fix #1275. Not sure I have covered everything mentioned in #1275. Please check and let me know if anything else to be added in this PR.

I have added metrics for..

  1. errors and attempts for uploading debuginfo
  2. errors and attempts for validation
  3. time taken for uploading debuginfo, updating metdata, and, exists check

I'm not sure what exactly I have to do for below metrics. Will you be able to guide me?

Debuginfo Client metrics (will be exposed on Agent)

Debuginfod Metrics

Fixes #1275

Signed-off-by: shankeerthan-kasilingam shankeerthan1995@gmail.com

1. errors and attempts for uploading debuginfo
2. errors and attempts for updating metadata
3. time taken for uploading debuginfo, updating metdata, and, exists check

Signed-off-by: shankeerthan-kasilingam <shankeerthan1995@gmail.com>
@ksankeerth ksankeerth requested review from a team as code owners September 12, 2022 04:04
@CLAassistant
Copy link

CLAassistant commented Sep 12, 2022

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Thanks for handling this.

I think we can be more consistent with the variable naming.

And as far as I can tell, we don't increment error metrics every error returns. Let's make sure we cover everything.

You might also want to consider extracting validation logic to the helper function fro ease of instrumenting.

metadataUpdateDuration: metadataUpdateDuration,
validationAttemptsTotal: validationAttemptsTotal,
validationErrorsTotal: *validationErrorsTotal,
uploadDebugInfoAttemptsTotal: uploadDebugInfoAttemptsTotal,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uploadDebugInfoAttemptsTotal: uploadDebugInfoAttemptsTotal,
debugInfoUploadAttemptsTotal: uploadDebugInfoAttemptsTotal,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uploadDebugInfoAttemptsTotal: uploadDebugInfoAttemptsTotal,
debugInfoUploadAttemptsTotal: uploadDebugInfoAttemptsTotal,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I'll make the changes.

validationAttemptsTotal: validationAttemptsTotal,
validationErrorsTotal: *validationErrorsTotal,
uploadDebugInfoAttemptsTotal: uploadDebugInfoAttemptsTotal,
uploadDebugInfoErrorsTotal: *uploadDebugInfoErrorsTotal,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uploadDebugInfoErrorsTotal: *uploadDebugInfoErrorsTotal,
debugInfoUploadErrorsTotal: *uploadDebugInfoErrorsTotal,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. will make the changes

validationErrorsTotal: *validationErrorsTotal,
uploadDebugInfoAttemptsTotal: uploadDebugInfoAttemptsTotal,
uploadDebugInfoErrorsTotal: *uploadDebugInfoErrorsTotal,
uploadDebugInfoDuration: uploadDebugInfoDuration,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uploadDebugInfoDuration: uploadDebugInfoDuration,
debugInfoUploadDuration: uploadDebugInfoDuration,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will make the changes

@@ -253,7 +329,9 @@ func (s *Store) upload(ctx context.Context, buildID, hash string, r io.Reader) e

// At this point we know that we received a better version of the debug information file,
// so let the client upload it.

defer func(begin time.Time) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is semantically correct. There are lots of other operations happening in the rest of the method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. thanks for correcting it. I think I cannot use defer here. I'll update the PR


validationAttemptsTotal prometheus.Counter
validationErrorsTotal prometheus.CounterVec
metadataUpdateDuration prometheus.Histogram
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should encapsulate this in metadata types and files rather than here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small doubt on this. I think We can move metadataUpdateDuration to metadata.go and encapsulate inside ObjectStoreMetadata struct. I'm not sure this comment for validationAttemptsTotal and validationErrorsTotal. Currently validation metrics are around this

if err := elfutils.ValidateFile(objFile); err != nil {

Could you please clarify this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need those at all. Considering we'll always try to validate these files, maybe just having the duration is enough. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksankeerth, I think we can get rid of additional validation attempts and error metrics. And add additional reason ("validation") to the actual action metrics (e.g upload).

nice. It's clear now. I'll update the PR

@ksankeerth
Copy link
Author

Looking good. Thanks for handling this.

I think we can be more consistent with the variable naming.

And as far as I can tell, we don't increment error metrics every error returns. Let's make sure we cover everything.

You might also want to consider extracting validation logic to the helper function fro ease of instrumenting.

Thanks for reviewing the PR. I'll check the comments and update the PR

@ksankeerth
Copy link
Author

And as far as I can tell, we don't increment error metrics every error returns.

Just some clarifications needed on this(sorry if it's a dump question).

validationErrorsTotal prometheus.CounterVec

uploadDebugInfoErrorsTotal prometheus.CounterVec

Two error metrics are used here. I increase the error metrics with reason before return if it's related to the metrics we measure. Is this correct? or in any places did i increase error metrics irrelevantly? (may be this location

s.validationErrorsTotal.WithLabelValues("has_dwarf_object_file").Inc()
)

@ksankeerth
Copy link
Author

You might also want to consider extracting validation logic to the helper function fro ease of instrumenting

If I'm going to extract validation logic to a function, Can I extract the below code inside a function in store.go?

if err := elfutils.ValidateFile(objFile); err != nil {
// Failed to validate. Mark the file as corrupted, and let the client try to upload it again.
if err := s.metadata.MarkAsCorrupted(ctx, buildID); err != nil {
level.Warn(s.logger).Log("msg", "failed to update metadata as corrupted", "err", err)
}
level.Error(s.logger).Log("msg", "failed to validate object file", "buildid", buildID)
s.validationErrorsTotal.WithLabelValues("validate_object_file").Inc()
// Client will retry.
return status.Error(codes.Internal, err.Error())
}
// Valid.
f, err := elf.Open(objFile)
if err != nil {
level.Debug(s.logger).Log("msg", "failed to open object file", "err", err)
} else {
hasDWARF, err := elfutils.HasDWARF(f)
if err != nil {
level.Debug(s.logger).Log("msg", "failed to check for DWARF", "err", err)
s.validationErrorsTotal.WithLabelValues("has_dwarf_object_file").Inc()
}
f.Close()
if hasDWARF {
return status.Error(codes.AlreadyExists, "debuginfo already exists")
}
}

@kakkoyun
Copy link
Member

@ksankeerth, I think we can get rid of additional validation attempts and error metrics. And add additional reason ("validation") to the actual action metrics (e.g upload).

@kakkoyun
Copy link
Member

You might also want to consider extracting validation logic to the helper function fro ease of instrumenting

If I'm going to extract validation logic to a function, Can I extract the below code inside a function in store.go?

if err := elfutils.ValidateFile(objFile); err != nil {
// Failed to validate. Mark the file as corrupted, and let the client try to upload it again.
if err := s.metadata.MarkAsCorrupted(ctx, buildID); err != nil {
level.Warn(s.logger).Log("msg", "failed to update metadata as corrupted", "err", err)
}
level.Error(s.logger).Log("msg", "failed to validate object file", "buildid", buildID)
s.validationErrorsTotal.WithLabelValues("validate_object_file").Inc()
// Client will retry.
return status.Error(codes.Internal, err.Error())
}
// Valid.
f, err := elf.Open(objFile)
if err != nil {
level.Debug(s.logger).Log("msg", "failed to open object file", "err", err)
} else {
hasDWARF, err := elfutils.HasDWARF(f)
if err != nil {
level.Debug(s.logger).Log("msg", "failed to check for DWARF", "err", err)
s.validationErrorsTotal.WithLabelValues("has_dwarf_object_file").Inc()
}
f.Close()
if hasDWARF {
return status.Error(codes.AlreadyExists, "debuginfo already exists")
}
}

You can extract it. I would prefer the metadata update logic outside the function on the call site, though.

Signed-off-by: shankeerthan-kasilingam <shankeerthan1995@gmail.com>
@ksankeerth
Copy link
Author

@kakkoyun I have done following changes

  1. changed the name of metrics as per your suggestions.
  2. removed metrics for validationAttempts and validationErrors (since metrics were removed, i didn't extract metadata update logic to a separate function)
  3. metadataUpdateDuration was moved to metadata.go and used metaData.UploadFinishedAt and metaData.UploadStartedAt for the metrics.
    Seems race condition can happen
    // There's a small window where a race could happen.
    if metaData.State == MetadataStateUploaded {
    return nil
    }
    . I didn't update metadataUpdateDuration metrics in this case.

Please check the new changes.Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

debuginfo: Add metrics for debuginfo package
3 participants