Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request to have metrics emitted on health check failures #2255

Open
timmydo opened this issue Sep 15, 2023 · 4 comments
Open

Request to have metrics emitted on health check failures #2255

timmydo opened this issue Sep 15, 2023 · 4 comments
Labels
Type: Idea This issue is a high-level idea for discussion.
Milestone

Comments

@timmydo
Copy link

timmydo commented Sep 15, 2023

What should we add or change to make your life better?

Requesting metrics in some of the default classes such as:

Log.ActiveDestinationHealthStateIsSetToUnhealthy(_logger, destination.DestinationId, cluster.ClusterId);

https://github.com/microsoft/reverse-proxy/blob/04b1fe8d8cd0f25b814805c2fb91c68fb26c7f29/src/ReverseProxy/Health/ActiveHealthCheckMonitor.cs#L197C31-L197C31

https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics-collection

Specifically, today, I'm looking for a way to have alerts go off when our health checks fail without reimplementing YARP's default implementations. More generally it would be nice if YARP had metrics to complement the Log statements.

Why is this important to you?

We are able to set up alerts on metrics but not on individual log messages. Metrics can alert the on-call engineers to look at the logs for further information.

@timmydo timmydo added the Type: Idea This issue is a high-level idea for discussion. label Sep 15, 2023
@timmydo
Copy link
Author

timmydo commented Sep 15, 2023

here is a quick list I put together by searching for Log. and removing some that didn't look relevant:

src\ReverseProxy\Configuration\ConfigProvider\ConfigurationConfigProvider.cs:
   70:             Log.LoadData(_logger);

   88:                 Log.ConfigurationDataConversionFailed(_logger, ex);

  110:                 Log.ErrorSignalingChange(_logger, ex);


src\ReverseProxy\Forwarder\ForwarderHttpClientFactory.cs:
  40:             Log.ClientReused(_logger, context.ClusterId);

  60:         Log.ClientCreated(_logger, context.ClusterId);

src\ReverseProxy\Forwarder\ForwarderMiddleware.cs:
  50:             Log.NoAvailableDestinations(_logger, cluster.ClusterId);

  62:             Log.MultipleDestinationsAvailable(_logger, cluster.ClusterId);

  79:             ForwarderTelemetry.Log.ForwarderInvoke(cluster.ClusterId, route.Config.RouteId, destination.DestinationId);

src\ReverseProxy\Forwarder\HttpForwarder.cs:
  150:                     Log.NotProxying(_logger, context.Response.StatusCode);

  154:                 Log.Proxying(_logger, destinationRequest, isStreamingRequest);

  171:                         Log.RetryingWebSocketDowngradeNoConnect(_logger);

  179:                         Log.RetryingWebSocketDowngradeNoHttp2(_logger);

  214:             Log.ResponseReceived(_logger, destinationResponse);

  490:                 Log.InvalidSecWebSocketKeyHeader(_logger, key);

  922:         Log.ErrorProxying(_logger, error, ex);

src\ReverseProxy\Health\ActiveHealthCheckMonitor.cs:
   61:                 Log.ExplicitActiveCheckOfAllClustersHealthFailed(_logger, ex);

  116:         Log.StartingActiveHealthProbingOnCluster(_logger, cluster.ClusterId);

  142:             Log.ActiveHealthProbingFailedOnCluster(_logger, cluster.ClusterId, ex);

  156:                 Log.ErrorOccuredDuringActiveHealthProbingShutdownOnCluster(_logger, cluster.ClusterId, ex);
  158  
  159:             Log.StoppedActiveHealthProbingOnCluster(_logger, cluster.ClusterId);

  176:             Log.ActiveHealthProbeConstructionFailedOnCluster(_logger, destination.DestinationId, cluster.ClusterId, ex);
  177  

  187:             Log.SendingHealthProbeToEndpointOfDestination(_logger, request.RequestUri, destination.DestinationId, cluster.ClusterId);
  189:             Log.DestinationProbingCompleted(_logger, destination.DestinationId, cluster.ClusterId, (int)response.StatusCode);

  197:             Log.DestinationProbingFailed(_logger, destination.DestinationId, cluster.ClusterId, ex);

src\ReverseProxy\Health\DestinationHealthUpdater.cs:
   43:                     Log.ActiveDestinationHealthStateIsSetToUnhealthy(_logger, destination.DestinationId, cluster.ClusterId);

   47:                     Log.ActiveDestinationHealthStateIsSet(_logger, destination.DestinationId, cluster.ClusterId, newHealth);

   85:             Log.UnhealthyDestinationIsScheduledForReactivation(_logger, destination.DestinationId, reactivationPeriod);

  100:             Log.PassiveDestinationHealthResetToUnkownState(_logger, destination.DestinationId);


src\ReverseProxy\LoadBalancing\LoadBalancingMiddleware.cs:
  59:             Log.NoAvailableDestinations(_logger, proxyFeature.Cluster.Config.ClusterId);


src\ReverseProxy\Model\ProxyPipelineInitializerMiddleware.cs:
  40:             Log.NoClusterFound(_logger, route.Config.RouteId);


src\ReverseProxy\WebSocketsTelemetry\WebSocketsTelemetryMiddleware.cs:
  56:                 WebSocketsTelemetry.Log.WebSocketClosed(

  80:                 WebSocketsTelemetry.Log.WebSocketClosed(

@adityamandaleeka
Copy link
Member

@timmydo Thanks for reporting this. We believe there's value in making some of these metrics, but perhaps not all of them (e.g. error conditions that should really be addressed immediately).

If you could provide a list of the ones that are the highest priority to add metrics for, that would be great.

@adityamandaleeka
Copy link
Member

Triage: perhaps another interesting metric might be a catch-all "this request couldn't be proxied" metric.

@adityamandaleeka adityamandaleeka added this to the v.Next milestone Sep 19, 2023
@timmydo
Copy link
Author

timmydo commented Sep 21, 2023

Counters for these files above look to be higher priority, with higher priority being to logs that would be error or warning:

src\ReverseProxy\Forwarder\ForwarderMiddleware.cs
src\ReverseProxy\Health\DestinationHealthUpdater.cs
src\ReverseProxy\Health\ActiveHealthCheckMonitor.cs
src\ReverseProxy\LoadBalancing\LoadBalancingMiddleware.cs

having counters for successful requests is useful also to validate whether no errors is a data-missing scenario or no errors. Also, you can measure the reliability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Idea This issue is a high-level idea for discussion.
Projects
None yet
Development

No branches or pull requests

2 participants