Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Http4k OpenTelemetry Metrics deviate from OTel Semantic Conventions #943

Open
krissrex opened this issue Jul 5, 2023 · 3 comments
Open

Comments

@krissrex
Copy link
Contributor

krissrex commented Jul 5, 2023

Background

OpenTelemetry has a recommended way of naming metrics: the Semantic Conventions.

They also have a shared implementation of metric factories used by their intrumentors: client and server metric factories in open-telemetry/opentelemetry-java-instrumentation instrumentation-api-semconv module.
I don't believe it is published as a dependency we can use, but the settings used could be copied.

Problem

Http4k defines http.server.request.latency, where semconv says http.server.duration.

val server = MetricsDefaults(
"http.server.request.count" to "Total number of server requests",
"http.server.request.latency" to "Timing of server requests"
) {
it.copy(
labels = mapOf(
"method" to it.request.method.toString(),
"status" to it.response.status.code.toString(),
"path" to it.routingGroup.replace('/', '_').replaceRegexes().replace('.', '_').replace(notAlphaNumUnderscore, "")
)
)
}
val client = MetricsDefaults(
"http.client.request.count" to "Total number of client requests",
"http.client.request.latency" to "Timing of client requests"

This is defined in the http4k-core, then used in http4k-opentelemetry:

fun RequestTimer(
meter: Meter = Http4kOpenTelemetry.meter,
name: String = defaults.timerDescription.first,
description: String = defaults.timerDescription.second,
labeler: HttpTransactionLabeler = defaults.labeler,
clock: Clock = systemUTC()
): Filter {

Change

I doubt we can use the same implementation as instrumentation-api-semconv.
Instead, we have to reproduce the conventions in http4k.

Rename

I suggest renaming the metric to http.server.duration and http.client.duration.

Change unit to seconds

Semconv says seconds, we use ms.

Note that the unit recommended is s, but the reference instrumentation-api-semconv may use ms for backwards-compatible reasons.

val meterInstance = meter.histogramBuilder(name)
.setDescription(description).setUnit("ms").build()

Explicit buckets

I suggest we use the defined histogram buckets from SemConv: [ 0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10 ].

Http4k would do it here:

val meterInstance = meter.histogramBuilder(name)
.setDescription(description).setUnit("ms").build()

For the reference implementation, this is done in
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/15337549e3160b6db4acf8ea5ce39fb95a6bd63d/instrumentation-api-semconv/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/http/HttpMetricsUtil.java#L25-L29
and
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/15337549e3160b6db4acf8ea5ce39fb95a6bd63d/instrumentation-api-semconv/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/http/HttpMetricsUtil.java#L37-L41

Label

I also suggest assuring that all attached labels are correct according to SemConv, and all required labels are present.
These are for server:

  • http.route (if possible)
  • http.request.method
  • http.response.status_code (if possible)
  • server.address
  • server.port (when not 80 or 443)
  • url.scheme (http/https)

For client:

  • http.request.method
  • http.response.status_code (if possible)
  • server.address
  • server.port (when not 80 or 443)

Possible issues

This is a downstream breaking change, because OpenTelemetry metric consumers will now create a new metric,
and any usage of latency in dashboards, alarms etc. will no longer work untill migrated to duration.

An option is to not remove the old metric, but rather introduce another one for duration, in a backwards compatible manner.
(This will increase the amount of metrics, which for some may incur costs for ingestion and storage.)

@jenarros-lw
Copy link

+1

Although it is a breaking change, introducing duration first before removing latency in a separate release will allow migrating any dependencies on the latency metric to duration without disruption.

@krissrex
Copy link
Contributor Author

krissrex commented Jul 17, 2023

The bucket advice API seems to be in incubation still, and is not available right now.
Thus explicit buckets has to wait.

@stevesea
Copy link

stevesea commented Oct 31, 2023

version 1.20.0 of the otel semconv was http.server.duration

the latest semconv standard (1.22) now recommends http.server.request.duration https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-metrics.md
you could look at OTEL_SEMCONV_STABILITY_OPT_IN to decide which metrics to create, as described at that URL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants