[Bug] SIGSEGV: segmentation violation in banyand/query.(*topNQueryProcessor).Rev.func1() #12219

Open
2 of 3 tasks
Almot77 opened this issue May 13, 2024 · 24 comments · Fixed by apache/skywalking-banyandb#445
Assignees
Labels
bug Something isn't working and you are sure it's a bug! database BanyanDB - SkyWalking native database

Comments

@Almot77

Almot77 commented May 13, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Apache SkyWalking Component

BanyanDB (apache/skywalking-banyandb)

What happened

The server crashes with a segmentation fault in Docker. I use the :latest SW and BanyanDB images (new releases).

What you expected to happen

{"level":"info","module":"STREAM-SEGMENT.SCHEDULER.RETENTION","name":"retention","now":"2024-05-13T14:49:52Z","time":"2024-05-13T14:49:52Z","message":"start"}
{"level":"info","module":"STREAM","group":"stream-browser_error_log","time":"2024-05-13T14:49:52Z","message":"creating a tsdb"}
{"level":"info","module":"STREAM-BROWSER_ERROR_LOG","path":"/tmp/stream-data/stream/stream-browser_error_log","time":"2024-05-13T14:49:52Z","message":"initialized"}
{"level":"info","module":"STREAM-BROWSER_ERROR_LOG.SCHEDULER.RETENTION","name":"retention","now":"2024-05-13T14:49:52Z","time":"2024-05-13T14:49:52Z","message":"start"}
{"level":"info","module":"STREAM","group":"stream-zipkin_span","time":"2024-05-13T14:49:52Z","message":"creating a tsdb"}
{"level":"info","module":"STREAM-ZIPKIN_SPAN","path":"/tmp/stream-data/stream/stream-zipkin_span","time":"2024-05-13T14:49:52Z","message":"initialized"}
{"level":"info","module":"STREAM-ZIPKIN_SPAN.SCHEDULER.RETENTION","name":"retention","now":"2024-05-13T14:49:52Z","time":"2024-05-13T14:49:52Z","message":"start"}
{"level":"error","module":"QUERY.TOPN.MEASURE-MINUTE.ENDPOINT_RESP_TIME_MINUTE_TOPN","error":"failed to query measure: unmarshal tag value: unsupported tag value type","req":{"groups":["measure-minute"], "name":"endpoint_resp_time_minute_topn", "timeRange":{"begin":"2024-05-13T14:20:00Z", "end":"2024-05-13T14:51:00Z"}, "topN":10, "agg":"AGGREGATION_FUNCTION_MEAN", "conditions":[{"name":"service_id", "op":"BINARY_OP_EQ", "value":{"str":{"value":"cGhwLW1zay1sZWdhY3k=.1"}}}], "fieldValueSort":"SORT_DESC"},"time":"2024-05-13T14:50:30Z","message":"fail to close the topn plan"}
panic: runtime error: invalid memory address or nil pointer dereference
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1251163]

goroutine 356 [running]:
github.com/apache/skywalking-banyandb/banyand/query.(*topNQueryProcessor).Rev.func1()
        /src/banyand/query/processor_topn.go:126 +0x23
panic({0x13d6e40?, 0x259ebd0?})
        /usr/local/go/src/runtime/panic.go:770 +0x132
github.com/apache/skywalking-banyandb/banyand/query.(*topNQueryProcessor).Rev(0xc000010fd8, {{0x156fd20, 0xc0085ce280}, {0x15f10a0, 0x5}, 0x17cf13deadbc52eb, 0x0})
        /src/banyand/query/processor_topn.go:133 +0xfd6
github.com/apache/skywalking-banyandb/pkg/bus.(*Bus).Subscribe.func1({0x1a71de0, 0xc000010fd8}, 0xc0001d09c0)
        /src/pkg/bus/bus.go:274 +0xfa
created by github.com/apache/skywalking-banyandb/pkg/bus.(*Bus).Subscribe in goroutine 1
        /src/pkg/bus/bus.go:270 +0x28f
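
The service_id in the failing topN request logged above looks like it follows SkyWalking's service ID layout: the Base64-encoded service name, a dot, and a flag (1 for a normal service). That layout is an assumption here, not something stated in this report. Under that assumption, the ID can be decoded with a minimal sketch like this one, where `decode_service_id` is a hypothetical helper name:

```python
import base64


def decode_service_id(service_id):
    """Split '<base64 name>.<flag>' and return (name, is_normal)."""
    # Split on the last dot so that the Base64 padding stays with the name part.
    encoded_name, _, flag = service_id.rpartition(".")
    name = base64.b64decode(encoded_name).decode("utf-8")
    return name, flag == "1"


# The value is copied from the "conditions" entry of the logged request.
name, is_normal = decode_service_id("cGhwLW1zay1sZWdhY3k=.1")
print(name, is_normal)
```

Decoding the ID this way makes it easier to tell which service's data the failing query was reading.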

How to reproduce

Run with Docker Compose: docker compose --profile banyandb up -d

version: '3.8'
services:
  elasticsearch:
    profiles:
      - "elasticsearch"
    image: itbgk/elasticsearch-oss:7.9.2
    container_name: skywalking-elasticsearch
    ports:
      - "9200:9200"
    networks:
      - skywalking
    volumes:
      - elastic-sw:/usr/share/elasticsearch/data
    healthcheck:
      test: [ "CMD-SHELL", "curl --silent --fail localhost:9200/_cluster/health || exit 1" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    restart: always
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1

  banyandb:
    profiles:
      - "banyandb"
    image: ${BANYANDB_IMAGE:-apache/skywalking-banyandb:latest}
    container_name: banyandb
    restart: always
    networks:
      - skywalking
    expose:
      - 17912
    ports:
      - 17913:17913
    volumes:
      - banyandb-stream-data:/tmp/stream-data
      - banyandb-measure-data:/tmp/measure-data

    command: standalone --stream-root-path /tmp/stream-data --measure-root-path /tmp/measure-data
    healthcheck:
      test: [ "CMD", "sh", "-c", "nc -nz 127.0.0.1 17912" ]
      interval: 5s
      timeout: 60s
      retries: 120

  oap-base: &oap-base
    profiles: [ "none" ]
    image: ${OAP_IMAGE:-ghcr.io/apache/skywalking/oap:latest}
    ports:
      - "11800:11800"
      - "12800:12800"
      - "9099:9090"
      - "3100:3100"
    networks:
      - skywalking
    healthcheck:
      test: [ "CMD-SHELL", "curl http://localhost:12800/internal/l7check" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
#    restart: always
    environment: &oap-env
      TZ: Europe/Moscow
      SW_HEALTH_CHECKER: default
      SW_OTEL_RECEIVER: default
      SW_OTEL_RECEIVER_ENABLED_OC_RULES: vm
      SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES: vm
      SW_TELEMETRY: prometheus
      JAVA_OPTS: "-Xms2048m -Xmx2048m"
      SW_CORE_RECORD_DATA_TTL: 2 # https://skywalking.apache.org/docs/main/next/en/setup/backend/ttl/
      SW_CORE_METRICS_DATA_TTL: 2
      SW_DCS_MAX_INBOUND_MESSAGE_SIZE: 5000000000

  oap-es:
    <<: *oap-base
    profiles:
      - "elasticsearch"
    container_name: skywalking-server # rename to something else if switching to BanyanDB
    depends_on:
      elasticsearch:
        condition: service_healthy
    environment:
      <<: *oap-env
      SW_STORAGE: elasticsearch
      SW_STORAGE_ES_CLUSTER_NODES: elasticsearch:9200
      SW_CORE_RECORD_DATA_TTL: 2 # https://skywalking.apache.org/docs/main/next/en/setup/backend/ttl/
      SW_CORE_METRICS_DATA_TTL: 2
      SW_DCS_MAX_INBOUND_MESSAGE_SIZE: 5000000000

  oap-bdb:
    <<: *oap-base
    profiles:
      - "banyandb"
    container_name: skywalking-server-bdb # rename to oap if switching to Elasticsearch
    depends_on:
      banyandb:
        condition: service_healthy
    environment:
      <<: *oap-env
      SW_STORAGE: banyandb
      SW_STORAGE_BANYANDB_TARGETS: banyandb:17912
      SW_CORE_RECORD_DATA_TTL: 14 # https://skywalking.apache.org/docs/main/next/en/setup/backend/ttl/
      SW_CORE_METRICS_DATA_TTL: 14
      SW_DCS_MAX_INBOUND_MESSAGE_SIZE: 5000000000

  ui:
    image: ${UI_IMAGE:-ghcr.io/apache/skywalking/ui:latest}
    container_name: skywalking-ui
    ports:
      - "1010:8080"
    networks:
      - skywalking
    restart: always
    environment:
      <<: *oap-env
      SW_OAP_ADDRESS: http://skywalking-server-bdb:12800
      SW_ZIPKIN_ADDRESS: http://skywalking-server-bdb:9412

volumes:
  elastic-sw:
  banyandb-stream-data:
    external: true
  banyandb-measure-data:
    external: true

networks:
  skywalking:

Anything else

No response

Are you willing to submit a pull request to fix on your own?

  • Yes I am willing to submit a pull request on my own!

Code of Conduct

@Almot77 Almot77 added the bug Something isn't working and you are sure it's a bug! label May 13, 2024
@wu-sheng wu-sheng added the database BanyanDB - SkyWalking native database label May 13, 2024
@wu-sheng wu-sheng added this to the BanyanDB - 0.7.0 milestone May 13, 2024
@wu-sheng
Member

Your configuration is not well formatted. Please correct it. And what does SW_STORAGE: elasticsearch mean? I think we don't need Elasticsearch when you use BanyanDB.

And we don't have a banyandb-helm 0.2 release. How do you deploy the database?

@lujiajing1126
Contributor

lujiajing1126 commented May 13, 2024

After checking the code, it seems the error is not handled properly.

image

@wu-sheng
Member

@lujiajing1126 In what case does this error occur?

@Almot77
Author

Almot77 commented May 13, 2024

Your configuration is not well formatted. Please correct it. And what does SW_STORAGE: elasticsearch mean? I think we don't need Elasticsearch when you use BanyanDB.

And we don't have a banyandb-helm 0.2 release. How do you deploy the database?

It's a docker-compose.yml file.
I run SW with the selected DB profile: elasticsearch or banyandb.

The right way to run it:
docker compose --profile banyandb up -d

@wu-sheng
Member

Are you using the Docker quick start? We haven't upgraded it to the latest. It needs the v10 OAP and the latest BanyanDB 0.6.

@wu-sheng
Member

This error is easy to fix, @Almot77, but we want to know how you triggered it, as we have run many tests to verify the features.

@hanahmily
Contributor

The nil error is fixed by https://github.com/apache/skywalking-banyandb/pull/445/files#diff-695073ea8dec3fcdaae77a3fcfb4eabc7290daade34399aee1de429999d7b476R124

But the error below is a bit tricky.

{"level":"error","module":"QUERY.TOPN.MEASURE-MINUTE.ENDPOINT_RESP_TIME_MINUTE_TOPN","error":"failed to query measure: unmarshal tag value: unsupported tag value type","req":{"groups":["measure-minute"], "name":"endpoint_resp_time_minute_topn", "timeRange":{"begin":"2024-05-13T14:20:00Z", "end":"2024-05-13T14:51:00Z"}, "topN":10, "agg":"AGGREGATION_FUNCTION_MEAN", "conditions":[{"name":"service_id", "op":"BINARY_OP_EQ", "value":{"str":{"value":"cGhwLW1zay1sZWdhY3k=.1"}}}], "fieldValueSort":"SORT_DESC"},"time":"2024-05-13T14:50:30Z","message":"fail to close the topn plan"}

@Almot77 Would you please use the latest banyandb image built from apache/skywalking-banyandb#445 to output more context about this error?

If you have an appropriate Docker environment, running make docker.build is all you need.

@wu-sheng
Member

I reopened this as we don't know how this happens.

@Almot77 We will need more information. Could you package the whole data folder and send it to us, so we can identify the illegal data? Or could you share how we can reproduce this?

@Almot77
Author

Almot77 commented May 14, 2024

Okay, I'll test it now.

We use a PHP application in Docker + the SW PHP library for trace collection + Grafana dashboards (made from the SW examples for Grafana).

@Almot77
Author

Almot77 commented May 14, 2024

How do I install the Go libraries?
I'm on Ubuntu 22.04.
Here is what I did:

sudo apt install time nodejs npm
sudo npm cache clean -f
sudo npm install -g n
sudo n stable
wget https://go.dev/dl/go1.22.3.linux-amd64.tar.gz
sudo  rm -rf /usr/local/go && tar -C /usr/local -xzf go1.22.3.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin

$ make docker.build

make docker -C docker; \
if [ $? -ne 0 ]; then \
        exit 1; \
fi; \

make[1]: Entering directory '/home/srvdocker/skywalking/build/skywalking-banyandb/docker'
Build apache/skywalking-banyandb:latest
[+] Building 1.3s (14/18)                                                                                                                                       docker:default
 => [internal] load .dockerignore                                                                                                                                         0.0s
 => => transferring context: 2B                                                                                                                                           0.0s
 => [internal] load build definition from Dockerfile                                                                                                                      0.0s
 => => transferring dockerfile: 2.03kB                                                                                                                                    0.0s
 => [internal] load metadata for docker.io/library/busybox:stable-glibc                                                                                                   0.5s
 => [internal] load metadata for docker.io/library/alpine:edge                                                                                                            0.5s
 => [internal] load metadata for docker.io/library/golang:1.22                                                                                                            0.5s
 => [base 1/4] FROM docker.io/library/golang:1.22@sha256:b1e05e2c918f52c59d39ce7d5844f73b2f4511f7734add8bb98c9ecdd4443365                                                 0.0s
 => [internal] load build context                                                                                                                                         0.1s
 => => transferring context: 73.62kB                                                                                                                                      0.1s
 => CACHED [build-linux 1/4] FROM docker.io/library/busybox:stable-glibc@sha256:9bc27a72a82d22e54b4cc8bd7b99d3907a442869f77f075e0119104f2404953d                          0.0s
 => [certs 1/2] FROM docker.io/library/alpine:edge@sha256:e31c3b1cd47718260e1b6163af0a05b3c428dc01fa410baf72ca8b8076e22e72                                                0.0s
 => CACHED [certs 2/2] RUN apk add --no-cache ca-certificates && update-ca-certificates                                                                                   0.0s
 => CACHED [base 2/4] WORKDIR /src                                                                                                                                        0.0s
 => CACHED [base 3/4] COPY go.* ./                                                                                                                                        0.0s
 => CACHED [base 4/4] RUN go mod download                                                                                                                                 0.0s
 => ERROR [builder 1/2] RUN --mount=target=.             --mount=type=cache,target=/root/.cache/go-build             BUILD_DIR=/out BUILD_TAGS=prometheus make -C banyan  0.8s
------
 > [builder 1/2] RUN --mount=target=.             --mount=type=cache,target=/root/.cache/go-build             BUILD_DIR=/out BUILD_TAGS=prometheus make -C banyand banyand-server-static:
0.215 make: Entering directory '/src/banyand'
0.233 Building static binary
0.233 CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
0.233         -buildvcs=false \
0.233   -a --ldflags '-X github.com/apache/skywalking-banyandb/pkg/version.build=v0.6.0-1-gc827067-main -extldflags "-static"' -tags "netgo prometheus" -installsuffix netgo \
0.233   -o /out/banyand-server-static github.com/apache/skywalking-banyandb/banyand/cmd/server
0.434 ../api/data/data.go:24:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/measure/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/measure/v1
0.434 ../api/data/data.go:25:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/stream/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/stream/v1
0.434 ../pkg/run/run.go:36:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/database/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/database/v1
0.434 dquery/measure.go:25:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/common/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/common/v1
0.434 dquery/dquery.go:27:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/model/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/model/v1
0.434 metadata/schema/checker.go:27:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/property/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/property/v1
0.434 ../ui/embed.go:25:12: pattern dist: no matching files found
0.434 queue/pub/client.go:29:2: no required module provides package github.com/apache/skywalking-banyandb/api/proto/banyandb/cluster/v1; to add it:
0.434   go get github.com/apache/skywalking-banyandb/api/proto/banyandb/cluster/v1
0.440 make: *** [../scripts/build/build.mk:61: /out/banyand-server-static] Error 1
0.440 make: Leaving directory '/src/banyand'
------
Dockerfile:40
--------------------
  39 |
  40 | >>> RUN --mount=target=. \
  41 | >>>             --mount=type=cache,target=/root/.cache/go-build \
  42 | >>>             BUILD_DIR=/out BUILD_TAGS=prometheus make -C banyand banyand-server-static
  43 |     RUN --mount=target=. \
--------------------
ERROR: failed to solve: process "/bin/sh -c BUILD_DIR=/out BUILD_TAGS=prometheus make -C banyand banyand-server-static" did not complete successfully: exit code: 2
Command exited with non-zero status 1
0.15user 0.15system 0:01.51elapsed 20%CPU (0avgtext+0avgdata 48096maxresident)k
128inputs+56outputs (0major+10330minor)pagefaults 0swaps
make[1]: *** [../scripts/build/docker.mk:46: docker] Error 1
make[1]: Leaving directory '/home/srvdocker/skywalking/build/skywalking-banyandb/docker'
make: *** [Makefile:154: docker.build] Error 1

I tried to get the libs:

$ go get github.com/apache/skywalking-banyandb/api/proto/banyandb/measure/v1
go: github.com/apache/skywalking-banyandb/api/proto/banyandb/measure/v1: no matching versions for query "upgrade"

$ go get github.com/apache/skywalking-banyandb/api/proto/banyandb/property/v1
go: github.com/apache/skywalking-banyandb/api/proto/banyandb/property/v1: no matching versions for query "upgrade"

@wu-sheng
Member

@wu-sheng
Member

Or simply try the dev image from here: https://github.com/apache/skywalking-banyandb/pkgs/container/skywalking-banyandb/215721861?tag=c8270670d47a9c6caa2661af434157656c4b7eaf

@Almot77
Author

Almot77 commented May 14, 2024

OK, the DB starts successfully, but I have 3 problems:

  1. I lost data from my panel in Grafana. The bug is possibly here.

image

Query:
endpoint_sla{parent_service='$service', layer='$layer', top_n='15', order='ASC'} / 100

Grafana query inspect:

{
  "request": {
    "url": "api/ds/query?ds_type=prometheus&requestId=Q362",
    "method": "POST",
    "data": {
      "queries": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fdjzti6mhdam8c"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "endpoint_sla{parent_service='php-kz-prod', layer='GENERAL', top_n='15', order='ASC'} / 100",
          "format": "time_series",
          "instant": false,
          "legendFormat": "{{endpoint}}",
          "range": true,
          "refId": "A",
          "requestId": "71A",
          "utcOffsetSec": 10800,
          "interval": "",
          "datasourceId": 2,
          "intervalMs": 60000,
          "maxDataPoints": 1358
        }
      ],
      "from": "1715666087538",
      "to": "1715669687539"
    },
    "hideFromInspector": false
  },
  "response": {
    "results": {
      "A": {
        "error": "expected object type",
        "errorSource": "",
        "status": 200,
        "frames": [
          {
            "schema": {
              "refId": "A",
              "meta": {
                "typeVersion": [
                  0,
                  0
                ],
                "executedQueryString": "Expr: endpoint_sla{parent_service='php-kz-prod', layer='GENERAL', top_n='15', order='ASC'} / 100\nStep: 1m0s"
              },
              "fields": []
            },
            "data": {
              "values": []
            }
          }
        ],
        "refId": "A"
      }
    }
  }
}
  2. I have duplicates in my Slow Service Instance dashboard
    image
    image

Query:

{
  "request": {
    "url": "api/ds/query?ds_type=prometheus&requestId=Q423",
    "method": "POST",
    "data": {
      "queries": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fdjzti6mhdam8c"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "endpoint_sla{parent_service='php-kz-prod', layer='GENERAL', top_n='15', order='ASC'} / 100",
          "format": "time_series",
          "instant": false,
          "legendFormat": "{{endpoint}}",
          "range": true,
          "refId": "A",
          "requestId": "71A",
          "utcOffsetSec": 10800,
          "interval": "",
          "datasourceId": 2,
          "intervalMs": 60000,
          "maxDataPoints": 940
        }
      ],
      "from": "1715666368835",
      "to": "1715669968835"
    },
    "hideFromInspector": false
  },
  "response": {
    "results": {
      "A": {
        "error": "expected object type",
        "errorSource": "",
        "status": 200,
        "frames": [
          {
            "schema": {
              "refId": "A",
              "meta": {
                "typeVersion": [
                  0,
                  0
                ],
                "executedQueryString": "Expr: endpoint_sla{parent_service='php-kz-prod', layer='GENERAL', top_n='15', order='ASC'} / 100\nStep: 1m0s"
              },
              "fields": []
            },
            "data": {
              "values": []
            }
          }
        ],
        "refId": "A"
      }
    }
  }
}
  3. I see a lot of these in the logs:
WARNING: Exception processing message
io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 4194304: 5728590
        at io.grpc.Status.asRuntimeException(Status.java:525)
        at io.grpc.internal.MessageDeframer.processHeader(MessageDeframer.java:392)
        at io.grpc.internal.MessageDeframer.deliver(MessageDeframer.java:272)
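
The two numbers in the RESOURCE_EXHAUSTED message above are the receiver's limit and the size of the rejected message, both in bytes. A minimal sketch, assuming the message text keeps the format shown in this log (the helper name `parse_grpc_size_error` is hypothetical), extracts them and shows how far over the limit the message was:

```python
import re

MIB = 1024 * 1024


def parse_grpc_size_error(message):
    """Return (limit_bytes, actual_bytes) from an 'exceeds maximum size' error, or None."""
    match = re.search(r"exceeds maximum size (\d+): (\d+)", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Copied from the log excerpt above.
line = (
    "io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: "
    "gRPC message exceeds maximum size 4194304: 5728590"
)
limit, actual = parse_grpc_size_error(line)
print(f"limit  = {limit / MIB:.2f} MiB")
print(f"actual = {actual / MIB:.2f} MiB")
print(f"over by {actual - limit} bytes")
```

The limit is exactly 4 MiB, so the side that enforces it would need a larger inbound limit before a message of this size could pass.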

In docker-compose I set the SW_DCS_MAX_INBOUND_MESSAGE_SIZE env variable when I run SkyWalking:

oap-bdb:
    <<: *oap-base
    profiles:
      - "banyandb"
    container_name: skywalking-server-bdb # rename to oap if switching to Elasticsearch
    depends_on:
      banyandb:
        condition: service_healthy
    environment:
      <<: *oap-env
      SW_STORAGE: banyandb
      SW_STORAGE_BANYANDB_TARGETS: banyandb:17912
      SW_CORE_RECORD_DATA_TTL: 14 # https://skywalking.apache.org/docs/main/next/en/setup/backend/ttl/
      SW_CORE_METRICS_DATA_TTL: 14
      SW_DCS_MAX_INBOUND_MESSAGE_SIZE: 5000000000

When I use Elasticsearch, I have no duplicates and all panels work, but I still get a lot of these gRPC messages :)

I've attached my Docker log:
log.tar.gz

@wu-sheng
Member

If your data is just for testing, could you tar the whole data folder and upload it here?

It would be easier to verify your query against the same dataset.

@Almot77
Author

Almot77 commented May 14, 2024

It's data from the SkyWalking PHP exporter. I don't know how to collect and save it. Maybe I could export the Docker container with the collected data?

@wu-sheng
Member

SW_DCS_MAX_INBOUND_MESSAGE_SIZE is not for this case. We need to check the BanyanDB Java client (storage/banyandb/... in application.yml) for the relevant settings (they may be missing for now).

@hanahmily
Contributor

  banyandb:
    profiles:
      - "banyandb"
    image: ${BANYANDB_IMAGE:-apache/skywalking-banyandb:latest}
    container_name: banyandb
    restart: always
    networks:
      - skywalking
    expose:
      - 17912
    ports:
      - 17913:17913
    volumes:
      - <your host path>:/tmp

@Almot77 Could you mount a host path to BanyanDB's /tmp, then archive the whole path and upload it here?
The path should look like this:

image

@Almot77
Author

Almot77 commented May 14, 2024

Sure.
Image with fix

version: '3.8'
services:
  banyandb:
    profiles:
      - "banyandb"
    image: ${BANYANDB_IMAGE:-ghcr.io/apache/skywalking-banyandb:c8270670d47a9c6caa2661af434157656c4b7eaf}
    container_name: banyandb
#    restart: always
    networks:
      - skywalking
    expose:
      - 17912
    ports:
      - 17913:17913
    volumes:
      - ./tmp:/tmp

banyandb container log:
banyandb_container_log.tar.gz

tmp:
https://filetransfer.io/data-package/POXU16no#link

Screenshots from Grafana, Elasticsearch vs BanyanDB:
Elastic
image

Banyan
image
image

@hanahmily
Contributor

@Almot77 Could you try ghcr.io/apache/skywalking-banyandb:4270ef1ff8adab3c5de68f9b5c467e838d8bc8ae, which contains the patch from apache/skywalking-banyandb#447?

@Almot77
Author

Almot77 commented May 21, 2024

A few minutes after start I got:

  1. The bdb and sw containers are still working, but traces stopped being collected.

bdb logs:
image

  2. I still have service_name duplicates on the sw graphs.
  3. top_n is still not working.

queries:
top_n(endpoint_sla,10,asc)/100
top_n(endpoint_resp_time,10,des)
top_n(endpoint_cpm,10,des)
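
As a rough illustration of what these expressions ask for, the sketch below mimics top_n on an in-memory dict. It is not SkyWalking's MQE implementation. The endpoint names and values are made up, and it assumes endpoint_sla is stored as a percentage multiplied by 100 (so dividing by 100 gives a percent), which is how the first query reads.

```python
def top_n(values, n, order):
    """Return the n (name, value) pairs with the smallest ('asc') or largest ('des') values."""
    if order not in ("asc", "des"):
        raise ValueError("order must be 'asc' or 'des'")
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=(order == "des"))
    return ranked[:n]


# Hypothetical per-endpoint SLA values, where 10000 means 100%.
endpoint_sla = {"GET:/a": 10000, "GET:/b": 9250, "POST:/c": 9900, "GET:/d": 8000}

# Rough equivalent of top_n(endpoint_sla,3,asc)/100: the three worst success rates in percent.
worst = [(name, value / 100) for name, value in top_n(endpoint_sla, 3, "asc")]
print(worst)
```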

image

@hanahmily
Contributor

@Almot77 Thank you for your feedback.

Since there are several issues here, let's focus on the "errors" in bdb's log. I have created a debug image, docker.io/hanahmily/skywalking-banyandb:13af6cb01078c29a3b346342b89ec56882466891, which prints additional messages to help with this issue. Please collect all the messages that bdb prints to the console.
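
Once the console output has been collected, the error entries can be separated from the rest with a small filter. The following is a minimal sketch, assuming each structured line is a JSON object with `level`, `module`, `error` and `message` keys as in the log posted at the top of this issue, and that panic output and stack frames are plain text. The function name `error_entries` and the shortened sample lines are illustrative only.

```python
import json


def error_entries(lines):
    """Yield (module, error, message) for JSON log lines at error level.

    Non-JSON lines such as panic output and stack frames are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line
        if entry.get("level") == "error":
            yield entry.get("module"), entry.get("error"), entry.get("message")


# Shortened copies of lines from the original report (the "req" field is omitted).
sample = [
    '{"level":"info","module":"STREAM","group":"stream-zipkin_span",'
    '"time":"2024-05-13T14:49:52Z","message":"creating a tsdb"}',
    '{"level":"error","module":"QUERY.TOPN.MEASURE-MINUTE.ENDPOINT_RESP_TIME_MINUTE_TOPN",'
    '"error":"failed to query measure: unmarshal tag value: unsupported tag value type",'
    '"time":"2024-05-13T14:50:30Z","message":"fail to close the topn plan"}',
    "panic: runtime error: invalid memory address or nil pointer dereference",
]

errors = list(error_entries(sample))
for module, error, message in errors:
    print(f"{module}: {message} ({error})")
```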

If you're interested, please join our Slack channel at https://skywalking.apache.org/docs/main/next/en/guides/community/. This will help us communicate more efficiently.

@Almot77
Author

Almot77 commented May 22, 2024

Hi! I sent a request to join, thank you.
Here are the logs:
bdb.logs.tar.gz

@wu-sheng
Member

@Almot77
Author

Almot77 commented May 26, 2024

Much better.

  1. No duplicates in service instances
  2. Stable operation, no errors
    image

But I still have no data for Endpoint Success Rate in Current Service (%)
image

The query top_n(endpoint_sla,10,asc)/100 returned:
Internal IO exception, query metrics error.
Screen from SW:
image

The same query from Grafana returned:
expected object type
image

PromQL:
Query

Expr: endpoint_sla{parent_service='php-msk-prod', layer='GENERAL', top_n='15', order='ASC'} / 100
Step: 1m0s

Response:

{
  "request": {
    "url": "api/ds/query?ds_type=prometheus&requestId=Q640",
    "method": "POST",
    "data": {
      "queries": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fdjzti6mhdam8c"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "endpoint_sla{parent_service='php-***-prod', layer='GENERAL', top_n='15', order='ASC'} / 100",
          "format": "time_series",
          "instant": false,
          "legendFormat": "{{endpoint}}",
          "range": true,
          "refId": "A",
          "requestId": "71A",
          "utcOffsetSec": 10800,
          "interval": "",
          "datasourceId": 2,
          "intervalMs": 60000,
          "maxDataPoints": 407
        }
      ],
      "from": "1716747985680",
      "to": "1716751585680"
    },
    "hideFromInspector": false
  },
  "response": {
    "results": {
      "A": {
        "error": "expected object type",
        "errorSource": "",
        "status": 200,
        "frames": [
          {
            "schema": {
              "refId": "A",
              "meta": {
                "typeVersion": [
                  0,
                  0
                ],
                "executedQueryString": "Expr: endpoint_sla{parent_service='php-***-prod', layer='GENERAL', top_n='15', order='ASC'} / 100\nStep: 1m0s"
              },
              "fields": []
            },
            "data": {
              "values": []
            }
          }
        ],
        "refId": "A"
      }
    }
  }
}
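
Note that the result above carries "status": 200 together with an "error" string and empty frames, so checking the status alone would miss the failure. A minimal sketch, assuming the query inspector payload keeps the shape shown above (the function name `failed_queries` is hypothetical and the payload below is a trimmed copy), collects the errors per refId:

```python
import json


def failed_queries(inspector_payload):
    """Map refId -> error message for every result that reports an error."""
    results = inspector_payload.get("response", {}).get("results", {})
    return {
        ref_id: result["error"]
        for ref_id, result in results.items()
        if result.get("error")
    }


# Trimmed copy of the response above.
payload = json.loads("""
{
  "response": {
    "results": {
      "A": {
        "error": "expected object type",
        "status": 200,
        "frames": [{"schema": {"refId": "A", "fields": []}, "data": {"values": []}}],
        "refId": "A"
      }
    }
  }
}
""")

failures = failed_queries(payload)
print(failures)
```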


5 participants