Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Webaction runs twice within one activation, and reported as failure when it did not actually fail #5481

Open
mretallack opened this issue May 2, 2024 · 2 comments

Comments

@mretallack
Copy link

Environment details:

  • K8s Deployment using AWS ESK, 1.29

Steps to reproduce the issue:

  1. Have an incoming HTTP POST from AWS IoT (using HTTP as a Action Rule)
  2. Have a webaction (visible on internet), using python:3.9 action runner
  3. Monitor the failures of the web action.

Provide the expected results and outputs:

Expected output

2024-05-02 08:07:27 aa9313bdf52a490b9313bdf52ac90b6b python:3 warm  170ms      success         dataspace/iot_action:0.0.76:

    "logs": [
        "2024-05-02T08:07:27.203249047Z stdout: Message type is None",
        "2024-05-02T08:07:27.203353817Z stdout: Authenticated with deployment test",
        "2024-05-02T08:07:27.203359561Z stdout: Processing IoT message iot/statusResponse",
        "2024-05-02T08:07:27.203362535Z stdout: Message: {\"location\": {\"latitude\": \"x\", \"locationDescription\": \"\", \"longitude\": \"x\"}, \"tenantName\": \"LoadTest1\"}",
        "2024-05-02T08:07:27.203366872Z stdout: Publishing statusResponse to topic example.fct.iot_test_statusresponse.1",
        "2024-05-02T08:07:27.203369844Z stdout: New keys ['\"1454419940501\"']",
        "2024-05-02T08:07:27.203372379Z stdout: Finished processing",
        "2024-05-02T08:07:27.20337515Z  stdout: returning: {\"headers\": {\"Content-Type\": \"application/json\", \"Connection\": \"close\"}, \"statusCode\": 200, \"body\": \"\\\"OK\\\"\"}"
    ],

Provide the actual results and outputs:

    "logs": [
        "2024-05-02T08:03:35.441164892Z stdout: Message type is None",
        "2024-05-02T08:03:35.441233404Z stdout: Authenticated with deployment test",
        "2024-05-02T08:03:35.44123762Z  stdout: Processing IoT message iot/statusResponse",
        "2024-05-02T08:03:35.441241006Z stdout: Message: {\"location\": {\"latitude\": \"X\", \"locationDescription\": \"Engineering Lab\", \"longitude\": \"X\"}, \"tenantName\": \"LoadTest1\"}",
        "2024-05-02T08:03:35.441244653Z stdout: Publishing statusResponse to topic topic example.fct.iot_test_statusresponse.1",
        "2024-05-02T08:03:35.441247605Z stdout: New keys ['\"1454419850988\"']",
        "2024-05-02T08:03:35.441250359Z stdout: Finished processing",
        "2024-05-02T08:03:35.441253345Z stdout: returning: {\"headers\": {\"Content-Type\": \"application/json\", \"Connection\": \"close\"}, \"statusCode\": 200, \"body\": \"\\\"OK\\\"\"}",
        "2024-05-02T08:05:10.601380606Z stdout: Message type is None",
        "2024-05-02T08:05:10.601398036Z stdout: Authenticated with deployment test",
        "2024-05-02T08:05:10.601401152Z stdout: Processing Stratos message iot/status",
        "2024-05-02T08:05:10.601404588Z stdout: Message: {\"commonName\": \"1454419850988\", .......,
        "2024-05-02T08:05:10.6014228Z   stdout: Publishing status to topic example.fct.iot_test_status.1",
        "2024-05-02T08:05:10.601426211Z stdout: New keys ['\"1454419850988\"']",
        "2024-05-02T08:05:10.601428478Z stdout: Finished processing",
        "2024-05-02T08:05:10.601431725Z stdout: returning: {\"headers\": {\"Content-Type\": \"application/json\", \"Connection\": \"close\"}, \"statusCode\": 200, \"body\": \"\\\"OK\\\"\"}",
        "2024-05-02T08:05:10.604225532Z stderr: The action did not initialize or run as expected. Log data might be missing."
    ],

Additional information you deem important:

It appears that the same activation is capturing the output from two runs. And therefore the second (or first) run is assumed to of failed.

I have tried adding a sleep into the while loop (thinking it was a timeing issue), but still does it·

https://github.com/apache/openwhisk-runtime-python/blob/1389697e3251ef08a930ed95629379551a8d3c9a/core/python3Action/lib/launcher.py#L54

@rabbah
Copy link
Member

rabbah commented May 6, 2024

You might see this if the webaction returned a 50x status code - nginx may be automatically retrying the request and hence running the action again. You can confirm this by checking the nginx logs for the action in question.

@mretallack
Copy link
Author

Thanks for the suggestion,

I will retry it a few times over the next day, but this is what I see on the access log in nginx:

10.0.0.65 - - [02/May/2024:10:31:17 +0000] [#tid_8daa7a35d7e1b39c3114d4c94d99ae12] POST /api/v1/web/dataspace/iot_package/iot_action HTTP/1.1 200 4 - AHC/2.1 172.20.41.68:8080
10.0.1.70 - - [02/May/2024:10:31:18 +0000] [#tid_63cd895864243ab89c8ed39fec47d83f] POST /api/v1/web/dataspace/iot_package/iot_action HTTP/1.1 200 4 - AHC/2.1 172.20.41.68:8080

From reading this it looks like both are returning 200.

I am wondering if AWS do something strange with there HTTP client and send multiple requests over the same connection and dont wait for a reply (only a theory)

I will keep looking at the logs until somthing else comes up :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants