Random API Connection Failures #83

Open
FrAllard opened this issue Feb 28, 2024 · 28 comments

Comments

@FrAllard

I know GitHub Issues aren't made for this, but I couldn't find a better place for it.

I wanted to share the result of the base I designed for 3D printing for the project you shared.

It's not perfect yet. I have one more station to print and assemble, where I made a small modification to the alignment of the screen with the DESPI module. When I confirm that the screen lines up perfectly with the DESPI module, I'll upload my 3D files to printables.com.

I did not install a reset button and I wonder if I'll regret it in the future... I can still modify and print another base if I decide to add one. There is a pinhole on the side to hit the onboard reset in case I really need it, but it's not as easy as hitting a big button.

I did find out during tests on my bench, with the unit disassembled, that the API sometimes fails to connect. I'd rather have the code not tell me, or retry 2 or 3 times before telling me. I found myself pressing reset only when there was an error shown on the screen.

Printed with Overture ROCK PLA Rock White on a BambuLab P1S 0.4mm nozzle at 0.2mm layer height.

20240227_230316
20240227_230252
20240227_230136
20240227_230147
X-Ray

@lmarzen
Owner

lmarzen commented Feb 28, 2024

@FrAllard,

Thank you for the kind words and constructive feedback.

Secondly, I want to say, wow that is a great-looking base! Probably the most well-thought-out one I have seen and the bottom panel is a nice touch. I would be happy to link to it if/when you share it on printables (feel free to link it as a reply here or open a pull request). Your build looks so clean, great job.

Lastly, I appreciate the constructive feedback. I have begun experiencing the same API connection errors more and more frequently in the last month. There does currently exist a retry mechanism in the software. Connection is attempted 3 times before the error is displayed. This used to seem nearly 100% effective at preventing these errors, however, it no longer seems to be doing the trick. I suspect that adding additional delay between retries may resolve this issue. Regardless, this is something that I plan to look into and fix (anticipate sometime mid-March as I have midterm exams this week and next week). I'll tag this thread when a fix is pushed.

Regards,

Luke

@lmarzen
Owner

lmarzen commented Mar 3, 2024

I have experimented with adding some delay, which seems to have fixed it, though it is hard to tell due to the random nature of the issue. I'll wait a few more days and if I still don't see the error again then I'll push the fix.

@FrAllard
Author

FrAllard commented Mar 3, 2024

Great thank you!

I too don't see the error that much since the thing has been assembled and placed in the living room instead of being partly assembled on the worktop bench where I work all day when I'm working from home!

I'll program a second one soon. I'll test the new modifications you've done!

@FrAllard
Author

FrAllard commented Mar 4, 2024

I posted my project featuring yours at this address. Feel free to share it.
https://www.printables.com/model/791477-weather-station-using-a-esp32

@Marckau

Marckau commented Mar 4, 2024 via email

lmarzen added a commit that referenced this issue Mar 4, 2024
@lmarzen
Owner

lmarzen commented Mar 4, 2024

Wow, great instructions! I love all the pictures, you clearly put in a lot of effort. I added a link to it in the project readme. Thanks for sharing.

@FrAllard
Author

FrAllard commented Mar 4, 2024

I did put in a lot of effort, but for me this is my "painting". I find it relaxing to 3D design this kind of stuff...
You also put a lot of effort into your firmware and into following up on issues.

Long live open source and open hardware
United we stand, divided we fall

@lmarzen
Owner

lmarzen commented Mar 26, 2024

Update about the occasional -1 Connection Refused HTTPC errors. Due to the unpredictable nature of the issue, it has been challenging to debug. However, I think I am on to a solution that involves adding a slight delay before retrying the API call. I have experimented with delays as short as 50ms, which did not fix the issue, and delays as long as 1s, which seemed to fix it: after two weeks of using this delay I did not observe the Connection Refused error. I am currently experimenting with a 200ms delay, which appears to be sufficient to prevent these errors. I am going to wait another week, and if I don't see the error again I will push this as the fix.
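For readers following along, the retry-with-delay approach described above could be sketched roughly as follows. This is not the project's actual code: `requestWithRetry`, its parameters, and the use of 200 as the success status are all assumptions made for illustration. The wait function is passed in so the sketch compiles off-device; on Arduino one would presumably pass the core's `delay` function.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper, not taken from the project's source.
// Calls `request` up to `max_attempts` times and waits `retry_delay_ms`
// between failed attempts. `request` returns an HTTP-style status code;
// 200 is treated as success and anything else as a failure to retry.
int requestWithRetry(const std::function<int()> &request,
                     const std::function<void(unsigned long)> &wait_ms,
                     int max_attempts,
                     unsigned long retry_delay_ms)
{
  assert(max_attempts > 0);
  int status = -1;
  for (int attempt = 1; attempt <= max_attempts; ++attempt)
  {
    status = request();
    if (status == 200)
    {
      return status; // success, no further attempts needed
    }
    if (attempt < max_attempts)
    {
      wait_ms(retry_delay_ms); // pause before the next attempt
    }
  }
  return status; // last error code after every attempt failed
}
```

With `max_attempts = 3` and `retry_delay_ms = 200`, a request that fails twice and then succeeds waits 400 ms in total before returning 200.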

@lmarzen
Owner

lmarzen commented Apr 9, 2024

I have begun observing -1 Connection Refused errors again. Delay doesn't seem to fix it. Progress update though, I managed to capture the error message over the serial monitor. Will continue to investigate.

[  9912][E][WiFiClientSecure.cpp:144] connect(): start_ssl_client: -1
  -1 Connection Refused

lmarzen changed the title from "Thank you for this project" to "Random API Connection Failures" on Apr 29, 2024
@LiHuihhh

LiHuihhh commented May 4, 2024

@lmarzen Well, I found this amazing project on GitHub, and I built it. However, I ran into the same problem, so I spent hours looking for a solution, and I failed. I found this conversation, but I still couldn't figure it out. It has troubled me for at least two days, as I want to give it as a special birthday present to my best friend. The error is usually "-1 Connection Refused" or "-11 Read Timeout". I tried a lot of things; for instance, I set NTP_TIMEOUT to at least 600000 ms, but it still errors. These errors seem unpreventable and unpredictable, which definitely worries me a lot. I'm still waiting for your official fix, but I'm worried that I won't be able to give her the present for her birthday, which would absolutely be a pity.

@LiHuihhh

LiHuihhh commented May 4, 2024

Looking forward to your response as soon as possible; I only have one day left to improve the machine.

@LiHuihhh

LiHuihhh commented May 4, 2024

Sometimes the error is also "-258 Deserialization Incomplete Input".

@LiHuihhh

LiHuihhh commented May 4, 2024

Screenshot 2024-05-04 214333
Screenshot 2024-05-04 214404

@LiHuihhh

LiHuihhh commented May 4, 2024

Maybe the Wi-Fi in my house is not good either.

@lmarzen
Owner

lmarzen commented May 4, 2024

Okay, let's try to get this figured out ASAP.

NTP_TIMEOUT is only for syncing the time, increasing this timeout will not fix your issue.

I think we can increase the timeout for http requests and that should help. I have had a tremendously difficult time debugging these issues in the past since I cannot reproduce them reliably.

If you are able to capture terminal outputs for any of these errors that would be immensely helpful.

There is another workaround that I can implement, which I call 'silent retries'. The idea is that for certain types of errors, like API errors, we shouldn't display them the first time; we should just wait a minute, start over, and hope the error resolves itself on the second or third try. So we would only display an API error if it happens several times in a row.

Can you estimate how frequently you see these errors? Is it almost every time? Once an hour? Once a day?
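The 'silent retries' idea described above could look something like the following minimal sketch. All names here (`SilentRetryState`, `shouldDisplayError`, `max_silent_failures`) are hypothetical and not from the project. On a device that deep-sleeps between refreshes, the failure counter would also have to persist across sleep cycles (for example in RTC memory or non-volatile storage), which this sketch does not show.

```cpp
#include <cassert>

// Hypothetical state for illustration only; not from the project's source.
struct SilentRetryState
{
  int consecutive_failures = 0; // number of failed wake-ups in a row
};

// Records the outcome of one wake-up and returns true only when the error
// should actually be drawn on the display, i.e. once more than
// `max_silent_failures` failures have happened back to back.
bool shouldDisplayError(SilentRetryState &state, bool api_ok,
                        int max_silent_failures)
{
  assert(max_silent_failures >= 0);
  if (api_ok)
  {
    state.consecutive_failures = 0; // any success resets the streak
    return false;
  }
  ++state.consecutive_failures;
  return state.consecutive_failures > max_silent_failures;
}
```

With `max_silent_failures = 2`, the first two failures in a row stay silent and the third is shown; a successful refresh in between resets the count.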

@LiHuihhh

LiHuihhh commented May 4, 2024

I agree with you. I once came up with a solution where, if the API errors, the screen isn't refreshed and the device just waits until the next time it resets. Maybe it was just a coincidence. To be honest, I'm not really good at coding, so I may have made some wrong steps. I think increasing the timeout for HTTP requests may solve it, or at least reduce the chance of the errors happening. You can try it. I hope to receive the update.

@LiHuihhh

LiHuihhh commented May 4, 2024

Haha, maybe it will be a complex problem for you.

@LiHuihhh

LiHuihhh commented May 4, 2024

IMG_8329
Screenshot 2024-05-04 222211
It errors about 7 times out of 10, as I set the sleep duration to "3".

@LiHuihhh

LiHuihhh commented May 4, 2024

Maybe you can start with the fix of increasing the timeout for HTTP requests, as a user-defined setting. Maybe the server is too far from my area for the data to arrive on time.

@LiHuihhh

LiHuihhh commented May 4, 2024

Uploading IMG_8330.jpg…

@lmarzen
Owner

lmarzen commented May 4, 2024

Okay, I think I am finally figuring it out. Your error messages helped me along. I'm working on it now. I'll get back soon.

@lmarzen
Owner

lmarzen commented May 4, 2024

I have managed to reproduce -11: Read Timeout for the first time. I think I have finally figured this one out.

@lmarzen
Owner

lmarzen commented May 4, 2024

I never thought it was a timeout issue since I only ever witnessed the Connection Refused error. By decreasing the timeout to 100ms I was able to reproduce the read timeout error. The default http tcp timeout is 5000ms. I have increased this for this project to 10000ms. You can increase this further if you need in config.cpp. Let me know if this resolves your issue and if you needed to increase the timeout further.

// HTTP
// The following errors are likely the result of insufficient http client tcp
// timeout:
//   -1   Connection Refused
//   -11  Read Timeout
//   -258 Deserialization Incomplete Input
const unsigned HTTP_CLIENT_TCP_TIMEOUT = 10000; // ms

@LiHuihhh

LiHuihhh commented May 4, 2024

Thank you so much! I'll test it after I wake up. Well, there is still a "Connection Refused" error, which I couldn't understand. So, if you can, you could implement 'silent retries' and provide an extra option that lets users define how many times the system retries. That way the system would be more flexible and user-defined. On the other hand, you would receive less feedback and could just push updates rather than debug a variety of errors. Good luck!

@LiHuihhh

LiHuihhh commented May 7, 2024

Well, I want to ask about a problem: when the device is set up for another Wi-Fi network that is far from my house, it seems my friend can't get it to run properly by pressing the reset button. It still doesn't work. I think that if the first run doesn't succeed, it won't run again, as if stuck in a loop. Since I can't easily see the display, it is hard for me to inspect it. I hardly ever go to my friend's home. Could you please give me some advice?

@lmarzen
Owner

lmarzen commented May 7, 2024

Can you please clarify the problem? It is unclear to me what the issue is that you are experiencing. Did the latest updates from e41f6fa fix your issue with the API failures?

@RemindZ

RemindZ commented May 20, 2024

@lmarzen

First off, thanks for your project; this makes for some stunning home accessories.

Unfortunately, since yesterday I have been running into issues with mine. For a couple of days it was running completely fine, refreshing every ten minutes. However, since yesterday it shows -517: Connection Lost.
It still refreshes every 10 minutes, but unfortunately no luck.

I've re-flashed with different settings (namely, I upped the HTTP timeout and WiFi timeout a bit) but also no luck with that.

Nothing about my network setup has changed in the last weeks.

I can actually also see this issue in the API usage statistics for the One Call API:

Date (UTC) | Total calls
May 20, 2024 | 6
May 19, 2024 | 150
May 18, 2024 | 156
20240520_193305

@lmarzen
Owner

lmarzen commented May 21, 2024

The "Connection Lost" error occurs when a WiFi connection is made successfully but is lost before the API requests can be made. Does this error go away if you move closer to your access point?

One potential solution (would require code modifications) could be to attempt a WiFi reconnect if WiFi status is not good when trying to make API requests.

For reference, from the Arduino docs:

WL_CONNECTION_LOST: assigned when the connection is lost
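The reconnect-before-request idea mentioned above might be sketched like this. It is only an illustration under stated assumptions: `LinkStatus` is a stand-in for the Arduino WiFi status codes, and `ensureConnected` is a hypothetical helper that does not exist in the project. On the device, `status` would presumably wrap `WiFi.status()` and `reconnect` would wrap whatever reconnect call the firmware uses.

```cpp
#include <cassert>
#include <functional>

// Stand-in for the Arduino WiFi status codes (assumption for illustration).
enum class LinkStatus
{
  Connected,
  ConnectionLost,
  Disconnected
};

// Hypothetical helper: before making API requests, check the link and try
// to reconnect up to `max_reconnects` times. Returns true if the link is
// up afterwards, so the caller can skip the requests when it is not.
bool ensureConnected(const std::function<LinkStatus()> &status,
                     const std::function<void()> &reconnect,
                     int max_reconnects)
{
  assert(max_reconnects >= 0);
  for (int i = 0; i < max_reconnects && status() != LinkStatus::Connected; ++i)
  {
    reconnect(); // a real implementation would also wait for the link here
  }
  return status() == LinkStatus::Connected;
}
```

The caller would run this right before the API requests and report "Connection Lost" only if it returns false.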

5 participants