Telegraf not restarting when OPC UA server restarts #15074
Comments
Hi, without logs or a config it is hard to say for sure, but this could also be a duplicate of #13296. You can check out this comment on some of the work that would be required to enable listening for nodes.
The big change was #13514, which was intended to ensure that we reconnect. Again, without seeing logs to understand what is going on, I'm not sure what we can do.
This looks like a duplicate of #13296, but what I don't understand is why the error message from OPC UA changed from "bad server not connected" to "bad node id". Is that due to a change in how the library handles errors? If we get a bad node id, should we assume we are not connected and reconnect?
I think this is the only explanation, since nothing else changed in the OPC UA code. I noticed that the version we use is already quite outdated, so maybe we should consider updating soon(ish).
I would prefer not to. In theory, if the node id is bad, reconnecting to the server shouldn't make a difference. In practice, OPC UA is a very complex standard, and reconnecting is easier than properly fixing the issue. Sometimes servers are not implemented properly, or maybe we set up the connection improperly. Unfortunately, without access to the server to test against, this will be very hard to reproduce. @john-heywood, are you able to reproduce this issue with the open62541/open62541 docker image we use for the unit tests?
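The trade-off discussed above, reconnecting on a bad node id versus treating it as a configuration error, can be illustrated with a minimal Go sketch. It assumes a hypothetical policy, not what Telegraf or gopcua actually implement: reconnect when the server reports it is not connected, or when every node in a read comes back as unknown (which would suggest the server's address space was reset by a restart); if only some nodes are unknown, treat it as a bad node id in the config and keep the session. The `StatusCode` type, its constants, and `shouldReconnect` are illustrative stand-ins, not the gopcua `ua` package types.

```go
package main

// StatusCode is a hypothetical stand-in for an OPC UA status code name.
// It is NOT the gopcua ua.StatusCode type; it only carries the name.
type StatusCode string

const (
	StatusGood                  StatusCode = "Good"
	StatusBadNodeIDUnknown      StatusCode = "BadNodeIdUnknown"
	StatusBadServerNotConnected StatusCode = "BadServerNotConnected"
)

// shouldReconnect is a hypothetical policy: it returns true if any node
// reports that the server is not connected, or if every node in the read
// reports an unknown node id. A partial set of unknown nodes is treated as
// a configuration problem, so no reconnect is requested.
func shouldReconnect(results []StatusCode) bool {
	if len(results) == 0 {
		return false // nothing was read, so there is no evidence either way
	}
	unknown := 0
	for _, s := range results {
		switch s {
		case StatusBadServerNotConnected:
			return true // the connection itself is gone
		case StatusBadNodeIDUnknown:
			unknown++
		}
	}
	// All nodes unknown at once looks like a restarted server, not a typo.
	return unknown == len(results)
}
```

Requiring *all* nodes to be unknown keeps a single misconfigured node id from triggering endless reconnects, while still covering the case reported here, where every node fails after the PLC download.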
Our gopcua in master is at 0.5.3, which looks to be the latest? That went out in v1.30.
I have replicated the issue on an OPC UA server hosted on the same manufacturer's hardware: Siemens model S7-1515F-2PN, article no. 6ES7515-2FM02-0AB0, firmware 2.9.7. Telegraf reconnects in v1.27.1 and does not successfully reconnect in v1.27.2+, including the latest v1.30.0, which updated to gopcua 0.5.3. Attached are my telegraf.conf, the Telegraf logs, and Wireshark captures of the successful reconnect in 1.27.1 and the failure to reconnect in 1.27.2. Restarting Telegraf was the only way I could get v1.27.2+ to reconnect. Attachments: telegrafLog_v1_27_2_UnsuccessfulReconnect_TelegrafRestartRequired.txt, telegrafLog_v1_27_1_SuccessfulReconnect.txt, telegraf_v1_27_2_opcua_wireshark_capture_UnsuccessfulReconnect_FixedAfterTelegrafRestart.pcapng.gz, telegraf_v1_27_1_opcua_wireshark_capture_SuccessfulReconnect.pcapng.gz
Were you able to reproduce this with the open62541/open62541 docker image that Lars mentioned above? That would help narrow down whether this is an upstream issue or a Telegraf issue.
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem. If not, please try posting this question in our Community Slack or Community Forums, or provide additional details in this issue and request that it be re-opened. Thank you!
Relevant telegraf.conf
n/a
Logs from Telegraf
System info
Telegraf 1.29.0, Telegraf 1.27.1, Telegraf 1.27.2
Docker
No response
Steps to reproduce
Expected behavior
When downloading to a PLC running an OPC UA server, I expected the server to restart, Telegraf to recognize the change and disconnect, Telegraf to reconnect, and Telegraf to resume reading data from the PLC's OPC UA server.
Actual behavior
When the OPC UA server restarts, Telegraf loses connection to all the nodes and logs "StatusBadNodeIDUnknown" errors. This continues indefinitely until Telegraf is manually restarted, after which it runs as expected.
Additional info
This issue appeared only after upgrading to Telegraf v1.27.2. The release notes say that some OPC UA-related dependencies were updated. When I downgrade to Telegraf v1.27.1, the problem does not occur. I also have an instance of Telegraf v1.29.0 where it has occurred.