OVT Streamworker can get stuck, eating all system memory #1583

Open
SceneCityDev opened this issue Apr 3, 2024 · 2 comments

@SceneCityDev

An Edge reconnected its OVT connection to the Origin for unknown reasons:

[2024-04-01 00:03:11.996] W [SPOvtPub-t9000:957880] Socket.Pool.Worker | socket_pool_worker.cpp:218  | [#16] [0x7fc9020c9b10] Failed to send data for 10000ms - This socket is going to be garbage collected (<ClientSocket: 0x7fc8f7550010, #27, Connected, TCP, Nonblocking, 136.243.153.227:43788>)
[2024-04-01 00:03:11.996] W [SPOvtPub-t9000:957880] Socket | socket.cpp:2109 | [#27] [0x7fc8f7550010] Socket was closed even though there was 4244 commands in the socket
[2024-04-01 00:03:17.009] I [SPOvtPub-t9000:957880] OVT | ovt_publisher.cpp:181  | OvtProvider is connected : <ClientSocket: 0x7fc8f7401c10, #34, Connected, TCP, Nonblocking, 142.132.205.123:55476>
[2024-04-01 00:03:17.024] I [SPOvtPub-t9000:957880] Monitor | stream_metrics.cpp:114  | A new session has started playing #default#origin1/revision-main on the OVT publisher. OVT(16)/Stream total(16)/App total(16)

As a result, the OVT queue got stuck and started overflowing:

[2024-04-01 00:03:17.674] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 2281, threshold: 500, peak: 2281
[2024-04-01 00:03:22.681] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 4306, threshold: 500, peak: 4306
[2024-04-01 00:03:27.696] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 6341, threshold: 500, peak: 6341
[2024-04-01 00:03:32.708] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 8399, threshold: 500, peak: 8399
[2024-04-01 00:03:37.719] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 10445, threshold: 500, peak: 10445
[2024-04-01 00:03:42.738] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 12491, threshold: 500, peak: 12491
[2024-04-01 00:03:47.769] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 14544, threshold: 500, peak: 14544
[2024-04-01 00:03:52.781] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 16591, threshold: 500, peak: 16591
[2024-04-01 00:03:57.797] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 18629, threshold: 500, peak: 18629
[2024-04-01 00:04:02.807] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 20673, threshold: 500, peak: 20673
[2024-04-01 00:04:07.828] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 22685, threshold: 500, peak: 22685
[2024-04-01 00:04:12.842] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 24691, threshold: 500, peak: 24691
[2024-04-01 00:04:17.852] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 26671, threshold: 500, peak: 26671
[2024-04-01 00:04:22.863] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 28650, threshold: 500, peak: 28650

...it continued to do so for a full hour, until all 16 GB of system memory had been used up and the Linux kernel's OOM killer terminated the OME process:

[2024-04-01 01:01:03.903] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 1422024, threshold: 500, peak: 1422024
[2024-04-01 01:01:08.921] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 1424007, threshold: 500, peak: 1424007
[2024-04-01 01:17:46.146] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444  | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 1425251, threshold: 500, peak: 1425251
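
Some rough arithmetic, assuming the queue accounted for most of that memory: roughly 1.42 million queued messages consuming about 16 GB works out to on the order of 10 KB per queued OVT packet, which is plausible for media payloads, so the queue growth alone can explain the OOM.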

There may be a bug in the socket error handling here.

For the OVT streamworker, SetPreventExceedThreshold is not set, which I find dangerous. In general, though, I don't think starting to drop messages would be the right response to a bug in other parts of the code. The managed queue class should have some error handling for the case where a queue gets stuck, in this case by killing the stuck streamworker thread.
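
To make the idea concrete, here is a minimal, self-contained sketch of the kind of stuck-queue detection I mean. The names (StuckAwareQueue, IsStuck, etc.) are hypothetical and are not OME's actual ManagedQueue API; it only illustrates the pattern "items are waiting but nothing has been dequeued for N seconds, so stop the worker":

// Hypothetical sketch - not OME code. A queue that remembers when it was last
// drained, plus a watchdog that reacts once the consumer appears to be stuck.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <deque>
#include <mutex>
#include <thread>

class StuckAwareQueue
{
public:
    void Enqueue(int item)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _queue.push_back(item);
    }

    bool Dequeue(int &item)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_queue.empty())
            return false;
        item = _queue.front();
        _queue.pop_front();
        _last_dequeue = std::chrono::steady_clock::now();
        return true;
    }

    // "Stuck" means items are waiting but nothing has been dequeued for `timeout`.
    bool IsStuck(std::chrono::milliseconds timeout)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        return !_queue.empty() &&
               (std::chrono::steady_clock::now() - _last_dequeue) > timeout;
    }

    size_t Size()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        return _queue.size();
    }

private:
    std::mutex _mutex;
    std::deque<int> _queue;
    std::chrono::steady_clock::time_point _last_dequeue = std::chrono::steady_clock::now();
};

int main()
{
    StuckAwareQueue queue;
    std::atomic<bool> stop{false};

    // Producer keeps enqueuing; the consumer is intentionally missing to
    // simulate a streamworker whose socket send never completes.
    std::thread producer([&] {
        int i = 0;
        while (!stop)
        {
            queue.Enqueue(i++);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });

    // Watchdog: once nothing has been dequeued for 3 seconds while items are
    // waiting, treat the worker as stuck and stop it instead of letting the
    // queue grow until the OOM killer steps in.
    while (!queue.IsStuck(std::chrono::seconds(3)))
    {
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    std::printf("Queue stuck at %zu items - stopping worker and dropping the session\n", queue.Size());
    stop = true;  // In OME this could translate to stopping the streamworker and closing the OVT session.
    producer.join();
    return 0;
}

Dropping messages via SetPreventExceedThreshold would cap the memory, but it would also silently degrade the output; detecting a stuck consumer and tearing the session down keeps the failure visible and bounded.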

getroot added the investigating (Being checked) label Apr 4, 2024
getroot self-assigned this Apr 4, 2024

getroot commented Apr 4, 2024

This appears to be a bug where the OVT Publisher gets stuck in a deadlock. Could you please post the full log containing the problem? If you do not want your logs to be made public, please send them to support@airensoft.com. Thank you.

@SceneCityDev (Author)

Sent.
