Bandwhich loses interface after resume from sleep, only solution is to kill the app and restart. #195

Open
ioogithub opened this issue Oct 14, 2020 · 11 comments

Comments

@ioogithub

ioogithub commented Oct 14, 2020

I am monitoring tun0 (an openvpn connection). It works fine until the computer goes to sleep. After I resume from sleep, the tun0 interface is reconnected but bandwhich will not see it (or cannot use it) any longer. In this state it only displays lo (loopback) which is not useful.

The issue is that the interface reconnects after resuming from sleep. It is still called tun0, but bandwhich can no longer use it. My guess is that bandwhich resumes from sleep, looks for the interface before it reconnects, doesn't find it, and removes it from its list. The only solution is to kill bandwhich and restart it, which works as expected.

I have tried using the -i flag; this does not work.

Is it possible to have the application remember the interface and reconnect after resuming from sleep?

The app isn't useful to me until it can work across a sleep/wake event by automatically reconnecting to an interface.

@imsnif
Owner

imsnif commented Oct 15, 2020

Hey @ioogithub - are you using the latest version of bandwhich (0.19.0)? We included a fix that should address this.

Bandwhich should try to reconnect the interface indefinitely. If you're on 0.19.0 and this isn't working for you, I'd be happy to investigate further.

@ioogithub
Author

Yes, I am using bandwhich 0.19.0 (the binary downloaded from the website a few days ago).

Specifically here is what I see:

  1. Start bandwhich with no arguments. The app starts normally, displaying events from several interfaces (loopback interface, local network traffic, internet traffic). This is all working as expected.
  2. Put the computer to sleep, then wake it up.
  3. The computer reconnects to wifi first, then to the OpenVPN connection automatically. The OpenVPN connection is handled by the OpenVPN plugin for NetworkManager.
  4. bandwhich no longer displays events.

I do not see any errors or indications of a problem other than it is not showing events now.

When the computer wakes up from sleep there is a slight delay to reconnect the wifi (4 or 5 seconds). After this, the openvpn connection (tun0) takes an additional 2 or 3 seconds to reconnect.

How do you query the devices to monitor? Is it possible that the few second delay to reconnect is causing bandwhich to think there is no device any longer and removing it from the list? Would it be possible to requery after a period or on an interval?

@imsnif
Owner

imsnif commented Oct 15, 2020

Every time bandwhich attempts to query the network device and receives an error, it waits for 1 second and then attempts to re-establish a link with that network device. It will do this indefinitely until it succeeds, so this should work. The only case in which it doesn't do this is when it receives a timeout from the network device.

I'm attaching a binary here that does this even when we receive a timeout from the network device. Let's see if it works for you?
bandwhich-temp-debug.tar.gz
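
The retry behaviour described above can be modelled with a small, self-contained Rust sketch. This is not bandwhich's actual code: `ReadResult`, `Channel`, `ScriptedChannel` and `sniff_loop` are hypothetical stand-ins for the real datalink channel, and `max_steps` exists only so the demo terminates. On an error the loop waits `retry_delay` (one second in the description above) and reopens the channel; on a timeout it keeps reading the same channel unless `reopen_on_timeout` is set, which mirrors the temporary debug binary.

```rust
use std::collections::VecDeque;
use std::thread;
use std::time::Duration;

/// Result of one read attempt on a capture channel (hypothetical type).
#[derive(Debug)]
enum ReadResult {
    Packet(Vec<u8>),
    Timeout,
    Error(String),
}

/// Minimal stand-in for a datalink channel bound to one interface.
trait Channel {
    fn next(&mut self) -> ReadResult;
}

/// Test double that replays a fixed script, then times out forever.
struct ScriptedChannel {
    script: VecDeque<ReadResult>,
}

impl ScriptedChannel {
    fn new(script: Vec<ReadResult>) -> Self {
        Self { script: script.into() }
    }
}

impl Channel for ScriptedChannel {
    fn next(&mut self) -> ReadResult {
        self.script.pop_front().unwrap_or(ReadResult::Timeout)
    }
}

/// Per-interface sniff loop: reopen the channel after an error (and, if
/// `reopen_on_timeout` is set, after a timeout). `max_steps` bounds the loop
/// for demonstration; the behaviour described in the thread retries forever.
fn sniff_loop<C, F>(
    mut open: F,
    reopen_on_timeout: bool,
    retry_delay: Duration,
    max_steps: usize,
) -> Vec<Vec<u8>>
where
    C: Channel,
    F: FnMut() -> Result<C, String>,
{
    let mut packets = Vec::new();
    let mut channel: Option<C> = None;
    for _ in 0..max_steps {
        if channel.is_none() {
            match open() {
                Ok(ch) => channel = Some(ch),
                Err(e) => {
                    // Opening failed (e.g. the interface is gone): wait, then retry.
                    eprintln!("open failed: {e}; retrying");
                    thread::sleep(retry_delay);
                    continue;
                }
            }
        }
        let ch = channel.as_mut().expect("channel is open at this point");
        match ch.next() {
            ReadResult::Packet(p) => packets.push(p),
            ReadResult::Timeout => {
                // Normal behaviour keeps the channel; the debug build reopens it.
                if reopen_on_timeout {
                    channel = None;
                }
            }
            ReadResult::Error(e) => {
                eprintln!("error on interface: {e}; resetting channel");
                channel = None;
                thread::sleep(retry_delay);
            }
        }
    }
    packets
}

fn main() {
    // Simulated tun0 outage: first channel errors, first reopen fails, second succeeds.
    let mut attempts = vec![
        Ok(ScriptedChannel::new(vec![
            ReadResult::Packet(vec![1]),
            ReadResult::Error("tun0: No such device (os error 19)".to_string()),
        ])),
        Err("tun0: No such device (os error 19)".to_string()),
        Ok(ScriptedChannel::new(vec![ReadResult::Packet(vec![2])])),
    ]
    .into_iter();
    let packets = sniff_loop(
        move || attempts.next().unwrap_or_else(|| Err("no more attempts".to_string())),
        false,
        Duration::from_millis(1),
        6,
    );
    println!("{packets:?}"); // [[1], [2]]
}
```

In this scripted run the loop collects packets from both before and after the simulated outage. The thread's logs suggest the real failure is subtler (the channel appears to be recreated but no tun0 traffic arrives), which a mock like this cannot reproduce.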

@ioogithub
Author

Thanks for the quick reply. I have tried the binary and unfortunately it does not fix the error. I did notice something interesting however during this last trial.

Previously I reported that after resuming from sleep bandwhich doesn't show any network events. During one past test I saw it reporting a single lo (loopback) event after a resume. This time, however, I saw it reporting a single local network event from the wifi network card interface.

I did another test. When I start bandwhich I have 3 network interfaces (from ifconfig):

lo - loopback interface
tun0 - Openvpn connection
wlp3sp - wifi card

In this state, all external traffic is directed through the VPN connection, tun0. Bandwhich works as expected.

If I run ifconfig as soon as possible after resuming from sleep, I see:

lo
wlp3sp

Then, 2 or 3 seconds later, NetworkManager reconnects the VPN connection and adds tun0 back to the system. Now when I run ifconfig I see the same thing as I saw before the sleep event:

lo
tun0
wlp3ps

All of the traffic automatically goes out through tun0, again as expected. However, bandwhich, which is already running, seems to have lost the ability to monitor the VPN interface where all the traffic is going.

So bandwhich knew about tun0 before the sleep event. After the sleep event, the tun0 interface disappears from the system for a few seconds and then reappears, which seems to cause bandwhich to lose, ignore or remove it (I'm not sure which) from the list of devices it is monitoring.

I initially thought bandwhich couldn't detect any events at all after a resume, but it can still work with lo and wlp3ps because they never actually disappear; they just change state. When NetworkManager reconnects the VPN, however, all of the traffic is automatically directed to tun0, so I rarely see any events on wlp3ps and it looked like bandwhich had stopped monitoring altogether.

I guess the question is: how does bandwhich handle an interface that disappears for a few seconds and then reappears? Is it possible to keep querying for interfaces and, if a new one appears, automatically add it and start monitoring it, or does this only happen when bandwhich is first started?

I tried to think of another common scenario where an interface might change (be added or removed) during normal operation. I have a USB-to-RJ45 network dongle. When I connect it to my laptop, it brings up an Ethernet interface. I did two tests:

  1. Add the dongle and request traffic over it.
  2. Start bandwhich; it works as expected and can monitor the traffic.

Next I did this:

  1. Start bandwhich
  2. Add the dongle
  3. Request traffic over the interface.

Here bandwhich also does not show any activity. If I then do this:

  1. Stop and start bandwhich; it works as expected and is able to monitor the traffic.

I believe this is a similar problem to the VPN. If an interface is added or removed while bandwhich is running, bandwhich will not be able to monitor that interface.

I would love to see this issue resolved. I am trying to use bandwhich to create a system dashboard and this is the last piece of the puzzle.

@imsnif
Owner

imsnif commented Oct 15, 2020

Thanks for all the debugging here and for suggesting some ideas.

So first, a little bit about bandwhich's relationship with the network interfaces so we're all on the same page. :)

When bandwhich starts, it scans all network interfaces on the machine that are up and then starts 1 thread for each one of them. It uses that thread to sniff the network traffic. If it does not manage to access the interface, it retries every one second indefinitely.
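
The startup design described above (one capture thread per interface found at launch) might look roughly like this Rust sketch. It is an illustration, not bandwhich's source: `spawn_sniffers` and the `sniffer-<name>` thread names are hypothetical, the interface list is hard-coded, and the `sniff` closure only reports its interface instead of capturing packets. A real program might build the list with something like `pnet::datalink::interfaces()` and keep only interfaces that are up.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawn one sniffer thread per interface found at startup. `sniff` stands in
/// for the per-interface capture-and-retry loop (hypothetical).
fn spawn_sniffers<F>(interfaces: Vec<String>, sniff: F) -> Vec<thread::JoinHandle<()>>
where
    F: Fn(String) + Send + Clone + 'static,
{
    interfaces
        .into_iter()
        .map(|name| {
            let sniff = sniff.clone();
            thread::Builder::new()
                .name(format!("sniffer-{name}"))
                .spawn(move || sniff(name))
                .expect("failed to spawn sniffer thread")
        })
        .collect()
}

fn main() {
    // In a real program this list would come from an interface scan at startup.
    let interfaces = vec!["lo".to_string(), "tun0".to_string()];
    let (tx, rx) = mpsc::channel();
    let handles = spawn_sniffers(interfaces, move |name| {
        // Placeholder for sniffing: just report which interface this thread owns.
        tx.send(name).expect("receiver is alive");
    });
    for handle in handles {
        handle.join().expect("sniffer thread panicked");
    }
    let mut seen: Vec<String> = rx.iter().collect();
    seen.sort();
    println!("{seen:?}"); // ["lo", "tun0"]
}
```

Because the list is fixed when the threads are spawned, an interface that first appears later gets no thread at all.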

This would explain why the experiment with the USB dongle didn't work, I guess: the dongle didn't show up in the initial scan, so a thread was never allocated for it. We can fix that, but that's an extra feature, and I think it's not related to the behaviour we're seeing here. :)
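
One way such an extra feature could work, and roughly what was asked earlier about re-querying on an interval, is to rescan periodically and start a thread only for names that are not already monitored. This is a minimal sketch under that assumption: `newly_appeared` is a hypothetical helper, and the scans in `main` are simulated rather than read from the system. It compares names only, so it would cover the USB dongle case but would not notice an interface like tun0 that is removed and re-created under the same name.

```rust
use std::collections::HashSet;

/// Given the interfaces that already have a sniffer thread and the result of a
/// fresh scan, return the (sorted, de-duplicated) names that still need one.
fn newly_appeared(monitored: &HashSet<String>, scanned: &[String]) -> Vec<String> {
    let mut new: Vec<String> = scanned
        .iter()
        .filter(|name| !monitored.contains(*name))
        .cloned()
        .collect();
    new.sort();
    new.dedup();
    new
}

fn main() {
    let mut monitored: HashSet<String> = ["lo", "tun0"].iter().map(|s| s.to_string()).collect();
    // Simulated successive scans: a USB Ethernet dongle ("eth1") appears in the second one.
    let scans = vec![
        vec!["lo".to_string(), "tun0".to_string()],
        vec!["lo".to_string(), "tun0".to_string(), "eth1".to_string()],
    ];
    for scan in scans {
        for name in newly_appeared(&monitored, &scan) {
            // A real program would spawn a sniffer thread for `name` here.
            println!("new interface: {name}");
            monitored.insert(name);
        }
        // A real loop would sleep between scans, e.g. thread::sleep(Duration::from_secs(1)).
    }
}
```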

I feel I'd still like to get some more information. I think if we know more, the cause will hopefully become clearer.

I made another binary that I'm attaching here. This one doesn't draw the UI at all, but rather prints some debugging messages in relevant places: when it gets a timeout from the network channel (which for me happens quite often), when it gets an error from the channel, when it recreates the channel, and when the sniffing thread dies.

What do you think about running it under the scenario you mentioned and seeing what happens when?
bandwhich-temp-debug-with-logs.tar.gz

@ioogithub
Author

Here is the log after conducting the same test.

https://paste.ubuntu.com/p/QyFHN58R7h/

Note: When I press Ctrl+C to exit the app, it seems to add a bunch of whitespace in the middle; I'm not sure whether it is wiping out lines you need or just inserting whitespace.

I redacted the IP and MAC addresses with a global find and replace.

@imsnif
Owner

imsnif commented Oct 19, 2020

Hey, cool - thanks for helping out debugging this!
So what we're seeing here is that bandwhich detects the interface being disconnected (that's the error on interface in the middle there), and then resets the channel. It seems to be doing so correctly. So on the surface, things here seem to be doing what they're supposed to.

I'm attaching another version here with some more debugging messages. This time it writes out debug messages when traffic is received on the interface (so it might be a little spammy, my apologies!)
I want to see if we get traffic from this interface after we re-establish the channel. I'm thinking maybe we're getting some and for some reason not handling them correctly. Thanks!
bandwhich-temp-debug-with-more-logs.tar.gz

@ioogithub
Author

Here is the new log: https://paste.ubuntu.com/p/PbPjFRWR39/

I can see tun0 traffic before the sleep.
After sleep and resume, I can see the "no such device" error as expected: get_datalink_channel error OtherError("tun0: No such device (os error 19)")
Then I can see periodic unsuccessful attempts to reconnect: timeout on NetworkInterface { name: "tun0"

I can confirm that openvpn did recreate tun0 and I loaded a new webpage on it after resuming from sleep.

It looks like it tries to connect to tun0 after resuming from sleep and it cannot.

@ioogithub
Author

Was this last log helpful? Is there anything further I can do to help test or troubleshoot the issue?

@imsnif
Owner

imsnif commented Oct 30, 2020

Hey @ioogithub - my apologies for taking some time with this. I am almost 100% sure this is an upstream issue with one of our dependencies, but I'll need to write a stub that would check this and eliminate everything else. Unfortunately, I've been rather busy, and realistically it might take a little while to get back to you. I'm sorry for the delay and hope to be able to get to this soon! Thanks for bearing with me. :)

@ioogithub
Author

Thanks for the update; I was excitedly checking every day for a while. I understand you are busy. If there is anything else I can do to help troubleshoot or test the issue, please let me know.

Projects
None yet
Development

No branches or pull requests

2 participants