
Stuck thread on silent connection close #215

Open
xavetar opened this issue Mar 30, 2023 · 12 comments
Labels
bug (Something isn't working) · downloader · good first issue (Good for newcomers)

Comments

@xavetar

xavetar commented Mar 30, 2023

Hi, threads die after getting some kind of error and don't start again.

@Skallwar Skallwar added bug Something isn't working good first issue Good for newcomers downloader labels Mar 30, 2023
@Skallwar
Owner

Thanks for reporting this

@xavetar
Author

xavetar commented Mar 31, 2023

> Thanks for reporting this

Also, a feature request: there is only a --tries option, but no --timeout option. Some web servers, when being scraped, never deliver the content, or the content is missing at the endpoint; as a result the request loads endlessly (a connection black hole) until the connection is finally broken, and the thread dies at that moment.

@Skallwar
Owner

Skallwar commented Apr 9, 2023

Could you provide the command that triggered this behavior?

@xavetar
Author

xavetar commented Apr 12, 2023

> Could you provide the command that triggered this behavior?

This behavior is triggered not by a particular command but by a resource that resists being scraped. This site is the worst offender; it is full of black holes. With fewer than 100 jobs it gets stuck entirely and then throws an exception. You can start with 10 jobs rather than jumping straight to 100. To see the problem clearly, I recommend disabling --random-range and --tries.

./suckit -vc -j 100 --include-download ".pdf|.ps.Z" --exclude-download ".png|.jpg|.mp4" --random-range 10 --tries 3 -u "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36 Edg/91.0.864.37" "https://www.cl.cam.ac.uk/"

@Skallwar
Owner

Can you pinpoint one link that hangs forever? I've been running this for a few hours without issue.

@xavetar
Author

xavetar commented Apr 12, 2023

> Can you pinpoint one link that hangs forever? I've been running this for a few hours without issue.

I recorded only two endpoints that had problems with infinite loading (including in the browser); at the moment they throw a timeout or an exception.

  1. https://www.cl.cam.ac.uk/~aas23/papers_aas/PoolSDA2.ps
  2. https://www.cl.cam.ac.uk/~aas23/papers_aas/fair4.pdf

Perhaps these errors no longer occur because access to those files has been restricted on their server. If you don't already have these endpoints downloaded in the root directory, it looks like this: [screenshot]

@Skallwar
Owner

When trying to download both of them directly I get this error:

tcp connect error: No route to host (os error 113)

I don't notice any "stuck" thread

@xavetar
Author

xavetar commented Apr 13, 2023

> When trying to download both of them directly I get this error:
>
> tcp connect error: No route to host (os error 113)
>
> I don't notice any "stuck" thread

Congratulations. I repeat: there is no disconnect timeout on the client side for the case when the server never tells the client that the connection has been dropped.

PREVIOUSLY, when accessing these links from a browser, the page would load for about a minute and then the client, not the server, would time out. NOW the server times out or the connection fails outright.

ABOUT 10 HOURS AGO, I couldn't access any of the endpoints posted above. NOW some of them are available.

The threads always died, and that had nothing to do with this particular resource; this resource merely amplified the effect through the way it handles requests.

@Skallwar
Owner

Skallwar commented Apr 13, 2023

I understand what's going on, thanks.

My problem now is to try to have some kind of test that I can run in order to test a potential fix.

I will try to implement something similar in our test suite when I have some time. Feel free to open a pull request if you want to try and fix it by yourself, I would gladly review and merge it

@xavetar
Author

xavetar commented Apr 13, 2023

> I will try to implement something similar in our test suite when I have some time. Feel free to open a pull request if you want to try and fix it by yourself, I would gladly review and merge it

I would join, but right now I don't have this language in my stack. This will change soon.

I think adding only the client timeout would greatly reduce the chance of a threading problem, but not completely eliminate it.

@Skallwar Skallwar changed the title from "Multi-threading" to "Stuck thread on silent connection close" Apr 27, 2023
@Skallwar
Owner

It seems that adding a timeout would be quite easy to do with the reqwest crate we are using. However, I'm not entirely sure what to do when we try to download a huge file, like 1 GB or something.

If I set a timeout of 30 s it will solve this issue, but it might also break the download of a huge file, depending on your connection speed. If I set, say, 30 minutes as the timeout value, then I might get away with the huge-file case, but you will also get threads stuck for much longer...

Reqwest has a test for this, and it seems to me that they just throw a timeout error.

@xavetar
Author

xavetar commented Apr 27, 2023

> It seems that adding a timeout would be quite easy to do with the reqwest crate we are using. However, I'm not entirely sure what to do when we try to download a huge file, like 1 GB or something.
>
> If I set a timeout of 30 s it will solve this issue, but it might also break the download of a huge file, depending on your connection speed. If I set, say, 30 minutes as the timeout value, then I might get away with the huge-file case, but you will also get threads stuck for much longer...
>
> Reqwest has a test for this, and it seems to me that they just throw a timeout error.

The problem is in the reqwest code. They made the timeout global: it covers the whole request, both connection establishment and the data transfer itself. What is actually needed is a timeout that measures the time until the server sends the next packet.

With so many dependencies, I'm actually surprised it works at all. It feels as if the author decided to throw in everything under the sun. Now I'm thinking of writing my own client, because I'm not fond of the existing implementations. Perhaps there are other clients that work well. The real problem is the dependencies themselves: one mistake spawns hundreds of others.

And can someone explain to me why direct JSON handling was added to a library for making requests? It feels like JavaScript developers have taken over the world. And this is only part of it; the divide-and-conquer principle apparently doesn't apply to them.

Here is a simple illustration (uncomment the marked signature and lines to enable the timeout):

use std::io::prelude::*;
use std::fs::{File, remove_file};
use std::path::Path;
use reqwest::Client;
// use std::time::Duration;

// async fn download_file_with_timeout(url: &str, timeout: Duration, file_path: &str) -> Result<(), reqwest::Error> {
async fn download_file(url: &str, file_path: &str) -> Result<(), reqwest::Error> {

    // Remove any previous download so we start from a clean file.
    match Path::new(file_path).try_exists() {
        Ok(true) => {
            match remove_file(file_path) {
                Ok(()) => println!("File removed!"),
                Err(_error) => println!("File was not removed!"),
            }
        },
        Ok(false) => println!("File does not exist"),
        Err(_error) => println!("Could not check whether the file exists"),
    }

    let mut output_file = match File::create(file_path) {
        Ok(value) => {
            println!("File created!");
            value
        }
        Err(error) => panic!("Can't create the file: {:?}", error),
    };

    let client = Client::builder().build()?;
    // let client = Client::builder().timeout(timeout).build()?;
    let mut response = client.get(url).send().await?;

    // With no timeout configured, this loop hangs forever on a silent connection.
    while let Some(chunk) = response.chunk().await? {
        output_file.write_all(&chunk).unwrap();
    }

    Ok(())
}

#[tokio::main]
async fn main() {
    let url = "https://speed.hetzner.de/10GB.bin";
    // let timeout = Duration::from_secs(30);

    let file_path = "/Users/$USER$/Downloads/10GB.bin";

    // if let Err(error) = download_file_with_timeout(url, timeout, file_path).await {
    if let Err(error) = download_file(url, file_path).await {
        println!("Download failed: {:?}", error);
    }
}
