Proxies are not rotated #759

Open
regnull opened this issue Mar 22, 2023 · 1 comment

regnull commented Mar 22, 2023

I'm using an array of HTTP proxies and setting up the collector as described in the example:

c := colly.NewCollector(
	colly.MaxDepth(cfg.MaxDepth),
	colly.URLFilters(
		// ...
	),
)

c.Limit(&colly.LimitRule{
	DomainGlob:  "*",
	Parallelism: cfg.Parallelism,
	Delay:       time.Duration(cfg.RandomDelay) * time.Millisecond,
})

roundRobinSwitcher, err := proxy.RoundRobinProxySwitcher(cfg.Proxy...)
if err != nil {
	log.Fatal().Err(err).Msg("failed to create proxy switcher")
}
c.SetProxyFunc(roundRobinSwitcher)

However, I've noticed that only the first proxy is ever used. I verified this by putting a breakpoint in the roundRobinSwitcher's GetProxy() function: it is called only once.
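For anyone who wants to check rotation without a debugger, one option is to wrap the proxy function in a counter. This is a minimal sketch using only the standard library; `callCounter`, `wrap`, and `fixedProxy` are hypothetical names made up for illustration and are not part of colly.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync"
)

// callCounter records how often each proxy host is returned.
// Hypothetical helper for illustration; not part of colly.
type callCounter struct {
	mu    sync.Mutex
	calls map[string]int
}

func newCallCounter() *callCounter {
	return &callCounter{calls: make(map[string]int)}
}

// wrap returns a proxy function that delegates to next and counts
// every proxy URL that next hands back.
func (c *callCounter) wrap(next func(*http.Request) (*url.URL, error)) func(*http.Request) (*url.URL, error) {
	return func(r *http.Request) (*url.URL, error) {
		u, err := next(r)
		if err == nil && u != nil {
			c.mu.Lock()
			c.calls[u.Host]++
			c.mu.Unlock()
		}
		return u, err
	}
}

// count reports how many times the given proxy host was selected.
func (c *callCounter) count(host string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.calls[host]
}

// fixedProxy is a stand-in for a real switcher: it always returns the same proxy.
func fixedProxy(raw string) func(*http.Request) (*url.URL, error) {
	return func(*http.Request) (*url.URL, error) { return url.Parse(raw) }
}

func main() {
	counter := newCallCounter()
	pf := counter.wrap(fixedProxy("http://proxy-a.example:8080"))

	req, err := http.NewRequest(http.MethodGet, "http://example.com/", nil)
	if err != nil {
		panic(err)
	}
	for i := 0; i < 3; i++ {
		if _, err := pf(req); err != nil {
			panic(err)
		}
	}
	fmt.Println(counter.count("proxy-a.example:8080"))
}
```

Because `wrap` takes and returns the plain function type used by `http.Transport.Proxy`, the wrapped function should be usable wherever the original proxy function was passed. Logging the counts after a crawl then shows whether more than one proxy was ever selected.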

I've traced the problem here: https://cs.opensource.google/go/go/+/refs/tags/go1.19.3:src/net/http/transport.go;l=539

	if altRT := t.alternateRoundTripper(req); altRT != nil {
		if resp, err := altRT.RoundTrip(req); err != ErrSkipAltProtocol {
			return resp, err
		}
		var err error
		req, err = rewindBody(req)
		if err != nil {
			return nil, err
		}
	}

On the first pass, execution does not enter the body of the if statement; it proceeds and eventually reaches the GetProxy function. On the second pass, alternateRoundTripper returns a non-nil round tripper, so execution enters the if body and returns from there, which means GetProxy is not called again.

Unfortunately, at this point I reached the limits of my knowledge and didn't research further. Perhaps someone on the team knows what this is about.

Great library, btw, thanks for your work!

POFK commented Mar 25, 2023

I ran into the same problem and found the solution in #399 and #567.
You should set DisableKeepAlives to true to make sure that the ProxyFunc is called on every request.

c.WithTransport(&http.Transport{
    DisableKeepAlives: true,
})
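The same transport setting can be tried with the standard library alone. The sketch below makes several assumptions for illustration: two local `httptest` servers stand in for real proxies, and `rotate` is a hypothetical round-robin helper, not colly's switcher. With keep-alives disabled, every request dials a fresh connection, so the transport consults `Proxy` each time.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sync/atomic"
)

// rotate returns a proxy function that cycles through the given proxy URLs.
// Hypothetical helper for illustration; not colly's RoundRobinProxySwitcher.
func rotate(proxies []*url.URL) func(*http.Request) (*url.URL, error) {
	var next uint32
	return func(*http.Request) (*url.URL, error) {
		i := atomic.AddUint32(&next, 1) - 1
		return proxies[i%uint32(len(proxies))], nil
	}
}

// hitsPerProxy sends n plain-HTTP requests through a transport with
// keep-alives disabled and returns how many requests each stand-in proxy saw.
func hitsPerProxy(n int) ([]int64, error) {
	hits := make([]int64, 2)
	proxies := make([]*url.URL, len(hits))
	for i := range hits {
		i := i
		// Each test server answers any request with 200, acting as a fake proxy.
		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
			atomic.AddInt64(&hits[i], 1)
			w.WriteHeader(http.StatusOK)
		}))
		defer srv.Close()
		u, err := url.Parse(srv.URL)
		if err != nil {
			return nil, err
		}
		proxies[i] = u
	}

	client := &http.Client{Transport: &http.Transport{
		Proxy:             rotate(proxies),
		DisableKeepAlives: true,
	}}
	for i := 0; i < n; i++ {
		// The target host is never resolved; the request goes to the chosen proxy.
		resp, err := client.Get("http://example.invalid/")
		if err != nil {
			return nil, err
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}

	out := make([]int64, len(hits))
	for i := range hits {
		out[i] = atomic.LoadInt64(&hits[i])
	}
	return out, nil
}

func main() {
	hits, err := hitsPerProxy(4)
	if err != nil {
		panic(err)
	}
	fmt.Println(hits)
}
```

In colly, the transport passed to c.WithTransport plays the role of the http.Transport here. I would expect c.SetProxyFunc to be called after c.WithTransport, on the assumption that WithTransport replaces the collector's transport wholesale; that ordering is worth checking against the colly version in use.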
