Cookies from the Cookie request header are not processed #1992

Open
exotfboy opened this issue May 16, 2016 · 8 comments · Fixed by #2400 · May be fixed by #4812

Comments

@exotfboy

exotfboy commented May 16, 2016

I am new to Scrapy and have run into some problems that I could not find answers to on Google, so I am posting them here:

1. The cookie is not sent even when it is set in DEFAULT_REQUEST_HEADERS:

import scrapy

# Headers passed explicitly to the request built for each start URL.
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, sdch',
    'cache-control': 'no-cache',
    'cookie': 'xx=yy',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36'
}


class MySpider(scrapy.Spider):
    def make_requests_from_url(self, url):
        return scrapy.http.Request(url, headers=DEFAULT_REQUEST_HEADERS)

I know that make_requests_from_url is only called for the start_urls, so I expected the first request to send the cookie I set in DEFAULT_REQUEST_HEADERS. However, it does not.

2. Sharing settings between spiders.

I have multiple spiders in the project that share most of their settings, such as RandomAgentMiddleware, RandomProxyMiddleware, UserAgent, DEFAULT_REQUEST_HEADERS, etc. However, they are configured inside settings.py for each spider.

Is it possible to share these settings?


COOKIES_ENABLED is set to true.

@BruceDone

BruceDone commented May 16, 2016

What about COOKIES_ENABLED in settings.py? Did you set it to False?

@kmike
Member

kmike commented May 16, 2016

So the 1st issue is that CookiesMiddleware sets Cookie header even if cookiejar is empty, or, broadly speaking, that it discards Cookie header set on a request instead of adding to it. This happens here. I think this is a valid concern. A pull request to fix that is welcome.

Sorry, I don't get the second issue. All settings defined in settings.py are shared between spiders, you can't configure per-spider settings in settings.py file. What do you mean?

@elacuesta
Member

A question about priorities: when creating a Request, if a name is specified both directly as part of headers['Cookie'] and as a value in the cookies argument, which one should be used? I feel tempted to keep the one in cookies, but that's just my opinion.

@kmike
Member

kmike commented Nov 21, 2016

@elacuesta yeah, I agree that using value set in cookies argument makes more sense in this case.
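
The priority rule agreed on above can be illustrated with a minimal standard-library sketch. It is not Scrapy's implementation; the helper names `parse_cookie_header` and `resolve_cookies` are hypothetical and exist only for this example. It assumes a simple `name=value; name=value` header format and ignores quoting and cookie attributes.

```python
def parse_cookie_header(value):
    """Split a Cookie header value such as 'a=1; b=2' into a dict."""
    pairs = {}
    for part in value.split(";"):
        name, sep, val = part.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            pairs[name] = val
    return pairs


def resolve_cookies(header_value, cookies_arg):
    """Combine both sources; on a name clash the cookies argument wins."""
    merged = parse_cookie_header(header_value)
    merged.update(cookies_arg)  # values from the argument override the header
    return merged


resolved = resolve_cookies("xx=from_header; session=abc", {"xx": "from_argument"})
print(resolved)  # {'xx': 'from_argument', 'session': 'abc'}
```

Names that appear only in the header are kept, so nothing set in the header is silently discarded.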

@Gallaecio
Member

Reopening as per #4823

@Gallaecio Gallaecio reopened this Oct 8, 2020
@Gallaecio Gallaecio linked a pull request Oct 8, 2020 that will close this issue
@elacuesta elacuesta changed the title from "DEFAULT_REQUEST_HEADERS not work as expected" to "Cookies from the Cookie request header are not processed" Oct 30, 2020
@GeorgeA92
Contributor

Since setting cookies directly in headers is not an option because of #4823, the only remaining way to set custom cookies is to assign them directly to the CookieJar (which is later used by CookiesMiddleware), as requested in #1878.

If cookies can be set directly in the CookieJar from spider code, we do not need to maintain support for processing cookies from the Cookie request header, or even from the cookies request argument (and this resolves the priority issue mentioned in #1992 (comment)).

@Gallaecio
Member

Gallaecio commented Jan 22, 2024

Technically we don’t need to process the Cookie header, given the cookies argument of Request that you mention, but people still expect cookies to work if they set them as a header. So this is not a very important feature request, since it asks for a different way to do something that is already possible, but it is still worth considering.
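
As a rough illustration of the "already possible" route, a header-style cookie string can be turned into a dict and then passed through the `cookies` argument of a request instead of the header. The sketch below uses only the standard library; the helper name `cookie_header_to_dict` is hypothetical, and passing the result as `cookies=` to `scrapy.Request` is left to the caller.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header_value):
    """Parse a Cookie header value into a plain name -> value dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


cookies = cookie_header_to_dict("xx=yy; lang=en")
print(cookies)  # {'xx': 'yy', 'lang': 'en'}
```

`SimpleCookie` may reject or skip values it considers malformed, so unusual cookie strings should be checked before relying on the result.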

@GeorgeA92
Contributor

@Gallaecio

but people still expect them to work if they do set them as a header.

The real issue is that there are multiple interpretations of this statement:

  1. Originally (correct me if I am wrong), according to the HTTP protocol, the Cookie header is expected to contain only values (key=value) previously received in the Set-Cookie headers of earlier responses, that is, only values stored in the cookiejar. Cookie values may remain valid for a long time (long enough to be reused in another run tomorrow, but this part is not implemented in Scrapy). With this logic, setting 'cookie': 'xx=yy' in DEFAULT_REQUEST_HEADERS means that the user needs to disable CookiesMiddleware, as suggested in #1992 (comment). No maintenance or processing of cookies is needed, because the Cookie header value is completely static in this case.

  2. With 'cookie': 'xx=yy' as a header, some users may mean "set xx to yy in the existing cookiejar while keeping the active cookie processing logic of CookiesMiddleware" (so that the header sent is something like 'cookie': 'xx=yy; ...other values from the cookiejar'). This is almost how cookies passed as a scrapy.Request parameter are processed. I think this approach does not follow the logic of the HTTP protocol for cookies (where the Cookie header is expected to contain all values from the cookiejar). It may also lead to additional issues if responses lead to other requests (from RetryMiddleware or from RedirectMiddleware).
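
The two interpretations can be contrasted in a small standalone sketch. It does not use Scrapy; the function `build_cookie_header`, its `mode` parameter, and the plain dict standing in for the cookiejar are all assumptions made for illustration.

```python
def build_cookie_header(jar, header_value, mode):
    """Return the Cookie header to send under one of two interpretations.

    jar: dict of name -> value standing in for the cookiejar.
    header_value: the user-supplied header string, e.g. 'xx=yy'.
    mode: 'static' (interpretation 1) or 'merge' (interpretation 2).
    """
    if mode == "static":
        # Interpretation 1: no cookie processing, the header is sent as-is.
        return header_value
    if mode == "merge":
        # Interpretation 2: header values are applied on top of the jar.
        combined = dict(jar)
        for part in header_value.split(";"):
            name, sep, val = part.strip().partition("=")
            if sep and name:
                combined[name] = val
        return "; ".join(f"{k}={v}" for k, v in combined.items())
    raise ValueError(f"unknown mode: {mode!r}")


jar = {"session": "abc", "xx": "old"}
print(build_cookie_header(jar, "xx=yy", "static"))  # xx=yy
print(build_cookie_header(jar, "xx=yy", "merge"))   # session=abc; xx=yy
```

In the merge case the user-supplied value replaces the jar's value for `xx`, while the jar's other cookies are still sent.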
