This repository has been archived by the owner on Apr 20, 2023. It is now read-only.

Hashtag Scraper KeyError: 'graphql' when using Selenium webdriver or Sessionid Cookie #138

Open
kalebm1 opened this issue Jul 20, 2021 · 2 comments

Comments

kalebm1 commented Jul 20, 2021

Describe the bug
I am trying to scrape posts from a hashtag.
I have tried both the Selenium driver and headers with a sessionid to get around Instagram's redirect to the login page. Before Instagram started redirecting to the login page, I was able to scrape the hashtag with no problem. Once the redirect started, I put my sessionid into the headers field and got the following error:

post_arr = self.json_dict["entry_data"]["TagPage"][0]["graphql"]["hashtag"]["edge_hashtag_to_media"]["edges"]
KeyError: 'graphql'

I am fairly new to the library, so I poked around in the code and read through similar issues. I think this error is similar to #124 in the sense that the json_dicts are not structured the same. I printed the json_dict out to a file and found that it has no graphql key, nor many of the other keys that get_recent_posts looks for. I hope the fix for this error is as simple as the one for the other issue.
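To pin down where the structure diverges, the lookup from the traceback can be walked one key at a time. The following is only a minimal sketch: `find_missing_key` is a hypothetical helper, not part of the library, and the `sample` dict (including its `"data"` key) is a made-up stand-in for the real json_dict, which is not reproduced here.

```python
from typing import Any, Sequence, Tuple, Union

Key = Union[str, int]


def find_missing_key(data: Any, path: Sequence[Key]) -> Tuple[Tuple[Key, ...], Any]:
    """Walk `path` through nested dicts/lists.

    Returns the keys that resolved successfully and the value reached.
    The walk stops at the first key or index that cannot be resolved.
    """
    resolved = []
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            break
        resolved.append(key)
    return tuple(resolved), current


# Key path copied from the failing line in the traceback.
TAG_PAGE_PATH = (
    "entry_data", "TagPage", 0, "graphql",
    "hashtag", "edge_hashtag_to_media", "edges",
)

# Hypothetical stand-in for a json_dict that has no "graphql" key.
sample = {"entry_data": {"TagPage": [{"data": {}}]}}

resolved, value = find_missing_key(sample, TAG_PAGE_PATH)
print("resolved up to:", resolved)
print("keys available at that point:", list(value))
```

Running something like this against the dumped json_dict would show which keys Instagram returns in place of graphql, which is probably the information a fix would need.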

To Reproduce
Steps to reproduce the behavior:

def __init__(self):
    # Browser-like User-Agent; the long string is split across two lines
    # using implicit string concatenation.
    self.headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/79.0.3945.74 Safari/537.36 Edg/79.0.309.43",
    }
    self.hashtag = Hashtag(hashtagUrl)
    self.hashtag.scrape(headers=self.headers)
    # This call raises KeyError: 'graphql'
    self.hashtags = self.hashtag.get_recent_posts()
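The snippet above does not show the sessionid itself. A minimal sketch of how the headers could carry it is below. The `Cookie` header layout is an assumption about how the sessionid was passed, `build_headers` is a hypothetical helper, and the sessionid value is a placeholder.

```python
from typing import Dict


def build_headers(user_agent: str, session_id: str) -> Dict[str, str]:
    """Return request headers with the Instagram sessionid sent as a cookie.

    Assumption: the sessionid travels in a standard Cookie header.
    """
    return {
        "User-Agent": user_agent,
        "Cookie": f"sessionid={session_id};",
    }


headers = build_headers(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/79.0.3945.74 Safari/537.36 Edg/79.0.309.43",
    "PASTE_YOUR_SESSIONID_HERE",  # placeholder, not a real value
)
print(headers["Cookie"])
```

A dict like this would then be passed as scrape(headers=headers) in place of self.headers.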

Expected behavior
The expected outcome is a List[Posts], which is what hashtag.get_recent_posts() should normally return.

Screenshots
Screenshot (313)

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Chrome
  • Version: 91.0.4472.124

havelar commented Sep 16, 2021

I'm having exactly the same problem, and I'm also sending the sessionid in cookies, in case anyone thinks that might be the cause. I'm still trying to understand what is causing this issue.

@yemregundogmus

I have the same issue when I search using a proxy and a sessionid. I think the problem is in how the sessionid is defined, which is why the response comes back with missing data. The library then raises the error, but I couldn't find a way to solve it.
