Calling extract with an XPath configured for the body content raises an error #104

Open
tranzwalle opened this issue Dec 23, 2020 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@tranzwalle

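When extracting content with GeneralNewsExtractor's extract method, an error is raised if an XPath for the body content is configured.

  1. What result did you expect?
    • The content matched by the XPath is parsed and extracted
  2. What did GNE actually return?
    • An error is raised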

Steps to reproduce

  1. Target URL: https://mp.weixin.qq.com/s?__biz=MzIyNjg4NjA1OQ==&mid=2247493977&idx=1&sn=70a88dfa860017a8af96a11c362a2e1a&scene=0
  2. How did you call GNE?
    body = selector.xpath(body_xpath)[0]
    IndexError: list index out of range
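
The traceback line above indexes the XPath result with `[0]`, which fails whenever the expression matches nothing. Here is a minimal sketch of that failure mode and a guarded alternative. It uses the standard library's `xml.etree.ElementTree` as a stand-in for the lxml selector, and `first_match` is a hypothetical helper, not part of GNE:

```python
import xml.etree.ElementTree as ET


def first_match(root, body_xpath):
    """Return the first element matching body_xpath, or None if nothing matches."""
    matches = root.findall(body_xpath)
    return matches[0] if matches else None


# A tiny well-formed document standing in for the fetched page (assumption).
html = "<html><body><div class='rich_media_content'><p>text</p></div></body></html>"
root = ET.fromstring(html)

# The XPath matches, so an element is returned.
hit = first_match(root, ".//div[@class='rich_media_content']")

# The XPath matches nothing; the guarded helper returns None
# where a bare `[0]` on the empty result list would raise IndexError.
miss = first_match(root, ".//div[@class='no_such_class']")
```

Whether returning `None` or raising a clearer error is preferable depends on the caller; the point is only that an empty match list has to be handled before indexing.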

Screenshot
image

Environment:

  • OS: [e.g. macOS ]
  • Python version [e.g. 3.8]
  • GNE version [e.g. 0.2.5]
@tranzwalle tranzwalle added the bug Something isn't working label Dec 23, 2020
@kingname
Collaborator

You could first check whether the html_content you fetched actually contains the rich_media_content class.
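
One way to run that check, sketched with the standard library only. `ClassCollector` and `has_class` are hypothetical names for illustration, and `sample` is a made-up fragment rather than the real page:

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears on any start tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


def has_class(html_content, class_name):
    """Return True if any tag in html_content carries class_name."""
    collector = ClassCollector()
    collector.feed(html_content)
    return class_name in collector.classes


sample = '<div class="rich_media_content js_underline"><p>body</p></div>'
print(has_class(sample, "rich_media_content"))
```

If this returns False for the HTML that was actually fetched, the body XPath has nothing to match and the `[0]` index will fail.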

@tranzwalle
Author

tranzwalle commented Dec 23, 2020

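You could first check whether the html_content you fetched actually contains the rich_media_content class.

The returned HTML does contain that class. The problem is that this method has a selector parameter, and the source code does not pass it at this call site, so indexing into the result inside the method raises the error.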

image
image
image

@kingname
Collaborator

The selector parameter is the element that I pass in.
