The first image of the article body may fail to be extracted #76

Open
kingname opened this issue Mar 23, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@kingname
Collaborator

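Bug description

  1. What result did you expect?

The article body extracted together with the image at the very top of the body.

  2. What did GNE actually return?

The image at the very top of the body was missing.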

How to reproduce

  1. Target URL

http://www.eeo.com.cn/2020/0321/378971.shtml

  2. How did you call GNE?

Through Gne Online.

Screenshot

Environment:

  • OS: [e.g. Ubuntu 19.04/Windows 10/macOS ]
  • Python version: [e.g. 3.7.1]
  • GNE version: [e.g. 0.1.4]
@kingname kingname added the bug Something isn't working label Mar 23, 2020
@kingname kingname self-assigned this Mar 23, 2020
@kingname
Collaborator Author

The first image and the body text sit in two separate tags, so the text-density algorithm that GNE uses will always miss the first image.
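
To make the failure mode concrete, here is a minimal sketch of a density-based selection. It is not GNE's actual code: the page markup, the element ids, and the simplified "characters per tag" score are all assumptions chosen for illustration. It only shows why an image wrapper that is a sibling of the text container drops out once the densest node is chosen.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout (not the real page): the lead image and the body
# text are siblings, each wrapped in its own <div>.
PAGE = """
<div id="article">
  <div id="lead-image"><img src="top.jpg" /></div>
  <div id="content">
    <p>First paragraph of the article body, long enough to carry real text.</p>
    <p>Second paragraph, which contains an inline image.<img src="inline.jpg" /></p>
  </div>
</div>
""".strip()


def text_length(node):
    """Count the non-whitespace characters found under node."""
    return len("".join("".join(node.itertext()).split()))


def tag_count(node):
    """Count the elements under node, including node itself."""
    return sum(1 for _ in node.iter())


def density(node):
    """Simplified stand-in for a text-density score: characters per tag."""
    return text_length(node) / tag_count(node)


root = ET.fromstring(PAGE)

# Every <div>, including the outer wrapper, is a candidate body node.
candidates = list(root.iter("div"))
best = max(candidates, key=density)

# Images are collected only from inside the winning node.
images = [img.get("src") for img in best.iter("img")]

print(best.get("id"), images)
```

With this input the winning node is the `content` div and `images` holds only `inline.jpg`. The `lead-image` div scores zero because it contains no text, and the outer `article` div scores lower than `content` because it carries the same text spread over more tags, so `top.jpg` is never reached.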
