Check whether a value contains any one of multiple keywords #265

Closed
NaiboWang opened this issue Jan 3, 2024 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

Owner

NaiboWang commented Jan 3, 2024

[Could a JavaScript instruction for the current loop item be supported directly for multi-keyword conditions?]

The group owner posted a China Earthquake Networks Center table example on GitHub (https://github.com/NaiboWang/EasySpider/wiki/Example-of-JavaScript-instruction-for-the-current-iteration-in-a-conditional-statement#%E4%B8%AD%E5%9B%BD%E5%9C%B0%E9%9C%87%E5%8F%B0%E7%BD%91%E8%A1%A8%E6%A0%BC%E6%A1%88%E4%BE%8B)

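That example collects the table rows whose fifth field (depth) contains "10". To do so, add a condition inside the loop and set the condition to the following command: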
return document.evaluate("./td[5]", arguments[0], null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.innerText.indexOf("10") >= 0

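However, if I want to collect the rows whose fifth field (depth) contains "10", "14", "16", "17", or "22" (or even more keywords), could the next version support this directly (for example, with all the keywords placed in an array)?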

Also, in the current version 0.6.0, the JavaScript code below does not seem to work directly in the text box under "Code/Script content" in EasySpider. How can this be done?

let keywords = ["10", "14", "16", "17", "22"];  
let result = false;  
  
for (let keyword of keywords) {  
    let xpathQuery = "./td[contains(., '" + keyword + "')]";  
    let nodes = document.evaluate(xpathQuery, arguments[0], null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);  
    if (nodes.singleNodeValue) {  
        result = true;  
        break;  
    }  
} 

Originally posted by @cat2123 in #25 (comment)

@NaiboWang NaiboWang added the documentation Improvements or additions to documentation label Jan 3, 2024
Owner Author

NaiboWang commented Jan 3, 2024

First, your code is missing a return result; add it at the end and the code will run:

let keywords = ["10", "14", "16", "17", "22"];
let result = false;

for (let keyword of keywords) {
  let xpathQuery = "./td[contains(., '" + keyword + "')]";
  let nodes = document.evaluate(xpathQuery, arguments[0], null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  if (nodes.singleNodeValue) {
    result = true;
    break;
  }
}
return result;

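Note that the first row of the China Earthquake Networks Center table consists of th elements, so during dynamic debugging an XPath that targets td cannot produce a valid condition result for that row.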

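Also, this code does not do what you asked for, which is to extract the data when the 5th table cell of the current loop row contains any one of several values. What it actually does is extract the data when any td element in the current loop row contains one of the five values above (not just the 5th td element).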

To get the behavior you want, write it like this:

let keywords = ["10", "14", "16", "17", "22"];
let result = false;

for (let keyword of keywords) {
  let xpathQuery = "./td[5]"; // target the 5th td element only
  let nodes = document.evaluate(xpathQuery, arguments[0], null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  if (nodes.singleNodeValue.innerText.includes(keyword)) { // use includes to check whether the cell text contains the keyword
    result = true;
    break;
  }
}
return result;
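
A possible null-safe variant is sketched below. It evaluates the XPath once and returns false for rows that have no fifth td (such as a header row made of th elements), where singleNodeValue is null and reading innerText would throw. The name rowMatchesKeywords and the doc parameter are hypothetical, introduced only so the logic can be exercised outside a browser; they are not part of EasySpider.

```javascript
// Sketch only: rowMatchesKeywords is a hypothetical helper, not an EasySpider API.
const FIRST_ORDERED_NODE_TYPE = 9; // numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE

function rowMatchesKeywords(row, keywords, doc) {
  // Evaluate the XPath once, relative to the current row, instead of once per keyword.
  const cell = doc.evaluate("./td[5]", row, null, FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
  // Rows without a 5th td (for example a header row of th elements) yield null.
  if (!cell) {
    return false;
  }
  // True as soon as any keyword occurs in the cell text (substring match).
  return keywords.some((keyword) => cell.innerText.includes(keyword));
}

// Assumed usage inside the EasySpider condition box (untested there):
// return rowMatchesKeywords(arguments[0], ["10", "14", "16", "17", "22"], document);
```

Passing document in explicitly is a choice made here for testability; in the condition box the function body could just as well call document.evaluate directly.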

Note that if the 5th td element in some row has the value 142, the condition will also be true, because 142 contains 14. So for an exact match, change nodes.singleNodeValue.innerText.includes(keyword) to nodes.singleNodeValue.innerText == keyword
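
The difference between substring and exact matching can be illustrated on plain strings, independent of the DOM. The two helper names are hypothetical, and the trim() call is an added assumption in case the cell text carries surrounding whitespace.

```javascript
// Hypothetical helpers (not part of EasySpider) contrasting the two match modes.
function containsAnyKeyword(cellText, keywords) {
  // Substring match: "142" matches the keyword "14".
  return keywords.some((keyword) => cellText.includes(keyword));
}

function equalsAnyKeyword(cellText, keywords) {
  // Exact match after trimming whitespace: "142" does not match "14".
  return keywords.includes(cellText.trim());
}

const keywords = ["10", "14", "16", "17", "22"];
console.log(containsAnyKeyword("142", keywords)); // true
console.log(equalsAnyKeyword("142", keywords));   // false
console.log(equalsAnyKeyword(" 14 ", keywords));  // true
```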

Hope this solves your problem.

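@NaiboWang NaiboWang changed the title from "JavaScript instruction for the current loop item in multi-keyword conditions" to "Check whether a value contains any one of multiple keywords" Jan 3, 2024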