GitHub - metalwarrior665/actor-keywords-extractor: Extract a list of all keywords from many websites

Keyword Extractor

Deeply crawls a website and counts how many times each provided keyword is found on each page.

How to use

You can pass in any number of keywords that you want to count.
You can combine Start URLs, Pseudo URLs, and a link selector to traverse any number of pages across websites. See our scraping tutorial for how to use these.
You can set maxDepth and maxPagesPerCrawl to limit the scope of the scrape. Start URLs have depth 0, so to scrape only the start URLs, set maxDepth to 0; to also follow links found on them, set it to 1, and so on.
You can enable case-sensitive search and also search inside scripts.
You can choose to scrape with or without a browser. A browser is more expensive but allows JavaScript rendering and waiting.
With a browser, you can use many additional features.
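
To make the effect of maxDepth and maxPagesPerCrawl concrete, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not the actor's actual implementation: the `crawl` function, the `getLinks` callback, and the in-memory `graph` of example.com URLs are all hypothetical stand-ins for real page requests.

```javascript
// Illustrative only: an in-memory link graph stands in for real HTTP requests.
// `crawl` and `getLinks` are hypothetical names, not part of the actor.
function crawl(startUrls, getLinks, { maxDepth = Infinity, maxPagesPerCrawl = Infinity } = {}) {
    // Start URLs are enqueued at depth 0.
    const queue = startUrls.map((url) => ({ url, depth: 0 }));
    const seen = new Set(startUrls);
    const visited = [];

    while (queue.length > 0 && visited.length < maxPagesPerCrawl) {
        const { url, depth } = queue.shift();
        visited.push({ url, depth });

        // Pages at the depth limit are still visited, but their links are not followed.
        if (depth >= maxDepth) continue;

        for (const link of getLinks(url)) {
            if (!seen.has(link)) {
                seen.add(link);
                queue.push({ url: link, depth: depth + 1 });
            }
        }
    }
    return visited;
}

// Hypothetical site structure used only for this example.
const graph = {
    'https://example.com/': ['https://example.com/a', 'https://example.com/b'],
    'https://example.com/a': ['https://example.com/c'],
    'https://example.com/b': [],
    'https://example.com/c': [],
};
const getLinks = (url) => graph[url] || [];

// With maxDepth 0, only the start URL is visited.
console.log(crawl(['https://example.com/'], getLinks, { maxDepth: 0 }));
// With maxDepth 1, the start URL and the pages it links to are visited.
console.log(crawl(['https://example.com/'], getLinks, { maxDepth: 1 }));
```

In this sketch maxPagesPerCrawl acts as a hard cap on visited pages regardless of depth, which is one plausible reading of the option.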

How are keywords determined

The text is split into words by word boundaries. Each word is then compared with each keyword. In the future, we may add other types of boundaries to choose from.
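
The matching described above can be sketched roughly as follows. This is an assumption about the general approach, not the actor's source code: `countKeywords` and its `caseSensitive` option are hypothetical names, and splitting on runs of non-word characters is used here as an approximation of word boundaries.

```javascript
// Hypothetical helper: the name and options are illustrative, not the actor's real code.
function countKeywords(text, keywords, { caseSensitive = false } = {}) {
    const normalize = (s) => (caseSensitive ? s : s.toLowerCase());

    // Split on runs of non-word characters, which approximates word boundaries.
    const words = normalize(text).split(/\W+/).filter(Boolean);

    const result = {};
    for (const keyword of keywords) {
        const target = normalize(keyword);
        // Each word is compared with the keyword as a whole, so "watches" does not count as "watch".
        result[keyword] = words.filter((word) => word === target).length;
    }
    return result;
}

console.log(countKeywords('Rolex watches: a watch for every watch lover.', ['watch', 'watches', 'rolex']));
// { watch: 2, watches: 1, rolex: 1 }
```

Because whole words are compared, "watch" and "watches" are counted separately, which is consistent with the example output listing them as distinct keywords.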

Example Output

For keywords:

["watch", "watches", "rolex"]

starting on https://www.chrono24.com/watches/mens-watches--62.htm

[
    {
        "url": "https://www.chrono24.com/watches/mens-watches--62.htm",
        "depth": 0,
        "result": {
            "watch": 63,
            "watches": 81,
            "rolex": 57
        }
    },
    {
        "url": "https://www.chrono24.com/user/index.htm",
        "depth": 1,
        "result": {
            "watch": 9,
            "watches": 13,
            "rolex": 1
        }
    },
    {
        "url": "https://www.chrono24.com/info/watch-collection.htm",
        "depth": 1,
        "result": {
            "watch": 56,
            "watches": 23,
            "rolex": 1
        }
    },
...
]
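
As a usage sketch, the per-page results can be summed into totals for the whole crawl. The `sumKeywordCounts` helper is hypothetical, and the `items` array below contains only the three pages shown in the example output, not the full dataset.

```javascript
// Hypothetical post-processing helper; not part of the actor itself.
function sumKeywordCounts(items) {
    const totals = {};
    for (const { result } of items) {
        for (const [keyword, count] of Object.entries(result)) {
            totals[keyword] = (totals[keyword] || 0) + count;
        }
    }
    return totals;
}

// Only the three pages shown in the example output above.
const items = [
    { url: 'https://www.chrono24.com/watches/mens-watches--62.htm', depth: 0, result: { watch: 63, watches: 81, rolex: 57 } },
    { url: 'https://www.chrono24.com/user/index.htm', depth: 1, result: { watch: 9, watches: 13, rolex: 1 } },
    { url: 'https://www.chrono24.com/info/watch-collection.htm', depth: 1, result: { watch: 56, watches: 23, rolex: 1 } },
];

console.log(sumKeywordCounts(items));
// { watch: 128, watches: 117, rolex: 59 }
```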

