About crawlab.json #26

Open
ma-pony opened this issue Oct 31, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

Contributor

ma-pony commented Oct 31, 2022

While working with the SDK, I found it inconvenient for scheduling and deploying multiple spiders, so I wondered whether a project could be laid out as follows:

.
├── packages
│   ├── js_spiders
│   │   ├── js_spider_1
│   │   │   └── index.js
│   │   ├── js_spider_2
│   │   │   └── index.js
│   │   ├── package.json
│   │   └── .....
│   └── py_spiders
│       ├── py_spider_1
│       │   └── main.py
│       ├── py_spider_2
│       │   └── main.py
│       ├── setup.py
│       └── .....
├── crawlab.json
└── makefile

crawlab.json

{
  "spiders": [
    {
      "path": "packages/js_spiders",
      "exclude_path": "node_modules",
      "name": "js spiders",
      "description": "js spiders",
      "cmd": "node",
      "schedules": [
        {
          "name": "js spider 1 cron",
          "cron": "* 1 * * *",
          "command": "node js_spider_1/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 1 cron",
          "enabled": true
        },
        {
          "name": "js spider 2 cron",
          "cron": "* 2 * * *",
          "command": "node js_spider_2/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 2 cron",
          "enabled": true
        }
      ]
    },
    {
      "path": "packages/py_spiders",
      "exclude_path": ".venv",
      "name": "py spiders",
      "description": "py spiders",
      "cmd": "python",
      "schedules": [
        {
          "name": "py spider 1 cron",
          "cron": "* 1 * * *",
          "command": "python py_spider_1/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 1 cron",
          "enabled": true
        },
        {
          "name": "py spider 2 cron",
          "cron": "* 2 * * *",
          "command": "python py_spider_2/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 2 cron",
          "enabled": true
        }
      ]
    }
  ]
}
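
As a rough sketch of how a tool might consume the proposed file, the snippet below flattens the `spiders` / `schedules` layout into one record per schedule. The field names come from the proposal above; the `ScheduleRecord` type and the `flatten_schedules` helper are hypothetical and not part of the Crawlab SDK.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleRecord:
    """One schedule entry together with the spider it belongs to (hypothetical type)."""
    spider_name: str
    spider_path: str
    name: str
    cron: str
    command: str
    enabled: bool


def flatten_schedules(config: dict) -> list[ScheduleRecord]:
    """Return one record per schedule found under config["spiders"][*]["schedules"]."""
    records = []
    for spider in config.get("spiders", []):
        for sched in spider.get("schedules", []):
            records.append(ScheduleRecord(
                spider_name=spider["name"],
                spider_path=spider["path"],
                name=sched["name"],
                cron=sched["cron"],
                command=sched["command"],
                # Assumption: a schedule without "enabled" is treated as enabled.
                enabled=sched.get("enabled", True),
            ))
    return records


# A trimmed copy of the proposed crawlab.json, inlined so the sketch is self-contained.
EXAMPLE = json.loads("""
{
  "spiders": [
    {
      "path": "packages/py_spiders",
      "name": "py spiders",
      "schedules": [
        {"name": "py spider 1 cron", "cron": "* 1 * * *",
         "command": "python py_spider_1/main.py", "enabled": true},
        {"name": "py spider 2 cron", "cron": "* 2 * * *",
         "command": "python py_spider_2/main.py", "enabled": true}
      ]
    }
  ]
}
""")

if __name__ == "__main__":
    for rec in flatten_schedules(EXAMPLE):
        print(f"{rec.spider_name}: {rec.name} [{rec.cron}] -> {rec.command}")
```

Keeping the spider's `name` and `path` on every record means each schedule can be handled independently of the nesting in the file.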

I can help implement this if you think it is feasible. @tikazyq

Contributor

tikazyq commented Oct 31, 2022

Multi-spider support is on the way. Please follow this issue: crawlab-team/crawlab#1190

tikazyq added the enhancement (New feature or request) label on Oct 31, 2022

Contributor Author

ma-pony commented Oct 31, 2022

> Multi-spider support is on the way. Please follow this issue crawlab-team/crawlab#1190

Will deploying schedules also be included?

Contributor

tikazyq commented Oct 31, 2022

Would you elaborate a bit?

Contributor Author

ma-pony commented Nov 1, 2022

> Would you elaborate a bit?

In practice, I need to create dozens of new cron jobs along with each new spider. Spiders can already be uploaded from the command line, so could cron jobs be created that way too? Then I could put these commands into our CI/CD pipeline.

So I would like to add a new `schedules` parameter to crawlab.json to publish and manage cron jobs, like this:

    {
      "path": "packages/py_spiders",
      "exclude_path": ".venv",
      "name": "py spiders",
      "description": "py spiders",
      "cmd": "python",
      "schedules": [
        {
          "name": "py spider 1 cron",
          "cron": "* 1 * * *",
          "command": "python py_spider_1/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 1 cron",
          "enabled": true
        },
        {
          "name": "py spider 2 cron",
          "cron": "* 2 * * *",
          "command": "python py_spider_2/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 2 cron",
          "enabled": true
        },
       ...
      ]
    }

What do you think of these ideas? Or do you have any better suggestions?
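
One way a CI/CD step could keep cron jobs in sync with crawlab.json is to compare the declared schedules with those that already exist and derive the actions to take. The sketch below covers only that comparison. How existing schedules are fetched and how changes are applied is left out, because this thread does not describe a CLI command or API for it. The `plan_schedule_changes` helper and the choice to match schedules by `name` are assumptions.

```python
def plan_schedule_changes(desired: list[dict], existing: list[dict]) -> dict[str, list[str]]:
    """Compare schedules by name and report which to create, update, or delete."""
    desired_by_name = {s["name"]: s for s in desired}
    existing_by_name = {s["name"]: s for s in existing}

    plan: dict[str, list[str]] = {"create": [], "update": [], "delete": []}
    for name, sched in desired_by_name.items():
        if name not in existing_by_name:
            plan["create"].append(name)      # declared in the file, missing on the server
        elif sched != existing_by_name[name]:
            plan["update"].append(name)      # present on both sides, but fields differ
    for name in existing_by_name:
        if name not in desired_by_name:
            plan["delete"].append(name)      # on the server, no longer declared
    return plan


if __name__ == "__main__":
    # Schedules as declared in crawlab.json (fields taken from the proposal).
    desired = [
        {"name": "py spider 1 cron", "cron": "* 1 * * *",
         "command": "python py_spider_1/main.py", "enabled": True},
        {"name": "py spider 2 cron", "cron": "* 2 * * *",
         "command": "python py_spider_2/main.py", "enabled": True},
    ]
    # Hypothetical state already on the server.
    existing = [
        {"name": "py spider 1 cron", "cron": "* 3 * * *",
         "command": "python py_spider_1/main.py", "enabled": True},
        {"name": "old cron", "cron": "* 5 * * *",
         "command": "python old/main.py", "enabled": False},
    ]
    print(plan_schedule_changes(desired, existing))
```

Computing a plan first, rather than recreating every schedule on each deploy, leaves unchanged cron jobs untouched and makes the pending changes easy to review in the pipeline log.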

Contributor Author

ma-pony commented Nov 8, 2022

> Would you elaborate a bit?

@tikazyq What do you think about the above?

Contributor

tikazyq commented Nov 9, 2022

I think that's a good idea, but it might take some time to implement. Let's create a new enhancement issue in the main repo: https://github.com/crawlab-team/crawlab
