Add plain-text (txt) proxy retrieval #770

Open. Wants to merge 14 commits into base: master
4 changes: 2 additions & 2 deletions .github/workflows/docker-image-latest.yml
@@ -25,11 +25,11 @@ jobs:
id: meta
uses: docker/metadata-action@v3
with:
images: jhao104/proxy_pool
images: wingser/proxy_pool

- name: Build and push Docker image
uses: docker/build-push-action@v2
with:
context: .
push: true
tags: jhao104/proxy_pool:latest
tags: wingser/proxy_pool:latest
2 changes: 1 addition & 1 deletion .github/workflows/docker-image-tags.yml
@@ -25,7 +25,7 @@ jobs:
id: meta
uses: docker/metadata-action@v3
with:
images: jhao104/proxy_pool
images: wingser/proxy_pool

- name: Build and push Docker image
uses: docker/build-push-action@v2
19 changes: 19 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,19 @@
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Python: Current File",
"type": "python",
"request": "launch",
"program": "${file}",
"console": "integratedTerminal",
//"justMyCode": true,
//"python": "${command:python.interpreterPath}",
"env": {"PYTHONPATH":"${workspaceRoot}"},
"envFile": "${workspaceRoot}/.env"
}
]
}
4 changes: 2 additions & 2 deletions Dockerfile
@@ -1,6 +1,6 @@
FROM python:3.6-alpine

MAINTAINER jhao104 <j_hao104@163.com>
MAINTAINER wingser <wingser@foxmail.com>

WORKDIR /app

@@ -13,7 +13,7 @@ RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.ustc.edu.cn/g' /etc/apk/repositorie
RUN apk add -U tzdata && cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && apk del tzdata

# runtime environment
RUN apk add musl-dev gcc libxml2-dev libxslt-dev && \
RUN apk add bash musl-dev gcc libxml2-dev libxslt-dev && \
pip install --no-cache-dir -r requirements.txt && \
apk del gcc musl-dev

52 changes: 19 additions & 33 deletions README.md
@@ -20,6 +20,8 @@ ProxyPool: crawler proxy IP pool

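A proxy IP pool for web crawlers. It periodically collects free proxies published online, validates them and stores them, and periodically re-validates the stored proxies to keep them usable. It can be used through an API or a CLI. You can also add proxy sources to increase the quality and number of IPs in the pool.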

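[Original project](https://github.com/jhao104/proxy_pool). Thanks to jhao104 for the earlier work. Because the original project was being updated slowly, I forked it to keep maintaining it. Suggestions are welcome and I will try to keep it updated; if my updates also slow down, consider forking the original project and maintaining your own copy.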

* Documentation: [document](https://proxy-pool.readthedocs.io/zh/latest/) [![Documentation Status](https://readthedocs.org/projects/proxy-pool/badge/?version=latest)](https://proxy-pool.readthedocs.io/zh/latest/?badge=latest)

* Supported versions: [![](https://img.shields.io/badge/Python-2.7-green.svg)](https://docs.python.org/2.7/)
@@ -43,13 +45,14 @@ ProxyPool: crawler proxy IP pool
* git clone

```bash
git clone git@github.com:jhao104/proxy_pool.git
git clone git@github.com:wingser/proxy_pool.git
```

* releases

```bash
https://github.com/jhao104/proxy_pool/releases  Download the corresponding zip file
https://github.com/wingser/proxy_pool/releases  Download the corresponding zip file
Installing with Docker is recommended.
```

##### Install dependencies:
@@ -74,14 +77,6 @@ PORT = 5000 # listening port

DB_CONN = 'redis://:pwd@127.0.0.1:8888/0'


# Configure ProxyFetcher

PROXY_FETCHER = [
"freeProxy01",  # names of the enabled proxy fetch methods; all fetch methods live in fetcher/proxyFetcher.py
"freeProxy02",
# ....
]
```

#### Start the project:
@@ -101,9 +96,9 @@ python proxyPool.py server
### Docker Image

```bash
docker pull jhao104/proxy_pool
docker pull wingser/proxy_pool

docker run --env DB_CONN=redis://:password@ip:port/0 -p 5010:5010 jhao104/proxy_pool:latest
docker run --env DB_CONN=redis://:password@ip:port/0 -p 5010:5010 --name wingser_pool wingser/proxy_pool:latest
```
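`DB_CONN` is a URL of the form `redis://:password@ip:port/0`. The sketch below shows how its parts break down, using only the standard library's `urllib.parse`. `parse_db_conn` is a hypothetical helper for illustration; the project's own parsing may differ.

```python
from urllib.parse import urlparse


def parse_db_conn(db_conn):
    """Split a DB_CONN URL such as redis://:password@ip:port/0 into its parts.

    Hypothetical helper for illustration; not part of proxy_pool.
    """
    url = urlparse(db_conn)
    return {
        "scheme": url.scheme,      # storage backend, e.g. "redis"
        "password": url.password,  # text after ":" in the user-info part; None if absent
        "host": url.hostname,
        "port": url.port,          # int, or None if the port is omitted
        "db": int(url.path.lstrip("/") or 0),  # Redis database index from the path
    }
```

For example, `parse_db_conn("redis://:pwd@127.0.0.1:8888/0")` yields password `pwd`, host `127.0.0.1`, port `8888` and database `0`; note the leading `:` before the password, which leaves the username empty.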
### docker-compose

@@ -122,7 +117,9 @@ docker-compose up -d
| ----| ---- | ---- | ----|
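| / | GET | API overview | None |
| /get | GET | Get a random proxy | Optional: `?type=https` returns only HTTPS-capable proxies |
| /gettxt | GET | Get a random proxy as plain text in ip:port format (not JSON) | Optional: `?type=https` returns only HTTPS-capable proxies |
| /pop | GET | Get and delete a proxy | Optional: `?type=https` returns only HTTPS-capable proxies |
| /poptxt | GET | Get and delete a proxy, as plain text in ip:port format (not JSON) | Optional: `?type=https` returns only HTTPS-capable proxies |
| /all | GET | Get all proxies | Optional: `?type=https` returns only HTTPS-capable proxies |
| /count | GET | Show the number of proxies | None |
| /delete | GET | Delete a proxy | `?proxy=host:port` |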
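The `/gettxt` and `/poptxt` bodies can be fed straight into an HTTP client. A minimal client-side sketch, assuming the body is either a bare `ip:port` string or, when the pool is empty, the `{"code": 0, "src": "no proxy"}` dict that the handlers in api/proxyApi.py return (which the app serializes as JSON). `parse_proxy_txt` and `as_requests_proxies` are hypothetical helper names; the second one only builds the `proxies` mapping shape accepted by the `requests` library.

```python
from typing import Dict, Optional


def parse_proxy_txt(body: str) -> Optional[str]:
    """Return the "ip:port" string from a /gettxt or /poptxt body, or None if the pool is empty."""
    text = body.strip()
    if text.startswith("{"):
        # Empty pool: the handlers fall back to a JSON object instead of plain text.
        return None
    host, sep, port = text.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("unexpected proxy format: %r" % text)
    return text


def as_requests_proxies(proxy: str) -> Dict[str, str]:
    """Build a proxies mapping that routes both http and https URLs through one HTTP proxy."""
    return {"http": "http://" + proxy, "https": "http://" + proxy}
```

With `requests` installed, a call might look like `requests.get(url, proxies=as_requests_proxies(proxy))`. Checking for the JSON fallback first matters because a client that blindly prefixes `http://` to the body would otherwise build a nonsense proxy URL when the pool is empty.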
@@ -185,20 +182,7 @@ class ProxyFetcher(object):
# make sure every proxy is returned in the correct host:port format
```

* 2. After adding the method, edit the `PROXY_FETCHER` entry in [setting.py](https://github.com/jhao104/proxy_pool/blob/1a3666283806a22ef287fba1a8efab7b94e94bac/setting.py#L47):

  Add the name of your custom method under `PROXY_FETCHER`:

```python
PROXY_FETCHER = [
"freeProxy01",
"freeProxy02",
# ....
"freeProxyCustom1"  # make sure the name matches the method you added
]
```


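* 2. Once the method is added, it is loaded automatically and needs no configuration. (The original design was error-prone: I forgot to register new methods several times in my own commits, so I switched to automatic loading.)
  The `schedule` process fetches proxies at regular intervals; on the next run it automatically detects and calls the method you defined.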
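Automatic loading amounts to introspecting the fetcher class instead of reading a list from setting.py. A minimal sketch, assuming every fetch method is a public `@staticmethod` generator on `ProxyFetcher` that yields `host:port` strings; the class body and the `discover_fetchers` / `fetch_all` names are hypothetical, and this is not necessarily how the PR implements it.

```python
class ProxyFetcher(object):
    """Hypothetical stand-in for the class in fetcher/proxyFetcher.py."""

    @staticmethod
    def freeProxy01():
        yield "127.0.0.1:8080"

    @staticmethod
    def freeProxyCustom1():
        yield "10.0.0.1:3128"
        yield "10.0.0.2:3128"

    def _helper(self):
        # Private helpers (leading underscore) are skipped by discovery.
        return None


def discover_fetchers(cls):
    """Return the sorted names of public static methods defined directly on cls."""
    return sorted(
        name
        for name, attr in vars(cls).items()
        if not name.startswith("_") and isinstance(attr, staticmethod)
    )


def fetch_all(cls):
    """Call every discovered fetch method and yield (method_name, proxy) pairs."""
    for name in discover_fetchers(cls):
        for proxy in getattr(cls, name)():
            yield name, proxy
```

Sorting gives a stable call order between runs. A production scheduler would probably also catch exceptions per method so that one broken source does not stop the others.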

### Free proxy sources
@@ -210,29 +194,31 @@ PROXY_FETCHER = [
| 站大爷 | ✔ | ★ | ** | [Link](https://www.zdaye.com/) | [`freeProxy01`](/fetcher/proxyFetcher.py#L28) |
| 66代理 | ✔ | ★ | * | [Link](http://www.66ip.cn/) | [`freeProxy02`](/fetcher/proxyFetcher.py#L50) |
| 开心代理 | ✔ | ★ | * | [Link](http://www.kxdaili.com/) | [`freeProxy03`](/fetcher/proxyFetcher.py#L63) |
| FreeProxyList | ✔ | ★ | * | [Link](https://www.freeproxylists.net/zh/) | [`freeProxy04`](/fetcher/proxyFetcher.py#L74) |
| 快代理 | ✔ | ★ | * | [Link](https://www.kuaidaili.com/) | [`freeProxy05`](/fetcher/proxyFetcher.py#L92) |
| FateZero | ✔ | ★★ | * | [Link](http://proxylist.fatezero.org) | [`freeProxy06`](/fetcher/proxyFetcher.py#L111) |
| 云代理 | ✔ | ★ | * | [Link](http://www.ip3366.net/) | [`freeProxy07`](/fetcher/proxyFetcher.py#L124) |
| 小幻代理 | ✔ | ★★ | * | [Link](https://ip.ihuan.me/) | [`freeProxy08`](/fetcher/proxyFetcher.py#L134) |
| 免费代理库 | ✔ | ☆ | * | [Link](http://ip.jiangxianli.com/) | [`freeProxy09`](/fetcher/proxyFetcher.py#L144) |
| 89代理 | ✔ | ☆ | * | [Link](https://www.89ip.cn/) | [`freeProxy10`](/fetcher/proxyFetcher.py#L155) |
| 稻壳代理 | ✔ | ★★ | *** | [Link](https://www.docip.ne) | [`freeProxy11`](/fetcher/proxyFetcher.py#L165) |
| SEO方法代理 | ✔ | ☆ | * | [Link](https://proxy.seofangfa.com/) | [`wingser01`](/fetcher/proxyFetcher.py#L194) |
| 小舒代理 | ✔ | ☆ | * | [Link](http://www.xsdaili.cn/) | [`wingser02`](/fetcher/proxyFetcher.py#L206) |
| PzzQz代理 | ✔ | ☆ | * | [Link](https://pzzqz.com/) | [`wingser03`](/fetcher/proxyFetcher.py#L244) |
| proxy-list | ✔ | ☆ | * | [Link](https://proxy-list.org/) | [`wingser04`](/fetcher/proxyFetcher.py#L269) |
| proxylistplus | ✔ | ☆ | * | [Link](https://list.proxylistplus.com/) | [`wingser05`](/fetcher/proxyFetcher.py#L284) |



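If you know of other good free proxy sites, you can submit them in [issues](https://github.com/jhao104/proxy_pool/issues/71); support for them will be considered in the next update.
If you know of other good free proxy sites, you can submit them in [Issues](https://github.com/zwingser/proxy_pool/issues); support for them will be considered in the next update.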

### Feedback

Please report any problems in [Issues](https://github.com/jhao104/proxy_pool/issues); you can also leave a message on my [blog](http://www.spiderpy.cn/blog/message).
Please report any problems in [Issues](https://github.com/zwingser/proxy_pool/issues).

Your feedback will help make this project better.

### Contributing

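This project is intended only as a basic, general-purpose proxy-pool architecture and does not accept niche, special-purpose features (exceptionally good ideas excepted, of course).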

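The project is still incomplete. If you find a bug or want to add a feature, please describe the bug (or feature) in [Issues](https://github.com/jhao104/proxy_pool/issues), and I will do my best to improve the project.
The project is still incomplete. If you find a bug or want to add a feature, please describe the bug (or feature) in [Issues](https://github.com/zwingser/proxy_pool/issues), and I will do my best to improve the project.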

Many thanks to the following contributors for their generous work:

12 changes: 12 additions & 0 deletions api/proxyApi.py
@@ -43,7 +43,9 @@ def force_type(cls, response, environ=None):

api_list = [
{"url": "/get", "params": "type: ''https'|''", "desc": "get a proxy"},
{"url": "/gettxt", "params": "type: ''https'|''", "desc": "get a proxy"},
{"url": "/pop", "params": "", "desc": "get and delete a proxy"},
{"url": "/poptxt", "params": "", "desc": "get and delete a proxy"},
{"url": "/delete", "params": "proxy: 'e.g. 127.0.0.1:8080'", "desc": "delete an unable proxy"},
{"url": "/all", "params": "type: ''https'|''", "desc": "get all proxy from proxy pool"},
{"url": "/count", "params": "", "desc": "return proxy count"}
@@ -62,13 +64,23 @@ def get():
proxy = proxy_handler.get(https)
return proxy.to_dict if proxy else {"code": 0, "src": "no proxy"}

@app.route('/gettxt/')
def gettxt():
https = request.args.get("type", "").lower() == 'https'
proxy = proxy_handler.get(https)
return proxy._proxy if proxy else {"code": 0, "src": "no proxy"}

@app.route('/pop/')
def pop():
https = request.args.get("type", "").lower() == 'https'
proxy = proxy_handler.pop(https)
return proxy.to_dict if proxy else {"code": 0, "src": "no proxy"}

@app.route('/poptxt/')
def poptxt():
https = request.args.get("type", "").lower() == 'https'
proxy = proxy_handler.pop(https)
return proxy._proxy if proxy else {"code": 0, "src": "no proxy"}

@app.route('/refresh/')
def refresh():
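The new `gettxt` and `poptxt` handlers return `proxy._proxy` as a bare string, but fall back to the dict `{"code": 0, "src": "no proxy"}` when the pool is empty, so a client expecting plain text can receive a JSON body instead. Below is a minimal sketch of a variant that stays plain text in every case. It assumes a hypothetical in-memory `POOL` in place of `proxy_handler` and uses a standard-library WSGI callable rather than the project's Flask app; returning `503` for an empty pool is a design choice of this sketch, not something the PR specifies.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple
from urllib.parse import parse_qs

# Hypothetical stand-in for proxy_handler: (ip:port, supports_https) pairs.
POOL: List[Tuple[str, bool]] = [("1.2.3.4:8080", False), ("5.6.7.8:3128", True)]

TEXT = [("Content-Type", "text/plain; charset=utf-8")]


def pick(https: bool, remove: bool) -> Optional[str]:
    """Pick a random proxy (HTTPS-capable only if https=True); delete it if remove=True."""
    candidates = [item for item in POOL if item[1] or not https]
    if not candidates:
        return None
    choice = random.choice(candidates)
    if remove:
        POOL.remove(choice)
    return choice[0]


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app serving /gettxt and /poptxt as plain "ip:port" text."""
    path = environ.get("PATH_INFO", "").rstrip("/")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    https = query.get("type", [""])[0].lower() == "https"
    if path not in ("/gettxt", "/poptxt"):
        start_response("404 Not Found", TEXT)
        return [b"not found"]
    proxy = pick(https, remove=(path == "/poptxt"))
    if proxy is None:
        # Empty pool: keep the body plain text instead of switching to JSON.
        start_response("503 Service Unavailable", TEXT)
        return [b""]
    start_response("200 OK", TEXT)
    return [proxy.encode("utf-8")]
```

To try it locally, `wsgiref.simple_server.make_server("", 5010, app).serve_forever()` serves the sketch on port 5010. Signalling "empty" through the status code lets clients keep treating every 200 body as `ip:port` without sniffing for JSON.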