
About the runtime environment #23

Open
Luobeia opened this issue Apr 3, 2022 · 16 comments

Comments

@Luobeia

Luobeia commented Apr 3, 2022

Hi, how is this supposed to be run? I followed the usage instructions on Win10 and on CentOS 7 in a VM, and after two days of configuring the environment it still won't run. Besides the packages in requirements.txt, is there anything else that needs to be installed? Many thanks.

@baabaaox
Owner

baabaaox commented Apr 3, 2022

@Luobeia Which step are you stuck on, and what error do you get?

$ git clone https://github.com/baabaaox/ScrapyDouban.git
# Build and run the containers
$ cd ./ScrapyDouban/docker
$ sudo docker-compose up --build -d
# Enter the douban_scrapyd container
$ sudo docker exec -it douban_scrapyd bash
# Enter the scrapy directory
$ cd /srv/ScrapyDouban/scrapy
$ scrapy list

@Luobeia
Author

Luobeia commented Apr 3, 2022

The sudo docker-compose up --build -d step is already where things diverge from the demo video. At first the docker and docker-compose commands weren't recognized, so I went and installed both. I'm running on CentOS, and now I get the error below. This is my first Scrapy-related project, so please bear with a beginner.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib64/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/usr/local/lib/python3.6/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 450, in send
    timeout=timeout
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 786, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.6/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib64/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/usr/local/lib/python3.6/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 214, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/usr/local/lib/python3.6/site-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/usr/local/lib/python3.6/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 237, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 542, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/bin/docker-compose", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/compose/cli/main.py", line 81, in main
    command_func()
  File "/usr/local/lib/python3.6/site-packages/compose/cli/main.py", line 200, in perform_command
    project = project_from_options('.', options)
  File "/usr/local/lib/python3.6/site-packages/compose/cli/command.py", line 70, in project_from_options
    enabled_profiles=get_profiles_from_options(options, environment)
  File "/usr/local/lib/python3.6/site-packages/compose/cli/command.py", line 153, in get_project
    verbose=verbose, version=api_version, context=context, environment=environment
  File "/usr/local/lib/python3.6/site-packages/compose/cli/docker_client.py", line 43, in get_client
    environment=environment, tls_version=get_tls_version(environment)
  File "/usr/local/lib/python3.6/site-packages/compose/cli/docker_client.py", line 170, in docker_client
    client = APIClient(use_ssh_client=not use_paramiko_ssh, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 197, in __init__
    self._version = self._retrieve_server_version()
  File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 222, in _retrieve_server_version
    f'Error while fetching server API version: {e}'
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

@baabaaox
Owner

baabaaox commented Apr 3, 2022

@Luobeia I looked into this error (docker/compose#7896); it looks like the Docker service on your CentOS machine isn't running.
1. Check the Docker service status:

sudo systemctl status docker

2. If it isn't running, start it:

sudo systemctl start docker

3. Then run the steps above again:

$ cd ./ScrapyDouban/docker
$ sudo docker-compose up --build -d
# Enter the douban_scrapyd container
$ sudo docker exec -it douban_scrapyd bash
# Enter the scrapy directory
$ cd /srv/ScrapyDouban/scrapy
$ scrapy list
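The FileNotFoundError in the traceback is the Docker SDK failing to open the daemon's unix socket, which is exactly what happens when the service isn't running. A minimal sketch of that check, assuming the default socket path on Linux:

```python
import os

DOCKER_SOCKET = "/var/run/docker.sock"  # default daemon socket on Linux

def docker_socket_present(path=DOCKER_SOCKET):
    """Return True if the Docker daemon's unix socket exists on disk."""
    return os.path.exists(path)

if docker_socket_present():
    print("socket found; the daemon is probably running")
else:
    print("socket missing; start the service with: sudo systemctl start docker")
```

If the socket exists but docker-compose still fails, the usual remaining cause is a permissions problem (run with sudo or add your user to the docker group).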

@Luobeia
Author

Luobeia commented Apr 4, 2022

OK, thanks a lot. I'll go try it, and I'll come back if I run into more problems.

@Luobeia
Author

Luobeia commented Apr 4, 2022

Sorry, I've hit another problem. ScrapyDouban/docker/Dockerfile contains apt-get commands, but my CentOS uses yum. If I change them all to yum it won't run, and if I don't change them, the apt-get command isn't recognized.
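Worth noting here: the apt-get lines in a Dockerfile execute inside the container's base image (a Debian-based image in this project, which is why it uses apt-get), not on the host OS, so they should not be rewritten to yum on a CentOS host. A quick way to see which OS a Dockerfile's RUN commands will execute in is its FROM line; a small sketch (the example image name below is made up, check the real FROM line in ScrapyDouban/docker/Dockerfile):

```python
def base_images(dockerfile_text):
    """Return the image names from a Dockerfile's FROM lines.

    RUN commands execute inside these images, not on the host OS,
    which is why a Dockerfile can use apt-get on a CentOS host.
    """
    return [
        line.split()[1]
        for line in dockerfile_text.splitlines()
        if line.strip().upper().startswith("FROM ")
    ]

# Hypothetical contents, for illustration only
example = "FROM python:3.9-slim\nRUN apt-get update && apt-get install -y gcc\n"
print(base_images(example))  # → ['python:3.9-slim']
```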

@Luobeia
Author

Luobeia commented Apr 4, 2022

The problem above is solved now, sorry for the noise. But I still can't scrape any data; I guess it's a proxy issue? I'll look into proxies later. Thanks!!

@baabaaox
Owner

baabaaox commented Apr 4, 2022

@Luobeia If you're getting a lot of 403 responses, you'll need proxy IPs to get around them.

@Luobeia
Author

Luobeia commented Apr 4, 2022

How do I check whether it's 403s? What I see in the log is this error:
ERROR: Gave up retrying <GET https://m.douban.com/movie/subject/1292052/> (failed 3 times): DNS lookup failed: no results for hostname lookup: m.douban.com.
twisted.internet.error.DNSLookupError: DNS lookup failed: no results for hostname lookup: m.douban.com.

@baabaaox
Owner

baabaaox commented Apr 4, 2022

@Luobeia The DNS lookup failed. There may be something wrong with your VM's network; try pinging the host yourself:

ping m.douban.com 
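The DNSLookupError can also be reproduced outside Scrapy with a few lines of stdlib Python, which helps separate a VM/container DNS problem from a spider problem. A rough sketch:

```python
import socket

def resolve(hostname):
    """Return the resolved IP addresses, or None if the DNS lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
        return sorted({info[4][0] for info in infos})
    except socket.gaierror:
        return None

print(resolve("m.douban.com"))  # None here means the same DNS failure Scrapy hit
```

If the name resolves on the CentOS host but not inside the douban_scrapyd container, the container's /etc/resolv.conf is the place to look.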

@Luobeia
Author

Luobeia commented Apr 5, 2022

OK, I'll go check whether my network can actually ping it. Thanks!

@Luobeia
Author

Luobeia commented Apr 10, 2022

A question: do the movie IDs have to be scraped first before the movie data can be scraped? That is, however many IDs there are is how many movies get scraped?

@Luobeia
Author

Luobeia commented Apr 10, 2022

I scraped more than 1,000 records this afternoon, but then my IP seems to have been banned and now I can't scrape anything at all. Also, I can't find the data anywhere on my CentOS system; why is that?

@baabaaox
Owner

@Luobeia

  1. The array in this line of code in the movie_subject spider defines the pages from which douban IDs are collected. The spider crawls the links in the array, checks each page for movie links, extracts any douban IDs it finds and passes them to the pipeline, then recursively crawls those movie links. In principle the seed links in the array need to be numerous and spread out enough for the spider to crawl far; otherwise it stops as soon as it only encounters links it has already crawled. Add more valid links to the array yourself.
  2. With a banned IP you obviously can't get the data you want. The data is stored in the MySQL container that Docker runs. You can reach the database admin UI at your CentOS host's IP on port 8080; the login parameters are server: mysql, username: root, password: public.
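Before debugging the admin UI itself, it's worth confirming that port 8080 on the host is reachable at all. A small sketch (the host IP is the one mentioned later in this thread; adjust it to your setup):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# The compose setup publishes the DB admin UI on port 8080 of the host
print(port_open("192.168.122.1", 8080))
```

If the port is closed, check that the containers are actually up (sudo docker ps) before suspecting the database.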

@Luobeia
Author

Luobeia commented Apr 11, 2022

I'm used to phpMyAdmin; in the browser I enter 192.168.122.1:8080/phmyadmin and can't access it. Why is that?

@Luobeia
Author

Luobeia commented Apr 11, 2022

I took another look at the demo video; it uses Adminer, not phpMyAdmin. I'll go try that myself; I hadn't noticed.

@Luobeia
Author

Luobeia commented Apr 25, 2022

One more question: I want to scrape trailer info from douban. I located it with XPath and got the URL with a browser inspector extension, but when I modify the original movie_meta.py so that the official_site field scrapes what I want, it doesn't work. Why?
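A common cause of this: an XPath copied from the browser's inspector is evaluated against the browser's (possibly JavaScript-modified) DOM, while Scrapy evaluates it against the raw HTML it actually downloaded, and this project fetches pages from m.douban.com, whose markup differs from the desktop site. The usual way to debug is scrapy shell <url> and trying response.xpath(...) interactively against the fetched page. The idea of validating a selector against the markup you really received can be sketched with the stdlib (the tag and class names below are made up, not douban's real markup):

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for a fetched page; douban's real markup is different
page = """<div>
  <a class="trailer" href="https://movie.douban.com/trailer/12345/">Trailer</a>
  <a class="site" href="https://example.com/">Official site</a>
</div>"""

root = ET.fromstring(page)
# ElementTree supports a limited XPath subset, enough to sanity-check a path
node = root.find(".//a[@class='trailer']")
print(node.get("href"))  # → https://movie.douban.com/trailer/12345/
```

If the same path returns nothing against the HTML Scrapy fetched, the element simply isn't in that page, and the XPath from the browser won't help.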
