connecting to private petals using ec2 dht problems #604

Open
brandnamewater opened this issue Dec 9, 2023 · 0 comments

brandnamewater commented Dec 9, 2023

On the EC2 instance, python3 -m petals.cli.run_dht --host_maddrs /ip4/0.0.0.0/tcp/31337 --identity_path bootstrap1.id starts a DHT without problems.

On that instance I installed Petals with pip install . --no-cache-dir because it is a small instance. Could that have any impact?

The problem is running python -m petals.cli.run_server enoch/llama-65b-hf --initial_peers /ip4/IP_ADDRESS/tcp/31337/p2p/abc /ip4/127.0.0.1/tcp/31337/p2p/abc on my own computer to connect to the DHT on the EC2 instance.

It fails with "hivemind.p2p.p2p_daemon_bindings.utils.P2PDaemonError: Daemon failed to start: 2023/12/08 23:53:41 failed to connect to bootstrap peers"
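One thing that may be worth checking in the command above is which of the --initial_peers addresses can be reached from the machine that runs the server. A loopback address such as /ip4/127.0.0.1/... refers to the local machine (here, the WSL side), not the EC2 instance, and a private VPC address would not be routable from outside AWS. Below is a minimal sketch using only the Python standard library; classify_peer is a hypothetical helper, not part of Petals or hivemind, and 172.31.5.9 and 8.8.8.8 are illustrative addresses only.

```python
import ipaddress

# RFC 1918 private ranges (typical for EC2 private/VPC addresses).
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify_peer(maddr: str) -> str:
    """Classify the IPv4 part of a multiaddr like /ip4/1.2.3.4/tcp/31337/p2p/ID."""
    parts = maddr.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "ip4":
        return "unsupported"  # this sketch only understands /ip4/... addresses
    try:
        ip = ipaddress.IPv4Address(parts[1])
    except ValueError:
        return "invalid"  # e.g. a placeholder that was never filled in
    if ip.is_loopback:
        return "loopback"  # points back at the machine running the command
    if any(ip in net for net in RFC1918):
        return "private"  # not reachable from outside that private network
    return "routable"


if __name__ == "__main__":
    peers = [
        "/ip4/127.0.0.1/tcp/31337/p2p/abc",   # second peer from the command above
        "/ip4/172.31.5.9/tcp/31337/p2p/abc",  # hypothetical EC2 private address
        "/ip4/8.8.8.8/tcp/31337/p2p/abc",     # stand-in for a public address
    ]
    for peer in peers:
        print(f"{classify_peer(peer):>9}  {peer}")
```

If only loopback or private addresses are listed, a server running outside the EC2 network has no bootstrap peer it can reach, which would be consistent with the error above. This is a guess at a possible cause, not a confirmed diagnosis.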

Doing the same thing entirely locally works: I can set up a DHT on my own computer and run peers against it from new terminal tabs. What I can't get working is connecting peers on my computer to the DHT started on the EC2 instance.

My inbound rules:

SSH - 22 - 0.0.0.0/0
HTTP - 80 - 0.0.0.0/0
HTTPS - 443 - 0.0.0.0/0
Custom TCP - 31337 - 0.0.0.0/0

I even tried with All Traffic - 0.0.0.0/0

I'm using WSL Ubuntu 22.04.3 on Windows 11.

In PowerShell:

tnc EC2_IP -port 31337

ComputerName     : EC2_IP
RemoteAddress    : EC2_IP
RemotePort       : 31337
InterfaceAlias   : Wi-Fi
SourceAddress    : MY_IP
TcpTestSucceeded : True

In WSL Ubuntu:

telnet EC2_IP 31337

Trying EC2_IP...
Connected to EC2_IP.
Escape character is '^]'.
/multistream/1.0.0
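The /multistream/1.0.0 line in the telnet output is the libp2p multistream-select greeting, so the TCP port appears to be reachable and something libp2p-like is answering on it. The same check can be scripted from Python, which makes it easy to repeat from WSL, Colab, or another EC2 instance. This is a minimal sketch; read_banner is a hypothetical helper and not part of Petals or hivemind.

```python
import socket


def read_banner(host: str, port: int, timeout: float = 5.0) -> bytes:
    """Open a TCP connection and return the first bytes the peer sends.

    Returns b"" if the connection succeeds but nothing arrives in time.
    Raises OSError if the connection itself cannot be established.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            return sock.recv(64)
        except socket.timeout:
            return b""


def looks_like_libp2p(banner: bytes) -> bool:
    """True if the banner contains the multistream-select protocol id."""
    return b"/multistream/1.0.0" in banner
```

For example, looks_like_libp2p(read_banner("EC2_IP", 31337)) with EC2_IP replaced by the real address. A successful banner only shows that the raw TCP path works; it says nothing about whether the peer ID in --initial_peers matches the one the DHT is running with.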

setup.cfg

[metadata]
name = petals
version = attr: petals.__version__
author = Petals Developers
author_email = petals-devs@googlegroups.com
description = Easy way to efficiently run 100B+ language models without high-end GPUs
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/bigscience-workshop/petals
project_urls =
    Bug Tracker = https://github.com/bigscience-workshop/petals/issues
classifiers =
    Development Status :: 4 - Beta
    Intended Audience :: Developers
    Intended Audience :: Science/Research
    License :: OSI Approved :: MIT License
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9
    Programming Language :: Python :: 3.10
    Programming Language :: Python :: 3.11
    Topic :: Scientific/Engineering
    Topic :: Scientific/Engineering :: Mathematics
    Topic :: Scientific/Engineering :: Artificial Intelligence
    Topic :: Software Development
    Topic :: Software Development :: Libraries
    Topic :: Software Development :: Libraries :: Python Modules

[options]
package_dir =
    = src
packages = find:
python_requires = >=3.8
install_requires =
    torch>=1.12
    bitsandbytes==0.41.1
    accelerate>=0.22.0
    huggingface-hub>=0.11.1,<1.0.0
    tokenizers>=0.13.3
    transformers>=4.32.0,<4.35.0  # if you change this, please also change version assert in petals/__init__.py
    speedtest-cli==2.1.3
    pydantic>=1.10,<2.0  # 2.0 is incompatible with hivemind yet
    hivemind==1.1.10.post2
    tensor_parallel==1.0.23
    humanfriendly
    async-timeout>=4.0.2
    cpufeature>=0.2.0; platform_machine == "x86_64"
    packaging>=20.9
    sentencepiece>=0.1.99
    peft==0.5.0
    safetensors>=0.3.1
    Dijkstar>=2.6.0

[options.extras_require]
dev =
    pytest==6.2.5
    pytest-forked
    pytest-asyncio==0.16.0
    black==22.3.0
    isort==5.10.1
    psutil

[options.packages.find]
where = src
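Since the EC2 install came from source with pip install ., one cheap thing to rule out is a version mismatch between the two machines, for example a hivemind build that differs from the hivemind==1.1.10.post2 pin above. As far as I know, --no-cache-dir only disables pip's local cache and does not change what gets installed. The sketch below prints the installed versions so the output can be compared across machines; installed_versions and the PACKAGES selection are illustrative choices, not something Petals provides.

```python
from importlib import metadata

# A few distributions from install_requires above; extend as needed.
PACKAGES = ("petals", "hivemind", "torch", "transformers")


def installed_versions(names=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this on the EC2 instance and in WSL and diffing the output shows whether both sides use the same hivemind and petals releases.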

If I instead run python -m petals.cli.run_server enoch/llama-65b-hf --port 31337 --public_ip EC2_IP_ADDRESS, I get:

"hivemind.p2p.p2p_daemon_bindings.utils.P2PDaemonError: Daemon failed to start: {"level":"info","ts":"2023-12-09T00:16:02.265-0500","logger":"dht/RtRefreshManager","caller":"rtrefresh/rt_refresh_manager.go:279","msg":"starting refreshing cpl 0 with key CIQAAABBP4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (routing table size was 0)"}"

I also tried turning off Windows Defender Firewall for both the public and private profiles before running the command, but got the same error.

Is there anything else I can do?

Could the small EC2 instance size be a factor?

Update (edit):

I was able to connect from another EC2 instance, so I assume this is a Windows/WSL issue. Any recommendations on what I can do? What I find odd is that everything works locally: I can start a DHT and run nodes against it, and I was even able to run a peer connected to the public Petals network.

I am also unable to connect from Colab.

Here is the full error:

  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/petals/cli/run_server.py", line 240, in <module>
    main()
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/petals/cli/run_server.py", line 224, in main
    server = Server(
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/petals/server/server.py", line 139, in __init__
    is_reachable = check_direct_reachability(initial_peers=initial_peers, use_relay=False, **kwargs)
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/petals/server/reachability.py", line 78, in check_direct_reachability
    return RemoteExpertWorker.run_coroutine(_check_direct_reachability())
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/hivemind/moe/client/remote_expert_worker.py", line 36, in run_coroutine
    return future if return_future else future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/petals/server/reachability.py", line 59, in _check_direct_reachability
    target_dht = await DHTNode.create(client_mode=True, **kwargs)
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/hivemind/dht/node.py", line 192, in create
    p2p = await P2P.create(**kwargs)
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/hivemind/p2p/p2p_daemon.py", line 234, in create
    await asyncio.wait_for(ready, startup_timeout)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
hivemind.p2p.p2p_daemon_bindings.utils.P2PDaemonError: Daemon failed to start: 2023/12/09 17:57:33 failed to connect to bootstrap peers