connecting to private petals using ec2 dht problems #604

brandnamewater · 2023-12-09T05:03:37Z

I can start a DHT on the EC2 instance with python3 -m petals.cli.run_dht --host_maddrs /ip4/0.0.0.0/tcp/31337 --identity_path bootstrap1.id
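Note that /ip4/0.0.0.0/tcp/31337 is a listen address ("all local interfaces"). Another machine cannot dial 0.0.0.0. It has to dial the instance's public IPv4 address plus the /p2p/<peer ID> suffix, which comes from the key stored in bootstrap1.id. The sketch below turns the listen address into a dialable one. `dialable_maddr` is a hypothetical helper, and the IP (a documentation address) and peer ID are placeholders.

```python
import ipaddress


def dialable_maddr(listen_maddr: str, public_ip: str, peer_id: str) -> str:
    """Hypothetical helper: build an address other machines can dial.

    listen_maddr is what the DHT binds to (e.g. /ip4/0.0.0.0/tcp/31337).
    0.0.0.0 means "all local interfaces" and cannot be dialed remotely,
    so the host part is replaced by the instance's public IPv4 address.
    """
    parts = listen_maddr.strip("/").split("/")
    # Expected layout: ip4 <addr> tcp <port>
    if len(parts) != 4 or parts[0] != "ip4" or parts[2] != "tcp":
        raise ValueError(f"unsupported multiaddr: {listen_maddr}")
    port = int(parts[3])
    ip = ipaddress.IPv4Address(public_ip)  # raises if public_ip is not valid IPv4
    if ip.is_unspecified or ip.is_loopback:
        raise ValueError(f"{public_ip} cannot be dialed from another machine")
    return f"/ip4/{ip}/tcp/{port}/p2p/{peer_id}"


# Placeholders: substitute the EC2 public IP and the DHT's real peer ID.
print(dialable_maddr("/ip4/0.0.0.0/tcp/31337", "203.0.113.10", "QmExamplePeerId"))
```

The resulting string is the form each --initial_peers entry takes on the connecting machine.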

On the EC2 instance, I installed with pip install . --no-cache-dir because it's a small instance. Could that affect anything?

I'm having trouble running the following on my own computer to connect to the DHT on the EC2 instance: python -m petals.cli.run_server enoch/llama-65b-hf --initial_peers /ip4/IP_ADDRESS/tcp/31337/p2p/abc /ip4/127.0.0.1/tcp/31337/p2p/abc

It fails with "hivemind.p2p.p2p_daemon_bindings.utils.P2PDaemonError: Daemon failed to start: 2023/12/08 23:53:41 failed to connect to bootstrap peers"
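In that command, the second initial peer (/ip4/127.0.0.1/tcp/31337/p2p/abc) is a loopback address. It refers to whichever machine runs the command, here the home computer, and never to the EC2 instance. The sketch below flags such entries before starting a server. `classify_initial_peers` is a hypothetical helper, `abc` is the placeholder peer ID from the command, and 203.0.113.10 stands in for IP_ADDRESS.

```python
import ipaddress


def classify_initial_peers(maddrs):
    """Hypothetical helper: split /ip4/<addr>/tcp/<port>/p2p/<peer id> strings into
    addresses another machine can dial and addresses that only reach the local host."""
    remote, local_only = [], []
    for maddr in maddrs:
        parts = maddr.strip("/").split("/")
        # Expected layout: ip4 <addr> tcp <port> p2p <peer id>
        if len(parts) != 6 or parts[0::2] != ["ip4", "tcp", "p2p"]:
            raise ValueError(f"unexpected multiaddr layout: {maddr}")
        ip = ipaddress.IPv4Address(parts[1])
        # 127.0.0.0/8 and 0.0.0.0 never leave the machine doing the dialing
        if ip.is_loopback or ip.is_unspecified:
            local_only.append(maddr)
        else:
            remote.append(maddr)
    return remote, local_only


peers = ["/ip4/203.0.113.10/tcp/31337/p2p/abc", "/ip4/127.0.0.1/tcp/31337/p2p/abc"]
remote, local_only = classify_initial_peers(peers)
print("dialable from elsewhere:", remote)
print("local machine only:", local_only)
```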

Everything works locally: I can start a DHT on my own computer and run peers against it from other terminal tabs. What I can't get working is connecting peers on my computer to a DHT started on the EC2 instance.

My inbound rules:

SSH - 22 - 0.0.0.0/0
HTTP - 80 - 0.0.0.0/0
HTTPS - 443 - 0.0.0.0/0
Custom TCP - 31337 - 0.0.0.0/0

I also tried allowing All Traffic from 0.0.0.0/0.
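An inbound rule admits a connection when the destination port falls in the rule's port range and the source address falls in its CIDR block. The toy model below (plain Python, not the AWS API) encodes the rules listed above and confirms that TCP 31337 is open to any source. `InboundRule` and `matching_rules` are hypothetical names, and 198.51.100.7 is a placeholder client address.

```python
import ipaddress
from typing import NamedTuple


class InboundRule(NamedTuple):
    name: str
    from_port: int
    to_port: int
    source_cidr: str


# The rules from the issue; AWS's "SSH" rule type is fixed to port 22
RULES = [
    InboundRule("SSH", 22, 22, "0.0.0.0/0"),
    InboundRule("HTTP", 80, 80, "0.0.0.0/0"),
    InboundRule("HTTPS", 443, 443, "0.0.0.0/0"),
    InboundRule("Custom TCP", 31337, 31337, "0.0.0.0/0"),
]


def matching_rules(rules, port, source_ip):
    """Return the names of the rules that would admit TCP traffic on `port` from `source_ip`."""
    src = ipaddress.ip_address(source_ip)
    return [
        r.name
        for r in rules
        if r.from_port <= port <= r.to_port and src in ipaddress.ip_network(r.source_cidr)
    ]


print(matching_rules(RULES, 31337, "198.51.100.7"))  # ['Custom TCP']
```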

I'm using WSL (Ubuntu 22.04.3) on Windows 11.

In PowerShell:

tnc EC2_IP -port 31337

ComputerName     : EC2_IP
RemoteAddress    : EC2_IP
RemotePort       : 31337
InterfaceAlias   : Wi-Fi
SourceAddress    : MY_IP
TcpTestSucceeded : True

In WSL Ubuntu:

telnet EC2_IP 31337

Trying EC2_IP...
Connected to EC2_IP.
Escape character is '^]'.
/multistream/1.0.0
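The /multistream/1.0.0 line is the libp2p multistream-select header. As the telnet session shows, the listener sends it without waiting for input. This confirms that TCP port 31337 is reachable and that a libp2p node answers on it. It does not verify the peer ID in /p2p/... or the rest of the handshake. Below is a scriptable version of the same check. It assumes, as multistream-select specifies, that each message is prefixed with its unsigned-varint length and ends with a newline. The function names are hypothetical.

```python
import socket


def decode_uvarint(buf: bytes):
    """Decode an unsigned LEB128 varint; return (value, bytes consumed)."""
    value = shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


def parse_multistream_message(buf: bytes) -> str:
    """Parse one length-prefixed, newline-terminated multistream-select message."""
    length, offset = decode_uvarint(buf)
    payload = buf[offset:offset + length]
    if len(payload) != length or not payload.endswith(b"\n"):
        raise ValueError("incomplete multistream message")
    return payload[:-1].decode()


def read_multistream_header(host: str, port: int, timeout: float = 5.0) -> str:
    """Connect over TCP and return the protocol ID the listener announces first."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        buf = b""
        while len(buf) < 1024:  # the header is tiny; give up on anything much larger
            chunk = sock.recv(64)
            if not chunk:
                raise ConnectionError("connection closed before a full header arrived")
            buf += chunk
            try:
                return parse_multistream_message(buf)
            except ValueError:
                continue  # keep reading until the whole message has arrived
        raise ValueError("no valid multistream header in the first 1024 bytes")


# Replace EC2_IP with the instance's public address; the telnet session
# above suggests this should print /multistream/1.0.0
# print(read_multistream_header("EC2_IP", 31337))
```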

My setup.cfg:

[metadata]
name = petals
version = attr: petals.__version__
author = Petals Developers
author_email = petals-devs@googlegroups.com
description = Easy way to efficiently run 100B+ language models without high-end GPUs
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/bigscience-workshop/petals
project_urls =
    Bug Tracker = https://github.com/bigscience-workshop/petals/issues
classifiers =
    Development Status :: 4 - Beta
    Intended Audience :: Developers
    Intended Audience :: Science/Research
    License :: OSI Approved :: MIT License
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9
    Programming Language :: Python :: 3.10
    Programming Language :: Python :: 3.11
    Topic :: Scientific/Engineering
    Topic :: Scientific/Engineering :: Mathematics
    Topic :: Scientific/Engineering :: Artificial Intelligence
    Topic :: Software Development
    Topic :: Software Development :: Libraries
    Topic :: Software Development :: Libraries :: Python Modules

[options]
package_dir =
    = src
packages = find:
python_requires = >=3.8
install_requires =
    torch>=1.12
    bitsandbytes==0.41.1
    accelerate>=0.22.0
    huggingface-hub>=0.11.1,<1.0.0
    tokenizers>=0.13.3
    transformers>=4.32.0,<4.35.0  # if you change this, please also change version assert in petals/__init__.py
    speedtest-cli==2.1.3
    pydantic>=1.10,<2.0  # 2.0 is incompatible with hivemind yet
    hivemind==1.1.10.post2
    tensor_parallel==1.0.23
    humanfriendly
    async-timeout>=4.0.2
    cpufeature>=0.2.0; platform_machine == "x86_64"
    packaging>=20.9
    sentencepiece>=0.1.99
    peft==0.5.0
    safetensors>=0.3.1
    Dijkstar>=2.6.0

[options.extras_require]
dev =
    pytest==6.2.5
    pytest-forked
    pytest-asyncio==0.16.0
    black==22.3.0
    isort==5.10.1
    psutil

[options.packages.find]
where = src
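If the same checkout is installed on both the EC2 instance and the WSL machine, it is cheap to rule out a version mismatch in the exactly pinned dependencies, especially hivemind==1.1.10.post2. The sketch below uses only the standard library. It extracts the `==` pins from an excerpt of install_requires and compares them, as plain strings without PEP 440 normalization, against what `importlib.metadata` reports in the current environment. `exact_pins` and `report_mismatches` are hypothetical helpers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Excerpt of install_requires from the setup.cfg above
INSTALL_REQUIRES = """
torch>=1.12
bitsandbytes==0.41.1
pydantic>=1.10,<2.0  # 2.0 is incompatible with hivemind yet
hivemind==1.1.10.post2
tensor_parallel==1.0.23
peft==0.5.0
"""


def exact_pins(requirements: str) -> dict:
    """Map package name -> version for every `name==version` line (comments ignored)."""
    pins = {}
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()
        match = re.fullmatch(r"([A-Za-z0-9_.\-]+)==([^\s;,]+)", line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def report_mismatches(pins: dict) -> dict:
    """Return {package: (pinned, installed or None)} for pins this environment does not match."""
    problems = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            problems[name] = (pinned, installed)
    return problems


# Run on both machines and compare; an empty dict means every pin matches.
print(report_mismatches(exact_pins(INSTALL_REQUIRES)))
```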

If I instead run python -m petals.cli.run_server enoch/llama-65b-hf --port 31337 --public_ip EC2_IP_ADDRESS, I get:

"hivemind.p2p.p2p_daemon_bindings.utils.P2PDaemonError: Daemon failed to start: {"level":"info","ts":"2023-12-09T00:16:02.265-0500","logger":"dht/RtRefreshManager","caller":"rtrefresh/rt_refresh_manager.go:279","msg":"starting refreshing cpl 0 with key CIQAAABBP4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (routing table size was 0)"}"
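Assume --public_ip behaves as its name and help text suggest: it is the public IPv4 address of the machine running run_server, and it is combined with --port into the address the server announces to other peers. Under that assumption, passing the EC2 address from the home computer advertises an address that belongs to a different machine. This invocation also passes no --initial_peers, so nothing points it at the private DHT on the instance. The sketch below is loosely modeled on that behavior and is not copied from the Petals code. `server_maddrs` is a hypothetical helper, and 198.51.100.7 is a placeholder for the home machine's own public IP.

```python
import ipaddress


def server_maddrs(port: int, public_ip=None):
    """Hypothetical helper: return (listen addresses, announced addresses).

    The server listens on all local interfaces. If a public IP is given, it
    announces /ip4/<public_ip>/tcp/<port> so that other peers dial it there.
    """
    listen = [f"/ip4/0.0.0.0/tcp/{port}"]
    if public_ip is None:
        return listen, []
    if port == 0:
        raise ValueError("announcing a public IP needs a fixed, non-zero port")
    ip = ipaddress.IPv4Address(public_ip)  # raises if not valid IPv4
    return listen, [f"/ip4/{ip}/tcp/{port}"]


# public_ip should be the address of the machine running the server,
# reachable on that port, not the EC2 instance's address.
print(server_maddrs(31337, "198.51.100.7"))
```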

I also tried turning off Windows Defender Firewall for both public and private networks before running the command, but the issue persists.

Is there anything else I can do?

Could the EC2 instance size matter, since it's a small one?

Update:

I was able to connect from another EC2 instance. I assume this is now officially a Windows/WSL issue? Any recommendations on what I can do? What I find odd is that all of this works locally: I can start a DHT and run nodes against it. I was even able to run a peer that joined the public Petals network.

I'm also unable to connect from Colab.

Here is the full error:

  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/petals/cli/run_server.py", line 240, in <module>
    main()
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/petals/cli/run_server.py", line 224, in main
    server = Server(
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/petals/server/server.py", line 139, in __init__
    is_reachable = check_direct_reachability(initial_peers=initial_peers, use_relay=False, **kwargs)
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/petals/server/reachability.py", line 78, in check_direct_reachability
    return RemoteExpertWorker.run_coroutine(_check_direct_reachability())
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/hivemind/moe/client/remote_expert_worker.py", line 36, in run_coroutine
    return future if return_future else future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/petals/server/reachability.py", line 59, in _check_direct_reachability
    target_dht = await DHTNode.create(client_mode=True, **kwargs)
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/hivemind/dht/node.py", line 192, in create
    p2p = await P2P.create(**kwargs)
  File "/mnt/d/petals/petals/venv/lib/python3.10/site-packages/hivemind/p2p/p2p_daemon.py", line 234, in create
    await asyncio.wait_for(ready, startup_timeout)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
hivemind.p2p.p2p_daemon_bindings.utils.P2PDaemonError: Daemon failed to start: 2023/12/09 17:57:33 failed to connect to bootstrap peers
