Question about the TEEU secure aggregation example in simulation mode from the TEEU getting-started guide #1256

Open
beinnnn opened this issue Apr 17, 2024 · 7 comments

beinnnn commented Apr 17, 2024

Issue Type

Build/Install

Source

binary

Secretflow Version

secretflow 1.5.0

OS Platform and Distribution

Ubuntu 20

Python version

3.10

Bazel version

No response

GCC/Compiler version

No response

What happened and what you expected to happen.

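To run this example, I started three containers on one machine, assigned each container its own IP address to represent alice, bob, and carol, and started a separate Ray cluster inside each container. Is this deployment approach feasible?
In practice, alice and bob cannot connect to carol. carol can ping alice and bob, but the run never produces a result.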

Reproduction code to reproduce the issue.

Error output from Alice and Bob:
(raylet) [2024-04-17 02:39:59,135 E 326 366] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-04-15_02-26-34_133505_135 is over 95% full, available space: 1074393088; capacity: 2360042831872. Object creation will fail if spilling is required.
(SenderProxyActor pid=5387) 2024-04-17 02:40:00.849 ERROR barriers.py:171 [alice] -- [Anonymous_job] Failed to send data to seq_id ping of carol from ping, error: <AioRpcError of RPC that terminated with:
(SenderProxyActor pid=5387) 	status = StatusCode.DEADLINE_EXCEEDED
(SenderProxyActor pid=5387) 	details = "Deadline Exceeded"
(SenderProxyActor pid=5387) 	debug_error_string = "UNKNOWN:Deadline Exceeded {created_time:"2024-04-17T02:40:00.848690293+00:00", grpc_status:4}"
(SenderProxyActor pid=5387) >
2024-04-17 02:40:00.900 INFO barriers.py:520 [alice] -- [Anonymous_job] Try ping ['carol'] at 22 attemp, up to 3600 attemps.

Output from carol:
2024-04-17 02:42:14 INFO barriers.py:353 [carol] --  RecverProxy was successfully created.
2024-04-17 02:42:17 INFO barriers.py:388 [carol] --  SendProxy was successfully created.
2024-04-17 02:42:17 INFO barriers.py:463 [carol] --  Try ping ['alice', 'bob'] at 0 attemp, up to 3600 attemps.
2024-04-17 02:42:17 INFO barriers.py:444 [carol] --  Succeeded to ping alice on 10.10.0.2:20001, the result: .
2024-04-17 02:42:17 INFO barriers.py:444 [carol] --  Succeeded to ping bob on 10.10.0.3:20001, the result: .
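One way to narrow down the `DEADLINE_EXCEEDED` errors above is to check, from inside the alice and bob containers, whether carol's listening port accepts TCP connections at all. The sketch below uses only the Python standard library. The function name `can_connect` is illustrative, and carol's address does not appear in the logs, so any host and port passed in for carol are assumptions to be replaced with the values from your own configuration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, running `can_connect("<carol-ip>", <carol-port>)` inside the alice container, with the address alice is configured to use for carol, shows whether the problem sits at the network level (wrong address, port not exposed, firewall) or higher up in the application.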
@Chrisdehe
Member

@beinnnn hey, please confirm that the AuthManager you deployed is reachable from the other nodes.

@beinnnn
Author

beinnnn commented Apr 17, 2024

> @beinnnn hey, please confirm that the AuthManager you deployed is reachable from the other nodes.

I checked: all three nodes (alice, bob, and carol) can reach the AuthManager.

@zheyang0825

This deployment approach is feasible, but you may need to confirm that the Carol address configured on Alice and Bob is correct.
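A minimal sketch of such a check, assuming each party's `cluster_config` is a dict with a `parties` mapping (party name to `address`) and a `self_party` key. The helper names `make_config` and `find_address_mismatches` are hypothetical, and carol's address `10.10.0.4:20001` is a placeholder; only the alice and bob addresses come from the logs in this thread.

```python
from typing import Dict, List, Tuple


def make_config(self_party: str, carol_addr: str) -> dict:
    """Build an illustrative cluster_config for one party (assumed layout)."""
    return {
        "parties": {
            "alice": {"address": "10.10.0.2:20001"},  # from carol's log
            "bob": {"address": "10.10.0.3:20001"},    # from carol's log
            "carol": {"address": carol_addr},         # placeholder, not in the thread
        },
        "self_party": self_party,
    }


def find_address_mismatches(configs: Dict[str, dict]) -> List[Tuple[str, str, str, str]]:
    """Compare every party's view of every other party's address.

    The address a party lists for itself is treated as the reference.
    Returns (viewer, target, address_seen_by_viewer, reference_address) tuples.
    """
    mismatches = []
    for target, target_cfg in configs.items():
        expected = target_cfg["parties"][target]["address"]
        for viewer, viewer_cfg in configs.items():
            seen = viewer_cfg["parties"].get(target, {}).get("address")
            if seen != expected:
                mismatches.append((viewer, target, str(seen), expected))
    return mismatches


# Example: alice has a wrong address for carol; bob and carol agree.
configs = {
    "alice": make_config("alice", "10.10.0.5:20001"),
    "bob": make_config("bob", "10.10.0.4:20001"),
    "carol": make_config("carol", "10.10.0.4:20001"),
}
result = find_address_mismatches(configs)
```

Here `result` contains one entry, reporting that alice's address for carol differs from the one carol lists for itself. An empty list means all three configurations agree, which does not by itself prove the address is reachable.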


Stale issue message. Please comment to remove stale tag. Otherwise this issue will be closed soon.

@liouxiao

I suspect I hit a similar problem, but the error on Alice/Bob is somewhat different from the original poster's:

ValueError: Failed to look up actor with name 'SenderProxyActor'. This could because 1. You are trying to look up a named actor you didn't create. 2. The named actor died. 3. You did not use a namespace matching the namespace of the actor.

Carol exits abnormally:

/opt/occlum/build/bin/occlum: line 354: 528 Killed RUST_BACKTRACE=1 "$instance_dir/build/bin/occlum-run" "$@"

@liouxiao

> I suspect I hit a similar problem, but the error on Alice/Bob is somewhat different from the original poster's:
>
> ValueError: Failed to look up actor with name 'SenderProxyActor'. This could because 1. You are trying to look up a named actor you didn't create. 2. The named actor died. 3. You did not use a namespace matching the namespace of the actor.
>
> Carol exits abnormally:
>
> /opt/occlum/build/bin/occlum: line 354: 528 Killed RUST_BACKTRACE=1 "$instance_dir/build/bin/occlum-run" "$@"

It looks like my machine had too little memory, which caused the TEEU container to exit abnormally.
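A `Killed` line like the one above is what a shell prints when a process receives SIGKILL, which is consistent with the kernel's out-of-memory killer. A rough way to see how much memory is available before starting the container is to read `/proc/meminfo` on the Linux host. The sketch below is an illustration only: the function names are hypothetical, and this thread does not state how much memory the TEEU container needs.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a {field_name: value_in_kB} dict."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_gib(text: str) -> float:
    """Return the MemAvailable field converted from kB to GiB."""
    return parse_meminfo(text)["MemAvailable"] / (1024 ** 2)


def read_available_gib(path: str = "/proc/meminfo") -> float:
    """Read the file at `path` (Linux only) and return available memory in GiB."""
    with open(path) as f:
        return available_gib(f.read())
```

Calling `read_available_gib()` on the host, or inside the container if `/proc/meminfo` is visible there, gives a number to compare against whatever the enclave configuration requests.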

@Chrisdehe
Member

Hi @liouxiao, has the problem been resolved?
