TEEU error #1194

Open
LeoneChen opened this issue Mar 14, 2024 · 3 comments

@LeoneChen

While deploying TEEU by following the TEEU tutorial, I got the error below. What is causing it?

W0314 17:00:32.927152     2 external/com_github_brpc_brpc/src/bvar/default_variables.cpp:434] Fail to open /proc/self/io: No such file or directory
W0314 17:00:32.944652    66 external/com_github_brpc_brpc/src/bvar/default_variables.cpp:214] Fail to open /proc/self/statm: No such file or directory
W0314 17:00:32.944726    66 external/com_github_brpc_brpc/src/bvar/default_variables.cpp:281] Fail to open /proc/loadavg: No such file or directory
INFO:root:Authority manager config is {'host': '0.0.0.0:8835', 'mr_enclave': '4c1c23a3dd87a407035b81e147e4ca64057bac51d06df71ec90ba5378ee1c62f'}
WARNING:ray.worker:File descriptor limit 1024 is too low for production servers and may result in connection errors. At least 8192 is recommended. --- Fix with 'ulimit -n 8192'
/opt/secretflow/lib/python3.8/site-packages/ray/thirdparty_files/psutil/_pslinux.py:513: RuntimeWarning: buffers, cached, shared, active, inactive memory stats couldn't be determined and were set to 0
  warnings.warn(msg, RuntimeWarning)
2024-03-14 17:00:54,182 INFO services.py:2039 -- object_store_memory is not verified when plasma_directory is set.
Traceback (most recent call last):
  File "/opt/secretflow/lib/python3.8/site-packages/ray/node.py", line 310, in __init__
    ray._private.services.wait_for_node(
  File "/opt/secretflow/lib/python3.8/site-packages/ray/_private/services.py", line 398, in wait_for_node
    raise TimeoutError("Timed out while waiting for node to startup.")
TimeoutError: Timed out while waiting for node to startup.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/demo.py", line 30, in <module>
    sf.init(
  File "/opt/secretflow/lib/python3.8/site-packages/secretflow/device/driver.py", line 480, in init
    fed.init(
  File "/opt/secretflow/lib/python3.8/site-packages/fed/api.py", line 164, in init
    compatible_utils.init_ray(address=address, **kwargs)
  File "/opt/secretflow/lib/python3.8/site-packages/fed/_private/compatible_utils.py", line 55, in init_ray
    ray.init(**kwargs)
  File "/opt/secretflow/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/opt/secretflow/lib/python3.8/site-packages/ray/worker.py", line 1041, in init
    _global_node = ray.node.Node(
  File "/opt/secretflow/lib/python3.8/site-packages/ray/node.py", line 317, in __init__
    raise Exception(
Exception: The current node has not been updated within 30 seconds, this could happen because of some of the Ray processes failed to startup.
[mutex.cc : 926] RAW: pthread_getschedparam failed: 1
@ding77

ding77 commented Mar 15, 2024

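@LeoneChen Hello, judging from the log output, this is most likely caused by insufficient memory. Please take a look at the existing issue #860. If that does not solve the problem, feel free to report back.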

@zhouaihui
Member

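Hi @LeoneChen, Ray is running on Occlum here, and Occlum's Unix Domain Socket support performs poorly and can sometimes be very slow. For now, try running it a few more times.
We will fix this issue in a later release.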
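
For anyone who wants to script the "try a few more times" workaround above, here is a minimal sketch of a retry wrapper. It is not from the SecretFlow or Ray codebase: `retry`, `attempts`, `delay_s`, and `sleep` are hypothetical names chosen for illustration, and the sketch assumes the wrapped call can safely be invoked again after it raises.

```python
import time


def retry(fn, attempts=3, delay_s=5.0, sleep=time.sleep):
    """Call fn() until it succeeds or the attempts run out.

    Hypothetical helper for illustration only. Re-raises the last
    error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # the traceback above ends in a plain Exception
            last_error = exc
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                sleep(delay_s)  # pause before the next attempt
    raise last_error


# Usage sketch (assumption: the arguments are whatever demo.py already passes):
#   retry(lambda: sf.init(...), attempts=5, delay_s=10.0)
```

Whether a failed `sf.init` leaves state behind that must be cleaned up before the next attempt is not covered by this thread, so rerunning the whole script in a fresh process may be the safer way to follow the advice.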

@LeoneChen
Author

Thank you both, I'll take a look.
