Vearch cluster restart: PS leader election fails #735

Open
hanqiushi opened this issue Sep 14, 2023 · 2 comments

Comments

hanqiushi commented Sep 14, 2023

Similar to #724: I can still reproduce this problem with the latest vearch/vearch:latest image.

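Configuration:
Three replicas with an external etcd. Every replica is started with the `all` parameter (master, ps and client all included), with the goal of high availability.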

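Steps to reproduce:
Create a table with replica set to 3 and insert some data.
Kill all three nodes, then restart them together: leader election still fails.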

Below is the abnormal log from one of the machines:
2023-09-14 06:50:10,934 :0: DEBUG: raft[3] became candidate at term 674.
2023-09-14 06:50:10,934 :0: DEBUG: raft[3] became candidate at term 674.
2023-09-14 06:50:10,934 :0: DEBUG: raft[3] received vote from 1 at term 674.
2023-09-14 06:50:10,934 :0: DEBUG: raft[3] received vote from 1 at term 674.
2023-09-14 06:50:10,934 :0: DEBUG: [raft->campaign][3 logterm: 0, index: 0] sent vote request to 2 at term 674. raftFSM[0xc000ed8000]
2023-09-14 06:50:10,934 :0: DEBUG: [raft->campaign][3 logterm: 0, index: 0] sent vote request to 2 at term 674. raftFSM[0xc000ed8000]
2023-09-14 06:50:10,934 :0: DEBUG: [raft->campaign][3 logterm: 0, index: 0] sent vote request to 3 at term 674. raftFSM[0xc000ed8000]
2023-09-14 06:50:10,934 :0: DEBUG: [raft->campaign][3 logterm: 0, index: 0] sent vote request to 3 at term 674. raftFSM[0xc000ed8000]
2023-09-14 06:50:10,934 :0: DEBUG: [Transport] get connection[Replicate] to 3[] failed,error is: cannot get node network information, nodeID=[3]
2023-09-14 06:50:10,934 :0: DEBUG: [Transport] get connection[Replicate] to 3[] failed,error is: cannot get node network information, nodeID=[3]
2023-09-14 06:50:10,934 :0: DEBUG: [Transport] get connection[Replicate] to 2[] failed,error is: cannot get node network information, nodeID=[2]
2023-09-14 06:50:10,934 :0: DEBUG: [Transport] get connection[Replicate] to 2[] failed,error is: cannot get node network information, nodeID=[2]

hanqiushi (Author) commented Sep 14, 2023

2023-09-14 06:50:10,934 :0: DEBUG: [Transport] get connection[Replicate] to 3[] failed,error is: cannot get node network information, nodeID=[3]

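Compared with a normal log, the `3[]` part of this line is empty; normally it contains the node's address, e.g. ip:8898. It looks as if the config.toml data was not loaded, or something went wrong while processing it.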

zcdb (Member) commented Sep 15, 2023

We suggest verifying with each module deployed separately; the `all` mode is only meant for simple functional testing.
