[BUG] Alluxio distributedCp fails when the destination file already exists: it deletes the destination file and the copy still does not succeed #18477

Open
Quan-HS opened this issue Dec 21, 2023 · 0 comments
Labels
type-bug This issue is about a bug

Quan-HS commented Dec 21, 2023

Alluxio Version:
2.9.3

Describe the bug
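[BUG] Alluxio distributedCp reports an error when the destination file already exists. It deletes the existing destination file, but the file that should be copied is never copied successfully.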

To Reproduce
Directory layout before running the command:

/xxx/alluxio1/testcp-source
    1.txt
    2.txt
/xxx/alluxio1/testcp-destination
    1.txt

bin/alluxio fs distributedCp --active-jobs 100 /xxx/alluxio1/testcp-source /xxx/alluxio1/testcp-destination

Output after running the command:
...
Total completed file count is 1, failed file count is 1
Finished running the command, jobControlId = 1703040627937
Here are failed files:
/xxx/alluxio1/testcp-destination/1.txt,
Check out ./logs/user/distributedCp_shein-os_ailab_prod_alluxio1_testcp_failures.csv for full list of failed files.

Expected behavior

Are you planning to fix it
No, my company does not allow it.

Additional context

package alluxio.job.plan.migrate;

public final class MigrateDefinition
    extends AbstractVoidPlanDefinition<MigrateConfig, MigrateCommand> {
  ...
  private static void migrate(MigrateCommand command,
                              WritePType writeType,
                              FileSystem fileSystem,
                              boolean overwrite,
                              boolean asyncwrite) throws Exception {
    ...
    boolean processOverwrite = false;
    //
    if (overwrite && fileSystem.exists(destinationUri)) {
      fileSystem.delete(destinationUri);
    }

    ...
    // Comment out the processOverwrite logic; avoid the roundabout sequence of delete and rename calls.
    // Suspected cause of the failure: inside the try block the delete succeeds, but rename is called
    // before the try block exits. At that point the file has not yet been uploaded to the object store,
    // so it does not exist and the rename fails.

    // Alternatively, delete directly in the catch block. It fails anyway, so the direct route is simpler.
    catch (FileAlreadyExistsException e) {
      if (overwrite) {
        // Replaces the other logic
        fileSystem.delete(destinationUri);
      } else {
        throw e;
      }
    }
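
To make the suggestion above concrete, here is a minimal sketch of the "delete in the catch block" idea against a toy in-memory file system. It is not Alluxio code: `OverwriteSketch`, `InMemoryFs`, `AlreadyExistsException`, and `copy` are hypothetical names made up for illustration. The single retry after the delete is also an assumption, since the excerpt does not show what happens after the catch block.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; none of these types are Alluxio classes. */
final class OverwriteSketch {

  /** Hypothetical stand-in for a "file already exists" error. */
  static final class AlreadyExistsException extends RuntimeException {
    AlreadyExistsException(String path) {
      super(path + " already exists");
    }
  }

  /** Hypothetical in-memory file system mapping path to content. */
  static final class InMemoryFs {
    private final Map<String, String> files = new HashMap<>();

    boolean exists(String path) {
      return files.containsKey(path);
    }

    void delete(String path) {
      files.remove(path);
    }

    /** Returns the content, or null if the path does not exist. */
    String read(String path) {
      return files.get(path);
    }

    /** Creates a file and fails if the path is already taken. */
    void create(String path, String content) {
      if (files.containsKey(path)) {
        throw new AlreadyExistsException(path);
      }
      files.put(path, content);
    }
  }

  /**
   * Copies src to dst. If dst exists and overwrite is set, dst is deleted
   * in the catch block and the write is retried once (assumed behavior).
   */
  static void copy(InMemoryFs fs, String src, String dst, boolean overwrite) {
    String content = fs.read(src);
    if (content == null) {
      throw new IllegalArgumentException(src + " does not exist");
    }
    try {
      fs.create(dst, content);
    } catch (AlreadyExistsException e) {
      if (!overwrite) {
        throw e; // keep the existing destination untouched
      }
      fs.delete(dst);
      fs.create(dst, content); // retry now that the destination is clear
    }
  }

  public static void main(String[] args) {
    // Same layout as in "To Reproduce": 1.txt exists on both sides.
    InMemoryFs fs = new InMemoryFs();
    fs.create("/src/1.txt", "new");
    fs.create("/src/2.txt", "two");
    fs.create("/dst/1.txt", "old");

    copy(fs, "/src/1.txt", "/dst/1.txt", true);
    copy(fs, "/src/2.txt", "/dst/2.txt", true);

    System.out.println(fs.read("/dst/1.txt")); // prints "new"
    System.out.println(fs.read("/dst/2.txt")); // prints "two"
  }
}
```

In this model the destination is only removed once a write has already been rejected, and the copy writes straight to the destination, so there is no temporary file and no rename step that could fail in between. Whether that maps cleanly onto the real job, in particular for writes that go through to the under file system, would need to be checked against the actual code.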

Master log for the rename operation
2023-12-21 20:21:53,753 WARN master-rpc-executor-TPE-thread-13713 - Exit (Error): Rename: request=path: "xxx.0x0000018C8C4F1E03.tmp"
dstPath: "/xxx/1.txt"
options {
commonOptions {
syncIntervalMs: -1
ttl: -1
ttlAction: FREE
operationId {
mostSignificantBits: 2022746861179587212
leastSignificantBits: -5963195315722751061
}
}
persist: false
}
, Error=java.io.IOException: Failed to rename s3://xxx.0x0000018C8C4F1E03.tmp to s3://xxx/testcp-d/1.txt in the under file system

2023-12-21 20:22:36,419 WARN master-rpc-executor-TPE-thread-14196 - Exit (Error): Rename: request=path: "/txxx/.0x0000018C8C4F1E03.tmp"
dstPath: "/xxx/1.txt"
options {
commonOptions {
syncIntervalMs: -1
ttl: -1
ttlAction: FREE
operationId {
mostSignificantBits: 2022746861179587212
leastSignificantBits: -5963195315722751061
}
}
persist: false
}
, Error=alluxio.exception.FileDoesNotExistException: Path "/xxx/.0x0000018C8C4F1E03.tmp" does not exist.

@Quan-HS Quan-HS added the type-bug This issue is about a bug label Dec 21, 2023