Using the Iceberg catalog in your file system #10326
Comments
@911432 can you please elaborate on what the goal is here? Everything you described is already possible today.
I would like to package the query engine as a container image and keep the Iceberg table and the Iceberg catalog on a file system.
I would also like to be able to do what the code below does.
I think it would make the spark-quickstart page simpler, and it would separate compute from storage more clearly.
Available catalog types are:
|
I know |
Hi, I've done some work on fixing hadoop_catalog before. In my experience, a filesystem-based catalog currently relies on the file system to provide an atomic rename operation. Object stores often do not have atomic operations, so to use filesystem_catalog with object storage you need additional middleware that adds atomicity to file system operations. This kind of middleware often exposes multiple access protocols, such as HDFS/S3/POSIX, and when you access the object store through such a proxy, hadoop_catalog already seems to be sufficient.

Of course, this is just the status quo. I think a lot of work is needed if you want to implement basic catalog management on an object store that does not have atomic operations. We can discuss this further if you are interested, but please keep in mind that this is not recommended in the current version.

@911432 Also, I see that you have submitted some PRs for Apache Paimon, and I'm sure you'd like Paimon to have similar functionality. Unfortunately, Paimon still has consistency issues with filesystem_catalog on S3, for the same reason: the object store does not provide atomic operations. If you are interested, you can try it.
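To illustrate why the atomic operation matters, here is a minimal sketch of a version-file commit on a local POSIX file system. It is not Iceberg's actual implementation; the function name `commit_version` and the `v<N>.metadata.json` layout used here are assumptions made for the example. The point is that publishing a new version must fail if another writer has already published the same version, and a hard link (like a non-overwriting rename) gives that guarantee in a single step.

```python
import json
import os
import tempfile
import uuid


def commit_version(table_dir: str, version: int, metadata: dict) -> bool:
    """Try to publish `metadata` as version `version` (hypothetical helper).

    Returns True if this writer won, False if the version already exists.
    """
    os.makedirs(table_dir, exist_ok=True)
    final_path = os.path.join(table_dir, f"v{version}.metadata.json")
    tmp_path = os.path.join(table_dir, f".{uuid.uuid4().hex}.tmp")

    # Write the full metadata to a private temp file first.
    with open(tmp_path, "w") as f:
        json.dump(metadata, f)

    try:
        # os.link either creates final_path or raises FileExistsError;
        # it never silently overwrites, unlike os.rename on POSIX.
        os.link(tmp_path, final_path)
        return True
    except FileExistsError:
        return False
    finally:
        os.remove(tmp_path)


# Two writers race to publish version 1; only the first succeeds.
demo_dir = tempfile.mkdtemp()
first = commit_version(demo_dir, 1, {"snapshot": "a"})
second = commit_version(demo_dir, 1, {"snapshot": "b"})
print(first, second)
```

On an object store without a comparable create-if-absent primitive, the "check, then write" steps cannot be combined into one operation, so two writers can both believe they committed the same version.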
Feature Request / Improvement
Just as we can store an Iceberg catalog in HDFS today, we would also like to store it in other file systems such as S3.
That would make it quick to set up a container image that includes both the query engine and the storage.
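For reference, a rough sketch of the Spark properties that point an Iceberg Hadoop-type catalog at a warehouse path. The helper `hadoop_catalog_conf`, the catalog name `local`, and the bucket `s3a://example-bucket/warehouse` are placeholders invented for this example; the property keys follow the Iceberg Spark configuration. As discussed in the comments, a warehouse on an object store without atomic rename may not be safe with concurrent writers.

```python
def hadoop_catalog_conf(catalog_name: str, warehouse: str) -> dict:
    """Build Spark properties for a Hadoop-type Iceberg catalog (hypothetical helper)."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


# Placeholder catalog name and warehouse location, for illustration only.
conf = hadoop_catalog_conf("local", "s3a://example-bucket/warehouse")
for key, value in conf.items():
    print(f"{key}={value}")
```

Each pair can be passed to Spark as a `--conf key=value` option or set through `SparkSession.builder.config(key, value)`.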
Query engine
None