Releases · apache/iceberg

16 May 07:50

nastra

apache-iceberg-1.5.2

cbb8530

Latest

The 1.5.2 release contains the same changes as the 1.5.1 release. The 1.5.1 release had issues with the Spark runtime artifacts: certain artifacts were built with the wrong Scala version. Any system running 1.5.1 is strongly recommended to upgrade to 1.5.2.

30 Apr 12:43

Fokko

apache-iceberg-1.5.1

cbb8530

Apache Iceberg 1.5.1

What's Changed

[1.5.x] API: Fix default FileIO#newInputFile ManifestFile, DataFile and DeleteFile implementations by @amogh-jahagirdar in #10114
[1.5.x] Core: Mark 502 and 504 failures as retryable to the exponential retry strategy by @amogh-jahagirdar in #10113
Core: Fix JDBC Catalog table commit when migrating from schema V0 to V1 (#10111) by @jbonofre in #10152
Core: Fix namespace SQL statement using ESCAPE character that works with MySQL/PostgreSQL (#10167) by @jbonofre in #10169
(1.5.x cherry-pick) Spark 3.5: Fix system function pushdown in CoW row-level commands by @amogh-jahagirdar in #10170
(1.5.x Cherry-pick) Spark 3.4: Fix system function pushdown in CoW row-level commands (#10119) by @amogh-jahagirdar in #10171

Full Changelog: apache-iceberg-1.5.0...apache-iceberg-1.5.1

Contributors

jbonofre and amogh-jahagirdar

11 Mar 11:00

Fokko

apache-iceberg-1.5.0

2519ab4

Apache Iceberg 1.5.0

Apache Iceberg 1.5.0 was released on March 11, 2024.
The 1.5.0 release adds a variety of new features and bug fixes.

API
- Extend FileIO and add EncryptingFileIO (#9592)
- Track partition statistics in TableMetadata (#8502)
- Add sqlFor API to views to handle resolving a representation for a dialect (#9247)
Core
- Add view support for REST catalog (#7913)
- Add view support for JDBC catalog (#9487)
- Add catalog type for glue, jdbc, nessie (#9647)
- Support Avro file encryption with AES GCM streams (#9436)
- Add ApplyNameMapping for Avro (#9347)
- Add StandardEncryptionManager (#9277)
- Add REST catalog table session cache (#8920)
- Support view metadata compression (#8552)
- Track partition statistics in TableMetadata (#8502)
- Enable column statistics filtering after planning (#8803)
Spark
- Remove support for Spark 3.2 (#9295)
- Support views via SQL for Spark 3.4 and 3.5 (#9423, #9421, #9343, #9513, #9582)
- Support executor cache locality (#9563)
- Added support for delete manifest rewrites (#9020)
- Support encrypted output files (#9435)
- Add Spark UI metrics from Iceberg scan metrics (#8717)
- Parallelize reading files in add_files procedure (#9274)
- Support file and partition delete granularity (#9384)
Flink
- Remove support for Flink 1.15
- Add support for Flink 1.18 (#9211)
- Emit watermarks from the IcebergSource (#8553)
- Watermark read options (#9346)
Parquet
- Support reading INT96 column in row group filter (#8988)
- Add system config for unsafe Parquet ID fallback (#9324)
Kafka-Connect
- Initial project setup and event data structures (#8701)
- Sink connector with data writers and converters (#9466)
Spec
- Add partition stats spec (#7105)
- Add nanosecond timestamp types (#8683)
- Add multi-arg transform (#8579)
Vendor Integrations
- AWS: Support setting description for Glue table (#9530)
- AWS: Update S3FileIO test to run when CLIENT_FACTORY is not set (#9541)
- AWS: Add S3 Access Grants Integration (#9385)
- AWS: Glue catalog strip trailing slash on DB URI (#8870)
- Azure: Add FileIO that supports ADLSv2 storage (#8303)
- Azure: Make ADLSFileIO implement DelegateFileIO (#8563)
- Nessie: Support views for NessieCatalog (#8909)
- Nessie: Strip trailing slash for warehouse location (#9415)
- Nessie: Infer default API version from URI (#9459)
Dependencies
- Bump Nessie to 0.77.1
- Bump ORC to 1.9.2
- Bump Arrow to 15.0.0
- Bump AWS Java SDK to 2.24.5
- Bump Azure Java SDK to 1.2.20
- Bump Google cloud libraries to 26.28.0

Note:

To enable view support for JDBC catalog, configure jdbc.schema-version to V1 in catalog properties.
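
As a minimal sketch of the note above, the snippet below builds the Spark configuration entries for a JDBC catalog with `jdbc.schema-version` set to `V1`. It assumes the catalog is registered through Spark's `spark.sql.catalog.<name>.*` properties and uses the `jdbc` catalog type added in this release (#9647). The catalog name, JDBC URI, and warehouse path are hypothetical placeholders, and `jdbc_catalog_conf` is an illustrative helper, not part of Iceberg.

```python
from typing import Dict


def jdbc_catalog_conf(catalog_name: str, jdbc_uri: str, warehouse: str) -> Dict[str, str]:
    """Build Spark config entries for an Iceberg JDBC catalog (illustrative helper)."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "jdbc",
        f"{prefix}.uri": jdbc_uri,
        f"{prefix}.warehouse": warehouse,
        # The property named in the note: V1 enables view support.
        f"{prefix}.jdbc.schema-version": "V1",
    }


# Hypothetical values, for illustration only.
conf = jdbc_catalog_conf(
    "demo",
    "jdbc:postgresql://localhost:5432/iceberg",
    "s3://example-bucket/warehouse",
)

# Print the entries in the form accepted by spark-submit / spark-sql.
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The same key/value pairs can be passed to a `SparkSession` builder instead of the command line; only the `jdbc.schema-version` entry is specific to view support.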

New Contributors

@reswqa made their first contribution in #7745
@maxdebayser made their first contribution in #7796
@mderoy made their first contribution in #7801
@cxzl25 made their first contribution in #7825
@tilman151 made their first contribution in #7781
@TaoZex made their first contribution in #7761
@Rondiz made their first contribution in #7829
@grobgl made their first contribution in #7645
@guiyanakuang made their first contribution in #7839
@littlecatjianjiao made their first contribution in #7908
@DaVincii made their first contribution in #7874
@mumuhhh made their first contribution in #7866
@Ewan-Keith made their first contribution in #7917
@nikam14 made their first contribution in #7093
@hsiang-c made their first contribution in #7920
@ktk1012 made their first contribution in #8026
@joan38 made their first contribution in #8002
@coded9 made their first contribution in #8058
@rustyconover made their first contribution in #8074
@mr-brobot made their first contribution in #8061
@Neuw84 made their first contribution in #7988
@lintingbin made their first contribution in #8111
@mrcnc made their first contribution in #8193
@s-akhtar-baig made their first contribution in #8205
@MaxNevermind made their first contribution in #7694
@bmaisonn made their first contribution in #8209
@HonahX made their first contribution in #8215
@onerishabh made their first contribution in #8214
@kengtin made their first contribution in #7161
@aless10 made their first contribution in #8286
@advancedxy made their first contribution in #8320
@dacort made their first contribution in #8341
@gegef2009 made their first contribution in #8154
@TjuAachen made their first contribution in #8401
@baiyangtx made their first contribution in #8416
@hiteshbedre made their first contribution in #8491
@harshm-dev made their first contribution in #8385
@wForget made their first contribution in #8445
@andreacfm made their first contribution in #8528
@Paddy0523 made their first contribution in #8547
@rushilshah1 made their first contribution in #8589
@lanemoseley made their first contribution in #8618
@tlm365 made their first contribution in #8447
@jbonofre made their first contribution in #8612
@jayceslesar made their first contribution in #8558
@MehulBatra made their first contribution in #8408
@clettieri made their first contribution in #8192
@nk1506 made their first contribution in #8640
@johanhenriksson made their first contribution in #8751
@ashutosh-roy made their first contribution in #8707
@Priyansh121096 made their first contribution in #8748
@PickBas made their first contribution in #8819
@jongwooo made their first contribution in #8666
@rice668 made their first contribution in #8873
@geruh made their first contribution in #8914
@bknbkn made their first contribution in #8868
@wangtaohz made their ...

Contributors

dacort, andreacfm, and 90 other contributors

27 Dec 17:15

Fokko

apache-iceberg-1.4.3

9a5d24f

Apache Iceberg 1.4.3

What's Changed

Core: Scan only live entries in partitions table (#8969) by @Fokko in #9197
[1.4.x] Core: Fix missing files from transaction retries with conflicting manifest merges (#9230) by @nastra in #9337
[1.4.x] JDBC Catalog: Fix namespaceExists check with special characters (#8340) by @ismailsimsek in #9291
[1.4.x] Core: Expired Snapshot files in a transaction should be deleted by @bartash in #9223
[1.4.x] Core: Fix missing delete files from transaction (#9354) by @nastra in #9356

Full Changelog: apache-iceberg-1.4.2...apache-iceberg-1.4.3

Contributors

nastra, Fokko, and 2 other contributors

07 Nov 16:44

nastra

apache-iceberg-1.4.2

f6bb917

Apache Iceberg 1.4.2

What's Changed

Core: Ignore split offsets array when split offset is past file length by @amogh-jahagirdar in #8938

Full Changelog: apache-iceberg-1.4.1...apache-iceberg-1.4.2

Contributors

amogh-jahagirdar

23 Oct 11:03

nastra

apache-iceberg-1.4.1

445664f

Apache Iceberg 1.4.1

What's Changed

Core: Do not use a lazy split offset list in manifests (#8834) by @nastra in #8845
Core: Ignore split offsets when the last split offset is past the file length by @amogh-jahagirdar in #8861
AWS: avoid static global credentials provider which doesn't play well with lifecycle management (#8677) by @nastra in #8843
Flink: Reverting the default custom partitioner for bucket column (#8848) by @nastra in #8858

Full Changelog: apache-iceberg-1.4.0...apache-iceberg-1.4.1

Contributors

nastra and amogh-jahagirdar

08 Oct 00:46

aokolnychyi

apache-iceberg-1.4.0

10367c3

Apache Iceberg 1.4.0

API
- Implement bound expression sanitization (#8149)
- Remove overflow checks in DefaultCounter causing performance issues (#8297)
- Support incremental scanning with branch (#5984)
- Add a validation API to DeleteFiles which validates files exist (#8525)
Core
- Use V2 format by default in new tables (#8381)
- Use zstd compression for Parquet by default in new tables (#8593)
- Add strict metadata cleanup mode and enable it by default (#8397) (#8599)
- Avoid generating huge manifests during commits (#6335)
- Add a writer for unordered position deletes (#7692)
- Optimize DeleteFileIndex (#8157)
- Optimize lookup in DeleteFileIndex without useful bounds (#8278)
- Optimize split offsets handling (#8336)
- Optimize computing user-facing state in data tasks (#8346)
- Don't persist useless file and position bounds for deletes (#8360)
- Don't persist counts for paths and positions in position delete files (#8590)
- Support setting system-level properties via environmental variables (#5659)
- Add JSON parser for ContentFile and FileScanTask (#6934)
- Add REST spec and request for commits to multiple tables (#7741)
- Add REST API for committing changes against multiple tables (#7569)
- Default to exponential retry strategy in REST client (#8366)
- Support registering tables with REST session catalog (#6512)
- Add last updated timestamp and snapshot ID to partitions metadata table (#7581)
- Add total data size to partitions metadata table (#7920)
- Extend ResolvingFileIO to support bulk operations (#7976)
- Key metadata in Avro format (#6450)
- Add AES GCM encryption stream (#3231)
- Fix a connection leak in streaming delete filters (#8132)
- Fix lazy snapshot loading history (#8470)
- Fix unicode handling in HTTPClient (#8046)
- Fix paths for unpartitioned specs in writers (#7685)
- Fix OOM caused by Avro decoder caching (#7791)
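
Two of the Core changes above alter defaults for newly created tables: V2 format (#8381) and zstd compression for Parquet (#8593). The sketch below shows one way to pin the earlier behavior explicitly at table creation, assuming the standard table properties `format-version` and `write.parquet.compression-codec` and assuming the earlier defaults were format version 1 and gzip. The table name, columns, and the `create_table_sql` helper are hypothetical.

```python
from typing import Dict, List, Tuple


def create_table_sql(table: str, columns: List[Tuple[str, str]], properties: Dict[str, str]) -> str:
    """Render a Spark SQL CREATE TABLE statement with explicit table properties."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    props = ", ".join(f"'{key}'='{value}'" for key, value in properties.items())
    return f"CREATE TABLE {table} ({cols}) USING iceberg TBLPROPERTIES ({props})"


# Assumed pre-1.4.0 defaults, stated explicitly so new tables do not pick up
# the new defaults (V2 format, zstd) implicitly.
legacy_defaults = {
    "format-version": "1",
    "write.parquet.compression-codec": "gzip",
}

sql = create_table_sql(
    "demo.db.events",  # hypothetical table name
    [("id", "bigint"), ("ts", "timestamp")],
    legacy_defaults,
)
print(sql)
```

Setting the properties explicitly, rather than relying on defaults, keeps table creation behavior stable across library upgrades.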
Spark
- Added support for Spark 3.5
  - Code for DELETE, UPDATE, and MERGE commands has moved to Spark, and all related extensions have been dropped from Iceberg.
  - Support for WHEN NOT MATCHED BY SOURCE clause in MERGE.
  - Column pruning in merge-on-read operations.
  - Ability to request a bigger advisory partition size for the final write to produce well-sized output files without harming the job parallelism.
- Dropped support for Spark 3.1
- Deprecated support for Spark 3.2
- Support vectorized reads for merge-on-read operations in Spark 3.4 and 3.5 (#8466)
- Increase default advisory partition size for writes in Spark 3.5 (#8660)
- Support distributed planning in Spark 3.4 and 3.5 (#8123)
- Support pushing down system functions by V2 filters in Spark 3.4 and 3.5 (#7886)
- Support fanout position delta writers in Spark 3.4 and 3.5 (#7703)
- Use fanout writers for unsorted tables by default in Spark 3.5 (#8621)
- Support multiple shuffle partitions per file in compaction in Spark 3.4 and 3.5 (#7897)
- Output net changes across snapshots for carryover rows in CDC (#7326)
- Display read metrics on Spark SQL UI (#7447) (#8445)
- Adjust split size to benefit from cluster parallelism in Spark 3.4 and 3.5 (#7714)
- Add fast_forward procedure (#8081)
- Support filters when rewriting position deletes (#7582)
- Support setting current snapshot with ref (#8163)
- Make backup table name configurable during migration (#8227)
- Add write and SQL options to override compression config (#8313)
- Correct partition transform functions to match the spec (#8192)
- Enable extra commit properties with metadata delete (#7649)
Flink
- Add possibility of ordering the splits based on the file sequence number (#7661)
- Fix serialization in TableSink with anonymous object (#7866)
- Switch to FileScanTaskParser for JSON serialization of IcebergSourceSplit (#7978)
- Custom partitioner for bucket partitions (#7161)
- Implement data statistics coordinator to aggregate data statistics from operator subtasks (#7360)
- Support alter table column (#7628)
Parquet
- Add encryption config to read and write builders (#2639)
- Skip writing bloom filters for deletes (#7617)
- Cache codecs by name and level (#8182)
- Fix decimal data reading from ParquetAvroValueReaders (#8246)
- Handle filters with transforms by assuming data must be scanned (#8243)
ORC
- Handle filters with transforms by assuming the filter matches (#8244)
Vendor Integrations
- GCP: Fix single byte read in GCSInputStream (#8071)
- GCP: Add properties for OAuth2 and update library (#8073)
- GCP: Add prefix and bulk operations to GCSFileIO (#8168)
- GCP: Add bundle jar for GCP-related dependencies (#8231)
- GCP: Add range reads to GCSInputStream (#8301)
- AWS: Add bundle jar for AWS-related dependencies (#8261)
- AWS: Support configuring storage class for S3FileIO (#8154)
- AWS: Add FileIO tracker/closer to Glue catalog (#8315)
- AWS: Update S3 signer spec to allow an optional string body in S3SignRequest (#8361)
- Azure: Add FileIO that supports ADLSv2 storage (#8303)
- Azure: Make ADLSFileIO implement DelegateFileIO (#8563)
- Nessie: Provide better commit message on table registration (#8385)
Dependencies
- Bump Nessie to 0.71.0
- Bump ORC to 1.9.1
- Bump Arrow to 12.0.1
- Bump AWS Java SDK to 2.20.131

26 Jul 18:05

szehon-ho

apache-iceberg-1.3.1

62c3471

Apache Iceberg 1.3.1

What's Changed

Hive: Set commit state as Unknown before throwing CommitStateUnknownException by @nastra in #8029
Spark 3.4: WAP branch not propagated when using DELETE without WHERE by @nastra in #8028
Core: Include all reachable snapshots with v1 format and REF snapshot mode by @nastra in #8027
Spark 3.3: Backport 'WAP branch not propagated when using DELETE without WHERE' by @nastra in #8036
Flink: Remove the creation of default database in FlinkCatalog by @Fokko in #8039
Core: Handle optional fields by @Fokko in #8064
Core: Abort file groups should be under same lock as committerService by @ConeyLiu in #8060
Spark 3.3: Fix rewrite_position_deletes for certain partition types by @szehon-ho in #8069
Spark 3.4: Fix rewrite_position_deletes for certain partition types by @szehon-ho in #8059

Full Changelog: apache-iceberg-1.3.0...apache-iceberg-1.3.1

Contributors

nastra, Fokko, and 2 other contributors

01 Jun 07:48

Fokko

apache-iceberg-1.3.0

7dbdfd3

Apache Iceberg 1.3.0

What's Changed

Nessie: Remove compile-time Hadoop dependency by @nastra in #7054
Core: Fix deprecation message by @nastra in #7104
Build: Update ORC to 1.8.3 by @williamhyun in #7124
AWS: Use Apache HTTP client as default AWS HTTP client by @singhpk234 in #7119
AWS: Enable virtual-host-style requests for MinioContainer by @nastra in #7125
Flink: Bump to Flink 1.15.3 by @Fokko in #7059
Flink: Bump to Flink 1.16.1 by @Fokko in #7057
Core: Use unknown report type for forward-compatibility by @nastra in #7145
Aliyun: Remove AssertHelpers by @liuxiaocs7 in #7116
dell: remove usage of AssertHelpers by @liuxiaocs7 in #7143
Core: Minor refactoring of PartitionsTable by @ajantha-bhat in #6975
Build: Let RevAPI compare against 1.2.0 by @nastra in #7155
MR: Remove deprecate AssertHelpers by @liuxiaocs7 in #7159
Core: Remove deprecated validation APIs in MergingSnapshotProducer by @amogh-jahagirdar in #7150
data: Remove AssertHelpers Usage by @liuxiaocs7 in #7134
Flink: Fix Flink streaming query problem [Cannot get a client from a closed pool] by @xuzhiwen1255 in #6614
Spark 3.3: Remove use of deprecated SparkFilesScan by @szehon-ho in #7106
Docs: Add rest to the catalog configuration by @Fokko in #7126
Contributing Docs: Add section for testing code by @nastra in #7131
Core, API: View Version implementation by @amogh-jahagirdar in #6861
Update defaults of max-concurrent-file-group-rewrites to 5 by @karuppayya in #6907
Flink: fixed Cloneable not implemented on CatalogLoader by @xuzhiwen1255 in #7168
Core: Refactor actions results by @ajantha-bhat in #7089
Docs: update doc to read easier by @joonsun-baek in #7167
API: Fix retainAll and removeAll in CharSequenceSet by @zhongyujiang in #7133
Spark 3.3: Support metadata column in the changelog table by @flyrain in #7152
Spark 3.2: Support metadata column in the changelog table by @flyrain in #7178
Flink: Backport #6614 to Flink 1.15 by @xuzhiwen1255 in #7165
Core: Remove deprecated code from 1.2.0 by @nastra in #7156
S3 Credentials provider support in DefaultAwsClientFactory #7063 by @dpaani in #7066
Core: Move InMemoryCatalog from test to core by @nastra in #7185
Doc: Retypeset the Flink document by @hililiwei in #7099
Core: Store split offset for delete files by @singhpk234 in #7011
Flink: Backport #6614 to Flink 1.14 by @xuzhiwen1255 in #7166
Core, Hive: Support pluggable ClientPool by @lirui-apache in #6698
AWS: Remove deprecated AssertHelpers by @liuxiaocs7 in #7195
Spark: Support loading function as FunctionCatalog in SparkSessionCatalog by @bowenliang123 in #7153
Flink: Implement data statistics operator to collect traffic distribution for guiding smart shuffling by @yegangy0718 in #6382
Build: Move RevApi breakage to correct version by @nastra in #7223
Ability to add multiple metrics reporters to scan by @karuppayya in #6919
Spark 3.3: Use ProcedureInput in AncestorsOfProcedure by @aokolnychyi in #7177
Core: Parse snapshot-id as long in remove-statistics update by @nastra in #7235
Bump Nessie to 0.54.0 by @snazy in #7146
Optimized spark vectorized read parquet decimal by @ConeyLiu in #3249
Core: Optimize S3 layout of Datafiles by expanding first character set of the hash by @singhpk234 in #7128
AWS: Prevent token refresh scheduling on every sign request by @nastra in #7270
Disable local credentials if remote signing is enabled by @danielcweeks in #7230
Spark: Revert "Spark: Add "Iceberg" prefix to SparkTable name string for SparkUI (#5629)" by @amogh-jahagirdar in #7273
Spark: broadcast table instead of file IO in rewrite manifests by @bryanck in #7263
AWS: abort S3 input stream on close if not EOS by @bryanck in #7262
Spark 3.2: Use ProcedureInput in AncestorsOfProcedure and AddFilesProcedure by @aokolnychyi in #7260
Spark 3.3: Dataset writes for position deletes by @szehon-ho in #7029
REST: fix previous locations for refs-only load by @bryanck in #7284
Core: Fix flakiness in HadoopFileIOTest by @nastra in #7253
Flink: Data statistics operator sends local data statistics to coordinator and receive aggregated data statistics from coordinator for smart shuffling by @yegangy0718 in #7269
AWS: Make AuthSession cache static by @nastra in #7289
Core: Require namespace when creating table using InMemoryCatalog by @nastra in #7252
Refactor PartitionsTable planning by @dramaticlly in #7190
Flink: Introduce Flink 1.17 by @hililiwei in #7254
AWS: Check commit status after failed commit if AWS client performed retries by @ChristinaTech in #7198
Core: Fix errorprone warning by @ajantha-bhat in #7286
Bump Nessie to 0.56.0 by @snazy in #7283
Build: Bump actions/stale from 7.0.0 to 8.0.0 by @dependabot in #7200
Build: Bump org.apache.hadoop:hadoop-client from 3.3.4 to 3.3.5 by @dependabot in #7201
Spark: apply rewrite manifest action fix to 3.1,3.2 by @bryanck in #7296
Build: Spark version of iceberg-delta-lake to 3.3.2 by @doki23 in #7199
Nessie: Use latest hash for catalog APIs by @ajantha-bhat in #6789
Support vectorized reading int96 timestamps in imported data by @yabola in #6962
Flink: Expose write-parallelism in SQL Hints by @hililiwei in #7039
Nessie: Fix testcase failures by @ajantha-bhat in #7320
Flink: move the classes from flink.sink.shuffle.statistics pkg to one level up as flink.sink.shuffle pkg by @stevenzwu in #7322
Spark 3.3: Add doc for the changelog view procedure. by @flyrain in #7147
Bump Nessie from 0.56.0 to 0.57.0 by @snazy in #7323
Flink 1.15 1.17: Port Expose write-parallelism in SQL Hints to 1.15 & 1.17 by @hililiwei in #7327
Update issue template for 1.2.1 release by @danielcweeks in #7331
Core: Fix SnapshotProducer#targetBranch's exception message by @zhongyujiang in #7315
Bump Gradle from 8.0.2 to 8.1 by @snazy in #7333
Build: Fix flaky checkstyle issue by @ajantha-bhat in #7321
[Infra] Update vote mail sample in source-release.sh by @gaborkaszab in #7330
Core: Add missing metrics reporters when creating BaseTable by @nastra in #7341
Core, Spark 3.3: Add FileRewriter API by @aokolnychyi in #7175
Spark - Accept an output-spec-id that allows writing to a desired partition spec by @gustavoatt in #7120
[ORC][Spark] - Support selected vector with ORC reader on the row and batch reader by @pavibhai in #7197
Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode by @chenjunjiedada in #7338
Throw NoSuchIcebergTableException instead of ValidationException in G… by @ericlgoodman in #7277
Build: Bump Airlift from 0.21 to 0.24 by @Fokko in #7347
Docs: clarify Hive on Tez con...