Apache Iceberg
Table format from Netflix.
Currently is dependent on Hive metastore but only stores where the manifest files are stored. In the future it could be replaced by another system.
Advantages
- clear table format
- Decouple of dependency on catalog
- allow deletes
- multiple partitions
Disadvantages
- Longer writes (due to bigger metadata)
Optimizations on S3
Check AWS article
Also check S3. If we set tags on write we can tell S3 which objects are to be hard or soft deleted.