Skip to main content

Glue

Aggregates multiple systems being some of them:

ETL: spark jobs
Data Catalog: hive metastore implementation
Data Crawler: crawls for metadata from multiple systems and stores it on data catalog
Schema regsitry: equivalent to confluent schema registry

Glue crawler

Very useful to search for new tables and partitions for existing ones.
Need to be careful to enable the crawler to update the entire partitions of a table if it detects a new incompatible type (i.e. if it was an int and then we detect that it's actually a float on a later search)

Glue crawler