Glue

Glue bundles several components, including:

  • ETL: Spark jobs
  • Data Catalog: a Hive metastore-compatible implementation
  • Data Crawler: crawls metadata from multiple systems and stores it in the Data Catalog
  • Schema Registry: equivalent to the Confluent Schema Registry

Glue crawler

  • Very useful for discovering new tables, and new partitions of existing tables.
  • Be careful to configure the crawler to update all partitions of a table when it detects a new, incompatible type (e.g. a column first crawled as an int that a later crawl finds is actually a float). Otherwise the older partitions keep the old column type and no longer match the table schema.
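
The partition-update setting above can be sketched with boto3. This is a minimal sketch, not a definitive setup: the helper names (build_crawler_kwargs, create_crawler) and the example crawler name, role ARN, database and S3 path are all hypothetical. It assumes an S3 target and uses the crawler's "InheritFromTable" partition behavior, which makes partitions take their schema from the table when the table definition changes.

```python
import json


def build_crawler_kwargs(name, role_arn, database, s3_path):
    """Build keyword arguments for the boto3 Glue `create_crawler` call.

    All argument values are supplied by the caller; nothing here talks to AWS.
    """
    configuration = {
        "Version": 1.0,
        "CrawlerOutput": {
            # Partitions inherit their schema from the table, so a type change
            # (e.g. int -> float) is applied to existing partitions as well.
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"},
        },
    }
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "SchemaChangePolicy": {
            # Let the crawler update the table definition in the Data Catalog.
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            # Only log objects that disappeared instead of deleting tables.
            "DeleteBehavior": "LOG",
        },
        # The Glue API expects the configuration as a JSON string.
        "Configuration": json.dumps(configuration),
    }


def create_crawler(name, role_arn, database, s3_path):
    """Create the crawler (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue")
    return glue.create_crawler(
        **build_crawler_kwargs(name, role_arn, database, s3_path)
    )
```

Calling create_crawler("events-crawler", "arn:aws:iam::123456789012:role/glue-demo", "demo_db", "s3://demo-bucket/events/") would create the crawler with these settings; the values shown are placeholders. Keeping the argument building separate from the AWS call makes it easy to inspect the configuration before anything is created.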