Version: Next

Metadata Cleanup

Incubating

This is a maintenance source that cleans up old or unused aspects.

Currently it supports only the following:

- DataFlow
- DataJob
- DataProcessInstance

CLI-based Ingestion

Install the Plugin

The metadata-cleanup source works out of the box with acryl-datahub.

Config Details

Note that a period (.) is used to denote nested fields in the YAML recipe.
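To illustrate the dotted notation, here is a minimal Python sketch that takes a recipe in the nested form a YAML loader would produce and flattens it into dotted paths such as source.config.batch_size. The source/type/config layout is the usual recipe shape. The "metadata-cleanup" type string is an assumption based on the source name, and flatten is a hypothetical helper, not part of acryl-datahub.

```python
from typing import Any, Dict

# Hypothetical recipe, written as the nested dict a YAML loader would return.
# The "metadata-cleanup" type string is assumed from the source name; the
# config values are the documented defaults.
RECIPE: Dict[str, Any] = {
    "source": {
        "type": "metadata-cleanup",
        "config": {
            "batch_size": 500,
            "max_workers": 10,
            "keep_last_n": 5,
            "retention_days": 10,
            "delete_empty_data_flows": True,
            "delete_empty_data_jobs": True,
            "hard_delete_entities": False,
            "aspects_to_clean": ["DataprocessInstance"],
        },
    }
}


def flatten(node: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested mappings into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}.

    Lists are treated as leaf values and are not expanded.
    """
    flat: Dict[str, Any] = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


for path, value in flatten(RECIPE).items():
    print(f"{path} = {value!r}")
```

Each printed path, for example source.config.keep_last_n, is the dotted name of one nested field in the recipe.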

Fields (type, default): description

- batch_size (integer, default: 500): The number of entities to fetch from GraphQL in each batch.
- delete_empty_data_flows (boolean, default: True): Whether to delete Data Flows that have no runs.
- delete_empty_data_jobs (boolean, default: True): Whether to delete Data Jobs that have no runs.
- hard_delete_entities (boolean, default: False): Whether to hard delete entities rather than soft delete them.
- keep_last_n (integer, default: 5): The number of latest aspects to keep.
- max_workers (integer, default: 10): The number of workers to use for deletion.
- retention_days (integer, default: 10): The number of days to retain metadata in DataHub.
- aspects_to_clean (array of strings, default: ['DataprocessInstance']): The list of aspect names to clean up.
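The descriptions above do not spell out how keep_last_n and retention_days interact. The following Python sketch shows one plausible reading, stated purely as an assumption: a run is eligible for cleanup only if it falls outside the newest keep_last_n runs and is also older than retention_days. Run and runs_to_delete are hypothetical names, not DataHub APIs, and the real source may apply different rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Run:
    # Hypothetical stand-in for a DataProcessInstance: an identifier plus
    # its creation time. Not a DataHub class.
    urn: str
    created: datetime


def runs_to_delete(
    runs: List[Run],
    now: datetime,
    keep_last_n: int = 5,
    retention_days: int = 10,
) -> List[str]:
    """Return identifiers of runs eligible for cleanup.

    Assumed rule (not confirmed by the docs): a run is selected only if it is
    outside the newest `keep_last_n` runs AND older than `retention_days`.
    """
    cutoff = now - timedelta(days=retention_days)
    # Sort newest first so the first keep_last_n entries are always kept.
    newest_first = sorted(runs, key=lambda r: r.created, reverse=True)
    return [r.urn for r in newest_first[keep_last_n:] if r.created < cutoff]


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
# Eight runs, created 0, 3, 6, ..., 21 days before `now`.
runs = [Run(f"run-{i}", now - timedelta(days=3 * i)) for i in range(8)]
print(runs_to_delete(runs, now))  # ['run-5', 'run-6', 'run-7']
```

With the default values, run-4 is 12 days old but is kept because it is among the five newest runs. Only the three oldest runs satisfy both conditions.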

Code Coordinates

  • Class Name: datahub.ingestion.source.metadata_cleanup.MetadataCleanupSource

Questions

If you've got any questions on configuring ingestion for Metadata Cleanup, feel free to ping us on our Slack.