Metadata Cleanup
This is a maintenance source that cleans up old or unused aspects.
It currently supports only:
- DataFlow
- DataJob
- DataProcessInstance
CLI based Ingestion
Install the Plugin
The metadata-cleanup source works out of the box with acryl-datahub.
Config Details
- Options
- Schema
Note that a dot (.) is used to denote nested fields in the YAML recipe.
Field | Type | Description | Default |
---|---|---|---|
batch_size | integer | The number of entities to get in a batch from GraphQL | 500 |
delete_empty_data_flows | boolean | Whether to delete Data Flows without runs | True |
delete_empty_data_jobs | boolean | Whether to delete Data Jobs without runs | True |
hard_delete_entities | boolean | Whether to hard delete entities | False |
keep_last_n | integer | Number of latest aspects to keep | 5 |
max_workers | integer | The number of workers to use for deletion | 10 |
retention_days | integer | Number of days to retain metadata in DataHub | 10 |
aspects_to_clean | array | List of aspect names to clean up | ['DataprocessInstance'] |
aspects_to_clean.string | string | | |
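As an illustration of how these options fit together, here is a minimal recipe sketch. It assumes the source type is metadata-cleanup (taken from the plugin name above) and that metadata is written through a datahub-rest sink pointing at a local DataHub instance; the server URL and the retention_days value are placeholders, not recommendations. The remaining options simply repeat their documented defaults.

```yaml
# Hypothetical recipe: values are illustrative only.
source:
  type: metadata-cleanup        # assumed from the plugin name
  config:
    retention_days: 30          # illustrative override of the default (10)
    keep_last_n: 5              # documented default
    aspects_to_clean:           # documented default, shown to illustrate list syntax
      - DataprocessInstance
    delete_empty_data_jobs: true
    delete_empty_data_flows: true
    hard_delete_entities: false # documented default; set to true only deliberately
    batch_size: 500
    max_workers: 10

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"  # assumed local DataHub endpoint
```

A recipe like this would typically be run with the DataHub CLI, for example `datahub ingest -c <recipe file>`. Because the schema sets additionalProperties to false, any option name not listed in the table should be rejected when the configuration is validated.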
The JSONSchema for this configuration is inlined below.
{
"title": "MetadataCleanupConfig",
"type": "object",
"properties": {
"retention_days": {
"title": "Retention Days",
"description": "Number of days to retain metadata in DataHub",
"default": 10,
"type": "integer"
},
"aspects_to_clean": {
"title": "Aspects To Clean",
"description": "List of aspect names to clean up",
"default": [
"DataprocessInstance"
],
"type": "array",
"items": {
"type": "string"
}
},
"keep_last_n": {
"title": "Keep Last N",
"description": "Number of latest aspects to keep",
"default": 5,
"type": "integer"
},
"delete_empty_data_jobs": {
"title": "Delete Empty Data Jobs",
"description": "Whether to delete Data Jobs without runs",
"default": true,
"type": "boolean"
},
"delete_empty_data_flows": {
"title": "Delete Empty Data Flows",
"description": "Whether to delete Data Flows without runs",
"default": true,
"type": "boolean"
},
"hard_delete_entities": {
"title": "Hard Delete Entities",
"description": "Whether to hard delete entities",
"default": false,
"type": "boolean"
},
"batch_size": {
"title": "Batch Size",
"description": "The number of entities to get in a batch from GraphQL",
"default": 500,
"type": "integer"
},
"max_workers": {
"title": "Max Workers",
"description": "The number of workers to use for deletion",
"default": 10,
"type": "integer"
}
},
"additionalProperties": false
}
Code Coordinates
- Class Name: datahub.ingestion.source.metadata_cleanup.MetadataCleanupSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Metadata Cleanup, feel free to ping us on our Slack.