Modifying Tags on Datasets

Why Would You Use Tags on Datasets?

Tags are informal, loosely controlled labels that help with search and discovery. They can be added to datasets, dataset schemas, or containers as an easy way to label or categorize entities, without having to associate them with a broader business glossary or vocabulary. For more information about tags, refer to About DataHub Tags.

Goal Of This Guide

This guide will show you how to:

  • Create: create a tag named Deprecated.
  • Read: read the tags attached to the dataset SampleHiveDataset.
  • Add: add the Deprecated tag to the dataset fct_users_created and to its user_name column.
  • Remove: remove the Deprecated tag from the user_name column of fct_users_created.

Prerequisites

For this tutorial, you need to deploy DataHub Quickstart and ingest sample data. For detailed information, please refer to the DataHub Quickstart Guide.

Note

Before modifying tags, you need to ensure the target dataset is already present in your DataHub instance. If you attempt to manipulate entities that do not exist, your operation will fail. In this guide, we will be using data from sample ingestion.

For more information on how to set up GraphQL, please refer to How To Set Up GraphQL.
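
Every operation in this guide identifies its target by URN, so it can help to build those strings in one place. The sketch below composes the tag and dataset URNs used throughout this guide. The helper names tag_urn and dataset_urn are hypothetical and are not part of any DataHub library; the URN layout is copied from the examples in this guide.

```python
# Hypothetical helpers (not a DataHub API): they only reproduce the URN
# layout that appears in this guide's examples.

def tag_urn(tag_id):
    """Return the URN of a tag, given the id it was created with."""
    return f"urn:li:tag:{tag_id}"


def dataset_urn(platform, name, env="PROD"):
    """Return the URN of a dataset on a given platform and environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


print(tag_urn("deprecated"))
print(dataset_urn("hive", "fct_users_created"))
```

Note that the tag URN is built from the tag's id (deprecated), not from its display name (Deprecated).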

Create Tags

The following mutation creates a tag named Deprecated.

mutation createTag {
  createTag(
    input: {
      name: "Deprecated",
      id: "deprecated",
      description: "Having this tag means this column or table is deprecated."
    }
  )
}

If you see the following response, the operation was successful:

{
  "data": {
    "createTag": "urn:li:tag:deprecated"
  },
  "extensions": {}
}
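
If you would rather send the mutation from a script than from a GraphQL client, a minimal sketch using only the Python standard library could look like the following. The endpoint URL is an assumption based on a default local quickstart, and the bearer token is only needed if your instance requires authentication; build_graphql_request and run_graphql are illustrative names, not part of DataHub.

```python
import json
import urllib.request

# Assumption: GraphQL endpoint of a local quickstart; adjust for your deployment.
GRAPHQL_ENDPOINT = "http://localhost:8080/api/graphql"

CREATE_TAG_MUTATION = """
mutation createTag {
  createTag(
    input: {
      name: "Deprecated",
      id: "deprecated",
      description: "Having this tag means this column or table is deprecated."
    }
  )
}
"""


def build_graphql_request(query, endpoint=GRAPHQL_ENDPOINT, token=None):
    """Wrap a GraphQL document in an HTTP POST request without sending it."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the instance accepts a personal access token as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def run_graphql(query, endpoint=GRAPHQL_ENDPOINT, token=None):
    """Send the request and return the decoded JSON response."""
    request = build_graphql_request(query, endpoint, token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling run_graphql(CREATE_TAG_MUTATION) against a running instance should return a dictionary shaped like the response above. Keeping request construction separate from sending makes it possible to inspect the payload without a live server.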

Expected Outcome of Creating Tags

You can now see the new tag Deprecated has been created.

We can also verify this operation programmatically by fetching the tagProperties aspect of the Deprecated tag with the DataHub CLI after running this code.

datahub get --urn "urn:li:tag:deprecated" --aspect tagProperties

{
  "tagProperties": {
    "description": "Having this tag means this column or table is deprecated.",
    "name": "Deprecated"
  }
}

Read Tags

The following query reads the tags attached to the dataset SampleHiveDataset.

query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)") {
    tags {
      tags {
        tag {
          name
          urn
          properties {
            description
            colorHex
          }
        }
      }
    }
  }
}

If you see the following response, the operation was successful:

{
  "data": {
    "dataset": {
      "tags": {
        "tags": [
          {
            "tag": {
              "name": "Legacy",
              "urn": "urn:li:tag:Legacy",
              "properties": {
                "description": "Indicates the dataset is no longer supported",
                "colorHex": null
              }
            }
          }
        ]
      }
    }
  },
  "extensions": {}
}
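
The response nests each tag two levels deep (tags, then tags, then tag), which is easy to get wrong when consuming it from code. The following sketch flattens a decoded response into a list of (urn, name) pairs. The function name extract_tags is illustrative, and the handling of missing or null fields is a defensive assumption rather than documented behavior.

```python
def extract_tags(response):
    """Return a list of (urn, name) pairs from a decoded dataset tags response."""
    # Any level may be missing or null (for example, an untagged dataset),
    # so fall back to empty containers instead of raising.
    dataset = (response.get("data") or {}).get("dataset") or {}
    associations = (dataset.get("tags") or {}).get("tags") or []
    return [(item["tag"]["urn"], item["tag"].get("name")) for item in associations]


# The response shown above, as a Python dictionary.
sample = {
    "data": {
        "dataset": {
            "tags": {
                "tags": [
                    {
                        "tag": {
                            "name": "Legacy",
                            "urn": "urn:li:tag:Legacy",
                            "properties": {
                                "description": "Indicates the dataset is no longer supported",
                                "colorHex": None,
                            },
                        }
                    }
                ]
            }
        }
    },
    "extensions": {},
}

print(extract_tags(sample))
```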

Add Tags

The following code shows how to add tags to a dataset. Here, we add the Deprecated tag to a dataset named fct_users_created.

mutation addTags {
  addTags(
    input: {
      tagUrns: ["urn:li:tag:deprecated"],
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
    }
  )
}

Note that you can also add a tag to a column of a dataset by specifying subResourceType and subResource.

mutation addTags {
  addTags(
    input: {
      tagUrns: ["urn:li:tag:deprecated"],
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
      subResourceType: DATASET_FIELD,
      subResource: "user_name"
    }
  )
}

If you see the following response, the operation was successful:

{
  "data": {
    "addTags": true
  },
  "extensions": {}
}
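
The dataset-level and column-level mutations differ only in the two subResource fields. As a rough sketch, the helper below renders either form from the same inputs. The name render_add_tags is hypothetical, and the code assumes that URNs and column names are plain strings that can safely be quoted as JSON string literals.

```python
import json


def render_add_tags(tag_urns, resource_urn, column=None):
    """Render an addTags mutation; pass column to target a dataset field."""
    fields = [
        f"tagUrns: {json.dumps(list(tag_urns))}",
        f"resourceUrn: {json.dumps(resource_urn)}",
    ]
    if column is not None:
        # DATASET_FIELD is an enum value, so it is emitted without quotes.
        fields.append("subResourceType: DATASET_FIELD")
        fields.append(f"subResource: {json.dumps(column)}")
    body = ",\n      ".join(fields)
    return (
        "mutation addTags {\n"
        "  addTags(\n"
        "    input: {\n"
        f"      {body}\n"
        "    }\n"
        "  )\n"
        "}"
    )


dataset = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
print(render_add_tags(["urn:li:tag:deprecated"], dataset, column="user_name"))
```

Omitting the column argument produces the dataset-level mutation, so the tag is attached to the dataset itself rather than to one of its fields.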

Expected Outcome of Adding Tags

You can now see that the Deprecated tag has been added to the user_name column.

We can also verify this operation programmatically by checking the globalTags aspect using the DataHub CLI.

datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)" --aspect globalTags

Remove Tags

The following code removes a tag from a dataset. After running this code, the Deprecated tag will be removed from the user_name column.

mutation removeTag {
  removeTag(
    input: {
      tagUrn: "urn:li:tag:deprecated",
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
      subResourceType: DATASET_FIELD,
      subResource: "user_name"
    }
  )
}

Expected Outcome of Removing Tags

You can now see that the Deprecated tag has been removed from the user_name column.

We can also verify this operation programmatically by checking the globalTags aspect using the DataHub CLI.

datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)" --aspect globalTags

{
  "globalTags": {
    "tags": []
  }
}
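
To automate this check, the CLI output can be parsed as JSON and searched for the tag URN. The sketch below is one way to do that. The function name has_tag is illustrative, and the shape of a non-empty tags list (objects with a "tag" key holding the URN) is an assumption, since this guide only shows the empty case.

```python
import json


def has_tag(cli_output, tag_urn):
    """Return True if the globalTags aspect in the CLI output lists tag_urn."""
    aspect = json.loads(cli_output).get("globalTags") or {}
    # Assumption: each entry looks like {"tag": "urn:li:tag:..."}.
    return any(entry.get("tag") == tag_urn for entry in (aspect.get("tags") or []))


# Output shown above, after the tag has been removed.
after_removal = '{"globalTags": {"tags": []}}'
print(has_tag(after_removal, "urn:li:tag:deprecated"))
```

In a script, the string passed to has_tag would typically be the captured standard output of the datahub get command shown above.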