# Compliance & Data Governance

{% hint style="success" %}

#### Compliance & Data Governance Glossary

This advanced workshop focuses on creating a Compliance & Data Governance Glossary that addresses regulatory requirements, data privacy, and security considerations. You'll learn how to identify sensitive data, apply compliance tags, and build a Glossary that supports regulatory audits and data protection initiatives.

By the end of this workshop, you will:

* Identify and classify sensitive data elements
* Apply regulatory compliance markers (GDPR, PCI-DSS, CCPA)
* Design glossaries that support privacy-by-design
* Implement data classification frameworks
* Create audit-ready documentation
  {% endhint %}

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FnXCgGS08SV0czWvQE8Sd%2Fimage.png?alt=media&#x26;token=be04adda-857a-4941-81f4-946ef93b610b" alt=""><figcaption><p>Compliance &#x26; Data Governance Glossary</p></figcaption></figure>

{% embed url="<https://docs.pentaho.com/pdc-use/pdc-business-glossary>" %}

***

{% hint style="info" %}

#### Accessing Data Catalog

To get started using the Data Catalog, log in using the address and credentials provided by your Data Catalog service user or administrator.
{% endhint %}

To access your catalog, please follow these steps:

1. Open the **Google Chrome** web browser.
2. Navigate to:

{% embed url="<https://pdc.pentaho.lab>" %}

3. Enter the following email and password, then click **Sign In**.

Username: <david.park@adventureworks.com> (mapped to Business Steward role)

Password: Welcome123!

***

{% tabs %}
{% tab title="Regulatory Landscape" %}
{% hint style="info" %}

#### Understanding Regulatory Landscape

Each regulation has different requirements, penalties, and scope. GDPR fines can reach 4% of global annual turnover or €20 million, whichever is higher. PCI-DSS non-compliance can result in losing the ability to process credit cards. We must understand what each regulation demands before we can build a compliant glossary. Think of regulations as business requirements that happen to be legally mandated.

* **GDPR:** Broadest scope, strictest penalties, applies globally if you have EU customers
* **PCI-DSS:** Required for any credit card processing
* **CCPA:** Growing trend, other states following California's lead
* **SOX:** Public company requirement, affects financial data
* **HIPAA excluded:** Adventure Works doesn't handle health data
  {% endhint %}

<table><thead><tr><th width="115">Regulation</th><th width="169">Scope</th><th width="209">Key Requirements</th><th>Data Impact</th></tr></thead><tbody><tr><td>GDPR</td><td>EU citizens' data</td><td>Consent, right to erasure, data portability</td><td>Names, emails, addresses, phone, National Identity, etc</td></tr><tr><td>LGPD</td><td>Brazilian citizens' data</td><td>Similar to Europe's GDPR</td><td>Covers a broad range of PII data</td></tr><tr><td>PCI-DSS</td><td>Payment card data</td><td>Encryption, access control, monitoring</td><td>Credit card numbers</td></tr><tr><td>CCPA</td><td>California residents</td><td>Disclosure, opt-out, deletion rights</td><td>Personal information</td></tr><tr><td>SOX</td><td>Financial reporting</td><td>Accuracy, audit trails, internal controls</td><td>Financial records</td></tr></tbody></table>

1. Review the following tables:

{% hint style="info" %}
We can't protect what we don't know about. This systematic scan identifies all potentially sensitive data. We're looking for anything that could identify a person, reveal financial information, or be used for identity theft. This is like taking an inventory before implementing security measures:

* Person.Person (names, demographics)
* Person.EmailAddress (contact info)
* Person.PersonPhone (phone numbers)
* Person.Address (physical locations)
* Sales.CreditCard (payment data)
* HumanResources.Employee (SSN, birth dates)
  {% endhint %}

2. Ask the following questions to help define a 'Risk Assessment Matrix':

{% hint style="info" %}
**Critical Questions:**

1. What Terms do you have difficulty with?
2. What data could identify an individual?
3. What data requires encryption?
4. What data has retention limits?
5. What data needs access controls?
   {% endhint %}
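
The inventory and the critical questions can be combined into a first-pass Risk Assessment Matrix. The following is a minimal sketch, not Data Catalog output: the `RiskEntry` class, the scoring rule in `risk_level`, and the yes/no answers and regulation assignments per table are all assumptions made for illustration and should be confirmed with your compliance team.

```python
from dataclasses import dataclass, field


@dataclass
class RiskEntry:
    table: str
    contents: str
    identifies_individual: bool   # critical question 2
    requires_encryption: bool     # critical question 3
    regulations: list = field(default_factory=list)


def risk_level(entry: RiskEntry) -> str:
    """Hypothetical rule: no 'yes' is LOW, one is MEDIUM, two are HIGH."""
    score = int(entry.identifies_individual) + int(entry.requires_encryption)
    return ["LOW", "MEDIUM", "HIGH"][score]


# Assumed answers for illustration only; they are not legal advice.
MATRIX = [
    RiskEntry("Person.Person", "names, demographics", True, False, ["GDPR", "CCPA"]),
    RiskEntry("Person.EmailAddress", "contact info", True, False, ["GDPR", "CCPA"]),
    RiskEntry("Person.PersonPhone", "phone numbers", True, False, ["GDPR", "CCPA"]),
    RiskEntry("Person.Address", "physical locations", True, False, ["GDPR", "CCPA"]),
    RiskEntry("Sales.CreditCard", "payment data", True, True, ["PCI-DSS"]),
    RiskEntry("HumanResources.Employee", "SSN, birth dates", True, True, ["GDPR", "CCPA"]),
]

# Print one row per table: name, derived risk level, candidate regulations.
for entry in MATRIX:
    print(f"{entry.table:<26}{risk_level(entry):<8}{', '.join(entry.regulations)}")
```

Questions 4 and 5 (retention limits and access controls) could be added as further boolean fields in the same way.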

{% endtab %}

{% tab title="Create Glossary" %}
{% hint style="info" %}

#### Test Glossary

Establishing a hierarchical structure by categorizing business terms into domains and specific categories simplifies data navigation and management. This organized structure boosts efficient data discovery and strengthens governance through role-based access controls. In the realm of data management, business terms are crucial in a data catalog, guaranteeing seamless identification, access, and utilization of data in line with organizational goals and compliance mandates.

Let's create a Test Glossary.
{% endhint %}

1. Navigate to: Glossary

{% tabs %}
{% tab title="1. Glossary" %}
{% hint style="info" %}

#### Glossary

Creating a test Glossary will help you define the hierarchy: Glossary -> Category -> Term. Exporting the Glossary then reveals how it is structured, especially the Properties, which in turn helps you understand the API call.
{% endhint %}
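
The Glossary -> Category -> Term hierarchy can be pictured as a chain of parent links. The sketch below is only a mental model: the `Node` class, the `"glossary"` and `"category"` type labels, and the `/` path separator are assumptions, not the Data Catalog's internal representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    type: str                        # assumed labels: "glossary", "category", "term"
    parent: Optional["Node"] = None  # None marks the top of the hierarchy

    def path(self, sep: str = "/") -> str:
        """Walk the parent links up to the root and join the names top-down."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return sep.join(reversed(parts))


glossary = Node("Test Glossary", "glossary")
category = Node("Test Category", "category", parent=glossary)
term = Node("Test Term", "term", parent=category)

print(term.path())  # Test Glossary/Test Category/Test Term
```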

1. Under Actions select 'Add New Glossary'.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FXDdA1EkMqd0rlVpFu4To%2Fimage.png?alt=media&#x26;token=71ea828c-3545-40bc-a560-63508388445d" alt=""><figcaption><p>Add New Glossary</p></figcaption></figure>

2. Enter 'Test Glossary' and click 'Create'.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2Fci7u4jYswoZPT8uRtbFT%2Fimage.png?alt=media&#x26;token=452cd863-2d9a-48a1-85db-e2845990075c" alt=""><figcaption><p>Create Test Glossary</p></figcaption></figure>

3. Click the Edit option and enter a Definition & Purpose.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FJYPMqmRDeY0UqvA1Nm4H%2Fimage.png?alt=media&#x26;token=0d02be7d-5ca9-4bb2-993f-9b59c34d9b64" alt=""><figcaption><p>Enter Definition &#x26; Purpose</p></figcaption></figure>

4. Click 'Save Changes'.

***

{% hint style="info" %}

#### Panel

Enter a set of default values so we get an idea of what's in the API call.
{% endhint %}

The following panels enable you to track and audit any changes to the Glossary.

{% tabs %}
{% tab title="Properties" %}
{% hint style="info" %}

#### Properties

The Properties panel defines the required metadata properties that track and audit any changes to the Glossary.
{% endhint %}

<table><thead><tr><th width="132">Property</th><th width="203">Value(s)</th><th>Description</th></tr></thead><tbody><tr><td>Sensitivity</td><td>HIGH</td><td>Classification system that categorizes data assets based on the potential impact if that data were to be compromised, accessed inappropriately, or disclosed without authorization</td></tr><tr><td>Domain</td><td>Technology</td><td>List of Domains</td></tr><tr><td>Custodian</td><td>David Park</td><td>The user responsible for managing the glossary.</td></tr><tr><td>Business Steward</td><td>David Park</td><td>The user responsible for any modifications to the asset.</td></tr><tr><td>Critical Data Element</td><td>False</td><td>This property is usually applied to columns. These columns should be critical pieces of information that are necessary for decision making and so need to be governed with the highest care.</td></tr><tr><td>Status</td><td>Draft</td><td>Accepted, Draft, Review, Deprecated</td></tr><tr><td>Created by</td><td>The logged in user</td><td>The user who created the glossary item.</td></tr><tr><td>Updated by</td><td>The logged in user</td><td>The user who updated the glossary item.</td></tr><tr><td>Last Updated</td><td>Timestamp</td><td>A timestamp indicating when the glossary item was last updated.</td></tr></tbody></table>

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FxOJBNuCPf6RqiTFb75S6%2Fimage.png?alt=media&#x26;token=4819244a-bdca-4cfc-a3d9-92dfc5419df4" alt=""><figcaption><p>Glossary / Domain Properties</p></figcaption></figure>
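
Before entering values in the panel, it can help to check them against the allowed values in the table above. This is a minimal sketch under stated assumptions: the `GlossaryProperties` class and its field names are hypothetical, and `LOW` is an assumed sensitivity level (this workshop only shows `HIGH` and `MEDIUM`). The status values come from the table.

```python
from dataclasses import dataclass

ALLOWED_STATUS = {"Accepted", "Draft", "Review", "Deprecated"}  # from the table
ALLOWED_SENSITIVITY = {"LOW", "MEDIUM", "HIGH"}                 # LOW is assumed


@dataclass
class GlossaryProperties:
    sensitivity: str
    domain: str
    custodian: str
    business_steward: str
    critical_data_element: bool = False
    status: str = "Draft"

    def problems(self) -> list:
        """Return a list of human-readable issues; empty means the values look valid."""
        issues = []
        if self.sensitivity not in ALLOWED_SENSITIVITY:
            issues.append(f"unknown sensitivity: {self.sensitivity!r}")
        if self.status not in ALLOWED_STATUS:
            issues.append(f"unknown status: {self.status!r}")
        if not self.custodian:
            issues.append("custodian is empty")
        if not self.business_steward:
            issues.append("business steward is empty")
        return issues


props = GlossaryProperties("HIGH", "Technology", "David Park", "David Park")
print(props.problems())  # []
```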
{% endtab %}

{% tab title="Tags" %}
{% hint style="info" %}
Besides organizing your Glossary by Domain & Category, the Data Catalog allows you to assign **tags** to your resources. A tag is a label you can use to describe an element and to retrieve it later when browsing or searching.
{% endhint %}

1. You can manually add a Tag: glossary\_tag

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FmZ4ymCbH8jbQotWaHHl2%2Fimage.png?alt=media&#x26;token=0a2f2c1a-cf78-4938-a1e6-71f7ad0bca27" alt=""><figcaption><p>Glossary tag</p></figcaption></figure>

2. Save.
   {% endtab %}

{% tab title="Style" %}
You can select the color & change the icon.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FBbwol6U1gzsyT8T2wJVN%2Fimage.png?alt=media&#x26;token=0739d119-b26d-495a-abcc-c6905fb21202" alt=""><figcaption><p>Style</p></figcaption></figure>
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="2. Category" %}
{% hint style="info" %}

#### Category

{% endhint %}

1. Select the 'Test Glossary' Domain & then 'Add New Category'.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FVDpESngUP4yNMc2gA1U3%2Fimage.png?alt=media&#x26;token=fa1e7427-79f6-4ac5-9d11-20d0de1a6007" alt=""><figcaption><p>Add New Category</p></figcaption></figure>

2. Enter the Category Name: 'Test Category' & select Parent: 'Test Glossary'.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2F0ZZevHrRzdmMPN8YlKu1%2Fimage.png?alt=media&#x26;token=17391ad9-df79-4cdd-bd01-658a1e962c1d" alt=""><figcaption><p>Create Test Category</p></figcaption></figure>

3. Click 'Create'.
4. Enter a Definition & Purpose by clicking on the Edit option.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FHHK8HCdX0eJ8JO6lsxVV%2Fimage.png?alt=media&#x26;token=160ead6f-990c-488a-8017-035030e2ce58" alt=""><figcaption><p>Test Category</p></figcaption></figure>

| Property              | Value           |
| --------------------- | --------------- |
| Sensitivity           | MEDIUM          |
| Domain                | Technology      |
| Custodian             | Elena Rodriguez |
| Business Steward      | Elena Rodriguez |
| Critical Data Element | false           |
| Status                | Draft           |
| Created by            | David Park      |
| Updated by            | David Park      |

5. Add the tag: category\_tag
6. Click 'Save Changes'.
   {% endtab %}

{% tab title="3. Term" %}
{% hint style="info" %}

#### Term

In a data catalog, a **Business Term** refers to metadata that describes the business aspects of a data asset. For example, a business term might indicate whether the data represents customer demographics, financial transactions, or product inventory.
{% endhint %}

<figure><img src="https://content.gitbook.com/content/w1qJj4OGmdcvowiklB9W/blobs/VqCJ5YBHurHTGoyw35qi/image.png" alt=""><figcaption><p>Business Terms</p></figcaption></figure>

1. Select 'Test Category' & then 'Add New Term'.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FsiR8PeBZbaJxrbOVqwYr%2Fimage.png?alt=media&#x26;token=9fd42f5a-19d5-469a-af8e-b68eea8baaab" alt=""><figcaption><p>Add New Term</p></figcaption></figure>

2. Enter the Term Name: 'Test Term' & select Parent: 'Test Category'.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2Ff4yg0THxVXvLv6JCRYn8%2Fimage.png?alt=media&#x26;token=486dc618-701e-4114-9cc2-8a6629aa193d" alt=""><figcaption></figcaption></figure>

3. Click 'Create'.
4. Enter a Definition & Purpose by clicking on the Edit option.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FLHpIHqHOFCID0IIOnn0o%2Fimage.png?alt=media&#x26;token=b09b0d12-121c-436f-9a7f-60a691ef1092" alt=""><figcaption><p>Test Term</p></figcaption></figure>

***

{% hint style="info" %}

#### Custom Properties

The metadata properties of a Glossary, Category, or Term can be 'enriched' with Custom Properties.
{% endhint %}

{% tabs %}
{% tab title="Custom Properties" %}
{% hint style="info" %}

#### Custom Properties

The metadata properties of a Glossary, Category, or Term can be 'enriched' with Custom Properties, which can be either text or numerical.

For example, a Term could have five levels, numbered 1 to 5.
{% endhint %}

1. Click the 'Custom' tab.
2. Click the “+ Add Custom Property” button.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FkxobEUXUKTFVE3LIkioB%2Fimage.png?alt=media&#x26;token=decd0609-fa4d-4477-b485-4a2827d45680" alt=""><figcaption><p>Add Custom Property</p></figcaption></figure>

3. Enter the Label and default value, then select either Free text or Select Value as the type that will be associated with the Term.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2F7hN65hiEdhlUDaOsuMLI%2Fimage.png?alt=media&#x26;token=04de7f9b-261f-4918-bef6-16307e2ac862" alt=""><figcaption><p>Add Custom Property</p></figcaption></figure>

4. Click 'Save'.
   {% endtab %}

{% tab title="Related" %}
{% hint style="info" %}

#### Related

Lists the associated business terms of the chosen glossary term. It helps you understand the broader business context and semantic associations among terms, improving data discoverability and clarity.

Relationship types such as **Synonym**, **Deprecated**, **In favor of**, **Has A**, **Is a type of A**, and **Related to** allow business stewards to define how terms are conceptually connected.
{% endhint %}

1. Click the 'Related' tab.
2. Click the “+ Add Term” button.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FuGdch4NskjRx0VVOMlev%2Fimage.png?alt=media&#x26;token=83b9ecc5-4836-4dbd-9cfb-d7b56db3ba07" alt=""><figcaption><p>Set Term Relationship</p></figcaption></figure>

{% endtab %}

{% tab title="Data Elements" %}
{% hint style="info" %}

#### Data Elements

Lists the associated data elements across various data sources for the selected glossary term. This association provides a direct connection between business-level terminology and the underlying technical metadata, helping you understand where and how a business term is implemented within enterprise data systems.
{% endhint %}


{% endtab %}

{% tab title="More ..." %}

{% endtab %}
{% endtabs %}
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="Upload Glossary" %}
{% hint style="info" %}

#### Glossary

Creating a Glossary is pretty straightforward in Data Catalog, which is useful if you're testing out Hierarchies, Categories and Terms to ensure they follow convention.

However, there will be a time when you'll need to do a bulk load to get up and running and that's where the fun begins. In this section we're going to take a deeper look inside the JSON Object.

There are a couple of key features to bear in mind when you're creating a Glossary:

* Data Catalog uses [JSONL](https://jsonlines.org/). JSON Lines is a convenient format for storing structured data that may be processed one record at a time. The values are not stored in an array; instead, there is one JSON object per line.
* Each record needs specific fields, in the correct order, for a successful import:

**Required Fields**

`_id`: Unique identifier (e.g., "customer-name-001")

`type`: Always "term" for term definitions

`name`: Human-readable term name

`fqdn`: Full hierarchical path

`attributes`: Container for all term-specific data

**Relationship Fields**

`rootId`: ID of the root governance entity

`parentId`: ID of the parent category/container

`resourceId`: External resource reference (often empty)

**Audit Fields**

`createdAt/updatedAt`: ISO 8601 timestamps

`createdBy/updatedBy`: UUID of the user

* The 'attributes object' is a complex nested JSON Object that populates the Properties panel.

**Attributes Structure**

`features.sensitivity`: Data classification level

`formula`: Validation rules/patterns (Lexical format)

`info.custodian`: Data custodian UUID

`info.businessSteward`: Business owner UUID

`info.purpose`: Business purpose (Lexical format)

`info.definition`: Term definition (Lexical format)

`info.abbreviation`: Short code for the term

`info.status`: Lifecycle state

* You cannot apply incremental updates. The timestamp means that the whole Glossary has to be uploaded.

{% endhint %}
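
To make the record layout concrete, here is a minimal sketch that builds one term record and serializes it as JSON Lines. It is not the official import schema: the `build_term`, `to_jsonl`, and `from_jsonl` helpers, the key order (taken from the order the fields are listed above), the `/` separator in `fqdn`, and the sample IDs are all assumptions. The `info.definition`, `info.purpose`, and `formula` fields are left out because they use the Lexical format, which is best copied from an exported Glossary.

```python
import json
import uuid
from datetime import datetime, timezone


def build_term(term_id, name, fqdn, root_id, parent_id, user_id,
               sensitivity="HIGH", status="Draft"):
    """Assemble one term record; key order follows the field lists above (assumed)."""
    now = datetime.now(timezone.utc).isoformat()  # ISO 8601 timestamp
    return {
        "_id": term_id,
        "type": "term",
        "name": name,
        "fqdn": fqdn,
        "attributes": {
            "features": {"sensitivity": sensitivity},
            "info": {
                "custodian": user_id,
                "businessSteward": user_id,
                "abbreviation": "",
                "status": status,
            },
        },
        "rootId": root_id,
        "parentId": parent_id,
        "resourceId": "",  # often empty
        "createdAt": now,
        "updatedAt": now,
        "createdBy": user_id,
        "updatedBy": user_id,
    }


def to_jsonl(records) -> str:
    """One JSON object per line, with no enclosing array."""
    return "\n".join(json.dumps(r) for r in records) + "\n"


def from_jsonl(text: str) -> list:
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


user = str(uuid.uuid4())  # hypothetical user UUID
records = [
    build_term("customer-name-001", "Customer Name",
               "Test Glossary/Test Category/Customer Name",
               "glossary-001", "category-001", user),
]
payload = to_jsonl(records)
print(payload)
```

Because incremental updates are not possible, a script like this would regenerate every record of the Glossary on each run rather than only the changed terms.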

{% endtab %}
{% endtabs %}
