Data Identification: Dictionaries and Pattern Analysis

The data identification process classifies data using dictionaries and data pattern analysis. Data is tagged automatically according to the criteria defined in dictionary and pattern configuration files.

Data Catalog comes equipped with a comprehensive set of dictionaries and patterns, and it also lets users create custom dictionaries and pattern analysis configurations. This customization allows the data identification process to be tailored to the specific requirements of an organization.

Management - Data Identification Methods

Data Dictionaries

A data dictionary in a data catalog is a collection of predefined terms and definitions that help classify and tag data within an organization's datasets. It serves as a reference guide for data terms, helping users understand the meaning, usage, and context of each data element.

By using a data dictionary, organizations can keep data identification and management consistent and accurate across datasets. Custom dictionaries can also be created to meet specific organizational needs.

Data Dictionary

Let's run through an example: Marital_Status.

  1. Navigate to the 'Management' tile and click Dictionaries.

  2. Search for: Marital_Status

Dictionaries - Marital_Status

When the data is profiled, each value (in our example, 'marital_status') is compared against the predefined dictionary using a rule, which produces a match with a degree of confidence.

Once matched, the tags PII, Marital Status, and Non-Sensitive are applied.

  3. Click on the > to view the dictionary.

Dictionary - Marital_Status
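
To make the matching step described above more concrete, here is a minimal Python sketch of dictionary-based tagging. It is not Data Catalog's implementation: the `Dictionary` class, the term list, the 0.6 threshold, and the "fraction of matching values" confidence measure are all assumptions chosen for illustration. Only the dictionary name and the three tags come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Dictionary:
    """Hypothetical stand-in for a dictionary: a set of terms plus tags to apply."""
    name: str
    terms: set[str]
    tags: list[str]
    min_confidence: float = 0.6  # assumed threshold, not a product default


def match_confidence(values: list[str], dictionary: Dictionary) -> float:
    """Return the fraction of non-empty values found in the dictionary.

    Comparison is case-insensitive and ignores surrounding whitespace.
    """
    cleaned = [v.strip().lower() for v in values if v and v.strip()]
    if not cleaned:
        return 0.0
    terms = {t.lower() for t in dictionary.terms}
    hits = sum(1 for v in cleaned if v in terms)
    return hits / len(cleaned)


def tags_for_column(values: list[str], dictionary: Dictionary) -> list[str]:
    """Apply the dictionary's tags only if the confidence meets the threshold."""
    confidence = match_confidence(values, dictionary)
    if confidence >= dictionary.min_confidence:
        return list(dictionary.tags)
    return []


# Illustrative terms; the real Marital_Status dictionary contents may differ.
marital_status = Dictionary(
    name="Marital_Status",
    terms={"single", "married", "divorced", "widowed", "separated"},
    tags=["PII", "Marital Status", "Non-Sensitive"],
)

# Hypothetical profiled sample from a 'marital_status' column.
sample = ["Married", "single", "Divorced", "unknown", "married"]
confidence = match_confidence(sample, marital_status)
applied = tags_for_column(sample, marital_status)
print(f"confidence={confidence:.2f} tags={applied}")
```

In this sketch, four of the five sample values appear in the term list, so the confidence is 0.8 and the tags are applied. A column of unrelated values would fall below the threshold and receive no tags.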