PII Detection

Overview

The PII Detection feature in Data Catalog uses Machine Learning (ML) and Large Language Models (LLMs) to analyze data in JDBC tables and identify Personally Identifiable Information (PII). The feature is trained specifically for Korean and Japanese datasets and automatically detects and classifies sensitive data such as names, addresses, and ID numbers, which helps you streamline compliance with privacy regulations. To learn more, see PII Detection.

Note: This feature currently supports only JDBC data sources with Korean and Japanese content.

When you start PII Detection, Data Catalog scans the selected JDBC table for column names that contain PII entities. When the process completes, if PII data is identified:

A glossary titled ML_PII is created automatically if it does not already exist. If the ML_PII glossary already exists, newly identified PII terms are added to it.
Detected PII entities are tagged with relevant business terms from the ML_PII glossary.

These tags appear in the Business Terms panel of the respective columns.
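The post-scan behavior described above can be illustrated with a minimal Python sketch. This is not Data Catalog code or its API: the `Column` class, the `apply_pii_results` function, and the shape of the `detections` mapping are all hypothetical names chosen for illustration. Only the glossary name ML_PII and the create-or-update and tagging behavior come from the description above.

```python
from dataclasses import dataclass, field

GLOSSARY_NAME = "ML_PII"  # glossary name taken from the documentation


@dataclass
class Column:
    """Hypothetical stand-in for a column in a JDBC table."""
    name: str
    business_terms: set = field(default_factory=set)


def apply_pii_results(glossaries, columns, detections):
    """Apply detection results to glossaries and columns (illustrative only).

    glossaries: dict mapping glossary name -> set of term names
    columns:    list of Column objects from the scanned table
    detections: dict mapping column name -> detected PII term (assumed shape)
    """
    if not detections:
        # No PII identified: nothing is created or tagged.
        return

    # Create the ML_PII glossary if it is missing, otherwise reuse it.
    terms = glossaries.setdefault(GLOSSARY_NAME, set())

    for column in columns:
        term = detections.get(column.name)
        if term is None:
            continue
        terms.add(term)                  # add the term to the glossary
        column.business_terms.add(term)  # tag the column with the term


glossaries = {}
columns = [Column("customer_name"), Column("order_total")]
apply_pii_results(glossaries, columns, {"customer_name": "Name"})
```

In this sketch, running a second detection against another table would reuse the existing ML_PII entry and only add terms that are new, which mirrors the "if not already present" behavior described above.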
