Personal Data Identifier Dictionary (GDPR Compliance)
This comprehensive workshop demonstrates how to create data dictionaries that not only improve data discovery but also ensure regulatory compliance and support robust data governance frameworks.
Using Microsoft's AdventureWorks2022 database as our foundation, participants will learn to build data dictionaries that meet the stringent requirements of GDPR, SOX, and other regulatory frameworks while establishing best practices for ongoing metadata management.
By the end of this workshop, you will be able to:
Create comprehensive data dictionaries that support regulatory compliance requirements
Understand the relationship between data dictionaries, data lineage, and metadata management
Implement data classification schemes that meet industry-specific regulatory standards
Establish processes for dictionary maintenance that ensure ongoing compliance
Integrate dictionaries with broader data governance and security frameworks
Design audit trails and documentation that satisfy regulatory requirements
Phase 1: Pattern-Based Discovery (This Workshop)
Structured database PII identification
Implements dictionary-based identification using column name patterns, regex matching, and metadata hints. Focuses on structured data within relational databases. Establishes foundational governance processes, documentation standards, and automated tagging mechanisms.
• Data dictionaries for all PII categories
• Automated tagging rules
• GDPR compliance documentation
• Initial data inventory
• Governance procedures
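To give a concrete feel for the automated tagging rules, here is a minimal Python sketch of dictionary-based column-name matching. The pattern list is illustrative only - it mirrors a few of the column-name hints used later in the SQL scripts, not the workshop's full rule set:

```python
import re

# Illustrative high-risk name patterns (email, phone, national ID, etc.)
# A real deployment would load these from the governed dictionary files.
HIGH_RISK_PATTERNS = {
    "EMAIL": r"email|mail",
    "PHONE": r"phone|mobile",
    "NATIONAL_ID": r"ssn|national.?id",
    "PASSWORD": r"password|pwd",
    "CREDIT_CARD": r"credit|card",
}

def classify_column(column_name):
    """Return the first matching PII tag for a column name, or None."""
    for tag, pattern in HIGH_RISK_PATTERNS.items():
        if re.search(pattern, column_name, re.IGNORECASE):
            return tag
    return None
```

For example, `classify_column("EmailAddress")` yields `"EMAIL"` while `classify_column("ProductID")` yields `None` - the same first-match-wins behavior the CASE expressions in the SQL scripts implement.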
Phase 2: Content Inspection (Future)
Actual data value analysis
Expands beyond metadata to inspect actual data values using sampling, statistical analysis, and machine learning. Detects PII in free-text fields, comments, and unstructured content. Implements Named Entity Recognition (NER) for contextual PII identification.
• Content inspection rules
• ML-based classifiers
• False positive reduction
• Unstructured data coverage
• Enhanced accuracy metrics
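To see how Phase 2 differs from Phase 1's metadata matching, here is a minimal Python sketch that samples actual values and scores them against an email pattern. The regex, sample size, and scoring are illustrative assumptions, not production detection rules:

```python
import re

# Simple value-level email detector; Phase 2 inspects the data itself
# rather than column names.
EMAIL_VALUE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def email_hit_rate(values, sample_size=100):
    """Fraction of sampled string values that look like email addresses."""
    sample = [v for v in values[:sample_size] if isinstance(v, str)]
    if not sample:
        return 0.0
    hits = sum(1 for v in sample if EMAIL_VALUE.search(v))
    return hits / len(sample)
```

A free-text comments column that scores, say, above a chosen threshold would be flagged for review - exactly the kind of PII-in-unstructured-content signal that name patterns alone cannot catch.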
Phase 3: Data Flow Mapping (Future)
End-to-end lineage tracking
Maps PII movement through ETL pipelines, APIs, reports, exports, and data integrations. Identifies downstream systems receiving PII. Tracks data transformations, aggregations, and derivations to understand complete data lifecycle.
• Complete data lineage maps
• API/interface PII exposure analysis
• ETL pipeline documentation
• Report/dashboard PII tracking
• Cross-system impact analysis
Phase 4: Privacy Operations Integration (Future)
Enterprise-wide privacy ecosystem
Integrates PII identification with data masking, access controls, consent management, and data subject request fulfillment. Implements automated breach notification scope assessment, retention policy enforcement, and continuous compliance monitoring.
• Integrated masking policies
• Automated DSR fulfillment
• Consent tracking integration
• Breach assessment automation
• Real-time compliance dashboards
GDPR - PII Dictionary
This workshop comprehensively covers Phase 1, providing all necessary resources, templates, procedures, and documentation to successfully implement pattern-based PII discovery.
Database Inventory
Objective: Create a complete inventory of all database schemas and tables to ensure comprehensive coverage.
Why This Matters: GDPR compliance requires documenting all data sources within the organization. This inventory forms the foundation for regulatory compliance.
Schema Analysis
The first task in our journey is to conduct a comprehensive database inventory. Data governance requires a complete inventory of all data assets, and this inventory forms the foundation for regulatory compliance by documenting every data source in the organization.
For Phase 1 we'll take the traditional route and run SQL scripts - later we'll take a look at a 'Project' that leverages ML.
Why This Matters: This enhanced inventory not only catalogs schemas but automatically identifies PII exposure, calculates risk scores, and prioritizes compliance efforts based on actual data sensitivity.

| schema_name | table_count | total_rows | total_size_mb | high_risk_pii_columns | medium_risk_pii_columns | pii_risk_score | compliance_priority | data_category |
|---|---|---|---|---|---|---|---|---|
| Person | 13 | ~1,755,295 | ~460 | 8 | 13 | 212.86 | CRITICAL | Personal Data |
| HumanResources | 6 | ~8,179 | ~6.5 | 1 | 1 | 52.5 | CRITICAL | Employee Data |
| Sales | 19 | ~2,658,063 | ~287 | 4 | 2 | 36.5 | HIGH | Transaction Data |
| Purchasing | 5 | ~155,373 | ~16 | 1 | 0 | 20.41 | HIGH | Operational Data |
| Production | 25 | ~3,417,306 | ~251 | 1 | 1 | 10.06 | MEDIUM | Operational Data |
| dbo | 3 | ~38,308 | ~51 | 0 | 0 | 0.00 | LOW | System Data |
Understanding the Output
high_risk_pii_columns: Count of columns containing emails, phones, SSN, passwords, credit cards
medium_risk_pii_columns: Count of columns with names, addresses, birth dates, postal codes
low_risk_pii_columns: Count of columns with titles, suffixes, gender, marital status
pii_risk_score: Weighted calculation (High×10 + Medium×5 + Low×2) ÷ Total Columns × 100
compliance_priority: Automatic prioritization using the following thresholds:
CRITICAL: 3+ high-risk columns OR risk score ≥ 40
HIGH: Any high-risk columns OR risk score ≥ 15
MEDIUM: Any medium-risk columns OR risk score ≥ 5
LOW: Only low-risk columns OR risk score < 5
data_category: Automatic classification based on schema naming patterns
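The scoring and prioritization logic above can be reproduced outside SQL for spot-checking query results; a small Python sketch of the same formula and thresholds:

```python
def pii_risk_score(high, medium, low, total_columns):
    """Weighted score: (High*10 + Medium*5 + Low*2) / Total Columns * 100."""
    if total_columns == 0:
        return 0.0
    return round((high * 10 + medium * 5 + low * 2) / total_columns * 100, 2)

def compliance_priority(high, medium, low, score):
    """Apply the CRITICAL/HIGH/MEDIUM/LOW thresholds described above."""
    if high >= 3 or score >= 40:
        return "CRITICAL"
    if high > 0 or score >= 15:
        return "HIGH"
    if medium > 0 or score >= 5:
        return "MEDIUM"
    return "LOW"
```

For example, a schema with 3 high-risk, 2 medium-risk, and 1 low-risk column out of 100 total scores 42.0 and lands in CRITICAL. Because the weights exceed 1, the score can climb well past 100 in narrow, PII-dense schemas - which is why Person reports 212.86.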
Key Insights
Person schema is CRITICAL priority with a PII risk score of 212.86 - contains 8 high-risk and 13 medium-risk PII columns spanning 1.7M+ records. This should be your primary focus for dictionary creation.
HumanResources schema also requires immediate CRITICAL attention with a risk score of 52.5 due to employee data sensitivity.
Sales and Purchasing schemas are HIGH priority containing customer contact information and vendor data.
Production schema has MEDIUM priority with minimal PII exposure (1 high, 1 medium risk column).
dbo schema shows LOW priority with no PII columns detected - likely contains system/configuration data.
Deliverable:
• Export this result set to Excel for reference throughout the project
• Use the compliance_priority column to sequence your dictionary creation work
• Document the pii_risk_score baseline for quarterly tracking of new PII exposure
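Quarterly baseline tracking only requires comparing two score sets; a minimal Python sketch (schema names and scores are illustrative):

```python
def flag_new_exposure(baseline, current):
    """Return schemas whose pii_risk_score rose above the recorded baseline.

    Maps schema name -> (baseline score, current score) for every increase;
    schemas absent from the baseline are treated as starting at 0.0.
    """
    return {
        schema: (baseline.get(schema, 0.0), score)
        for schema, score in current.items()
        if score > baseline.get(schema, 0.0)
    }
```

Run this each quarter against the exported scores; any schema it returns has picked up new PII exposure since the baseline and should be re-reviewed.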
In DBeaver, run the following script:
-- ============================================================================
-- ENHANCED SCHEMA ANALYSIS FOR GDPR COMPLIANCE - COMPLETE EDITION
-- ============================================================================
-- Purpose: Comprehensive database inventory with automated PII risk indicators
-- Target: SQL Server (AdventureWorks 2022 or similar)
-- Version: 2.1 (Updated with improved priority logic and detailed comments)
-- Author: Data Governance Team
-- Last Updated: 2025-10-04
-- ============================================================================
--
-- WHAT THIS SCRIPT DOES:
-- - Automatically scans all database schemas for PII-related columns
-- - Calculates risk scores based on column naming patterns
-- - Assigns compliance priorities (CRITICAL/HIGH/MEDIUM/LOW)
-- - Provides data volume metrics and relationship analysis
-- - Generates export-ready reports for compliance documentation
--
-- PREREQUISITES:
-- - SQL Server 2017+ for STRING_AGG (a FOR XML PATH alternative is included for older versions)
-- - Read permissions on sys schema catalog views
-- - Database with profiled/populated tables for accurate row counts
--
-- ============================================================================
-- ============================================================================
-- VERSION 1: BASIC ENHANCED (Quick Overview with Size Metrics)
-- ============================================================================
-- USE CASE: Quick schema overview with data volume metrics
-- OUTPUT: Schema summary with row counts, sizes, and last modified dates
-- RUNTIME: Fast (<5 seconds for most databases)
-- WHEN TO USE: Daily monitoring, quick health checks, initial assessment
-- ============================================================================
SELECT
    -- Schema name from system catalog
    s.name AS schema_name,
    -- Count distinct tables in each schema
    COUNT(DISTINCT t.name) AS table_count,
    -- Sum pre-aggregated row counts (one row per table below, so no join inflation)
    SUM(p.row_count) AS total_rows,
    -- Total storage size in MB (total_pages are 8KB pages: pages * 8 / 1024)
    SUM(a.size_mb) AS total_size_mb,
    -- Sum per-table column counts across the schema
    SUM(c.column_count) AS total_columns,
    -- Get most recent modification date across all tables
    MAX(t.modify_date) AS last_modified,
    -- Create ordered, comma-separated list of tables
    -- Note: STRING_AGG requires SQL Server 2017+. See alternative below for older versions.
    STRING_AGG(t.name, ', ') WITHIN GROUP (ORDER BY t.name) AS table_list
    -- ALTERNATIVE for SQL Server 2016 and earlier:
    -- STUFF((SELECT ', ' + t2.name
    --        FROM sys.tables t2
    --        WHERE t2.schema_id = s.schema_id
    --        ORDER BY t2.name
    --        FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS table_list
FROM sys.schemas s
-- LEFT JOIN ensures we see schemas even if they have no tables
LEFT JOIN sys.tables t ON s.schema_id = t.schema_id
-- Pre-aggregate row counts to one row per table (index 0 = heap, 1 = clustered index).
-- Joining sys.partitions/sys.allocation_units/sys.columns directly would multiply
-- p.rows by the number of columns and allocation units, inflating every total.
LEFT JOIN (
    SELECT object_id, SUM(rows) AS row_count
    FROM sys.partitions
    WHERE index_id IN (0, 1)
    GROUP BY object_id
) p ON t.object_id = p.object_id
-- Pre-aggregate storage allocation to one size per table
LEFT JOIN (
    SELECT pt.object_id,
           SUM(CAST(au.total_pages AS BIGINT)) * 8 / 1024.0 AS size_mb
    FROM sys.partitions pt
    INNER JOIN sys.allocation_units au ON pt.partition_id = au.container_id
    GROUP BY pt.object_id
) a ON t.object_id = a.object_id
-- Pre-aggregate column counts to one count per table
LEFT JOIN (
    SELECT object_id, COUNT(*) AS column_count
    FROM sys.columns
    GROUP BY object_id
) c ON t.object_id = c.object_id
-- Filter out system schemas that aren't relevant for GDPR analysis
WHERE s.name NOT IN ('sys', 'INFORMATION_SCHEMA', 'guest', 'db_owner', 'db_accessadmin',
'db_securityadmin', 'db_ddladmin', 'db_backupoperator', 'db_datareader',
'db_datawriter', 'db_denydatareader', 'db_denydatawriter')
GROUP BY s.name
ORDER BY
-- Custom sorting: PII-sensitive schemas first, then by row count
CASE
WHEN s.name = 'Person' THEN 1 -- Highest priority
WHEN s.name LIKE '%HR%' OR s.name LIKE '%Human%' THEN 2
WHEN s.name LIKE '%Customer%' OR s.name LIKE '%Sales%' THEN 3
ELSE 4 -- Everything else
END,
total_rows DESC; -- Within same priority level, show largest first
-- ============================================================================
-- VERSION 2: COMPLIANCE-FOCUSED (⭐ RECOMMENDED FOR GDPR ANALYSIS)
-- ============================================================================
-- USE CASE: Complete PII risk assessment with automated prioritization
-- OUTPUT: Schema-level PII detection, risk scoring, and compliance priority
-- RUNTIME: Medium (~15-30 seconds for large databases)
-- WHEN TO USE: Quarterly compliance reviews, GDPR audits, dictionary planning
--
-- KEY FEATURES:
-- - Automatic PII column detection across 3 risk levels (high/medium/low)
-- - Calculated PII risk score (weighted; PII-dense schemas can exceed 100)
-- - Smart compliance prioritization (CRITICAL/HIGH/MEDIUM/LOW)
-- - Automatic data category classification
-- - Relationship complexity analysis
--
-- CSV EXPORT INSTRUCTIONS:
-- Method 1 (DBeaver): Run query → Right-click results → Export → CSV
-- Method 2 (SSMS): Run query → Right-click results → Save Results As → CSV
-- Method 3 (Command Line): Use the export wrapper at the bottom of this script
-- ============================================================================
-- CTE 1: SchemaMetrics - Gather basic metrics for all tables
WITH SchemaMetrics AS (
    SELECT
        s.name AS schema_name,
        t.name AS table_name,
        -- Row count from pre-aggregated partition statistics
        -- More accurate than COUNT(*) and much faster
        p.row_count,
        -- Table size in MB (total_pages includes data + indexes + LOB storage)
        a.size_mb,
        -- Column count per table
        c.column_count,
        -- Metadata timestamps for change tracking
        t.modify_date,
        t.create_date
    FROM sys.schemas s
    INNER JOIN sys.tables t ON s.schema_id = t.schema_id
    -- Pre-aggregate to one row per table; joining the raw catalog views
    -- directly would multiply row counts by columns and allocation units
    LEFT JOIN (
        SELECT object_id, SUM(rows) AS row_count
        FROM sys.partitions
        WHERE index_id IN (0, 1)
        GROUP BY object_id
    ) p ON t.object_id = p.object_id
    LEFT JOIN (
        SELECT pt.object_id,
               SUM(CAST(au.total_pages AS BIGINT)) * 8 / 1024.0 AS size_mb
        FROM sys.partitions pt
        INNER JOIN sys.allocation_units au ON pt.partition_id = au.container_id
        GROUP BY pt.object_id
    ) a ON t.object_id = a.object_id
    LEFT JOIN (
        SELECT object_id, COUNT(*) AS column_count
        FROM sys.columns
        GROUP BY object_id
    ) c ON t.object_id = c.object_id
    -- Exclude system schemas that have no business/personal data
    WHERE s.name NOT IN ('sys', 'INFORMATION_SCHEMA', 'guest', 'db_owner', 'db_accessadmin',
                         'db_securityadmin', 'db_ddladmin', 'db_backupoperator', 'db_datareader',
                         'db_datawriter', 'db_denydatareader', 'db_denydatawriter')
),
-- CTE 2: PIIIndicators - Detect PII columns using naming pattern analysis
-- This CTE categorizes columns into risk levels based on GDPR sensitivity
PIIIndicators AS (
SELECT
SCHEMA_NAME(t.schema_id) AS schema_name,
t.name AS table_name,
-- HIGH RISK COLUMNS (PII_CRITICAL)
-- These columns typically contain highly sensitive personal data
-- GDPR Articles 4, 9 - Special categories requiring explicit consent
COUNT(DISTINCT CASE
WHEN c.name LIKE '%Email%' OR c.name LIKE '%Mail%' THEN c.column_id
-- Email addresses: Direct contact, unique identifier
WHEN c.name LIKE '%Phone%' OR c.name LIKE '%Mobile%' THEN c.column_id
-- Phone numbers: Direct contact information
WHEN c.name LIKE '%SSN%' OR c.name LIKE '%National%ID%' THEN c.column_id
-- National identifiers: Government ID, highly sensitive
WHEN c.name LIKE '%Password%' OR c.name LIKE '%PWD%' THEN c.column_id
-- Credentials: Security risk if exposed
WHEN c.name LIKE '%Credit%' OR c.name LIKE '%Card%' THEN c.column_id
-- Financial data: Payment card information
END) AS high_risk_columns,
-- MEDIUM RISK COLUMNS (PII_MEDIUM)
-- Personal identifiers that combined can identify individuals
-- GDPR Article 4(1) - Personal data definition
COUNT(DISTINCT CASE
WHEN c.name LIKE '%FirstName%' OR c.name LIKE '%LastName%' THEN c.column_id
-- Names: Core personal identifiers
WHEN c.name LIKE '%MiddleName%' THEN c.column_id
-- Middle names: Additional personal identifier
WHEN c.name LIKE '%Address%' OR c.name LIKE '%Street%' THEN c.column_id
-- Physical addresses: Location data
WHEN c.name LIKE '%City%' OR c.name LIKE '%PostalCode%' OR c.name LIKE '%Zip%' THEN c.column_id
-- Location identifiers: Geographic personal data
WHEN c.name LIKE '%BirthDate%' OR c.name LIKE '%DOB%' THEN c.column_id
-- Birth dates: Age-sensitive personal information
END) AS medium_risk_columns,
-- LOW RISK COLUMNS (PII_LOW)
-- Indirect identifiers, less sensitive but still personal data
COUNT(DISTINCT CASE
WHEN c.name LIKE '%Title%' AND SCHEMA_NAME(t.schema_id) = 'Person' THEN c.column_id
-- Titles (Mr., Mrs., Dr.): Personal honorifics
WHEN c.name LIKE '%Suffix%' THEN c.column_id
-- Name suffixes (Jr., Sr., III): Family relationships
WHEN c.name LIKE '%Gender%' OR c.name LIKE '%Sex%' THEN c.column_id
-- Gender: Demographic personal data
WHEN c.name LIKE '%Marital%' THEN c.column_id
-- Marital status: Personal demographic information
END) AS low_risk_columns
FROM sys.tables t
INNER JOIN sys.columns c ON t.object_id = c.object_id
WHERE SCHEMA_NAME(t.schema_id) NOT IN ('sys', 'INFORMATION_SCHEMA')
GROUP BY SCHEMA_NAME(t.schema_id), t.name
),
-- CTE 3: RelationshipCount - Analyze foreign key relationships
-- Used to understand data lineage and impact of schema changes
RelationshipCount AS (
    SELECT
        schema_name,
        table_name,
        -- Outbound FKs: constraints defined on this table (it references other tables)
        SUM(is_outbound) AS outbound_fk_count,
        -- Inbound FKs: constraints on other tables that reference this table
        SUM(1 - is_outbound) AS inbound_fk_count
    FROM (
        SELECT OBJECT_SCHEMA_NAME(fk.parent_object_id) AS schema_name,
               OBJECT_NAME(fk.parent_object_id) AS table_name,
               1 AS is_outbound
        FROM sys.foreign_keys fk
        UNION ALL
        SELECT OBJECT_SCHEMA_NAME(fk.referenced_object_id),
               OBJECT_NAME(fk.referenced_object_id),
               0
        FROM sys.foreign_keys fk
    ) AS fk_ends
    GROUP BY schema_name, table_name
)
-- MAIN QUERY: Aggregate all metrics at schema level
SELECT
sm.schema_name,
COUNT(DISTINCT sm.table_name) AS table_count,
SUM(sm.row_count) AS total_rows,
CAST(SUM(sm.size_mb) AS DECIMAL(10,2)) AS total_size_mb,
SUM(sm.column_count) AS total_columns,
-- PII Column Counts by Risk Level
SUM(ISNULL(pii.high_risk_columns, 0)) AS high_risk_pii_columns,
SUM(ISNULL(pii.medium_risk_columns, 0)) AS medium_risk_pii_columns,
SUM(ISNULL(pii.low_risk_columns, 0)) AS low_risk_pii_columns,
-- COMPLIANCE RISK SCORE CALCULATION (weighted score; can exceed 100)
-- Formula: (High×10 + Medium×5 + Low×2) ÷ Total Columns × 100
-- Rationale:
-- - High risk weighted 10x (critical impact if breached)
-- - Medium risk weighted 5x (significant impact)
-- - Low risk weighted 2x (minor impact but still PII)
-- - Normalized to total columns for fair comparison across schemas
CAST(
(SUM(ISNULL(pii.high_risk_columns, 0)) * 10.0 +
SUM(ISNULL(pii.medium_risk_columns, 0)) * 5.0 +
SUM(ISNULL(pii.low_risk_columns, 0)) * 2.0) /
NULLIF(SUM(sm.column_count), 0) * 100 -- NULLIF prevents divide by zero
AS DECIMAL(5,2)) AS pii_risk_score,
-- COMPLIANCE PRIORITY ASSIGNMENT (UPDATED LOGIC v2.1)
-- Uses both absolute column counts AND risk scores for balanced assessment
--
-- CRITICAL: Schemas requiring immediate attention
-- - 3+ high-risk columns (multiple sensitive data types present)
-- - OR risk score ≥ 40 (>40% of columns are PII-weighted)
-- Examples: Person, HumanResources schemas
--
-- HIGH: Schemas requiring priority attention
-- - Any high-risk columns (at least one critical PII type)
-- - OR risk score ≥ 15 (>15% PII density)
-- Examples: Sales, Customer schemas
--
-- MEDIUM: Schemas requiring standard monitoring
-- - Any medium-risk columns (personal identifiers present)
-- - OR risk score ≥ 5 (>5% PII density)
-- Examples: Operational schemas with occasional PII
--
-- LOW: Schemas with minimal PII exposure
-- - Only low-risk columns OR risk score < 5
-- Examples: Product, Configuration schemas
CASE
WHEN (SUM(ISNULL(pii.high_risk_columns, 0)) >= 3)
OR (CAST((SUM(ISNULL(pii.high_risk_columns, 0)) * 10.0 +
SUM(ISNULL(pii.medium_risk_columns, 0)) * 5.0 +
SUM(ISNULL(pii.low_risk_columns, 0)) * 2.0) /
NULLIF(SUM(sm.column_count), 0) * 100 AS DECIMAL(5,2)) >= 40)
THEN 'CRITICAL'
WHEN (SUM(ISNULL(pii.high_risk_columns, 0)) > 0)
OR (CAST((SUM(ISNULL(pii.high_risk_columns, 0)) * 10.0 +
SUM(ISNULL(pii.medium_risk_columns, 0)) * 5.0 +
SUM(ISNULL(pii.low_risk_columns, 0)) * 2.0) /
NULLIF(SUM(sm.column_count), 0) * 100 AS DECIMAL(5,2)) >= 15)
THEN 'HIGH'
WHEN (SUM(ISNULL(pii.medium_risk_columns, 0)) > 0)
OR (CAST((SUM(ISNULL(pii.high_risk_columns, 0)) * 10.0 +
SUM(ISNULL(pii.medium_risk_columns, 0)) * 5.0 +
SUM(ISNULL(pii.low_risk_columns, 0)) * 2.0) /
NULLIF(SUM(sm.column_count), 0) * 100 AS DECIMAL(5,2)) >= 5)
THEN 'MEDIUM'
ELSE 'LOW'
END AS compliance_priority,
-- DATA CATEGORY CLASSIFICATION
-- Automatic classification based on schema naming conventions
-- Helps organize compliance efforts by data type
CASE
WHEN sm.schema_name LIKE '%Person%' OR sm.schema_name LIKE '%Customer%'
THEN 'Personal Data'
WHEN sm.schema_name LIKE '%HR%' OR sm.schema_name LIKE '%Human%'
OR sm.schema_name LIKE '%Employee%'
THEN 'Employee Data'
WHEN sm.schema_name LIKE '%Sales%' OR sm.schema_name LIKE '%Order%'
THEN 'Transaction Data'
WHEN sm.schema_name LIKE '%Financial%' OR sm.schema_name LIKE '%Payment%'
THEN 'Financial Data'
ELSE 'Operational Data'
END AS data_category,
-- Latest modification timestamp across all tables in schema
MAX(sm.modify_date) AS last_modified,
-- Ordered table list for reference
-- STRING_AGG requires SQL Server 2017+
-- For SQL Server 2016 and earlier, use the commented alternative below
STRING_AGG(sm.table_name, ', ') WITHIN GROUP (ORDER BY sm.table_name) AS table_list
-- ALTERNATIVE for SQL Server 2016 and earlier (uncomment if needed):
-- (SELECT STUFF((SELECT ', ' + sm2.table_name
-- FROM SchemaMetrics sm2
-- WHERE sm2.schema_name = sm.schema_name
-- ORDER BY sm2.table_name
-- FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '')) AS table_list
FROM SchemaMetrics sm
LEFT JOIN PIIIndicators pii
ON sm.schema_name = pii.schema_name
AND sm.table_name = pii.table_name
LEFT JOIN RelationshipCount rc
ON sm.schema_name = rc.schema_name
AND sm.table_name = rc.table_name
GROUP BY sm.schema_name
-- ORDER BY: Prioritize results for action planning
-- Primary sort: Compliance priority (CRITICAL first)
-- Secondary sort: Total rows (larger datasets first within same priority)
ORDER BY
CASE
WHEN (SUM(ISNULL(pii.high_risk_columns, 0)) >= 3)
OR (CAST((SUM(ISNULL(pii.high_risk_columns, 0)) * 10.0 +
SUM(ISNULL(pii.medium_risk_columns, 0)) * 5.0 +
SUM(ISNULL(pii.low_risk_columns, 0)) * 2.0) /
NULLIF(SUM(sm.column_count), 0) * 100 AS DECIMAL(5,2)) >= 40)
THEN 1 -- CRITICAL schemas first
WHEN (SUM(ISNULL(pii.high_risk_columns, 0)) > 0)
OR (CAST((SUM(ISNULL(pii.high_risk_columns, 0)) * 10.0 +
SUM(ISNULL(pii.medium_risk_columns, 0)) * 5.0 +
SUM(ISNULL(pii.low_risk_columns, 0)) * 2.0) /
NULLIF(SUM(sm.column_count), 0) * 100 AS DECIMAL(5,2)) >= 15)
THEN 2 -- HIGH schemas second
WHEN (SUM(ISNULL(pii.medium_risk_columns, 0)) > 0)
OR (CAST((SUM(ISNULL(pii.high_risk_columns, 0)) * 10.0 +
SUM(ISNULL(pii.medium_risk_columns, 0)) * 5.0 +
SUM(ISNULL(pii.low_risk_columns, 0)) * 2.0) /
NULLIF(SUM(sm.column_count), 0) * 100 AS DECIMAL(5,2)) >= 5)
THEN 3 -- MEDIUM schemas third
ELSE 4 -- LOW schemas last
END,
total_rows DESC;
-- ============================================================================
-- CSV EXPORT WRAPPER FOR VERSION 2
-- ============================================================================
-- INSTRUCTIONS:
-- 1. Uncomment the section below to export directly to CSV
-- 2. Update the file path to your desired location
-- 3. Ensure SQL Server has write permissions to the target directory
-- 4. Run this entire script block
-- ============================================================================
/*
-- Enable xp_cmdshell if not already enabled (requires admin rights)
-- EXEC sp_configure 'show advanced options', 1;
-- RECONFIGURE;
-- EXEC sp_configure 'xp_cmdshell', 1;
-- RECONFIGURE;
-- Export to CSV using BCP utility
DECLARE @sql NVARCHAR(MAX);
DECLARE @outputPath NVARCHAR(500) = 'C:\Exports\GDPR_Schema_Analysis.csv'; -- UPDATE THIS PATH
-- Note: queryout requires a query; to export an object by name use "bcp <object> out"
SET @sql = 'bcp "SELECT * FROM ' + DB_NAME() + '.dbo.vw_gdpr_compliance_export" queryout "'
    + @outputPath + '" -c -t, -T -S ' + @@SERVERNAME;
EXEC xp_cmdshell @sql;
*/
-- ALTERNATIVE: Create a view for easier exports
-- CREATE VIEW vw_gdpr_compliance_export AS
-- [Paste the entire Version 2 query here]
-- Then export view results using your preferred tool
-- ============================================================================
-- PRIORITY THRESHOLDS EXPLAINED:
-- ============================================================================
--
-- THRESHOLD DESIGN RATIONALE:
-- These thresholds balance sensitivity with practicality
--
-- CRITICAL: 3+ high-risk columns OR risk score ≥ 40
-- Why: Multiple high-risk columns indicate concentrated PII exposure
-- Risk score ≥40 means >40% of schema contains weighted PII
-- Action Required: Immediate dictionary creation, data masking, audit logging
-- SLA: 1 week for initial assessment and protection
--
-- HIGH: Any high-risk columns OR risk score ≥ 15
-- Why: Even single high-risk columns (email, SSN) require protection
-- 15% threshold catches schemas with moderate PII concentration
-- Action Required: Priority dictionary creation, access controls
-- SLA: 2-3 weeks for protection implementation
--
-- MEDIUM: Any medium-risk columns OR risk score ≥ 5
-- Why: Medium-risk columns (names, addresses) still qualify as personal data
-- 5% threshold includes schemas with sparse but present PII
-- Action Required: Standard dictionary creation, monitoring
-- SLA: 4-6 weeks for implementation
--
-- LOW: Only low-risk columns OR risk score < 5
-- Why: Minimal PII exposure, primarily operational data
-- Action Required: Routine monitoring, no immediate action
-- SLA: Quarterly review cycle
--
-- ============================================================================
-- ============================================================================
-- VERSION 3: DETAILED TABLE-LEVEL ANALYSIS (Deep Dive)
-- ============================================================================
-- USE CASE: Detailed analysis of specific schemas
-- OUTPUT: Table-by-table breakdown with specific PII column identification
-- RUNTIME: Fast for single schema (~3-5 seconds)
-- WHEN TO USE: After Version 2 identifies high-priority schemas
--
-- INSTRUCTIONS:
-- 1. Replace 'Person' with your target schema name (see the WHERE clause marked 'CHANGE SCHEMA NAME HERE')
-- 2. Run query to see table-level detail
-- 3. Use results to plan dictionary terms and regex patterns
-- ============================================================================
WITH TablePIIAnalysis AS (
SELECT
SCHEMA_NAME(t.schema_id) AS schema_name,
t.name AS table_name,
t.create_date,
t.modify_date,
-- Table metrics via correlated subqueries: the sys.columns join below
-- would otherwise multiply partition rows and inflate these totals
(SELECT SUM(pr.rows)
 FROM sys.partitions pr
 WHERE pr.object_id = t.object_id AND pr.index_id IN (0, 1)) AS row_count,
(SELECT SUM(CAST(au.total_pages AS BIGINT)) * 8 / 1024.0
 FROM sys.partitions pr
 INNER JOIN sys.allocation_units au ON pr.partition_id = au.container_id
 WHERE pr.object_id = t.object_id) AS size_mb,
COUNT(DISTINCT c.column_id) AS column_count,
-- PII COLUMN DETECTION WITH TYPE LABELS
-- Creates human-readable list of PII columns with their types
-- Format: "ColumnName (TYPE)", "ColumnName (TYPE)"
-- This output is perfect for:
-- - Building dictionary CSV files
-- - Creating regex patterns
-- - Documentation and audit reports
STRING_AGG(
CASE
-- Email/Mail columns: Direct contact, unique identifier
WHEN c.name LIKE '%Email%' OR c.name LIKE '%Mail%'
THEN c.name + ' (EMAIL)'
-- Phone columns: Contact information
WHEN c.name LIKE '%Phone%' OR c.name LIKE '%Mobile%' OR c.name LIKE '%Fax%'
THEN c.name + ' (PHONE)'
-- Credential columns: Critical security data
WHEN c.name LIKE '%Password%' OR c.name LIKE '%PWD%'
THEN c.name + ' (PASSWORD)'
-- National ID columns: Government identifiers
WHEN c.name LIKE '%SSN%' OR c.name LIKE '%National%ID%'
THEN c.name + ' (NATIONAL_ID)'
-- Financial columns: Payment information
WHEN c.name LIKE '%Credit%Card%'
THEN c.name + ' (CREDIT_CARD)'
-- Name columns: Personal identifiers
WHEN c.name LIKE '%FirstName%' OR c.name LIKE '%LastName%'
OR c.name LIKE '%MiddleName%'
THEN c.name + ' (NAME)'
-- Address columns: Location data
WHEN c.name LIKE '%Address%' OR c.name LIKE '%Street%'
THEN c.name + ' (ADDRESS)'
-- Birth date columns: Age-sensitive data
WHEN c.name LIKE '%BirthDate%' OR c.name LIKE '%DOB%'
THEN c.name + ' (BIRTHDATE)'
-- Location columns: Geographic identifiers
WHEN c.name LIKE '%City%' OR c.name LIKE '%Postal%' OR c.name LIKE '%Zip%'
THEN c.name + ' (LOCATION)'
END,
', ' -- Comma separator for readability
) AS potential_pii_columns,
-- RELATIONSHIP COMPLEXITY ANALYSIS
-- Understanding relationships helps with:
-- - Data lineage mapping
-- - Impact assessment for data changes
-- - Identifying data flow patterns
-- Outbound relationships: Tables this table depends on
(SELECT COUNT(*)
FROM sys.foreign_keys fk
WHERE fk.parent_object_id = t.object_id) AS outbound_relationships,
-- Inbound relationships: Tables that depend on this table
(SELECT COUNT(*)
FROM sys.foreign_keys fk
WHERE fk.referenced_object_id = t.object_id) AS inbound_relationships
FROM sys.tables t
LEFT JOIN sys.partitions p ON t.object_id = p.object_id AND p.index_id IN (0, 1)
LEFT JOIN sys.allocation_units a ON p.partition_id = a.container_id
LEFT JOIN sys.columns c ON t.object_id = c.object_id
WHERE SCHEMA_NAME(t.schema_id) = 'Person' -- ⚠️ CHANGE SCHEMA NAME HERE
GROUP BY SCHEMA_NAME(t.schema_id), t.name, t.create_date, t.modify_date, t.object_id
)
SELECT
schema_name,
table_name,
row_count,
CAST(size_mb AS DECIMAL(10,2)) AS size_mb,
column_count,
potential_pii_columns,
outbound_relationships,
inbound_relationships,
(outbound_relationships + inbound_relationships) AS total_relationships,
-- TABLE-LEVEL PRIORITIZATION
-- Assigns priority to individual tables within a schema
-- Useful for sequencing dictionary creation work
CASE
WHEN potential_pii_columns LIKE '%PASSWORD%'
OR potential_pii_columns LIKE '%CREDIT_CARD%'
OR potential_pii_columns LIKE '%NATIONAL_ID%'
THEN 'CRITICAL'
WHEN potential_pii_columns LIKE '%EMAIL%'
OR potential_pii_columns LIKE '%BIRTHDATE%'
OR potential_pii_columns LIKE '%ADDRESS%'
THEN 'HIGH'
WHEN potential_pii_columns LIKE '%NAME%'
OR potential_pii_columns LIKE '%PHONE%'
THEN 'MEDIUM'
WHEN potential_pii_columns IS NOT NULL
THEN 'LOW'
ELSE 'NONE'
END AS pii_priority,
-- Data freshness indicator
DATEDIFF(DAY, modify_date, GETDATE()) AS days_since_modified,
create_date,
modify_date
FROM TablePIIAnalysis
ORDER BY
-- Sort by priority level first
CASE
WHEN potential_pii_columns LIKE '%PASSWORD%' OR potential_pii_columns LIKE '%CREDIT_CARD%' THEN 1
WHEN potential_pii_columns LIKE '%EMAIL%' OR potential_pii_columns LIKE '%BIRTHDATE%' THEN 2
WHEN potential_pii_columns LIKE '%NAME%' OR potential_pii_columns LIKE '%PHONE%' THEN 3
WHEN potential_pii_columns IS NOT NULL THEN 4
ELSE 5
END,
-- Then by row count (larger tables first)
row_count DESC;
-- ============================================================================
-- VERSION 4: EXPORT-READY FORMAT (For Documentation & Reports)
-- ============================================================================
-- USE CASE: Clean output for compliance reports and executive summaries
-- OUTPUT: Formatted data ready for Excel export or compliance documentation
-- RUNTIME: Fast (<10 seconds)
-- WHEN TO USE: Monthly/Quarterly compliance reports, board presentations
--
-- KEY FEATURES:
-- - Human-readable number formatting (with commas)
-- - Binary indicators (Yes/No for PII presence)
-- - Clean column names for non-technical audiences
-- - Risk scores for quantitative tracking
-- ============================================================================
WITH SchemaInventory AS (
SELECT
-- Clean, business-friendly column aliases
s.name AS [Schema Name],
COUNT(DISTINCT t.name) AS [Table Count],
-- Format numbers with thousand separators for readability
-- Rows and size come from correlated subqueries: the sys.columns join
-- below would otherwise multiply partition rows into the totals
FORMAT((SELECT SUM(pr.rows)
        FROM sys.partitions pr
        INNER JOIN sys.tables t2 ON pr.object_id = t2.object_id
        WHERE t2.schema_id = SCHEMA_ID(s.name)
          AND pr.index_id IN (0, 1)), 'N0') AS [Total Rows],
-- Size in MB with 2 decimal places
CAST((SELECT SUM(CAST(au.total_pages AS BIGINT)) * 8 / 1024.0
      FROM sys.partitions pr
      INNER JOIN sys.allocation_units au ON pr.partition_id = au.container_id
      INNER JOIN sys.tables t2 ON pr.object_id = t2.object_id
      WHERE t2.schema_id = SCHEMA_ID(s.name)) AS DECIMAL(10,2)) AS [Size (MB)],
COUNT(DISTINCT c.column_id) AS [Total Columns],
-- Calculate risk score using same formula as Version 2
-- Stored in CTE for reuse in priority calculation
CAST(
(COUNT(DISTINCT CASE
WHEN c.name LIKE '%Email%' OR c.name LIKE '%Phone%'
OR c.name LIKE '%SSN%' OR c.name LIKE '%Password%'
OR c.name LIKE '%Credit%' OR c.name LIKE '%National%ID%'
THEN c.column_id END) * 10.0 +
COUNT(DISTINCT CASE
WHEN c.name LIKE '%FirstName%' OR c.name LIKE '%LastName%'
OR c.name LIKE '%Address%' OR c.name LIKE '%BirthDate%'
THEN c.column_id END) * 5.0 +
COUNT(DISTINCT CASE
WHEN c.name LIKE '%Title%' OR c.name LIKE '%Gender%'
THEN c.column_id END) * 2.0) /
NULLIF(COUNT(DISTINCT c.column_id), 0) * 100
AS DECIMAL(5,2)) AS risk_score,
-- Business-friendly data classification
CASE
WHEN s.name LIKE '%Person%' OR s.name LIKE '%Customer%'
THEN 'Contains Personal Data'
WHEN s.name LIKE '%HR%' OR s.name LIKE '%Human%' OR s.name LIKE '%Employee%'
THEN 'Contains Employee Data'
WHEN s.name LIKE '%Sales%'
THEN 'Contains Transaction Data'
ELSE 'Operational Data'
END AS [Data Classification],
-- Binary PII indicator for quick scanning
CASE
WHEN COUNT(DISTINCT CASE
WHEN c.name LIKE '%Email%' OR c.name LIKE '%Phone%'
OR c.name LIKE '%SSN%' OR c.name LIKE '%Password%'
THEN c.column_id END) > 0
THEN 'Yes'
ELSE 'No'
END AS [Contains PII],
-- Priority calculation with updated thresholds
CASE
WHEN COUNT(DISTINCT CASE
WHEN c.name LIKE '%Email%' OR c.name LIKE '%Phone%'
OR c.name LIKE '%SSN%' OR c.name LIKE '%Password%'
THEN c.column_id END) >= 3
OR CAST((COUNT(DISTINCT CASE
WHEN c.name LIKE '%Email%' OR c.name LIKE '%Phone%'
OR c.name LIKE '%SSN%' OR c.name LIKE '%Password%'
THEN c.column_id END) * 10.0) /
NULLIF(COUNT(DISTINCT c.column_id), 0) * 100 AS DECIMAL(5,2)) >= 40
THEN 'CRITICAL'
WHEN COUNT(DISTINCT CASE
WHEN c.name LIKE '%Email%' OR c.name LIKE '%Phone%'
OR c.name LIKE '%SSN%' OR c.name LIKE '%Password%'
THEN c.column_id END) > 0
OR CAST((COUNT(DISTINCT CASE
WHEN c.name LIKE '%Email%' OR c.name LIKE '%Phone%'
OR c.name LIKE '%SSN%' OR c.name LIKE '%Password%'
THEN c.column_id END) * 10.0) /
NULLIF(COUNT(DISTINCT c.column_id), 0) * 100 AS DECIMAL(5,2)) >= 15
THEN 'HIGH'
WHEN COUNT(DISTINCT CASE
WHEN c.name LIKE '%FirstName%' OR c.name LIKE '%Address%'
THEN c.column_id END) > 0
THEN 'MEDIUM'
ELSE 'LOW'
END AS [Priority],
-- Formatted date for reports
FORMAT(MAX(t.modify_date), 'yyyy-MM-dd') AS [Last Modified],
-- Semicolon-separated list (easier to read than commas in CSV)
STRING_AGG(t.name, '; ') WITHIN GROUP (ORDER BY t.name) AS [Tables]
FROM sys.schemas s
LEFT JOIN sys.tables t ON s.schema_id = t.schema_id
LEFT JOIN sys.partitions p ON t.object_id = p.object_id AND p.index_id IN (0, 1)
LEFT JOIN sys.allocation_units a ON p.partition_id = a.container_id
LEFT JOIN sys.columns c ON t.object_id = c.object_id
WHERE s.name NOT IN ('sys', 'INFORMATION_SCHEMA', 'guest')
GROUP BY s.name
)
SELECT
[Schema Name],
[Table Count],
[Total Rows],
[Size (MB)],
[Total Columns],
[Data Classification],
[Contains PII],
[Priority],
CAST(risk_score AS DECIMAL(5,2)) AS [Risk Score],
[Last Modified],
[Tables]
FROM SchemaInventory
ORDER BY
-- Sort by priority for executive summary
CASE [Priority]
WHEN 'CRITICAL' THEN 1
WHEN 'HIGH' THEN 2
WHEN 'MEDIUM' THEN 3
ELSE 4
END,
-- Then by data volume
CAST(REPLACE([Total Rows], ',', '') AS BIGINT) DESC;
-- ============================================================================
-- COMPREHENSIVE USAGE GUIDE & RECOMMENDATIONS:
-- ============================================================================
--
-- WORKFLOW SEQUENCE:
--
-- 1. INITIAL ASSESSMENT (Week 1)
-- → Run VERSION 2 (Compliance-Focused)
-- → Identify CRITICAL and HIGH priority schemas
-- → Document baseline risk scores
-- → Present findings to stakeholders
--
-- 2. DEEP DIVE ANALYSIS (Week 2-3)
-- → Run VERSION 3 (Detailed) for each CRITICAL schema
-- → List specific PII columns discovered
-- → Map data relationships and dependencies
-- → Create data flow diagrams
--
-- 3. DICTIONARY PLANNING (Week 3-4)
-- → Use VERSION 3 results to build dictionary CSV files
-- → Extract actual values for dictionary terms
-- → Create regex patterns based on column names
-- → Define confidence scores and conditions
--
-- 4. COMPLIANCE REPORTING (Ongoing)
-- → Run VERSION 4 (Export-Ready) monthly/quarterly
-- → Track risk score trends over time
-- → Generate executive compliance reports
-- → Document remediation progress
--
-- 5. ROUTINE MONITORING (Ongoing)
-- → Run VERSION 1 (Basic Enhanced) weekly
-- → Alert on new tables in Person/HR schemas
-- → Monitor schema modification dates
-- → Track data volume growth
--
-- INTEGRATION RECOMMENDATIONS:
--
-- - Schedule VERSION 2 as SQL Agent Job (monthly)
-- - Create alerts for risk score increases >10 points
-- - Integrate results with data governance dashboard
-- - Link to data lineage visualization tools
-- - Feed into data catalog metadata
--
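-- The monthly schedule suggested above can be wired up with SQL Agent. A minimal
-- sketch follows; the job name, schedule, and the stored procedure wrapping the
-- VERSION 2 query are placeholders, not part of the workshop scripts.

```sql
-- Minimal SQL Agent job sketch for the monthly compliance scan.
-- Assumes the VERSION 2 query has been saved as a stored procedure;
-- usp_PIIRiskScan and the job name are illustrative placeholders.
USE msdb;
EXEC dbo.sp_add_job         @job_name = N'PII Risk Scan (Monthly)';
EXEC dbo.sp_add_jobstep     @job_name = N'PII Risk Scan (Monthly)',
                            @step_name = N'Run compliance query',
                            @subsystem = N'TSQL',
                            @command   = N'EXEC dbo.usp_PIIRiskScan;',  -- placeholder proc
                            @database_name = N'AdventureWorks2022';
EXEC dbo.sp_add_jobschedule @job_name = N'PII Risk Scan (Monthly)',
                            @name = N'Monthly',
                            @freq_type = 16,             -- monthly
                            @freq_interval = 1,          -- day 1 of the month
                            @active_start_time = 020000; -- 02:00
EXEC dbo.sp_add_jobserver   @job_name = N'PII Risk Scan (Monthly)';
```

-- Pair this with an alerting step that compares the latest risk scores against
-- the previous run to catch the >10-point increases mentioned above.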
-- CUSTOMIZATION TIPS:
--
-- - Add industry-specific PII patterns (HIPAA, PCI-DSS, etc.)
-- - Adjust risk score weights based on business impact
-- - Modify priority thresholds to match org risk appetite
-- - Extend data categories for domain-specific classification
-- - Add columns for data retention policies
--
-- ============================================================================
-- PII DETECTION PATTERNS REFERENCE:
-- ============================================================================
--
-- HIGH RISK PATTERNS (Weight: 10x):
-- %Email%, %Mail% → Email addresses (unique identifiers)
-- %Phone%, %Mobile%, %Fax% → Phone numbers (direct contact)
-- %SSN%, %National%ID% → Government identifiers
-- %Password%, %PWD% → Authentication credentials
-- %Credit%, %Card% → Payment card information
--
-- MEDIUM RISK PATTERNS (Weight: 5x):
-- %FirstName%, %LastName% → Personal names
-- %MiddleName% → Additional name identifiers
-- %Address%, %Street% → Physical addresses
-- %City%, %PostalCode% → Location data
-- %BirthDate%, %DOB% → Age-sensitive information
--
-- LOW RISK PATTERNS (Weight: 2x):
-- %Title% → Honorifics (Mr., Dr., etc.)
-- %Suffix% → Name suffixes (Jr., Sr.)
-- %Gender%, %Sex% → Demographic data
-- %Marital% → Marital status
--
-- TO EXTEND PATTERNS:
-- 1. Add new CASE statements in PIIIndicators CTE
-- 2. Update risk score calculation if adding new risk levels
-- 3. Document new patterns in this reference section
-- 4. Test on sample schemas before production deployment
--
-- EXAMPLES OF EXTENSIONS:
-- - HIPAA: %Diagnosis%, %Prescription%, %Treatment%
-- - PCI-DSS: %CVV%, %CardHolder%, %Expiry%
-- - Industry: %Contract%, %Agreement%, %NDA%
--
-- ============================================================================
-- TROUBLESHOOTING:
-- ============================================================================
--
-- Issue: Row counts showing as 0
-- Solution: Run UPDATE STATISTICS or REBUILD indexes
--
-- Issue: STRING_AGG function not found
-- Solution: Upgrade to SQL Server 2017+ or use FOR XML PATH alternative
--
-- Issue: Performance slow on large databases
-- Solution: Add WHERE clause to filter specific schemas
-- Create indexed view on sys.columns for better performance
--
-- Issue: Risk scores seem too high/low
-- Solution: Adjust weights in risk score formula (currently 10/5/2)
-- Review and update PII pattern matches
--
-- Issue: Missing expected schemas in output
-- Solution: Check schema name against exclusion list (line 41)
-- Verify schema contains actual tables (not just empty schema)
--
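-- For the STRING_AGG issue noted above, the semicolon-separated [Tables] column
-- can be rebuilt on pre-2017 servers with the classic FOR XML PATH / STUFF idiom.
-- A sketch of just that column, against the same catalog views:

```sql
-- Pre-SQL Server 2017 alternative to STRING_AGG for the [Tables] column:
-- build the semicolon-separated table list per schema with FOR XML PATH.
SELECT
    s.name AS [Schema Name],
    STUFF((SELECT '; ' + t.name
           FROM sys.tables t
           WHERE t.schema_id = s.schema_id
           ORDER BY t.name
           FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),
          1, 2, '') AS [Tables]   -- STUFF strips the leading '; '
FROM sys.schemas s
WHERE s.name NOT IN ('sys', 'INFORMATION_SCHEMA', 'guest');
```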
-- ============================================================================
-- END OF SCRIPT
-- ============================================================================

Column Analysis
Data classification is fundamental to compliance with GDPR, HIPAA, and other regulations that require special handling of sensitive data categories. This automated classification helps identify data elements that require enhanced protection and specific dictionary treatment.
Compliance Impact: This analysis identifies 45+ potentially sensitive columns across the database, including:
HIGH Sensitivity: Email addresses, phone numbers, credit card information
MEDIUM Sensitivity: Names, addresses, demographic data
LOW Sensitivity: Product codes, business identifiers
Why This Matters: This automated approach ensures comprehensive coverage and sharply reduces the manual review time that column-by-column inspection would require.

Expected Results
45+ sensitive columns identified across AdventureWorks2022, with automated risk assessment:
| Schema | Sensitivity | Columns | Example Columns | Risk Score Range | Priority |
|---|---|---|---|---|---|
| Person | PII_HIGH | 8 | EmailAddress, BirthDate, AddressLine1 | 150-250 | URGENT/HIGH |
| Person | PII_MEDIUM | 15 | FirstName, LastName, PhoneNumber, City | 80-150 | HIGH/MEDIUM |
| Person | PII_LOW | 3 | Title, Suffix | 30-50 | LOW |
| HumanResources | PII_MEDIUM | 5 | JobTitle, HireDate | 40-80 | MEDIUM |
| Sales | PII_MEDIUM | 8 | CustomerID (string), SalesPersonID | 60-120 | MEDIUM |
| Sales | PII_HIGH | 2 | CreditCardApprovalCode | 100-150 | HIGH |
Understanding the Output
schema_name, table_name, column_name: Identifies the exact location of sensitive data
data_type: SQL data type (helps validate PII detection accuracy)
max_length: Maximum field length (context for string data)
is_nullable: Whether column allows NULL values (data quality indicator)
sensitivity_level: Risk classification
PII_CRITICAL: Passwords, SSN, credit cards, medical records - requires immediate protection
PII_HIGH: Email, birth dates, full addresses, biometric data - direct identifiers
PII_MEDIUM: Names, phone numbers, city, salary - indirect identifiers when combined
PII_LOW: Titles, country, language - minimal risk, still personal data
data_category: Functional grouping for dictionary organization
contact_info - Email, phone, fax
personal_name - First, last, middle names
location_data - Addresses, GPS coordinates
sensitive_personal - Health, biometric, ethnicity (GDPR Article 9)
financial_data - Credit cards, salary, tax information
authentication - Passwords, login credentials
government_id - SSN, passport, driver's license
personal_title - Mr., Dr., Jr.
identifier - Employee ID, Customer ID
demographic - Gender, marital status, language
gdpr_article: Maps to GDPR compliance requirements
Art_9_Special_Categories - Requires explicit consent and heightened protection
Art_32_Security - Mandates technical security measures
Art_4_1_Personal_Data - General personal data definition
recommended_retention: Suggested data lifecycle
Active_Account_Only - Delete immediately when account closes
7_years - Standard business/financial records retention
10_years - Medical/health records retention
Indefinite - Core identity data with legitimate ongoing need
5_years - Default retention for other personal data
table_row_count: Number of records in the table (impact assessment)
risk_impact_score: Calculated metric combining sensitivity × data volume
Formula: Sensitivity weight (10-100) × LOG10(row_count + 1)
Higher scores indicate both sensitive data AND high volume
Used to prioritize remediation efforts
dictionary_priority: Action urgency for dictionary creation
IMMEDIATE: PII_CRITICAL with >10K rows - start today
URGENT: PII_CRITICAL or PII_HIGH with >50K rows - start this week
HIGH: PII_HIGH or PII_MEDIUM with >100K rows - start this month
MEDIUM: PII_MEDIUM standard volume - quarterly cycle
LOW: PII_LOW - routine monitoring
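As a concrete check of the formula above, the score can be computed directly for a few hypothetical cases (weights as defined above; the row counts are illustrative):

```sql
-- Worked example of risk_impact_score = sensitivity weight x LOG10(row_count + 1)
-- Weights: PII_CRITICAL = 100, PII_HIGH = 50, PII_MEDIUM = 25, PII_LOW = 10
SELECT
    CAST(50  * LOG10(19000 + 1) AS DECIMAL(10,2)) AS high_19k_rows,     -- PII_HIGH, ~19K rows
    CAST(25  * LOG10(19000 + 1) AS DECIMAL(10,2)) AS medium_19k_rows,   -- PII_MEDIUM, same table
    CAST(100 * LOG10(500 + 1)   AS DECIMAL(10,2)) AS critical_500_rows; -- small but critical table
```

Note how a PII_CRITICAL column in a small table can still outscore a PII_MEDIUM column in a large one; that is why dictionary_priority combines the sensitivity level with explicit row-count thresholds rather than relying on the score alone.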
Key Insights
Person.EmailAddress emerges as URGENT priority with PII_HIGH classification and ~19,000 records, yielding a risk impact score of approximately 206. This should be the first dictionary created.
Name columns (FirstName, LastName, MiddleName) across Person schema total 15+ columns classified as PII_MEDIUM with similar high volume, making them HIGH priority for consolidated dictionary treatment.
HumanResources schema contains employee salary and hire date information classified as PII_MEDIUM, requiring separate dictionary handling due to different legal retention requirements (7 years vs indefinite).
No PII_CRITICAL columns detected in AdventureWorks2022, which is expected for a sample database. Production databases typically contain passwords (hashed), credit card numbers, or SSN requiring immediate protection.
Contact information dominates with 23+ columns across email, phone, and address categories - these can be grouped into 3 related dictionaries: Contact Info, Location Data, and Personal Names.
Sales schema shows CustomerID as string-based (varchar), flagged as PII_MEDIUM since string IDs could be email addresses or usernames. Numeric customer IDs would not be flagged.
Title column only flagged in Person/HumanResources schemas due to context-aware detection - the same column name in Production schema (product titles) correctly excluded.
Quality Validation Checklist:
This detailed analysis provides the foundation for creating targeted, effective data dictionaries in Stage 2.
Deliverable
Export Results to CSV:
Right-click query results → Export → CSV format
Save as sensitive_columns_detail_[DATE].csv
This becomes your master PII inventory document
Create Prioritized Action Plan:
Week 1: IMMEDIATE/URGENT priorities
- Person.EmailAddress dictionary
- Any PII_CRITICAL findings (if discovered)
Week 2-3: HIGH priorities
- Personal names dictionary (FirstName, LastName, MiddleName)
- Address/location dictionary (AddressLine, City, PostalCode)
Week 4-6: MEDIUM priorities
- Phone/contact dictionary
- Demographic data dictionary
- Employee data dictionary
Ongoing: LOW priorities
- Title/suffix dictionary
- Quarterly review and updates
Document False Positives/Negatives:
Review results for incorrectly classified columns
Document true PII columns missed by pattern matching
Note any context-specific exceptions
Feed findings back into pattern refinement
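One way to surface candidate false negatives is to list string columns in person-centric schemas that none of the patterns flagged; a sketch for manual review (the schema list and the small pattern exclusion set here are assumptions you will likely want to widen to the full pattern list):

```sql
-- Candidate false negatives: string columns in person-centric schemas
-- that matched none of the common PII name patterns - review manually.
SELECT
    SCHEMA_NAME(t.schema_id) AS schema_name,
    t.name AS table_name,
    c.name AS column_name,
    TYPE_NAME(c.user_type_id) AS data_type
FROM sys.tables t
JOIN sys.columns c ON t.object_id = c.object_id
WHERE SCHEMA_NAME(t.schema_id) IN ('Person', 'HumanResources', 'Sales')
  AND TYPE_NAME(c.user_type_id) IN ('varchar', 'nvarchar', 'char', 'nchar')
  AND c.name NOT LIKE '%Name%'  AND c.name NOT LIKE '%Email%'
  AND c.name NOT LIKE '%Phone%' AND c.name NOT LIKE '%Address%'
  AND c.name NOT LIKE '%Title%' AND c.name NOT LIKE '%City%'
ORDER BY schema_name, table_name, column_name;
```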
Create Data Category Summary:
Summary by Category:
- contact_info: 8 columns → 1 dictionary
- personal_name: 15 columns → 1 dictionary
- location_data: 12 columns → 1 dictionary
- sensitive_personal: 2 columns → 1 dictionary
- demographic: 5 columns → 1 dictionary
- personal_title: 3 columns → 1 dictionary
Total: 6 dictionaries required for comprehensive coverage
Prepare for Stage 1.3 (Data Lineage Analysis):
Use the table_name column to identify high-impact tables
Tables with 5+ PII columns require detailed lineage mapping
Focus on Person.Person, Person.Address, Person.EmailAddress
Generate Executive Summary:
GDPR PII Discovery Summary - AdventureWorks2022

Total PII Columns Identified: 45+
- CRITICAL Priority: 0 (none detected)
- HIGH Priority: 10 columns affecting ~19K records
- MEDIUM Priority: 28 columns affecting ~8K records
- LOW Priority: 7 columns

Schemas Requiring Immediate Attention:
1. Person (26 PII columns) - URGENT
2. HumanResources (5 PII columns) - HIGH
3. Sales (8 PII columns) - MEDIUM

Recommended Next Steps:
- Create 6 data dictionaries (prioritized list attached)
- Implement tagging automation in Pentaho Data Catalog
- Schedule quarterly PII discovery reviews
- Estimated effort: x weeks for initial dictionary creation
In DBeaver run the following script:
-- SENSITIVE COLUMN IDENTIFICATION FOR GDPR COMPLIANCE
-- ============================================================================
-- Purpose: Automatically detect PII columns using pattern matching and context
-- Output: Comprehensive list of sensitive columns with risk classification
-- Method: Multi-pattern matching with data type validation and volume analysis
--
-- WHAT THIS SCRIPT DOES:
-- 1. Scans all non-system tables for columns matching PII patterns
-- 2. Classifies columns into 4 sensitivity levels (CRITICAL/HIGH/MEDIUM/LOW)
-- 3. Groups columns into 10 functional categories for dictionary organization
-- 4. Maps columns to GDPR articles for compliance documentation
-- 5. Calculates risk impact scores based on sensitivity × data volume
-- 6. Provides actionable priority rankings for dictionary creation
--
-- DETECTION METHODOLOGY:
-- - Pattern matching on column names (50+ patterns)
-- - Data type validation (string-based IDs flagged, numeric IDs excluded)
-- - Schema context awareness (Title in Person vs Production)
-- - Row count analysis for impact assessment
-- ============================================================================
;WITH ColumnInventory AS (
SELECT
SCHEMA_NAME(t.schema_id) AS schema_name,
t.name AS table_name,
c.name AS column_name,
TYPE_NAME(c.user_type_id) AS data_type,
c.max_length,
c.is_nullable,
c.column_id,
(SELECT SUM(p.rows) FROM sys.partitions p WHERE p.object_id = t.object_id AND p.index_id IN (0, 1)) AS table_row_count,
CASE
WHEN c.name LIKE '%Password%' OR c.name LIKE '%PWD%' OR c.name LIKE '%Pass%' THEN 'PII_CRITICAL'
WHEN c.name LIKE '%SSN%' OR c.name LIKE '%Social%Security%' THEN 'PII_CRITICAL'
WHEN c.name LIKE '%National%ID%' OR c.name LIKE '%NationalID%' THEN 'PII_CRITICAL'
WHEN c.name LIKE '%Credit%Card%' OR c.name LIKE '%CreditCard%' OR c.name LIKE '%CCNumber%' THEN 'PII_CRITICAL'
WHEN c.name LIKE '%CVV%' OR c.name LIKE '%CVC%' OR c.name LIKE '%SecurityCode%' THEN 'PII_CRITICAL'
WHEN c.name LIKE '%Account%Number%' AND TYPE_NAME(c.user_type_id) IN ('varchar', 'nvarchar', 'char', 'nchar') THEN 'PII_CRITICAL'
WHEN c.name LIKE '%Passport%' OR c.name LIKE '%License%Number%' OR c.name LIKE '%DL%Number%' THEN 'PII_CRITICAL'
WHEN c.name LIKE '%Medical%Record%' OR c.name LIKE '%Health%ID%' THEN 'PII_CRITICAL'
WHEN c.name LIKE '%Email%' OR c.name LIKE '%Mail%' OR c.name LIKE '%E_Mail%' THEN 'PII_HIGH'
WHEN c.name LIKE '%Address%Line%' OR c.name LIKE '%Street%Address%' OR c.name LIKE '%Physical%Address%' THEN 'PII_HIGH'
WHEN c.name LIKE '%BirthDate%' OR c.name LIKE '%DOB%' OR c.name LIKE '%DateOfBirth%' THEN 'PII_HIGH'
WHEN c.name LIKE '%IP%Address%' OR c.name LIKE '%IPAddress%' THEN 'PII_HIGH'
WHEN c.name LIKE '%Biometric%' OR c.name LIKE '%Fingerprint%' OR c.name LIKE '%FaceID%' THEN 'PII_HIGH'
WHEN c.name LIKE '%GPS%' OR c.name LIKE '%Latitude%' OR c.name LIKE '%Longitude%' OR c.name LIKE '%GeoLocation%' THEN 'PII_HIGH'
WHEN c.name LIKE '%Tax%ID%' OR c.name LIKE '%TaxID%' OR c.name LIKE '%EIN%' THEN 'PII_HIGH'
WHEN c.name LIKE '%FirstName%' OR c.name LIKE '%First%Name%' OR c.name LIKE '%Given%Name%' OR c.name LIKE '%Forename%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%LastName%' OR c.name LIKE '%Last%Name%' OR c.name LIKE '%Surname%' OR c.name LIKE '%Family%Name%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%MiddleName%' OR c.name LIKE '%Middle%Name%' OR c.name LIKE '%Middle%Initial%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%Phone%' OR c.name LIKE '%Mobile%' OR c.name LIKE '%Telephone%' OR c.name LIKE '%Cell%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%Fax%' OR c.name LIKE '%Fax%Number%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%Address%' AND c.name NOT LIKE '%Email%' AND c.name NOT LIKE '%IP%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%City%' OR c.name LIKE '%Town%' OR c.name LIKE '%Municipality%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%PostalCode%' OR c.name LIKE '%Postal%Code%' OR c.name LIKE '%Zip%' OR c.name LIKE '%ZIP%Code%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%State%' OR c.name LIKE '%Province%' OR c.name LIKE '%Region%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%Gender%' OR c.name LIKE '%Sex%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%Marital%Status%' OR c.name LIKE '%MaritalStatus%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%Salary%' OR c.name LIKE '%Wage%' OR c.name LIKE '%Income%' OR c.name LIKE '%Compensation%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%Employee%ID%' OR c.name LIKE '%EmployeeID%' OR c.name LIKE '%Staff%ID%' THEN 'PII_MEDIUM'
WHEN (c.name LIKE '%Customer%ID%' OR c.name LIKE '%CustomerID%') AND TYPE_NAME(c.user_type_id) IN ('varchar', 'nvarchar', 'char', 'nchar') THEN 'PII_MEDIUM'
WHEN c.name LIKE '%User%Name%' OR c.name LIKE '%Username%' OR c.name LIKE '%Login%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%Age%' OR c.name LIKE '%Birth%Year%' THEN 'PII_MEDIUM'
WHEN c.name LIKE '%Title%' AND SCHEMA_NAME(t.schema_id) IN ('Person', 'HumanResources', 'Customer') THEN 'PII_LOW'
WHEN c.name LIKE '%Suffix%' OR c.name LIKE '%Name%Suffix%' THEN 'PII_LOW'
WHEN c.name LIKE '%Nickname%' OR c.name LIKE '%Preferred%Name%' THEN 'PII_LOW'
WHEN c.name LIKE '%Country%' OR c.name LIKE '%Nation%' THEN 'PII_LOW'
WHEN c.name LIKE '%Language%' OR c.name LIKE '%Locale%' THEN 'PII_LOW'
WHEN c.name LIKE '%Ethnicity%' OR c.name LIKE '%Race%' OR c.name LIKE '%Nationality%' THEN 'PII_LOW'
WHEN c.name LIKE '%Religion%' OR c.name LIKE '%Political%' THEN 'PII_LOW'
WHEN c.name LIKE '%Membership%' OR c.name LIKE '%Union%' THEN 'PII_LOW'
ELSE NULL
END AS sensitivity_level,
CASE
WHEN c.name LIKE '%Email%' OR c.name LIKE '%Mail%' OR c.name LIKE '%Phone%' OR c.name LIKE '%Mobile%' OR c.name LIKE '%Fax%' OR c.name LIKE '%Telephone%' THEN 'contact_info'
WHEN c.name LIKE '%FirstName%' OR c.name LIKE '%LastName%' OR c.name LIKE '%MiddleName%' OR c.name LIKE '%Surname%' OR c.name LIKE '%Given%Name%' OR c.name LIKE '%Forename%' OR c.name LIKE '%Family%Name%' OR c.name LIKE '%Nickname%' THEN 'personal_name'
WHEN c.name LIKE '%Address%' OR c.name LIKE '%Street%' OR c.name LIKE '%City%' OR c.name LIKE '%PostalCode%' OR c.name LIKE '%Zip%' OR c.name LIKE '%State%' OR c.name LIKE '%Province%' OR c.name LIKE '%Country%' OR c.name LIKE '%GPS%' OR c.name LIKE '%Latitude%' OR c.name LIKE '%Longitude%' THEN 'location_data'
WHEN c.name LIKE '%BirthDate%' OR c.name LIKE '%DOB%' OR c.name LIKE '%Age%' OR c.name LIKE '%Biometric%' OR c.name LIKE '%Health%' OR c.name LIKE '%Medical%' OR c.name LIKE '%Ethnicity%' OR c.name LIKE '%Race%' OR c.name LIKE '%Religion%' OR c.name LIKE '%Political%' OR c.name LIKE '%Union%' THEN 'sensitive_personal'
WHEN c.name LIKE '%Credit%' OR c.name LIKE '%Account%Number%' OR c.name LIKE '%CVV%' OR c.name LIKE '%Salary%' OR c.name LIKE '%Wage%' OR c.name LIKE '%Income%' OR c.name LIKE '%Tax%ID%' THEN 'financial_data'
WHEN c.name LIKE '%Password%' OR c.name LIKE '%PWD%' OR c.name LIKE '%User%Name%' OR c.name LIKE '%Login%' THEN 'authentication'
WHEN c.name LIKE '%SSN%' OR c.name LIKE '%National%ID%' OR c.name LIKE '%Passport%' OR c.name LIKE '%License%' OR c.name LIKE '%DL%Number%' THEN 'government_id'
WHEN c.name LIKE '%Title%' OR c.name LIKE '%Suffix%' THEN 'personal_title'
WHEN c.name LIKE '%Employee%ID%' OR c.name LIKE '%Customer%ID%' OR c.name LIKE '%Staff%ID%' THEN 'identifier'
WHEN c.name LIKE '%Gender%' OR c.name LIKE '%Marital%' OR c.name LIKE '%Language%' THEN 'demographic'
ELSE NULL
END AS data_category,
CASE
WHEN c.name LIKE '%Biometric%' OR c.name LIKE '%Health%' OR c.name LIKE '%Medical%' OR c.name LIKE '%Ethnicity%' OR c.name LIKE '%Race%' OR c.name LIKE '%Religion%' OR c.name LIKE '%Political%' OR c.name LIKE '%Union%' THEN 'Art_9_Special_Categories'
WHEN c.name LIKE '%Password%' OR c.name LIKE '%Credit%' OR c.name LIKE '%SSN%' THEN 'Art_32_Security'
ELSE 'Art_4_1_Personal_Data'
END AS gdpr_article,
CASE
WHEN c.name LIKE '%Password%' THEN 'Active_Account_Only'
WHEN c.name LIKE '%Credit%' OR c.name LIKE '%Payment%' THEN '7_years'
WHEN c.name LIKE '%Tax%' OR c.name LIKE '%Salary%' THEN '7_years'
WHEN c.name LIKE '%Medical%' OR c.name LIKE '%Health%' THEN '10_years'
WHEN c.name LIKE '%Email%' OR c.name LIKE '%Phone%' THEN '7_years'
WHEN c.name LIKE '%Address%' THEN '7_years'
WHEN c.name LIKE '%BirthDate%' OR c.name LIKE '%FirstName%' OR c.name LIKE '%LastName%' THEN 'Indefinite'
ELSE '5_years'
END AS recommended_retention
FROM sys.tables t
INNER JOIN sys.columns c ON t.object_id = c.object_id
WHERE SCHEMA_NAME(t.schema_id) NOT IN ('sys', 'INFORMATION_SCHEMA', 'guest')
)
SELECT
schema_name,
table_name,
column_name,
data_type,
max_length,
CASE WHEN is_nullable = 1 THEN 'YES' ELSE 'NO' END AS is_nullable,
sensitivity_level,
data_category,
gdpr_article,
recommended_retention,
table_row_count,
CAST(CASE sensitivity_level WHEN 'PII_CRITICAL' THEN 100 WHEN 'PII_HIGH' THEN 50 WHEN 'PII_MEDIUM' THEN 25 WHEN 'PII_LOW' THEN 10 ELSE 0 END * LOG10(ISNULL(table_row_count, 0) + 1) AS DECIMAL(10,2)) AS risk_impact_score,
CASE
WHEN sensitivity_level = 'PII_CRITICAL' AND table_row_count > 10000 THEN 'IMMEDIATE'
WHEN sensitivity_level = 'PII_CRITICAL' THEN 'URGENT'
WHEN sensitivity_level = 'PII_HIGH' AND table_row_count > 50000 THEN 'URGENT'
WHEN sensitivity_level = 'PII_HIGH' THEN 'HIGH'
WHEN sensitivity_level = 'PII_MEDIUM' AND table_row_count > 100000 THEN 'HIGH'
WHEN sensitivity_level = 'PII_MEDIUM' THEN 'MEDIUM'
ELSE 'LOW'
END AS dictionary_priority
FROM ColumnInventory
WHERE sensitivity_level IS NOT NULL
ORDER BY
CASE sensitivity_level WHEN 'PII_CRITICAL' THEN 1 WHEN 'PII_HIGH' THEN 2 WHEN 'PII_MEDIUM' THEN 3 WHEN 'PII_LOW' THEN 4 END,
risk_impact_score DESC,
schema_name,
table_name,
column_name;
In DBeaver run the following script:
-- Analyze relationships and dependencies for lineage mapping
WITH TableRelationships AS (
SELECT
OBJECT_SCHEMA_NAME(fk.parent_object_id) AS child_schema,
OBJECT_NAME(fk.parent_object_id) AS child_table,
COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS child_column,
OBJECT_SCHEMA_NAME(fk.referenced_object_id) AS parent_schema,
OBJECT_NAME(fk.referenced_object_id) AS parent_table,
COL_NAME(fkc.referenced_object_id, fkc.referenced_column_id) AS parent_column,
fk.name AS constraint_name
FROM sys.foreign_keys fk
JOIN sys.foreign_key_columns fkc ON fk.object_id = fkc.constraint_object_id
)
SELECT
parent_schema + '.' + parent_table AS source_table,
COUNT(DISTINCT child_schema + '.' + child_table) AS dependent_tables,
STRING_AGG(child_schema + '.' + child_table, '; ') AS dependencies
FROM TableRelationships
GROUP BY parent_schema, parent_table
HAVING COUNT(DISTINCT child_schema + '.' + child_table) > 2
ORDER BY dependent_tables DESC;
In DBeaver run the following script:
-- Extract personal identifiers with business context and usage frequency
WITH PersonalDataElements AS (
SELECT DISTINCT
p.Title,
p.FirstName,
p.MiddleName,
p.LastName,
p.Suffix,
pa.AddressLine1,
pa.AddressLine2,
pa.City,
sp.StateProvinceCode,
sp.Name as StateProvinceName,
cr.CountryRegionCode,
cr.Name as CountryRegionName,
ea.EmailAddress,
pp.PhoneNumber,
pnt.Name as PhoneNumberType
FROM Person.Person p
LEFT JOIN Person.BusinessEntityAddress bea ON p.BusinessEntityID = bea.BusinessEntityID
LEFT JOIN Person.Address pa ON bea.AddressID = pa.AddressID
LEFT JOIN Person.StateProvince sp ON pa.StateProvinceID = sp.StateProvinceID
LEFT JOIN Person.CountryRegion cr ON sp.CountryRegionCode = cr.CountryRegionCode
LEFT JOIN Person.EmailAddress ea ON p.BusinessEntityID = ea.BusinessEntityID
LEFT JOIN Person.PersonPhone pp ON p.BusinessEntityID = pp.BusinessEntityID
LEFT JOIN Person.PhoneNumberType pnt ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID
WHERE p.PersonType = 'IN' -- Individual customers only for GDPR scope
),
PersonalDataUsage AS (
-- Analyze how personal data is used across business processes
SELECT
'Email Usage' AS usage_context,
COUNT(DISTINCT ea.EmailAddress) AS unique_values,
COUNT(*) AS total_records,
'Customer communication, account verification' AS business_purpose
FROM Person.EmailAddress ea
UNION ALL
SELECT
'Phone Usage' AS usage_context,
COUNT(DISTINCT pp.PhoneNumber) AS unique_values,
COUNT(*) AS total_records,
'Customer support, delivery coordination' AS business_purpose
FROM Person.PersonPhone pp
UNION ALL
SELECT
'Address Usage' AS usage_context,
COUNT(DISTINCT CONCAT(pa.AddressLine1, pa.City)) AS unique_values,
COUNT(*) AS total_records,
'Shipping, billing, location services' AS business_purpose
FROM Person.Address pa
)
SELECT * FROM PersonalDataUsage;
-- Generate personal titles dictionary
SELECT DISTINCT
Title AS term,
'personal_title' AS category,
'GDPR_PII' AS compliance_tag,
'Personal title or honorific' AS description,
COUNT(*) OVER (PARTITION BY Title) AS frequency
FROM Person.Person
WHERE Title IS NOT NULL
ORDER BY frequency DESC;
term,category,compliance_classification,gdpr_article,data_subject_rights,retention_period,description,business_context
Mr.,personal_title,PII_LOW,Art_4_1,access_rectify_delete,indefinite,Personal title or honorific,Customer identification and communication
Ms.,personal_title,PII_LOW,Art_4_1,access_rectify_delete,indefinite,Personal title or honorific,Customer identification and communication
Mrs.,personal_title,PII_LOW,Art_4_1,access_rectify_delete,indefinite,Personal title or honorific,Customer identification and communication
Dr.,personal_title,PII_LOW,Art_4_1,access_rectify_delete,indefinite,Professional title,Customer identification and communication
Sr.,personal_title,PII_LOW,Art_4_1,access_rectify_delete,indefinite,Senior designation title,Customer identification and communication
Sra.,personal_title,PII_LOW,Art_4_1,access_rectify_delete,indefinite,Spanish female title,Customer identification and communication
email_address,contact_info,PII_HIGH,Art_4_1,access_rectify_delete_portability,7_years,Electronic contact information,Customer communication and account management
phone_number,contact_info,PII_MEDIUM,Art_4_1,access_rectify_delete,7_years,Telephone contact information,Customer support and delivery coordination
home_address,location_data,PII_HIGH,Art_4_1,access_rectify_delete,7_years,Residential address information,Shipping and billing services
work_address,location_data,PII_MEDIUM,Art_4_1,access_rectify_delete,7_years,Business address information,B2B communication and delivery
first_name,personal_name,PII_MEDIUM,Art_4_1,access_rectify_delete,indefinite,Given name of individual,Customer identification and personalization
last_name,personal_name,PII_MEDIUM,Art_4_1,access_rectify_delete,indefinite,Family name of individual,Customer identification and legal documentation
middle_name,personal_name,PII_LOW,Art_4_1,access_rectify_delete,indefinite,Middle name or initial,Complete customer identification
suffix,personal_name,PII_LOW,Art_4_1,access_rectify_delete,indefinite,Name suffix (Jr. Sr. III),Customer identification and family relationships
birth_date,sensitive_personal,PII_HIGH,Art_9,explicit_consent_required,special_category,Date of birth,Age verification and personalized services
{
"name": "Personal Data Identifiers - GDPR Compliant",
"description": "Dictionary for identifying personal data elements subject to GDPR protection and data subject rights",
"category": "Data Privacy & Protection",
"version": "1.0",
"lastUpdated": "2025-08-06",
"regulatoryFramework": {
"primary": "GDPR",
"articles": ["Art_4_1", "Art_9", "Art_17", "Art_20"],
"applicability": "EU residents and EEA",
"dataSubjectRights": ["access", "rectification", "erasure", "portability", "restriction"]
},
"caseSensitive": false,
"matchType": "exact",
"automatedActions": {
"highSensitivity": "encrypt_and_log",
"mediumSensitivity": "log_access",
"specialCategory": "explicit_consent_required"
},
"auditRequirements": {
"logAccess": true,
"logModification": true,
"retentionTracking": true,
"consentTracking": true
},
"columns": [
{
"name": "term",
"description": "Personal data identifier or value",
"required": true
},
{
"name": "category",
"description": "Type of personal data (personal_title, contact_info, location_data, personal_name, sensitive_personal)",
"required": true
},
{
"name": "compliance_classification",
"description": "GDPR sensitivity level (PII_LOW, PII_MEDIUM, PII_HIGH)",
"required": true
},
{
"name": "gdpr_article",
"description": "Relevant GDPR article reference",
"required": true
},
{
"name": "data_subject_rights",
"description": "Applicable data subject rights for this data type",
"required": true
},
{
"name": "retention_period",
"description": "Legal retention requirement",
"required": true
},
{
"name": "description",
"description": "Business definition of the data element",
"required": true
},
{
"name": "business_context",
"description": "Business purpose and usage context for DPIA requirements",
"required": true
}
],
"complianceValidation": {
"requiredFields": ["term", "category", "compliance_classification"],
"sensitivityLevels": ["PII_LOW", "PII_MEDIUM", "PII_HIGH"],
"retentionValidation": true
}
}
Personal Data Identifiers Dictionary
This method demonstrates creating dictionaries entirely through the Pentaho Data Catalog interface, by uploading CSV files and configuring JSON definitions in the UI.
The name and category are mandatory fields that help organize dictionaries for data governance. Categories assist in data discovery but are unrelated to Business Glossary categories.
Navigate to: Data Operations > Dictionaries
Click: Add Dictionary
Configure Basic Information:
Name:
Personal Data Identifiers - GDPR Compliant
Description
Dictionary for identifying personal data elements subject to GDPR protection
Category
Select Sensitive or create new category GDPR_Compliance
Dictionary Status
Ensure it's enabled (toggle switch)
Select Upload Dictionary method
Upload CSV File: Upload
personal_data_identifiers.csv
term
Mr.
Ms.
Mrs.
Dr.
Sr.
Sra.
first_name
last_name
middle_name
email_address
phone_number
birth_date
home_address
work_address
Set Confidence Score: 0.7 (70% confidence for similarity matching)

Set the following Column Name Regex - Metadata Hints.
Overall Column Name Confidence Score: 0.4

[Ff]irst[Nn]ame|[Ff]Name|given_name → 0.8
[Ll]ast[Nn]ame|[Ss]urname|family_name → 0.8
[Ee]mail|[Ee]mail[Aa]ddress|contact_email → 0.9
[Pp]hone|[Pp]hone[Nn]umber|[Tt]elephone → 0.8
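Before committing these regexes in the catalog UI, a quick LIKE-based approximation can preview which AdventureWorks2022 columns they would plausibly match (T-SQL has no native regex, so the case-class alternations collapse to case-insensitive LIKE patterns under default collations):

```sql
-- Approximate the catalog regexes with LIKE to preview matching columns.
SELECT
    SCHEMA_NAME(t.schema_id) AS schema_name,
    t.name AS table_name,
    c.name AS column_name,
    CASE
        WHEN c.name LIKE '%FirstName%' OR c.name LIKE '%FName%' OR c.name LIKE '%given_name%' THEN 'first name hint (0.8)'
        WHEN c.name LIKE '%LastName%'  OR c.name LIKE '%Surname%' OR c.name LIKE '%family_name%' THEN 'last name hint (0.8)'
        WHEN c.name LIKE '%Email%' OR c.name LIKE '%contact_email%' THEN 'email hint (0.9)'
        WHEN c.name LIKE '%Phone%' OR c.name LIKE '%Telephone%' THEN 'phone hint (0.8)'
    END AS matched_hint
FROM sys.tables t
JOIN sys.columns c ON t.object_id = c.object_id
WHERE c.name LIKE '%FirstName%' OR c.name LIKE '%FName%' OR c.name LIKE '%given_name%'
   OR c.name LIKE '%LastName%'  OR c.name LIKE '%Surname%' OR c.name LIKE '%family_name%'
   OR c.name LIKE '%Email%' OR c.name LIKE '%Phone%' OR c.name LIKE '%Telephone%'
ORDER BY schema_name, table_name, column_name;
```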

Step 4: Configure Conditions Create conditions that determine when to apply the dictionary:
Primary Condition:
Attribute: Confidence Score
Operator: Greater than or equal to
Value:
0.6
Secondary Condition (OR):
Attribute: Metadata Score
Operator: Greater than or equal to
Value:
0.7
Additional Condition (AND):
Attribute: Column Cardinality
Operator: Greater than
Value:
5 (ensures the column has enough data variety)
Why these conditions: This creates a flexible matching system where data can be classified either by content similarity or column name patterns, ensuring comprehensive coverage.
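The three conditions combine into a single boolean rule; a sketch in T-SQL terms, where the declared variables are illustrative stand-ins for the catalog's internal per-column metrics, not real catalog fields:

```sql
-- Illustrative only: how the three conditions combine for a candidate column.
DECLARE @confidence_score   DECIMAL(3,2) = 0.55,
        @metadata_score     DECIMAL(3,2) = 0.85,
        @column_cardinality INT          = 120;

SELECT CASE
           WHEN (@confidence_score >= 0.6 OR @metadata_score >= 0.7)
                AND @column_cardinality > 5
           THEN 'Apply dictionary'
           ELSE 'Skip column'
       END AS decision;  -- with these values the metadata score carries the match
```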
Step 5: Configure Actions Define what happens when conditions are met:
Add Action - Assign Tags:
Tag 1: PII
Tag 2: GDPR_Personal_Data
Tag 3: Sensitive
Add Action - Assign Table Tags:
Table Tag:
Contains_Personal_Data
Add Action - Assign Business Terms:
Browse and select relevant business terms from your glossary
Examples:
Personal Identifier, Customer Data, Privacy Protected
Step 6: Create Dictionary Click Create Dictionary to finalize the setup.
Create the following Dictionary folders:
Navigate to Dictionaries:
In the left navigation menu, click Data Operations
On the Data Identification Methods card, click Dictionaries
Import Process:
Click Import button on the Dictionaries page
Upload each ZIP file containing the dictionary CSV and JSON
Click Continue to start the upload process
Monitor progress using View Workers to track the import
Validation:
Verify each dictionary appears in the Dictionaries list
Check the Rules tab to review the JSON configuration logic
Confirm categories and tags are properly assigned