Database Inventory
Overview
In this foundational hands-on workshop, you'll learn how to conduct a comprehensive database inventory that serves as the cornerstone of any effective data governance program. We'll guide you through the systematic process of cataloging and documenting all data assets across your organization, emphasizing that you cannot govern what you don't know exists.
By the end of this workshop, you will be able to:
Conduct systematic database discovery and cataloging
Document data assets for comprehensive organizational visibility
Assess data-related risks and identify compliance gaps
Establish clear data ownership and stewardship assignments
Map governance policies to specific datasets and systems
Create baseline documentation that supports regulatory compliance requirements
Workshop Attendees:
Database Administrator (Technical Lead)
Data Governance Officer
Business Process Owners for Each Schema
Information Security Representative
Compliance/Legal Representative (if regulated industry)
This workshop will provide you with the essential foundation needed for effective data asset management, risk assessment, ownership assignment, policy application, and compliance mapping across your data landscape.
The workshops that follow are designed to give you an overview of the key principles involved in analyzing your database inventory.
Data discovery in large data warehouses now leverages several families of machine-learning (ML) algorithms, each suited to a different aspect of understanding and extracting insights from massive datasets. Here are the key approaches:
Clustering
K-means, DBSCAN, and hierarchical clustering help identify natural groupings in data without predefined labels. These are particularly useful for customer segmentation, anomaly detection, and finding similar data patterns across different tables or datasets.
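To make the clustering idea concrete, here is a minimal one-dimensional k-means sketch in plain Python. It is illustrative only (real discovery work would use a library such as scikit-learn); the sample values and the naive evenly-spaced initialization are assumptions for the demo:

```python
# Minimal 1-D k-means sketch (illustrative only; real workloads would use
# a library implementation such as scikit-learn's KMeans).
def kmeans_1d(points, k=2, iterations=10):
    mn, mx = min(points), max(points)
    # Naive initialization: spread k centroids evenly between min and max.
    centroids = [mn + i * (mx - mn) / (k - 1) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(round(c, 6) for c in centroids)

# Two obvious groups, around 1.0 and 10.0:
print(kmeans_1d([1.0, 1.1, 0.9, 10.0, 10.2, 9.8]))  # → [1.0, 10.0]
```

The same assign-then-update loop generalizes to many dimensions; libraries add smarter initialization (k-means++) and convergence checks.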
Supervised Learning
Random Forests, Gradient Boosting (XGBoost, LightGBM), and decision trees are employed to understand relationships between variables and predict outcomes. They're especially valuable for identifying which features are most important in determining specific business outcomes.
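As a hedged illustration of feature importance, the sketch below scores two binary features by the Gini impurity reduction of a single decision-stump split. The tiny dataset and its column meanings are invented for the example; a real pipeline would read `feature_importances_` from a trained Random Forest:

```python
# Minimal feature-importance sketch: score each binary feature by how much a
# single split on it reduces Gini impurity (illustrative only).
def gini(labels):
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)              # fraction of positive labels
    return 1.0 - p * p - (1 - p) * (1 - p)

def split_gain(rows, feature, label_index=-1):
    left = [r[label_index] for r in rows if r[feature] == 0]
    right = [r[label_index] for r in rows if r[feature] == 1]
    parent = [r[label_index] for r in rows]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
    return gini(parent) - weighted

# Invented columns: (is_weekend, noise, churned); is_weekend predicts exactly.
rows = [(1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0)]
print(split_gain(rows, 0))  # informative feature → 0.5
print(split_gain(rows, 1))  # noise feature → 0.0
```

Tree ensembles essentially accumulate this per-split gain across many trees to rank features.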
Dimensionality Reduction
PCA (Principal Component Analysis), t-SNE, and autoencoders help visualize and understand high-dimensional data by reducing it to manageable dimensions while preserving important patterns. This is crucial when dealing with hundreds or thousands of variables.
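For intuition, here is a minimal two-feature PCA sketch: it builds the 2x2 covariance matrix and solves its eigenvalues in closed form. The sample columns are assumptions; with hundreds of variables you would use a library implementation instead:

```python
import math

# Minimal PCA sketch for two features: eigenvalues of the 2x2 covariance
# matrix [[a, b], [b, c]] via the quadratic formula (illustrative only).
def explained_variance_ratio(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)                     # var(x)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)                     # var(y)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)   # cov(x,y)
    disc = math.sqrt((a - c) ** 2 + 4 * b * b)
    lam1, lam2 = (a + c + disc) / 2, (a + c - disc) / 2
    # Share of total variance captured by the first principal component.
    return lam1 / (lam1 + lam2)

# ys is exactly 2 * xs, so one component explains (to floating-point
# precision) all of the variance:
print(explained_variance_ratio([1, 2, 3, 4], [2, 4, 6, 8]))  # ≈ 1.0
```

A high first-component ratio like this signals redundant columns, which is exactly what dimensionality reduction exploits at scale.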
Association Rule Mining
Apriori and FP-Growth algorithms discover interesting relationships between variables in large databases; they are commonly used for market basket analysis and finding co-occurring patterns in transactional data.
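The core Apriori idea, counting item pairs that co-occur above a minimum support threshold, can be sketched in a few lines of plain Python. The baskets below are invented sample data:

```python
from itertools import combinations
from collections import Counter

# Minimal association-rule sketch: keep item pairs whose co-occurrence count
# meets a minimum support (the first level of Apriori, illustrative only).
def frequent_pairs(transactions, min_support=0.5):
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    threshold = min_support * len(transactions)
    return {pair for pair, n in counts.items() if n >= threshold}

baskets = [{'bread', 'milk'}, {'bread', 'butter'},
           {'bread', 'butter', 'milk'}, {'milk'}]
# At 50% support, two pairs qualify (set order may vary when printed):
print(frequent_pairs(baskets))  # {('bread', 'butter'), ('bread', 'milk')}
```

Full Apriori repeats this pruning at triple size and beyond, using the fact that a frequent itemset can only contain frequent subsets.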
Anomaly Detection
Isolation Forests, One-Class SVM, and Local Outlier Factor (LOF) identify unusual patterns that don't conform to expected behavior, which is critical for fraud detection, quality control, and data quality assessment.
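As a simple stand-in for those algorithms, the sketch below flags values by z-score, a statistical cousin of the same idea. The order-quantity values are invented; note that the mean and standard deviation are themselves outlier-sensitive, which is one reason robust methods such as Isolation Forest exist:

```python
from statistics import mean, stdev

# Minimal anomaly-detection sketch: flag values whose z-score exceeds a
# threshold (illustrative only; production systems would use Isolation
# Forest, LOF, or at least a robust median/MAD variant).
def zscore_outliers(values, threshold=2.0):
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > threshold]

# A run of order-quantity values with one suspicious spike:
print(zscore_outliers([10, 11, 9, 10, 12, 11, 10, 95]))  # → [95]
```

The same pattern applied per column is a quick first pass at data-quality screening before heavier ML methods.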
Deep Learning
Deep Neural Networks, LSTMs for time-series data, and Graph Neural Networks capture complex relationships in interconnected data. These are particularly powerful for unstructured data such as text fields and logs, or when dealing with complex temporal patterns.
Natural Language Processing
Topic modeling (LDA), Named Entity Recognition, and Transformer models process text fields in databases to extract meaningful information from unstructured text columns, comments, or documentation.
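For a minimal flavor of text mining on database columns, the sketch below extracts e-mail addresses from a free-text comment field with a regular expression. The comments are invented and the pattern is deliberately simple, not RFC-complete; real entity extraction would use an NLP library:

```python
import re

# Minimal text-mining sketch: pull e-mail addresses out of a free-text
# comment column (illustrative only; the pattern is simplified on purpose).
EMAIL = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')

comments = [
    "Customer asked us to reply to jane.doe@example.com by Friday.",
    "Escalated to support; no contact details provided.",
    "Duplicate of ticket raised by ops-team@example.org yesterday.",
]
found = [m for text in comments for m in EMAIL.findall(text)]
print(found)  # → ['jane.doe@example.com', 'ops-team@example.org']
```

Even this crude pass is useful in governance work: it reveals PII hiding in comment columns that name-based column scans would miss.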
Statistical Methods
Kolmogorov-Smirnov tests, Benford's Law analysis, and correlation analysis help understand data distributions, detect data quality issues, and identify statistical relationships between columns.
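Benford's Law predicts that the leading digit d of naturally occurring numbers appears with probability log10(1 + 1/d). A minimal sketch, using an invented exponentially growing series as stand-in data:

```python
import math
from collections import Counter

# Minimal Benford's-Law sketch: compare observed leading-digit frequencies
# against the expected log distribution (illustrative only; meaningful
# checks need large, naturally-occurring datasets).
def benford_expected(digit):
    return math.log10(1 + 1 / digit)       # P(leading digit = digit)

def leading_digit_freq(values):
    digits = [int(str(abs(v))[0]) for v in values if v >= 1]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

# Exponential growth tends to follow Benford's Law closely:
values = [round(1.17 ** i, 2) for i in range(1, 100)]
observed = leading_digit_freq(values)
print(round(benford_expected(1), 3))  # → 0.301
print(round(observed[1], 3))          # close to 0.301 for this series
```

A large gap between observed and expected frequencies in, say, an expense column is a classic red flag for fabricated or truncated data.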
Inventory Analysis
Let's jump straight into the deep end!
There are plenty of tools that will help you scan and analyze your data warehouse (DW). Once connected to the AdventureWorks2022 database, we're going to run an inventory query that delivers:
Automated PII detection and risk assessment
Usage pattern analysis for optimization
Compliance mapping to meet regulatory requirements
Operational flags for maintenance & security
Data Quality indicators & sizing metrics
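The PII detection in that query is purely name-based. Here is the same heuristic sketched in plain Python so you can see its logic at a glance; the substring patterns and risk thresholds mirror the SQL script later in this workshop, and the sample column names are assumptions:

```python
import re

# Name-based PII heuristic mirroring the inventory query's LIKE patterns:
# substring matches on column names, then a bucketed risk level.
# Heuristic only -- manual review is still required.
PII_PATTERNS = re.compile(
    r'ssn|social|email|phone|address|birth|salary|wage|credit|account'
    r'|passport|license', re.IGNORECASE)

def pii_risk(column_names):
    hits = sum(1 for name in column_names if PII_PATTERNS.search(name))
    if hits == 0:
        return hits, 'NO_PII_DETECTED'
    if hits <= 2:
        return hits, 'LOW_PII_RISK'
    if hits <= 5:
        return hits, 'MEDIUM_PII_RISK'
    return hits, 'HIGH_PII_RISK'

print(pii_risk(['EmailAddress', 'PhoneNumber', 'FirstName', 'AddressLine1']))
# → (3, 'MEDIUM_PII_RISK')
```

Notice that FirstName is not flagged: name-pattern lists always have gaps, which is why the script's own comments insist on manual review.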
DBeaver
DBeaver Community is a free cross-platform database tool for developers, database administrators, analysts, and everyone who works with data. It supports all popular SQL databases, including MySQL, MariaDB, PostgreSQL, SQLite, the Apache family, and more.
From the drop-down list, select SQL Server

Enter the connection details:
Host: localhost / pdc.pentaho.lab
Port: 1433 (the SQL Server default)
Database / Schema: AdventureWorks2022
Username: sa
Password: StrongPassword123
Show All Schemas: ✅ ON
Trust Server Certificate: ✅ ON

Click Test Connection.
If prompted, download the database driver.

Successful connection!

Expand the Schemas.

Inventory Analysis
We're going to create a temporary table. The advantage of this approach is that multiple workshop participants can each create their own #DataGovernanceInventory table simultaneously without conflicts, since each session gets its own copy.
Execute the following script, which conducts a comprehensive analysis of the AdventureWorks2022 database and holds the results in a temporary table.
-- ============================================================================
-- ENHANCED GOVERNANCE-FOCUSED INVENTORY QUERY v2.0
-- ============================================================================
-- Purpose: Comprehensive database inventory with governance, compliance,
-- security, and operational metadata for enterprise data management
--
-- Features:
-- • Automated PII detection and risk assessment
-- • Usage pattern analysis for optimization
-- • Compliance mapping with regulatory requirements
-- • Operational flags for maintenance and security
-- • Data quality indicators and sizing metrics
--
-- Target Audience: Data governance teams, DBAs, compliance officers, security
-- Database: AdventureWorks2022 (adaptable to any SQL Server database)
-- ============================================================================
-- ============================================================================
-- SECTION 1: DATABASE CONNECTION AND ENVIRONMENT VERIFICATION
-- ============================================================================
-- Connect to target database (modify as needed)
USE AdventureWorks2022;
-- Verify connection and capture environment information
SELECT
@@SERVERNAME AS server_name, -- SQL Server instance name
DB_NAME() AS database_name, -- Current database name
@@VERSION AS sql_version, -- SQL Server version info
GETDATE() AS [current_time], -- Current timestamp (bracketed to avoid reserved keyword)
USER_NAME() AS [current_user], -- Current database user (bracketed to avoid reserved keyword)
SYSTEM_USER AS login_name; -- Current system login name
-- Display connection confirmation messages
PRINT 'Connected to database: ' + DB_NAME();
PRINT 'Starting governance inventory analysis at: ' + CAST(GETDATE() AS VARCHAR(50));
-- ============================================================================
-- SECTION 2: TEMPORARY TABLE CREATION FOR RESULTS STORAGE
-- ============================================================================
-- Drop existing temp table if it exists (for re-runs)
IF OBJECT_ID('tempdb..#DataGovernanceInventory') IS NOT NULL
DROP TABLE #DataGovernanceInventory;
-- Create comprehensive temporary table to store inventory results
CREATE TABLE #DataGovernanceInventory (
-- ========================================================================
-- CORE IDENTIFICATION FIELDS
-- ========================================================================
inventory_id INT IDENTITY(1,1) PRIMARY KEY, -- Unique identifier for each record
schema_name NVARCHAR(128) NOT NULL, -- Database schema name
table_name NVARCHAR(128) NOT NULL, -- Table name within schema
full_table_name NVARCHAR(257) NOT NULL, -- Fully qualified table name
-- ========================================================================
-- STRUCTURAL ANALYSIS METRICS
-- ========================================================================
total_columns INT NOT NULL, -- Total number of columns
required_columns INT NOT NULL, -- Count of NOT NULL columns
identity_columns INT NOT NULL, -- Count of identity/auto-increment columns
computed_columns INT NOT NULL, -- Count of computed/calculated columns
-- ========================================================================
-- STORAGE AND SIZE METRICS
-- ========================================================================
estimated_rows BIGINT NOT NULL, -- Approximate row count
total_size_mb DECIMAL(10,2) NOT NULL, -- Total allocated space in MB
used_size_mb DECIMAL(10,2) NOT NULL, -- Actually used space in MB
data_size_mb DECIMAL(10,2) NOT NULL, -- Pure data size in MB
-- ========================================================================
-- TEMPORAL METADATA
-- ========================================================================
create_date DATETIME NOT NULL, -- Table creation timestamp
modify_date DATETIME NOT NULL, -- Last structural modification
days_since_modified INT, -- Age of last modification
-- ========================================================================
-- USAGE PATTERN ANALYTICS
-- ========================================================================
total_read_operations BIGINT, -- Total SELECT operations since restart
total_write_operations BIGINT, -- Total DML operations since restart
days_since_last_access INT, -- Days since last read operation
-- ========================================================================
-- DATA CLASSIFICATION AND QUALITY
-- ========================================================================
size_category VARCHAR(20) NOT NULL, -- Table size classification
potential_pii_columns INT NOT NULL, -- Count of suspected PII columns
pii_risk_level VARCHAR(20) NOT NULL, -- PII risk assessment
-- ========================================================================
-- BUSINESS CONTEXT AND PURPOSE
-- ========================================================================
business_purpose NVARCHAR(500) NOT NULL, -- Business function mapping
data_sensitivity VARCHAR(50) NOT NULL, -- Data sensitivity classification
compliance_scope NVARCHAR(500) NOT NULL, -- Applicable regulations
-- ========================================================================
-- OPERATIONAL MANAGEMENT
-- ========================================================================
operational_flag VARCHAR(100) NOT NULL, -- Operational recommendations/alerts
-- ========================================================================
-- EXTENDED GOVERNANCE FIELDS (FOR FUTURE ENHANCEMENT)
-- ========================================================================
data_owner NVARCHAR(100) NULL, -- Business data owner
data_steward NVARCHAR(100) NULL, -- Technical data steward
retention_period_years INT NULL, -- Data retention requirement in years
backup_frequency VARCHAR(50) NULL, -- Required backup frequency
archive_eligible BIT DEFAULT 0, -- Whether table is archival candidate
-- ========================================================================
-- AUDIT AND TRACKING
-- ========================================================================
analysis_timestamp DATETIME DEFAULT GETDATE(), -- When this analysis was performed
analysis_user NVARCHAR(128) DEFAULT SYSTEM_USER -- Who performed the analysis
);
PRINT 'Temporary table #DataGovernanceInventory created successfully';
-- ============================================================================
-- SECTION 3: EXECUTE COMPREHENSIVE INVENTORY QUERY AND POPULATE TABLE
-- ============================================================================
PRINT 'Starting comprehensive data inventory analysis...';
-- ============================================================================
-- CTE 1: TABLE SIZE METRICS
-- ============================================================================
-- Purpose: Pre-calculate accurate storage metrics to avoid complex GROUP BY
-- Benefits: Better performance, cleaner main query, separate data/used/total sizes
WITH TableSizeMetrics AS (
SELECT
t.object_id,
-- Total allocated space including indexes and unused space
SUM(a.total_pages) * 8 / 1024.0 AS size_mb,
-- Actually used space (excludes free space in pages)
SUM(a.used_pages) * 8 / 1024.0 AS used_size_mb,
-- Pure data size (excludes indexes and free space)
SUM(a.data_pages) * 8 / 1024.0 AS data_size_mb
FROM sys.tables t
JOIN sys.partitions p ON t.object_id = p.object_id
JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE p.index_id IN (0,1) -- Heap tables (0) and clustered indexes (1) only
GROUP BY t.object_id
),
-- ============================================================================
-- CTE 2: PII DETECTION ENGINE
-- ============================================================================
-- Purpose: Automatically identify potential PII columns based on naming patterns
-- Compliance Impact: Critical for GDPR, CCPA, and other privacy regulations
-- Note: This is heuristic-based; manual review recommended for accuracy
PIIDetection AS (
SELECT
t.object_id,
-- Count columns that match common PII naming patterns
COUNT(CASE WHEN c.name LIKE '%ssn%' OR c.name LIKE '%social%' -- Social Security
OR c.name LIKE '%email%' OR c.name LIKE '%phone%' -- Contact info
OR c.name LIKE '%address%' OR c.name LIKE '%birth%' -- Personal details
OR c.name LIKE '%salary%' OR c.name LIKE '%wage%' -- Financial info
OR c.name LIKE '%credit%' OR c.name LIKE '%account%' -- Account data
OR c.name LIKE '%passport%' OR c.name LIKE '%license%' -- ID documents
THEN 1 END) AS potential_pii_columns
FROM sys.tables t
JOIN sys.columns c ON t.object_id = c.object_id
GROUP BY t.object_id
),
-- ============================================================================
-- CTE 3: TABLE USAGE STATISTICS
-- ============================================================================
-- Purpose: Capture table access patterns for operational optimization
-- Benefits: Identify unused tables, performance bottlenecks, archival candidates
-- Note: Statistics reset on SQL Server restart
TableStats AS (
SELECT
t.object_id,
-- Last access timestamps for different operation types
s.last_user_seek, -- Last index seek operation
s.last_user_scan, -- Last table/index scan operation
s.last_user_lookup, -- Last key lookup operation
-- Aggregate read operations (seeks + scans + lookups)
s.user_seeks + s.user_scans + s.user_lookups AS total_reads,
-- Write operations (INSERT, UPDATE, DELETE)
s.user_updates
FROM sys.tables t
LEFT JOIN sys.dm_db_index_usage_stats s ON t.object_id = s.object_id AND s.index_id IN (0,1)
)
-- ============================================================================
-- INSERT GOVERNANCE INVENTORY RESULTS INTO TEMPORARY TABLE
-- ============================================================================
-- Note: Extended governance fields (data_owner, data_steward, etc.) will be NULL initially
-- Note: analysis_timestamp and analysis_user fields will auto-populate with DEFAULT values
INSERT INTO #DataGovernanceInventory (
-- Core identification fields
schema_name, table_name, full_table_name,
-- Structural analysis metrics
total_columns, required_columns, identity_columns, computed_columns,
-- Storage and size metrics
estimated_rows, total_size_mb, used_size_mb, data_size_mb,
-- Temporal metadata
create_date, modify_date, days_since_modified,
-- Usage pattern analytics
total_read_operations, total_write_operations, days_since_last_access,
-- Data classification and quality
size_category, potential_pii_columns, pii_risk_level,
-- Business context and governance
business_purpose, data_sensitivity, compliance_scope, operational_flag
)
-- ============================================================================
-- MAIN QUERY: COMPREHENSIVE INVENTORY WITH GOVERNANCE METADATA
-- ============================================================================
SELECT
-- ========================================================================
-- SECTION 1: CORE TABLE IDENTIFICATION
-- ========================================================================
s.name AS schema_name, -- Schema containing the table
t.name AS table_name, -- Table name within schema
SCHEMA_NAME(t.schema_id) + '.' + t.name AS full_table_name, -- Fully qualified name for referencing
-- ========================================================================
-- SECTION 2: ENHANCED COLUMN ANALYSIS
-- ========================================================================
-- Provides detailed breakdown of column types for data modeling insights
COUNT(c.column_id) AS total_columns, -- Total number of columns in the table
SUM(CASE WHEN c.is_nullable = 0 THEN 1 ELSE 0 END) AS required_columns, -- Count of NOT NULL columns (data quality indicator)
SUM(CASE WHEN c.is_identity = 1 THEN 1 ELSE 0 END) AS identity_columns, -- Count of auto-increment/identity columns
SUM(CASE WHEN c.is_computed = 1 THEN 1 ELSE 0 END) AS computed_columns, -- Count of calculated/computed columns
-- ========================================================================
-- SECTION 3: STORAGE METRICS (ENHANCED ACCURACY)
-- ========================================================================
-- More accurate storage calculations using pre-computed CTE results
COALESCE(p.rows, 0) AS estimated_rows, -- Estimated row count (may not be exact, updated by statistics)
COALESCE(tsm.size_mb, 0) AS total_size_mb, -- Total allocated space in MB (includes indexes and free space)
COALESCE(tsm.used_size_mb, 0) AS used_size_mb, -- Actually used space in MB (excludes free space in pages)
COALESCE(tsm.data_size_mb, 0) AS data_size_mb, -- Pure data size in MB (excludes indexes and free space)
-- ========================================================================
-- SECTION 4: TEMPORAL METADATA FOR CHANGE TRACKING
-- ========================================================================
t.create_date, -- When the table was originally created
t.modify_date, -- Last time the table structure was modified
DATEDIFF(day, t.modify_date, GETDATE()) AS days_since_modified, -- Age of last structural change
-- ========================================================================
-- SECTION 5: USAGE PATTERNS FOR OPERATIONAL INSIGHTS
-- ========================================================================
-- Critical for identifying unused tables and optimization opportunities
-- Note: Statistics reset on SQL Server restart, so interpret with caution for new servers
COALESCE(ts.total_reads, 0) AS total_read_operations, -- Total SELECT operations since last restart
COALESCE(ts.user_updates, 0) AS total_write_operations, -- Total INSERT/UPDATE/DELETE operations
-- Calculate days since last access (any type of read operation)
CASE
WHEN ts.last_user_seek IS NOT NULL OR ts.last_user_scan IS NOT NULL OR ts.last_user_lookup IS NOT NULL
THEN DATEDIFF(day, COALESCE(ts.last_user_seek, ts.last_user_scan, ts.last_user_lookup), GETDATE())
ELSE NULL -- NULL means no recorded access (could be new table or no activity since restart)
END AS days_since_last_access,
-- ========================================================================
-- SECTION 6: DATA QUALITY INDICATORS
-- ========================================================================
-- Categorize tables by size for different governance approaches
CASE
WHEN COALESCE(p.rows, 0) = 0 THEN 'EMPTY' -- No data - potential candidate for removal
WHEN COALESCE(p.rows, 0) < 1000 THEN 'SMALL' -- Reference/lookup tables, low impact
WHEN COALESCE(p.rows, 0) < 100000 THEN 'MEDIUM' -- Standard operational tables
WHEN COALESCE(p.rows, 0) < 1000000 THEN 'LARGE' -- High-volume transactional tables
ELSE 'VERY_LARGE' -- Enterprise-scale data requiring special attention
END AS size_category,
-- ========================================================================
-- SECTION 7: PII RISK ASSESSMENT FOR PRIVACY COMPLIANCE
-- ========================================================================
-- Automated PII detection helps prioritize privacy protection efforts
-- Note: This is heuristic-based detection - manual review recommended for accuracy
COALESCE(pii.potential_pii_columns, 0) AS potential_pii_columns, -- Count of columns with PII-like names
-- Risk-based classification for privacy governance prioritization
CASE
WHEN COALESCE(pii.potential_pii_columns, 0) = 0 THEN 'NO_PII_DETECTED' -- Likely safe from privacy regulations
WHEN COALESCE(pii.potential_pii_columns, 0) <= 2 THEN 'LOW_PII_RISK' -- Minimal privacy exposure
WHEN COALESCE(pii.potential_pii_columns, 0) <= 5 THEN 'MEDIUM_PII_RISK' -- Moderate privacy exposure
ELSE 'HIGH_PII_RISK' -- High privacy exposure - priority for review
END AS pii_risk_level,
-- ========================================================================
-- SECTION 8: ENHANCED BUSINESS CONTEXT MAPPING
-- ========================================================================
-- Maps technical schemas to business functions for stakeholder communication
-- This helps non-technical stakeholders understand data purpose and importance
CASE
WHEN s.name = 'Person' THEN 'Customer and Employee Identity Management'
WHEN s.name = 'Sales' THEN 'Revenue Operations and Customer Transactions'
WHEN s.name = 'Production' THEN 'Manufacturing and Inventory Management'
WHEN s.name = 'Purchasing' THEN 'Vendor Relations and Procurement'
WHEN s.name = 'HumanResources' THEN 'Employee Lifecycle Management'
-- Enhanced detection for system and operational schemas
WHEN s.name LIKE '%Log%' OR s.name LIKE '%Audit%' THEN 'Audit and Logging'
WHEN s.name LIKE '%Config%' OR s.name LIKE '%Setting%' THEN 'System Configuration'
ELSE 'System/Reference Data' -- Default for unrecognized schemas
END AS business_purpose,
-- ========================================================================
-- SECTION 9: DATA SENSITIVITY CLASSIFICATION (WITH PII AWARENESS)
-- ========================================================================
-- Enhanced sensitivity classification that considers both schema context and PII content
-- This drives security controls, access restrictions, and compliance requirements
CASE
-- High sensitivity: Known sensitive schemas OR detected high PII content
WHEN s.name IN ('Person', 'HumanResources') OR COALESCE(pii.potential_pii_columns, 0) > 3
THEN 'HIGH - Personal/Sensitive Data'
-- Medium sensitivity: Business-critical data OR any PII detected
WHEN s.name = 'Sales' OR COALESCE(pii.potential_pii_columns, 0) > 0
THEN 'MEDIUM - Business Sensitive'
-- Medium operational: Core business operations data
WHEN s.name IN ('Production', 'Purchasing')
THEN 'MEDIUM - Operational Data'
-- Low sensitivity: Reference data and system tables
ELSE 'LOW - Reference Data'
END AS data_sensitivity,
-- ========================================================================
-- SECTION 10: EXPANDED COMPLIANCE MAPPING
-- ========================================================================
-- Maps data to relevant regulatory frameworks for compliance planning
-- Helps legal and compliance teams understand regulatory exposure
CASE
-- High PII content triggers comprehensive privacy regulations
WHEN s.name = 'Person' OR COALESCE(pii.potential_pii_columns, 0) > 2
THEN 'GDPR, CCPA, PIPEDA, PII Protection'
-- HR data has specific employment law and privacy requirements
WHEN s.name = 'HumanResources'
THEN 'Employment Law, GDPR, CCPA, PII Protection'
-- Financial data requires SOX compliance and potentially PCI-DSS
WHEN s.name = 'Sales'
THEN 'SOX, Financial Reporting, PCI-DSS (if payment data)'
-- Audit trails have specific retention and integrity requirements
WHEN s.name LIKE '%Log%' OR s.name LIKE '%Audit%'
THEN 'SOX, Audit Requirements, Data Retention'
-- Default corporate governance for other data types
ELSE 'Standard Corporate Governance'
END AS compliance_scope,
-- ========================================================================
-- SECTION 11: OPERATIONAL RECOMMENDATIONS ENGINE
-- ========================================================================
-- Automated flags to guide operational decisions and risk management
-- These flags help DBAs and data stewards prioritize their work
CASE
-- Flag empty tables that may be obsolete and safe to remove
WHEN COALESCE(p.rows, 0) = 0 AND DATEDIFF(day, t.create_date, GETDATE()) > 90
THEN 'REVIEW - Empty table older than 90 days'
-- Flag unused tables that may be candidates for archival to save storage costs
WHEN COALESCE(ts.total_reads, 0) = 0 AND DATEDIFF(day, t.create_date, GETDATE()) > 180
THEN 'REVIEW - No reads in 180+ days, consider archival'
-- Security concern: PII detected in unexpected schemas (potential data leakage)
WHEN COALESCE(pii.potential_pii_columns, 0) > 0 AND s.name NOT IN ('Person', 'HumanResources')
THEN 'SECURITY - Unexpected PII in non-HR/Person schema'
-- Performance concern: Large tables consuming resources but rarely accessed
WHEN COALESCE(tsm.size_mb, 0) > 1000 AND COALESCE(ts.total_reads, 0) < 100
THEN 'PERFORMANCE - Large table with low usage'
-- No issues detected - table appears to be operating normally
ELSE 'NORMAL'
END AS operational_flag
-- ============================================================================
-- FROM CLAUSE: SYSTEM CATALOG JOINS FOR COMPREHENSIVE METADATA
-- ============================================================================
FROM sys.schemas s
-- Core table information - every user table belongs to a schema
JOIN sys.tables t ON s.schema_id = t.schema_id
-- Column details for structural analysis - every table has at least one column
JOIN sys.columns c ON t.object_id = c.object_id
-- Row count estimates (LEFT JOIN handles edge cases gracefully)
-- Note: Some tables may not have partition info if they're brand new
LEFT JOIN sys.partitions p ON t.object_id = p.object_id AND p.index_id IN (0,1)
-- Pre-calculated size metrics from CTE (LEFT JOIN handles tables with no allocation units)
LEFT JOIN TableSizeMetrics tsm ON t.object_id = tsm.object_id
-- PII detection results from CTE (LEFT JOIN ensures all tables included even if no PII detected)
LEFT JOIN PIIDetection pii ON t.object_id = pii.object_id
-- Usage statistics from CTE (LEFT JOIN handles tables with no recorded usage)
LEFT JOIN TableStats ts ON t.object_id = ts.object_id
-- ============================================================================
-- FILTERS: SCOPE AND DATA QUALITY CONTROLS
-- ============================================================================
WHERE t.type = 'U' -- User tables only - excludes views ('V'), system tables, functions ('FN', 'IF'), etc.
-- ============================================================================
-- GROUPING: REQUIRED FOR AGGREGATE FUNCTIONS
-- ============================================================================
-- Group by all non-aggregated columns to enable COUNT() and SUM() functions
-- This is required because we're counting columns and summing identity/computed column flags
GROUP BY
-- Core identifiers
s.name, t.name, t.schema_id, t.object_id,
-- Partition data (already aggregated at table level)
p.rows,
-- Temporal fields (single values per table)
t.create_date, t.modify_date,
-- Pre-calculated metrics from CTEs (already aggregated)
tsm.size_mb, tsm.used_size_mb, tsm.data_size_mb,
-- PII detection results (already aggregated)
pii.potential_pii_columns,
-- Usage statistics (already aggregated)
ts.total_reads, ts.user_updates, ts.last_user_seek, ts.last_user_scan, ts.last_user_lookup
-- ============================================================================
-- ORDERING: RISK-BASED PRIORITIZATION FOR GOVERNANCE
-- ============================================================================
-- Sort results to show highest-risk and most important data first
-- This helps governance teams focus on the most critical items
ORDER BY
-- Primary sort: PII risk level (highest risk first)
CASE
WHEN COALESCE(pii.potential_pii_columns, 0) > 3 THEN 1 -- High PII risk tables get top priority
WHEN s.name IN ('Person', 'HumanResources') THEN 2 -- Known sensitive schemas come second
ELSE 3 -- Other data comes last
END,
-- Secondary sort: Alphabetical by schema within priority groups
s.name,
-- Tertiary sort: Largest tables first within each schema (size often correlates with importance)
COALESCE(p.rows, 0) DESC;
-- ============================================================================
-- SECTION 4: RESULTS VERIFICATION AND SUMMARY REPORTING
-- ============================================================================
PRINT 'Governance inventory analysis completed successfully!';
-- Verify insertion success and provide record count (without using variables to avoid batch issues)
SELECT 'Total tables analyzed: ' + CAST(COUNT(*) AS VARCHAR(10)) AS record_count_message
FROM #DataGovernanceInventory;
-- ========================================================================
-- EXECUTIVE SUMMARY: HIGH-LEVEL GOVERNANCE METRICS
-- ========================================================================
-- Provides key metrics for executive reporting and governance dashboards
SELECT
'GOVERNANCE INVENTORY SUMMARY - ' + CAST(GETDATE() AS VARCHAR(20)) AS report_section,
COUNT(*) AS total_tables_analyzed, -- Total number of user tables in database
SUM(estimated_rows) AS total_estimated_rows, -- Combined row count across all tables
CAST(SUM(total_size_mb) AS DECIMAL(10,2)) AS total_database_size_mb, -- Total storage consumption
-- Risk indicators for executive attention
COUNT(CASE WHEN pii_risk_level IN ('HIGH_PII_RISK', 'MEDIUM_PII_RISK') THEN 1 END) AS tables_with_privacy_risk,
COUNT(CASE WHEN operational_flag <> 'NORMAL' THEN 1 END) AS tables_requiring_attention,
-- Compliance scope overview
COUNT(DISTINCT compliance_scope) AS different_compliance_frameworks
FROM #DataGovernanceInventory;
-- ========================================================================
-- DATA SENSITIVITY DISTRIBUTION ANALYSIS
-- ========================================================================
-- Shows how data is distributed across sensitivity levels for resource allocation
SELECT
'DATA SENSITIVITY DISTRIBUTION' AS report_section,
data_sensitivity,
COUNT(*) AS table_count, -- Number of tables at this sensitivity level
CAST(SUM(total_size_mb) AS DECIMAL(10,2)) AS total_size_mb, -- Storage consumed by this sensitivity level
CAST(AVG(estimated_rows) AS BIGINT) AS avg_rows_per_table, -- Average table size in this category
-- Calculate percentage of total tables
CAST(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM #DataGovernanceInventory) AS DECIMAL(5,1)) AS percent_of_tables
FROM #DataGovernanceInventory
GROUP BY data_sensitivity
ORDER BY
-- Sort by sensitivity priority for governance focus
CASE data_sensitivity
WHEN 'HIGH - Personal/Sensitive Data' THEN 1
WHEN 'MEDIUM - Business Sensitive' THEN 2
WHEN 'MEDIUM - Operational Data' THEN 3
ELSE 4
END;
-- ========================================================================
-- HIGH PRIORITY ITEMS REQUIRING IMMEDIATE GOVERNANCE ACTION
-- ========================================================================
-- Identifies tables that need immediate attention from governance, security, or operations teams
SELECT
'HIGH PRIORITY ITEMS FOR IMMEDIATE REVIEW' AS report_section,
schema_name,
table_name,
pii_risk_level, -- Privacy risk assessment
data_sensitivity, -- Business sensitivity classification
operational_flag, -- Specific operational concern
estimated_rows, -- Size context for impact assessment
CAST(total_size_mb AS DECIMAL(10,2)) AS size_mb, -- Storage impact
days_since_last_access, -- Usage pattern indicator
compliance_scope -- Regulatory implications
FROM #DataGovernanceInventory
WHERE
-- Include tables with privacy risks
pii_risk_level IN ('HIGH_PII_RISK', 'MEDIUM_PII_RISK')
-- Or tables with operational issues
OR operational_flag <> 'NORMAL'
ORDER BY
-- Prioritize by risk level
CASE pii_risk_level
WHEN 'HIGH_PII_RISK' THEN 1
WHEN 'MEDIUM_PII_RISK' THEN 2
ELSE 3
END,
-- Then by operational concern
CASE operational_flag
WHEN 'SECURITY - Unexpected PII in non-HR/Person schema' THEN 1
WHEN 'PERFORMANCE - Large table with low usage' THEN 2
WHEN 'REVIEW - No reads in 180+ days, consider archival' THEN 3
ELSE 4
END,
-- Finally by size (larger tables get more attention)
total_size_mb DESC;
-- ========================================================================
-- COMPLETION STATUS AND NEXT STEPS
-- ========================================================================
-- Provides clear indication of completion and guidance for next actions
SELECT
'ANALYSIS COMPLETED' AS status,
'Analysis completed at: ' + CAST(GETDATE() AS VARCHAR(50)) AS completion_timestamp,
'Results stored in: #DataGovernanceInventory (session scope)' AS temp_table_location,
'Records available until session ends or server restart' AS data_retention_note,
'Use additional queries below for deeper analysis' AS next_steps;
-- ============================================================================
-- USAGE NOTES AND RECOMMENDATIONS
-- ============================================================================
-- 1. Run during off-peak hours for best performance on large databases
-- 2. Consider creating indexed views for frequently accessed governance reports
-- 3. Review PII detection results manually - heuristic matching has limitations
-- 4. Usage statistics reset on SQL Server restart - consider baseline measurements
-- 5. For regulatory compliance, supplement with data flow analysis and classification
-- 6. Operational flags provide starting points - always validate before taking action
-- ============================================================================
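On point 4 above: the "days since last access" figures only reach back to the last instance restart, because the underlying usage counters (typically sourced from sys.dm_db_index_usage_stats) are cleared when SQL Server restarts. A small sketch for recording that baseline alongside your results:

```sql
-- Record how far back the usage statistics actually reach.
-- Any "days since last access" figure is only meaningful from this date.
SELECT
sqlserver_start_time AS stats_valid_from,
DATEDIFF(DAY, sqlserver_start_time, GETDATE()) AS days_of_usage_data
FROM sys.dm_os_sys_info;
```

Storing this timestamp with each inventory snapshot makes it obvious when a "never accessed" flag simply reflects a recent restart.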
-- ============================================================================
-- SECTION 5: OPTIONAL ADVANCED ANALYSIS QUERIES
-- ============================================================================
-- These queries provide additional insights for specific governance use cases
-- Uncomment and modify as needed for your specific requirements
/*
-- ========================================================================
-- SCHEMA-LEVEL SUMMARY FOR DEPARTMENTAL REPORTING
-- ========================================================================
-- Useful for understanding data distribution across business areas
-- Helps with resource allocation and departmental data governance planning
SELECT
'SCHEMA-LEVEL SUMMARY' AS report_type,
schema_name,
COUNT(*) as table_count, -- Number of tables in this schema
SUM(estimated_rows) as total_rows, -- Combined rows across all tables
CAST(SUM(total_size_mb) AS DECIMAL(10,2)) as total_size_mb, -- Storage consumption by schema
CAST(AVG(potential_pii_columns) AS DECIMAL(3,1)) as avg_pii_columns_per_table, -- PII density
-- Calculate percentage of database size consumed by this schema
CAST(SUM(total_size_mb) * 100.0 / (SELECT SUM(total_size_mb) FROM #DataGovernanceInventory) AS DECIMAL(5,1)) AS percent_of_database_size
FROM #DataGovernanceInventory
GROUP BY schema_name
ORDER BY total_size_mb DESC;
-- ========================================================================
-- COMPLIANCE FRAMEWORK IMPACT ANALYSIS
-- ========================================================================
-- Shows which regulatory frameworks affect how much data
-- Critical for compliance budgeting and risk assessment
SELECT
'COMPLIANCE IMPACT ANALYSIS' AS report_type,
compliance_scope,
COUNT(*) as affected_tables, -- Number of tables under this compliance scope
CAST(SUM(total_size_mb) AS DECIMAL(10,2)) as total_data_size_mb, -- Amount of data subject to these regulations
SUM(estimated_rows) as total_affected_rows, -- Number of records under regulatory scope
-- Show percentage of database under this compliance framework
CAST(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM #DataGovernanceInventory) AS DECIMAL(5,1)) AS percent_of_tables
FROM #DataGovernanceInventory
GROUP BY compliance_scope
ORDER BY affected_tables DESC;
-- ========================================================================
-- OPERATIONAL EFFICIENCY OPPORTUNITIES
-- ========================================================================
-- Identifies specific optimization opportunities with potential impact
-- Helps prioritize operational improvements and cost savings
SELECT
'OPERATIONAL OPTIMIZATION OPPORTUNITIES' AS report_type,
operational_flag,
COUNT(*) as table_count,
CAST(SUM(total_size_mb) AS DECIMAL(10,2)) as potential_storage_impact_mb, -- Storage that could be reclaimed
SUM(estimated_rows) as affected_rows,
-- Show tables that could be archived or optimized
STRING_AGG(schema_name + '.' + table_name, ', ') WITHIN GROUP (ORDER BY total_size_mb DESC) as example_tables
FROM #DataGovernanceInventory
WHERE operational_flag <> 'NORMAL' -- Only show tables with actionable recommendations
GROUP BY operational_flag
ORDER BY potential_storage_impact_mb DESC;
-- ========================================================================
-- PII RISK HEAT MAP BY BUSINESS AREA
-- ========================================================================
-- Cross-tabulates business purpose with PII risk for targeted remediation
-- Helps security teams focus on the highest-risk business areas
SELECT
'PII RISK BY BUSINESS AREA' AS report_type,
business_purpose,
SUM(CASE WHEN pii_risk_level = 'HIGH_PII_RISK' THEN 1 ELSE 0 END) as high_risk_tables,
SUM(CASE WHEN pii_risk_level = 'MEDIUM_PII_RISK' THEN 1 ELSE 0 END) as medium_risk_tables,
SUM(CASE WHEN pii_risk_level = 'LOW_PII_RISK' THEN 1 ELSE 0 END) as low_risk_tables,
SUM(CASE WHEN pii_risk_level = 'NO_PII_DETECTED' THEN 1 ELSE 0 END) as no_pii_detected,
COUNT(*) as total_tables_in_area,
-- Calculate risk percentage for this business area
CAST((SUM(CASE WHEN pii_risk_level IN ('HIGH_PII_RISK', 'MEDIUM_PII_RISK') THEN 1 ELSE 0 END) * 100.0 / COUNT(*)) AS DECIMAL(5,1)) as pii_risk_percentage
FROM #DataGovernanceInventory
GROUP BY business_purpose
ORDER BY high_risk_tables DESC, medium_risk_tables DESC;
-- ========================================================================
-- DATA LIFECYCLE MANAGEMENT CANDIDATES
-- ========================================================================
-- Identifies tables for archival, retention policy application, or cleanup
-- Supports data lifecycle management and storage cost optimization
SELECT
'DATA LIFECYCLE MANAGEMENT CANDIDATES' AS report_type,
schema_name + '.' + table_name as full_table_name,
size_category,
estimated_rows,
CAST(total_size_mb AS DECIMAL(10,2)) as size_mb,
days_since_last_access,
days_since_modified,
operational_flag,
-- Suggest lifecycle action based on usage patterns
CASE
WHEN days_since_last_access > 365 AND total_size_mb > 100 THEN 'ARCHIVE - Large unused table'
WHEN days_since_last_access > 180 AND estimated_rows = 0 THEN 'DELETE - Empty unused table'
WHEN days_since_modified > 730 AND days_since_last_access > 90 THEN 'ARCHIVE - Stale data'
WHEN total_size_mb > 1000 AND days_since_last_access > 30 THEN 'REVIEW - Large infrequently used'
ELSE 'RETAIN - Active or recent'
END as lifecycle_recommendation
FROM #DataGovernanceInventory
WHERE
-- Focus on tables that could benefit from lifecycle management
days_since_last_access > 30 OR estimated_rows = 0 OR total_size_mb > 500
ORDER BY
CASE
WHEN days_since_last_access > 365 AND total_size_mb > 100 THEN 1
WHEN days_since_last_access > 180 AND estimated_rows = 0 THEN 2
ELSE 3
END,
total_size_mb DESC;
-- ========================================================================
-- EXPORT TO PERMANENT TABLE (OPTIONAL)
-- ========================================================================
-- Uncomment and modify the table name to save results permanently
-- Useful for historical tracking and trend analysis over time
-- Note: Modify the table name to include date/time for version control
/*
SELECT
GETDATE() as snapshot_date, -- Add timestamp for historical tracking
*
INTO dbo.DataGovernanceInventory_20250806 -- Modify date as needed
FROM #DataGovernanceInventory;
PRINT 'Results exported to permanent table: dbo.DataGovernanceInventory_20250806';
*/
-- ========================================================================
-- CUSTOM ANALYSIS TEMPLATE
-- ========================================================================
-- Template for additional custom analysis based on specific business needs
/*
SELECT
'CUSTOM ANALYSIS' AS report_type,
-- Add your custom fields here
schema_name,
table_name,
business_purpose,
data_sensitivity,
-- Add custom calculations or groupings as needed
COUNT(*) OVER (PARTITION BY schema_name) as tables_in_schema,
RANK() OVER (ORDER BY total_size_mb DESC) as size_rank
FROM #DataGovernanceInventory
WHERE
-- Add your custom filters here
data_sensitivity LIKE '%HIGH%' -- Example: focus on high sensitivity data
ORDER BY
-- Add your custom sorting here
total_size_mb DESC;
*/
Analysis Results
OK, there's a lot to get through, so let's review the summary highlights.
The script creates a single temp table (#DataGovernanceInventory) and returns six result sets, shown as tabs in DBeaver:
Results 1 - Basic connection details
Results 1 (2) - Total tables analyzed
Results 1 (3) - Governance summary
Results 1 (4) - Data sensitivity distribution
Results 1 (5) - High priority items for immediate review
Results 1 (6) - Analysis completion details

Take a look at:
Results 1 (5) - High Priority Items for Immediate Review
Let's dive into the main part of the report, bearing in mind that this data is representative!
CRITICAL:
High-risk PII exposure: all Person and HumanResources schema tables are flagged as HIGH_PII_RISK with HIGH - Personal/Sensitive Data sensitivity.
Key Problem Areas:
Person.Person (19,972 rows) - Contains personal identity data
Person.Password (19,972 rows) - 🚨 CRITICAL: Password data should be heavily encrypted
Person.EmailAddress (19,972 rows) - Email addresses are prime PII
HumanResources.Employee (290 rows) - Employee personal data
HumanResources.JobCandidate (13 rows) - Candidate personal information
Major Classification Error: Sales.CreditCard table shows "NO_PII_DETECTED" despite obviously containing payment card data. This is a serious compliance risk for PCI-DSS requirements.
KEY CONCERNS:
Inconsistent logic: rows 20-41 show NO_PII_DETECTED but still carry MEDIUM sensitivity levels.
Examples of Misclassification:
Sales.SalesOrderHeader (31,465 rows) - Large table marked as no PII but medium sensitivity
Sales.Customer (19,820 rows) - Customer table with no PII detected (likely wrong)
Sales.CreditCard (19,118 rows) - 🚨 CRITICAL: Credit card table showing no PII!
Security & Compliance Risks:
Password table exists with 19K+ records - needs immediate encryption audit
Credit card data not properly classified as PII - major compliance risk
Large volume exposure: Person schema alone has ~20K records of personal data
Data Governance Gaps:
PII detection logic is missing financial data patterns (credit cards, customer IDs)
Inconsistent classification between PII risk and sensitivity levels
No operational flags visible beyond "REVIE" (the flag text appears truncated in the results grid)
Compliance Exposure:
All high-risk items fall under GDPR, CCPA, PIPEDA - significant regulatory exposure
Employment Law compliance needed for HR data
PCI-DSS implications for credit card data not properly flagged
So it looks like there are a couple of issues to address:
IMMEDIATE NEXT STEPS:
Phase 1: Urgent Security Actions (This Week)
Audit Password Storage: Verify Person.Password table encryption immediately
Review Credit Card Data: Investigate why Sales.CreditCard shows NO_PII_DETECTED
Access Control Review: Restrict access to all HIGH_PII_RISK tables
Data Encryption Audit: Ensure all personal data is encrypted at rest
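For the password-storage audit, a quick first pass might look like the sketch below. It is not a full audit: in AdventureWorks the Person.Password table stores PasswordHash and PasswordSalt columns, and sys.dm_database_encryption_keys reports Transparent Data Encryption (TDE) status.

```sql
-- 1. Is the database encrypted at rest (TDE)?
--    encryption_state = 3 means encrypted; no row means no TDE key exists.
SELECT DB_NAME(database_id) AS database_name, encryption_state
FROM sys.dm_database_encryption_keys
WHERE database_id = DB_ID('AdventureWorks2022');

-- 2. Do the stored values look like salted hashes rather than plain text?
--    Uniform lengths and base64-style content suggest hashing.
SELECT TOP (5)
PasswordHash,
PasswordSalt,
LEN(PasswordHash) AS hash_length
FROM Person.Password;
```

Remember that hashing is not encryption: even if the values are hashed, verify the algorithm and salting practice meet your security policy.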
Phase 2: Classification Fixes (Next 2 Weeks)
Enhance PII Detection: Add patterns for customer IDs, credit cards, account numbers
Manual Review: Validate all tables showing NO_PII but MEDIUM sensitivity
Schema Analysis: Deep dive into Sales schema for hidden PII
Update Classifications: Fix inconsistent risk vs sensitivity mappings
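As a starting point for the detection fix, a sweep like the one below surfaces financial columns the current name patterns miss (this is how Sales.CreditCard slipped through). The LIKE patterns are illustrative and should be tuned to your own naming conventions:

```sql
-- Find payment-card and account columns the current heuristics overlook.
SELECT
s.name AS schema_name,
t.name AS table_name,
c.name AS column_name
FROM sys.schemas s
JOIN sys.tables t ON s.schema_id = t.schema_id
JOIN sys.columns c ON t.object_id = c.object_id
WHERE c.name LIKE '%card%' -- catches CardNumber, CardType
OR c.name LIKE '%account%number%'
OR c.name LIKE '%iban%'
OR t.name LIKE '%CreditCard%' -- use table-name context, not just columns
ORDER BY s.name, t.name, c.name;
```

Any hits here should then be folded back into the classification CASE expressions as HIGH_PII_RISK patterns.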
Phase 3: Governance Framework (Next Month)
Data Stewardship: Assign data owners for each schema
Access Policies: Implement role-based access for PII data
Monitoring Setup: Deploy data access monitoring for high-risk tables
Compliance Documentation: Document data flows for regulatory compliance
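For the monitoring step, SQL Server Audit can track reads and writes against specific high-risk tables. The sketch below is illustrative only: the audit and specification names and the file path are placeholders, and creating a server audit requires elevated permissions (ALTER ANY SERVER AUDIT).

```sql
-- Create a server-level audit target (run in master; adjust the path).
CREATE SERVER AUDIT PII_Access_Audit
TO FILE (FILEPATH = 'C:\Audits\');
ALTER SERVER AUDIT PII_Access_Audit WITH (STATE = ON);

-- In AdventureWorks2022: capture all access to the highest-risk table.
CREATE DATABASE AUDIT SPECIFICATION PII_Access_Spec
FOR SERVER AUDIT PII_Access_Audit
ADD (SELECT, INSERT, UPDATE, DELETE ON Person.Password BY public)
WITH (STATE = ON);
```

Start with one or two of the HIGH_PII_RISK tables identified above; auditing everything at once generates noise that nobody reviews.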
PII Assessment
The Inventory Analysis has given us a great overview.
However, where to start?
The Inventory Analysis identified:
High-risk PII exposure: all Person and HumanResources schema tables are flagged as HIGH_PII_RISK with HIGH - Personal/Sensitive Data sensitivity.
That analysis was conducted at table level; we now need to drill down further:
Column-by-column PII detection with detailed categorization
Risk-based classification with specific recommendations
Compliance framework mapping for regulatory requirements
Data Catalog output for Governance tools
Execute the following script in DBeaver; it conducts a PII risk assessment at column level:
-- ============================================================================
-- COLUMN-LEVEL PII DETECTION AND GOVERNANCE ANALYSIS SCRIPT
-- ============================================================================
-- Purpose: Analyze database at the individual column level to identify PII,
-- classify data sensitivity, and provide actionable governance recommendations
--
-- Features:
-- • Column-by-column PII detection with detailed categorization
-- • Risk-based classification with specific recommendations
-- • Compliance framework mapping for regulatory requirements
-- • Data catalog-ready output for governance tools
-- • Granular analysis beyond table-level assessment
--
-- Target Audience: Data governance teams, privacy officers, compliance teams
-- Database: AdventureWorks2022 (adaptable to any SQL Server database)
-- ============================================================================
-- ============================================================================
-- SECTION 1: DATABASE CONNECTION AND SETUP
-- ============================================================================
-- Connect to target database (modify as needed)
USE AdventureWorks2022;
-- ============================================================================
-- SECTION 2: TEMPORARY TABLE FOR COLUMN-LEVEL ANALYSIS
-- ============================================================================
-- Drop existing temp table if it exists (for re-runs)
IF OBJECT_ID('tempdb..#ColumnLevelPII') IS NOT NULL
DROP TABLE #ColumnLevelPII;
-- Create comprehensive temporary table to store column-level PII analysis
CREATE TABLE #ColumnLevelPII (
-- ========================================================================
-- CORE IDENTIFICATION FIELDS
-- ========================================================================
analysis_id INT IDENTITY(1,1) PRIMARY KEY, -- Unique identifier for each column record
schema_name NVARCHAR(128) NOT NULL, -- Database schema name
table_name NVARCHAR(128) NOT NULL, -- Table name within schema
column_name NVARCHAR(128) NOT NULL, -- Individual column name
full_column_name NVARCHAR(386) NOT NULL, -- Fully qualified column name (schema.table.column: 128 + 1 + 128 + 1 + 128 = 386)
-- ========================================================================
-- TECHNICAL COLUMN METADATA
-- ========================================================================
data_type NVARCHAR(128) NOT NULL, -- Complete data type with length/precision
max_length INT, -- Maximum length for string types
is_nullable BIT NOT NULL, -- Whether column allows NULL values
is_identity BIT NOT NULL, -- Whether column is auto-increment/identity
is_computed BIT NOT NULL, -- Whether column is computed/calculated
-- ========================================================================
-- PII CLASSIFICATION AND RISK ASSESSMENT
-- ========================================================================
pii_category VARCHAR(50), -- Specific type of PII (e.g., PERSONAL_NAME, CONTACT_INFO)
pii_risk_level VARCHAR(20) NOT NULL, -- Risk level assessment (HIGH/MEDIUM/LOW/NO_PII)
data_sensitivity VARCHAR(50) NOT NULL, -- Business sensitivity classification
-- ========================================================================
-- GOVERNANCE AND COMPLIANCE
-- ========================================================================
compliance_scope NVARCHAR(500) NOT NULL, -- Applicable regulatory frameworks
recommended_action VARCHAR(200), -- Specific action recommendations (ENCRYPT, MASK, etc.)
-- ========================================================================
-- AUDIT AND TRACKING
-- ========================================================================
analysis_timestamp DATETIME DEFAULT GETDATE() -- When this analysis was performed
);
PRINT 'Starting column-level PII analysis...';
-- ============================================================================
-- SECTION 3: MAIN COLUMN-LEVEL PII ANALYSIS QUERY
-- ============================================================================
-- Insert comprehensive column-level analysis results
INSERT INTO #ColumnLevelPII (
schema_name, table_name, column_name, full_column_name,
data_type, max_length, is_nullable, is_identity, is_computed,
pii_category, pii_risk_level, data_sensitivity, compliance_scope, recommended_action
)
SELECT
-- ========================================================================
-- BASIC COLUMN IDENTIFICATION
-- ========================================================================
s.name AS schema_name, -- Schema containing the table
t.name AS table_name, -- Table containing the column
c.name AS column_name, -- Individual column name
s.name + '.' + t.name + '.' + c.name AS full_column_name, -- Complete column path for referencing
-- ========================================================================
-- DETAILED DATA TYPE INFORMATION
-- ========================================================================
-- Build complete data type string with appropriate length/precision details
ty.name +
CASE
-- For single-byte character types, include length ('MAX' when max_length = -1)
WHEN ty.name IN ('varchar', 'char')
THEN '(' + CASE WHEN c.max_length = -1 THEN 'MAX' ELSE CAST(c.max_length AS VARCHAR(10)) END + ')'
-- For Unicode types, sys.columns stores max_length in bytes, so halve it for characters
WHEN ty.name IN ('nvarchar', 'nchar')
THEN '(' + CASE WHEN c.max_length = -1 THEN 'MAX' ELSE CAST(c.max_length / 2 AS VARCHAR(10)) END + ')'
-- For numeric types, include precision and scale
WHEN ty.name IN ('decimal', 'numeric')
THEN '(' + CAST(c.precision AS VARCHAR(10)) + ',' + CAST(c.scale AS VARCHAR(10)) + ')'
-- For other types, no additional info needed
ELSE ''
END AS data_type,
c.max_length, -- Raw maximum length value
c.is_nullable, -- NULL constraint information
c.is_identity, -- Identity column flag
c.is_computed, -- Computed column flag
-- ========================================================================
-- COMPREHENSIVE PII CATEGORY CLASSIFICATION
-- ========================================================================
-- Categorize columns based on naming patterns and content type indicators
CASE
-- ====================================================================
-- GOVERNMENT AND IDENTITY DOCUMENTS (Highest Risk)
-- ====================================================================
WHEN c.name LIKE '%ssn%' OR c.name LIKE '%social%security%'
THEN 'GOVERNMENT_ID' -- Social Security Numbers
WHEN c.name LIKE '%passport%' OR c.name LIKE '%license%' OR c.name LIKE '%permit%'
THEN 'IDENTIFICATION_DOCUMENTS' -- Official ID documents
-- ====================================================================
-- PERSONAL NAMES (High Risk for Identity)
-- ====================================================================
WHEN c.name LIKE '%first%name%' OR c.name LIKE '%fname%' OR c.name LIKE '%given%name%'
THEN 'PERSONAL_NAME' -- First/given names
WHEN c.name LIKE '%last%name%' OR c.name LIKE '%surname%' OR c.name LIKE '%family%name%'
THEN 'PERSONAL_NAME' -- Last/family names
WHEN c.name LIKE '%middle%name%' OR c.name LIKE '%full%name%' OR c.name LIKE '%display%name%'
THEN 'PERSONAL_NAME' -- Middle names and full names
WHEN c.name LIKE '%maiden%' OR c.name LIKE '%suffix%' OR c.name LIKE '%prefix%' OR c.name LIKE '%title%'
THEN 'PERSONAL_NAME' -- Name components and titles
-- ====================================================================
-- CONTACT INFORMATION (Direct Communication Channels)
-- ====================================================================
WHEN c.name LIKE '%email%' OR c.name LIKE '%mail%'
THEN 'CONTACT_INFO' -- Email addresses
WHEN c.name LIKE '%phone%' OR c.name LIKE '%mobile%' OR c.name LIKE '%telephone%' OR c.name LIKE '%fax%'
THEN 'CONTACT_INFO' -- Phone numbers and communication
WHEN c.name LIKE '%address%' OR c.name LIKE '%street%' OR c.name LIKE '%city%' OR c.name LIKE '%zip%' OR c.name LIKE '%postal%'
THEN 'ADDRESS_INFO' -- Physical address components
-- ====================================================================
-- DEMOGRAPHIC INFORMATION (Personal Characteristics)
-- ====================================================================
WHEN c.name LIKE '%birth%' OR c.name LIKE '%dob%' OR c.name LIKE '%age%'
THEN 'DEMOGRAPHIC' -- Birth dates and age information
WHEN c.name LIKE '%gender%' OR c.name LIKE '%sex%'
THEN 'DEMOGRAPHIC' -- Gender/sex identification
WHEN c.name LIKE '%nationality%' OR c.name LIKE '%ethnicity%' OR c.name LIKE '%race%'
THEN 'DEMOGRAPHIC' -- Ethnic and national origin
WHEN c.name LIKE '%marital%' OR c.name LIKE '%religion%'
THEN 'DEMOGRAPHIC' -- Personal status and beliefs
-- ====================================================================
-- FINANCIAL INFORMATION (Monetary and Credit Data)
-- ====================================================================
WHEN c.name LIKE '%salary%' OR c.name LIKE '%wage%' OR c.name LIKE '%income%' OR c.name LIKE '%pay%'
THEN 'FINANCIAL' -- Compensation and income data
WHEN c.name LIKE '%credit%' OR c.name LIKE '%bank%' OR c.name LIKE '%account%'
THEN 'FINANCIAL' -- Banking and credit information
WHEN c.name LIKE '%tax%' OR c.name LIKE '%ssn%'
THEN 'FINANCIAL' -- Tax-related identifiers
-- ====================================================================
-- HEALTH INFORMATION (Medical and Wellness Data)
-- ====================================================================
WHEN c.name LIKE '%medical%' OR c.name LIKE '%health%' OR c.name LIKE '%diagnosis%'
THEN 'HEALTH' -- Medical records and health data
WHEN c.name LIKE '%disability%' OR c.name LIKE '%condition%' OR c.name LIKE '%treatment%'
THEN 'HEALTH' -- Health conditions and treatments
-- ====================================================================
-- SENSITIVE AUTHENTICATION AND BIOMETRIC DATA
-- ====================================================================
WHEN c.name LIKE '%password%' OR c.name LIKE '%secret%' OR c.name LIKE '%token%'
THEN 'AUTHENTICATION' -- Authentication credentials
WHEN c.name LIKE '%biometric%' OR c.name LIKE '%fingerprint%' OR c.name LIKE '%photo%'
THEN 'BIOMETRIC' -- Biometric identifiers
-- ====================================================================
-- NO SPECIFIC PII CATEGORY DETECTED
-- ====================================================================
ELSE NULL -- Column doesn't match known PII patterns
END AS pii_category,
-- ========================================================================
-- COLUMN-LEVEL PII RISK ASSESSMENT
-- ========================================================================
-- Assign risk levels based on data type and potential privacy impact
CASE
-- ====================================================================
-- HIGH RISK: Government IDs, Financial Data, Authentication
-- ====================================================================
-- These columns pose the highest privacy and security risks
WHEN c.name LIKE '%ssn%' OR c.name LIKE '%social%security%'
OR c.name LIKE '%passport%' OR c.name LIKE '%tax%id%'
OR c.name LIKE '%credit%card%' OR c.name LIKE '%bank%account%'
OR c.name LIKE '%password%' OR c.name LIKE '%secret%'
THEN 'HIGH_PII_RISK'
-- ====================================================================
-- MEDIUM RISK: Personal Identifiers, Contact Info, Demographics
-- ====================================================================
-- These columns contain personally identifiable information
WHEN c.name LIKE '%first%name%' OR c.name LIKE '%last%name%' OR c.name LIKE '%full%name%'
OR c.name LIKE '%email%' OR c.name LIKE '%phone%' OR c.name LIKE '%address%'
OR c.name LIKE '%birth%' OR c.name LIKE '%dob%'
OR c.name LIKE '%salary%' OR c.name LIKE '%wage%'
OR c.name LIKE '%medical%' OR c.name LIKE '%health%'
THEN 'MEDIUM_PII_RISK'
-- ====================================================================
-- LOW RISK: General Demographics, Preferences
-- ====================================================================
-- These columns contain less sensitive personal information
WHEN c.name LIKE '%gender%' OR c.name LIKE '%age%' OR c.name LIKE '%title%'
OR c.name LIKE '%suffix%' OR c.name LIKE '%prefix%'
OR c.name LIKE '%nationality%' OR c.name LIKE '%ethnicity%'
THEN 'LOW_PII_RISK'
-- ====================================================================
-- POTENTIAL PII: Schema-Based Detection
-- ====================================================================
-- For known sensitive schemas, flag string and date columns for manual review
WHEN s.name IN ('Person', 'HumanResources') AND
(ty.name IN ('varchar', 'nvarchar', 'char', 'nchar') OR ty.name LIKE '%date%')
THEN 'POTENTIAL_PII'
-- ====================================================================
-- NO PII DETECTED
-- ====================================================================
-- Column doesn't match any PII patterns
ELSE 'NO_PII_DETECTED'
END AS pii_risk_level,
-- ========================================================================
-- DATA SENSITIVITY CLASSIFICATION FOR BUSINESS CONTEXT
-- ========================================================================
-- Classify data sensitivity from business and regulatory perspective
CASE
-- Highly sensitive: Government IDs, financial data, authentication
WHEN c.name LIKE '%ssn%' OR c.name LIKE '%passport%' OR c.name LIKE '%tax%'
OR c.name LIKE '%credit%' OR c.name LIKE '%bank%' OR c.name LIKE '%password%'
THEN 'HIGHLY_SENSITIVE'
-- Sensitive: Personal identifiers and contact information
WHEN c.name LIKE '%name%' OR c.name LIKE '%email%' OR c.name LIKE '%phone%'
OR c.name LIKE '%address%' OR c.name LIKE '%birth%' OR c.name LIKE '%salary%'
THEN 'SENSITIVE'
-- Moderately sensitive: General demographics and preferences
WHEN c.name LIKE '%gender%' OR c.name LIKE '%age%' OR c.name LIKE '%title%'
THEN 'MODERATELY_SENSITIVE'
-- Not sensitive: System and reference data
ELSE 'NOT_SENSITIVE'
END AS data_sensitivity,
-- ========================================================================
-- REGULATORY COMPLIANCE FRAMEWORK MAPPING
-- ========================================================================
-- Map each column to relevant regulatory requirements for compliance planning
CASE
-- Government IDs and tax data: Multiple regulatory frameworks
WHEN c.name LIKE '%ssn%' OR c.name LIKE '%tax%' OR c.name LIKE '%passport%'
THEN 'GDPR, CCPA, PIPEDA, SOX, Government ID Protection'
-- Personal identifiers: Privacy regulations
WHEN c.name LIKE '%name%' OR c.name LIKE '%email%' OR c.name LIKE '%phone%' OR c.name LIKE '%address%'
THEN 'GDPR, CCPA, PIPEDA, PII Protection'
-- Demographic data: Privacy and anti-discrimination laws
WHEN c.name LIKE '%birth%' OR c.name LIKE '%age%' OR c.name LIKE '%gender%'
THEN 'GDPR, CCPA, Demographic Data Protection'
-- Financial data: Financial regulations and employment law
WHEN c.name LIKE '%salary%' OR c.name LIKE '%wage%' OR c.name LIKE '%credit%'
THEN 'SOX, Financial Data Protection, Employment Law'
-- Health data: Health information privacy laws
WHEN c.name LIKE '%medical%' OR c.name LIKE '%health%'
THEN 'HIPAA, Health Data Protection, GDPR'
-- HR schema: Employment and privacy law
WHEN s.name = 'HumanResources'
THEN 'Employment Law, GDPR, CCPA'
-- Default: Standard data protection requirements
ELSE 'Standard Data Protection'
END AS compliance_scope,
-- ========================================================================
-- ACTIONABLE RECOMMENDATIONS FOR DATA GOVERNANCE
-- ========================================================================
-- Provide specific, actionable recommendations for each column type
CASE
-- Highest security: Encryption and strict access control
WHEN c.name LIKE '%ssn%' OR c.name LIKE '%passport%' OR c.name LIKE '%credit%' OR c.name LIKE '%password%'
THEN 'ENCRYPT - Apply strong encryption and restrict access'
-- Moderate security: Data masking for non-production environments
WHEN c.name LIKE '%name%' OR c.name LIKE '%email%' OR c.name LIKE '%phone%'
THEN 'MASK - Consider data masking for non-production'
-- Access control: Limit access to authorized personnel
WHEN c.name LIKE '%address%' OR c.name LIKE '%birth%' OR c.name LIKE '%salary%'
THEN 'RESTRICT - Limit access to authorized personnel only'
-- Monitoring: Track access and usage patterns
WHEN c.name LIKE '%gender%' OR c.name LIKE '%age%' OR c.name LIKE '%title%'
THEN 'MONITOR - Track access and usage patterns'
-- Manual review: Potential PII requiring human validation
WHEN s.name IN ('Person', 'HumanResources') AND
(ty.name IN ('varchar', 'nvarchar', 'char', 'nchar') OR ty.name LIKE '%date%')
THEN 'REVIEW - Manual review recommended for PII classification'
-- Standard controls: Apply standard data governance
ELSE 'STANDARD - Apply standard data governance controls'
END AS recommended_action
-- ============================================================================
-- FROM CLAUSE: SYSTEM CATALOG JOINS FOR COLUMN METADATA
-- ============================================================================
FROM sys.schemas s
-- Core table information
JOIN sys.tables t ON s.schema_id = t.schema_id
-- Individual column details
JOIN sys.columns c ON t.object_id = c.object_id
-- Data type information with precision/scale details
JOIN sys.types ty ON c.user_type_id = ty.user_type_id
WHERE t.type = 'U' -- User tables only (exclude views, functions, etc.)
-- ============================================================================
-- ORDERING: RISK-BASED PRIORITIZATION
-- ============================================================================
-- Sort results to show highest-risk columns first for immediate attention
ORDER BY
CASE
-- Highest priority: Government IDs and financial data
WHEN c.name LIKE '%ssn%' OR c.name LIKE '%passport%' OR c.name LIKE '%credit%' THEN 1
-- High priority: Personal names and contact information
WHEN c.name LIKE '%name%' OR c.name LIKE '%email%' OR c.name LIKE '%phone%' THEN 2
-- Medium priority: Known sensitive schemas
WHEN s.name IN ('Person', 'HumanResources') THEN 3
-- Lower priority: Other columns
ELSE 4
END,
s.name, t.name, c.name; -- Alphabetical within priority groups
PRINT 'Column-level PII analysis completed!';
-- ============================================================================
-- SECTION 4: COMPREHENSIVE REPORTING AND ANALYSIS
-- ============================================================================
-- ========================================================================
-- REPORT 1: HIGH-RISK PII COLUMNS REQUIRING IMMEDIATE ATTENTION
-- ========================================================================
-- Shows columns that need immediate security and governance action
SELECT
'HIGH-RISK PII COLUMNS REQUIRING IMMEDIATE ATTENTION' AS report_section,
schema_name, -- Schema containing the column
table_name, -- Table containing the column
column_name, -- Specific column name
data_type, -- Data type for context
pii_category, -- Type of PII detected
pii_risk_level, -- Risk assessment level
recommended_action, -- Specific action to take
compliance_scope -- Regulatory requirements
FROM #ColumnLevelPII
WHERE pii_risk_level IN ('HIGH_PII_RISK', 'MEDIUM_PII_RISK') -- Focus on actionable risk levels
ORDER BY
CASE pii_risk_level WHEN 'HIGH_PII_RISK' THEN 1 ELSE 2 END, -- High risk first
schema_name, table_name, column_name; -- Alphabetical within risk level
-- ========================================================================
-- REPORT 2: PII DISTRIBUTION BY CATEGORY
-- ========================================================================
-- Provides overview of PII types found across the database
SELECT
'PII DISTRIBUTION BY CATEGORY' AS report_section,
COALESCE(pii_category, 'NO_PII') AS pii_category, -- Handle NULL categories
COUNT(*) AS column_count, -- Total columns in this category
-- Count columns by risk level within each category
COUNT(CASE WHEN pii_risk_level = 'HIGH_PII_RISK' THEN 1 END) AS high_risk_columns,
COUNT(CASE WHEN pii_risk_level = 'MEDIUM_PII_RISK' THEN 1 END) AS medium_risk_columns,
COUNT(CASE WHEN pii_risk_level = 'LOW_PII_RISK' THEN 1 END) AS low_risk_columns
FROM #ColumnLevelPII
GROUP BY pii_category
ORDER BY high_risk_columns DESC, medium_risk_columns DESC; -- Most critical categories first
-- ========================================================================
-- REPORT 3: SCHEMA-LEVEL PII RISK SUMMARY
-- ========================================================================
-- Shows PII distribution and risk levels by database schema
SELECT
'PII RISK BY SCHEMA' AS report_section,
schema_name, -- Database schema
COUNT(*) AS total_columns, -- Total columns in schema
COUNT(CASE WHEN pii_risk_level <> 'NO_PII_DETECTED' THEN 1 END) AS pii_columns, -- Columns with PII
COUNT(CASE WHEN pii_risk_level = 'HIGH_PII_RISK' THEN 1 END) AS high_risk_columns,
COUNT(CASE WHEN pii_risk_level = 'MEDIUM_PII_RISK' THEN 1 END) AS medium_risk_columns,
-- Calculate PII density percentage for each schema
CAST(COUNT(CASE WHEN pii_risk_level <> 'NO_PII_DETECTED' THEN 1 END) * 100.0 / COUNT(*) AS DECIMAL(5,1)) AS pii_percentage
FROM #ColumnLevelPII
GROUP BY schema_name
ORDER BY high_risk_columns DESC, medium_risk_columns DESC; -- Highest risk schemas first
-- ========================================================================
-- REPORT 4: COMPLETE PII INVENTORY FOR DATA CATALOG
-- ========================================================================
-- Comprehensive listing of all PII columns for data governance tools
SELECT
'DETAILED PII INVENTORY FOR DATA CATALOG' AS report_section,
full_column_name, -- Complete column identifier
data_type, -- Technical data type information
CASE WHEN is_nullable = 1 THEN 'YES' ELSE 'NO' END AS nullable, -- NULL constraint info
pii_category, -- Specific PII type
pii_risk_level, -- Risk assessment
data_sensitivity, -- Business sensitivity level
recommended_action -- Specific governance action
FROM #ColumnLevelPII
WHERE pii_risk_level <> 'NO_PII_DETECTED' -- Only show columns with detected PII
ORDER BY
-- Prioritize by risk level for action planning
CASE pii_risk_level
WHEN 'HIGH_PII_RISK' THEN 1
WHEN 'MEDIUM_PII_RISK' THEN 2
WHEN 'LOW_PII_RISK' THEN 3
ELSE 4
END,
schema_name, table_name, column_name;
-- ========================================================================
-- COMPLETION STATUS AND SUMMARY STATISTICS
-- ========================================================================
-- Provides final summary and guidance for next steps
SELECT
'COLUMN-LEVEL ANALYSIS COMPLETED' AS status,
'Results stored in: #ColumnLevelPII (session scope)' AS temp_table_location,
'Total columns analyzed: ' + CAST((SELECT COUNT(*) FROM #ColumnLevelPII) AS VARCHAR(10)) AS total_columns,
'PII columns detected: ' + CAST((SELECT COUNT(*) FROM #ColumnLevelPII WHERE pii_risk_level <> 'NO_PII_DETECTED') AS VARCHAR(10)) AS pii_columns_found;
-- ============================================================================
-- USAGE NOTES AND RECOMMENDATIONS
-- ============================================================================
-- 1. Review high-risk columns immediately for security controls
-- 2. Validate PII detection results manually - heuristic matching has limitations
-- 3. Use results to populate data catalogs and governance tools
-- 4. Implement recommended actions based on risk levels and compliance requirements
-- 5. Schedule regular re-analysis to catch schema changes and new PII
-- 6. Consider data lineage analysis for complete PII flow mapping
-- ============================================================================
Results
Again, there is a lot to go through and check!
The script returns five result sets (labelled Results 1 through Results 1 (5) in SSMS), all drawn from the #ColumnLevelPII temp table:
Results 1 - High Risk Columns requiring immediate attention
Results 1 (2) - Distribution by Category
Results 1 (3) - Risk by Schema
Results 1 (4) - Detailed PII Inventory
Results 1 (5) - Analysis completed details

Take a look at:
PII Summary - Schema
The schema-level analysis of AdventureWorks2022 reveals that 13.4% of all database columns (65 out of 486) contain personally identifiable information, with significant concentration in the Person schema (52.9% PII coverage) and HumanResources schema (40.0% PII coverage). Critical security concerns are identified in the Sales schema, which contains 5 of the 7 total high-risk columns despite having only 5.8% overall PII coverage.
Two schemas require immediate attention: Person schema due to its high PII density and Sales schema due to its concentration of high-risk financial data. The remaining schemas (Production and Purchasing) have minimal PII exposure, with less than 5% PII coverage each.
Results 1 (3) - Risk by Schema
KEY CONCERNS:
Massive Data Exposure Volume:
Person Schema: 37 PII columns affecting ~20,000 person records (52.9% of schema is PII)
Sales Schema: 5 HIGH_PII_RISK columns in financial data affecting ~19,000+ credit records
HumanResources Schema: 16 PII columns affecting 290+ employee records (40% of schema is PII)
Total Database Impact: 65 PII columns across 486 total columns (13.4% PII coverage)
Risk Concentration Issues:
71% of all HIGH_PII_RISK columns are in Sales schema (5 of 7 total)
Password table contains unencrypted or poorly protected authentication data
Credit card data spread across multiple tables (CreditCard, PersonCreditCard, SalesOrderHeader)
32 columns marked as POTENTIAL_PII requiring immediate review
Classification Gaps Identified:
Sales.SalesTaxRate.SalesTaxRateID incorrectly marked as FINANCIAL/HIGH_PII_RISK
29 columns have no PII category assigned despite being in PII-heavy schemas
ModifiedDate columns throughout Person schema marked as POTENTIAL_PII (false positives)
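Since the results live in the session-scoped #ColumnLevelPII table, the known misclassifications above can be corrected in place before the inventory is exported to a catalog. A minimal sketch, assuming the temp table from the analysis script is still in scope and uses the column names shown in the reports:

```sql
-- Sketch: correct known false positives in the session temp table
-- (assumes #ColumnLevelPII from the analysis script is still in scope)
UPDATE #ColumnLevelPII
SET pii_risk_level = 'NO_PII_DETECTED',
    pii_category   = NULL
WHERE column_name = 'ModifiedDate'  -- audit timestamps, not PII
   OR full_column_name = 'Sales.SalesTaxRate.SalesTaxRateID';  -- surrogate key, not financial data
```

Re-running Reports 1-4 after this update gives a cleaner baseline for the manual review of the remaining POTENTIAL_PII columns.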
SECURITY & COMPLIANCE RISKS:
Immediate Security Threats:
Password table with 20K+ records needs immediate encryption audit - AUTHENTICATION BREACH RISK
Credit card data in multiple tables - PCI-DSS VIOLATION RISK
Email addresses for 20K persons - PHISHING/SPAM TARGET RISK
Employee personal data - INSIDER THREAT/HR BREACH RISK
Data Concentration Vulnerabilities:
Person schema is 52.9% PII - SINGLE POINT OF FAILURE
Sales financial data not properly isolated - LATERAL MOVEMENT RISK
No apparent data masking in place - DEV/TEST ENVIRONMENT EXPOSURE
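One quick mitigation for the dev/test exposure noted above is SQL Server Dynamic Data Masking, which hides values from non-privileged readers without changing the stored data. An illustrative sketch against one of the flagged columns (permissions and unmasking rights must still be managed separately):

```sql
-- Sketch: mask email display for non-privileged users
-- (illustrative; masking is a display control, not encryption)
ALTER TABLE Person.EmailAddress
    ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()');
```

Note that masking does not satisfy encryption-at-rest requirements; it only reduces casual exposure in shared environments.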
Access Control Gaps:
No indication of column-level security implementation
Recommended actions show "ENCRYPT" for only 6 columns (should be more)
"RESTRICT" recommended for only 9 columns (inadequate for volume)
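Column-level security can be implemented directly with T-SQL column-scoped permissions. A minimal sketch (the role name `pii_restricted` is hypothetical; adapt to your own security model):

```sql
-- Sketch: column-level DENY for the highest-risk authentication columns
-- (role name is a hypothetical example)
CREATE ROLE pii_restricted;
DENY SELECT ON Person.Password (PasswordHash, PasswordSalt) TO pii_restricted;
```

Members of the role can still query the table's non-sensitive columns, but any SELECT touching the denied columns fails.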
DATA GOVERNANCE GAPS:
PII Detection Logic Failures:
Critical miss: Some customer-related tables may have undetected PII
Inconsistent classification between risk levels and sensitivity ratings
32 columns requiring manual review (50% of all PII columns) - indicates weak automated detection
False positives: ModifiedDate columns incorrectly flagged as POTENTIAL_PII
Remediation Tracking Issues:
Recommended actions appear incomplete or truncated
No versioning or audit trail for PII classification changes
No clear ownership assignment for remediation tasks
Monitoring Deficiencies:
Only 7 columns marked for "MONITOR" action (should be all HIGH/MEDIUM risk)
No real-time alerting mentioned for high-risk data access
No indication of existing DLP (Data Loss Prevention) controls
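As a stopgap before real monitoring tooling is in place, the recommended actions in the temp table can be broadened so every HIGH/MEDIUM risk column carries a MONITOR flag. A sketch, assuming #ColumnLevelPII and its `recommended_action` column as produced by the script:

```sql
-- Sketch: flag all HIGH/MEDIUM risk columns for monitoring
-- (appends rather than overwrites any existing recommendation)
UPDATE #ColumnLevelPII
SET recommended_action = recommended_action + '; MONITOR'
WHERE pii_risk_level IN ('HIGH_PII_RISK', 'MEDIUM_PII_RISK')
  AND recommended_action NOT LIKE '%MONITOR%';
```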
COMPLIANCE EXPOSURE:
GDPR/CCPA/PIPEDA - EXTREME RISK:
All Person schema data (37 columns, ~20K records) falls under privacy regulations
No evidence of:
Right to deletion implementation
Data portability capabilities
Consent management
Cross-border transfer controls
Potential fines: Up to 4% of global annual turnover or €20M, whichever is higher (GDPR)
PCI-DSS - CRITICAL NON-COMPLIANCE:
5 HIGH_PII_RISK financial columns identified
Credit card data in Sales.CreditCard, Sales.PersonCreditCard, Sales.SalesOrderHeader
Missing requirements:
Encryption at rest (only 6 columns marked for encryption)
Network segmentation
Access logging
Risk: Loss of payment processing capabilities
Employment Law - HIGH RISK:
HumanResources.Employee with 290 records
HumanResources.JobCandidate data exposed
Missing controls:
Role-based access
Retention policies
Cross-border restrictions
Risk: Employment litigation, regulatory penalties
SOX Compliance (if applicable):
Financial data in Sales schema
No audit trail mentioned
Missing segregation of duties
Risk: Executive personal liability
Detailed PII Inventory
The PII detection scan of the AdventureWorks2022 database has identified significant personally identifiable information across 65 columns in multiple schemas. The analysis reveals that approximately 55% of scanned columns contain confirmed or potential PII data, with 7 columns classified as high-risk requiring immediate attention for encryption and access controls.
Key findings indicate that the Person and HumanResources schemas contain the majority of sensitive data, with authentication credentials and financial information representing the highest risk categories. The organization should prioritize implementing encryption for high-risk columns and establishing comprehensive data governance controls for medium-risk data elements.
Take a look at:
Results 1 (4) - Detailed PII Inventory
Here's a summary of the analysis:
CRITICAL:
Password table with ~20,000 records containing authentication credentials (HIGH_PII_RISK)
Credit card data across multiple Sales tables affecting ~19,000+ records
Person schema is 52.9% PII - massive single point of failure
71% of all high-risk columns concentrated in Sales schema
High-Risk PII Columns Requiring Immediate Action
The following columns contain highly sensitive data requiring immediate implementation of encryption and access controls:
Authentication Data:
Person.Password.PasswordHash - User password hashes
Person.Password.PasswordSalt - Password salt values
Financial Data:
Sales.CreditCard.CreditCardID - Credit card identifiers
Sales.PersonCreditCard.CreditCardID - Person-credit card associations
Sales.SalesOrderHeader.CreditCardApprovalCode - Credit approval codes
Sales.SalesOrderHeader.CreditCardID - Order credit card references
Sales.SalesTaxRate.SalesTaxRateID - Tax rate identifiers (noted as a likely false positive in the classification gaps above)
MAJOR RISKS IDENTIFIED:
Authentication breach risk from exposed password hashes/salts
PCI-DSS violations from credit card data exposure
GDPR non-compliance affecting ~20,000 person records
Potential financial impact: $15M-$75M in breach costs and fines
IMMEDIATE ACTIONS (First 24 hours):
Freeze access to Person.Password table
Audit encryption status of all authentication data
Identify all systems accessing credit card tables
Enable comprehensive audit logging
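The audit-logging step can be started with SQL Server Audit. A hedged sketch covering the two highest-risk tables (audit names and the file path are illustrative; creating audits requires ALTER ANY SERVER AUDIT permission):

```sql
-- Sketch: server audit capturing access to the highest-risk tables
-- (names and FILEPATH are illustrative examples)
USE master;
CREATE SERVER AUDIT PII_Access_Audit
    TO FILE (FILEPATH = 'C:\Audit\');
ALTER SERVER AUDIT PII_Access_Audit WITH (STATE = ON);

USE AdventureWorks2022;
CREATE DATABASE AUDIT SPECIFICATION PII_Table_Access
    FOR SERVER AUDIT PII_Access_Audit
    ADD (SELECT, UPDATE ON OBJECT::Person.Password BY public),
    ADD (SELECT, UPDATE ON OBJECT::Sales.CreditCard BY public)
    WITH (STATE = ON);
```

Captured events can then be read back with `sys.fn_get_audit_file` for review or SIEM ingestion.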