Classification & Sensitivity Analysis

Understanding Key Data Catalog KPIs

Overview

Why does this matter?

Data classification and sensitivity analysis are fundamental pillars of the Data Discovery phase because they establish the foundation for data governance, security, and compliance throughout your entire data ecosystem. Here's why they're so critical:

Security Risk Management: Without understanding what sensitive data exists in your AdventureWorks2022 database, you can't properly protect it. Personal information, financial data, and business-critical information each require different security controls.

Regulatory Compliance: Classification helps ensure you're meeting requirements like GDPR, CCPA, HIPAA, or industry-specific regulations. You need to know where personal data, health information, or payment card data resides to implement proper controls.

Data Governance: Classification creates a common language across your organization for discussing data assets. It enables consistent policies for data access, retention, and usage.

Access Control: You can't implement least-privilege access without knowing what data is sensitive. Classification drives your authorization models and data access policies.


Classification and Sensitivity Analysis

Classification & Sensitivity Analysis automatically evaluates and categorizes data elements based on their content, risk level, and business importance. This analysis provides the intelligence needed to implement appropriate governance controls and protect sensitive information.

Key Components

Data Classification uses pattern matching to identify data types (PII, financial, health) and assigns risk levels (Critical, High, Medium, Low). It maps data to applicable regulations like GDPR, CCPA, and HIPAA.

Business Context Analysis groups data by function (Sales, Customer, Operations) and assesses business criticality and lifecycle stage.
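The pattern-matching idea behind Data Classification can be sketched in a few lines. The patterns, risk labels, and regulation mappings below are illustrative assumptions (loosely mirroring the LIKE patterns used in the SQL script later on this page), not the catalog's actual rule set:

```python
import re

# Illustrative (not exhaustive) column-name patterns mapped to risk levels
# and example regulations; a real catalog uses far richer rule sets and
# content-based detection in addition to name matching.
PII_RULES = [
    (r"ssn|social|password|credit|passport", "Critical", ["GDPR", "CCPA"]),
    (r"first.?name|last.?name|email|phone|address|birth|salary", "High", ["GDPR", "CCPA"]),
    (r"gender|age|nationality", "Medium", ["GDPR"]),
]

def classify_column(name: str):
    """Return (risk_level, regulations) for a column name, or ('Low', [])."""
    lowered = name.lower()
    for pattern, risk, regs in PII_RULES:
        if re.search(pattern, lowered):
            return risk, regs
    return "Low", []
```

For example, `classify_column("PasswordHash")` lands in the Critical tier, while a neutral name like `ProductID` falls through to Low. Note that loose substring patterns (e.g. `age`) can over-match, which is the same trade-off the SQL script's `LIKE '%age%'`-style checks make.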

Trust Score Calculation

The Trust Score (0-100) measures data reliability through a weighted formula:

  • Business Value (30%) - Inferred importance based on schema and naming patterns

  • Data Type Appropriateness (25%) - How well the data type matches the column's purpose

  • Constraints (25%) - Coverage of integrity controls (PK, FK, NOT NULL, defaults, checks)

  • Naming Quality (20%) - Adherence to naming conventions and descriptiveness

Scoring Levels:

  • 90-100: Excellent (highly reliable, well-governed)

  • 75-89: Good (reliable with minor issues)

  • 60-74: Fair (acceptable but needs improvement)

  • Below 60: Poor (quality issues requiring attention)

This composite score helps prioritize governance efforts and identify data elements requiring quality improvements or enhanced controls.
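The weighted formula and scoring levels above can be expressed as a minimal Python sketch. The component names and the example inputs are assumptions for illustration; the weights and level thresholds come from the lists above:

```python
# Weights from the column-level Trust Score formula above.
WEIGHTS = {
    "business_value": 0.30,        # Inferred importance (30%)
    "type_appropriateness": 0.25,  # Data type fit (25%)
    "constraints": 0.25,           # Integrity-control coverage (25%)
    "naming_quality": 0.20,        # Naming conventions (20%)
}

def trust_score(components: dict) -> float:
    """Weighted composite of component scores, each on a 0-100 scale."""
    return sum(components[k] * w for k, w in WEIGHTS.items())

def trust_level(score: float) -> str:
    """Map a composite score to the scoring levels listed above."""
    if score >= 90:
        return "Excellent"
    if score >= 75:
        return "Good"
    if score >= 60:
        return "Fair"
    return "Poor"

# Hypothetical column: strong typing, decent constraints, mediocre naming.
example = {"business_value": 80, "type_appropriateness": 90,
           "constraints": 70, "naming_quality": 60}
# trust_score(example) -> 76.0, i.e. "Good"
```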


Schema Analysis

Schema-level metrics provide executive oversight and departmental governance KPIs.

Run the script below to calculate a Sensitivity Level and Trust Score for each schema.

Results

Once the script completes, the schema-level Sensitivity Levels and Trust Scores can be reviewed:

Schema            Sensitivity Level    Trust Score
Sales             HIGH                 65
Person            HIGH                 63
Production        MEDIUM               59
HumanResources    HIGH                 48
Purchasing        HIGH                 46
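The script maps each composite Trust Score to a categorical trust level (VERY_HIGH / HIGH / MEDIUM / LOW / VERY_LOW). As a quick cross-check, applying the thresholds from the script's CASE expression (80 / 65 / 50 / 35) to the Trust Scores above reproduces the banding; a small Python sketch:

```python
# Trust scores from the results table above.
schema_scores = {
    "Sales": 65, "Person": 63, "Production": 59,
    "HumanResources": 48, "Purchasing": 46,
}

def schema_trust_level(score: float) -> str:
    # Thresholds match the CASE expression in the SQL script below.
    if score >= 80:
        return "VERY_HIGH"
    if score >= 65:
        return "HIGH"
    if score >= 50:
        return "MEDIUM"
    if score >= 35:
        return "LOW"
    return "VERY_LOW"

levels = {name: schema_trust_level(s) for name, s in schema_scores.items()}
```

With these scores, only Sales reaches HIGH; HumanResources and Purchasing fall to LOW, flagging them for governance attention despite their HIGH sensitivity.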


-- ============================================================================
-- MULTI-LEVEL DATA GOVERNANCE ANALYSIS FOR ADVENTUREWORKS2022
-- Schema Level | Table Level | Column Level
-- ============================================================================

USE AdventureWorks2022;

-- ============================================================================
-- SECTION 1: SCHEMA-LEVEL ANALYSIS
-- ============================================================================
-- PURPOSE: Calculate Trust Scores and Sensitivity Classifications at the schema level
-- SCOPE: Provides executive-level KPIs for departmental governance oversight
-- OUTPUT: Schema-level metrics for Pentaho Data Catalog domain management
-- ============================================================================

-- Clean up any existing temp tables to ensure fresh analysis
-- This prevents conflicts if the script is run multiple times in the same session
IF OBJECT_ID('tempdb..#SchemaLevelAnalysis') IS NOT NULL DROP TABLE #SchemaLevelAnalysis;
IF OBJECT_ID('tempdb..#TableLevelAnalysis') IS NOT NULL DROP TABLE #TableLevelAnalysis;
IF OBJECT_ID('tempdb..#ColumnLevelAnalysis') IS NOT NULL DROP TABLE #ColumnLevelAnalysis;

PRINT 'Starting Multi-Level Data Governance Analysis...';
PRINT 'Analysis timestamp: ' + CAST(GETDATE() AS VARCHAR(50));

-- ============================================================================
-- SCHEMA-LEVEL RESULTS TABLE DEFINITION
-- ============================================================================
-- This table stores comprehensive schema-level KPIs that provide executive oversight
-- and departmental governance metrics suitable for Pentaho Data Catalog integration

CREATE TABLE #SchemaLevelAnalysis (
    -- ========================================================================
    -- PRIMARY KEY AND IDENTIFICATION
    -- ========================================================================
    schema_id INT IDENTITY(1,1) PRIMARY KEY,        -- Unique identifier for each schema record
    schema_name NVARCHAR(128) NOT NULL,             -- Database schema name (Person, Sales, etc.)
    
    -- ========================================================================
    -- FOUNDATIONAL SCHEMA METRICS
    -- ========================================================================
    -- These metrics provide the basic scale and scope of each schema
    total_tables INT NOT NULL,                      -- Count of user tables in this schema
    total_columns INT NOT NULL,                     -- Count of all columns across schema tables
    total_estimated_rows BIGINT,                    -- Sum of estimated rows across all tables
    total_size_mb DECIMAL(12,2),                    -- Total storage consumption in megabytes
    
    -- ========================================================================
    -- SCHEMA-LEVEL TRUST SCORE (0-100 Scale)
    -- ========================================================================
    -- Trust Score Components:
    -- • Data Volume & Structure Health (30%): Scale and organization quality
    -- • Usage Patterns (40%): How actively the schema is accessed
    -- • Business Context & Organization (20%): Clarity of business purpose
    -- • Data Governance Readiness (10%): Documentation and naming standards
    schema_trust_score DECIMAL(5,2) NOT NULL,       -- Calculated trust score (0-100)
    schema_trust_level VARCHAR(20) NOT NULL,        -- Categorical trust level (VERY_HIGH, HIGH, MEDIUM, LOW, VERY_LOW)
    
    -- ========================================================================
    -- SCHEMA-LEVEL SENSITIVITY CLASSIFICATION
    -- ========================================================================
    -- Sensitivity levels determined by:
    -- • PII content density and risk level
    -- • Business function criticality (HR, Sales, etc.)
    -- • Regulatory compliance requirements
    schema_sensitivity_level VARCHAR(10) NOT NULL,  -- HIGH/MEDIUM/LOW classification
    schema_sensitivity_score INT NOT NULL,          -- Numerical sensitivity score (0-100)
    
    -- ========================================================================
    -- DATA QUALITY AND RISK METRICS
    -- ========================================================================
    -- These metrics help assess data reliability and governance needs
    avg_completeness_pct DECIMAL(5,2),              -- Average completeness across schema tables
    pii_density_pct DECIMAL(5,2),                   -- Percentage of columns containing PII
    high_risk_table_count INT,                      -- Count of tables with critical PII exposure
    
    -- ========================================================================
    -- BUSINESS CONTEXT FOR GOVERNANCE
    -- ========================================================================
    -- Maps technical schemas to business functions for stakeholder communication
    business_domain VARCHAR(100),                   -- Business function description
    data_owner_department VARCHAR(50),              -- Responsible department/team
    governance_priority VARCHAR(20),                -- Priority level for governance attention
    
    -- ========================================================================
    -- COMPLIANCE AND REGULATORY REQUIREMENTS
    -- ========================================================================
    -- Helps legal and compliance teams understand regulatory exposure
    regulatory_frameworks NVARCHAR(500),            -- Applicable regulations (GDPR, CCPA, etc.)
    encryption_required_tables INT,                 -- Count of tables requiring encryption
    
    -- ========================================================================
    -- AUDIT AND TRACKING
    -- ========================================================================
    analysis_timestamp DATETIME DEFAULT GETDATE()  -- When this analysis was performed
);

-- ============================================================================
-- SCHEMA-LEVEL DATA COLLECTION AND CALCULATION
-- ============================================================================
-- This section uses Common Table Expressions (CTEs) to gather and calculate
-- the various components needed for schema-level trust scores and sensitivity
-- ============================================================================

-- CTE 1: FOUNDATIONAL SCHEMA METRICS
-- ============================================================================
-- Collects basic statistical information about each schema's structure and size
-- This forms the foundation for trust score calculations
WITH SchemaMetrics AS (
    SELECT 
        s.schema_id,                                 -- Schema system identifier
        s.name AS schema_name,                       -- Human-readable schema name
        
        -- STRUCTURAL METRICS: Count tables and columns for scope assessment
        COUNT(DISTINCT t.object_id) AS table_count, -- Number of user tables (excludes views, system tables)
        COUNT(c.column_id) AS column_count,         -- Total columns across all tables in schema
        
        -- DATA VOLUME METRICS: Assess scale and business importance
        COALESCE(SUM(p.rows), 0) AS total_rows,     -- Sum of estimated rows across all tables
        
        -- STORAGE METRICS: Calculate total space consumption
        -- This complex subquery calculates actual storage used by summing allocation pages
        COALESCE(SUM(
            (SELECT SUM(a.total_pages) * 8 / 1024.0 -- Convert 8KB pages to MB
             FROM sys.partitions p2 
             JOIN sys.allocation_units a ON p2.partition_id = a.container_id 
             WHERE p2.object_id = t.object_id         -- Match specific table
             AND p2.index_id IN (0,1))                -- Include heap (0) and clustered index (1) data
        ), 0) AS total_size_mb
        
    FROM sys.schemas s                               -- All database schemas
    LEFT JOIN sys.tables t ON s.schema_id = t.schema_id AND t.type = 'U'  -- Only user tables
    LEFT JOIN sys.columns c ON t.object_id = c.object_id                  -- All columns in user tables
    LEFT JOIN sys.partitions p ON t.object_id = p.object_id AND p.index_id IN (0,1)  -- Row count estimates
    GROUP BY s.schema_id, s.name                     -- Group by schema for aggregation
),

-- CTE 2: PII DETECTION AND RISK ASSESSMENT
-- ============================================================================
-- Analyzes column names to detect potential Personally Identifiable Information (PII)
-- Uses pattern matching to classify PII risk levels for compliance assessment
SchemaPIIAnalysis AS (
    SELECT 
        s.schema_id,
        
        -- HIGH-RISK PII DETECTION
        -- These patterns indicate the most sensitive data requiring strongest protection
        COUNT(CASE 
            WHEN c.name LIKE '%ssn%' OR c.name LIKE '%social%' OR     -- Social Security Numbers
                 c.name LIKE '%password%' OR c.name LIKE '%credit%' OR -- Authentication & Financial
                 c.name LIKE '%account%number%' OR c.name LIKE '%license%' OR  -- Account Numbers & IDs
                 c.name LIKE '%passport%'                             -- Government IDs
            THEN 1 
        END) AS high_pii_columns,
        
        -- MEDIUM-RISK PII DETECTION  
        -- Personal identifiers that require significant protection but are less critical than high-risk
        COUNT(CASE 
            WHEN c.name LIKE '%first%name%' OR c.name LIKE '%last%name%' OR  -- Personal names
                 c.name LIKE '%email%' OR c.name LIKE '%phone%' OR           -- Contact information
                 c.name LIKE '%address%' OR c.name LIKE '%birth%' OR         -- Location & demographic
                 c.name LIKE '%salary%' OR c.name LIKE '%wage%'              -- Financial compensation
            THEN 1 
        END) AS medium_pii_columns,
        
        -- LOW-RISK PII DETECTION
        -- Demographic information that may have privacy implications but lower risk
        COUNT(CASE 
            WHEN c.name LIKE '%gender%' OR c.name LIKE '%age%' OR      -- Basic demographics
                 c.name LIKE '%title%' OR c.name LIKE '%nationality%'  -- Professional/cultural info
            THEN 1 
        END) AS low_pii_columns,
        
        -- TOTAL COLUMN COUNT for calculating PII density percentages
        COUNT(c.column_id) AS total_columns
        
    FROM sys.schemas s
    LEFT JOIN sys.tables t ON s.schema_id = t.schema_id AND t.type = 'U'  -- Only user tables
    LEFT JOIN sys.columns c ON t.object_id = c.object_id                  -- All columns
    GROUP BY s.schema_id
),

-- CTE 3: USAGE PATTERN ANALYSIS
-- ============================================================================
-- Analyzes how actively each schema is being used based on SQL Server statistics
-- Usage patterns are a key indicator of data trust and business value
SchemaUsagePatterns AS (
    SELECT 
        s.schema_id,
        
        -- AVERAGE USAGE CALCULATION
        -- Combines seeks (index lookups), scans (table scans), and lookups (key lookups)
        AVG(COALESCE(us.user_seeks + us.user_scans + us.user_lookups, 0)) AS avg_usage,
        
        -- RECENT ACCESS ANALYSIS
        -- Counts tables that have been accessed within the last 30 days
        -- This indicates active vs. dormant data
        COUNT(CASE 
            WHEN DATEDIFF(day, COALESCE(us.last_user_seek, us.last_user_scan, us.last_user_lookup, '1900-01-01'), GETDATE()) <= 30 
            THEN 1 
        END) AS recently_accessed_tables,
        
        -- TOTAL TABLE COUNT for calculating percentage of active tables
        COUNT(t.object_id) AS total_tables
        
    FROM sys.schemas s
    LEFT JOIN sys.tables t ON s.schema_id = t.schema_id AND t.type = 'U'  -- Only user tables
    -- LEFT JOIN to usage stats allows for tables with no recorded usage (new or unused tables)
    LEFT JOIN sys.dm_db_index_usage_stats us ON t.object_id = us.object_id AND us.index_id IN (0,1)
    GROUP BY s.schema_id
)

-- ============================================================================
-- MAIN INSERT STATEMENT: SCHEMA-LEVEL KPI CALCULATION
-- ============================================================================
-- This section combines all the CTE data to calculate final schema-level KPIs
-- The calculations use weighted algorithms to produce meaningful business metrics
INSERT INTO #SchemaLevelAnalysis (
    schema_name, total_tables, total_columns, total_estimated_rows, total_size_mb,
    schema_trust_score, schema_trust_level, schema_sensitivity_level, schema_sensitivity_score,
    avg_completeness_pct, pii_density_pct, high_risk_table_count,
    business_domain, data_owner_department, governance_priority,
    regulatory_frameworks, encryption_required_tables
)
SELECT 
    sm.schema_name,                                  -- Schema identifier for reporting
    sm.table_count AS total_tables,                 -- Count of user tables in schema
    sm.column_count AS total_columns,               -- Total columns across all schema tables
    sm.total_rows AS total_estimated_rows,          -- Sum of estimated rows across tables
    sm.total_size_mb,                               -- Total storage consumption in MB
    
    -- ========================================================================
    -- SCHEMA-LEVEL TRUST SCORE CALCULATION (0-100 Scale)
    -- ========================================================================
    -- Trust Score Algorithm combines four weighted components:
    -- 1. Data Volume & Structure Health (30%): Measures schema scale and organization
    -- 2. Usage Patterns (40%): Indicates business value through access frequency  
    -- 3. Business Context & Organization (20%): Assesses clarity of business purpose
    -- 4. Data Governance Readiness (10%): Evaluates documentation and standards
    CAST(
        -- COMPONENT 1: DATA VOLUME & STRUCTURE HEALTH (30% of total score)
        -- Larger, well-populated schemas typically indicate higher business value
        -- and organizational investment, contributing to trustworthiness
        (CASE
            WHEN sm.table_count > 10 AND sm.total_rows > 50000 THEN 30.0    -- Large, well-populated schema
            WHEN sm.table_count > 5 AND sm.total_rows > 10000 THEN 25.0     -- Medium-sized operational schema
            WHEN sm.table_count > 2 AND sm.total_rows > 1000 THEN 20.0      -- Small but active schema
            WHEN sm.table_count > 0 THEN 15.0                               -- Minimal schema with some data
            ELSE 5.0                                                         -- Empty or nearly empty schema
        END) +
        
        -- COMPONENT 2: USAGE PATTERNS (40% of total score)
        -- High usage frequency and recent access indicate active business reliance
        -- This is the highest weighted component as usage reflects real business value
        (CASE
            -- Excellent usage: High frequency + all tables recently accessed
            WHEN sup.avg_usage > 1000 AND sup.recently_accessed_tables = sup.total_tables THEN 40.0
            -- Good usage: Regular access + most tables active
            WHEN sup.avg_usage > 100 AND sup.recently_accessed_tables >= sup.total_tables * 0.7 THEN 30.0
            -- Fair usage: Some access + half the tables active  
            WHEN sup.avg_usage > 10 AND sup.recently_accessed_tables >= sup.total_tables * 0.5 THEN 20.0
            -- Low usage: Minimal access but some activity
            WHEN sup.recently_accessed_tables > 0 THEN 15.0
            -- No usage: No recorded access (potential dormant data)
            ELSE 5.0
        END) +
        
        -- COMPONENT 3: BUSINESS CONTEXT & ORGANIZATION (20% of total score)  
        -- Well-defined business domains indicate mature data organization
        -- and clear data ownership, improving trust and governance
        (CASE sm.schema_name
            WHEN 'Person' THEN 18.0         -- Customer data: well-organized, critical business function
            WHEN 'Sales' THEN 20.0          -- Revenue operations: highest business priority
            WHEN 'Production' THEN 16.0     -- Manufacturing: clear operational purpose
            WHEN 'HumanResources' THEN 18.0 -- HR data: well-defined domain with clear ownership
            WHEN 'Purchasing' THEN 15.0     -- Procurement: defined business function
            ELSE 10.0                        -- Other schemas: unclear business purpose
        END) +
        
        -- COMPONENT 4: DATA GOVERNANCE READINESS (10% of total score)
        -- Schemas with clear business purpose and established patterns
        -- are more ready for governance frameworks and policy application
        (CASE
            -- High readiness: Known business-critical schemas
            WHEN sm.schema_name IN ('Person', 'Sales', 'HumanResources') THEN 10.0
            -- Medium readiness: Multiple tables suggest established use
            WHEN sm.table_count > 5 THEN 8.0
            -- Basic readiness: Some structure exists
            WHEN sm.table_count > 0 THEN 6.0
            -- Low readiness: Minimal or unclear structure
            ELSE 2.0
        END)
    AS DECIMAL(5,2)) AS schema_trust_score,
    
    -- ========================================================================
    -- SCHEMA TRUST LEVEL CLASSIFICATION
    -- ========================================================================
    -- Converts numerical trust scores into categorical levels for easier interpretation
    -- These categories help executives and managers quickly understand schema reliability
    CASE
        WHEN (
            -- Recalculate the same trust score for comparison against thresholds
            -- This ensures consistent classification based on the calculated score
            (CASE
                WHEN sm.table_count > 10 AND sm.total_rows > 50000 THEN 30.0
                WHEN sm.table_count > 5 AND sm.total_rows > 10000 THEN 25.0
                WHEN sm.table_count > 2 AND sm.total_rows > 1000 THEN 20.0
                WHEN sm.table_count > 0 THEN 15.0
                ELSE 5.0
            END) +
            (CASE
                WHEN sup.avg_usage > 1000 AND sup.recently_accessed_tables = sup.total_tables THEN 40.0
                WHEN sup.avg_usage > 100 AND sup.recently_accessed_tables >= sup.total_tables * 0.7 THEN 30.0
                WHEN sup.avg_usage > 10 AND sup.recently_accessed_tables >= sup.total_tables * 0.5 THEN 20.0
                WHEN sup.recently_accessed_tables > 0 THEN 15.0
                ELSE 5.0
            END) +
            (CASE sm.schema_name
                WHEN 'Person' THEN 18.0
                WHEN 'Sales' THEN 20.0
                WHEN 'Production' THEN 16.0
                WHEN 'HumanResources' THEN 18.0
                WHEN 'Purchasing' THEN 15.0
                ELSE 10.0
            END) +
            (CASE
                WHEN sm.schema_name IN ('Person', 'Sales', 'HumanResources') THEN 10.0
                WHEN sm.table_count > 5 THEN 8.0
                WHEN sm.table_count > 0 THEN 6.0
                ELSE 2.0
            END)
        ) >= 80 THEN 'VERY_HIGH'        -- 80-100: Excellent reliability, high business value
        WHEN (
            (CASE
                WHEN sm.table_count > 10 AND sm.total_rows > 50000 THEN 30.0
                WHEN sm.table_count > 5 AND sm.total_rows > 10000 THEN 25.0
                WHEN sm.table_count > 2 AND sm.total_rows > 1000 THEN 20.0
                WHEN sm.table_count > 0 THEN 15.0
                ELSE 5.0
            END) +
            (CASE
                WHEN sup.avg_usage > 1000 AND sup.recently_accessed_tables = sup.total_tables THEN 40.0
                WHEN sup.avg_usage > 100 AND sup.recently_accessed_tables >= sup.total_tables * 0.7 THEN 30.0
                WHEN sup.avg_usage > 10 AND sup.recently_accessed_tables >= sup.total_tables * 0.5 THEN 20.0
                WHEN sup.recently_accessed_tables > 0 THEN 15.0
                ELSE 5.0
            END) +
            (CASE sm.schema_name
                WHEN 'Person' THEN 18.0
                WHEN 'Sales' THEN 20.0
                WHEN 'Production' THEN 16.0
                WHEN 'HumanResources' THEN 18.0
                WHEN 'Purchasing' THEN 15.0
                ELSE 10.0
            END) +
            (CASE
                WHEN sm.schema_name IN ('Person', 'Sales', 'HumanResources') THEN 10.0
                WHEN sm.table_count > 5 THEN 8.0
                WHEN sm.table_count > 0 THEN 6.0
                ELSE 2.0
            END)
        ) >= 65 THEN 'HIGH'             -- 65-79: Good reliability, suitable for business use
        WHEN (
            (CASE
                WHEN sm.table_count > 10 AND sm.total_rows > 50000 THEN 30.0
                WHEN sm.table_count > 5 AND sm.total_rows > 10000 THEN 25.0
                WHEN sm.table_count > 2 AND sm.total_rows > 1000 THEN 20.0
                WHEN sm.table_count > 0 THEN 15.0
                ELSE 5.0
            END) +
            (CASE
                WHEN sup.avg_usage > 1000 AND sup.recently_accessed_tables = sup.total_tables THEN 40.0
                WHEN sup.avg_usage > 100 AND sup.recently_accessed_tables >= sup.total_tables * 0.7 THEN 30.0
                WHEN sup.avg_usage > 10 AND sup.recently_accessed_tables >= sup.total_tables * 0.5 THEN 20.0
                WHEN sup.recently_accessed_tables > 0 THEN 15.0
                ELSE 5.0
            END) +
            (CASE sm.schema_name
                WHEN 'Person' THEN 18.0
                WHEN 'Sales' THEN 20.0
                WHEN 'Production' THEN 16.0
                WHEN 'HumanResources' THEN 18.0
                WHEN 'Purchasing' THEN 15.0
                ELSE 10.0
            END) +
            (CASE
                WHEN sm.schema_name IN ('Person', 'Sales', 'HumanResources') THEN 10.0
                WHEN sm.table_count > 5 THEN 8.0
                WHEN sm.table_count > 0 THEN 6.0
                ELSE 2.0
            END)
        ) >= 50 THEN 'MEDIUM'           -- 50-64: Moderate reliability, may need improvement
        WHEN (
            (CASE
                WHEN sm.table_count > 10 AND sm.total_rows > 50000 THEN 30.0
                WHEN sm.table_count > 5 AND sm.total_rows > 10000 THEN 25.0
                WHEN sm.table_count > 2 AND sm.total_rows > 1000 THEN 20.0
                WHEN sm.table_count > 0 THEN 15.0
                ELSE 5.0
            END) +
            (CASE
                WHEN sup.avg_usage > 1000 AND sup.recently_accessed_tables = sup.total_tables THEN 40.0
                WHEN sup.avg_usage > 100 AND sup.recently_accessed_tables >= sup.total_tables * 0.7 THEN 30.0
                WHEN sup.avg_usage > 10 AND sup.recently_accessed_tables >= sup.total_tables * 0.5 THEN 20.0
                WHEN sup.recently_accessed_tables > 0 THEN 15.0
                ELSE 5.0
            END) +
            (CASE sm.schema_name
                WHEN 'Person' THEN 18.0
                WHEN 'Sales' THEN 20.0
                WHEN 'Production' THEN 16.0
                WHEN 'HumanResources' THEN 18.0
                WHEN 'Purchasing' THEN 15.0
                ELSE 10.0
            END) +
            (CASE
                WHEN sm.schema_name IN ('Person', 'Sales', 'HumanResources') THEN 10.0
                WHEN sm.table_count > 5 THEN 8.0
                WHEN sm.table_count > 0 THEN 6.0
                ELSE 2.0
            END)
        ) >= 35 THEN 'LOW'              -- 35-49: Low reliability, significant improvement needed
        ELSE 'VERY_LOW'                 -- 0-34: Very poor reliability, major issues exist
    END AS schema_trust_level,
    
    -- ========================================================================
    -- SCHEMA-LEVEL SENSITIVITY CLASSIFICATION (HIGH/MEDIUM/LOW)
    -- ========================================================================
    -- Determines data sensitivity based on PII content, business function, and regulatory risk
    -- This classification drives security controls, access policies, and compliance requirements
    CASE
        -- HIGH SENSITIVITY: Contains critical PII or operates in highly regulated business areas
        -- Triggers strongest security controls and compliance monitoring
        WHEN spa.high_pii_columns > 0 OR                                    -- Any government IDs, financial data, authentication
             sm.schema_name IN ('Person', 'HumanResources') OR              -- Known sensitive business domains
             (spa.medium_pii_columns > 0 AND spa.medium_pii_columns * 100.0 / NULLIF(spa.total_columns, 0) > 20) -- High PII density (>20%)
        THEN 'HIGH'
        
        -- MEDIUM SENSITIVITY: Contains some PII or handles business-critical operations
        -- Requires standard security controls and regular governance oversight  
        WHEN spa.medium_pii_columns > 0 OR                                  -- Any personal identifiers detected
             (sm.schema_name IN ('Sales', 'Production') AND sm.total_rows > 10000) OR  -- Large business-critical datasets
             sm.total_rows > 50000                                          -- Large datasets regardless of domain
        THEN 'MEDIUM'
        
        -- LOW SENSITIVITY: Reference data, system schemas, or minimal business impact
        -- Standard governance controls sufficient
        ELSE 'LOW'
    END AS schema_sensitivity_level,
    
    -- Schema Sensitivity Score (0-100)
    CASE
        -- HIGH sensitivity scoring (70-100)
        WHEN spa.high_pii_columns > 0 THEN 85 + LEAST(spa.high_pii_columns * 3, 15)
        WHEN sm.schema_name = 'Person' THEN 80 + (spa.medium_pii_columns * 2)
        WHEN sm.schema_name = 'HumanResources' THEN 78 + (spa.medium_pii_columns * 2)
        WHEN (spa.medium_pii_columns * 100.0 / NULLIF(spa.total_columns, 0)) > 20 THEN 72 + (spa.medium_pii_columns)
        
        -- MEDIUM sensitivity scoring (40-69)
        WHEN spa.medium_pii_columns > 0 THEN 45 + (spa.medium_pii_columns * 3) + (spa.low_pii_columns)
        WHEN sm.schema_name = 'Sales' AND sm.total_rows > 10000 THEN 55
        WHEN sm.schema_name = 'Sales' THEN 48
        WHEN sm.schema_name IN ('Production', 'Purchasing') AND sm.total_rows > 10000 THEN 50
        WHEN sm.schema_name IN ('Production', 'Purchasing') THEN 42
        WHEN sm.total_rows > 50000 THEN 45
        
        -- LOW sensitivity scoring (5-39)
        WHEN spa.low_pii_columns > 0 THEN 25 + (spa.low_pii_columns * 2)
        WHEN sm.total_rows > 1000 THEN 20
        WHEN sm.table_count > 0 THEN 15
        ELSE 5
    END AS schema_sensitivity_score,
    
    -- Quality Metrics (structural proxy: share of NOT NULL columns per table)
    -- NOTE: SQL Server rejects nested aggregates such as AVG(COUNT(...)),
    -- so the per-table percentages are computed in a derived table first
    -- and then averaged across the schema
    CAST(
        CASE
            WHEN sm.column_count > 0
            THEN (
                SELECT AVG(per_table.not_null_pct)
                FROM (
                    SELECT
                        t2.schema_id,
                        CAST(
                            COUNT(CASE WHEN c.is_nullable = 0 THEN 1 END) * 100.0 / COUNT(*)
                        AS DECIMAL(5,2)) AS not_null_pct
                    FROM sys.tables t2
                    JOIN sys.columns c ON t2.object_id = c.object_id
                    WHERE t2.type = 'U'
                    GROUP BY t2.object_id, t2.schema_id
                ) AS per_table
                WHERE per_table.schema_id = sm.schema_id
            )
            ELSE 0
        END
    AS DECIMAL(5,2)) AS avg_completeness_pct,
    
    -- PII Density Percentage
    CAST(
        CASE 
            WHEN spa.total_columns > 0 
            THEN (spa.high_pii_columns + spa.medium_pii_columns + spa.low_pii_columns) * 100.0 / spa.total_columns
            ELSE 0 
        END
    AS DECIMAL(5,2)) AS pii_density_pct,
    
    -- High Risk Table Count (tables with high PII + large data volume)
    (SELECT COUNT(*)
     FROM sys.tables t3
     WHERE t3.schema_id = sm.schema_id AND t3.type = 'U'
     AND EXISTS (
         SELECT 1 FROM sys.columns c3 
         WHERE c3.object_id = t3.object_id 
         AND (c3.name LIKE '%ssn%' OR c3.name LIKE '%password%' OR c3.name LIKE '%credit%')
     )
    ) AS high_risk_table_count,
    
    -- Business Context
    CASE sm.schema_name
        WHEN 'Person' THEN 'Customer & Identity Management'
        WHEN 'Sales' THEN 'Revenue Operations & Customer Relations'
        WHEN 'Production' THEN 'Manufacturing & Inventory Operations'
        WHEN 'HumanResources' THEN 'Employee Lifecycle Management'
        WHEN 'Purchasing' THEN 'Vendor Relations & Procurement'
        ELSE 'System & Reference Data'
    END AS business_domain,
    
    CASE sm.schema_name
        WHEN 'Person' THEN 'Customer Service'
        WHEN 'Sales' THEN 'Sales Operations'
        WHEN 'HumanResources' THEN 'Human Resources'
        WHEN 'Production' THEN 'Operations'
        WHEN 'Purchasing' THEN 'Procurement'
        ELSE 'IT Administration'
    END AS data_owner_department,
    
    -- Governance Priority
    CASE
        WHEN spa.high_pii_columns > 0 OR sm.schema_name IN ('Person', 'HumanResources') THEN 'CRITICAL'
        WHEN spa.medium_pii_columns > 0 OR sm.schema_name = 'Sales' THEN 'HIGH'
        WHEN sm.total_rows > 10000 THEN 'MEDIUM'
        ELSE 'LOW'
    END AS governance_priority,
    
    -- Regulatory Frameworks
    CASE
        WHEN spa.high_pii_columns > 0 OR sm.schema_name = 'Person'
        THEN 'GDPR, CCPA, PIPEDA, PII Protection Laws, Data Breach Notification'
        WHEN sm.schema_name = 'HumanResources'
        THEN 'GDPR, CCPA, Employment Law, Wage & Hour Regulations'
        WHEN sm.schema_name = 'Sales' AND spa.medium_pii_columns > 0
        THEN 'GDPR, CCPA, SOX, Financial Privacy Rules'
        WHEN sm.schema_name IN ('Production', 'Purchasing')
        THEN 'SOX, Industry Safety Regulations, Supply Chain Compliance'
        ELSE 'Standard Corporate Data Governance'
    END AS regulatory_frameworks,
    
    -- Encryption Required Tables Count
    (SELECT COUNT(*)
     FROM sys.tables t4
     WHERE t4.schema_id = sm.schema_id AND t4.type = 'U'
     AND (
         sm.schema_name IN ('Person', 'HumanResources') OR
         EXISTS (
             SELECT 1 FROM sys.columns c4 
             WHERE c4.object_id = t4.object_id 
             AND (c4.name LIKE '%ssn%' OR c4.name LIKE '%password%' OR c4.name LIKE '%credit%')
         )
     )
    ) AS encryption_required_tables

FROM SchemaMetrics sm
LEFT JOIN SchemaPIIAnalysis spa ON sm.schema_id = spa.schema_id
LEFT JOIN SchemaUsagePatterns sup ON sm.schema_id = sup.schema_id
WHERE sm.table_count > 0  -- Only include schemas with user tables
ORDER BY schema_trust_score DESC, schema_sensitivity_score DESC;

PRINT 'Schema-level analysis completed successfully!';

-- Display Schema-Level Results
SELECT
    'SCHEMA-LEVEL KPI DASHBOARD' AS report_section,
    schema_name,
    
    -- Key Metrics
    total_tables,
    total_columns,
    FORMAT(total_estimated_rows, 'N0') AS total_estimated_rows,
    FORMAT(total_size_mb, 'N2') + ' MB' AS total_size,
    
    -- Trust Score KPIs
    schema_trust_score,
    schema_trust_level,
    
    -- Sensitivity KPIs  
    schema_sensitivity_level,
    schema_sensitivity_score,
    
    -- Quality KPIs
    FORMAT(avg_completeness_pct, 'N1') + '%' AS avg_completeness,
    FORMAT(pii_density_pct, 'N1') + '%' AS pii_density,
    
    -- Business Context
    business_domain,
    data_owner_department,
    governance_priority,
    
    -- Risk Indicators
    high_risk_table_count,
    encryption_required_tables,
    
    -- Compliance
    regulatory_frameworks

FROM #SchemaLevelAnalysis
ORDER BY schema_trust_score DESC, schema_sensitivity_score DESC;
