Completeness Rule
Completeness Rule
In this series of detailed workshops we're going to:
Understand the business value and ROI of implementing business rules
Master the 7 Data Quality Dimensions in PDC
Create and configure business rules with detailed step-by-step guidance
Implement rule actions (Status, Tags, Webhooks) and understand their purpose
Organize rules using rule groups for operational efficiency
Monitor and interpret rule execution results for continuous improvement
Completeness rules are the easiest to understand and often have the highest immediate business impact. Missing data is obvious and actionable.
Accessing your Catalog
To get started using the Data Catalog, log in using the address and credentials provided by your Data Catalog service user or administrator.
To access your catalog, please follow these steps:
Open Google Chrome web browser.
Navigate to:
Enter following email and password, then click Sign In.
Username: [email protected] (mapped to Business Steward role)
Password: Welcome123!
Title
Problem: AdventureWorks needs customer titles (Mr., Ms., Dr., etc.) for:
Personalized communication and correspondence
Formal business letters and documents
Professional customer relationship management
Marketing segmentation by demographics
Cultural and social protocol adherence
Business Impact: If 40% of customers lack title information:
Inability to personalize communications properly (generic "Dear Customer" messages)
Risk of offending customers with incorrect forms of address
Reduced effectiveness of targeted marketing campaigns
Poor customer experience in formal communications
Missed opportunity to demonstrate attention to detail and professionalism
Goal: Ensure at least 70% of active customers have title information on file.
Business Rule Configuration:
SQL Query:
-- =============================================================================
-- TITLE COMPLETENESS CHECK
-- Purpose: Find persons who are missing title information (Mr., Ms., Dr., etc.)
-- Target Table: Person.Person
-- Referenced Tables: Person.Person
-- =============================================================================
SELECT
-- total_count: Total number of persons in scope (denominator for percentage)
-- COUNT(*) counts all rows that match the WHERE clause
-- This represents our universe of persons we expect to have titles
COUNT(*) AS total_count,
-- scopeCount: Number of persons who HAVE title information (passes quality check)
-- COUNT(p.Title) only counts non-NULL values
-- SQL's COUNT() function ignores NULL values by design
-- This is the "compliant" count - persons with complete title data
COUNT(p.Title) AS scopeCount,
-- nonCompliant: Number of persons WITHOUT title information (fails quality check)
-- Uses CASE statement to count NULLs explicitly
-- WHEN Title IS NULL THEN 1: adds 1 for each missing title
-- ELSE 0: adds 0 for each existing title
-- SUM totals all the 1s to get count of missing titles
SUM(CASE WHEN p.Title IS NULL THEN 1 ELSE 0 END) AS nonCompliant
FROM Person.Person p
WHERE
-- Filter: Only check specific person types that require titles
-- SC = Store Contact, IN = Individual Customer, SP = Sales Person, EM = Employee
-- These are customer-facing or professional roles where titles are important
-- Excludes types like 'VC' (Vendor Contact) who may not need titles tracked
p.PersonType IN ('SC', 'IN', 'SP', 'EM')
-- =============================================================================
-- QUERY EXPLANATION FOR PDC:
-- - PDC requires exactly 3 columns: total_count, scopeCount, nonCompliant
-- - All column names must match exactly (case-sensitive in some databases)
-- - COUNT(column) automatically excludes NULLs - this is SQL standard behavior
-- - SUM(CASE...) pattern explicitly counts NULL values for nonCompliant
--
-- RESULTS INTERPRETATION:
-- - If total_count=10000, scopeCount=7000, nonCompliant=3000
-- - Then: 70% have titles (7000/10000), 30% missing (3000/10000)
-- - This would PASS if threshold is ≥70%, FAIL if threshold is ≥75%
--
-- WHY TITLE COMPLETENESS MATTERS:
-- - Personalization drives engagement (studies show 20-30% lift in response)
-- - Proper forms of address show respect and professionalism
-- - Titles indicate professional status (Dr., Prof.) important for B2B
-- - Cultural sensitivity: some cultures place high importance on titles
--
-- TROUBLESHOOTING:
-- - If nonCompliant=0: Check PersonType filter values
-- - If total_count seems low: Verify ModifiedDate filter isn't too restrictive
-- - If total_count != scopeCount + nonCompliant: Check for data type issues
-- - Common titles in data: Mr., Mrs., Ms., Dr., Prof., Sr., Sra.
-- =============================================================================In left menu, Click on: 'Data Operations' option -> Business Rules

In the Business Rules Tile, Click on: 'Business Rules'

Click on: Add Business Rule.
Enter the following details:
BUSINESS RULE NAME
Customer Title Completeness
DESCRIPTION
Goal: 70% completeness for customer titles (Mr., Ms., Dr., etc.)
NOTE
This rule checks Person.Person table for missing Title field. Titles are important for personalized communications and professional correspondence.
CUSTOM TAGS
completeness, customer-data, personalization
RULE ENABLED
☑ (checked)
RULE APPROVED
☑ (checked)

Click: 'CREATE BUSINESS RULE'.
Click: 'CONFIGURE'.

x
x
x
x
Missing Email Addresses
Problem: AdventureWorks needs customer email addresses for:
Order confirmations
Marketing campaigns
Customer service communications
Password reset functionality
Business Impact: If 10% of customers lack emails:
Can't send order confirmations (poor customer experience)
Marketing campaigns reach only 90% of intended audience
Lost revenue opportunity
Goal: Ensure at least 98% of active customers have email addresses on file.
Business Rule Configuration:
SQL Query:
-- =============================================================================
-- EMAIL COMPLETENESS CHECK
-- Purpose: Find customers who are missing email addresses
-- Target Table: Person.EmailAddress
-- Referenced Tables: Person.Person, Person.EmailAddress
-- =============================================================================
SELECT
-- total_count: Total number of persons in scope (denominator for percentage)
-- COUNT(DISTINCT ...) ensures each person counted only once
-- Using DISTINCT handles edge cases where data issues might create duplicates
COUNT(DISTINCT p.BusinessEntityID) AS total_count,
-- scopeCount: Number of persons who HAVE email addresses (passes quality check)
-- Only counts persons who exist in both Person and EmailAddress tables
-- This is the "compliant" count
COUNT(DISTINCT e.BusinessEntityID) AS scopeCount,
-- nonCompliant: Number of persons WITHOUT email addresses (fails quality check)
-- Calculated as: total persons minus persons with emails
-- These are the data quality issues we need to fix
COUNT(DISTINCT p.BusinessEntityID) - COUNT(DISTINCT e.BusinessEntityID) AS nonCompliant
FROM Person.Person p
-- LEFT JOIN (not INNER JOIN!) to keep ALL persons, even without emails
-- This is CRITICAL: INNER JOIN would hide the problem by excluding persons without emails
-- LEFT JOIN shows the gaps: persons who DON'T have matching EmailAddress records
LEFT JOIN Person.EmailAddress e ON p.BusinessEntityID = e.BusinessEntityID
WHERE
-- Filter 1: Only check specific person types that require emails
-- SC = Store Contact, IN = Individual Customer, SP = Sales Person, EM = Employee
-- Excludes types like 'VC' (Vendor Contact) who may not need emails in our system
p.PersonType IN ('SC', 'IN', 'SP', 'EM')
-- =============================================================================
-- QUERY EXPLANATION FOR PDC:
-- - PDC requires exactly 3 columns: total_count, scopeCount, nonCompliant
-- - All column names must match exactly (case-sensitive in some databases)
-- - LEFT JOIN is essential - never use INNER JOIN when finding missing data
-- - DISTINCT prevents duplicate counting if data has referential issues
--
-- RESULTS INTERPRETATION:
-- - If total_count=10000, scopeCount=9800, nonCompliant=200
-- - Then: 98% have emails (9800/10000), 2% missing (200/10000)
-- - This would PASS if threshold is ≥98%, FAIL if threshold is ≥99%
--
-- TROUBLESHOOTING:
-- - If nonCompliant=0 but you know emails are missing: Check PersonType filter
-- - If total_count seems low: Check ModifiedDate filter (may be too restrictive)
-- - If numbers don't add up: Check for duplicate BusinessEntityIDs in source
-- ==/==========================================================================In left menu, Click on: 'Data Operations' option -> Business Rules

In the Business Rules Tile, Click on: 'Business Rules'

Click on: Add Business Rule.
Enter the following details:
BUSINESS RULE NAME
Customer Email Completeness
DESCRIPTION
Goal: 98% completeness
NOTE
This rule joins 'Person.Person' and 'Person.EmailAddress'
RULE ENABLED
☑ (checked)

Click: 'CREATE BUSINESS RULE'.
Click: 'CONFIGURE'.

x
x
x
x
Incomplete Product Information
Scenario: Products must have weight and size information for shipping calculations.
Business Rule Configuration:
Rule Name: Product_Shipping_Data_Completeness
Description: Ensures products have weight and size for accurate shipping
Data Quality Dimension: Completeness
Schedule: Weekly
Target: Production.ProductSQL Query:
SELECT
COUNT(*) AS total_count,
SUM(CASE WHEN Weight IS NOT NULL AND Size IS NOT NULL THEN 1 ELSE 0 END) AS scopeCount,
SUM(CASE WHEN Weight IS NULL OR Size IS NULL THEN 1 ELSE 0 END) AS nonCompliant
FROM Production.Product
WHERE FinishedGoodsFlag = 1 -- Only finished productsHands-On Task:
Create this rule in PDC
Run the rule immediately
Review results on the History tab
Identify which product categories have the most incomplete data
x
x
x
x
Last updated
Was this helpful?
