# Completeness Rule

{% hint style="success" %}

#### Completeness Rule

In this series of detailed workshops we're going to:

* Understand the business value and ROI of implementing business rules
* Master the 7 Data Quality Dimensions in PDC
* Create and configure business rules with detailed step-by-step guidance
* Implement rule actions (Status, Tags, Webhooks) and understand their purpose
* Organize rules using rule groups for operational efficiency
* Monitor and interpret rule execution results for continuous improvement

Completeness rules are the easiest to understand and often have the highest immediate business impact. Missing data is obvious and actionable.
{% endhint %}

***

{% hint style="info" %}

#### Accessing your Catalog

To get started using the Data Catalog, log in using the address and credentials provided by your Data Catalog service user or administrator.
{% endhint %}

To access your catalog, please follow these steps:

1. Open **Google Chrome** web browser.
2. Navigate to:

{% embed url="<https://pdc.pentaho.lab>" %}

3. Enter following email and password, then click **Sign In**.

Username: <james.lock@adventureworks.com> (mapped to Business Steward role)

Password: Welcome123!

***

{% tabs %}
{% tab title="Title" %}
{% hint style="info" %}

#### Title

**Problem:** AdventureWorks needs customer titles (Mr., Ms., Dr., etc.) for:

* Personalized communication and correspondence
* Formal business letters and documents
* Professional customer relationship management
* Marketing segmentation by demographics
* Cultural and social protocol adherence

**Business Impact:** If 40% of customers lack title information:

* Inability to personalize communications properly (generic "Dear Customer" messages)
* Risk of offending customers with incorrect forms of address
* Reduced effectiveness of targeted marketing campaigns
* Poor customer experience in formal communications
* Missed opportunity to demonstrate attention to detail and professionalism

**Goal:** Ensure at least 70% of active customers have title information on file.
{% endhint %}

**Business Rule Configuration:**

{% hint style="info" %}
**BUSINESS RULE NAME:** Customer Title Completeness

**DESCRIPTION:** Goal: 70% completeness for customer titles (Mr., Ms., Dr., etc.)

**NOTE:** This rule checks Person.Person table for missing Title field. Titles are important for personalized communications and professional correspondence.

**CUSTOM TAGS:** completeness, customer-data, personalization (optional)

**BUSINESS RULE TYPE:** Internal Data Quality

**DATA QUALITY DIMENSION:** Completeness

**SELECTED TARGET:** `Person.Person`&#x20;

**SQL REFERENCES:** Person.Person
{% endhint %}

**SQL Query:**

```sql
-- =============================================================================
-- TITLE COMPLETENESS CHECK
-- Purpose: Find persons who are missing title information (Mr., Ms., Dr., etc.)
-- Target Table: Person.Person
-- Referenced Tables: Person.Person
-- =============================================================================

SELECT
    -- total_count: Total number of persons in scope (denominator for percentage)
    -- COUNT(*) counts all rows that match the WHERE clause
    -- This represents our universe of persons we expect to have titles
    COUNT(*) AS total_count,
    
    -- scopeCount: Number of persons who HAVE title information (passes quality check)
    -- COUNT(p.Title) only counts non-NULL values
    -- SQL's COUNT() function ignores NULL values by design
    -- This is the "compliant" count - persons with complete title data
    COUNT(p.Title) AS scopeCount,
    
    -- nonCompliant: Number of persons WITHOUT title information (fails quality check)
    -- Uses CASE statement to count NULLs explicitly
    -- WHEN Title IS NULL THEN 1: adds 1 for each missing title
    -- ELSE 0: adds 0 for each existing title
    -- SUM totals all the 1s to get count of missing titles
    SUM(CASE WHEN p.Title IS NULL THEN 1 ELSE 0 END) AS nonCompliant

FROM Person.Person p

WHERE
    -- Filter: Only check specific person types that require titles
    -- SC = Store Contact, IN = Individual Customer, SP = Sales Person, EM = Employee
    -- These are customer-facing or professional roles where titles are important
    -- Excludes types like 'VC' (Vendor Contact) who may not need titles tracked
    p.PersonType IN ('SC', 'IN', 'SP', 'EM')

-- =============================================================================
-- QUERY EXPLANATION FOR PDC:
-- - PDC requires exactly 3 columns: total_count, scopeCount, nonCompliant
-- - All column names must match exactly (case-sensitive in some databases)
-- - COUNT(column) automatically excludes NULLs - this is SQL standard behavior
-- - SUM(CASE...) pattern explicitly counts NULL values for nonCompliant
--
-- RESULTS INTERPRETATION:
-- - If total_count=10000, scopeCount=7000, nonCompliant=3000
-- - Then: 70% have titles (7000/10000), 30% missing (3000/10000)
-- - This would PASS if threshold is ≥70%, FAIL if threshold is ≥75%
--
-- WHY TITLE COMPLETENESS MATTERS:
-- - Personalization drives engagement (studies show 20-30% lift in response)
-- - Proper forms of address show respect and professionalism
-- - Titles indicate professional status (Dr., Prof.) important for B2B
-- - Cultural sensitivity: some cultures place high importance on titles
--
-- TROUBLESHOOTING:
-- - If nonCompliant=0: Check PersonType filter values
-- - If total_count seems low: Verify ModifiedDate filter isn't too restrictive
-- - If total_count != scopeCount + nonCompliant: Check for data type issues
-- - Common titles in data: Mr., Mrs., Ms., Dr., Prof., Sr., Sra.
-- =============================================================================
```

{% hint style="info" %}
**Query Explanation:**

* LEFT JOIN finds people WITHOUT emails (that's what we're looking for)
* PersonType filter focuses on customers/employees who need emails
* Date filter focuses on active records
  {% endhint %}

***

1. In left menu, Click on: 'Data Operations' option -> Business Rules

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FlTn7q03aXMPoNwpNQqmH%2Fimage.png?alt=media&#x26;token=195d8e17-b519-4b98-9aa8-8b65b9ceb5af" alt=""><figcaption><p>Create a Business Rule</p></figcaption></figure>

2. In the Business Rules Tile, Click on: 'Business Rules'

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FkT60e2S0Pr9M39VBevHk%2Fimage.png?alt=media&#x26;token=16039aa0-3585-4911-a770-d8d885aa4c95" alt=""><figcaption><p>Add Business Rule</p></figcaption></figure>

3. Click on: Add Business Rule.
4. Enter the following details:

<table><thead><tr><th width="214">Setting</th><th>Value</th></tr></thead><tbody><tr><td>BUSINESS RULE NAME</td><td>Customer Title Completeness</td></tr><tr><td>DESCRIPTION</td><td>Goal: 70% completeness for customer titles (Mr., Ms., Dr., etc.)</td></tr><tr><td>NOTE</td><td>This rule checks Person.Person table for missing Title field. Titles are important for personalized communications and professional correspondence.</td></tr><tr><td>CUSTOM TAGS</td><td>completeness, customer-data, personalization</td></tr><tr><td>RULE ENABLED</td><td>☑ (checked)</td></tr><tr><td>RULE APPROVED</td><td>☑ (checked)</td></tr></tbody></table>

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FuHVoFd9I5ZLM0iNB8AwA%2Fimage.png?alt=media&#x26;token=4fd8f51b-fc57-4226-bf70-1771cba5cfb7" alt=""><figcaption><p>Create Title Completeness Rule</p></figcaption></figure>

5. Click: 'CREATE BUSINESS RULE'.
6. Click: 'CONFIGURE'.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FF0cgYxVdo8HLmpmnibOh%2Fimage.png?alt=media&#x26;token=f36663da-37f1-4990-b528-6ef81dcc7d40" alt=""><figcaption><p>Configure Rule</p></figcaption></figure>

7.

x

x

x

x
{% endtab %}

{% tab title="Email Addresses" %}
{% hint style="info" %}

#### Missing Email Addresses

**Problem:** AdventureWorks needs customer email addresses for:

* Order confirmations
* Marketing campaigns
* Customer service communications
* Password reset functionality

**Business Impact:** If 10% of customers lack emails:

* Can't send order confirmations (poor customer experience)
* Marketing campaigns reach only 90% of intended audience
* Lost revenue opportunity

**Goal:** Ensure at least 98% of active customers have email addresses on file.
{% endhint %}

**Business Rule Configuration:**

{% hint style="info" %}
**MULTI-TABLE CONSIDERATION:** This rule joins `Person.Person` and `Person.EmailAddress`.

**Target Table Decision:**

* Question 1: Users primarily interact with? → Person (main entity)
* Question 2: Problem originates? → EmailAddress (missing emails)
* Question 3: Parent table? → Person (parent) → EmailAddress (child)
* Question 4: Business impact? → Email data itself

**SELECTED TARGET:** `Person.EmailAddress`&#x20;

**SQL REFERENCES:** Person.Person, Person.EmailAddress
{% endhint %}

**SQL Query:**

```sql
-- =============================================================================
-- EMAIL COMPLETENESS CHECK
-- Purpose: Find customers who are missing email addresses
-- Target Table: Person.EmailAddress
-- Referenced Tables: Person.Person, Person.EmailAddress
-- =============================================================================

SELECT 
    -- total_count: Total number of persons in scope (denominator for percentage)
    -- COUNT(DISTINCT ...) ensures each person counted only once
    -- Using DISTINCT handles edge cases where data issues might create duplicates
    COUNT(DISTINCT p.BusinessEntityID) AS total_count,
    
    -- scopeCount: Number of persons who HAVE email addresses (passes quality check)
    -- Only counts persons who exist in both Person and EmailAddress tables
    -- This is the "compliant" count
    COUNT(DISTINCT e.BusinessEntityID) AS scopeCount,
    
    -- nonCompliant: Number of persons WITHOUT email addresses (fails quality check)
    -- Calculated as: total persons minus persons with emails
    -- These are the data quality issues we need to fix
    COUNT(DISTINCT p.BusinessEntityID) - COUNT(DISTINCT e.BusinessEntityID) AS nonCompliant

FROM Person.Person p

    -- LEFT JOIN (not INNER JOIN!) to keep ALL persons, even without emails
    -- This is CRITICAL: INNER JOIN would hide the problem by excluding persons without emails
    -- LEFT JOIN shows the gaps: persons who DON'T have matching EmailAddress records
    LEFT JOIN Person.EmailAddress e ON p.BusinessEntityID = e.BusinessEntityID

WHERE 
    -- Filter 1: Only check specific person types that require emails
    -- SC = Store Contact, IN = Individual Customer, SP = Sales Person, EM = Employee
    -- Excludes types like 'VC' (Vendor Contact) who may not need emails in our system
    p.PersonType IN ('SC', 'IN', 'SP', 'EM')

-- =============================================================================
-- QUERY EXPLANATION FOR PDC:
-- - PDC requires exactly 3 columns: total_count, scopeCount, nonCompliant
-- - All column names must match exactly (case-sensitive in some databases)
-- - LEFT JOIN is essential - never use INNER JOIN when finding missing data
-- - DISTINCT prevents duplicate counting if data has referential issues
-- 
-- RESULTS INTERPRETATION:
-- - If total_count=10000, scopeCount=9800, nonCompliant=200
-- - Then: 98% have emails (9800/10000), 2% missing (200/10000)
-- - This would PASS if threshold is ≥98%, FAIL if threshold is ≥99%
-- 
-- TROUBLESHOOTING:
-- - If nonCompliant=0 but you know emails are missing: Check PersonType filter
-- - If total_count seems low: Check ModifiedDate filter (may be too restrictive)
-- - If numbers don't add up: Check for duplicate BusinessEntityIDs in source
-- ==/==========================================================================
```

{% hint style="info" %}
**Query Explanation:**

* LEFT JOIN finds people WITHOUT emails (that's what we're looking for)
* PersonType filter focuses on customers/employees who need emails
* Date filter focuses on active records
  {% endhint %}

***

1. In left menu, Click on: 'Data Operations' option -> Business Rules

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FlTn7q03aXMPoNwpNQqmH%2Fimage.png?alt=media&#x26;token=195d8e17-b519-4b98-9aa8-8b65b9ceb5af" alt=""><figcaption><p>Create a Business Rule</p></figcaption></figure>

2. In the Business Rules Tile, Click on: 'Business Rules'

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FkT60e2S0Pr9M39VBevHk%2Fimage.png?alt=media&#x26;token=16039aa0-3585-4911-a770-d8d885aa4c95" alt=""><figcaption><p>Add Business Rule</p></figcaption></figure>

3. Click on: Add Business Rule.
4. Enter the following details:

| Setting            | Value                                                     |
| ------------------ | --------------------------------------------------------- |
| BUSINESS RULE NAME | Customer Email Completeness                               |
| DESCRIPTION        | Goal: 98% completeness                                    |
| NOTE               | This rule joins 'Person.Person' and 'Person.EmailAddress' |
| RULE ENABLED       | ☑ (checked)                                               |

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2F9I2yhqEmbyhkZWZI9xIa%2Fimage.png?alt=media&#x26;token=d4ed065b-0ef7-46f0-b102-d35b7fb3acf1" alt=""><figcaption><p>Create Customer Email Completeness Rule</p></figcaption></figure>

5. Click: 'CREATE BUSINESS RULE'.
6. Click: 'CONFIGURE'.

<figure><img src="https://1051758685-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fw1qJj4OGmdcvowiklB9W%2Fuploads%2FF0cgYxVdo8HLmpmnibOh%2Fimage.png?alt=media&#x26;token=f36663da-37f1-4990-b528-6ef81dcc7d40" alt=""><figcaption><p>Configure Rule</p></figcaption></figure>

7.

x

x

x

x
{% endtab %}

{% tab title="Product Information" %}
{% hint style="info" %}

#### Incomplete Product Information

**Scenario:** Products must have weight and size information for shipping calculations.
{% endhint %}

**Business Rule Configuration:**

```
Rule Name: Product_Shipping_Data_Completeness
Description: Ensures products have weight and size for accurate shipping
Data Quality Dimension: Completeness
Schedule: Weekly
Target: Production.Product
```

**SQL Query:**

```sql
SELECT 
    COUNT(*) AS total_count,
    SUM(CASE WHEN Weight IS NOT NULL AND Size IS NOT NULL THEN 1 ELSE 0 END) AS scopeCount,
    SUM(CASE WHEN Weight IS NULL OR Size IS NULL THEN 1 ELSE 0 END) AS nonCompliant
FROM Production.Product
WHERE FinishedGoodsFlag = 1 -- Only finished products
```

**Hands-On Task:**

1. Create this rule in PDC
2. Run the rule immediately
3. Review results on the History tab
4. Identify which product categories have the most incomplete data

***

x

{% endtab %}
{% endtabs %}

x

{% tabs %}
{% tab title="First Tab" %}
x
{% endtab %}

{% tab title="Second Tab" %}

{% endtab %}
{% endtabs %}

x


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://academy.pentaho.com/pentaho-data-catalog-en/data-catalog/business-rules/completeness-rule.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
