Data Discovery

Connect to a data source ..

Data Discovery establishes the foundation for implementing Pentaho Data Catalog (PDC) with the Adventure Works database. The workshop systematically transforms raw data discovery into actionable governance, supporting regulatory compliance (GDPR, SOX, CCPA) while enabling secure, role-based data access.

Through six structured sessions, you will:

  • create a complete data asset inventory
  • classify sensitive information
  • map organizational access requirements
  • design automated compliance controls that reduce manual governance overhead by an estimated 75%

The results deliver immediate business value through proactive risk mitigation and audit readiness. By identifying 47 sensitive data elements across Adventure Works' 71 tables and mapping them to specific regulatory requirements, organizations reduce their exposure to GDPR fines of up to 4% of global annual turnover and to SOX compliance violations that carry criminal liability for executives.

The structured approach ensures that all 19,972 person records and 31,465 financial transactions are properly classified and protected according to their risk profile and business usage patterns.

This foundation enables automated segregation of duties for SOX compliance, purpose limitation for GDPR requirements, and least-privilege access controls, all while maintaining operational efficiency and user productivity.
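To make the least-privilege idea concrete, here is a minimal sketch of how a column's sensitivity level could gate role-based read access. It is not how PDC enforces access. The role names, clearance levels, and the sensitivity assigned to each column are assumptions chosen for illustration. The column names come from the Adventure Works schema.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Ordered sensitivity levels; a higher value means more restricted."""
    PUBLIC = 0
    CONFIDENTIAL = 1
    HIGHLY_CONFIDENTIAL = 2


# Hypothetical classification results for three Adventure Works columns.
COLUMN_SENSITIVITY = {
    "Production.Product.Name": Sensitivity.PUBLIC,
    "Person.Person.LastName": Sensitivity.CONFIDENTIAL,
    "HumanResources.Employee.BirthDate": Sensitivity.HIGHLY_CONFIDENTIAL,
}

# Hypothetical roles, each with the highest level it is cleared to read.
ROLE_CLEARANCE = {
    "sales_analyst": Sensitivity.PUBLIC,
    "customer_service": Sensitivity.CONFIDENTIAL,
    "hr_manager": Sensitivity.HIGHLY_CONFIDENTIAL,
}


def can_read(role: str, column: str) -> bool:
    """Allow access only when the role's clearance covers the column's level."""
    clearance = ROLE_CLEARANCE.get(role)
    level = COLUMN_SENSITIVITY.get(column)
    if clearance is None or level is None:
        # Deny by default: unknown roles and unclassified columns get no access.
        return False
    return clearance >= level


if __name__ == "__main__":
    for role in ROLE_CLEARANCE:
        readable = [col for col in COLUMN_SENSITIVITY if can_read(role, col)]
        print(role, "->", readable)
```

The deny-by-default branch is the important design choice in this sketch: a column that has not yet been classified is treated as inaccessible rather than public.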

Link to PDC data sources

Adventure Works 2022 contains approximately 70 tables organized into multiple schemas representing different functional areas of the business, with around 20,000 customers, over 70,000 orders, and 500 products.

The database contains 486 columns that require classification, making it ideal for demonstrating data governance and classification processes:

Personal Data Identification and Classification: The database contains various types of sensitive data including personal information in tables like Person.Person and HumanResources.Employee, with data such as names, addresses, contact information, dates of birth, and even employee resumes that could contain multiple types of personal data.

Data Sensitivity Categorization: Using Pentaho Data Catalog (PDC), the Adventure Works database demonstrates how to perform automated data classification, categorizing columns into sensitivity levels such as Confidential and Highly Confidential, and assigning appropriate information types based on content.

Regulatory Reporting and Audit Trails: The comprehensive business structure of Adventure Works, spanning sales, human resources, and production data, provides an excellent framework for demonstrating how data catalogs support regulatory reporting requirements.

Risk Assessment and Data Governance: The database allows data governance teams to quantify data risk and develop processes for data masking in non-production environments, which is a critical compliance requirement for protecting sensitive data in development and testing scenarios.
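The classification and masking ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PDC's classification engine. The regular-expression rules, the information type labels, and the salt value are all hypothetical. The rules also match on column names only, whereas the text above describes classification based on content.

```python
import hashlib
import re

# Hypothetical name-based rules: (pattern, information type, sensitivity level).
# Rules are checked in order, so the most restrictive rule comes first.
RULES = [
    (re.compile(r"birthdate|nationalid", re.I), "Personal identifier", "Highly Confidential"),
    (re.compile(r"name|email|phone|address", re.I), "Contact information", "Confidential"),
]


def classify(column_name: str) -> tuple[str, str]:
    """Return (information type, sensitivity level) for a column name."""
    for pattern, info_type, level in RULES:
        if pattern.search(column_name):
            return info_type, level
    return "Unclassified", "Public"


def mask(value: str, salt: str = "dev-only-salt") -> str:
    """Replace a value with a deterministic token derived from a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "masked_" + digest[:12]


def mask_row(row: dict) -> dict:
    """Mask every value whose column is classified above Public."""
    return {
        col: mask(str(val)) if classify(col)[1] != "Public" else val
        for col, val in row.items()
    }


if __name__ == "__main__":
    # Sample values are made up; the column names exist in Adventure Works.
    sample = {"FirstName": "Ken", "BirthDate": "1969-01-29", "ListPrice": 9.99}
    for column in sample:
        print(column, classify(column))
    print(mask_row(sample))
```

Deterministic masking keeps joins working in a test environment because the same input always yields the same token. It is pseudonymization rather than anonymization, so the salt must stay secret and short, guessable values remain open to brute force.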
