Connect AW Database
Connect to AW DW ..
Adventure Works
In this hands-on workshop, you'll learn how to establish a connection between Pentaho Data Catalog and the Adventure Works 2022 database. We'll walk through configuring a Microsoft SQL Server data source connection, testing the connection, and ingesting metadata schemas to make Adventure Works data available for governance, cataloging, and analysis within PDC.
By the end of this workshop, you will be able to:
Configure Microsoft SQL Server data source connections in PDC
Set up proper JDBC connection strings with authentication parameters
Test database connectivity before committing configuration changes
Ingest specific database schemas for targeted metadata management
Configure optional data source properties for storage optimization and cost tracking
Understand the relationship between data source connections and metadata ingestion
Establish the foundation for Adventure Works data governance workflows
Workshop Connection Details: You'll configure connection to the Adventure Works 2022 database with:
Data Source Name:
mssql:adventureworks2022
Database Type: Microsoft SQL Server
Target Schemas: Person, HR, Purchasing, Sales, Production (5 core business domains)
Connection Method: JDBC URI with SQL Server authentication
Technical Requirements:
MSSQL JDBC Driver (mssql-jdbc-12.10.1.jre11.jar) must be uploaded to PDC
Database server: pdc.pentaho.lab:1433
Authentication: SQL Server authentication with sa account
This workshop establishes the critical data layer connection that enables PDC to discover, catalog, and govern Adventure Works data assets. Once connected, PDC can automatically discover tables, columns, relationships, and other metadata that forms the foundation for your data governance implementation.
Next Steps Preview: After establishing this connection, you'll be able to apply business glossaries, implement data quality rules, assign data stewards, and configure community-based access controls to the Adventure Works data assets.

Log into Data Catalog:
Username: [email protected]
Password: Welcome123!
Click: Management in the left navigation menu.

In the Resources tile, click: Add Data Source.

Specify the following information for the connection to your data source.
If you are nearing or have exceeded the limit of data sources allowed by your license agreement, a message appears when you try to add a new data source.
Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
Test Connection and Ingest Metadata Schema ..
After you have specified the detailed information according to your data source type, test the connection to the data source and add the data source.
Enter the following details to connect to: Adventure Works database.
Data Source Name
mssql:adventureworks2022
Data Source ID
Leave Blank to autogenerate ID
Description
AW DW: Person, HR, Purchasing, Sales, Production
Data Source Type
Microsoft SQL Server
Affinity
Default
Configuration Method
URI
Username
sa
Password
StrongPassword123
URI
jdbc:sqlserver://pdc.pentaho.lab:1433;databaseName=AdventureWorks2022;user=sa;password=StrongPassword123;encrypt=false
Driver
mssql-jdbc-12.10.1.jre11.jar*
Database Name
AdventureWorks2022

Ensure you have uploaded the supported MSSQL Driver - mssql-jdbc-12.10.1.jre11.jar to PDC
Select the JDBC driver:

Enter the URI:
jdbc:sqlserver://pdc.pentaho.lab:1433;databaseName=AdventureWorks2022;user=sa;password=StrongPassword123;encrypt=false
Click Test Connection to test your connection to the specified data source.

Click Ingest Schema, select the following 5 schemas, and then click Ingest Schemas.

While you have the option to select all schemas, it is advisable to exclude system-related schemas that are not relevant to your requirements.
We're going to cover this in more detail in Data Optimizer.
Recommendation
For Pentaho Data Catalog, the best practice is to:
Use the standard connection URI without schema restrictions
Control schema scope through the selective "Ingest Schema" process during data source setup
Use virtual folders and role-based access control for further data governance
This approach aligns with Data Catalog's design philosophy of comprehensive discovery followed by organized access control rather than connection-level restrictions.
(Optional) In the Physical Location field, specify the physical location details of the data source.
(Optional) Configure the following storage optimization options for the data source.
Available for Migration
Enables or disables the data source for storage optimization. When enabled, it includes the data source for data optimizer activities.
Available for Writing
Enables or disables writing capabilities for the data source and enables migration when turned on.
Available for Data Mastering
Enables or disables the data source for data mastering purposes.
(Optional) In the Cost per Terabyte field, specify the data source pricing details like currency, price per terabyte, and billing frequency.
(Optional) In the Total Capacity field, specify the total capacity of the data source in terabytes.
(Optional) Enter a Note for any additional information to share with others who might access this data source.
Click Create Data Source to establish your data source connection.
Last updated
Was this helpful?