Connect AW Database
Connect to AW DW ..
Adventure Works
In this hands-on workshop, you'll learn how to establish a connection between Pentaho Data Catalog and the Adventure Works 2022 database. We'll walk through configuring a Microsoft SQL Server data source connection, testing the connection, and ingesting metadata schemas to make Adventure Works data available for governance, cataloging, and analysis within PDC.
By the end of this workshop, you will be able to:
Configure Microsoft SQL Server data source connections in PDC
Set up proper JDBC connection strings with authentication parameters
Test database connectivity before committing configuration changes
Ingest specific database schemas for targeted metadata management
Configure optional data source properties for storage optimization and cost tracking
Understand the relationship between data source connections and metadata ingestion
Establish the foundation for Adventure Works data governance workflows
Technical Requirements:
MSSQL JDBC Driver (mssql-jdbc-12.10.1.jre11.jar) must be uploaded to PDC
Database server: pdc.pentaho.lab:1433
Authentication: SQL Server authentication with sa account

Log into Data Catalog:
Username: [email protected]
Password: Welcome123!
Click: Management in the left navigation menu.

In the Resources tile, click: Add Data Source.

Specify the following information for the connection to your data source.
If you are nearing or have exceeded the limit of data sources allowed by your license agreement, a message appears when you try to add a new data source.
Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
Test Connection and Ingest Metadata Schema ..
After you have specified the detailed information according to your data source type, test the connection to the data source and add the data source.
Enter the following details to connect to: Adventure Works database.
Data Source Name
mssql:adventureworks2022
Data Source ID
Leave Blank to autogenerate ID
Description
AW DW: Person, HR, Purchasing, Sales, Production
Data Source Type
Microsoft SQL Server
Affinity
Default
Configuration Method
URI
Username
sa
Password
StrongPassword123
URI
jdbc:sqlserver://pdc.pentaho.lab:1433;databaseName=AdventureWorks2022;user=sa;password=StrongPassword123;encrypt=false
Driver
mssql-jdbc-13.2.0.jre11.jar*
Database Name
AdventureWorks2022
Manage Drivers
Select an existing driver or upload a new driver to ensure that the communication between the application and the database is efficient, secure, and follows the required standards.

Ensure you have uploaded the supported MSSQL Driver - mssql-jdbc-13.2.0.jre11.jar to PDC
Download the MSSQL JDBC driver.
Untar the file.
cd ~/Downloads
unzip ~/Downloads/sqljdbc_13.2.0.0_enu.zip -d ~/DownloadClick: Manage Drivers.

Enter the URI:
jdbc:sqlserver://pdc.pentaho.lab:1433;databaseName=AdventureWorks2022;user=sa;password=StrongPassword123;encrypt=falseClick Test Connection to test your connection to the specified data source.

Click Ingest Schema, select the following 5 schemas, and then click Ingest Schemas.

While you have the option to select all schemas, it is advisable to exclude schemas that are not relevant to your requirements.
We're going to cover this in more detail in Data Optimizer.
Click: Save Data Source.
Recommendation
For Pentaho Data Catalog, the best practice is to:
Use the standard connection URI without schema restrictions
Control schema scope through the selective "Ingest Schema" process during data source setup
Use virtual folders and role-based access control for further data governance
This approach aligns with Data Catalog's design philosophy of comprehensive discovery followed by organized access control rather than connection-level restrictions.
(Optional) In the Physical Location field, specify the physical location details of the data source.
(Optional) Configure the following storage optimization options for the data source.
Available for Migration
Enables or disables the data source for storage optimization. When enabled, it includes the data source for data optimizer activities.
Available for Writing
Enables or disables writing capabilities for the data source and enables migration when turned on.
Available for Data Mastering
Enables or disables the data source for data mastering purposes.
(Optional) In the Cost per Terabyte field, specify the data source pricing details like currency, price per terabyte, and billing frequency.
(Optional) In the Total Capacity field, specify the total capacity of the data source in terabytes.
(Optional) Enter a Note for any additional information to share with others who might access this data source.
Click Create Data Source to establish your data source connection.
Last updated
Was this helpful?





