Metadata Ingestion
Ingest Metadata
In this hands-on workshop, you'll learn how to perform metadata ingestion in Pentaho Data Catalog (PDC) to automatically discover and catalog technical metadata from your Adventure Works 2022 database connection. We'll walk through starting the metadata ingest process and monitoring its progress. We'll also look at how this step lays the technical foundation for data profiling, business glossary mapping, and broader data governance.
By the end of this workshop, you will be able to:
Initiate automated metadata ingestion processes for connected data sources
Monitor metadata ingest job progress using PDC's Workers interface
Understand the critical role of metadata ingestion in data catalog operations
Recognize how technical metadata discovery enables advanced data governance features
Explain the relationship between data source connections and metadata availability
Prepare your data catalog for data profiling, quality assessment, and business context mapping
Understand metadata ingestion quotas and resource management considerations
What Metadata Ingestion Discovers: The metadata ingestion process automatically catalogs:
Table structures and column definitions across all Adventure Works schemas
Data types and nullable constraints for each column
Primary and foreign key relationships between tables
Index information and database constraints
Schema organization and table categorization
Basic statistics about table sizes and row counts
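To make the list above concrete, here is a minimal sketch of the kind of technical metadata a catalog collects. It assumes a local SQLite database rather than SQL Server, and it is not how PDC performs ingestion internally. It reads table structures, data types, nullability, primary and foreign keys, index names, and row counts through SQLite's `PRAGMA` statements. The two tables are a hypothetical toy stand-in for Adventure Works tables, because SQLite has no schemas.

```python
import sqlite3


def ingest_metadata(conn: sqlite3.Connection) -> dict:
    """Collect columns, keys, indexes, and row counts for every user table."""
    catalog = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": name, "type": col_type, "nullable": not notnull, "pk": pk > 0}
            for _cid, name, col_type, notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'
            )
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            {"column": row[3], "references": f"{row[2]}.{row[4]}"}
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        # PRAGMA index_list rows: (seq, name, unique, origin, partial); keep the index name
        indexes = [row[1] for row in conn.execute(f'PRAGMA index_list("{table}")')]
        row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        catalog[table] = {
            "columns": columns,
            "foreign_keys": foreign_keys,
            "indexes": indexes,
            "row_count": row_count,
        }
    return catalog


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical toy stand-in for two Adventure Works tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Person (
            BusinessEntityID INTEGER PRIMARY KEY NOT NULL,
            LastName TEXT NOT NULL,
            MiddleName TEXT
        );
        CREATE TABLE Customer (
            CustomerID INTEGER PRIMARY KEY NOT NULL,
            PersonID INTEGER REFERENCES Person(BusinessEntityID)
        );
        CREATE INDEX IX_Customer_PersonID ON Customer(PersonID);
        INSERT INTO Person VALUES (1, 'Sanchez', NULL);
        INSERT INTO Customer VALUES (11000, 1);
        """
    )
    return conn


if __name__ == "__main__":
    # Print the collected metadata for each table in the toy database.
    for table, meta in ingest_metadata(build_sample_db()).items():
        print(table, meta)
```

Against SQL Server, the same information would come from its system catalog views instead of SQLite pragmas. PDC handles that for you when you start the ingest job.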
Workshop Process: You'll start metadata ingestion for your mssql:adventureworks2022 data source, which covers all five business schemas (Person, HumanResources, Purchasing, Sales, Production). The process runs as a background job that you can monitor through PDC's Workers interface.
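Because ingestion runs as a background job, monitoring it amounts to checking its status repeatedly until it finishes. The following is a generic polling sketch of that idea, not a PDC API. `get_status` is a hypothetical callable that you would back with whatever status source you actually use, and the status labels in `TERMINAL_STATES` are assumptions rather than PDC's real values. In this workshop, the Workers page does this watching for you.

```python
import time
from typing import Callable

# Hypothetical terminal status labels; they are assumptions, not PDC's actual values.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} s")
        # Wait before asking again so the status source is not hammered.
        sleep(poll_seconds)
```

Passing `sleep` and `clock` as parameters keeps the loop testable with fake timers. It also lets you tune the polling interval to the size of the source being ingested.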
Foundation for Advanced Features: This metadata ingestion creates the technical foundation that enables:
Data profiling and quality assessment
Business glossary term mapping to technical assets
Data lineage discovery and impact analysis
Community-based access controls at the table/column level
Data steward assignment and governance workflows
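As one illustration of why ingested key relationships matter for impact analysis, the sketch below walks foreign-key edges backwards. It finds every table that directly or transitively depends on a changed table. The edge list is a small, hand-picked subset of Adventure Works foreign keys used for illustration only. This is a plain breadth-first traversal, not how PDC computes lineage or impact.

```python
from collections import defaultdict, deque

# Illustrative subset of Adventure Works foreign keys as (referencing table, referenced table).
FOREIGN_KEYS = [
    ("Sales.SalesOrderDetail", "Sales.SalesOrderHeader"),
    ("Sales.SalesOrderHeader", "Sales.Customer"),
    ("Sales.Customer", "Person.Person"),
]


def downstream_impact(changed_table: str, foreign_keys: list[tuple[str, str]]) -> set[str]:
    """Return every table that directly or transitively references changed_table."""
    referencing = defaultdict(list)
    for child, parent in foreign_keys:
        referencing[parent].append(child)
    impacted: set[str] = set()
    queue = deque([changed_table])
    while queue:  # breadth-first walk over "is referenced by" edges
        table = queue.popleft()
        for child in referencing[table]:
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted
```

For example, `downstream_impact("Sales.Customer", FOREIGN_KEYS)` returns the order header and order detail tables. Those are the tables a change to `Sales.Customer` could affect.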
Resource Management Note: PDC monitors data scanning quotas for file-based sources, but database metadata ingestion (like Adventure Works) does not count against your data quota limits, making it ideal for comprehensive enterprise database cataloging.
Log into Data Catalog:
Username: [email protected]
Password: Welcome123!
Click: Management in the left navigation menu.
Navigate to: the metadata ingest section for the mssql:adventureworks2022 data source.
Initiate the process by clicking the Start button.

After starting the ingest process, monitor its progress on the Workers page.
