# Metadata Ingestion

{% hint style="success" %}

#### Ingest Metadata

In this hands-on workshop, you'll learn how to run metadata ingestion in Pentaho Data Catalog (PDC) to automatically discover and catalog technical metadata from your Adventure Works 2022 database connection. We'll walk through initiating the metadata ingest process, monitoring its progress, and understanding how this step lays the technical foundation for data profiling, business glossary mapping, and comprehensive data governance.

By the end of this workshop, you will be able to:

* Initiate automated metadata ingestion processes for connected data sources
* Monitor metadata ingest job progress using PDC's Workers interface
* Understand the critical role of metadata ingestion in data catalog operations
* Recognize how technical metadata discovery enables advanced data governance features
* Navigate the relationship between data source connections and metadata availability
* Prepare your data catalog for data profiling, quality assessment, and business context mapping
* Understand metadata ingestion quotas and resource management considerations

**What Metadata Ingestion Discovers:** The metadata ingestion process automatically catalogs:

* **Table structures** and column definitions across all Adventure Works schemas
* **Data types** and nullable constraints for each column
* **Primary and foreign key relationships** between tables
* **Index information** and database constraints
* **Schema organization** and table categorization
* **Basic statistics** about table sizes and row counts

**Workshop Process:** You'll initiate the metadata ingestion for your `mssql:adventureworks2022` data source, covering all five business schemas (Person, HumanResources, Purchasing, Sales, Production). The process runs as a background job that you can monitor through PDC's Workers interface.

**Foundation for Advanced Features:** This metadata ingestion creates the technical foundation that enables:

* Data profiling and quality assessment
* Business glossary term mapping to technical assets
* Data lineage discovery and impact analysis
* Community-based access controls at the table/column level
* Data steward assignment and governance workflows

**Resource Management Note:** PDC monitors data scanning quotas for file-based sources, but database metadata ingestion (like Adventure Works) does not count against your data quota limits, making it ideal for comprehensive enterprise database cataloging.
{% endhint %}

***

{% hint style="info" %}

#### Metadata Ingestion

Metadata ingest is a foundational process in data management within a Data Catalog. It automatically collects metadata (the data about data) from a database schema, file, or object. This step is crucial for understanding and organizing the data, making it easily accessible for further analysis and data profiling.
{% endhint %}
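To make "data about data" concrete, the sketch below harvests table, column, and key metadata from a small database without reading any row values. It is only an illustration: it uses Python's built-in `sqlite3` module and SQLite catalog pragmas rather than SQL Server, the two tables are invented stand-ins, and `harvest_metadata` is a hypothetical helper, not how PDC itself is implemented.

```python
import sqlite3

# In-memory stand-in for a source database (SQLite, not SQL Server).
# The tables below are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        last_name TEXT NOT NULL
    );
    CREATE TABLE sales_order (
        order_id  INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        total_due REAL
    );
""")


def harvest_metadata(conn):
    """Collect table, column, and key metadata without reading row data."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: cid, name, type, notnull, dflt_value, pk
        columns = [
            {
                "name": name,
                "type": col_type,
                "nullable": not notnull,
                "primary_key": bool(pk),
            }
            for _cid, name, col_type, notnull, _default, pk
            in conn.execute(f"PRAGMA table_info('{table}')")
        ]
        # PRAGMA foreign_key_list rows: id, seq, table, from, to, ...
        foreign_keys = [
            {"column": row[3], "references": f"{row[2]}.{row[4]}"}
            for row in conn.execute(f"PRAGMA foreign_key_list('{table}')")
        ]
        catalog[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return catalog


catalog = harvest_metadata(conn)
for table, meta in catalog.items():
    print(table, [c["name"] for c in meta["columns"]], meta["foreign_keys"])
```

The resulting dictionary holds only structure (names, types, nullability, keys), which is the kind of technical metadata later steps such as profiling build on.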

1. Log into Data Catalog:

{% embed url="<https://pdc.pentaho.lab>" %}

Username: <james.lock@adventureworks.com>

Password: Welcome123!

2. Click: Management in the left navigation menu.
3. Navigate to: the Metadata Ingest section.
4. Initiate the process by clicking the `Start` button.

<figure><img src="/files/gycocOtqM5CfAxouTlyf" alt=""><figcaption><p>Metadata Ingest</p></figcaption></figure>

{% hint style="info" %}
Users can select specific tables or datasets for metadata ingestion. For example, if you are interested in patient information, you might expand the 'patients' table and opt for relevant fields such as 'passport'.

If you have already scanned more than 75% of your data quota, you will see a message when you start the scan. Even if you cannot scan new data, you can still run Data Discovery or Data Identification on data you have already scanned. Databases do not have a data scan quota.
{% endhint %}
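The quota rule in the note above can be summarized as a small check. This is a sketch of the stated behavior only: the function name, the `source_type` labels, and the byte-based units are assumptions made for illustration, not part of any PDC API.

```python
WARN_THRESHOLD = 0.75  # from the note: a message appears past 75% of the quota


def quota_warning(source_type, scanned_bytes, quota_bytes):
    """Return True when starting a scan should show the quota message.

    `source_type` values ("database", "file") are hypothetical labels.
    """
    if source_type == "database":
        # Databases do not have a data scan quota.
        return False
    return scanned_bytes > WARN_THRESHOLD * quota_bytes


print(quota_warning("file", 80, 100))
print(quota_warning("database", 99, 100))
```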

5. After starting the ingest process, monitor its progress on the Workers page.

<figure><img src="/files/BFIWJiH1XaXdAzMGO4oP" alt=""><figcaption><p>Workers - Metadata Ingest</p></figcaption></figure>

{% hint style="info" %}
The metadata is now up to date.

The Metadata Ingest step scans the data source for new or modified files since the last run, updating the existing metadata. In addition, it removes metadata for deleted files, ensuring Data Catalog represents the data source accurately.

Next stage: [Profile the Data](/pentaho-data-catalog-en/data-catalog/data-processing/profiling.md)
{% endhint %}
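The update behavior described above (pick up new or modified items, drop deleted ones) amounts to reconciling the existing catalog against a fresh scan. The sketch below shows that idea in general terms. The `reconcile` function, the file paths, and the integer modification stamps are all hypothetical, and it does not claim to mirror PDC's internal logic.

```python
def reconcile(catalog, scanned):
    """Update `catalog` in place from `scanned`.

    Both arguments map an item path to a last-modified stamp.
    Returns the paths that were added, modified, and removed.
    """
    added = sorted(p for p in scanned if p not in catalog)
    modified = sorted(
        p for p in scanned if p in catalog and scanned[p] != catalog[p]
    )
    removed = sorted(p for p in catalog if p not in scanned)

    for path in added + modified:
        catalog[path] = scanned[path]  # refresh metadata for new/changed items
    for path in removed:
        del catalog[path]  # drop metadata for items that no longer exist

    return {"added": added, "modified": modified, "removed": removed}


# Hypothetical state from the previous run and from the current scan.
catalog = {"sales/2023.csv": 100, "sales/2024.csv": 200, "hr/old.csv": 50}
scanned = {"sales/2023.csv": 100, "sales/2024.csv": 250, "hr/new.csv": 300}

changes = reconcile(catalog, scanned)
print(changes)
```

After the call, `catalog` matches `scanned`, and unchanged items are left untouched, which is why a repeat ingest can be cheaper than the first one.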

***

