
Profile the Data

Processing and Profiling the

After successfully ingesting the metadata from our database schemas, the next step focuses on the data itself, starting with the 'Patient' table. This phase is essential for evaluating the data's structure, quality, and integrity, which in turn helps keep the database efficient and effective. Although the catalog is currently limited to database metadata, Data Profiling will offer deeper insights, which we will explore later in this workshop.

Processing the data
  1. Log into Data Catalog:

Username: [email protected]

Password: Welcome123!

Security Advisory: Handling Login Credentials


Process the Data

  1. Select 'Data Canvas' from the left menu option.

  2. Click the checkbox to select all the schemas.

Process the schemas
  3. Click 'Process'.

When managing both structured and unstructured data, two critical steps stand out: Metadata Ingest and Data Profiling. Distinguishing between them is essential for ensuring data quality and accessibility.
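To make the profiling half of that distinction concrete, the sketch below computes a few basic statistics over the values of a single column. It is only an illustration of the idea, not how Data Catalog profiles data internally: it uses an in-memory SQLite database, and the `profile_column` helper, the sample rows, and the column layout of the `patients` table are all assumptions made for the example.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return simple profile statistics for one column (hypothetical helper)."""
    # Identifiers cannot be bound as SQL parameters, so quote them explicitly.
    t = '"' + table.replace('"', '""') + '"'
    c = '"' + column.replace('"', '""') + '"'
    total, non_null, distinct, minimum, maximum = conn.execute(
        f"SELECT COUNT(*), COUNT({c}), COUNT(DISTINCT {c}), MIN({c}), MAX({c}) "
        f"FROM {t}"
    ).fetchone()
    return {
        "rows": total,
        "nulls": total - non_null,  # COUNT(column) skips NULL values
        "distinct": distinct,       # NULL is not counted as a distinct value
        "min": minimum,
        "max": maximum,
    }


# Sample data invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, passport TEXT)"
)
conn.executemany(
    "INSERT INTO patients (name, passport) VALUES (?, ?)",
    [("Ana", "P100"), ("Ben", None), ("Cy", "P100"), ("Di", "P300")],
)

profile = profile_column(conn, "patients", "passport")
print(profile)
```

Unlike metadata ingest, every statistic here requires reading the column's actual values, which is why profiling is a separate and typically more expensive step.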

Ingest Metadata

Metadata ingest is a foundational process in data management within a Data Catalog. It automatically collects metadata (the data about data) from a database schema, file, or object. This step is crucial for understanding and organizing the data, making it easily accessible for further analysis and data profiling.
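As a rough illustration of what "collecting metadata" means, the sketch below reads table and column definitions from a database without touching any row values. It is not the Data Catalog implementation: it assumes an in-memory SQLite database, and the `collect_metadata` helper and the column layout of the `patients` table are hypothetical.

```python
import sqlite3


def collect_metadata(conn):
    """Collect table and column metadata without reading row values."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        catalog[table] = [
            {
                "name": name,
                "type": col_type,
                "nullable": not notnull,
                "primary_key": bool(pk),
            }
            for _cid, name, col_type, notnull, _default, pk in columns
        ]
    return catalog


# Table definition invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, passport TEXT)"
)

catalog = collect_metadata(conn)
for column in catalog["patients"]:
    print(column)
```

The result describes only names, types, and constraints, which is the kind of information available before any profiling has run.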

  1. Navigate to the metadata ingest section of your Data Catalog tool and initiate the process by clicking the Start button.

Metadata Ingest
  2. Select the specific tables or datasets for metadata ingestion. For example, if you are interested in patient information, expand the 'patients' table and select relevant fields such as 'passport'.

  3. After starting the ingest process, monitor its progress on the Manage Workers page. This page provides real-time updates on the ingestion task.

Metadata Ingest - Worker
