Pentaho Academy Beta site ..

Metadata Store

Exploring the Pentaho Data Catalog Metadata Store ..

Metadata Store in Data Catalogs: A Foundation for Effective Data Management

A metadata store within a data catalog serves as a critical repository for storing metadata from various sources, ensuring the preservation of their relationships, historical context, and vital details. Acting as the backbone of a data catalog's architecture, this specialized database is instrumental in streamlining the organization and handling of metadata. This, in turn, significantly aids in the efficient discovery, understanding, and governance of data.

The primary role of the metadata store is to facilitate easy access to the correct data assets for users by encompassing technical, process, and business metadata. This includes comprehensive information on the data's structure, its lineage, utilization patterns, and the business context it operates within. By making data assets readily identifiable, assessable, and actionable, the metadata store plays a pivotal role in bolstering data-driven decision-making processes.

The diagram depicts a simplified representation of the key data model of the PDC Metadata Store (MDS). To understand the relationships between the 'Collections' , it's illustrated in a relational database format.

The model illustrates the relationships between data sources and entities where the customer data assets and discovered metadata properties are persisted as rich JSON graphs.

At the core of the model is the mds.entities collection.

Data Model of Pentaho Data Catalog - Metadata Store (MDS)

Connect to Metadata Store

  1. Run NoSQLBooster.

cd
cd ~/nosqlbooster4mongo*
./nosqlbooster4mongo*

So what happens when you connect/ingest/profile a datasource?

When you create a data source connection, for example, to the Synthea dataset on PostgreSQL, the details are persisted in the resourceconnections collection located in the fe database.

resourceconnections

If not not familiar with MQL, the easiest way to query the MongoDB database is to use SQL..!

For example, to display the connection name in a dashboard:

resourceName where configMethod = 'credentials'

The following table shows how SQL components map to MongoDB document components.

SQL
MongoDB

database

database

table

collection

column

field

row (record)

document

  1. Run NoSQLBooster.

cd
cd ~/nosqlbooster4mongo*
./nosqlbooster4mongo*
  1. In the Query panel, enter the following SQL statement.

//Single quotes are for strings 'string'
mb.runSQLQuery(`
       SELECT resourceName FROM resourceconnections 
       WHERE configMethod = 'credentials'     
`).sort({_id:-1})
  .limit(100)
SQL Query

As expected the result is: postgresql:synthea as there's currently only 1 connection defined.

Was this helpful?