# MDI

{% hint style="info" %}
In this Workshop, you will:

• Create a number of transformations that prepare the metadata and inject these specific values through the ETL Metadata Injection step.
{% endhint %}

{% tabs %}
{% tab title="Why MDI?" %}
{% hint style="info" %}
Metadata is traditionally defined and configured at design time, in a process known as hard coding, because it does not change at run time.

In this scenario, onboarding the files would require a CSV file input step for each of the different delimiters.
{% endhint %}

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-d8f889d679f795cd4fe1cefd5aea4834c6079d70%2Fstatic%20mi.png?alt=media" alt="" width="375"><figcaption><p>hard coded values in ktr</p></figcaption></figure>

1. Double-click on the CSV File Input steps to display the metadata properties:

\~/How-To--Metadata-Injection/Overview of Metadata Injection/file\_hard\_coded.ktr

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-e5a1caa78da14b81780329a7dfebd73bb193bf08%2Fcsv%20static.png?alt=media" alt=""><figcaption></figcaption></figure>

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-1236a7f067a835665a1949299f54a173829c9649%2Fcsv%20static2.png?alt=media" alt=""><figcaption><p>hard coded values</p></figcaption></figure>

{% hint style="info" %}
Each data source requires its own workflow.

The challenge is to find a way to dynamically inject the required metadata properties at run time via a template.
{% endhint %}

***

**Metadata Properties**

{% hint style="info" %}
Steps in a Transformation are configured with associated metadata property values, e.g. step name, filename, delimiter, and so on.

These metadata properties are saved as XML in the .ktr file.

A neat way to introduce metadata injection is to change a metadata property in the XML of a step.
{% endhint %}

1. Copy / Paste the step into Visual Studio Code:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-5f1548fb233ad914ae5c4fcbf15df162601a0a5d%2Fsales%20xml.png?alt=media" alt="" width="563"><figcaption><p>csv input step xml</p></figcaption></figure>

2. Change a metadata property: \<name>Whatever\</name>
3. Copy / Paste the XML into a new transformation:

{% hint style="info" %}
Each of the step's settings is defined in the XML. By changing the step name, you've manually 'injected' a new value.
{% endhint %}
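
The same manual 'injection' can be sketched in code. The snippet below is only an illustration: the step XML is a hypothetical, heavily trimmed fragment (a real CSV file input step carries many more elements), and `set_property` is a helper invented for this example rather than part of Pentaho.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed step XML; element names and values are assumptions
# chosen for illustration, not a copy of the workshop's .ktr file.
STEP_XML = """
<step>
  <name>sales_data</name>
  <type>CsvInput</type>
  <filename>sales.csv</filename>
  <separator>,</separator>
</step>
"""


def set_property(step_xml: str, tag: str, value: str) -> str:
    """Return the step XML with the text of the first <tag> element replaced."""
    root = ET.fromstring(step_xml)
    element = root.find(tag)
    if element is None:
        raise KeyError(f"no <{tag}> element in step")
    element.text = value
    return ET.tostring(root, encoding="unicode")


# 'Inject' a new step name, as in step 2 above.
renamed = set_property(STEP_XML, "name", "Whatever")
print(renamed)
```

The ETL Metadata Injection step automates this idea: instead of editing the XML by hand, values arrive from a data stream at run time.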
{% endtab %}

{% tab title="Standard" %}
{% hint style="info" %}
These Guided Demonstrations outline the ‘Use Case’ for Metadata Injection. Onboarding data workflows follow repeatable patterns, with just different metadata properties.

* Standard Metadata Injection – rename data stream fields

Once the repeatable pattern has been defined in a template, the ETL Metadata Injection step exposes the metadata properties of the template's steps, which can then be mapped to the corresponding injected source stream fields.

* Outline the workflow for standard metadata injection.
* Configure an ETL Metadata Injection Transformation, and Template.

A typical Use Case would be renaming database columns, as you migrate databases from one system to another.
{% endhint %}

**Template**

Let's start with the template. The template is the workflow that utilizes the metadata injection.

1. Open the template:

\~/How-To--Metadata-Injection/Standard Metadata Injection/standard\_template.ktr

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-a792982de6f1d4f046f99e174a65709ee8a01b12%2Fstatic%20template.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

***

**Data Grid – Test data - input**

Meta tab: On this tab, you can specify the field metadata (output specification) of the data.

Data tab: This grid contains the data. Everything is entered in String format, so make sure you use the correct format masks on the Meta tab.

1. Drag and drop the Data Grid step onto the canvas:
2. Double-click to set the properties as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-f849238255ad5dfa84454dbec7c81aaa35910c60%2Fdg%20test.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-36e3c0dfc088140e124e08d018cf78c53de2d019%2Fdg%20test2.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

This is the data ingestion step. In practice, it could be a table, a flat file, and so on.

***

**Select Values**

1. Drag the Select values step onto the canvas:

There’s nothing to configure as the ‘metadata rules’ will be defined in the ETL Metadata Injection step.

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-88535910b2d0c8e6e34a88ab1d8a7ee82ab75406%2FSV%20(1)%20(1).png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

***

**Text File Output**

The Text file output step is used to export data to text file format. This is commonly used to generate Comma Separated Values (CSV files) that can be read by spreadsheet applications. It is also possible to generate fixed width files by setting lengths on the fields in the fields tab.

1. Drag and drop the Text file output step onto the canvas:
2. Double-click to set the properties as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-716767468e52f404a22b989894f25c92598fa480%2FTFO%20standard.png?alt=media" alt=""><figcaption></figcaption></figure>

3. Just add the path to the output file. Notice the internal variables used to define the filename:

Filename: ${Internal.Entry.Current.Directory}/${Internal.Transformation.Name}\_output

4. Save the Transformation as:

\~/How-To--Metadata-Injection/Standard Metadata Injection/standard\_template.ktr
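
To see how the `${...}` placeholders in the filename resolve, here is a minimal sketch of variable expansion. It is not how PDI implements substitution; the `expand` helper and the two values in `variables` are assumptions made up for this example (at run time PDI sets the internal variables itself).

```python
import re

# Hypothetical values; PDI populates these internal variables at run time.
variables = {
    "Internal.Entry.Current.Directory": "/home/user/workshop",
    "Internal.Transformation.Name": "my_transformation",
}


def expand(text: str, values: dict) -> str:
    """Replace each ${name} placeholder with its value; leave unknown names untouched."""
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda match: values.get(match.group(1), match.group(0)),
        text,
    )


filename = expand(
    "${Internal.Entry.Current.Directory}/${Internal.Transformation.Name}_output",
    variables,
)
print(filename)  # /home/user/workshop/my_transformation_output
```

Because the name is built from variables, the same template can write to a different file for each transformation that uses it.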

***

**Metadata Injection**

The Transformation sets the metadata fieldname values that are going to be used in the Metadata Injection Template.

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-f90e17bc247b245a5d04e3e5abe19550247a5ac5%2Fmdi.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

***

**Data Grid**

1. Drag and drop the Data Grid step onto the canvas:
2. Double-click to set the properties as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-3005785d8618163a602cc17336ea6f0cc78544dc%2Fdg%20mdi.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-d950b1b99dfeeede85cdc6f4ad31637f0500a012%2Fdg%20mdi2.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

***

**ETL Metadata Injection**

The ETL Metadata Injection step inserts metadata into a template transformation. Instead of statically entering ETL Metadata in a step dialog, you pass it at run-time. This step enables you to solve repetitive ETL workloads like loading of text files, data migration, and so on.

1. Drag and drop the ETL Metadata Injection step onto the canvas:
2. Double-click to set the properties as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-ad92069a4f5c72ad33f388bd09cb146e2ff062d7%2Fmdi%20set.png?alt=media" alt=""><figcaption></figcaption></figure>

3. Click the Browse button to locate the Metadata Injection Template:
4. Notice the Inject Metadata tab:

These options define the ‘metadata rules’ for each step in the template. In this example, the Select values step renames each field from its ‘source\_fieldname’ to its ‘dest\_fieldname’ through the Meta tab options.

5. Save the Transformation as:

\~/How-To--Metadata-Injection/Standard Metadata Injection/standard\_mdi.ktr
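
Conceptually, the injected rename rules behave like a lookup table applied to every row. The sketch below illustrates that idea only; the field names and rows are hypothetical, and the two helper functions are not part of Pentaho.

```python
# Metadata rows, shaped like the Data Grid in the injecting transformation.
# The field names are hypothetical.
metadata_rows = [
    {"source_fieldname": "id", "dest_fieldname": "customer_id"},
    {"source_fieldname": "nm", "dest_fieldname": "customer_name"},
]

# Test data rows, standing in for the template's input step.
data_rows = [
    {"id": "1", "nm": "Alice"},
    {"id": "2", "nm": "Bob"},
]


def build_rename_map(rows):
    """Turn the metadata rows into a source -> destination lookup."""
    return {row["source_fieldname"]: row["dest_fieldname"] for row in rows}


def rename_fields(row, rename_map):
    """Rename fields found in the map; other fields pass through unchanged."""
    return {rename_map.get(name, name): value for name, value in row.items()}


rename_map = build_rename_map(metadata_rows)
renamed_rows = [rename_fields(row, rename_map) for row in data_rows]
print(renamed_rows)
```

Changing the metadata rows changes the output field names without touching the template, which is the point of the pattern.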

***

**RUN the MDI Workflow**

1. RUN standard\_mdi.ktr:
2. Open the file located at:

\~/How-To--Metadata-Injection/Standard Metadata Injection/standard\_mdi\_output.txt

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-8e0df9e90fc438f0ab5371d3e0b12d0ff6427b68%2Fstandard%20result.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>
{% endtab %}

{% tab title="Push / Pull" %}
{% hint style="info" %}
In certain scenarios you will need to push/pull the rows to/from the template:

* Push Metadata Injection.
* Pull Metadata Injection.
* Push – Pull Metadata Injection.
  {% endhint %}

{% tabs %}
{% tab title="Push" %}
{% hint style="info" %}
Streams the dataset **to** the template.
{% endhint %}

**Template**

Let's start with the template. The template is the workflow that utilizes the metadata injection.

1. Open the template:

\~/How-To--Metadata-Injection/Push - Pull Metadata Injection/push mdi/push\_template.ktr

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-d58f1f4d36117e547d646dde2b7092a5731ac97f%2Fpush%20template.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

***

**Dummy**

1. Drag the Dummy step onto the canvas:
2. Rename: dmmy-input:

The rows are streamed (‘pushed’) into the dmmy-input step of the template transformation.

***

**Select Values**

1. Drag the Select values step onto the canvas:

There’s nothing to configure as the ‘metadata rules’ will be defined in the ETL Metadata Injection step.

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-88535910b2d0c8e6e34a88ab1d8a7ee82ab75406%2FSV%20(1)%20(1).png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

***

**Text File Output**

1. Drag and drop the Text file output step onto the canvas:
2. Double-click to set the properties as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-716767468e52f404a22b989894f25c92598fa480%2FTFO%20standard.png?alt=media" alt=""><figcaption></figcaption></figure>

3. Just add the path to the output file. Notice the internal variables used to define the filename:

Filename: ${Internal.Entry.Current.Directory}/${Internal.Transformation.Name}\_output

4. Save the Transformation as:

\~/How-To--Metadata-Injection/Push - Pull Metadata Injection/push mdi/push\_template.ktr

***

**Metadata Injection**

Here, the result set from the Test data - input step is pushed down into the template.

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-ee191054f344376c2cbd1b1d6a1c18a523966444%2Fpush%20-etl.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

***

**Data Grid – Test data - input**

Meta tab: On this tab, you can specify the field metadata (output specification) of the data.

Data tab: This grid contains the data. Everything is entered in String format, so make sure you use the correct format masks on the Meta tab.

1. Drag and drop the Data Grid step onto the canvas:
2. Double-click to set the properties as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-f849238255ad5dfa84454dbec7c81aaa35910c60%2Fdg%20test.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-36e3c0dfc088140e124e08d018cf78c53de2d019%2Fdg%20test2.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

This is the data ingestion step. In practice, it could be a table, a flat file, and so on.

***

**Data Grid**

1. Drag and drop the Data Grid step onto the canvas:
2. Double-click to set the properties as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-3005785d8618163a602cc17336ea6f0cc78544dc%2Fdg%20mdi.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-d950b1b99dfeeede85cdc6f4ad31637f0500a012%2Fdg%20mdi2.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

***

**ETL Metadata Injection**

1. Drag and drop the ETL Metadata Injection step onto the canvas:
2. Double-click to set the properties as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-fd7ca4b4786476f918fa0673d8faf5672abe9497%2Fpush%20etl.png?alt=media" alt=""><figcaption></figcaption></figure>

Filename: ${Internal.Entry.Current.Directory}/push\_template.ktr

3. Click on the Options tab:
4. Ensure the following details are configured, as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-cbc410ae8bcfb9df4fb7729e1c9158d3324437ca%2Fpush%20options.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

💡 Ensure the 'Run resulting transformation' option is checked.

The data is streamed (‘pushed’) from the Test data - input step of the MDI workflow to the Input Stream step of the template.

5. Save the Transformation as:

\~/How-To--Metadata-Injection/Push - Pull Metadata Injection/push mdi
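
The direction of the data flow in the push pattern can be sketched as follows. This is a conceptual illustration only: `template` is a plain function standing in for the injected template, and the rows and field names are hypothetical.

```python
def template(input_rows, rename_map, sink):
    """Stand-in for the injected template: rename fields, then write each row out."""
    for row in input_rows:
        sink.append({rename_map.get(name, name): value for name, value in row.items()})


# Rows produced in the MDI workflow (hypothetical test data).
test_data = [{"id": "1"}, {"id": "2"}]
rename_map = {"id": "customer_id"}

# Stands in for the template's Text file output step.
output_file = []

# Push: the MDI workflow owns the rows and hands them to the template.
template(iter(test_data), rename_map, output_file)
print(output_file)
```

The rows originate outside the template, so the template only needs a placeholder input step to receive them.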

***

**RUN the MDI Workflow**

1. RUN tr\_push\_mdi.ktr:
2. Open the file located at:

\~/How-To--Metadata-Injection/Push - Pull Metadata Injection/push mdi/push\_mdi\_output.txt

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-bc576788bc426f3140b974dd45f42123ab043789%2Fpush%20-%20tfo.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>
{% endtab %}

{% tab title="Pull" %}
{% hint style="info" %}
Streams the dataset **from** the template.
{% endhint %}

**Start Pentaho Data Integration**

```bash
# Change to the PDI install directory, then launch Spoon (the PDI client).
cd ~/Pentaho/design-tools/data-integration
sh spoon.sh
```

**Template**

Let's start with the template. The template is the workflow that utilizes the metadata injection.

1. Open the template:

\~/How-To--Metadata-Injection/Push - Pull Metadata Injection/pull mdi/pull\_template.ktr

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-98218119b9ef18e64b2f18f0210c8ffa8e61c6f7%2Fpull%20template.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

**Data Grid – Test data - input**

Meta tab: On this tab, you can specify the field metadata (output specification) of the data.

Data tab: This grid contains the data. Everything is entered in String format, so make sure you use the correct format masks on the Meta tab.

2. Drag and drop the Data Grid step onto the canvas.
3. Double-click to set the properties as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-f849238255ad5dfa84454dbec7c81aaa35910c60%2Fdg%20test.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-36e3c0dfc088140e124e08d018cf78c53de2d019%2Fdg%20test2.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

This is the data ingestion step. In practice, it could be a table, a flat file, and so on.

**Select Values**

To configure the Select values step:

1. Drag the Select values step onto the canvas

There’s nothing to configure as the ‘metadata rules’ will be defined in the ETL Metadata Injection step.

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-88535910b2d0c8e6e34a88ab1d8a7ee82ab75406%2FSV%20(1)%20(1).png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

**Dummy**

1. Drag the Dummy step onto the canvas.
2. Rename: dmmy-pull

The data is ‘pulled’ from the dmmy-pull step of the template transformation. You will need to manually enter the names of the fields that are pulled.

**Metadata Injection**

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-9745680d4a216c56bd733c7752829ff737303a0a%2Fpull%20MDI.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

**Data Grid**

To configure the Data Grid step:

1. Drag and drop the Data Grid step onto the canvas.
2. Double-click to set the properties as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-3005785d8618163a602cc17336ea6f0cc78544dc%2Fdg%20mdi.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-d950b1b99dfeeede85cdc6f4ad31637f0500a012%2Fdg%20mdi2.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

**Text File Output**

1. Drag and drop the Text file output step onto the canvas.
2. Double-click to set the properties as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-716767468e52f404a22b989894f25c92598fa480%2FTFO%20standard.png?alt=media" alt=""><figcaption></figcaption></figure>

**ETL Metadata Injection**

1. Drag and drop the ETL Metadata Injection step onto the canvas.
2. Double-click to set the properties as outlined below:

Filename: ${Internal.Entry.Current.Directory}/pull\_template.ktr

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-5fd66973c8f51b979e49e596fc8ac9431efb1480%2Fetl%20pull.png?alt=media" alt=""><figcaption></figcaption></figure>

3. Click on the Options tab.
4. Ensure the following details are configured, as outlined below:

<figure><img src="https://3680356391-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZpCSy6Skj215f4oWypdc%2Fuploads%2Fgit-blob-a9becc957931daa8c6d660f891002c5ece41c982%2Fpull%20options.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

💡 Ensure the 'Run resulting transformation' option is checked.

The data is streamed (‘pulled’) from the dmmy-pull step of the template to the Text file output step of the MDI workflow.

5. Save the Transformation as:

\~/How-To--Metadata-Injection/Push - Pull Metadata Injection/pull mdi/pull\_midi\_output.txt
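
In the pull pattern the direction reverses: the template produces the rows and the MDI workflow reads them back. The sketch below is a conceptual illustration under the same assumptions as before: `template` is a plain generator standing in for the injected template, and the rows and field names are hypothetical.

```python
def template(rename_map):
    """Stand-in for the injected template: it owns the data and yields renamed rows."""
    test_data = [{"id": "1"}, {"id": "2"}]  # hypothetical rows
    for row in test_data:
        yield {rename_map.get(name, name): value for name, value in row.items()}


# The MDI workflow declares by hand which fields it expects to read back.
expected_fields = ["customer_id"]

# Pull: the MDI workflow consumes rows that the template emits.
pulled = [
    {name: row[name] for name in expected_fields}
    for row in template({"id": "customer_id"})
]
print(pulled)
```

Listing `expected_fields` explicitly mirrors the need to enter the pulled field names manually, since the calling workflow cannot infer them before the template runs.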
{% endtab %}
{% endtabs %}
{% endtab %}
{% endtabs %}
