# Hierarchical Data Type

{% hint style="info" %}

#### Hierarchical Data Types

A hierarchical data type represents data organized as a hierarchy, in which each data element has a parent-child relationship with other elements. It can be used to store and query data organized like a tree, such as organizational charts, file systems, or taxonomies.

A hierarchical data type has advantages such as compactness, depth-first ordering, and support for arbitrary insertions and deletions. However, it also has limitations: application logic is needed to maintain the tree structure, multiple parents or other complex relationships are difficult to handle, and implementations are not standardized across database systems.

A common example is employees and managers: both are employees of the company. A manager can have employees who report to them, and can also have a manager of their own.
{% endhint %}

<div><figure><img src="/files/nNhVEYUFoQVJ05vdWYqf" alt=""><figcaption><p>Adjacency List - Hierarchical Data</p></figcaption></figure> <figure><img src="/files/pYj89bcLnOvo53eaah0I" alt=""><figcaption><p>Company</p></figcaption></figure></div>
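The employee/manager example maps onto the adjacency-list model in the figure: each employee stores only a reference to their direct manager, and the tree is rebuilt by following those parent links. The following minimal Bash sketch (Bash 4+, for associative arrays) illustrates the idea outside PDI; the names, the `manager_of` array, and the `chain_of_command` and `direct_reports` functions are hypothetical and not part of the HDT plugin.

```bash
#!/usr/bin/env bash
# Hypothetical adjacency list: employee -> direct manager.
# An empty value marks the root of the tree (the top of the org chart).
declare -A manager_of=(
  [alice]=""
  [bob]="alice"
  [carol]="alice"
  [dave]="bob"
)

# Print the reporting chain from an employee up to the root
# by following one parent (manager) link at a time.
chain_of_command() {
  local person="$1"
  local chain="$person"
  while [[ -n "${manager_of[$person]}" ]]; do
    person="${manager_of[$person]}"
    chain+=" -> $person"
  done
  printf '%s\n' "$chain"
}

# List the direct reports of a manager (the node's children), sorted.
direct_reports() {
  local boss="$1" emp
  for emp in "${!manager_of[@]}"; do
    [[ "${manager_of[$emp]}" == "$boss" ]] && printf '%s\n' "$emp"
  done | sort
}

chain_of_command dave   # dave -> bob -> alice
direct_reports alice    # bob and carol, one per line
```

Walking up the chain assumes the data really is a tree. A cycle (for example, two employees recorded as each other's manager) would make the loop run forever, which is one reason application logic has to keep the structure valid.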

{% hint style="info" %}
Hierarchical Data Type (HDT) is a new data type in PDI, introduced in the v10.1 release, for handling structured, complex, or nested data in JSON and YAML formats.

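There are seven new plugins/steps:

• **Hierarchical JSON Input** - loads JSON data from a file or a previous step into a hierarchical (HDT) field.

• **Hierarchical JSON Output** - converts hierarchical data from a previous step into JSON format.

• **Hierarchical YAML Input** - loads YAML data from a file or a previous step into a hierarchical (HDT) field.

• **Hierarchical YAML Output** - converts hierarchical data from a previous step into YAML format.

• **Extract to Rows** - parses hierarchical data type fields from a previous step and puts their values into the PDI stream.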

• **Modify values from a single row** -

• **Modify values from grouped rows** -
{% endhint %}

{% tabs %}
{% tab title="Installation HDT Plugin" %}
{% tabs %}
{% tab title="NEW - Plugin Manager 11" %}
x
{% endtab %}

{% tab title="Pentaho v10" %}
{% hint style="warning" %}
As part of the Pentaho Data Integration & Analytics plugin release journey to decouple plugins from the core Pentaho Server, **Pentaho EE 9.5 GA** is releasing new plugins and enhancements to its existing plugin collection.
{% endhint %}

1. Log in to the 'Pentaho Support Portal' and download the plugin.

{% embed url="<https://support.pentaho.com/hc/en-us/articles/17591496360589-Pentaho-EE-Marketplace-Plugins-Release>" %}
Download Plugins
{% endembed %}

2. Select the Pentaho version.

<figure><img src="/files/nO5JkH1o8XQpnZjNEpVX" alt=""><figcaption><p>EE Plugin versions</p></figcaption></figure>

3. Download the selected plugin(s).

<figure><img src="/files/eCPm2J162BaurfxBqy3B" alt=""><figcaption><p>EE Plugins</p></figcaption></figure>

4. Extract the HDT plugin.

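```bash
cd ~/Downloads
# Extract into a folder named after the archive, which is the path used in the next step
unzip hierarchical-datatype-plugin-10.1.0.0-317-dist.zip -d hierarchical-datatype-plugin-10.1.0.0-317-dist
```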

5. Install the HDT plugin.

```bash
cd ~/Downloads/hierarchical-datatype-plugin-10.1.0.0-317-dist/hierarchical-datatype-plugin-10.1.0.0-317
# Launch the plugin installer (opens the license and installation wizard)
./install.sh
```

6. Accept the license agreement and click 'Next'.

<figure><img src="/files/oJgxp1017zF0vQkLBOTl" alt=""><figcaption><p>Accept License</p></figcaption></figure>

7. Browse to the `../data-integration/plugins` directory.

<figure><img src="/files/zOF87TrG7htXKnW8GVsL" alt=""><figcaption><p>Install to plugins directory</p></figcaption></figure>

8. Click 'Next' and accept the overwrite warning.

<figure><img src="/files/Ru9ITqVkegtHCkPJgtPN" alt=""><figcaption><p>Installation successful</p></figcaption></figure>

9. **Restart** Pentaho Data Integration and check for the Hierarchical folder.

<figure><img src="/files/vQJEbZKRXHvJWgPeYrKC" alt="" width="252"><figcaption><p>Hierarchical</p></figcaption></figure>
{% endtab %}
{% endtabs %}

{% endtab %}

{% tab title="JSON" %}
{% hint style="info" %}
The following labs highlight some of the use cases.
{% endhint %}

{% tabs %}
{% tab title="Extract rows" %}
{% hint style="info" %}
The Extract to rows step works in combination with the Hierarchical JSON Input step, letting you filter and extract specific rows.
{% endhint %}

1. Open the following transformation:

\~/Workshop--Pentaho-Data-Integration/Module 3

<figure><img src="/files/Jh1BbosbNXQkVcyW42fh" alt=""><figcaption><p>Extract rows</p></figcaption></figure>

{% tabs %}
{% tab title="Hierarchical JSON Input" %}
{% hint style="info" %}
You can use the Hierarchical JSON Input step to load JSON data into PDI from a file or a previous step.

You can apply filters to load only the desired data, and split the data on a hierarchical data path using wildcards.
{% endhint %}

**Source tab**

1. Double-click the Hierarchical JSON Input step to see how it's configured.

<table><thead><tr><th width="213">Option/Field</th><th>Description</th></tr></thead><tbody><tr><td>From file</td><td>Select to specify the file path and name of the JSON file you want to load into PDI.</td></tr><tr><td>File name</td><td>File path and name of the JSON file to load.</td></tr><tr><td>From field</td><td>Select to use an incoming field as the JSON file path.</td></tr><tr><td>Field with file name</td><td>The incoming field containing the JSON file path.</td></tr></tbody></table>

***

**Output**

{% hint style="info" %}
The Split rows across path option is especially useful when loading arrays of JSON objects within large JSON files.

When you use the Split rows across path field, you must specify all filter paths rooted at the split path. If you do not use it, a normal HDT extraction path is used.
{% endhint %}

1. Click on the Output tab.

<figure><img src="/files/DBjvrbnrV3UNAw6R7p19" alt=""><figcaption><p>Output tab</p></figcaption></figure>

<table><thead><tr><th width="227">Field</th><th>Description</th></tr></thead><tbody><tr><td>Output field</td><td>Specify the name of the output column.</td></tr><tr><td>Split rows across path</td><td>Specify the JSON path across which the data is split into rows.</td></tr></tbody></table>

{% hint style="info" %}
In this example, suppose the JSON file also contained other hierarchies based on business units, salaries, managers, and so on. The Split rows across path `$.employees[*]` references every element of the `employees` array: `$` is the root of the document and `[*]` is a wildcard over all array elements, so each employee becomes a separate row.
{% endhint %}
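The jq command-line JSON processor offers a rough analogy for splitting rows across `$.employees[*]`: the jq filter `.employees[]` emits each element of the `employees` array separately, much like one row per employee, while other hierarchies in the file are ignored. This sketch assumes jq is installed; the sample document and the `split_employees` function are hypothetical and do not reflect how the Hierarchical JSON Input step is implemented.

```bash
#!/usr/bin/env bash
# Assumes jq is installed. Hypothetical sample: an employees array
# alongside another top-level hierarchy (businessUnits).
sample='{
  "businessUnits": [{"name": "Sales"}, {"name": "Support"}],
  "employees": [
    {"firstName": "John", "lastName": "Doe", "salary": 5000,
     "address": {"city": "Boston", "zip": "02110"}},
    {"firstName": "Anna", "lastName": "Smith", "salary": 6000,
     "address": {"city": "Denver", "zip": "80202"}}
  ]
}'

# .employees[] is jq's counterpart of $.employees[*]: it outputs every
# array element separately; -c prints each one on its own line.
split_employees() {
  jq -c '.employees[]'
}

split_employees <<<"$sample"
```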

***

**Filters**

{% hint style="info" %}
Use the optional Path field to specify the filters to apply when using the Split rows across path option, so that only a subset of the JSON file is fetched.
{% endhint %}

1. Click on the Filters tab.

<figure><img src="/files/wQN7LrBN3nI14k4ZUKTk" alt=""><figcaption><p>Filters tab</p></figcaption></figure>

{% hint style="info" %}
The filters here simply select the `firstName`, `lastName`, and `address` fields of each employee.
{% endhint %}
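Continuing the jq analogy (jq assumed to be installed), restricting each employee to `firstName`, `lastName`, and `address` corresponds to jq's object-construction shorthand `{firstName, lastName, address}`, which keeps only the named keys. The sample data and the `filter_employees` function are hypothetical.

```bash
#!/usr/bin/env bash
# Assumes jq is installed. Hypothetical sample with an extra salary key.
sample='{"employees": [
  {"firstName": "John", "lastName": "Doe", "salary": 5000,
   "address": {"city": "Boston", "zip": "02110"}},
  {"firstName": "Anna", "lastName": "Smith", "salary": 6000,
   "address": {"city": "Denver", "zip": "80202"}}
]}'

# Split on the employees array, then keep only the three filtered keys;
# salary (and any other key) is dropped.
filter_employees() {
  jq -c '.employees[] | {firstName, lastName, address}'
}

filter_employees <<<"$sample"
```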
{% endtab %}

{% tab title="Extract rows" %}
{% hint style="info" %}
You can use the Extract to rows step to parse hierarchical data type fields coming from a previous step and put their values into the PDI stream. This step supports wildcards for arrays and for string keys. After the data is parsed, a data type is assigned to each extracted value.
{% endhint %}

1. Double-click the Extract to rows step to see how it's configured.

<figure><img src="/files/xAaojWacAHgDQGTNCTng" alt=""><figcaption><p>Extract to rows</p></figcaption></figure>

<table><thead><tr><th width="229">Option</th><th>Description</th></tr></thead><tbody><tr><td>Step name</td><td>Specifies the unique name of the Extract to rows step on the canvas. You can customize the name or leave it as the default.</td></tr><tr><td>Source hierarchical field</td><td>Specifies the hierarchical input field name from the previous step, which will be used to extract the data.</td></tr><tr><td>Pass through fields</td><td>Select to add the input fields to the output fields.</td></tr></tbody></table>

**Fields**

<table><thead><tr><th width="230">Field</th><th>Description</th></tr></thead><tbody><tr><td>Hierarchical data path</td><td>Complete path of the field name in the hierarchical field source.</td></tr><tr><td>Output field name</td><td>Name of the field that maps to the corresponding field in the hierarchical input source.</td></tr><tr><td>Type</td><td>Data type of the generated output field.</td></tr><tr><td>Path field name</td><td>(Optional) Adds the hierarchical path as a new output field with the specified name.</td></tr></tbody></table>

{% hint style="info" %}
So the `firstName`, `lastName`, and `address` values are extracted from the `employees` stream field.

The hierarchical path of `address` is written to a separate stream field: `address_Path`.
{% endhint %}
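To picture what Extract to rows produces, the sketch below uses jq (assumed installed) to flatten each employee into tab-separated columns, keeping the nested `address` object as a JSON string and adding a column that holds a path to it, similar in spirit to `address_Path`. The exact path format PDI writes to that field is not documented here, so the `$.employees[N].address` strings, the sample data, and the `extract_rows` function are all hypothetical.

```bash
#!/usr/bin/env bash
# Assumes jq is installed. Hypothetical sample data.
sample='{"employees": [
  {"firstName": "John", "lastName": "Doe",
   "address": {"city": "Boston", "zip": "02110"}},
  {"firstName": "Anna", "lastName": "Smith",
   "address": {"city": "Denver", "zip": "80202"}}
]}'

# One tab-separated row per employee: firstName, lastName, the address
# object serialized as JSON, and a hypothetical path to that address.
extract_rows() {
  jq -r '.employees
    | range(0; length) as $i
    | .[$i]
    | [.firstName, .lastName, (.address | tojson),
       "$.employees[\($i)].address"]
    | @tsv'
}

extract_rows <<<"$sample"
```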
{% endtab %}

{% tab title="Hierarchical JSON Output" %}
{% hint style="info" %}
Use the Hierarchical JSON output step to convert hierarchical data from a previous step into JSON format.
{% endhint %}

1. Double-click the Hierarchical JSON Output step to see how it's configured.

<figure><img src="/files/y9hML4xAHcQoIlXOdQ9s" alt=""><figcaption><p>Hierarchical JSON Output</p></figcaption></figure>

<table><thead><tr><th width="222">Field</th><th>Description</th></tr></thead><tbody><tr><td>Input hierarchical field</td><td>Specifies the hierarchical input field from a previous step that is converted to JSON format.</td></tr><tr><td>Output field</td><td>Specifies the step output field to contain the generated JSON output.</td></tr></tbody></table>

**Options**

<table><thead><tr><th width="223">Option</th><th>Description</th></tr></thead><tbody><tr><td>Pass output to servlet</td><td>Select to return the data using a web service instead of passing it to output rows.</td></tr><tr><td>Pretty print?</td><td>Select to format the output JSON data.</td></tr></tbody></table>

{% hint style="info" %}
The `employees` stream field is formatted as a JSON object and written to the `JSON Employee Details` stream field.
{% endhint %}
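The Pretty print? option is described as formatting the output JSON; pretty printing normally changes only the layout, not the content. As an analogy outside PDI, assuming jq is installed, `jq -c` writes compact single-line JSON while `jq .` writes indented, multi-line JSON. The sample record and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Assumes jq is installed. Hypothetical single employee record.
employee='{"firstName": "John", "lastName": "Doe"}'

# Compact output: the whole object on one line.
to_json_compact() {
  jq -c '.'
}

# Pretty-printed output: jq's default, indented two spaces per level.
to_json_pretty() {
  jq '.'
}

to_json_compact <<<"$employee"
to_json_pretty <<<"$employee"
```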
{% endtab %}

{% tab title="RUN" %}

1. Run the transformation and check the 'Preview data' tab.

<figure><img src="/files/vLvrSKDPvRo5P958NDav" alt=""><figcaption><p>Preview data</p></figcaption></figure>

{% hint style="info" %}
The JSON output can be consumed further downstream.
{% endhint %}
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="SET JSON Object" %}
{% hint style="info" %}
x
{% endhint %}

x

x

x

{% tabs %}
{% tab title="Data Grid" %}
x

x

x
{% endtab %}

{% tab title="Modify Values from single row" %}
x

x

x
{% endtab %}

{% tab title="Hierarchical JSON Output" %}
x

x

x
{% endtab %}

{% tab title="RUN" %}
x

x

x

x
{% endtab %}
{% endtabs %}
{% endtab %}
{% endtabs %}
{% endtab %}
{% endtabs %}

