# Merge

{% hint style="info" %}
**Introduction**

In Pentaho Data Integration (PDI), merging differs from joining: rather than matching records on key fields, it combines the rows of two or more streams into a single stream. This page covers two approaches, Append and Sorted Merge.

The Append operation simply stacks records from two input streams. All rows from both streams appear in the output without any sorting or matching logic applied.

With Append, the output contains all records from the first stream followed immediately by all records from the second stream. Both input streams must share the same structure with compatible field types.
{% endhint %}

<figure><img src="/files/61vAQb6U0mvaKKUHBdk5" alt=""><figcaption><p>Merge streams</p></figcaption></figure>
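The Append behavior described above can be sketched outside PDI in a few lines of Python. This is a minimal illustration rather than how the PDI step is implemented: rows are modeled as dictionaries, and the function name `append_streams` and the sample fields `id` and `name` are assumptions made for the example.

```python
from itertools import chain


def append_streams(head, tail):
    """Return every row of `head` followed by every row of `tail`.

    Hypothetical helper: rows are plain dicts, and the layout check only
    compares field names and their order, not field types.
    """
    head, tail = list(head), list(tail)
    if head and tail and list(head[0]) != list(tail[0]):
        raise ValueError("both streams must share the same field layout")
    # No sorting or matching: the second stream is stacked under the first.
    return list(chain(head, tail))


head = [{"id": 3, "name": "Carol"}, {"id": 1, "name": "Alice"}]
tail = [{"id": 2, "name": "Bob"}]
appended = append_streams(head, tail)
```

The rows keep the order in which they arrived, so `appended` holds the two head rows first and the tail row last, regardless of their `id` values.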

{% hint style="info" %}
The Sorted Merge operation interleaves records from both input streams based on a predetermined sort order. The result is a single output stream whose rows remain ordered by the sort field(s).

For Sorted Merge to work properly, both input streams must be pre-sorted on the same field(s) before reaching the merge step. The operation preserves all records while maintaining the specified sort order.

Unlike joining operations, neither of these merging methods matches records based on key fields. They simply combine complete datasets according to different organizing principles - stacking for Append and interleaving by sort order for Sorted Merge.

Both techniques are valuable when you need to process records from multiple sources while maintaining all original data points.
{% endhint %}

<figure><img src="/files/BhvLWu30NspEVJ3d6mun" alt=""><figcaption><p>Sorted Merge</p></figcaption></figure>
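As a rough analogy for Sorted Merge, Python's standard `heapq.merge` interleaves inputs that are already sorted. The sketch below assumes rows are dictionaries sorted on a hypothetical `id` field; the function name `sorted_merge` and the `src` field are illustrative only and do not come from PDI.

```python
import heapq
from operator import itemgetter


def sorted_merge(stream_a, stream_b, key_field):
    """Interleave two streams that are ALREADY sorted on `key_field`.

    Like the PDI step, this does not sort its inputs; unsorted inputs
    would produce an output that is not in order.
    """
    return list(heapq.merge(stream_a, stream_b, key=itemgetter(key_field)))


stream_a = [{"id": 1, "src": "A"}, {"id": 4, "src": "A"}, {"id": 6, "src": "A"}]
stream_b = [{"id": 2, "src": "B"}, {"id": 3, "src": "B"}, {"id": 5, "src": "B"}]
merged = sorted_merge(stream_a, stream_b, "id")
```

Every input row appears exactly once in `merged`, and no rows are matched or dropped: the key is used only to decide which stream supplies the next row.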

***

{% hint style="info" %}
**Workshops**

The Dummy step in Pentaho Data Integration is a simple "do nothing" step that passes rows through unchanged. It serves as a placeholder, provides a single point where multiple streams with the same layout can be merged, and helps keep a transformation organized.

The Merge Rows step compares two input data streams with identical structures to identify differences between them. It requires configuration of reference and compare streams, key fields for matching rows, and value fields to compare. The step outputs a single stream with all rows plus a "flagfield" indicating if each row is identical, changed, new, or deleted. This functionality is particularly useful for change data capture, data synchronization, audit trails, and implementing slowly changing dimensions.
{% endhint %}
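The comparison logic of Merge Rows can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the PDI implementation: it looks rows up by key in dictionaries instead of streaming over sorted inputs, the function name `merge_rows_diff` and the sample fields are hypothetical, and the choice of which row is emitted for each flag is an assumption of this sketch.

```python
def merge_rows_diff(reference, compare, key_fields, value_fields,
                    flag_field="flagfield"):
    """Flag each row as identical, changed, new, or deleted.

    `reference` holds the existing rows and `compare` the incoming rows.
    Keys are assumed to be unique within each stream.
    """
    def key_of(row):
        return tuple(row[f] for f in key_fields)

    ref_by_key = {key_of(r): r for r in reference}
    new_by_key = {key_of(r): r for r in compare}

    output = []
    for key in sorted(ref_by_key.keys() | new_by_key.keys()):
        ref, new = ref_by_key.get(key), new_by_key.get(key)
        if new is None:
            # Key exists only in the reference stream.
            output.append({**ref, flag_field: "deleted"})
        elif ref is None:
            # Key exists only in the compare stream.
            output.append({**new, flag_field: "new"})
        elif all(ref[f] == new[f] for f in value_fields):
            output.append({**ref, flag_field: "identical"})
        else:
            output.append({**new, flag_field: "changed"})
    return output


reference = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"},
             {"id": 3, "city": "Lima"}]
compare = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Milan"},
           {"id": 4, "city": "Cairo"}]
result = merge_rows_diff(reference, compare, ["id"], ["city"])
```

In a change-data-capture flow, the flag would typically drive what happens next, for example routing `new` rows to an insert, `changed` rows to an update, and `deleted` rows to a delete.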

{% tabs %}
{% tab title="Merge stream" %}
{% hint style="info" %}
**Merge stream - Dummy**

The transformation illustrates the rules for manipulating data streams: every stream must have the same structure (layout) before the streams can be merged.

In this guided demonstration, you will merge data streams based on a set of rules:

• Add constant step
{% endhint %}

<figure><img src="/files/MZ7h0n4NplYDleSK9XY8" alt=""><figcaption><p>Merge streams</p></figcaption></figure>
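One way to picture the layout rule is a stream that lacks a field the other stream has, so a constant is added to bring the two layouts into line before merging. The Python sketch below is only an analogy for that idea; the helper names `add_constant` and `layout` and the sample fields `order_id`, `amount`, and `channel` are hypothetical and are not taken from the workshop.

```python
def add_constant(rows, field, value):
    """Return copies of `rows` with a constant field appended to each row."""
    return [{**row, field: value} for row in rows]


def layout(rows):
    """Field names of a stream, in order (empty list for an empty stream)."""
    return list(rows[0]) if rows else []


online = [{"order_id": 10, "amount": 25.0, "channel": "web"}]
store = [{"order_id": 11, "amount": 40.0}]  # no "channel" field yet

# Align the layouts first, then merge the two streams into one.
store_aligned = add_constant(store, "channel", "store")
layouts_match = layout(online) == layout(store_aligned)
merged = online + store_aligned if layouts_match else []
```

Without the added constant, the two streams would have different layouts and could not be merged safely.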

{% content-ref url="/pages/TITwLgS97OVKMAbTSxKD" %}
[Merge Streams](/pentaho-data-integration/data-integration/enrich-data/merge/merge-streams.md)
{% endcontent-ref %}
{% endtab %}

{% tab title="Merge Rows (diff)" %}
{% hint style="info" %}
**Merge rows (diff)**

The Merge Rows (diff) step compares the values of rows that share the same key in the two streams and sets a flag on each output row.

In this guided demonstration, you will compare incoming records with reference records and determine whether each record is identical or needs to be updated, inserted, or deleted:

• Merge Rows (diff) stream

• Merge Rows (diff) database
{% endhint %}

<figure><img src="/files/8mJPfTzw9HQLWoKu13XO" alt=""><figcaption><p>Merge Rows (diff)</p></figcaption></figure>

{% content-ref url="/pages/1jk5exCMFFTeAUXWCl7s" %}
[Merge Rows (diff)](/pentaho-data-integration/data-integration/enrich-data/merge/merge-rows-diff.md)
{% endcontent-ref %}
{% endtab %}
{% endtabs %}

