# Cross Join

{% hint style="warning" %}
**Workshop - Cross Join**

A CROSS JOIN (also called a CARTESIAN JOIN) returns the Cartesian product of the sets of records from two or more joined tables. It is equivalent to an inner join whose join condition always evaluates to True, or to an inner join with no join condition at all. Basically, every row from one table is paired with every row from the other, giving every possible combination: joining a table of m rows with a table of n rows returns m × n rows.

In this workshop we'll cross join first names with middle names, and then cross join the result with a surname.
{% endhint %}
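To make the definition concrete, here is a minimal Java sketch of a cross join as two nested loops. Java is used because the workshop's User Defined Java Expression (UDJE) steps evaluate Java expressions. The class name and sample names are made up for illustration; in PDI the Join Rows step does this pairing for you.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a cross (Cartesian) join: every left row paired with every right row.
// The names in main() are made-up sample data, not the workshop's Data Grid values.
class CrossJoin {

    // Returns every (left, right) combination as a two-element list.
    static List<List<String>> crossJoin(List<String> left, List<String> right) {
        List<List<String>> result = new ArrayList<>();
        for (String l : left) {          // outer loop: each row from the first stream
            for (String r : right) {     // inner loop: paired with each row from the second
                result.add(List.of(l, r));
            }
        }
        return result;                   // size is always left.size() * right.size()
    }

    public static void main(String[] args) {
        List<String> firstNames = List.of("James", "Oliver", "Harry");
        List<String> middleNames = List.of("Edward", "Thomas");
        List<List<String>> rows = crossJoin(firstNames, middleNames);
        rows.forEach(System.out::println);
        System.out.println(rows.size() + " rows"); // 3 x 2 = 6
    }
}
```

Each additional stream multiplies the row count again, which is why cross joins grow quickly.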

<figure><img src="/files/1RLemhlsRLR0IKoWTJL2" alt=""><figcaption><p>Cross / Cartesian Join</p></figcaption></figure>

<figure><img src="/files/lJ6SBvIQXEXu3MsOGf7P" alt=""><figcaption><p>Cross Joins</p></figcaption></figure>

***

{% tabs %}
{% tab title="English" %}

<figure><img src="/files/H7ajXCDy7BysZd2TeNvb" alt=""><figcaption></figcaption></figure>
{% endtab %}
{% endtabs %}

{% hint style="info" %}
**Create a new Transformation**

Any one of these actions opens a new Transformation tab for you to begin designing your transformation.

* By clicking File > New > Transformation
* By using the CTRL-N hot key
  {% endhint %}

{% tabs %}
{% tab title="1. Data Grid" %}
{% hint style="info" %}
You should already be familiar with the Data Grid step. Here, Data Grid steps list the values for the first names and the middle names.
{% endhint %}

<div><figure><img src="/files/YiVVG6w6THmDgleYMZLA" alt=""><figcaption><p>First names</p></figcaption></figure> <figure><img src="/files/Wz3KnlV3tO0WDYh09y24" alt=""><figcaption><p>Middle names</p></figcaption></figure></div>
{% endtab %}

{% tab title="2. Join" %}
{% hint style="info" %}
Joins every first\_name with every middle\_name, producing all possible combinations.
{% endhint %}

<figure><img src="/files/YQaHYTKpxLDxfX6o3nQc" alt="" width="375"><figcaption><p>Join rows</p></figcaption></figure>

{% hint style="info" %}
You can also add a condition to constrain the resulting dataset.
{% endhint %}
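Conceptually, a cross join with a condition is the full Cartesian product filtered by a predicate. The Java sketch below illustrates that idea; the condition used in `main()` (don't pair a name with itself) is a hypothetical example, not the condition from this workshop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Sketch: every pair is generated, but only pairs that satisfy the condition are kept.
// The condition in main() is hypothetical.
class ConditionalCrossJoin {

    static List<String[]> crossJoin(List<String> left, List<String> right,
                                    BiPredicate<String, String> condition) {
        List<String[]> result = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                if (condition.test(l, r)) {   // keep only pairs that pass the condition
                    result.add(new String[] {l, r});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> first = List.of("Thomas", "Oliver", "Harry");
        List<String> middle = List.of("Thomas", "Edward");
        // Hypothetical condition: don't pair a name with itself.
        List<String[]> rows = crossJoin(first, middle, (f, m) -> !f.equals(m));
        for (String[] row : rows) {
            System.out.println(row[0] + " " + row[1]);
        }
        System.out.println(rows.size() + " of " + first.size() * middle.size() + " rows kept"); // 5 of 6
    }
}
```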
{% endtab %}

{% tab title="3. UDJE" %}
{% hint style="info" %}
Two new fields are added to the data stream:

* first\_middle\_name: concatenates first\_name and middle\_name.
* initials: returns the two-character initials.

The initials are extracted with Java's `String.substring()` method. It has two variants, and each returns a new string that is a substring of the original: `substring(beginIndex)` runs from beginIndex to the end of the string, while `substring(beginIndex, endIndex)` stops at endIndex - 1.
{% endhint %}
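The two fields can be sketched as plain Java methods. This assumes the UDJE expressions are roughly `first_name + " " + middle_name` and `first_name.substring(0,1) + middle_name.substring(0,1)`; the space separator in particular is an assumption, so check the screenshot for the exact expressions. The last two lines show both `substring` variants.

```java
// Sketch of the two UDJE fields as plain Java methods.
// Assumption: the real UDJE expressions resemble these; the space separator is a guess.
class NameExpressions {

    // first_middle_name: first and middle names joined with a space
    static String firstMiddleName(String firstName, String middleName) {
        return firstName + " " + middleName;
    }

    // initials: first character of each name, via substring(beginIndex, endIndex)
    static String initials(String firstName, String middleName) {
        return firstName.substring(0, 1) + middleName.substring(0, 1);
    }

    public static void main(String[] args) {
        System.out.println(firstMiddleName("Oliver", "James")); // Oliver James
        System.out.println(initials("Oliver", "James"));        // OJ

        // The two substring variants:
        System.out.println("Oliver".substring(2));    // iver: index 2 to the end
        System.out.println("Oliver".substring(0, 3)); // Oli: indices 0 to 2 (endIndex - 1)
    }
}
```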

<figure><img src="/files/K7IQXzxKL3FKjF58cSEu" alt="" width="563"><figcaption><p>UDJE - concat names, initials</p></figcaption></figure>
{% endtab %}

{% tab title="4. Get variable" %}
{% hint style="info" %}
Returns the value of the ${surname} variable, which is set on the Parameters tab of the Transformation properties.
{% endhint %}
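As a mental model of what `${surname}` means, the toy Java sketch below replaces each `${name}` reference with the matching parameter value. It is not PDI's variable API; the class, method and the value `"Smith"` are hypothetical, used only to show how a reference resolves against the transformation's parameters.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of ${name} substitution against transformation parameters.
// Hypothetical code: NOT PDI's API, only an illustration of how ${surname} resolves.
class VariableResolver {

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String text, Map<String, String> params) {
        Matcher m = VAR.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Unknown variables are left untouched as ${name}
            String value = params.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical parameter value; in the workshop it is set in Transformation properties > Parameters
        Map<String, String> params = Map.of("surname", "Smith");
        System.out.println(resolve("${surname}", params)); // Smith
    }
}
```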

<figure><img src="/files/fqUMIiISnKNL0qMdS6zF" alt="" width="563"><figcaption></figcaption></figure>
{% endtab %}

{% tab title="5. Join" %}
{% hint style="info" %}
Joins every first\_middle\_name with the surname, producing all possible combinations. The join condition also excludes a list of initials combinations from the output.
{% endhint %}
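Because the Get variable step emits a single row, cross joining with it appends the surname to every name row without multiplying the row count (N × 1 = N); only the condition removes rows. The Java sketch below shows this under stated assumptions: rows are modelled as string arrays, and the excluded initials set is a hypothetical stand-in for the workshop's condition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: cross joining N name rows with a one-row surname stream appends the surname
// to every row (N x 1 = N rows). The excluded initials are hypothetical.
class SurnameJoin {

    // Each input row is {first_middle_name, initials}; each output row adds surname.
    static List<String[]> joinSurname(List<String[]> names, String surname,
                                      Set<String> excludedInitials) {
        List<String[]> result = new ArrayList<>();
        for (String[] row : names) {
            if (excludedInitials.contains(row[1])) {
                continue; // the condition drops rows with unwanted initials
            }
            result.add(new String[] {row[0], row[1], surname});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> names = List.of(
                new String[] {"Oliver James", "OJ"},
                new String[] {"Harry Edward", "HE"},
                new String[] {"Thomas Edward", "TE"});
        // Hypothetical exclusion list
        List<String[]> rows = joinSurname(names, "Smith", Set.of("HE"));
        for (String[] r : rows) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```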

<figure><img src="/files/wau5duwmXi7J7a4dmjMq" alt="" width="563"><figcaption><p>Join rows</p></figcaption></figure>
{% endtab %}

{% tab title="6. Select values" %}
{% hint style="info" %}
Selects which data stream fields to keep and sets their order.
{% endhint %}
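Per row, selecting and ordering fields amounts to copying the chosen fields, in the chosen order, into an order-preserving map. The Java sketch below models a row as a `LinkedHashMap`; the field names and values are illustrative, and the step's other features (such as renaming fields) are not modelled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of selecting and ordering fields in one row.
// Field names/values are illustrative; renaming and metadata changes are not modelled.
class SelectValues {

    static <V> Map<String, V> select(Map<String, V> row, List<String> fields) {
        Map<String, V> out = new LinkedHashMap<>(); // LinkedHashMap keeps insertion order
        for (String field : fields) {
            if (row.containsKey(field)) {           // fields not listed are dropped
                out.put(field, row.get(field));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("initials", "OJ");
        row.put("surname", "Smith");
        row.put("first_name", "Oliver");
        row.put("middle_name", "James");
        System.out.println(select(row, List.of("first_name", "middle_name", "surname")));
        // {first_name=Oliver, middle_name=James, surname=Smith}
    }
}
```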

<figure><img src="/files/NQRozBNCuyEXDK2LWhPV" alt="" width="563"><figcaption><p>Select values</p></figcaption></figure>
{% endtab %}

{% tab title="7. UDJE" %}
{% hint style="info" %}
Two new fields are added to the data stream:

* boys\_initials: returns the baby's three-character initials.
* boys\_name: concatenates first\_name + middle\_name + surname
  {% endhint %}
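One way to express these two fields is sketched below in plain Java. It assumes boys\_initials reuses the two-character initials field from step 3 and appends the first letter of the surname, and that boys\_name separates the names with spaces; both are assumptions, so compare with the expressions in the screenshot.

```java
// Sketch of the two fields as plain Java methods.
// Assumptions: boys_initials = initials + first letter of surname; names joined by spaces.
class BoysName {

    // boys_initials: step 3's two-character initials plus the surname's first letter
    static String boysInitials(String initials, String surname) {
        return initials + surname.substring(0, 1);
    }

    // boys_name: first, middle and surname separated by spaces
    static String boysName(String firstName, String middleName, String surname) {
        return firstName + " " + middleName + " " + surname;
    }

    public static void main(String[] args) {
        System.out.println(boysInitials("OJ", "Smith"));              // OJS
        System.out.println(boysName("Oliver", "James", "Smith"));     // Oliver James Smith
    }
}
```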

<figure><img src="/files/w2imlH2Bm8OQDna7CeKd" alt=""><figcaption><p>UDJE - name</p></figcaption></figure>
{% endtab %}

{% tab title="8. Reservoir Sampling" %}
{% hint style="info" %}

* Returns 5 sampled records.
* Uses the random seed ${seed}.
* Reservoir Sampling selects a fixed number of records at random, in a single pass, from a stream whose total number of rows is not known beforehand. The selected rows are held in a fixed-size 'reservoir'.
* Use a different seed value to get a different sample; the same seed over the same input returns the same sample.
  {% endhint %}
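The textbook version of this technique is Algorithm R, sketched below in Java: keep the first k rows, then replace a random reservoir slot with probability k / (rows seen so far). This illustrates the general algorithm and is not necessarily PDI's exact implementation; the sample size of 5 and the seeds are chosen to mirror the settings above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Reservoir sampling (Algorithm R): pick k items uniformly at random from a stream
// of unknown length in one pass. A fixed seed makes the sample repeatable.
// Textbook algorithm; PDI's step may differ in detail.
class ReservoirSampler {

    static <T> List<T> sample(Iterator<T> stream, int k, long seed) {
        Random rnd = new Random(seed);
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item);          // fill the reservoir with the first k items
            } else {
                int j = rnd.nextInt(seen);    // uniform in [0, seen)
                if (j < k) {
                    reservoir.set(j, item);   // replace a slot with probability k / seen
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            names.add("name_" + i);
        }
        System.out.println(sample(names.iterator(), 5, 42L));
        System.out.println(sample(names.iterator(), 5, 7L)); // different seed, usually a different set
    }
}
```

Note that a different seed only makes a different sample likely; two seeds can still happen to select the same rows.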

<figure><img src="/files/JGVeho97eBsHIpMytPzp" alt="" width="375"><figcaption><p>Reservoir sampling</p></figcaption></figure>
{% endtab %}

{% tab title="9. RUN" %}
{% hint style="info" %}
**RUN**

This workshop illustrates how cross joins create datasets containing every possible combination of values, unless a condition restricts them. The final dataset is a random sample drawn with Reservoir Sampling, a technique commonly used in ML.
{% endhint %}

1. Click the Run button in the Canvas Toolbar.
2. Click on the Preview tab:

<figure><img src="/files/A1P9cik5lGRWoTnUXxJ8" alt=""><figcaption><p>Preview data</p></figcaption></figure>
{% endtab %}
{% endtabs %}


