Hello World
A simple transformation to illustrate key concepts.
Workshop - Hello World
Every data integration journey begins with understanding the fundamental building blocks. In this classic "Hello World" workshop, you'll create your first Pentaho transformation from scratch, establishing the essential workflow you'll use throughout your career as a PDI developer.
In this hands-on workshop, you'll learn the core mechanics of building transformations in Spoon by assembling a simple data pipeline. You'll work with steps, hops, and notes, the three foundational elements of every transformation, while gaining confidence with the drag-and-drop interface and the preview capabilities that make PDI development intuitive and efficient.
What You'll Accomplish:
Create a new transformation from scratch
Add and configure transformation steps (Generate Rows and Dummy)
Connect steps using hops to define data flow
Add notes to document your transformation
Preview data at any step to verify your logic
Execute your transformation and interpret the execution metrics
Understand the Logging and Step Metrics tabs for monitoring
By the end of this workshop, you'll have completed your first working transformation and established a repeatable development workflow: add steps, configure properties, preview data, connect with hops, and run. This pattern will serve as the foundation for building increasingly complex data integration solutions throughout the course.
Prerequisites: Pentaho Data Integration installed and configured
Estimated Time: 10 minutes
Create a new Transformation
Either of these actions opens a new Transformation tab where you can begin designing your transformation:
Click File > New > Transformation
Press the CTRL-N hot key

Generate Rows
The Generate Rows step outputs a specified number of rows. By default the rows are empty, but they can also contain one or more static fields. The step is used primarily for testing, and it is also useful when you need a fixed number of rows, for example exactly 12 rows for 12 months.
Sometimes you may use Generate Rows to generate one row that is an initiating point for your transformation.
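The behaviour described above can be sketched outside of Spoon. The following is only a conceptual Python illustration, not PDI code: the function name `generate_rows` and its parameters are hypothetical and chosen to mirror the step's Limit and static field settings.

```python
from typing import Dict, Iterator


def generate_rows(limit: int, fields: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield `limit` identical rows, each carrying the same static fields."""
    for _ in range(limit):
        # Copy the dict so downstream code cannot mutate a shared row.
        yield dict(fields)


# Twelve rows, one per month, each with a single static field.
rows = list(generate_rows(12, {"message": "hello world"}))
print(len(rows))  # 12
print(rows[0])    # {'message': 'hello world'}
```

Calling `generate_rows(1, {})` would give the single empty row that can act as an initiating point for a pipeline.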
Start Pentaho Data Integration.
Windows - PowerShell:
Linux:
To add the Generate Rows step, expand the ‘Input’ category in the Design tab, and drag the step onto the canvas.
💡Alternatively, enter ‘Generate Rows’ into the search bar.

Double-click the Generate Rows step to open its properties.

Ensure the following details are configured:
Step name: gr_hello-world
Limit: 10
Name: message
Type: string
Value: hello world
Before we close this dialog and continue creating the transformation, let’s make certain the Step generates the data we expect.
Click the Preview button. The ‘Enter preview size’ dialog is displayed.

In the ‘Enter preview size’ dialog, click the [OK] button.
Verify that 10 rows containing the message you entered are displayed, and then click the [OK] button to close the ‘Examine preview data’ dialog.
Click the [OK] button to close the ‘Generate Rows’ dialog.
Dummy
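The Dummy step does not process records; it passes every incoming row through unchanged. Its primary function is to be a placeholder for testing purposes. For example, a transformation needs at least two steps connected to each other, so a Dummy step can serve as the second step while you test the first.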
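As a rough mental model, a placeholder step of this kind behaves like an identity function over the row stream. The Python sketch below is an illustration only; `dummy` and `Row` are hypothetical names and do not correspond to any PDI API.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, str]


def dummy(rows: Iterable[Row]) -> Iterator[Row]:
    """Forward every incoming row downstream without changing it."""
    for row in rows:
        yield row


incoming = [{"message": "hello world"} for _ in range(10)]
outgoing = list(dummy(incoming))
print(outgoing == incoming)  # True
```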
To add the Dummy step, expand the ‘Flow’ category in the Design tab, and drag the Dummy step onto the canvas:

Hops
A hop connects two steps and acts as the I/O buffer between them in your data stream.
Steps may be configured with specific I/O parameters to meet requirements.
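One way to picture a hop is as a bounded queue sitting between two workers. The sketch below assumes that model and uses Python's standard `queue` and `threading` modules; the buffer size of 4 and the `SENTINEL` end-of-stream marker are hypothetical choices for the illustration, not PDI settings.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def producer(hop: queue.Queue, limit: int) -> None:
    """Upstream step: write rows into the hop."""
    for _ in range(limit):
        # put() blocks while the buffer is full, which is back pressure.
        hop.put({"message": "hello world"})
    hop.put(SENTINEL)


def consumer(hop: queue.Queue, received: list) -> None:
    """Downstream step: read rows from the hop until the stream ends."""
    while True:
        row = hop.get()
        if row is SENTINEL:
            break
        received.append(row)


hop = queue.Queue(maxsize=4)  # deliberately small buffer
received = []
threads = [
    threading.Thread(target=producer, args=(hop, 10)),
    threading.Thread(target=consumer, args=(hop, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(received))  # 10
```

Because the queue is bounded, a slow consumer forces the producer to wait, which is the kind of behaviour a buffer between steps makes visible.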
Click on the gr_hello-world step.
Hold down the Shift key.
Drag and drop the hop onto the Dummy step.
Release the Shift key.
Add a Note
Right-click anywhere on the Spoon canvas.
Select New Note.


Transformation Properties
To view the transformation properties:
Double-click anywhere on the canvas.

💡Optionally, enter a more detailed description in the ‘Extended description’ property.
In Spoon, select Action > Run This Transformation.
Alternatively, click the Run button in the toolbar.
The Execute a transformation window appears. You can run a transformation locally, remotely, or in a clustered environment. For the purposes of this exercise, keep the default as Local Execution.
Click the Run icon and select Run Options.

In the Run Options panel you can:
select the run configuration, which determines the server pattern (single server or across a cluster)
set the logging level
save the transformation locally

The transformation executes.

A green tick confirms that the transformation executed, but it does not guarantee that the underlying operations succeeded.
Execution Results
The Execution Results section of the window contains several different tabs that help you to see how the transformation executed, pinpoint errors, and monitor performance.

The Logging tab displays logging information for each of the steps in the transformation.

The Step Metrics tab provides statistics for each step in your transformation, including how many records were read, how many were written, how many caused an error, the processing speed (rows per second), and more. This tab also indicates whether an error occurred in a transformation step.
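To make the read and written counters concrete, the sketch below counts rows as they pass through two stand-in steps. It is a simplified Python illustration under the assumption that a generating step only writes rows and a pass-through step both reads and writes them; `StepMetrics` and the two functions are hypothetical names, not part of PDI.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator

Row = Dict[str, str]


@dataclass
class StepMetrics:
    name: str
    read: int = 0     # rows received from an upstream step
    written: int = 0  # rows passed to a downstream step


def generate_rows(limit: int, metrics: StepMetrics) -> Iterator[Row]:
    """Source step: has no input, so it only increments `written`."""
    for _ in range(limit):
        metrics.written += 1
        yield {"message": "hello world"}


def dummy(rows: Iterable[Row], metrics: StepMetrics) -> Iterator[Row]:
    """Pass-through step: every row is both read and written."""
    for row in rows:
        metrics.read += 1
        metrics.written += 1
        yield row


gen_metrics = StepMetrics("gr_hello-world")
dummy_metrics = StepMetrics("Dummy")
output = list(dummy(generate_rows(10, gen_metrics), dummy_metrics))

for m in (gen_metrics, dummy_metrics):
    print(f"{m.name}: read={m.read} written={m.written}")
# gr_hello-world: read=0 written=10
# Dummy: read=10 written=10
```

A mismatch between one step's written count and the next step's read count is the sort of signal worth looking for when a run misbehaves.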

The metrics help identify any back pressure on the steps. In this example, the transformation took 30 ms to execute. Notice that the gr_hello-world and Dummy steps are initialized at the same time: each step runs in parallel, in its own thread, independently of the others.

The Preview tab displays the records.
Viewing the Transformation structure
If you click the View icon in the upper left corner of the screen, the tree will change to show the structure of the transformation currently being edited.

