Concepts & Terminology

Transformations

Transformations are data flows built from interconnected nodes that process and transform data. Transformation files use the .ktr extension, and each transformation represents a directed graph of data transformation logic.

Nodes are the core building blocks of a transformation. Each node performs a specific task, such as reading files, filtering rows, or writing to databases. Pipeline Designer provides numerous nodes organized by function (Big Data, Input, Output, Transform, and so on). You can add a node to the canvas by dragging it from the Design pane or by double-clicking it. When the transformation executes, each node runs in its own thread.
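To make the thread-per-node model concrete, here is a minimal Python sketch that imitates it. It is an illustration, not Pentaho code: each hypothetical node function runs in its own `threading.Thread`, and each hop is modeled as a `queue.Queue` that buffers rows between two nodes. The node names, the `END` marker, and the `amount > 100` filter are illustrative assumptions.

```python
import queue
import threading

# Sentinel telling a downstream node that no more rows will arrive.
END = object()


def read_rows(out_q, rows):
    """Hypothetical input node: emits each row, then the end marker."""
    for row in rows:
        out_q.put(row)
    out_q.put(END)


def filter_rows(in_q, out_q, predicate):
    """Hypothetical transform node: forwards only rows matching predicate."""
    while (row := in_q.get()) is not END:
        if predicate(row):
            out_q.put(row)
    out_q.put(END)


def write_rows(in_q, sink):
    """Hypothetical output node: collects every row it receives into a list."""
    while (row := in_q.get()) is not END:
        sink.append(row)


def run_transformation(rows):
    """Wire three nodes together with two hops and run them concurrently."""
    hop1, hop2 = queue.Queue(), queue.Queue()  # each hop buffers rows
    sink = []
    threads = [
        threading.Thread(target=read_rows, args=(hop1, rows)),
        threading.Thread(target=filter_rows,
                         args=(hop1, hop2, lambda r: r["amount"] > 100)),
        threading.Thread(target=write_rows, args=(hop2, sink)),
    ]
    # Every node starts at once; rows flow as soon as they are produced.
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink
```

For example, `run_transformation([{"id": 1, "amount": 50}, {"id": 2, "amount": 150}])` returns `[{"id": 2, "amount": 150}]`. Because all three threads start together, the filter can begin consuming rows while the reader is still producing them, instead of waiting for the reader to finish.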

Hops

Hops are the pathways connecting nodes, shown as arrows in the interface. They define the direction of data flow and pass schema metadata between nodes. Although hops may appear to create a sequence, all nodes in a transformation actually start in parallel. As a result, you cannot reliably set a variable in one node and use it in a downstream node within the same transformation. When data reaches a node with multiple outgoing hops, it can either be copied to all destinations or distributed among them (see Data Movement).
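The variable caveat can be illustrated with a simplified Python model. It assumes, for illustration only, that every node resolves variables while it initializes, and that all nodes finish initializing (modeled with a `threading.Barrier`) before any of them processes rows. Under that assumption the downstream node reads `TARGET_TABLE` before the upstream node sets it. The variable name and its values are hypothetical.

```python
import threading


def run_parallel_nodes():
    """Return (value seen downstream, final value) for a hypothetical variable."""
    variables = {"TARGET_TABLE": "default_table"}  # hypothetical variable space
    seen = {}
    # Both nodes must finish initializing before either one starts running.
    init_done = threading.Barrier(2)

    def upstream_set_variable():
        init_done.wait()
        # Set while running: too late for nodes that resolved it at startup.
        variables["TARGET_TABLE"] = "sales_2024"

    def downstream_use_variable():
        # Resolved during initialization, in parallel with the upstream node.
        seen["table"] = variables["TARGET_TABLE"]
        init_done.wait()

    threads = [threading.Thread(target=upstream_set_variable),
               threading.Thread(target=downstream_use_variable)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen["table"], variables["TARGET_TABLE"]
```

Calling `run_parallel_nodes()` returns `("default_table", "sales_2024")`: the variable is set, but only after the downstream node has already read the old value.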

Hops in jobs behave differently than hops in transformations: they control execution order and specify conditions that determine which node runs next, based on the result of the previous one. This creates a sequential flow of control rather than parallel data processing.

The available job hop conditions are:

| Condition | Description |
| --- | --- |
| Unconditional | The next job entry is executed regardless of the result of the originating job entry. |
| Follow when result is true | The next job entry is executed only when the result of the originating job entry is true, meaning a successful execution: file found, table found, no errors, and so on. |
| Follow when result is false | The next job entry is executed only when the result of the originating job entry is false, meaning an unsuccessful execution: file not found, table not found, one or more errors, and so on. |

Jobs

Jobs are workflow models that coordinate ETL activities, resources, and dependencies. Whereas transformations focus on data flow, jobs orchestrate entire processes. A typical job might download files from an FTP server, verify that database tables exist, run transformations to load data, and send error notifications if issues occur.

Job nodes are the building blocks of jobs. The same job node can be reused multiple times with different configurations. Job files use the .kjb extension.
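The typical job described above, together with the hop conditions from the table, can be sketched as a sequential runner in Python. This is an illustrative model under stated assumptions, not Pentaho's job engine: each entry is a callable that returns `True` on success, hop conditions are the strings `"unconditional"`, `"true"`, and `"false"`, and `check_table_exists` is a hypothetical node type reused with two different configurations. The `max_steps` guard is an added safeguard against endless loops.

```python
from collections import deque


def hop_matches(condition, result):
    """Apply a job hop condition to the originating entry's result."""
    if condition == "unconditional":
        return True
    if condition == "true":
        return result
    if condition == "false":
        return not result
    raise ValueError(f"unknown hop condition: {condition!r}")


def run_job(entries, hops, start, max_steps=100):
    """Run job entries one at a time, following hops whose condition matches.

    entries: name -> zero-argument callable returning True (success) or False.
    hops: list of (from_name, to_name, condition) tuples.
    Returns the names of the entries in the order they ran.
    """
    executed = []
    pending = deque([start])
    while pending:
        if len(executed) >= max_steps:
            raise RuntimeError("job exceeded max_steps; possible endless loop")
        name = pending.popleft()
        result = entries[name]()
        executed.append(name)
        for src, dst, condition in hops:
            if src == name and hop_matches(condition, result):
                pending.append(dst)
    return executed


def check_table_exists(table, existing_tables):
    """Hypothetical reusable node type: one definition, configured per use."""
    return lambda: table in existing_tables


existing_tables = {"customers"}          # simulated database state
entries = {
    "Download FTP files": lambda: True,  # simulated successful download
    "Check customers table": check_table_exists("customers", existing_tables),
    "Check orders table": check_table_exists("orders", existing_tables),
    "Load data": lambda: True,           # stands in for running a transformation
    "Send error notification": lambda: True,
}
hops = [
    ("Download FTP files", "Check customers table", "true"),
    ("Download FTP files", "Send error notification", "false"),
    ("Check customers table", "Check orders table", "true"),
    ("Check customers table", "Send error notification", "false"),
    ("Check orders table", "Load data", "true"),
    ("Check orders table", "Send error notification", "false"),
]

executed = run_job(entries, hops, "Download FTP files")
```

Because the orders table is missing from the simulated state, `executed` is `["Download FTP files", "Check customers table", "Check orders table", "Send error notification"]`: the false hop from the failed check routes control to the notification entry, and "Load data" never runs.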

Adding Notes to Transformations and Jobs

Notes help document your transformations and jobs by explaining structure, design decisions, business rules, dependencies, and other important aspects for yourself and your team.


Working with Notes

Adding a note: Click the Add Note icon in the canvas toolbar. Enter your note content in the dialog box, optionally customize the font, color, and shadow styling, then click Save to place it on the canvas.

Editing a note: Hover over any note to reveal Delete and Edit icons. Click Edit to modify the note's content or formatting, then save your changes.

Repositioning a note: Simply click and drag the note to any location on the canvas.

Deleting a note: Hover over the note and click the Delete icon that appears above it.

Notes are a simple but effective way to maintain clear documentation directly within your PDI workflows, making them easier to understand and maintain over time.

Data Movement

Hops connect nodes in transformations and jobs, with arrows indicating the direction of flow: data flow in transformations, execution order in jobs.

Important Constraints

Loops: Transformations do not allow loops, because Spoon determines each step's field values from the steps before it, and a loop would make that resolution endless. Jobs do allow loops because their entries execute sequentially, but you must make sure a loop cannot run endlessly.

Mixed row layouts: Transformations cannot mix rows with different layouts (for example, combining the output of table inputs that return different numbers of fields). Mixed layouts cause failures when fields are missing or data types change unexpectedly. The trap detector warns you at design time when a node receives mixed layouts.
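Both constraints can be checked mechanically. The Python sketch below is a simplified illustration, not the trap detector's actual implementation: `find_cycle` runs a depth-first search over `(source, target)` hop pairs and returns the first loop it finds, and `mixed_layouts` flags nodes whose incoming hops carry different row layouts, where a layout is assumed to be a tuple of `(field name, type)` pairs. All names are hypothetical.

```python
def find_cycle(hops):
    """Return a list of nodes forming a loop, or None if the hop graph is acyclic."""
    graph = {}
    for src, dst in hops:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}
    path = []

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == ON_PATH:   # hop back onto the current path: a loop
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None


def mixed_layouts(incoming):
    """incoming: node -> list of row layouts arriving on its hops.

    Return the sorted names of nodes that receive more than one distinct layout.
    """
    return sorted(node for node, layouts in incoming.items()
                  if len(set(layouts)) > 1)
```

For example, `find_cycle([("A", "B"), ("B", "C"), ("C", "A")])` returns `["A", "B", "C", "A"]`, while a straight chain returns `None`. Only the transformation hop graph needs this check; jobs may legitimately contain loops.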

Managing Hop Behavior

Hover over a node and click the three dots that appear. Select "Data Movement" to specify how data is handled across multiple outgoing hops: copied, distributed, or load balanced. You can also enable or disable individual hops, which is useful for testing.

The same menu also lets you change the number of copies of the node.
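The difference between copying and distributing rows can be sketched in a few lines of Python. This is a simplified model with hypothetical function names: `copy_rows` gives every destination every row, and `distribute_rows` deals rows to the destinations in turn (round-robin). Load balancing is not modeled here, since its exact policy is not described on this page.

```python
from itertools import cycle


def copy_rows(rows, destinations):
    """Copy: every destination receives its own copy of every row."""
    rows = list(rows)  # materialize once so generators can be reused
    return {dest: list(rows) for dest in destinations}


def distribute_rows(rows, destinations):
    """Distribute: rows are dealt to the destinations in turn (round-robin)."""
    if not destinations:
        raise ValueError("distribute_rows needs at least one destination")
    out = {dest: [] for dest in destinations}
    for row, dest in zip(rows, cycle(destinations)):
        out[dest].append(row)
    return out
```

With rows `[1, 2, 3, 4, 5]` and destinations `["A", "B"]`, `copy_rows` sends all five rows to both destinations, while `distribute_rows` returns `{"A": [1, 3, 5], "B": [2, 4]}`. Copying multiplies the total row volume by the number of destinations; distributing keeps it constant and splits the work.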
