Concepts & Terminology
Transformations
Transformations are data flows built from interconnected nodes that process and transform data. Each transformation is saved as a file with a .ktr extension and represents a directed graph of data transformation logic.
Nodes are the core building blocks that perform specific tasks such as reading files, filtering rows, or writing to databases. Pipeline Designer provides numerous nodes organized by function (Big Data, Input, Output, Transform, etc.). You can add a node to the canvas by dragging it from the Design pane or by double-clicking it. Each node runs in its own thread when the transformation executes.

Hops
Hops are the pathways that connect nodes, shown as arrows in the interface. They define the direction of data flow and pass schema metadata from one step to the next. Although hops may appear to define a sequence, all nodes in a transformation start in parallel. As a result, you cannot reliably set a variable in one node and use it in a downstream node within the same transformation. When a node has multiple outgoing hops, its data can either be copied to all destinations or distributed among them (see Data Movement).
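The threading model described above can be pictured with a small Python sketch. This is not how Pipeline Designer is implemented: it assumes each node is a thread, each hop is a bounded `queue.Queue`, and a hypothetical `END` marker closes a row stream. All three threads are started at once, so a downstream node is already running before its upstream node has finished, which is why values set "earlier" in a transformation are not guaranteed to be visible downstream.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker (not a PDI concept)


def generate(out, names):
    """Input-style node: emits one row per name, then signals end of stream."""
    for name in names:
        out.put({"name": name})
    out.put(END)


def to_upper(inp, out):
    """Transform-style node: reads rows from its incoming hop and writes modified rows."""
    while (row := inp.get()) is not END:
        out.put({"name": row["name"].upper()})
    out.put(END)


def collect(inp, results):
    """Output-style node: stores every row it receives."""
    while (row := inp.get()) is not END:
        results.append(row)


def run_transformation(names):
    hop1 = queue.Queue(maxsize=100)  # hop: generate -> to_upper
    hop2 = queue.Queue(maxsize=100)  # hop: to_upper -> collect
    results = []
    nodes = [
        threading.Thread(target=generate, args=(hop1, names)),
        threading.Thread(target=to_upper, args=(hop1, hop2)),
        threading.Thread(target=collect, args=(hop2, results)),
    ]
    for node in nodes:  # every node starts in parallel
        node.start()
    for node in nodes:  # the transformation ends when all nodes finish
        node.join()
    return results


print(run_transformation(["alice", "bob"]))
```

The bounded queues mean a fast upstream node simply blocks when its outgoing hop is full, while downstream nodes keep consuming rows.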


Jobs

Job hops can use one of the following conditions:
Unconditional: the next job entry is executed regardless of the result of the originating job entry.
Follow when result is true: the next job entry is executed only when the result of the originating job entry is true, meaning a successful execution (for example, file found, table found, or no errors).
Follow when result is false: the next job entry is executed only when the result of the originating job entry is false, meaning an unsuccessful execution (for example, file not found, table not found, or errors occurred).
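The three hop conditions can be expressed as a small decision function. This is an illustrative Python sketch only: `HopCondition`, `follows`, `next_entries`, and the job entry names are hypothetical, and hops are modeled as `(source, target, condition)` tuples.

```python
from enum import Enum


class HopCondition(Enum):
    UNCONDITIONAL = "Unconditional"
    ON_TRUE = "Follow when result is true"
    ON_FALSE = "Follow when result is false"


def follows(condition: HopCondition, result: bool) -> bool:
    """Return True if a hop with this condition is followed for the given entry result."""
    if condition is HopCondition.UNCONDITIONAL:
        return True
    if condition is HopCondition.ON_TRUE:
        return result
    return not result  # HopCondition.ON_FALSE


def next_entries(hops, entry, result):
    """List the job entries reached from `entry` after it finished with `result`."""
    return [target for source, target, condition in hops
            if source == entry and follows(condition, result)]


# Hypothetical job: one check entry with three outgoing hops.
hops = [
    ("Check table exists", "Load data", HopCondition.ON_TRUE),
    ("Check table exists", "Send error notification", HopCondition.ON_FALSE),
    ("Check table exists", "Write log", HopCondition.UNCONDITIONAL),
]
print(next_entries(hops, "Check table exists", False))
# ['Send error notification', 'Write log']
```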
Jobs are workflow models that coordinate ETL activities, resources, and dependencies. Unlike transformations that focus on data flow, jobs orchestrate entire processes. A typical job might download FTP files, verify database tables exist, execute transformations to load data, and send error notifications if issues occur.
Job nodes are the building blocks of jobs. The same job node can be reused multiple times with different configurations. Job files use a .kjb extension.
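Because jobs execute their entries sequentially, the typical job described above (download files, check tables, load data, notify on error) can be sketched as a simple runner. This is a hypothetical Python model, not the PDI job engine: each entry is a callable returning `True` or `False`, each hop is `(source, target, condition)` where `None` means unconditional, and `max_steps` is an assumed safeguard against endless loops.

```python
from typing import Callable, Optional

# (source entry, target entry, condition); None = unconditional,
# True = follow when result is true, False = follow when result is false.
Hop = tuple[str, str, Optional[bool]]


def run_job(entries: dict[str, Callable[[], bool]], hops: list[Hop],
            start: str, max_steps: int = 100) -> list[str]:
    """Run job entries one at a time, following hops whose condition matches."""
    executed: list[str] = []
    pending = [start]
    while pending:
        if len(executed) >= max_steps:
            raise RuntimeError("step limit reached; the job may contain an endless loop")
        name = pending.pop(0)
        result = entries[name]()  # run the entry and record its true/false result
        executed.append(name)
        for source, target, condition in hops:
            if source == name and (condition is None or condition == result):
                pending.append(target)
    return executed


# Hypothetical entries: the table check fails, so the error branch runs.
entries = {
    "Download FTP files": lambda: True,
    "Check tables exist": lambda: False,
    "Run load transformation": lambda: True,
    "Send error notification": lambda: True,
}
hops = [
    ("Download FTP files", "Check tables exist", True),
    ("Check tables exist", "Run load transformation", True),
    ("Check tables exist", "Send error notification", False),
]
print(run_job(entries, hops, "Download FTP files"))
# ['Download FTP files', 'Check tables exist', 'Send error notification']
```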

Data Movement
Hops connect nodes in transformations & jobs, with arrows indicating the direction of data flow.
Important Constraints
Loops: Transformations do not allow loops because Spoon relies on the previous steps to determine the field values passed from one step to the next, and a loop would make that lookup endless. Jobs do allow loops because their entries execute sequentially, but you must avoid creating endless loops.
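The no-loop rule for transformations amounts to requiring that the hop graph be acyclic. The Python sketch below checks that property with the standard library's `graphlib.TopologicalSorter`, whose `prepare()` raises `CycleError` when the graph contains a cycle. It only illustrates the rule; it is not how Spoon or Pipeline Designer performs the check, and the step names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(hops: list[tuple[str, str]]) -> bool:
    """Return True if the (source, target) hops form no loop."""
    sorter = TopologicalSorter()
    for source, target in hops:
        sorter.add(target, source)  # source must come before target
    try:
        sorter.prepare()  # raises CycleError if any loop exists
    except CycleError:
        return False
    return True


print(is_acyclic([("Table input", "Filter rows"), ("Filter rows", "Table output")]))  # True
print(is_acyclic([("A", "B"), ("B", "C"), ("C", "A")]))  # False
```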
Mixed row layouts: Transformations cannot mix rows with different layouts (such as combining table inputs with varying field counts). Mixed layouts cause failures when fields are missing or data types change unexpectedly. The trap detector warns you at design time when steps receive mixed layouts.
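A design-time layout check of the kind described above can be sketched by comparing the field metadata arriving on each incoming hop. This Python example is a hypothetical illustration, not the trap detector's implementation: a layout is assumed to be an ordered list of `(field name, data type)` pairs, and `check_incoming_layouts` is an invented helper.

```python
# A row layout: ordered (field name, data type) pairs, as declared in step metadata.
Layout = list[tuple[str, str]]


def check_incoming_layouts(step: str, incoming: dict[str, Layout]) -> list[str]:
    """Warn for every incoming hop whose layout differs from the first hop's layout."""
    warnings: list[str] = []
    items = list(incoming.items())
    if not items:
        return warnings
    ref_hop, ref_layout = items[0]
    for hop, layout in items[1:]:
        if layout != ref_layout:  # field count, names, order, or types differ
            warnings.append(f"{step}: layout from {hop} differs from {ref_hop}")
    return warnings


print(check_incoming_layouts("Merge rows", {
    "Table input 1": [("id", "Integer"), ("name", "String")],
    "Table input 2": [("id", "Integer")],
}))
# ['Merge rows: layout from Table input 2 differs from Table input 1']
```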
Managing Hop Behavior
Click the three dots that appear when you hover over a node. Select "Data Movement" to specify how rows are handled across multiple outgoing hops: copied, distributed, or load balanced. You can also enable or disable individual hops, which is useful for testing.

From this menu you can also change the number of copies of the node and set its data movement.
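The three data movement options can be compared with a small Python sketch. It is a simplified model, not the product's implementation: "copy" sends every row to every destination, "distribute" deals rows out round-robin, and "load balance" is assumed here to mean sending each row to the destination whose input buffer currently holds the fewest rows. The `backlog` parameter is a hypothetical starting buffer size per destination, and rows are never consumed in this model.

```python
def copy_rows(rows, n_targets):
    """Copy: every destination receives every row."""
    return [list(rows) for _ in range(n_targets)]


def distribute_rows(rows, n_targets):
    """Distribute: rows are dealt to destinations in round-robin order."""
    buckets = [[] for _ in range(n_targets)]
    for i, row in enumerate(rows):
        buckets[i % n_targets].append(row)
    return buckets


def load_balance_rows(rows, backlog):
    """Load balance (assumed behavior): each row goes to the destination with the
    smallest input buffer; ties go to the lowest-numbered destination."""
    sizes = list(backlog)
    buckets = [[] for _ in sizes]
    for row in rows:
        target = min(range(len(sizes)), key=lambda i: sizes[i])
        buckets[target].append(row)
        sizes[target] += 1
    return buckets


print(copy_rows([1, 2], 2))                  # [[1, 2], [1, 2]]
print(distribute_rows([1, 2, 3, 4, 5], 2))   # [[1, 3, 5], [2, 4]]
print(load_balance_rows([1, 2, 3, 4], [3, 0]))  # [[4], [1, 2, 3]]
```

With an idle destination and a busy one, load balancing keeps feeding the idle destination until the buffers even out, whereas round-robin distribution ignores how busy each destination is.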

