# Concepts & Terminology

{% hint style="info" %}

#### Concepts & Terminology

The Data Integration perspective allows you to create two basic workflow types:

**Transformations**

Transformations are used to describe the data flows of an ETL process, such as reading from a source, transforming the data, and loading it into a target location.

**Jobs**

Jobs are used to coordinate ETL activities, such as defining the flow and dependencies that determine the order in which transformations run, or preparing for execution by checking conditions such as "Is my source file available?" or "Does a table exist in my database?"
{% endhint %}

### Transformations & Jobs

{% tabs %}
{% tab title="1. Transformation" %}
{% hint style="info" %}

#### **Transformation**

Transformations are the workhorses of the ETL process. They consist of:

**Steps**

which provide a wide range of functionality, from reading text files to implementing slowly changing dimensions.

Steps are executed in parallel.

**Hops**

help you define the flow of the data in the stream. Each Hop represents a row buffer between one Step's output and the next Step's input, as illustrated in the Transformation below. Data flows from the Text file input step to Filter rows, to Sort rows, and finally to Table output.
{% endhint %}

{% embed url="https://pentaho-public.atlassian.net/wiki/spaces/EAI/overview?homepageId=363267360" %}

<div data-full-width="true"><figure><img src="/files/jEAU7sQFwOXrlyI3aXOn" alt="" width="563"><figcaption><p>Steps &#x26; Hops = Transformation</p></figcaption></figure></div>
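The Step-and-Hop flow pictured above can be sketched in plain Python. The rows, the filter condition, and the sort key are invented for illustration, and a real Transformation runs its steps in parallel threads rather than one after another:

```python
# A minimal sketch of the Text file input -> Filter rows -> Sort rows -> Table output
# flow described above. The data and the filter condition are invented.

rows = [
    {"name": "carol", "amount": 30},
    {"name": "alice", "amount": 10},
    {"name": "bob", "amount": -5},
]                                                    # stands in for "Text file input"

filtered = [r for r in rows if r["amount"] > 0]      # "Filter rows": keep positive amounts
ordered = sorted(filtered, key=lambda r: r["name"])  # "Sort rows": order by name

for row in ordered:                                  # "Table output": here we just print
    print(row)
```

Each intermediate list plays the role of a hop: it carries the rows from one step's output to the next step's input.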

***

{% hint style="info" %}

#### **Steps**

There are some key characteristics of Steps:

* Step names must be unique in a single Transformation
* Virtually all Steps read and write rows of data (the Generate rows step, which only writes rows, is an exception)
* Most Steps can have multiple outgoing hops, which can be configured to either copy or distribute the data: copy sends every row to all target Steps, while distribute deals the rows out to the target Steps in round-robin fashion.
* Steps run in their own threads. For performance tuning, it is possible to run multiple copies of a Step, each in its own thread.
* All Steps are executed in parallel, so it’s not possible to define an order of execution.
{% endhint %}
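The copy-versus-distribute behaviour described above can be sketched as follows. The row values and target Step names are invented for illustration:

```python
from itertools import cycle

rows = list(range(6))              # six rows of data
targets = ["step_a", "step_b"]     # two outgoing hops

# "Copy": every target Step receives a copy of every row.
copied = {t: list(rows) for t in targets}

# "Distribute": rows are dealt out round-robin across the outgoing hops.
distributed = {t: [] for t in targets}
for row, target in zip(rows, cycle(targets)):
    distributed[target].append(row)

print(copied)       # both targets see all six rows
print(distributed)  # step_a gets 0, 2, 4 and step_b gets 1, 3, 5
```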

In addition to Steps and Hops, Notes enable you to document the Transformation.

<div align="left"><figure><img src="/files/ogTCE6HOyY5wtIUulo2h" alt="" width="563"><figcaption><p>copy rows</p></figcaption></figure></div>

<div align="left"><figure><img src="/files/IbU3lgNRdnMBtbVJhU2h" alt="" width="563"><figcaption><p>distribute rows</p></figcaption></figure></div>

<figure><img src="/files/CcEZdAgJ8Epx7MBdsTh8" alt=""><figcaption><p>Steps</p></figcaption></figure>
{% endtab %}

{% tab title="2. Parallelism" %}
{% hint style="info" %}

#### **Parallelism**

When a transformation starts, all steps start at the same time. Each hop acts as a buffer, holding a row set of (by default) 10,000 rows.

Data begins to flow once the first step has initialized: it reads rows and writes them as row sets into the hop (the 10,000-row buffer). The next step reads those row sets while the first step is still writing into the stream, and in turn writes its own output into the stream for the step after it, and so on. The buffer size can be set on the Miscellaneous tab of the Transformation properties panel.
{% endhint %}

<figure><img src="/files/N9VGarvSXgXoAfD79f3S" alt="" width="563"><figcaption><p>parallelism</p></figcaption></figure>

{% hint style="info" %}

#### **Adjusting the Queue Size**

When trying to optimize performance, you may want to adjust the input/output queue size, especially if you have a lot of RAM available. The queue size is configured as the "Nr of rows in rowset" setting in the transformation settings and applies to all steps in the transformation. Increasing it can allow the opening steps of a transformation to finish more quickly, freeing up CPU time for the subsequent steps.
{% endhint %}

<div align="center"><figure><img src="/files/EutWa9CsGNTZBmNmBt6R" alt="" width="563"><figcaption><p>Changing buffer row set</p></figcaption></figure></div>
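As a rough model of this behaviour, a hop can be thought of as a bounded queue shared by two threads: the producer blocks when the buffer is full, which is exactly how the row-set buffer throttles a fast step. The buffer size, row values, and per-row transformation below are invented, and PDI's actual engine is Java rather than Python:

```python
import queue
import threading

BUFFER_SIZE = 4  # models "Nr of rows in rowset" (PDI's default is 10,000)
hop = queue.Queue(maxsize=BUFFER_SIZE)  # the hop: a bounded row buffer
DONE = object()  # sentinel marking the end of the stream

def producer():
    # First step: writes rows into the hop as it reads them.
    for i in range(10):
        hop.put(i)  # blocks when the buffer is full, throttling this step
    hop.put(DONE)

results = []

def consumer():
    # Next step: reads rows while the producer is still writing.
    while True:
        row = hop.get()
        if row is DONE:
            break
        results.append(row * 2)  # some per-row transformation

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Both steps run concurrently from the start, which mirrors how all steps in a transformation start at the same time and stream row sets through their hops.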
{% endtab %}

{% tab title="3. Data Types" %}
{% hint style="info" %}

#### **Data Types**

PDI data types map internally to Java data types, so the Java behavior of these data types applies to the associated fields, parameters, and variables used in your transformations and jobs.
{% endhint %}

The following table describes these mappings:

<table><thead><tr><th width="176.66666666666666">PDI Data Type</th><th width="149">Java Data Type</th><th>Description</th><th>Example</th></tr></thead><tbody><tr><td>BigNumber</td><td>BigDecimal</td><td>An arbitrary unlimited precision number.</td><td>3.141592653589793238462643383279502884197169399375105820974944</td></tr><tr><td>Binary</td><td>byte[]</td><td>An array of bytes that can contain any type of binary data.</td><td>An image file or a compressed file can be stored as Binary data</td></tr><tr><td>Boolean</td><td>Boolean</td><td>A boolean value, <code>true</code> or <code>false</code>.</td><td><code>true</code></td></tr><tr><td>Date</td><td>Date</td><td>A date-time value with millisecond precision.</td><td>2023-10-20T10:48:51.123</td></tr><tr><td><p><mark style="color:red;">Hierarchical -</mark></p><p><mark style="color:red;">EE Plugin 9.5+</mark></p></td><td>BinaryTree</td><td>Data items that are related to each other by hierarchical relationships</td><td>A family tree</td></tr><tr><td>Integer</td><td>Long</td><td>A signed long 64-bit integer.</td><td>42</td></tr><tr><td>Internet Address</td><td>InetAddress</td><td>An Internet Protocol (IP) address.</td><td>192.168.0.1</td></tr><tr><td>Number</td><td>Double</td><td>A double precision floating point value (64 bits).</td><td>2.7182818284590452353602874713526624977572470936999</td></tr><tr><td>String</td><td>String</td><td>A variable unlimited length text encoded in UTF-8 (Unicode).</td><td>“Hello world!”</td></tr><tr><td>Timestamp</td><td>Timestamp</td><td>Allows the specification of fractional seconds to a precision of nanoseconds.</td><td>2023-10-20T10:48:51.123456789</td></tr></tbody></table>
{% endtab %}

{% tab title="4. Jobs" %}
{% hint style="info" %}

#### **Jobs**

In a PDI process, jobs orchestrate other jobs and transformations in a coordinated way to realize our business process.
{% endhint %}

{% hint style="info" %}

#### **Job Entries**

Represent the different tasks or processes that need to be executed as part of the job. Job entries can include Transformations, shell scripts, database operations, file operations, and more. Each job entry performs a specific task and can be configured with various options and parameters.

Entries are executed sequentially.
{% endhint %}

<figure><img src="/files/zidnZsOgVTO9fXm5DCD5" alt=""><figcaption><p>Job Entries</p></figcaption></figure>
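A minimal sketch of this sequential behaviour, with invented entry names (a real job entry might check a file, run a transformation, or perform a database operation, as described above):

```python
# Job entries run one after another, unlike transformation steps,
# which all start in parallel. The entry names and actions are invented.

log = []

def check_source_file():
    log.append("checked source file")

def run_transformation():
    log.append("ran transformation")

def send_notification():
    log.append("sent notification")

job_entries = [check_source_file, run_transformation, send_notification]
for entry in job_entries:  # strictly sequential: each entry finishes before the next starts
    entry()

print(log)
```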
{% endtab %}
{% endtabs %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://academy.pentaho.com/pentaho-data-integration/data-integration/concepts-and-terminology.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
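Note that the question must be URL-encoded before it is appended as the `ask` parameter. A minimal Python sketch (the question text is invented; the commented-out request requires network access):

```python
from urllib.parse import quote

PAGE_URL = ("https://academy.pentaho.com/pentaho-data-integration/"
            "data-integration/concepts-and-terminology.md")

question = "What is the default hop buffer size?"  # an example question
ask_url = f"{PAGE_URL}?ask={quote(question)}"
print(ask_url)

# Sending the request (requires network access):
# from urllib.request import urlopen
# answer = urlopen(ask_url).read().decode()
```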
