Workshop - Delete DB Table
Data maintenance and cleansing are essential operations in enterprise data management—from removing obsolete records and purging staging tables to implementing data retention policies and preparing datasets for targeted campaigns. Organizations regularly need to delete specific subsets of data based on complex business rules that go beyond simple SQL statements. Understanding how to perform controlled, transformation-driven deletions ensures your data pipelines can maintain clean, relevant datasets while preserving referential integrity and avoiding unintended data loss.
In this hands-on workshop, you'll learn to use PDI's "Delete" step to remove database records based on complex criteria derived from transformation logic. Steel Wheels is launching a marketing campaign targeting customers who have ordered more than 50 units across their various product lines, requiring deletion of sales data that falls outside these campaign parameters. You'll configure transformations that identify qualifying records through data processing steps, then use the Delete step to remove non-qualifying records from the STG_SALES_DATA table. This approach demonstrates how transformation logic can drive precise, rule-based deletions that would be difficult or impossible to express in a single SQL DELETE statement.
What You'll Accomplish:
Configure the Delete step to remove records from database tables
Define key fields that uniquely identify records for deletion
Understand the critical difference between Pentaho comparators and SQL operators
Map stream fields to table columns for deletion criteria matching
Implement complex deletion logic using multiple comparison conditions (QUANTITYORDERED, PRODUCTLINE)
Configure batch size and commit parameters for bulk deletion operations
Recognize when to use the Delete step versus Execute SQL script or Truncate table
Understand that Delete is a terminal step that doesn't pass rows downstream
Back up data before executing destructive operations
Validate deletion results using database management tools
By the end of this workshop, you'll have practical experience implementing controlled data deletion operations that leverage transformation logic to identify precisely which records should be removed. You'll understand the critical distinction between Pentaho's comparison operators (which work opposite to SQL logic) and recognize when complex deletion scenarios require the Delete step rather than simple SQL statements. Rather than writing complex stored procedures or risking accidental data loss through broad DELETE statements, you'll build native PDI solutions that apply sophisticated business rules to determine deletion candidates, ensuring only the intended records are removed while maintaining data integrity across related tables.
Prerequisites: Understanding of basic transformation concepts and database connection configuration; familiarity with SQL DELETE operations and database keys; Pentaho Data Integration installed and configured with appropriate database connections established. Critical: always back up data before performing delete operations.
Estimated Time: 20 minutes


Create a new Transformation
Either of these actions opens a new Transformation tab for you to begin designing your transformation.
By clicking File > New > Transformation
By using the Ctrl+N hotkey
Inspect the data
Before we kick off, let's take a look at the data in the STG_SALES_DATA table to get an understanding of what results to expect.
View the STG_SALES_DATA data.
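A query along these lines will show the staging data (the ORDER BY is an assumption, added to make the output easier to scan):
-- inspect the staging table before making any changes
SELECT *
FROM STG_SALES_DATA
ORDER BY ORDERNUMBER;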

As you can see, there is a QUANTITYORDERED value for each of the PRODUCTLINE entries, and each ORDERNUMBER is associated with customer details.
Execute the following statement.
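A statement consistent with the expected result below, assuming the 50-unit campaign threshold and ordering by ORDERNUMBER, is:
-- list the rows that meet the campaign threshold (threshold and ordering are assumptions)
SELECT ORDERNUMBER, QUANTITYORDERED, PRODUCTLINE
FROM STG_SALES_DATA
WHERE QUANTITYORDERED > 50
ORDER BY ORDERNUMBER;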

The results indicate that ORDERNUMBER 10339, where the QUANTITYORDERED is 55, should be the first expected record.
CSV File input
This step provides the ability to read data from a delimited file. The CSV label for this step is a misnomer because you can define whatever separator you want to use, such as pipes, tabs, and semicolons; you are not constrained to using commas. Internal processing allows this step to process data quickly. Options for this step are a subset of the Text File Input step.
Drag the CSV File Input step onto the canvas.
Open the CSV File Input properties dialog box.
Ensure the following details are configured, as outlined below:
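The exact dialog settings depend on your environment; as an illustration only, a one-column input file listing the product lines in scope might look like this (the header name and the selection of product lines are assumptions, not part of the original materials):
productline
Classic Cars
Motorcycles
Vintage Cars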

Transformation properties
Parameters are associated with local variables in the transformation or job.
Double-click on the canvas and select the Parameter tab.
Ensure the following details are configured, as outlined below:
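As a sketch, the Parameters tab defines the campaign threshold along these lines (the parameter name and default value are assumptions, chosen to match the Get variables step below):
Parameter: MIN_QUANTITYORDERED
Default value: 50
Description: Minimum QUANTITYORDERED for the campaign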

Get variables
This step allows you to get the value of a variable. It can return rows or add values to input rows.
You must specify the complete variable specification in the format ${variable} or %%variable%% (as described in Variables). This means you can also enter complete strings in the variable column, not just a variable name.
For example, you can specify ${java.io.tmpdir}/kettle/tempfile.txt and it will be expanded to /tmp/kettle/tempfile.txt on Unix-like systems. To convert the variable into a data type other than String, use the Select Values step's Meta-data tab. To get system values, including command-line arguments, use the Get System Info step.
Drag the Get variables step onto the canvas.
Open the Get variables properties dialog box.
Ensure the following details are configured, as outlined below:
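Under the same assumptions as the Parameters tab above, the grid row that turns the parameter into a stream field would look something like:
Name: min_quantityordered
Variable: ${MIN_QUANTITYORDERED}
Type: Integer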

Delete
The Delete step functions by taking input rows from previous steps in your transformation and using their field values to construct SQL DELETE statements. For each incoming row, it identifies which records to delete by matching the key fields you specify with the corresponding columns in your target database table. This step doesn't require all fields from the table to be present in the input stream - only the key fields needed to uniquely identify records for deletion are necessary.
When configuring the Delete step, you need to specify the target database connection, the table to delete from, and the key fields that will be used to match records. You can also set batch size parameters to optimize performance for bulk operations. The step provides options for commit size, allowing you to control transaction boundaries when deleting large volumes of data.
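Conceptually, each key field you define contributes one condition to a parameterized statement of this general shape, with the placeholders bound from the incoming row (pseudo-SQL for illustration):
-- one statement is prepared; '?' placeholders are bound per incoming row
DELETE FROM <target table>
WHERE <table field 1> <comparator> ?
AND <table field 2> <comparator> ?;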
Be careful when using the Pentaho comparators!
Ensure you have backed up everything before executing, as this is a destructive change.
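On databases that support CREATE TABLE ... AS, a quick snapshot can be taken like this (the backup table name is illustrative):
-- snapshot the staging table before the destructive run
CREATE TABLE STG_SALES_DATA_BACKUP AS
SELECT * FROM STG_SALES_DATA;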
Drag the Delete step onto the canvas.
Open the Delete properties dialog box.
Ensure the following details are configured, as outlined below:
The Pentaho comparators are the opposite of their SQL counterparts!

The QUANTITYORDERED condition is set to greater than / equal to min_quantityordered.
The PRODUCTLINE values are mapped.
Delete records from the STG_SALES_DATA table.
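If the comparators translated literally into SQL, each incoming row would drive a statement of roughly this shape, with the placeholders bound from the stream fields; as the warning above notes, the Pentaho comparators read the opposite way, so verify the effective direction against your data:
DELETE FROM STG_SALES_DATA
WHERE QUANTITYORDERED >= ?  -- bound from min_quantityordered
AND PRODUCTLINE = ?;        -- bound from the product line field in the stream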
RUN
When the transformation runs, the Delete step takes each incoming row, extracts the values from the specified key fields, and uses them to create and execute a DELETE statement against the target database. The step doesn't pass any rows downstream in the transformation flow - it's considered a terminal step.
This step is commonly used in data maintenance operations, data cleansing workflows, or when implementing slowly changing dimensions in data warehousing.
Check the results in the database table.
The Pentaho comparators may result in unexpected behaviour.
Run the Transformation.
In your DB management tool, view the STG_SALES_DATA table.
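A quick aggregate such as the following confirms which quantities survived per product line (the grouping is an assumption, added to make the check easier to read):
-- verify the smallest remaining quantity and row count per product line
SELECT PRODUCTLINE, MIN(QUANTITYORDERED) AS MIN_QTY, COUNT(*) AS REMAINING_ROWS
FROM STG_SALES_DATA
GROUP BY PRODUCTLINE;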
