Workshop - Delete DB Table
Data maintenance and cleansing are essential operations in enterprise data management—from removing obsolete records and purging staging tables to implementing data retention policies and preparing datasets for targeted campaigns. Organizations regularly need to delete specific subsets of data based on complex business rules that go beyond simple SQL statements. Understanding how to perform controlled, transformation-driven deletions ensures your data pipelines can maintain clean, relevant datasets while preserving referential integrity and avoiding unintended data loss.
In this hands-on workshop, you'll learn to use PDI's "Delete" step to remove database records based on complex criteria derived from transformation logic. Steel Wheels is launching a marketing campaign targeting customers who have ordered more than 50 units across their various product lines, requiring deletion of sales data that falls outside these campaign parameters. You'll configure transformations that identify qualifying records through data processing steps, then use the Delete step to remove non-qualifying records from the STG_SALES_DATA table. This approach demonstrates how transformation logic can drive precise, rule-based deletions that would be difficult or impossible to express in a single SQL DELETE statement.
What You'll Accomplish:
Configure the Delete step to remove records from database tables
Define key fields that uniquely identify records for deletion
Understand the critical difference between Pentaho comparators and SQL operators
Map stream fields to table columns for deletion criteria matching
Implement complex deletion logic using multiple comparison conditions (QUANTITYORDERED, PRODUCTLINE)
Configure batch size and commit parameters for bulk deletion operations
Recognize when to use the Delete step versus Execute SQL script or Truncate table
Understand that Delete is a terminal step that doesn't pass rows downstream
Back up data before executing destructive operations
Validate deletion results using database management tools
By the end of this workshop, you'll have practical experience implementing controlled data deletion operations that leverage transformation logic to identify precisely which records should be removed. You'll understand the critical distinction between Pentaho's comparison operators (which work opposite to SQL logic) and recognize when complex deletion scenarios require the Delete step rather than simple SQL statements. Rather than writing complex stored procedures or risking accidental data loss through broad DELETE statements, you'll build native PDI solutions that apply sophisticated business rules to determine deletion candidates, ensuring only the intended records are removed while maintaining data integrity across related tables.
Prerequisites: Understanding of basic transformation concepts and database connection configuration; familiarity with SQL DELETE operations and database keys; Pentaho Data Integration installed and configured with appropriate database connections established. Critical: always back up data before performing delete operations.
Estimated Time: 20 minutes


Create a new Transformation
Either of these actions opens a new Transformation tab for you to begin designing your transformation.
By clicking File > New > Transformation
By using the Ctrl+N hotkey
Inspect the data
Before we kick off, let's take a look at the data in the STG_SALES_DATA table to get an understanding of what results to expect.
View the STG_SALES_DATA data.
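A query along these lines will show the staging data (the ORDER BY is an assumption, added to make the output easier to scan):
-- inspect the staging table before making any changes
SELECT *
FROM STG_SALES_DATA
ORDER BY ORDERNUMBER;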

As you can see, there is a QUANTITYORDERED value for each of the PRODUCTLINE entries, and each ORDERNUMBER is associated with customer details.
Execute the following statement.
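A statement consistent with the expected result below, assuming the 50-unit campaign threshold and ordering by ORDERNUMBER, is:
-- list the rows that meet the campaign threshold (threshold and ordering are assumptions)
SELECT ORDERNUMBER, QUANTITYORDERED, PRODUCTLINE
FROM STG_SALES_DATA
WHERE QUANTITYORDERED > 50
ORDER BY ORDERNUMBER;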

The results indicate that ORDERNUMBER 10339, where the QUANTITYORDERED is 55, should be the first expected record.
CSV File input
This step provides the ability to read data from a delimited file. The CSV label for this step is a misnomer because you can define whatever separator you want to use, such as pipes, tabs, and semicolons; you are not constrained to using commas. Internal processing allows this step to process data quickly. Options for this step are a subset of the Text File Input step.
Drag the CSV File Input step onto the canvas.
Open the CSV File Input properties dialog box.
Ensure the following details are configured, as outlined below:
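The exact dialog settings depend on your environment; as an illustration only, a one-column input file listing the product lines in scope might look like this (the header name and the selection of product lines are assumptions, not part of the original materials):
productline
Classic Cars
Motorcycles
Vintage Cars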

Transformation properties
Parameters are associated with local variables in the transformation or job.
Double-click on the canvas and select the Parameter tab.
Ensure the following details are configured, as outlined below:
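As a sketch, the Parameters tab defines the campaign threshold along these lines (the parameter name and default value are assumptions, chosen to match the Get variables step below):
Parameter: MIN_QUANTITYORDERED
Default value: 50
Description: Minimum QUANTITYORDERED for the campaign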

Get variables
This step allows you to get the value of a variable. It can return rows or add values to input rows.
You must specify the complete variable specification in the format ${variable} or %%variable%% (as described in Variables). This means you can also enter complete strings in the variable column, not just a variable name.
For example, you can specify ${java.io.tmpdir}/kettle/tempfile.txt and it will be expanded to /tmp/kettle/tempfile.txt on Unix-like systems. To convert the variable into a data type other than String, use the Select Values step's Meta-data tab. To get system values, including command-line arguments, use the Get System Info step.
Drag the Get variables step onto the canvas.
Open the Get variables properties dialog box.
Ensure the following details are configured, as outlined below:
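Under the same assumptions as the Parameters tab above, the grid row that turns the parameter into a stream field would look something like:
Name: min_quantityordered
Variable: ${MIN_QUANTITYORDERED}
Type: Integer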

Delete
The Delete step functions by taking input rows from previous steps in your transformation and using their field values to construct SQL DELETE statements. For each incoming row, it identifies which records to delete by matching the key fields you specify with the corresponding columns in your target database table. This step doesn't require all fields from the table to be present in the input stream - only the key fields needed to uniquely identify records for deletion are necessary.
When configuring the Delete step, you need to specify the target database connection, the table to delete from, and the key fields that will be used to match records. You can also set batch size parameters to optimize performance for bulk operations. The step provides options for commit size, allowing you to control transaction boundaries when deleting large volumes of data.
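Conceptually, each key field you define contributes one condition to a parameterized statement of this general shape, with the placeholders bound from the incoming row (pseudo-SQL for illustration):
-- one statement is prepared; '?' placeholders are bound per incoming row
DELETE FROM <target table>
WHERE <table field 1> <comparator> ?
AND <table field 2> <comparator> ?;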
Be careful when using the Pentaho comparators!
Ensure you have backed up everything before executing, as this is a destructive change.
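On databases that support CREATE TABLE ... AS, a quick snapshot can be taken like this (the backup table name is illustrative):
-- snapshot the staging table before the destructive run
CREATE TABLE STG_SALES_DATA_BACKUP AS
SELECT * FROM STG_SALES_DATA;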
Drag the Delete step onto the canvas.
Open the Delete properties dialog box.
Ensure the following details are configured, as outlined below:
The Pentaho comparators are the opposite of their SQL counterparts!

The QUANTITYORDERED condition is set to greater than / equal to min_quantityordered.
The PRODUCTLINE values are mapped.
Delete records from the STG_SALES_DATA table.
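If the comparators translated literally into SQL, each incoming row would drive a statement of roughly this shape, with the placeholders bound from the stream fields; as the warning above notes, the Pentaho comparators read the opposite way, so verify the effective direction against your data:
DELETE FROM STG_SALES_DATA
WHERE QUANTITYORDERED >= ?  -- bound from min_quantityordered
AND PRODUCTLINE = ?;        -- bound from the product line field in the stream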
RUN
When the transformation runs, the Delete step takes each incoming row, extracts the values from the specified key fields, and uses them to create and execute a DELETE statement against the target database. The step doesn't pass any rows downstream in the transformation flow - it's considered a terminal step.
This step is commonly used in data maintenance operations, data cleansing workflows, or when implementing slowly changing dimensions in data warehousing.
Check the results in the database table.
The Pentaho comparators may result in unexpected behaviour.
Run the Transformation.
In your DB management tool, view the STG_SALES_DATA table.
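A quick aggregate such as the following confirms which quantities survived per product line (the grouping is an assumption, added to make the check easier to read):
-- verify the smallest remaining quantity and row count per product line
SELECT PRODUCTLINE, MIN(QUANTITYORDERED) AS MIN_QTY, COUNT(*) AS REMAINING_ROWS
FROM STG_SALES_DATA
GROUP BY PRODUCTLINE;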
