Insert / Update DB
Workshop - Insert/Update DB Table
Modern data integration requires intelligent synchronization strategies that handle both new records and modifications to existing data - from employee onboarding and organizational changes to customer updates and product catalog maintenance. Organizations face a common challenge: incoming data feeds contain a mix of new records that need insertion and updates to existing records, and manually separating these requires complex logic and separate processing paths. Understanding how to perform "upsert" operations (update-or-insert) in a single transformation step streamlines data synchronization and ensures your databases accurately reflect the current state of your enterprise information.
In this hands-on workshop, you'll learn to use PDI's "Insert/Update" step to intelligently handle mixed data scenarios where some records need insertion while others require updates. Steel Wheels is managing employee lifecycle events - new hires joining the organization and existing employees receiving promotions or contact information changes. You'll configure transformations that automatically determine whether each incoming record represents a new employee (requiring INSERT) or an existing employee with changes (requiring UPDATE), handling both scenarios seamlessly in a single processing step. This approach eliminates the need for complex branching logic or separate transformation paths.
What You'll Accomplish:
Configure the Insert/Update step to perform upsert (update-or-insert) operations
Define lookup keys to uniquely identify existing records (EMPLOYEENUMBER)
Map incoming stream fields to target database table columns using "Get Update fields"
Understand the three-way logic: insert if not found, update if changed, skip if unchanged
Distinguish between Insert/Update step behavior versus separate Update and Table Output steps
Configure key constraints and field mappings to ensure accurate record matching
Avoid the common mistake of updating key fields themselves (flag configuration)
Monitor step metrics to understand insert vs update operation counts
Validate database operations using SQL queries to verify both insertions and updates
Apply upsert patterns to incremental data loads and system synchronization scenarios
By the end of this workshop, you'll have practical experience implementing intelligent data synchronization that adapts to mixed record scenarios without manual classification or complex branching logic. You'll understand when to use Insert/Update (for handling both new and modified records) versus Update alone (for modifications only) or Table Output alone (for new records only), and how this pattern optimizes incremental data loading where each batch contains both additions and changes. Rather than building separate processing paths for inserts and updates or risking duplicate records, you'll build native PDI solutions that automatically route each record to the appropriate database operation, ensuring data accuracy while simplifying transformation design and maintenance.
Prerequisites: Understanding of basic transformation concepts, database connection configuration, familiarity with primary keys, INSERT and UPDATE operations; Pentaho Data Integration installed and configured with appropriate database connections established
Estimated Time: 20 minutes

1188,Firrelli,Julianne,x2174,[email protected],2,1143,Sales Manager 1619,King,Tom,x6324,[email protected],6,1088,Sales Rep 1810,Lundberg,Anna,x910,[email protected],2,1143,Sales Rep 1811,Schulz,Chris,x951,[email protected],2,1143,Sales Rep
Create a new Transformation
Any one of these actions opens a new Transformation tab for you to begin designing your transformation.
By clicking File > New > Transformation
By using the CTRL-N hot key
Text file input
The Text File Input step is used to read data from a variety of different text-file types. The most commonly used formats include Comma Separated Values (CSV files) generated by spreadsheets and fixed width flat files.
The Text File Input step provides you with the ability to specify a list of files to read, or a list of directories with wild cards in the form of regular expressions. In addition, you can accept filenames from a previous step making filename handling more even more generic.
Start Pentaho Data Integration.
Windows - PowerShell
Linux
Drag the Text File Input step onto the canvas.
Open the Text File properties dialog box.
Ensure the following details are configured, as outlined below:

Because the sample file is located in the same directory where the transformation resides, a good approach to naming the file in a way that is location independent is to use a system variable to parameterize the directory name where the file is located. In our case, the complete filename is:
${Internal.Transformation.Filename.Directory}/employees_insert_update.txt
Click on the ‘Content’ tab and configure the following properties:

Click on ‘Get Fields’ button.
Click on the ‘Fields’ tab and notice the following properties:

Close the step.
Insert / Update
The Insert/Update step first looks up a row in a table using one or more lookup keys. If the row can't be found, it inserts the row. If it can be found and the fields to update are the same, nothing is done. If they are not all the same, the row in the table is updated.
If you have multiple rows with the same keys that match, only the first row found is compared. This may lead to different results, depending on if the found row matches with given values or not. The update scenario looks like this: If a difference is found in the case of multiple rows with the same key, an UPDATE statement is fired against the database that updates all rows with the matching keys. This note also applies to the Update step.
Drag the Insert / Update Input step onto the canvas.
Open the Insert / Update properties dialog box.
Ensure the following details are configured, as outlined below:

Set the Key to lookup the values
Select EMPLOYEENUMBER from the table to equal EMPLOYEE_NUMBER from the datastream.
You can set any number of constraints to ensure unique values are looke up.
Get Update fields
Ensure the Table and Stream fields are correctly mapped.
Edit Mapping
Maps the fields
Do not set the Flag to Update the Keys.
RUN
This step is particularly useful for incremental data loads, data synchronization between systems, and scenarios where you need to either create new records or modify existing ones in a single operation without having to use separate update and insert steps.
Click the Run button in the Canvas Toolbar.
Click on the Step Metrics tab:

Notice that the Insert / Update Employees step has written 4 records and updated 2 .. therefore, 2 records have been inserted..!
Check the records in the database.

Last updated
Was this helpful?
