Workshop - Update DB Table
Data maintenance and synchronization are critical operations in enterprise data management—from correcting employee information and updating customer records to synchronizing data across systems and applying organizational changes. Organizations need reliable methods to modify existing database records based on incoming changes while preserving referential integrity and avoiding duplicate entries. Understanding how to selectively update database records using key-based lookups ensures your data pipelines can maintain accurate, current information across your enterprise systems.
In this hands-on workshop, you'll learn to use PDI's "Update" step to modify existing database records through key-based lookups and field mapping. Steel Wheels needs to update employee information in their database—including promotions, contact changes, and organizational restructuring. You'll configure transformations that look up employee records using unique identifiers (employee numbers), compare incoming data with existing records, and update only the records that have changed. This intelligent update mechanism ensures you don't waste database resources updating records that haven't actually changed while guaranteeing that modified data gets properly synchronized.
What You'll Accomplish:
Configure the Update step to modify existing database records
Define lookup keys to uniquely identify records for updating (EMPLOYEENUMBER)
Map incoming stream fields to target database table columns
Use the "Get Update fields" feature to automatically configure field mappings
Understand how key constraints ensure accurate record matching
Implement conditional updates that only modify records when values differ
Distinguish between Update step behavior (updates only) vs Insert/Update (upsert operations)
Monitor step metrics to verify how many records were actually updated
Validate database changes using SQL queries to confirm updates
Understand DML UPDATE operations and their role in data maintenance
By the end of this workshop, you'll have practical experience maintaining database records through intelligent update operations that respect referential integrity and optimize database performance. You'll understand when to use the Update step (for modifying existing records only) versus the Insert/Update step (for handling both new and existing records), and how key-based lookups ensure you're modifying exactly the intended records. Rather than using separate manual database scripts or risking accidental record creation, you'll build native PDI solutions that surgically update only the records that need modification, ensuring data accuracy while minimizing database overhead.
Prerequisites: Understanding of basic transformation concepts, database connection configuration, familiarity with primary keys and referential integrity; Pentaho Data Integration installed and configured with appropriate database connections established
Estimated Time: 20 minutes

The sample file, employees_update.txt, contains the updated employee records, for example:
1002, Murphy, Diana,x5800,[email protected],1,1000,CEO
1102, Bondur, Gerard,x5408,[email protected],4,1056,Regional Sales Manager (EMEA)
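As a quick sanity check of the layout the Text File Input step will infer, the rows can be parsed with Python's csv module. This is a minimal sketch; the column names are assumptions based on a classicmodels-style EMPLOYEES table, not something PDI produces:

```python
import csv
import io

# Two rows in the same shape as the sample employees_update.txt
# (the obfuscated "[email protected]" values are left as-is).
sample = """1002, Murphy, Diana,x5800,[email protected],1,1000,CEO
1102, Bondur, Gerard,x5408,[email protected],4,1056,Regional Sales Manager (EMEA)"""

# Assumed column names; the file itself has no header row.
columns = ["EMPLOYEE_NUMBER", "LAST_NAME", "FIRST_NAME",
           "EXTENSION", "EMAIL", "OFFICE_CODE", "REPORTS_TO", "JOB_TITLE"]

rows = [dict(zip(columns, (field.strip() for field in line)))
        for line in csv.reader(io.StringIO(sample))]

for row in rows:
    print(row["EMPLOYEE_NUMBER"], row["JOB_TITLE"])
```

Note the trailing job-title field may itself contain parentheses, which is why a real CSV parser is safer than a naive split.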
Create a new Transformation
By clicking File > New > Transformation
By using the CTRL-N hot key
Either action opens a new Transformation tab for you to begin designing your transformation.
Text file input
The Text File Input step is used to read data from a variety of different text-file types. The most commonly used formats include Comma Separated Values (CSV files) generated by spreadsheets and fixed width flat files.
The Text File Input step provides you with the ability to specify a list of files to read, or a list of directories with wildcards in the form of regular expressions. In addition, you can accept filenames from a previous step, making filename handling even more generic.
Start Pentaho Data Integration.
Windows - PowerShell
Linux
Drag the Text File Input step onto the canvas.
Open the Text File properties dialog box.
Ensure the following details are configured, as outlined below:

Because the sample file resides in the same directory as the transformation, a location-independent way to reference it is to parameterize the directory name with a system variable. In our case, the complete filename is:
${Internal.Transformation.Filename.Directory}/employees_update.txt
Click on the ‘Content’ tab and configure the following properties:

Click on ‘Get Fields’ button.
Click on the ‘Fields’ tab and notice the following properties:

Close the step.
Update
The Update step first looks up a row in a table using one or more lookup keys. If a matching row is found, the step updates it with the values from the incoming stream.
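The lookup-then-update logic can be sketched in plain SQL. The following sqlite3 illustration is a simplified model of the behavior, not PDI's actual implementation, and the table and column names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employees (
    EMPLOYEENUMBER INTEGER PRIMARY KEY,
    JOBTITLE TEXT, EXTENSION TEXT)""")
conn.execute("INSERT INTO employees VALUES (1002, 'President', 'x5800')")

# One incoming stream row: the lookup key plus the new values.
incoming = {"EMPLOYEE_NUMBER": 1002, "JOBTITLE": "CEO", "EXTENSION": "x5800"}

# 1. Look up the existing row by key.
row = conn.execute(
    "SELECT JOBTITLE, EXTENSION FROM employees WHERE EMPLOYEENUMBER = ?",
    (incoming["EMPLOYEE_NUMBER"],)).fetchone()

# 2. Update only when a match exists AND a value actually differs,
#    mirroring the step's "don't touch unchanged records" behavior.
if row is not None and row != (incoming["JOBTITLE"], incoming["EXTENSION"]):
    conn.execute(
        "UPDATE employees SET JOBTITLE = ?, EXTENSION = ? "
        "WHERE EMPLOYEENUMBER = ?",
        (incoming["JOBTITLE"], incoming["EXTENSION"],
         incoming["EMPLOYEE_NUMBER"]))

print(conn.execute(
    "SELECT JOBTITLE FROM employees WHERE EMPLOYEENUMBER = 1002"
).fetchone()[0])  # → CEO
```

The value comparison before the UPDATE is what saves database work when most incoming rows are unchanged.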
Drag the Update step onto the canvas.
Open the Update properties dialog box.
Ensure the following details are configured, as outlined below:

Set the keys to look up the values.
Select EMPLOYEENUMBER from the table to equal EMPLOYEE_NUMBER from the data stream.
You can set any number of constraints to ensure a unique row is looked up.
Get Update fields
Ensure the Table and Stream fields are correctly mapped.
Click OK.
RUN
Database operations rely on keys (unique identifiers) to maintain referential integrity.
If no matching record is found based on the key fields, the Update step simply skips that row without performing any action. Unlike the "Insert/Update" step, it doesn't create new records when matches aren't found.
The Update step is particularly useful for data maintenance operations, synchronizing systems, or applying changes to existing database records without disturbing their structure or unaffected fields.
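The contrast between the two behaviors can be sketched with sqlite3. This is an illustrative analogy only (the upsert syntax shown requires SQLite 3.24+, and the table is a made-up minimal example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees "
             "(EMPLOYEENUMBER INTEGER PRIMARY KEY, JOBTITLE TEXT)")
conn.execute("INSERT INTO employees VALUES (1002, 'President')")

# Update-style behavior: a key with no match (9999) is simply skipped;
# no row is changed and no new row is created.
cur = conn.execute(
    "UPDATE employees SET JOBTITLE = 'CTO' WHERE EMPLOYEENUMBER = 9999")
print(cur.rowcount)  # → 0

# Insert/Update ("upsert") behavior: the same unmatched key
# creates a new row instead of being skipped.
conn.execute(
    "INSERT INTO employees (EMPLOYEENUMBER, JOBTITLE) VALUES (9999, 'CTO') "
    "ON CONFLICT(EMPLOYEENUMBER) DO UPDATE SET JOBTITLE = excluded.JOBTITLE")
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # → 2
```

This is why the Update step is the safer choice when accidental record creation must be ruled out.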
Click the Run button in the Canvas Toolbar.
Click on the Step Metrics tab.

Notice that 2 records were updated with the Update employees step.
Check the records in the database.
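A targeted SELECT on the updated keys is enough to confirm the changes. Sketched here with sqlite3 and pre-seeded post-update values for the two employees from the sample file; against a real database you would run the equivalent query through your own client:

```python
import sqlite3

# Stand-in for the target database after the transformation has run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees "
             "(EMPLOYEENUMBER INTEGER PRIMARY KEY, JOBTITLE TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1002, "CEO"),
                  (1102, "Regional Sales Manager (EMEA)")])

# Validate: pull exactly the records the transformation should have updated.
for num, title in conn.execute(
        "SELECT EMPLOYEENUMBER, JOBTITLE FROM employees "
        "WHERE EMPLOYEENUMBER IN (1002, 1102) ORDER BY EMPLOYEENUMBER"):
    print(num, title)
```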
