Error Handling
Handling errors in a transformation.
Workshop - Error Handling
In production environments, data quality issues are inevitable: invalid dates, malformed numbers, unexpected nulls, and format mismatches occur regularly in real-world datasets. You don't have to let the entire transformation fail when it encounters bad data. Pentaho Data Integration (PDI) provides step-level error handling that lets you capture problem records, route them to a separate stream, and process them apart from valid data.
In this hands-on workshop, you'll learn to implement PDI's error handling framework by building a transformation that processes CSV data with intentionally problematic date values. This approach ensures your data pipelines are resilient and provide visibility into data quality issues.
What You'll Accomplish:
Configure a CSV File Input step to read source data
Intentionally create a data format error to trigger error handling
Configure error hops to capture and route problem records
View error metadata fields (error descriptions, field names, error codes)
Correct the format issue and verify successful data processing
By the end of this workshop, you'll understand how to build fault-tolerant transformations that continue processing valid records while isolating problematic data for investigation. Instead of cryptic failures and incomplete loads, you'll create pipelines that show exactly which records failed and why. This is a critical capability for maintaining data quality and meeting service level agreements in production environments.
Prerequisites: Completion of previous workshops (Hello World, Logging); Pentaho Data Integration installed and configured
Estimated Time: 10 minutes

CSV file input
The CSV File Input step reads data from delimited text files into a PDI transformation. While this step is called CSV File Input, you can also use CSV File Input with many other separator types, such as pipes, tabs, and semicolons.
Note: The semicolon (;) is set as the default separator type for this step.
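To illustrate what this step does conceptually, here is a minimal Python sketch that reads delimited text with a configurable separator. It defaults to a semicolon, as the step does. This is not PDI code. The function name `read_delimited` and the sample data are hypothetical, and the sketch relies only on the standard `csv` module.

```python
import csv
import io


def read_delimited(text: str, separator: str = ";") -> list[dict[str, str]]:
    """Parse delimited text whose first line is a header row.

    Hypothetical helper: like the CSV File Input step, the separator is a
    parameter, so the same code handles semicolons, commas, pipes, or tabs.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=separator)
    return list(reader)


# Hypothetical sample data using the default semicolon separator.
rows = read_delimited("id;name;birthdate\n1;Ada;1815/12/10\n")
print(rows)  # [{'id': '1', 'name': 'Ada', 'birthdate': '1815/12/10'}]

# Reading pipe- or tab-separated data only requires changing one argument.
pipe_rows = read_delimited("id|name\n2|Alan\n", separator="|")
tab_rows = read_delimited("id\tname\n3|x\tGrace\n".replace("3|x", "3"), separator="\t")
```

Keeping the separator as a parameter mirrors the step's design: one reader serves many delimited formats.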
Double-click to edit the CSV file input step.

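Set the following metadata properties for the birthdate field:
Name: birthdate
Type: Date
Format: yyyy/MM/dd
Any birthdate value in the file that does not match this format cannot be converted to a date, so the step raises a conversion error for that row.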
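The conversion that fails here can be sketched outside PDI. PDI date formats use Java date pattern letters. In this Python sketch, `yyyy/MM/dd` is translated to the `strptime` format `%Y/%m/%d`, which is an equivalence assumed for illustration. The function name `parse_birthdate` and the sample values are hypothetical.

```python
from datetime import date, datetime

# Assumed Python equivalent of the Java-style mask yyyy/MM/dd.
BIRTHDATE_FORMAT = "%Y/%m/%d"


def parse_birthdate(value: str) -> date:
    """Convert a string to a date, raising ValueError on a format mismatch."""
    return datetime.strptime(value, BIRTHDATE_FORMAT).date()


print(parse_birthdate("1815/12/10"))  # 1815-12-10

try:
    # Dashes do not match the slash-separated mask, so conversion fails.
    parse_birthdate("1815-12-10")
except ValueError as exc:
    print("conversion error:", exc)
```

A single mismatched separator character is enough to reject a value. This kind of row-level failure is what the error hop is meant to capture.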
Double-click the white diagonal cross on the red error hop to open the error handling settings:

Nr of errors fieldname
The name of an integer field that holds the number of errors found in the rejected row.
Error descriptions fieldname
The name of a field that holds the error text, for example "Error inserting row" or "Data truncation: Out of range value adjusted for column 'id' at row 1".
Error fields fieldname
The name of a field that lists the field or fields that caused the error.
Error codes fieldname
The name of a field that holds the error code or codes.
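The routing behind an error hop can be approximated in plain Python, as a sketch under stated assumptions. Rows that convert cleanly go to the main stream. Rows that fail keep their original values and gain four metadata fields, analogous to the fields configured above. The function `route_rows`, the output field names, and the code `DATE_FORMAT` are hypothetical placeholders. They are not PDI's actual field names or error codes.

```python
from datetime import datetime


def route_rows(rows, date_field="birthdate", date_format="%Y/%m/%d"):
    """Split rows into a main stream and an error stream (hypothetical sketch)."""
    valid, errors = [], []
    for row in rows:
        try:
            converted = dict(row)
            converted[date_field] = datetime.strptime(row[date_field], date_format).date()
            valid.append(converted)
        except ValueError as exc:
            # Keep the original values and append error metadata.
            error_row = dict(row)
            error_row.update({
                "nr_errors": 1,                  # one failed conversion in this row
                "error_descriptions": str(exc),  # human-readable reason
                "error_fields": date_field,      # field that caused the failure
                "error_codes": "DATE_FORMAT",    # hypothetical placeholder code
            })
            errors.append(error_row)
    return valid, errors


rows = [
    {"id": "1", "birthdate": "1815/12/10"},
    {"id": "2", "birthdate": "1912-06-23"},
]
valid, errors = route_rows(rows)
print(len(valid), len(errors))    # 1 1
print(errors[0]["error_fields"])  # birthdate
```

The failing row keeps its original values, so the error stream can be written out for investigation without rereading the source. Meanwhile, valid rows continue downstream.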


