Read JSON
Read JSON objects.
Workshop - Read JSON
Steel Wheels has several JSON data sources. You will create a simple workflow to extract the required reporting dataset.
In this hands-on workshop, you'll learn to work with PDI's "JSON Input" step to extract data from JSON structures. Steel Wheels receives order and customer data in JSON format from multiple sources, and you'll configure transformations to parse this data using JSON Path expressions. You'll discover how to navigate nested JSON objects and arrays, automatically detect field structures, and handle common data variations like multiple delivery statuses. This practical approach ensures you can integrate JSON data confidently, regardless of complexity.
What You'll Accomplish:
Configure the JSON Input step to read data from JSON files
Use JSON Path notation to navigate objects and extract specific values
Leverage the "Get Fields" feature to automatically discover JSON structure
Understand root ($) and child (.) operators for path navigation
Work with nested JSON objects and key-value pairs
Define appropriate data types for extracted fields (String, Integer, Boolean)
Handle JSON variations where field values differ (e.g., "Delivered" vs "Returned" status)
Build a complete transformation that outputs structured rows from JSON input
By the end of this workshop, you'll have practical experience parsing JSON data sources and understand how to map JSON structures to tabular datasets for reporting and analysis. You'll develop skills in reading JSON Path expressions and recognizing when data type definitions are critical for accurate data processing. Rather than pre-processing JSON with external scripts or tools, you'll build native PDI solutions that handle JSON data directly and efficiently.
Prerequisites: Understanding of basic transformation concepts; familiarity with JSON structure (objects, arrays, key-value pairs); Pentaho Data Integration installed and configured
Estimated Time: 15 minutes
Create a new Transformation
Either of these actions opens a new Transformation tab where you can begin designing your transformation:
Click File > New > Transformation
Press the CTRL-N hot key

View the jsonfile.js file.
Notice that the delivery status can be either Delivered or Returned.
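Before building the transformation, it can help to confirm which delivery status values actually occur in the data. The following is a minimal Python sketch run outside PDI. The inline sample stands in for jsonfile.js, whose real layout is not shown here, so the keys `orders`, `orderNumber`, `delivery`, and `status` are assumptions made purely for illustration.

```python
import json
from collections import Counter

# Hypothetical sample: the real jsonfile.js may use different keys and nesting.
SAMPLE = """
{
  "orders": [
    {"orderNumber": 10100, "delivery": {"status": "Delivered"}},
    {"orderNumber": 10101, "delivery": {"status": "Returned"}},
    {"orderNumber": 10102, "delivery": {"status": "Delivered"}}
  ]
}
"""


def count_statuses(text):
    """Count each distinct delivery status found in the document."""
    doc = json.loads(text)
    return Counter(order["delivery"]["status"] for order in doc["orders"])


status_counts = count_statuses(SAMPLE)
print(dict(status_counts))
```

Knowing the full set of values up front makes it easier to spot rows in the preview that do not match either expected status.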
Start Pentaho Data Integration.
Windows - PowerShell:
Linux:
JSON Input
The JSON Input step extracts relevant portions out of JSON structures, files or incoming fields, and outputs rows.
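To make the path notation concrete, here is a small Python sketch of how a path built only from the root (`$`) and child (`.`) operators walks a JSON structure. It is an illustration of the notation, not the way the JSON Input step is implemented; the `resolve` helper and the document keys are hypothetical.

```python
import json


def resolve(node, path):
    """Resolve a simple JSON Path that uses only the root ($) and child (.) operators."""
    parts = path.split(".")
    if parts[0] != "$":
        raise ValueError("path must start at the root: $")
    for key in parts[1:]:
        node = node[key]  # child operator: step into the named key
    return node


# Hypothetical document: the keys are illustrative only.
doc = json.loads(
    '{"customer": {"name": "Example Customer", "address": {"city": "Paris"}}}'
)

customer_name = resolve(doc, "$.customer.name")
city = resolve(doc, "$.customer.address.city")
print(customer_name, city)
```

Each `.` descends one level, so `$.customer.address.city` reads as "from the root, into `customer`, into `address`, then take `city`".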
Drag the ‘JSON Input’ step onto the canvas.
Double-click on the step, and configure the following properties:

Use the internal variable to configure the file path.
Click on the ‘Fields’ tab and configure the following properties:

Close the step.
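The mapping defined on the Fields tab (a field name, a path, and a data type) can be pictured as a small table that turns each JSON object into one flat row. The Python sketch below mimics that idea under stated assumptions: the field names, keys, and sample data are hypothetical, and the casting shown is Python's own, not PDI's type conversion.

```python
import json

# Hypothetical field definitions: (output name, keys below each order, type).
FIELDS = [
    ("order_number", ["orderNumber"], int),
    ("status", ["delivery", "status"], str),
    ("gift_wrapped", ["giftWrapped"], bool),
]

# Hypothetical sample: orderNumber is stored as text to show why types matter.
SAMPLE = """
{
  "orders": [
    {"orderNumber": "10100", "delivery": {"status": "Delivered"}, "giftWrapped": false},
    {"orderNumber": "10101", "delivery": {"status": "Returned"}, "giftWrapped": true}
  ]
}
"""


def to_rows(text):
    """Emit one flat row (dict) per element of the hypothetical 'orders' array."""
    rows = []
    for order in json.loads(text)["orders"]:
        row = {}
        for name, keys, cast in FIELDS:
            value = order
            for key in keys:
                value = value[key]  # descend one level per key
            row[name] = cast(value)  # apply the declared data type
        rows.append(row)
    return rows


rows = to_rows(SAMPLE)
for row in rows:
    print(row)
```

Declaring `order_number` as an integer means it sorts and aggregates numerically downstream, which is the reason the data type column deserves attention when defining fields.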
Dummy
The Dummy step does not process records; it passes rows through unchanged. Its primary function is to be a placeholder for testing purposes. For example, to have a transformation, you need at least two steps connected to each other.
To add the Dummy step, expand the ‘Flow’ category in the Design tab, and drag the Dummy step onto the canvas.
Draw a hop from the JSON Input step to the Dummy step.
Click the Run button in the Canvas Toolbar.
Preview the data.

