Read XML
XML data sources
Workshop - Read XML
Read XML from a file, a URL, or a field value with the Get data from XML step.
What you’ll do
Read XML from a local file.
Read XML from a URL (URI).
Use XPath to select nodes and fields.
Use Get Fields to infer the XML structure.
Debug a data type mismatch using the logs.
Prerequisites: Basic transformations. Basic XML (elements, attributes, hierarchy). PDI installed.
Estimated time: 30 minutes
Workshop files
Download these files before you start:
The sample XML input.
The starter transformation (optional).

Create a new transformation
Use any of these options to open a new transformation tab:
Select File > New > Transformation
Use Ctrl+N (Windows/Linux) or Cmd+N (macOS)


Get data from XML
This step reads data from any XML file, using XPath expressions to select the nodes and fields to extract.
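Outside PDI, the same idea (one XPath that selects the repeating node, plus one relative path per field) can be sketched in Python. This is only an illustration and not how the step is implemented: the sample document, the element names, and the helper read_rows are all hypothetical, and the workshop's real input file will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are assumptions.
SAMPLE_XML = """
<Rows>
  <Row id="1"><Name>Alpha</Name><Zone>12</Zone></Row>
  <Row id="2"><Name>Beta</Name><Zone>A7</Zone></Row>
</Rows>
"""

def read_rows(xml_text, loop_xpath, fields):
    """Return one dict per node matched by loop_xpath.

    fields maps an output field name to a path relative to the loop node.
    A path starting with '@' reads an attribute; anything else reads the
    text of the matching child element.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(loop_xpath):
        row = {}
        for name, path in fields.items():
            if path.startswith("@"):
                row[name] = node.get(path[1:])
            else:
                row[name] = node.findtext(path)
        rows.append(row)
    return rows

rows = read_rows(SAMPLE_XML, "./Row", {"id": "@id", "name": "Name", "zone": "Zone"})
for row in rows:
    print(row)
```

Every value comes back as text here, so any conversion to a numeric type is a separate decision made per field.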
Start Pentaho Data Integration.
Drag the ‘Get data from XML’ step onto the canvas.
Double-click on the step, and configure the following properties:

Click on the Content tab, and configure the following properties:

Click on the Fields tab, and then on the ‘Get Fields’ button.

Click OK.
Run the transformation
This workshop shows how to ingest an XML data source. The XML can come from one of three sources:
a previous step (typically a URL)
a file
a stream field (XML stored in a field)
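As a rough illustration of these three sources, the following Python sketch parses the same XML from a file, from a URL, and from a string held in a field. The function names and the sample XML are hypothetical, and the URL variant is defined but not called because it needs network access.

```python
import os
import tempfile
import urllib.request
import xml.etree.ElementTree as ET

def parse_from_file(path):
    """Parse XML stored in a local file."""
    return ET.parse(path).getroot()

def parse_from_url(url, timeout=10):
    """Fetch XML over HTTP(S) and parse the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return ET.fromstring(response.read())

def parse_from_field(xml_text):
    """Parse XML that already sits in a string field of a row."""
    return ET.fromstring(xml_text)

# Hypothetical XML used for both the file and the field case.
xml_text = "<Rows><Row>1</Row><Row>2</Row></Rows>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(xml_text)
    from_file = parse_from_file(path)

from_field = parse_from_field(xml_text)
print(len(from_file.findall("Row")), len(from_field.findall("Row")))
```

Whichever source is used, the parsed tree is the same, so the XPath and field definitions do not need to change.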
Remember to disable the hops on the second workflow.
Click the Run button in the Canvas Toolbar.
Preview the data.

In this workflow, the XML data source at a URL is parsed with XPath to retrieve the dataset.

In this workshop, you pass the URL in a data stream field.
Copy the URL to your clipboard. You will paste it into the XPath dialog.
Generate rows
Generate rows outputs a specified number of rows. By default the rows are empty, but they can contain several static fields. This step is used primarily for testing. It is also useful when you need a fixed number of rows, for example exactly 12 rows for 12 months.
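The behaviour described above can be pictured with a small Python sketch: a fixed number of rows, each carrying the same static field values. The helper generate_rows and the example URL are hypothetical; only the field name xmlUrl comes from this workshop.

```python
def generate_rows(limit, static_fields):
    """Return `limit` rows, each an independent copy of the static fields."""
    return [dict(static_fields) for _ in range(limit)]

# One row carrying the URL, as a later step would receive it.
# The URL is a placeholder, not the workshop's data source.
url_rows = generate_rows(1, {"xmlUrl": "https://example.com/data.xml"})

# Twelve rows with no fields, for example one per month.
month_rows = generate_rows(12, {})

print(len(url_rows), len(month_rows))
```

Each row is a separate copy, so changing a value in one row downstream does not affect the others.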
Drag the ‘Generate Rows’ step onto the canvas.
Double-click on the step, and configure the following properties:

Get data from XML
The dataset is parsed from the URL held in the stream field xmlUrl, which is passed on from the ‘Pass URL’ step.
Drag the ‘Get data from XML’ step onto the canvas.
Create a hop from the ‘Pass URL’ step.
Double-click on the step, and configure the following properties:

Click on the ‘Content’ tab and configure the following properties:

Click on the ‘Fields’ tab and configure the following properties:

Click on the ‘Get Fields’ button.
Next: open the Dummy tab.
Run the transformation
Remember to enable the hops in this workflow and disable the hop in Workflow 1: XML - File.
The workflow will fail. Do you know why?
Click the Run button in the Canvas Toolbar.

Check the logs.

The logs show that the Zone values are alphanumeric (String), not Integer.
Change the Zone data type to String and re-run the transformation.
Click on the Dummy step and Preview data.
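The failure and the fix can be reproduced in miniature with Python. This is a sketch under assumptions: the Zone values below are invented for illustration, and the helpers convert_zone and try_convert are hypothetical stand-ins for the type conversion the step performs.

```python
def convert_zone(value, data_type):
    """Convert a raw XML text value to the declared field type."""
    if data_type == "Integer":
        return int(value)  # raises ValueError for non-numeric text
    return value  # String: keep the text unchanged

def try_convert(values, data_type):
    """Return (converted_values, None) on success or (None, message) on failure."""
    try:
        return [convert_zone(v, data_type) for v in values], None
    except ValueError as error:
        return None, str(error)

# Hypothetical Zone values; the second one is alphanumeric.
zone_values = ["12", "A7"]

as_integer, integer_error = try_convert(zone_values, "Integer")
as_string, string_error = try_convert(zone_values, "String")

print("Integer:", as_integer, integer_error)
print("String:", as_string, string_error)
```

A single non-numeric value is enough to make the Integer conversion fail for the whole run, which is why declaring the field as String resolves the error.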


