Text File Input

Ingest semi-structured text files into clean rows.


Workshop - Text File Input



Workshop files

Download the following files.

Keep the filenames unchanged.

Save them in your workshop folder.



Create a new transformation

Use any of these options to open a new transformation tab:

  • Select File > New > Transformation

  • Use Ctrl+N (Windows/Linux) or Cmd+N (macOS)

Text files


Review the input file first. It will guide your parsing approach.

orders.txt

What to notice:

  • Each order spans multiple lines.

  • Line 3 contains two values: order status and order date.

  • Order value includes a currency symbol ($).

  • There is inconsistent whitespace.


Approach

You will:

  • Flatten multi-line records into a single row.

  • Extract values into new fields with regular expression capture groups.

  • Clean strings (remove labels and currency symbols).

  • Set data types and formats (date and number).
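The four approach steps can be sketched outside PDI in plain Python to show the logic each one performs. This is a minimal sketch under stated assumptions, not how the PDI steps are implemented: the record layout in SAMPLE (the Order, Customer, Status, Date and Value labels, the yyyy-mm-dd date format, and blank lines between orders) is hypothetical, invented only to mirror the features listed above, and is not the actual content of orders.txt.

```python
import re
from datetime import datetime
from decimal import Decimal

# HYPOTHETICAL input: labels, field order, date format and blank-line
# separators are invented for illustration, not taken from orders.txt.
SAMPLE = """\
Order:   A-1001
Customer:  Acme Corp
Status: SHIPPED    Date: 2024-03-05
Value:   $1,250.50

Order: A-1002
Customer: Globex
Status:  PENDING  Date:   2024-03-07
Value: $ 89.99
"""

def flatten(text):
    """Join the lines of each blank-line-separated record into one row
    and collapse runs of whitespace to a single space."""
    rows, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            rows.append(" ".join(current))
            current = []
    if current:
        rows.append(" ".join(current))
    return [re.sub(r"\s+", " ", row) for row in rows]

# One named capture group per output field. The labels and the "$" sit
# outside the groups, so they are not part of the extracted values.
ORDER_RE = re.compile(
    r"Order: (?P<order_id>\S+) "
    r"Customer: (?P<customer>.+?) "
    r"Status: (?P<status>\w+) "
    r"Date: (?P<order_date>\d{4}-\d{2}-\d{2}) "
    r"Value: \$ ?(?P<value>[\d,]+\.\d{2})"
)

def parse(row):
    """Extract fields from one flattened row and convert their types."""
    match = ORDER_RE.fullmatch(row)
    if match is None:
        raise ValueError(f"unexpected row: {row!r}")
    fields = match.groupdict()
    # Date: parse with an explicit format instead of guessing.
    fields["order_date"] = datetime.strptime(fields["order_date"], "%Y-%m-%d").date()
    # Number: drop the thousands separator, then use Decimal for exact money values.
    fields["value"] = Decimal(fields["value"].replace(",", ""))
    return fields

orders = [parse(row) for row in flatten(SAMPLE)]
for order in orders:
    print(order)
```

Keeping the labels outside the capture groups is what removes them: only the grouped text becomes a field value. The pattern would need adjusting once you have inspected the real file.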

String Operations


Text File Input

Use Text file input to read the raw lines from orders.txt. Treat each line as a single string field for now.
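At this stage the step does no parsing: every physical line of the file becomes one row holding one string. A minimal Python sketch of that output shape, assuming UTF-8 encoding and using Field1 as the field name (the name Get Fields reports later in this procedure). read_raw_lines is a hypothetical helper for illustration, not part of PDI.

```python
import tempfile
from pathlib import Path

def read_raw_lines(path):
    """Return one row per line of the file, each a dict with a single
    string field "Field1". Whitespace inside lines is kept as-is."""
    # Assumption: the file is UTF-8; text mode normalises \r\n to \n.
    with open(path, encoding="utf-8") as f:
        return [{"Field1": line.rstrip("\n")} for line in f]

# Usage with a throwaway file (contents are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "orders.txt"
    sample.write_text("first line\n  second   line\n", encoding="utf-8")
    rows = read_raw_lines(sample)

print(rows)
```

Leaving whitespace untouched here matches the plan of cleaning strings in later steps rather than while reading.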

  1. Start Pentaho Data Integration (Spoon).

  2. In the Design tab, expand the Input category.

  3. Drag Text file input onto the canvas.

Tip: You can also search for Text file input instead of browsing the categories.

  4. Double-click the step and configure the file path:

Add path to file

Because orders.txt is in the same directory as the transformation, you can make the path location-independent: instead of hard-coding the directory, use the built-in variable that holds the transformation's directory. The complete filename is:

${Internal.Transformation.Filename.Directory}/orders.txt

  5. Select the Content tab and configure it as shown:

Text file input - Content

  6. Select the Fields tab, then select Get Fields.

Text file input - Fields

The step returns a single field, Field1, of type String.

  7. Optional: rename the step to Read orders.

  8. Select OK.
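The path configured for the step, ${Internal.Transformation.Filename.Directory}/orders.txt, stays valid when the workshop folder moves, because only the variable's value changes. A minimal Python sketch of ${name} substitution, for illustration only: resolve_variables is a hypothetical helper, not the PDI API, and the directory values are made up (PDI supplies the real value from the location of the saved transformation).

```python
import re

# Matches ${name} and captures the variable name.
VARIABLE = re.compile(r"\$\{([^}]+)\}")

def resolve_variables(text, variables):
    """Replace each ${name} with its value; unknown names are left as-is."""
    return VARIABLE.sub(lambda m: variables.get(m.group(1), m.group(0)), text)

PATH = "${Internal.Transformation.Filename.Directory}/orders.txt"

# Hypothetical directories: the same configured path resolves correctly
# wherever the transformation and orders.txt are saved together.
for directory in ("/home/ana/workshop", "C:/training/pdi"):
    variables = {"Internal.Transformation.Filename.Directory": directory}
    print(resolve_variables(PATH, variables))
```

The configured string never changes; only the lookup table does, which is why keeping orders.txt next to the transformation is enough.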
