Excel Writer
Working with Excel ..
Workshop - Microsoft Excel Writer
Business reports often need to be delivered in Excel format with professional formatting, branding, and layouts that stakeholders expect. Rather than manually copying data into spreadsheet templates each reporting period, organizations can automate this process using PDI's Excel Writer capability. This approach ensures consistency, eliminates manual errors, and dramatically reduces the time required to produce recurring reports.
In this hands-on workshop, you'll automate Steel Wheels' half-yearly Sales and Expenses Report by building a transformation that populates a pre-formatted Excel template with current data. You'll work with multiple data streams that write to different sections of the same spreadsheet, learning how to control execution order to ensure data writes happen sequentially rather than simultaneously. This practical scenario mirrors real-world reporting requirements where maintaining professional formatting while automating data population is essential.
What You'll Accomplish:
Work with pre-formatted Excel templates and understand cell positioning
Configure the Excel Writer step to write data to specific worksheet locations
Use the Block Until Steps Finish step to control execution order in parallel data flows
Generate dynamic report headers using the Generate Rows step
Read sales and expenses data from text files
Write multiple datasets to different sections of the same Excel workbook
Manage file locking and append operations when writing to Excel
Produce a fully formatted, professional business report automatically
By the end of this workshop, you'll understand how to bridge the gap between raw data and polished business reports. You'll have practical experience with execution blocking—a critical technique when multiple steps need to write to the same file sequentially. Instead of spending hours manually updating spreadsheets each reporting period, you'll build automated solutions that populate professional Excel reports with current data in seconds, maintaining all the formatting and formulas your business users expect.
Prerequisites: Understanding of basic transformation concepts, experience with Text File Input; Pentaho Data Integration installed and configured
Estimated Time: 35 minutes
Create a new Transformation
Any one of these actions opens a new Transformation tab for you to begin designing your transformation.
By clicking File > New > Transformation
By using the CTRL-N hot key

Excel Template
The various stages of the transformation write data to a template.xlsx. The template has 2 worksheets:
Sales Chart - this worksheet creates a 3D stacked graph
SourceData - worksheet.
Open the template.xlsx following template in Excel:

SourceData - the datasheet. Transformations write to the required cells that are used to create the graph.

Write Year
The first workflow is to write the current Year to the SourceData worksheet in the template.xlsx
You can obviously change the Year value ..

Generate rows - Year
Generate rows outputs a specified number of rows. By default, the rows are empty; however, they can contain several static fields. This step is used primarily for testing purposes. It may be useful for generating a fixed number of rows, for example, you want exactly 12 rows for 12 months. Sometimes you may use Generate Rows to generate one row that is an initiating point for your transformation.
Start Pentaho Data Integration.
Windows - PowerShell
Linux
Drag the ‘Generate Rows’ step onto the canvas.
Double-click on the step, and configure the following properties:

Close Step.
Summary
Generates a record that holds the Year value – 2023 – in the year stream field.
The Excel template will also need to be formatted yyyy to interpret the Date.
Excel Writer - Year
The Microsoft Excel Writer step writes incoming rows into an MS Excel file. It supports both the xls and xlsx file formats. The xlsx format is usually a good choice when working with template files, as it is more likely to preserve charts and other misc objects in the output.
The proprietary (binary) xls format is not as well understood and deciphered, so moving/replicating nontrivial xls content in non-MS software environments is usually problematic.
Drag the ‘Excel writer’ step onto the canvas.
Create a hop from the ‘Year’ step.
Double-click on the step, and configure the following properties:

The filename sets the name and output directory for the Excel Workbook:
[current transformation directory]\Sales_and_Expenses_2023.xlsx
The option: replace with new output file, results in a new Excel Workbook file overwriting previous version every time the Transformation is Run.
Write the output to the SourceData Excel worksheet.
Use template.xlsx as the template.
[current transformation directory]\template.xlsx
4. Click on the Content tab, and configure the following properties:

Click on ‘Get Fields’ button.
Click OK.
Write Sales
The second workflow is to repeat the workflow for Sales, with the addition of a ‘Blockstep’, which ensures the ‘Write Year’ workflow has been completed, before writing to the template.xlsx.

Text file input - Read Sales
The Text File Input step is used to read data from a variety of different text-file types. The most commonly used formats include Comma Separated Values (CSV files) generated by spreadsheets and fixed width flat files.
The Text File Input step provides you with the ability to specify a list of files to read, or a list of directories with wild cards in the form of regular expressions. In addition, you can accept filenames from a previous step making filename handling more even more generic.
Drag the ‘Text file input’ step onto the canvas.
Double-click on the step, and configure the following properties:

Click on the Content tab, and configure the following properties:

Ensure the Header is selected.
No empty rows
Mixed Format
Click on the Fields tab, and click on ‘Get Fields’ button:

Returns the Header values as stream fields.
Click OK.
Block until Step Finish - Wait Year
This step simply waits until all the step copies that are specified in the dialog have finished. You can use it to avoid the natural concurrency (parallelism) that exists between transformation step copies.
Step name
Name of the step. Note: This name must be unique in a single transformation.
Watch the following steps
Use this grid to specify the steps to wait for.
Get steps
Push this button to auto-fill the "Watch the following steps" grid with all steps available in the transformation.
Step name
The name of the step to wait for.
CopyNr
The (0-based) copy number of the step. If the named step has an explicit setting for "Change number of copies to start", and you want to wait for all copies to finish, you'll need to enter one row in the grid for each copy, and use this column to specify which copy of the step to wait for. For the default number of copies (1), the CopyNr isalways 0.
Drag the ‘Block this step until steps finish’ step onto the canvas.
Create a hop from the ‘Read Sales’ step.
Double-click on the step, and configure the following properties:

This will result in the workflow being blocked until the Write Year step has been completed.
Excel Writer - Write Sales
The Microsoft Excel Writer step writes incoming rows into an MS Excel file. It supports both the xls and xlsx file formats. The xlsx format is usually a good choice when working with template files, as it is more likely to preserve charts and other misc objects in the output.
The proprietary (binary) xls format is not as well understood and deciphered, so moving/replicating nontrivial xls content in non-MS software environments is usually problematic.
Drag the ‘Excel writer’ step onto the canvas.
Create a hop from the ‘Wait Year’ step.
Double-click on the step, and configure the following properties:

The filename sets the name and output directory for the Excel Workbook:
[current transformation directory]\Sales_and_Expenses_2023.xlsx
The option: Use existing file for writing, writes the data to the existing Sales_and_Expenses_2023.xlsx file
Write the output to the SourceData Excel worksheet.
No need for the template as the data is now being written to the file.
Click on the Content tab, and configure the following properties:

Click on the ‘Get Fields’ button.
Delete the productline field, as its not required. The template already has the fieldname and you are just writing the data, starting at cell B5.
Click OK.

Run Transformation
This workshop illustrates the concept of parallelism. As each Transformation step is initialized in parallel, i.e. their own thread, then blocking steps must be added to the workflow, to prevent data being simultaneously written to the Excel SourceData worksheet.
Click the Run button in the Canvas Toolbar.
Open the Sales_and_Expenses_2023.xlsx file.

Last updated
Was this helpful?






