Excel Writer
Working with Excel ..
Workshop - Microsoft Excel Writer
Business reports often need to be delivered in Excel format with professional formatting, branding, and layouts that stakeholders expect. Rather than manually copying data into spreadsheet templates each reporting period, organizations can automate this process using PDI's Excel Writer capability. This approach ensures consistency, eliminates manual errors, and dramatically reduces the time required to produce recurring reports.
In this hands-on workshop, you'll automate Steel Wheels' half-yearly Sales and Expenses Report by building a transformation that populates a pre-formatted Excel template with current data. You'll work with multiple data streams that write to different sections of the same spreadsheet, learning how to control execution order to ensure data writes happen sequentially rather than simultaneously. This practical scenario mirrors real-world reporting requirements where maintaining professional formatting while automating data population is essential.
What You'll Accomplish:
Work with pre-formatted Excel templates and understand cell positioning
Configure the Excel Writer step to write data to specific worksheet locations
Use the Block Until Steps Finish step to control execution order in parallel data flows
Generate dynamic report headers using the Generate Rows step
Read sales and expenses data from text files
Write multiple datasets to different sections of the same Excel workbook
Manage file locking and append operations when writing to Excel
Produce a fully formatted, professional business report automatically
By the end of this workshop, you'll understand how to bridge the gap between raw data and polished business reports. You'll have practical experience with execution blocking—a critical technique when multiple steps need to write to the same file sequentially. Instead of spending hours manually updating spreadsheets each reporting period, you'll build automated solutions that populate professional Excel reports with current data in seconds, maintaining all the formatting and formulas your business users expect.
Prerequisites: Understanding of basic transformation concepts, experience with Text File Input; Pentaho Data Integration installed and configured
Estimated Time: 35 minutes

Write Year
The first workflow is to write the current Year to the SourceData worksheet in the template.xlsx
You can obviously change the Year value ..

Generate rows - Year
Generate rows outputs a specified number of rows. By default, the rows are empty; however, they can contain several static fields. This step is used primarily for testing purposes. It may be useful for generating a fixed number of rows, for example, you want exactly 12 rows for 12 months. Sometimes you may use Generate Rows to generate one row that is an initiating point for your transformation.
Start Pentaho Data Integration.
Drag the ‘Generate Rows’ step onto the canvas.
Double-click on the step, and configure the following properties:

Close Step.
Excel Writer - Year
The Microsoft Excel Writer step writes incoming rows into an MS Excel file. It supports both the xls and xlsx file formats. The xlsx format is usually a good choice when working with template files, as it is more likely to preserve charts and other misc objects in the output.
The proprietary (binary) xls format is not as well understood and deciphered, so moving/replicating nontrivial xls content in non-MS software environments is usually problematic.
Drag the ‘Excel writer’ step onto the canvas.
Create a hop from the ‘Year’ step.
Double-click on the step, and configure the following properties:

4. Click on the Content tab, and configure the following properties:

Click on ‘Get Fields’ button.
Click OK.
Write Sales
The second workflow is to repeat the workflow for Sales, with the addition of a ‘Blockstep’, which ensures the ‘Write Year’ workflow has been completed, before writing to the template.xlsx.

Text file input - Read Sales
The Text File Input step is used to read data from a variety of different text-file types. The most commonly used formats include Comma Separated Values (CSV files) generated by spreadsheets and fixed width flat files.
The Text File Input step provides you with the ability to specify a list of files to read, or a list of directories with wild cards in the form of regular expressions. In addition, you can accept filenames from a previous step making filename handling more even more generic.
Drag the ‘Text file input’ step onto the canvas.
Double-click on the step, and configure the following properties:

Click on the Content tab, and configure the following properties:

Click on the Fields tab, and click on ‘Get Fields’ button:

Click OK.
Block until Step Finish - Wait Year
This step simply waits until all the step copies that are specified in the dialog have finished. You can use it to avoid the natural concurrency (parallelism) that exists between transformation step copies.
Step name
Name of the step. Note: This name must be unique in a single transformation.
Watch the following steps
Use this grid to specify the steps to wait for.
Get steps
Push this button to auto-fill the "Watch the following steps" grid with all steps available in the transformation.
Step name
The name of the step to wait for.
CopyNr
The (0-based) copy number of the step. If the named step has an explicit setting for "Change number of copies to start", and you want to wait for all copies to finish, you'll need to enter one row in the grid for each copy, and use this column to specify which copy of the step to wait for. For the default number of copies (1), the CopyNr isalways 0.
Drag the ‘Block this step until steps finish’ step onto the canvas.
Create a hop from the ‘Read Sales’ step.
Double-click on the step, and configure the following properties:

Excel Writer - Write Sales
The Microsoft Excel Writer step writes incoming rows into an MS Excel file. It supports both the xls and xlsx file formats. The xlsx format is usually a good choice when working with template files, as it is more likely to preserve charts and other misc objects in the output.
The proprietary (binary) xls format is not as well understood and deciphered, so moving/replicating nontrivial xls content in non-MS software environments is usually problematic.
Drag the ‘Excel writer’ step onto the canvas.
Create a hop from the ‘Wait Year’ step.
Double-click on the step, and configure the following properties:

Click on the Content tab, and configure the following properties:

Click on the ‘Get Fields’ button.
Delete the productline field, as its not required. The template already has the fieldname and you are just writing the data, starting at cell B5.
Click OK.

Run Transformation
This workshop illustrates the concept of parallelism. As each Transformation step is initialized in parallel, i.e. their own thread, then blocking steps must be added to the workflow, to prevent data being simultaneously written to the Excel SourceData worksheet.
Click the Run button in the Canvas Toolbar.
Open the Sales_and_Expenses_2023.xlsx file.

Last updated
Was this helpful?








