Workshop - Create DB Table
Database table creation is a fundamental operation in data integration workflows—from staging areas and temporary working tables to persistent data marts and operational data stores. Organizations need efficient methods to create database tables dynamically as data structures evolve and new data sources are onboarded. Understanding how to programmatically create and populate database tables through PDI ensures your data pipelines can adapt to changing requirements without manual database administration intervention.
In this hands-on workshop, you'll learn to use PDI's "Table Output" step to both create database tables and insert data in a single transformation. Steel Wheels needs to establish new database tables for their reporting systems, and you'll configure transformations that automatically generate the necessary SQL CREATE TABLE statements based on your incoming data stream structure. You'll leverage PDI's automatic field mapping, execute SQL directly from the transformation designer, and understand performance optimization options like commit size and batch updates for efficient data loading.
What You'll Accomplish:
Configure the Table Output step to create new database tables
Use the "Get Fields" feature to automatically map stream fields to table columns
Generate SQL CREATE TABLE statements from your data stream structure
Execute SQL statements directly from the Table Output step dialog
Verify field mappings between incoming stream fields and target table columns
Understand DML INSERT operations and how Table Output implements them
Configure performance options including commit size and batch update settings
Create and populate tables in a single transformation execution
Validate table creation and data insertion using database management tools
By the end of this workshop, you'll have practical experience creating database tables programmatically through your PDI transformations rather than requiring separate DDL scripts or manual database administration. You'll understand when to use automatic table creation versus working with pre-existing table structures, and how to optimize data loading performance through batch operations. This approach enables self-service data integration where transformations can create their own working tables as needed, reducing dependencies on database administrators and accelerating development cycles.
Prerequisites: Understanding of basic transformation concepts and database connection configuration, and familiarity with database tables and SQL; Pentaho Data Integration installed and configured, with the appropriate database connections established
Estimated Time: 30 minutes
Create a new Transformation
Either of these actions opens a new Transformation tab for you to begin designing your transformation:
By clicking File > New > Transformation
By using the CTRL-N hot key

Load Sales Data
Load sales data from a CSV file into the STG_SALES_DATA table.
Modify the VARCHAR lengths where necessary to prevent values from being truncated.

Follow the steps outlined below:
CSV File input
This step provides the ability to read data from a delimited file. The CSV label for this step is a misnomer because you can define whatever separator you want to use, such as pipes, tabs, and semicolons; you are not constrained to using commas. Internal processing allows this step to process data quickly. Options for this step are a subset of the Text File Input step.
This step has fewer overall options than the general Text File Input step, but it has a few advantages over it:
NIO
Native system calls for reading the file mean faster performance, but the step is currently limited to local files only. There is no VFS support.
Parallel running
If you configure this step to run in multiple copies or in clustered mode, and you enable parallel running, each copy will read a separate block of a single file allowing you to distribute the file reading to several threads or even several slave nodes in a clustered transformation.
Lazy conversion
If you will be reading many fields from the file and many of those fields will not be manipulated, but merely passed through the transformation to land in some other text file or a database, lazy conversion can prevent Kettle from performing unnecessary work on those fields, such as converting them into objects like strings, dates, or numbers.
Start Pentaho Data Integration.
Windows - PowerShell
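A minimal way to launch the graphical designer (Spoon) from PowerShell. The installation directory below is an assumption; adjust it to wherever PDI is installed on your machine.

```powershell
# Illustrative only: C:\Pentaho\data-integration is a hypothetical install path
cd 'C:\Pentaho\data-integration'
.\Spoon.bat
```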
Linux
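The equivalent on Linux, again assuming a hypothetical install location of ~/pentaho/data-integration:

```bash
# Illustrative only: adjust the path to your PDI installation
cd ~/pentaho/data-integration
./spoon.sh
```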
Drag the CSV file input step onto the canvas.
Open the CSV file input properties dialog box.
Stepname: csvi-sales_data
Filename: ${Internal.Entry.Current.Directory}\sales_data.csv
Delimiter: , (comma)
Lazy Conversion: unchecked
Header row present: checked
The CSV File input step parses a sample number of records to set the metadata properties. Because these fields will be mapped to VARCHAR columns in a database table, check the detected field lengths carefully to avoid truncation.
Ensure the following details are configured, as outlined below:

Click on the Get Fields button.
Click OK.
Table output
The Table Output step allows you to load data into a database table. Table Output is equivalent to the DML operator INSERT. This step provides configuration options for the target table, along with a number of housekeeping and performance-related options such as Commit size and Use batch update for inserts.
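Conceptually, each row arriving at the step becomes an INSERT against the target table, broadly along the lines of the sketch below. The column names and values shown are illustrative assumptions, not the step's literal output.

```sql
-- Illustrative only: Table Output builds one INSERT per incoming row,
-- batching and committing them according to the step's Commit size
-- and batch update settings.
INSERT INTO STG_SALES_DATA (ORDERNUMBER, STATUS, SALES)
VALUES (10107, 'Shipped', 2871.00);
```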
Drag the Table Output step onto the canvas.
Open the Table Output properties dialog box. Ensure the following details are configured, as outlined below:

Click on the Database fields tab.
Click on the Get Fields button.

Ensure the mappings between Table fields and Stream fields are correct.
Click on the SQL button.
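The SQL editor opens with the DDL that Table Output generated from the incoming stream metadata. It will look broadly like the sketch below; the exact columns, types, and lengths depend on your sales_data.csv, the Get Fields scan, and the database dialect, so treat the names and sizes here as assumptions and widen any VARCHAR that looks too small.

```sql
-- Illustrative sketch only: the real statement is generated from your stream metadata
CREATE TABLE STG_SALES_DATA
(
  ORDERNUMBER INTEGER
, QUANTITYORDERED INTEGER
, PRICEEACH DOUBLE
, ORDERDATE VARCHAR(25)
, STATUS VARCHAR(15)
, CUSTOMERNAME VARCHAR(50)
-- ... remaining columns follow the same pattern
)
;
```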

Click Execute.
Click OK to Close all windows.
RUN
At some stage you will need a database management tool to check the results. Tables are usually created by executing external SQL scripts which, if required, can be kept within the scope of the project.
Click the Run button in the Canvas Toolbar.
Check that the table has been created in the sampledata database.
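For example, from any SQL client connected to the sampledata database, a quick sanity check might look like this (the row count depends on your source file):

```sql
-- Confirm the table exists and was populated by the transformation
SELECT COUNT(*) AS ROW_COUNT
FROM STG_SALES_DATA;
```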

Load Orders
Load orders data from a CSV file into the STG_ORDERS_MERGED table.
Modify the VARCHAR lengths where necessary to prevent values from being truncated.

Follow the steps outlined below:
CSV File input
This step provides the ability to read data from a delimited file. The CSV label for this step is a misnomer because you can define whatever separator you want to use, such as pipes, tabs, and semicolons; you are not constrained to using commas. Internal processing allows this step to process data quickly. Options for this step are a subset of the Text File Input step.
This step has fewer overall options than the general Text File Input step, but it has a few advantages over it:
NIO
Native system calls for reading the file mean faster performance, but the step is currently limited to local files only. There is no VFS support.
Parallel running
If you configure this step to run in multiple copies or in clustered mode, and you enable parallel running, each copy will read a separate block of a single file allowing you to distribute the file reading to several threads or even several slave nodes in a clustered transformation.
Lazy conversion
If you will be reading many fields from the file and many of those fields will not be manipulated, but merely passed through the transformation to land in some other text file or a database, lazy conversion can prevent Kettle from performing unnecessary work on those fields, such as converting them into objects like strings, dates, or numbers.
Drag the CSV file input step onto the canvas.
Open the CSV file input properties dialog box.
Ensure the following details are configured, as outlined below:



Table output
The Table Output step allows you to load data into a database table. Table Output is equivalent to the DML operator INSERT. This step provides configuration options for the target table, along with a number of housekeeping and performance-related options such as Commit size and Use batch update for inserts.
Drag the Table Output step onto the canvas.
Open the Table Output properties dialog box. Ensure the following details are configured, as outlined below:

Click on the Database fields tab.
Click on the Get Fields button.

Click on the SQL button.

Click Execute.
Click OK to Close all windows.
RUN
At some stage you will need a database management tool to check the results. Tables are usually created by executing external SQL scripts which, if required, can be kept within the scope of the project.
Click the Run button in the Canvas Toolbar.
Check that the table has been created in the sampledata database.
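As before, a quick check from a SQL client confirms the result (illustrative; the row count depends on your orders file):

```sql
-- Confirm STG_ORDERS_MERGED was created and populated
SELECT COUNT(*) AS ROW_COUNT
FROM STG_ORDERS_MERGED;
```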
