Workshop - Create DB Table
Database table creation is a fundamental operation in data integration workflows—from staging areas and temporary working tables to persistent data marts and operational data stores. Organizations need efficient methods to create database tables dynamically as data structures evolve and new data sources are onboarded. Understanding how to programmatically create and populate database tables through PDI ensures your data pipelines can adapt to changing requirements without manual database administration intervention.
In this hands-on workshop, you'll learn to use PDI's "Table Output" step to both create database tables and insert data in a single transformation. Steel Wheels needs to establish new database tables for their reporting systems, and you'll configure transformations that automatically generate the necessary SQL CREATE TABLE statements based on your incoming data stream structure. You'll leverage PDI's automatic field mapping, execute SQL directly from the transformation designer, and understand performance optimization options like commit size and batch updates for efficient data loading.
What You'll Accomplish:
Configure the Table Output step to create new database tables
Use the "Get Fields" feature to automatically map stream fields to table columns
Generate SQL CREATE TABLE statements from your data stream structure
Execute SQL statements directly from the Table Output step dialog
Verify field mappings between incoming stream fields and target table columns
Understand DML INSERT operations and how Table Output implements them
Configure performance options including commit size and batch update settings
Create and populate tables in a single transformation execution
Validate table creation and data insertion using database management tools
By the end of this workshop, you'll have practical experience creating database tables programmatically through your PDI transformations rather than requiring separate DDL scripts or manual database administration. You'll understand when to use automatic table creation versus working with pre-existing table structures, and how to optimize data loading performance through batch operations. This approach enables self-service data integration where transformations can create their own working tables as needed, reducing dependencies on database administrators and accelerating development cycles.
Prerequisites: Understanding of basic transformation concepts and database connection configuration, and familiarity with database tables and SQL; Pentaho Data Integration installed and configured, with the appropriate database connections established
Estimated Time: 30 minutes
Create a new Transformation
Either of these actions opens a new Transformation tab for you to begin designing your transformation:
By clicking File > New > Transformation
By using the CTRL-N hot key

Load Sales Data
Load sales data from a CSV file into the STG_SALES_DATA table.
Modify the VARCHAR lengths where necessary to prevent values from being truncated.

Follow the steps outlined below:
CSV File input
This step provides the ability to read data from a delimited file. The CSV label for this step is a misnomer because you can define whatever separator you want to use, such as pipes, tabs, and semicolons; you are not constrained to using commas. Internal processing allows this step to process data quickly. Options for this step are a subset of the Text File Input step.
This step has fewer overall options than the general Text File Input step, but it has a few advantages over it:
NIO
Native system calls for reading the file mean faster performance, but the step is currently limited to local files only. There is no VFS support.
Parallel running
If you configure this step to run in multiple copies or in clustered mode, and you enable parallel running, each copy will read a separate block of a single file allowing you to distribute the file reading to several threads or even several slave nodes in a clustered transformation.
Lazy conversion
If you will be reading many fields from the file and many of those fields will not be manipulated, but merely passed through the transformation to land in some other text file or a database, lazy conversion can prevent Kettle from performing unnecessary work on those fields, such as converting them into objects like strings, dates, or numbers.
Start Pentaho Data Integration.
Windows - PowerShell
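A minimal way to launch the graphical designer (Spoon) from PowerShell. The installation directory below is an assumption; adjust it to wherever PDI is installed on your machine.

```powershell
# Illustrative only: C:\Pentaho\data-integration is a hypothetical install path
cd 'C:\Pentaho\data-integration'
.\Spoon.bat
```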
Linux
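The equivalent on Linux, again assuming a hypothetical install location of ~/pentaho/data-integration:

```bash
# Illustrative only: adjust the path to your PDI installation
cd ~/pentaho/data-integration
./spoon.sh
```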
Drag the CSV file input step onto the canvas.
Open the CSV file input properties dialog box.
Stepname: csvi-sales_data
Filename: ${Internal.Entry.Current.Directory}\sales_data.csv
Delimiter: , (comma)
Lazy Conversion: unchecked
Header row present: checked
The CSV File input step parses a sample number of records to set the metadata properties. Because these fields will be mapped to VARCHAR columns in a database table, check the detected field lengths carefully to avoid truncation.
Ensure the following details are configured, as outlined below:

Click on the Get Fields button.
Click OK.
Table output
The Table Output step allows you to load data into a database table. Table Output is equivalent to the DML operator INSERT. This step provides configuration options for the target table, along with a number of housekeeping and performance-related options such as Commit size and Use batch update for inserts.
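Conceptually, each row arriving at the step becomes an INSERT against the target table, broadly along the lines of the sketch below. The column names and values shown are illustrative assumptions, not the step's literal output.

```sql
-- Illustrative only: Table Output builds one INSERT per incoming row,
-- batching and committing them according to the step's Commit size
-- and batch update settings.
INSERT INTO STG_SALES_DATA (ORDERNUMBER, STATUS, SALES)
VALUES (10107, 'Shipped', 2871.00);
```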
Drag the Table Output step onto the canvas.
Open the Table Output properties dialog box. Ensure the following details are configured, as outlined below:

Click on the Database fields tab.
Click on the Get Fields button.

Ensure the mappings between Table fields and Stream fields are correct.
Click on the SQL button.
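The SQL editor opens with the DDL that Table Output generated from the incoming stream metadata. It will look broadly like the sketch below; the exact columns, types, and lengths depend on your sales_data.csv, the Get Fields scan, and the database dialect, so treat the names and sizes here as assumptions and widen any VARCHAR that looks too small.

```sql
-- Illustrative sketch only: the real statement is generated from your stream metadata
CREATE TABLE STG_SALES_DATA
(
  ORDERNUMBER INTEGER
, QUANTITYORDERED INTEGER
, PRICEEACH DOUBLE
, ORDERDATE VARCHAR(25)
, STATUS VARCHAR(15)
, CUSTOMERNAME VARCHAR(50)
-- ... remaining columns follow the same pattern
)
;
```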

Click Execute.
Click OK to Close all windows.
RUN
At some stage you will need a database management tool to check the results. Tables are usually created by executing external SQL scripts which, if required, can be kept within the scope of the project.
Click the Run button in the Canvas Toolbar.
Check that the table has been created in the sampledata database.
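For example, from any SQL client connected to the sampledata database, a quick sanity check might look like this (the row count depends on your source file):

```sql
-- Confirm the table exists and was populated by the transformation
SELECT COUNT(*) AS ROW_COUNT
FROM STG_SALES_DATA;
```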

Load Orders
Load orders data from a CSV file into the STG_ORDERS_MERGED table.
Modify the VARCHAR lengths where necessary to prevent values from being truncated.

Follow the steps outlined below:
CSV File input
This step provides the ability to read data from a delimited file. The CSV label for this step is a misnomer because you can define whatever separator you want to use, such as pipes, tabs, and semicolons; you are not constrained to using commas. Internal processing allows this step to process data quickly. Options for this step are a subset of the Text File Input step.
This step has fewer overall options than the general Text File Input step, but it has a few advantages over it:
NIO
Native system calls for reading the file mean faster performance, but the step is currently limited to local files only. There is no VFS support.
Parallel running
If you configure this step to run in multiple copies or in clustered mode, and you enable parallel running, each copy will read a separate block of a single file allowing you to distribute the file reading to several threads or even several slave nodes in a clustered transformation.
Lazy conversion
If you will be reading many fields from the file and many of those fields will not be manipulated, but merely passed through the transformation to land in some other text file or a database, lazy conversion can prevent Kettle from performing unnecessary work on those fields, such as converting them into objects like strings, dates, or numbers.
Drag the CSV file input step onto the canvas.
Open the CSV file input properties dialog box.
Ensure the following details are configured, as outlined below:



Table output
The Table Output step allows you to load data into a database table. Table Output is equivalent to the DML operator INSERT. This step provides configuration options for the target table, along with a number of housekeeping and performance-related options such as Commit size and Use batch update for inserts.
Drag the Table Output step onto the canvas.
Open the Table Output properties dialog box. Ensure the following details are configured, as outlined below:

Click on the Database fields tab.
Click on the Get Fields button.

Click on the SQL button.

Click Execute.
Click OK to Close all windows.
RUN
At some stage you will need a database management tool to check the results. Tables are usually created by executing external SQL scripts which, if required, can be kept within the scope of the project.
Click the Run button in the Canvas Toolbar.
Check that the table has been created in the sampledata database.
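As before, a quick check from a SQL client confirms the result (illustrative; the row count depends on your orders file):

```sql
-- Confirm STG_ORDERS_MERGED was created and populated
SELECT COUNT(*) AS ROW_COUNT
FROM STG_ORDERS_MERGED;
```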
