Data Model
Creating a model from various Data Sources ..
Workshop - Data Model
Data modeling is the foundation of effective business intelligence and analytics. In this comprehensive workshop, you'll master the Data Source Wizard, learning how to transform raw data from CSV files and relational databases into well-structured, analysis-ready data models that power Pentaho's reporting and analysis tools. Whether you're working with simple spreadsheet data or complex star schema databases, you'll gain the skills to create data sources that enable meaningful business insights through Analyzer, Interactive Reports, and Dashboard reports.
In this hands-on workshop, you'll experience the complete data modeling lifecycle, starting with simple CSV file imports and progressing to sophisticated multidimensional models built on relational database tables. You'll learn how to configure data source properties, define staging settings, and design effective data models for both reporting and analysis purposes. As you work through the exercises, you'll master critical concepts including star schema design, dimension hierarchies, measure aggregations, and time dimension properties. You'll also develop the expertise to optimize your data models for performance and usability, ensuring that business users can easily explore and analyze their data with confidence.
What You'll Accomplish:
Launch and navigate the Data Source Wizard interface in Pentaho User Console
Create data sources from CSV files with proper encoding and delimiter configuration
Import and configure database tables with appropriate joins and relationships
Define fact tables and dimension tables for multidimensional analysis
Configure measures with appropriate aggregations and formatting
Build dimension hierarchies with geographic and temporal properties
Create drill-down paths for Territory, Customer, Product, and Time dimensions
Apply member properties to enrich dimension levels with descriptive attributes
Configure time dimension properties with proper format patterns for filtering
Customize data models for both Interactive Reporting and Analyzer tools
Export and enhance models for advanced business analytics requirements
By the end of this workshop, you'll have created both a simple CSV-based reporting model and a sophisticated star schema analytical model that demonstrates enterprise-grade data modeling best practices. You'll understand how to structure data for optimal query performance, provide intuitive navigation through dimension hierarchies, and deliver the rich analytical capabilities that business users expect from modern business intelligence tools.
Prerequisites: Pentaho Business Analytics Server with sample database connection (hsqldb_sampledata) configured Estimated Time: 25 minutes

Launch the Data Source Wizard
Log into the Pentaho User Console:
User: Admin
Password: password
Click Create New > Data Source or access from Manage Data Sources.

Select the model Data Source:
Download the Checkbook.csv file:
Enter Data Source Name: Checkbook
Select CSV File as Source Type.
Click Import and browse to select your CSV file.

Configure the following CSV properties:
Data Source Name
Checkbook
Source Type
CSV File
File
/Downloads/Checkbook_data.csv
Encoding
ISO-8859-1
Delimiter
Comma
Enclosure
Double Quote

Preview and Verify.
Click Next to Continue.

Edit the following properties:
Date
STRING
15
0
Description
STRING
50
0
Category Desc
STRING
50
0
Category
STRING
5
0
Amount
NUMERIC
#,##0.00
10
2
Click: Show File Contents.

Click Finish.
Choose Model Option: Customize model now

Select a reporting option:
Interactive Reporting
Notice that the Wizard puts all the columns in one category. You can use the toolbar buttons to easily create additional categories and move the columns up and down to organize the columns into categories.
On the Reporting tab, click the Check book category.
In the Properties pane, click in the Category Name field.
Rename: Checkbook Category.

Format the Amount: #.#

Click OK.
From the User Console Home Perspective, click Create New > Interactive Report.
In the Select Data Source window, click Checkbook, and then click OK.

On the Data tab, notice the Checkbook Category name.

Add Category Desc, on the Data tab, double-click Category Desc.
Add Date, on the Data tab, double-click Date.
Add Amount, on the Data tab, double-click Amount.
Edit the Checkbook model in the Data Source Wizard - From the Cog wheel drop-down menu option, select: Edit.

Expand the Measures Dimensions.

Delete the Amount Dimension - its a metric..! Click on the X in the top toolbar.

Drag &drop the Category Desc Level to below the Category Level.

Delete the redundant Category Desc Dimension.

Flip the Measures to below the Dimensions.

Click: OK
From the User Console Home Perspective, click Create New > Analyzer.
In the Select Data Source window, click Checkbook, and then click OK.

Add the following Levels and Measure to define the Analyzer report.

Database Table
When you create a Database Table data source, you are presented with two options: Reporting Only or Reporting and Analysis. The choice you make depends on the structure of the database tables you are accessing and the User Console tools being used.
Before creating the data source, let's understand the database structure. This is a classic star schema with one fact table and three dimension tables.
FACT
Stores transactional sales data with numeric measures
Prerequisites:
Database connection named 'hsqldb_sampledata' is configured and working
You have SELECT permissions on all four tables
Foreign key relationships exist in the database (or you'll create joins manually)
Tables contain sample data for testing
Complete the following steps to define your Sales_Analysis model and Data Source:
Define the following Joins:
ORDERFACT
TIME_ID
DIM_TIME
TIME_ID
ORDERFACT
CUSTOMERNUMBER
CUSTOMER_W_TER
CUSTOMERNUMBER
ORDERFACT
ORDERNUMBER
ORDERS
ORDERNUMBER
ORDERFACT
PRODUCTCODE
PRODUCT
PRODUCTCODE

Click: Finish.
Finally .. you'll need to qualify each measure and dimension ..
Measures
Measures are the numeric values you want to analyze. They come from the Fact table and can be aggregated (summed, averaged, counted).
Expand the Measures category.
Delete the following dimensions:
CUSTOMERNUMBER
MONTH ID
ORDERLINENUMBER
ORDERNUMBER
QTR ID
YEAR ID

Format the remaining Measures:
PRICEEACH
Unit Price
Average
0.00
QUANTITYORDERED
Quantity Ordered
DISTINCT_COUNT
#
TOTALPRICE
Total Sales
SUM
0.00
Only Measures in the Fact table can be defined.
Follow the steps outlined below to configure the Dimensions in your Model:
Highlight & expand the Geography Dimension.
Rename the Dimension: MARKETS.
Add a Hierarchy: Markets.


Expand the Markets Hierarchy and follow the steps below to configure the Levels - delete POSTALCODE:
Highlight the TERRITORY Level & configure with the following Properties.

Highlight the COUNTRY Level & configure with the following Properties.

Highlight the STATE Level & configure with the following Properties.

Highlight the CITY Level & configure with the following Properties.

Highlight & expand the CUSTOMER W TER Dimension.

It will be easier to start with a 'clean' dimension, delete everything..!
Rename the CUSTOMER W TER Dimension: CUSTOMER.

Add a Hierarchy & enter: Customer.

Highlight Customer Hierarchy & add Level: Customer Name.

Click: OK.
Configure the Customer Name properties as illustrated below:


Follow the steps below to add the Customer Name members:
Highlight the Customer Name Level > Add Member Property: Customer Number.

Click: OK.
Configure the Customer Number properties as illustrated below:

Highlight the Customer Name Level > Add Member Property: Contact First Name.

Click: OK.
Configure the Contact First Name properties as illustrated below:

Highlight the Customer Name Level > Add Member Property: Contact Last Name.

Click: OK.
Configure the Contact Last Name properties as illustrated below:

Highlight the Customer Name Level > Add Member Property: Phone Number.

Click: OK.
Configure the Phone Number properties as illustrated below:

Highlight & expand the DIM TIME Dimension.
Again It will be easier to start with a 'clean' dimension, delete everything..!
Add a Hierarchy & enter: Products

Follow the steps below to configure the Products Levels:
Under the Products Hierarchy > Add a Level: Product Line

Click: OK.
Configure the Product Line properties as illustrated below:

Under the Products Hierarchy > Add a Level: Product Name

Click: OK.
Configure the Product Name properties as illustrated below:

TIME
Typically, you might sort your data by year or month. However, by designating a time dimension in the Data Source Model Editor, you can filter on a variety of date ranges and relative date options, such as previous periods, before [a user-defined period], after [a user-defined period], current period, or next period.
Analyzer supports many types of relative date filters. In order to apply them to a level of a time hierarchy, you need to define the time-specific properties for that level. This is because each data warehouse implementation may have a different date format and set of time hierarchy levels.
Highlight & expand the DIM TIME Dimension.
It will be easier to start with a 'clean' dimension, delete everything..!
Add a Hierarchy & enter: Time

Follow the steps below to configure the Time Levels:
Under the Time Hierarchy > Add a Level: Years

Click: OK.
Configure the Year properties as illustrated below - refer to Time Dimension Properties:

Under the Time Hierarchy > Add a Level: Quarters
Configure the Quarters properties as illustrated below - refer to Time Dimension Properties:

Under the Time Hierarchy > Add a Level: Months
Configure the Months properties as illustrated below - refer to Time Dimension Properties:

Time Dimension Properties
The Data Source Model Editor has two key settings for configuring time dimensions:
Time Level Type
This setting specifies what type of date increment a level represents (Years, Quarters, Months, Days, etc.). Analyzer uses this information to enable time-period functions like "previous quarter" or "year to date."
Source Column Format
This describes how your data is actually formatted in the database. It tells Analyzer how to read and interpret the time values. Important: This setting does NOT change how data displays—it only describes the existing format so Analyzer can filter correctly.
These settings enable Analyzer to properly filter and analyze time-based data.
Common Format Patterns
Years
yy
yyyy
25
2025
Quarters
Q
QQQ
1
01
Q1
Months
M
MM
MMM
3
03
Mar
Weeks
w
ww
W
5
05
Days
d
dd
7
07
Hours
H
HH
k
kk
Minutes
m
mm
5
05
Seconds
s
ss
8
08
1, 2, 3, 4
Q
Simple numeric quarter
Q1, Q2, Q3, Q4
'Q'Q
'Q' is literal text, Q is the number
2001-Q1, 2001-Q2
yyyy-'Q'Q
Year, - literal Q, quarter number
Last updated
Was this helpful?



