MinIO

Hands-on workshops using MinIO as an S3-compatible object store.


Workshop series: PDI + MinIO (S3)

Workshop                   Key Skills
Sales Dashboard            joins, lookups, aggregations
Inventory Reconciliation   XML parsing, outer joins, variance
Customer 360               multi-source, JSONL, calculations
Clickstream Funnel         sessionization, pivoting
Log Parsing                regex, time-series analysis
Data Lake Ingestion        schema normalization, validation

  1. Verify that MinIO is running and populated.

  2. Start Pentaho Data Integration.


Start Pentaho Data Integration (Spoon).
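
For step 1, a quick liveness probe can confirm that MinIO is reachable before you open Spoon. The sketch below is a minimal example that assumes MinIO listens on http://localhost:9000 (your endpoint may differ) and uses MinIO's unauthenticated /minio/health/live endpoint. It only checks that the server responds; confirming that the buckets are populated still requires the MinIO console or an S3 client.

```python
import urllib.request


def health_url(endpoint: str) -> str:
    """Build the liveness URL for a MinIO endpoint such as http://localhost:9000."""
    return endpoint.rstrip("/") + "/minio/health/live"


def minio_is_live(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if the liveness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


# Example call (the endpoint is an assumption, adjust it to your setup):
# print(minio_is_live("http://localhost:9000"))
```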



Sales Dashboard



Workshop files

These files are already in MinIO:

  • pvfs://MinIO/raw-data/csv/sales.csv

  • pvfs://MinIO/raw-data/csv/products.csv

  • pvfs://MinIO/raw-data/csv/customers.csv

Output path used later: pvfs://MinIO/staging/dashboard/
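
To make the structure of these paths explicit, here is a small sketch that splits a pvfs:// URL into its parts. It assumes the first segment is the VFS connection name, the second is the bucket, and the remainder is the object key. The helper name parse_pvfs is hypothetical and is not part of PDI.

```python
from typing import NamedTuple


class PvfsLocation(NamedTuple):
    connection: str  # VFS connection name, kept in its original case
    bucket: str      # assumed to be the first path segment
    key: str         # everything after the bucket


def parse_pvfs(url: str) -> PvfsLocation:
    """Split a pvfs:// URL into connection, bucket, and object key."""
    prefix = "pvfs://"
    if not url.startswith(prefix):
        raise ValueError(f"not a pvfs URL: {url!r}")
    parts = url[len(prefix):].split("/", 2)
    connection = parts[0]
    bucket = parts[1] if len(parts) > 1 else ""
    key = parts[2] if len(parts) > 2 else ""
    return PvfsLocation(connection, bucket, key)


sales = parse_pvfs("pvfs://MinIO/raw-data/csv/sales.csv")
print(sales.connection, sales.bucket, sales.key)
```

The URL is split by hand rather than with a general URL parser so that the connection name is never lowercased.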


Sales Dashboard

Create a new transformation.

Use any of these options:

  • Select File > New > Transformation

  • Use Ctrl+N (Windows/Linux) or Cmd+N (macOS)


Follow the steps to create the transformation:


Text File Input

The Text File Input step reads data from a variety of text-file types. The most commonly used formats are comma-separated values (CSV) files generated by spreadsheets and fixed-width flat files.

The Text File Input step lets you specify a list of files to read, or a list of directories with wildcards in the form of regular expressions. It can also accept filenames from a previous step, which makes filename handling even more generic.
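
As an illustration of wildcard selection with regular expressions, the sketch below filters a list of filenames with a pattern. It assumes the whole filename must match the expression, and notes.txt is a hypothetical extra file added to show a non-match.

```python
import re


def select_files(names, wildcard):
    """Return the names that fully match the regular expression."""
    pattern = re.compile(wildcard)
    return [name for name in names if pattern.fullmatch(name)]


names = ["sales.csv", "products.csv", "customers.csv", "notes.txt"]

# Note the escaped dot: ".*\.csv" is a regular expression, not a shell glob.
csv_files = select_files(names, r".*\.csv")
print(csv_files)
```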

Text file inputs

VFS connection names are case-sensitive. These examples assume your connection name is MinIO.

  1. Drag and drop three Text File Input steps onto the canvas.

  2. Save the transformation as sales_dashboard_etl.ktr in your workshop folder.


Sales (Order Management)

  1. Double-click the first Text File Input step and configure it with the following properties:

Setting              Value
Step name            Sales
Filename             pvfs://MinIO/raw-data/csv/sales.csv
Delimiter            ,
Header row present   Yes
Format               mixed

Select - sales.csv from VFS connections
  2. Click Get Fields to auto-detect columns.


Business Logic: Note that sale_amount may differ from price * quantity due to:

  • Volume discounts

  • Promotional pricing

  • Customer-specific pricing tiers

  • Currency conversion (for international sales)
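
The gap between list value and actual revenue can be sketched as simple arithmetic. The numbers below are hypothetical sample values, and discount_gap is an illustrative helper, not a PDI function.

```python
from decimal import Decimal


def discount_gap(price: Decimal, quantity: int, sale_amount: Decimal) -> Decimal:
    """List value (price * quantity) minus actual revenue; positive means a discount."""
    return price * quantity - sale_amount


# Hypothetical order: 3 units at a list price of 19.99, sold for 53.97 in total.
gap = discount_gap(Decimal("19.99"), 3, Decimal("53.97"))
print(gap)
```

Decimal is used instead of float so that currency amounts are not affected by binary rounding.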

Get Fields - Sales
  3. Preview the data.

Preview data - Sales
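
To show roughly what field auto-detection involves, here is a simplified sketch that reads a comma-delimited file with a header row and guesses a type per column. It is not how PDI implements Get Fields. The sample rows are hypothetical, and only the column names come from this workshop.

```python
import csv
import io

# Hypothetical sample in the same shape as sales.csv (header row, comma delimiter).
SAMPLE = """sale_amount,quantity,payment_method,status
53.97,3,credit_card,completed
12.50,1,paypal,cancelled
"""


def infer_type(values):
    """Guess Integer, Number, or String from a column's sample values."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "Integer"
    if all_parse(float):
        return "Number"
    return "String"


def get_fields(text, delimiter=","):
    """Return a mapping of column name to guessed type."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)
    return {name: infer_type([row[name] for row in rows]) for name in reader.fieldnames}


fields = get_fields(SAMPLE)
print(fields)
```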

Business Significance:

  • sale_amount: Actual revenue (may include discounts)

  • quantity: Volume metrics for demand planning

  • payment_method: Payment preference insights

  • status: Filter out cancelled/refunded orders
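
The status filter described above might look like the following sketch. The status spellings ("cancelled", "refunded", "completed") and the sample rows are assumptions. Check the actual values in sales.csv before relying on them.

```python
from decimal import Decimal

# Assumed spellings of the statuses to exclude.
EXCLUDED_STATUSES = {"cancelled", "refunded"}


def is_revenue_order(row):
    """Keep only orders whose status is not cancelled or refunded."""
    return row["status"].strip().lower() not in EXCLUDED_STATUSES


def total_revenue(rows):
    """Sum sale_amount over the orders that count as revenue."""
    return sum((Decimal(r["sale_amount"]) for r in rows if is_revenue_order(r)), Decimal("0"))


# Hypothetical sample rows.
orders = [
    {"sale_amount": "53.97", "status": "completed"},
    {"sale_amount": "12.50", "status": "cancelled"},
    {"sale_amount": "20.00", "status": "Refunded"},
    {"sale_amount": "7.03", "status": "completed"},
]
revenue = total_revenue(orders)
print(revenue)
```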


Products (ERP system)

  1. Double-click the second Text File Input step and configure it with the following properties:

Setting              Value
Step name            Products
Filename             pvfs://MinIO/raw-data/csv/products.csv
Delimiter            ,
Header row present   Yes
Format               mixed

Select - products.csv from VFS connections
  2. Click Get Fields to auto-detect columns.

Get Fields - Products
  3. Preview the data.

Preview data - Products

Business Significance:

  • category: Enables product performance analysis by segment

  • price: Base pricing for margin calculations

  • stock_quantity: Inventory turnover insights
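
As one example of what these fields enable, the sketch below computes stock value (price * stock_quantity) per category. The product rows are hypothetical, and only the column names come from products.csv as described above.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical product rows.
products = [
    {"category": "Electronics", "price": "199.00", "stock_quantity": "10"},
    {"category": "Electronics", "price": "49.50", "stock_quantity": "4"},
    {"category": "Home", "price": "15.25", "stock_quantity": "20"},
]


def stock_value_by_category(rows):
    """Sum price * stock_quantity for each category."""
    totals = defaultdict(Decimal)  # a missing category starts at Decimal("0")
    for row in rows:
        totals[row["category"]] += Decimal(row["price"]) * int(row["stock_quantity"])
    return dict(totals)


stock_value = stock_value_by_category(products)
print(stock_value)
```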


Customers (CRM System)

  1. Double-click the third Text File Input step and configure it with the following properties:

Setting              Value
Step name            Customers
Filename             pvfs://MinIO/raw-data/csv/customers.csv
Delimiter            ,
Header row present   Yes
Format               mixed

Select - customers.csv from VFS connections
  2. Click Get Fields to auto-detect columns.

Get Fields - Customers
  3. Preview the data.

Preview data - Customers

Business Significance:

  • customer_id: Primary key for joining to sales

  • country: Critical for geographic segmentation

  • status: Identifies churned vs. active customers

  • registration_date: Enables customer tenure analysis
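
The join on customer_id and the tenure calculation can be sketched outside PDI as follows. The sample rows are hypothetical. The sketch assumes that sales rows carry a customer_id column and that registration_date uses the ISO format YYYY-MM-DD.

```python
from datetime import date
from decimal import Decimal

# Hypothetical rows, using the column names described above.
customers = [
    {"customer_id": "C001", "country": "DE", "status": "active", "registration_date": "2022-01-15"},
    {"customer_id": "C002", "country": "US", "status": "churned", "registration_date": "2023-06-01"},
]
sales = [
    {"customer_id": "C001", "sale_amount": "53.97"},
    {"customer_id": "C002", "sale_amount": "20.00"},
    {"customer_id": "C001", "sale_amount": "6.03"},
    {"customer_id": "C999", "sale_amount": "5.00"},  # no matching customer
]


def revenue_by_country(sales_rows, customer_rows):
    """Look up each sale's customer by customer_id and sum revenue per country."""
    by_id = {c["customer_id"]: c for c in customer_rows}
    totals = {}
    for sale in sales_rows:
        customer = by_id.get(sale["customer_id"])
        country = customer["country"] if customer else "UNKNOWN"
        totals[country] = totals.get(country, Decimal("0")) + Decimal(sale["sale_amount"])
    return totals


def tenure_days(registration_date: str, as_of: date) -> int:
    """Days between the registration date and a reference date."""
    return (as_of - date.fromisoformat(registration_date)).days


country_revenue = revenue_by_country(sales, customers)
print(country_revenue)
print(tenure_days("2022-01-15", date(2022, 2, 15)))
```

Sales without a matching customer are kept under "UNKNOWN" rather than dropped, so totals still reconcile with the raw sales data.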
