
MinIO

Hands-on workshops using MinIO as an S3-compatible object store.

Workshop series: PDI + MinIO (S3)

| Workshop | Key Skills |
| --- | --- |
| Sales Dashboard | joins, lookups, aggregations |
| Inventory Reconciliation | XML parsing, outer joins, variance |
| Customer 360 | multi-source, JSONL, calculations |
| Clickstream Funnel | sessionization, pivoting |
| Log Parsing | regex, time-series analysis |
| Data Lake Ingestion | schema normalization, validation |

  1. Verify that MinIO is running and populated.

  2. Start Pentaho Data Integration (Spoon).


Sales Dashboard


Workshop files

These files are already in MinIO:

  • pvfs://MinIO/raw-data/csv/sales.csv

  • pvfs://MinIO/raw-data/csv/products.csv

  • pvfs://MinIO/raw-data/csv/customers.csv

Output path used later: pvfs://MinIO/staging/dashboard/


Create a new transformation.

Use any of these options:

  • Select File > New > Transformation

  • Use Ctrl+N (Windows/Linux) or Cmd+N (macOS)


Follow the steps to create the transformation:

Text File Input

The Text File Input step reads data from a variety of text-file types. The most commonly used formats are comma-separated values (CSV) files generated by spreadsheets and fixed-width flat files.

The Text File Input step lets you specify a list of files to read, or a list of directories with wildcards in the form of regular expressions. It can also accept filenames from a previous step, which makes filename handling even more generic.
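
Because the wildcard is a regular expression rather than a shell glob, a pattern such as `*.csv` does not mean what it would on a command line. The sketch below illustrates the difference with Python's `re` module. PDI evaluates Java regular expressions, so treat this only as an approximation for simple patterns; the object names and the assumption that the whole name must match are illustrative.

```python
import re

# Hypothetical object names, loosely modeled on the workshop's CSV files.
names = ["sales.csv", "products.csv", "customers.csv", "sales.csv.bak", "notes.txt"]

# A regular expression, not a shell glob: ".*" means "any characters",
# and the dot before "csv" is escaped so it matches a literal ".".
pattern = re.compile(r".*\.csv")

# fullmatch requires the whole name to match, so "sales.csv.bak" is rejected.
matched = [name for name in names if pattern.fullmatch(name)]
print(matched)
```

A narrower pattern such as `sales.*\.csv` would select only names that start with `sales`.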

Text file inputs

VFS connection names are case-sensitive. These examples assume your connection name is MinIO.
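
To see how a `pvfs://` path breaks down, the following sketch splits one into its parts. It assumes the first path segment after the connection name is the bucket and the remainder is the object key, which may not hold if your connection defines a root path. The helper name `split_pvfs_url` is hypothetical and is not part of PDI.

```python
from urllib.parse import urlsplit

def split_pvfs_url(url):
    """Split a pvfs:// URL into (connection, bucket, key)."""
    parts = urlsplit(url)
    if parts.scheme != "pvfs":
        raise ValueError(f"not a pvfs URL: {url}")
    # Use netloc rather than hostname: hostname is lowercased,
    # which would hide a connection-name case mismatch.
    connection = parts.netloc
    # Assumption: first path segment is the bucket, the rest is the key.
    bucket, _, key = parts.path.lstrip("/").partition("/")
    return connection, bucket, key

connection, bucket, key = split_pvfs_url("pvfs://MinIO/raw-data/csv/sales.csv")
print(connection, bucket, key)
```

Comparing `connection` against the name shown in your VFS connection list is a quick way to catch a `minio` versus `MinIO` mismatch.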

  1. Drag and drop three Text File Input steps onto the canvas.

  2. Save the transformation as sales_dashboard_etl.ktr in your workshop folder.


Sales (Order Management)

  1. Double-click the first Text File Input (TFI) step, and configure it with the following properties:

| Setting | Value |
| --- | --- |
| Step name | Sales |
| Filename | pvfs://MinIO/raw-data/csv/sales.csv |
| Delimiter | , |
| Header row present | |
| Format | mixed |

Select - sales.csv from VFS connections
  2. Click Get Fields to auto-detect columns.

Business Logic: Note that sale_amount may differ from price * quantity due to:

  • Volume discounts

  • Promotional pricing

  • Customer-specific pricing tiers

  • Currency conversion (for international sales)
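
The gap between list-price total and actual revenue can be checked with a short calculation. This is a minimal sketch: the rows are invented for illustration and are not taken from the workshop files, and `sale_id` is a hypothetical key used only to label the results.

```python
from decimal import Decimal

# Illustrative rows only; these are not taken from sales.csv or products.csv.
rows = [
    {"sale_id": "S1", "price": Decimal("10.00"), "quantity": 3, "sale_amount": Decimal("30.00")},
    {"sale_id": "S2", "price": Decimal("10.00"), "quantity": 5, "sale_amount": Decimal("45.00")},
]

def discount_gap(row):
    """Return list-price total minus actual revenue (positive means discounted)."""
    return row["price"] * row["quantity"] - row["sale_amount"]

gaps = {row["sale_id"]: discount_gap(row) for row in rows}
for sale_id, gap in gaps.items():
    print(sale_id, gap)
```

Decimal is used here instead of float so that currency amounts compare exactly.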

Get Fields - Sales
  3. Preview the data.

Preview data - Sales
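
For intuition about what a comma delimiter, a header row, and field detection imply, here is a rough stand-alone analog in Python. It is not how Get Fields works internally. The sample text and its column values are made up, and the type names Integer, Number, and String are borrowed from PDI only as labels.

```python
import csv
import io

# Made-up sample standing in for a comma-delimited file with a header row.
SAMPLE = (
    "sale_id,quantity,sale_amount,status\n"
    "S1,3,30.00,completed\n"
    "S2,5,45.50,cancelled\n"
)

def guess_type(values):
    """Pick the narrowest of Integer, Number, String that fits every value."""
    for name, cast in (("Integer", int), ("Number", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "String"

# DictReader takes the first row as the header, like "Header row present".
reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=",")
rows = list(reader)
field_types = {f: guess_type([r[f] for r in rows]) for f in reader.fieldnames}
print(field_types)
```

As with Get Fields, the guess depends on the rows sampled, so it is worth reviewing the detected types before moving on.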

Business Significance:

  • sale_amount: Actual revenue (may include discounts)

  • quantity: Volume metrics for demand planning

  • payment_method: Payment preference insights

  • status: Filter out cancelled/refunded orders
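
The status filter described above can be sketched as follows. The exact status strings are an assumption (check the values in your own preview), and the rows are invented for illustration.

```python
from decimal import Decimal

# Assumed status values; confirm the real ones in the sales.csv preview.
EXCLUDED_STATUSES = {"cancelled", "refunded"}

# Illustrative rows only.
sales = [
    {"sale_amount": Decimal("30.00"), "status": "completed"},
    {"sale_amount": Decimal("45.00"), "status": "Cancelled"},
    {"sale_amount": Decimal("12.50"), "status": "refunded"},
    {"sale_amount": Decimal("20.00"), "status": "completed"},
]

def is_reportable(row):
    """Keep a row unless its status (case-insensitive) is excluded."""
    return row["status"].strip().lower() not in EXCLUDED_STATUSES

reportable = [row for row in sales if is_reportable(row)]
net_revenue = sum((row["sale_amount"] for row in reportable), Decimal("0"))
print(len(reportable), net_revenue)
```

Normalizing case and whitespace before comparing guards against inconsistent status values in the source data.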


Products (ERP system)

  1. Double-click the second TFI step, and configure it with the following properties:

| Setting | Value |
| --- | --- |
| Step name | Products |
| Filename | pvfs://MinIO/raw-data/csv/products.csv |
| Delimiter | , |
| Header row present | |
| Format | mixed |

Select - products.csv from VFS connections
  2. Click Get Fields to auto-detect columns.

Get Fields - Products
  3. Preview the data.

Preview data - Products

Business Significance:

  • category: Enables product performance analysis by segment

  • price: Base pricing for margin calculations

  • stock_quantity: Inventory turnover insights
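
As an illustration of why category matters, the sketch below looks up each sale's product and totals revenue by category. It assumes both files share a `product_id` key, which is not shown on this page, and all rows are invented.

```python
from collections import defaultdict

# Illustrative lookup table keyed by an assumed product_id column.
products = {
    "P1": {"category": "Electronics", "price": 100},
    "P2": {"category": "Books", "price": 20},
}

# Illustrative sales rows; P9 has no matching product on purpose.
sales = [
    {"product_id": "P1", "sale_amount": 200},
    {"product_id": "P2", "sale_amount": 40},
    {"product_id": "P1", "sale_amount": 100},
    {"product_id": "P9", "sale_amount": 15},
]

revenue_by_category = defaultdict(int)
for sale in sales:
    product = products.get(sale["product_id"])
    # Keep unmatched sales visible instead of silently dropping them.
    category = product["category"] if product else "UNKNOWN"
    revenue_by_category[category] += sale["sale_amount"]

print(dict(revenue_by_category))
```

Routing unmatched rows to an explicit bucket makes missing product records easy to spot in the totals.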


Customers (CRM System)

  1. Double-click the third TFI step, and configure it with the following properties:

| Setting | Value |
| --- | --- |
| Step name | Customers |
| Filename | pvfs://MinIO/raw-data/csv/customers.csv |
| Delimiter | , |
| Header row present | |
| Format | mixed |

Select - customers.csv from VFS connections
  2. Click Get Fields to auto-detect columns.

Get Fields - Customers
  3. Preview the data.

Preview data - Customers

Business Significance:

  • customer_id: Primary key for joining to sales

  • country: Critical for geographic segmentation

  • status: Identifies churned vs. active customers

  • registration_date: Enables customer tenure analysis
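
The following sketch shows how these fields might combine: sales rows are joined to customers on customer_id, and tenure is derived from registration_date. The rows, the reference date, and the date format are all assumptions made for illustration.

```python
from datetime import date

# Illustrative customers keyed by customer_id.
customers = {
    "C1": {"country": "DE", "status": "active", "registration_date": date(2022, 1, 15)},
    "C2": {"country": "US", "status": "churned", "registration_date": date(2023, 6, 1)},
}

# Illustrative sales rows; C9 has no matching customer on purpose.
sales = [
    {"customer_id": "C1", "sale_amount": 30},
    {"customer_id": "C2", "sale_amount": 45},
    {"customer_id": "C9", "sale_amount": 10},
]

def tenure_days(customer, as_of):
    """Days between registration and the reference date."""
    return (as_of - customer["registration_date"]).days

AS_OF = date(2024, 1, 15)  # assumed reporting date

enriched = []
for sale in sales:
    customer = customers.get(sale["customer_id"])
    # Left-join behavior: keep the sale even when no customer matches.
    enriched.append({
        **sale,
        "country": customer["country"] if customer else None,
        "tenure_days": tenure_days(customer, AS_OF) if customer else None,
    })

print(enriched)
```

Keeping unmatched sales with empty customer fields mirrors a left outer join, so no revenue is lost from the result.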
