Storage
Set up Object Stores & SMB
Object Stores
Object storage systems like Amazon S3 and MinIO provide a way to store and retrieve large amounts of unstructured data such as files, images, videos, and backups through a simple web-based API. Unlike traditional file systems that organize data in hierarchical folders, object stores use a flat namespace where each piece of data (called an object) is stored in containers called buckets and accessed via unique keys or URLs.
Amazon S3 is AWS's flagship object storage service that offers virtually unlimited scalability, multiple storage classes for different use cases, and integration with other AWS services.
MinIO is an open-source alternative that provides S3-compatible APIs and can be deployed on-premises or in private clouds, making it popular for organizations that want object storage capabilities without vendor lock-in.
Both systems are designed for high durability and availability, can handle massive scale, and provide simple REST API access so applications can store and retrieve data programmatically.
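To make the flat-namespace idea concrete, here is a toy in-memory model in Python. It is not a real S3 or MinIO client; the class name and the example keys are hypothetical. It shows that a "folder" in an object store is only a key prefix.

```python
# Toy, in-memory model of a flat object namespace (NOT a real S3/MinIO client).
# A key such as "csv/customers.csv" is one opaque string inside a bucket;
# the "/" only looks like a folder separator.

from collections import defaultdict
from typing import Dict, List


class ToyObjectStore:
    def __init__(self) -> None:
        # bucket name -> {object key -> object bytes}
        self.buckets: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self.buckets[bucket][key] = data

    def get(self, bucket: str, key: str) -> bytes:
        return self.buckets[bucket][key]

    def list_keys(self, bucket: str, prefix: str = "") -> List[str]:
        # Listing "a folder" is really a prefix match over every key in the bucket.
        return sorted(k for k in self.buckets[bucket] if k.startswith(prefix))


store = ToyObjectStore()
store.put("raw-data", "csv/customers.csv", b"id,name\n1,Alice\n")
store.put("raw-data", "json/config.json", b"{}")
print(store.list_keys("raw-data", prefix="csv/"))  # ['csv/customers.csv']
```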

The following steps set up a Pentaho Lab environment and must be completed before you work through the Workshops.
Ensure you have downloaded the Workshop--Installation:
To install git:
Prerequisites
Ubuntu 24.04 LTS system (physical or virtual machine)
User account with sudo privileges
Internet connection
Basic familiarity with Linux command line
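If you want to confirm the first prerequisite from a script, one minimal approach is to parse /etc/os-release, the standard file Ubuntu uses to describe its release. The function names are illustrative.

```python
# Sketch: confirm the host is Ubuntu 24.04 by parsing /etc/os-release.
# Assumes the usual os-release format: KEY=value lines, values optionally quoted.

from pathlib import Path
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def is_ubuntu_2404(text: str) -> bool:
    info = parse_os_release(text)
    return info.get("ID") == "ubuntu" and info.get("VERSION_ID") == "24.04"


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        print("Ubuntu 24.04:", is_ubuntu_2404(path.read_text()))
    else:
        print("/etc/os-release not found (not a Linux host?)")
```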
MinIO
Follow the instructions below to set up a MinIO Docker container.
Select your OS, add the sample data, and finally configure a VFS connection in Data Integration:
Installs and configures MinIO on Ubuntu 24.04 running in Docker.
Create a MinIO folder and copy the required files.
Create directory & copy

Ensure all the files have successfully been copied over.
Execute the docker-compose script to create the container.
MinIO Container

Following industry best practice, MinIO is installed as root in the /opt/minio directory. If you want the pentaho user to manage the service as well, you may need to add that user to the docker group.
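On Ubuntu the usual way to grant that access is sudo usermod -aG docker pentaho, followed by logging out and back in. The sketch below, which assumes a Linux host and uses only Python's grp and pwd modules, checks whether a user is already in the group; user_in_group is a hypothetical helper name.

```python
# Sketch: is a user (e.g. "pentaho") a member of the "docker" group?
# Linux-only. Group changes only take effect in new login sessions.

import grp
import pwd


def user_in_group(user: str, group: str = "docker") -> bool:
    try:
        g = grp.getgrnam(group)
    except KeyError:
        return False  # the group does not exist on this host
    if user in g.gr_mem:
        return True  # supplementary (secondary) membership
    try:
        # Primary-group membership lives in /etc/passwd, not in gr_mem.
        return pwd.getpwnam(user).pw_gid == g.gr_gid
    except KeyError:
        return False  # the user does not exist


print("pentaho in docker group:", user_in_group("pentaho"))
```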
Check that the container is up and running in Docker.

Installs and configures MinIO on Windows 11 running in Docker Desktop.
Create a MinIO folder and copy the required files.
Create directory & copy
Check that the directory has been created and the files have been copied over.
Execute the docker-compose script to create the container.
MinIO Container

Check that the container is up and running in Docker Desktop.
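On either OS, the container check can also be scripted. This sketch parses docker ps output formatted with a Go template. It assumes the Docker CLI is on PATH and that the container name contains "minio"; the real name depends on the docker-compose file, which is not reproduced here.

```python
# Sketch: confirm a MinIO container is running by parsing `docker ps` output.
# Each output line is expected as "<name><TAB><status>", e.g. "minio\tUp 3 minutes".

import shutil
import subprocess
from typing import Dict


def running_containers(ps_output: str) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for line in ps_output.splitlines():
        if "\t" in line:
            name, status = line.split("\t", 1)
            result[name] = status
    return result


def minio_is_up(ps_output: str, name_fragment: str = "minio") -> bool:
    # docker reports running containers with a status starting "Up".
    return any(
        name_fragment in name and status.startswith("Up")
        for name, status in running_containers(ps_output).items()
    )


if shutil.which("docker"):
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=False,
    ).stdout
    print("MinIO container up:", minio_is_up(out))
```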

Access the MinIO UI:
Username: minioadmin
Password: minioadmin

The MinIO port has been changed to prevent conflicts.
The population scripts create realistic datasets in multiple formats commonly used in data integration workflows:
Data Sources Created
CSV Files - Structured tabular data
customers.csv - Customer records (12 entries)
products.csv - Product catalog (12 entries)
sales.csv - Sales transactions (15 entries)
JSON Files - API responses and configuration
api_response.json - Nested API response with orders
user_events.json - Event stream (JSONL format)
config.json - Application configuration
XML Files - Legacy system exports
inventory.xml - Warehouse inventory data
employees.xml - HR employee records
Log Files - Application and system logs
application.log - Structured application logs
access.log - Web server access logs
error.log - Error and warning logs
Parquet Files (Optional - requires Python)
transactions.parquet - Big data format for analytics
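Unlike api_response.json, user_events.json is in JSON Lines (JSONL) format: one complete JSON object per line, so it has to be parsed line by line rather than with a single json.load call. A minimal Python reader, with hypothetical field names in the sample:

```python
# Sketch: read JSON Lines data (one JSON object per line), as in user_events.json.
# The field names "user_id" and "event" are hypothetical.

import io
import json
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc


sample = io.StringIO(
    '{"user_id": 1, "event": "login"}\n'
    '{"user_id": 2, "event": "purchase"}\n'
)
events = list(read_jsonl(sample))
print(len(events), events[1]["event"])  # 2 purchase
```

With a real file, pass the open file object instead of the StringIO, since iterating a file yields its lines.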
Buckets Created
raw-data - Landing zone for raw source files
staging - Intermediate processing area
curated - Clean, processed data ready for consumption
logs - Application and process logs
archive - Historical data archives
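All five bucket names follow S3-style naming rules, which MinIO also applies. The sketch below checks a subset of those rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; a letter or digit at both ends; no adjacent dots; not an IPv4 address). It is not an exhaustive validator.

```python
# Sketch: validate bucket names against a SUBSET of the S3 naming rules.
# Not exhaustive: reserved prefixes/suffixes, for example, are not checked.

import re

_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    if not _BUCKET_RE.fullmatch(name):
        return False  # length 3-63, allowed characters, alphanumeric ends
    if ".." in name:
        return False  # no adjacent periods
    if _IPV4_RE.fullmatch(name):
        return False  # must not be formatted like an IP address
    return True


for bucket in ["raw-data", "staging", "curated", "logs", "archive"]:
    assert is_valid_bucket_name(bucket), bucket
```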
Ensure the MinIO API is accessible.
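One way to confirm this from Python is to call MinIO's liveness endpoint, /minio/health/live, which returns HTTP 200 when the server is up. The sketch assumes the API is published on localhost:9000, the endpoint used in the VFS connection later; since the MinIO port has been changed, adjust base_url if your docker-compose mapping differs.

```python
# Sketch: probe MinIO's liveness endpoint. Assumes the API is on localhost:9000.

import urllib.error
import urllib.request


def minio_is_live(base_url: str = "http://localhost:9000", timeout: float = 3.0) -> bool:
    url = base_url.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("MinIO API live:", minio_is_live())
```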

Install the prerequisite packages, then run the script that generates and uploads the sample data. The script:
Checks Dependencies - Verifies MinIO Client (mc) is installed
Tests Connectivity - Ensures MinIO is running and accessible
Configures Client - Sets up MinIO Client alias
Creates Buckets - Creates organizational buckets
Generates CSV Files - Creates customer, product, and sales data
Generates JSON Files - Creates API responses and configuration
Generates XML Files - Creates inventory and employee data
Generates Log Files - Creates realistic application logs
Uploads to MinIO - Copies all files to appropriate buckets
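The generate-then-upload pattern can be sketched with the standard library alone. To keep it runnable without a server, this version writes into one local directory per bucket instead of uploading. The column names, row values, and the choice of raw-data for customers.csv are hypothetical; the real script's layout is not shown here. Files laid out like this could then be uploaded with the MinIO Client (for example mc cp --recursive into each bucket).

```python
# Sketch of "create buckets, generate files, upload" using local directories
# as stand-ins for buckets. Columns, values, and file placement are hypothetical.

import csv
import tempfile
from pathlib import Path
from typing import List

BUCKETS = ["raw-data", "staging", "curated", "logs", "archive"]


def generate_customers(path: Path, count: int = 12) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_id", "name", "country"])  # hypothetical columns
        for i in range(1, count + 1):
            writer.writerow([i, f"Customer {i}", "US"])


def build_local_layout(root: Path) -> List[Path]:
    # One sub-directory per bucket, mirroring the buckets the script creates.
    for bucket in BUCKETS:
        (root / bucket).mkdir(parents=True, exist_ok=True)
    generate_customers(root / "raw-data" / "customers.csv")
    return sorted(p for p in root.rglob("*") if p.is_file())


demo_root = Path(tempfile.mkdtemp())
files = build_local_layout(demo_root)
print([p.relative_to(demo_root).as_posix() for p in files])  # ['raw-data/customers.csv']
```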
Check that python3 is installed.
Install the python3-venv package.
Install the required packages.

Install the MinIO Client (mc).
Verify that it is installed.
Execute the install script.
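The script's "Checks Dependencies" step can be approximated in Python. One caution: on some Linux systems the mc command is Midnight Commander rather than the MinIO Client, so this sketch also checks that the --version banner mentions MinIO (an assumption about the banner text).

```python
# Sketch: locate the MinIO Client and make sure "mc" is not Midnight Commander.

import shutil
import subprocess
from typing import Optional


def find_minio_client(name: str = "mc") -> Optional[str]:
    path = shutil.which(name)
    if path is None:
        return None  # not on PATH
    try:
        out = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    banner = (out.stdout + out.stderr).lower()
    # Assumption: the MinIO Client's version banner contains the word "MinIO".
    return path if "minio" in banner else None


print("python3:", shutil.which("python3"))
print("MinIO Client:", find_minio_client())
```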

Virtual File Systems
PDI allows you to establish connections to most Virtual File Systems (VFS) through VFS connections. These connections store the necessary properties to access specific file systems, eliminating the need to repeatedly enter configuration details.
Once you've added a VFS connection in PDI, you can reference it whenever you need to work with files or folders on that Virtual File System. This streamlines your workflow by allowing you to reuse connection information across multiple steps.
For instance, if you're working with Hitachi Content Platform (HCP), you can create a single VFS connection and then use it throughout all HCP transformation steps. This approach saves time and ensures consistency by removing the need to re-enter credentials or access information for each data operation.
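As an illustration, assume PDI addresses files reached through a named VFS connection with URLs of the form pvfs://<connection name>/<path>. With the connection name MinIO configured below and a Root Folder Path of /, the bucket would be the first path segment. The helper is hypothetical, not a PDI API.

```python
# Sketch: build a pvfs:// URL for a file behind a named VFS connection.
# Assumes the pvfs://<connection>/<bucket>/<key> form with Root Folder Path "/".


def pvfs_url(connection: str, bucket: str, key: str) -> str:
    # Trim stray slashes so the parts join with exactly one "/".
    return "pvfs://" + "/".join([connection.strip("/"), bucket.strip("/"), key.lstrip("/")])


print(pvfs_url("MinIO", "raw-data", "customers.csv"))
# pvfs://MinIO/raw-data/customers.csv
```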
Start Pentaho Data Integration.
Windows - PowerShell
Linux
Create a VFS connection to the MinIO buckets
Click the 'View' tab.
Right-click VFS Connections > New.

Enter the following details:

Connection Name: MinIO
Connection Type: Minio/HCP
Description: Connection to sales-data bucket
S3 Connection Type: Minio/HCP
Access Key: minioadmin
Secret Key: minioadmin
Endpoint: http://localhost:9000 (MinIO API endpoint)
Signature Version: AWSS3V4SignerType
PathStyle Access: enable
Root Folder Path: /
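PathStyle Access matters because of how S3 clients build object URLs. With path-style addressing the bucket is the first path segment; with virtual-hosted style it becomes part of the host name, which MinIO only supports when a domain (MINIO_DOMAIN) is configured, which a local lab setup typically lacks. The function below only formats both styles for comparison; it does not sign or send requests.

```python
# Sketch: compare path-style and virtual-hosted-style object URLs.

from urllib.parse import urlsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool = True) -> str:
    parts = urlsplit(endpoint)
    if path_style:
        # Bucket is the first path segment: works against a bare host:port.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Bucket becomes a sub-domain: needs DNS / MINIO_DOMAIN to resolve.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


print(object_url("http://localhost:9000", "raw-data", "customers.csv"))
# http://localhost:9000/raw-data/customers.csv
print(object_url("http://localhost:9000", "raw-data", "customers.csv", path_style=False))
# http://raw-data.localhost:9000/customers.csv
```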
Test the connection.
In this hands-on workshop, you'll learn to deploy and configure an SMB (Server Message Block) server with Docker Desktop and on Linux.

Follow the instructions below to deploy an SMB Docker container. We're going to deploy a very simple SMB server with 2 users:
Alice -
Bob -
Windows already runs a default SMB server on port 445, so the container's SMB port is changed to 1445.
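Before publishing a port from a container, you can check whether something is already listening on it. This sketch uses a plain TCP connect with Python's socket module; on Windows, a result of "in use" for 445 would be the built-in SMB server.

```python
# Sketch: is anything already listening on a local TCP port?

import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection.
        return s.connect_ex((host, port)) == 0


for port in (445, 1445):
    print(port, "in use" if port_in_use(port) else "nothing listening")
```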
Pentaho Data Integration
(Optional) Download the latest jcifs driver.
(Optional) Copy the JCIFS JAR file into the Pentaho Data Integration "lib" folder.
Download CIFS driver
Pentaho Data Integration ships with jcifs-1.3.3.jar.
If you want to replace the current driver, rename it: jcifs-1.3.3.jar -> jcifs-1.3.3.jar.bak
The jcifs-2.1.40.jar driver has been downloaded to the Workshop--Data-Integration/Drivers folder.
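The backup-and-replace step can also be scripted. In this sketch both paths are hypothetical stand-ins for your PDI lib folder and the Workshop--Data-Integration/Drivers folder, and the demo runs against a temporary directory.

```python
# Sketch: back up the bundled JCIFS driver as *.bak and copy in a newer JAR.

import shutil
import tempfile
from pathlib import Path


def swap_jcifs(lib_dir: Path, new_jar: Path, old_name: str = "jcifs-1.3.3.jar") -> Path:
    old_jar = lib_dir / old_name
    if old_jar.exists():
        # jcifs-1.3.3.jar -> jcifs-1.3.3.jar.bak
        old_jar.rename(old_jar.with_name(old_jar.name + ".bak"))
    target = lib_dir / new_jar.name
    shutil.copy2(new_jar, target)
    return target


# Demo against a throwaway directory instead of a real PDI install.
demo = Path(tempfile.mkdtemp())
(demo / "lib").mkdir()
(demo / "lib" / "jcifs-1.3.3.jar").write_bytes(b"old driver")
(demo / "jcifs-2.1.40.jar").write_bytes(b"new driver")
print(swap_jcifs(demo / "lib", demo / "jcifs-2.1.40.jar").name)  # jcifs-2.1.40.jar
print(sorted(p.name for p in (demo / "lib").iterdir()))
# ['jcifs-1.3.3.jar.bak', 'jcifs-2.1.40.jar']
```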
Create SMB Share Directories
Create an SMB folder and copy the required files. This also adds some sample data.
Create directory & copy - PowerShell
Check that the directory has been created and the files have been copied over.
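If you prefer to create the folders from a script, this sketch builds the three share folders referred to in this workshop (Bob, Alice and Shared) under a base path. The readme.txt sample file and its content are hypothetical, and the demo uses a temporary directory rather than C:\SMB.

```python
# Sketch: create one folder per SMB share, each with a hypothetical sample file.

import tempfile
from pathlib import Path
from typing import Iterable, List


def create_share_dirs(base: Path, shares: Iterable[str] = ("Bob", "Alice", "Shared")) -> List[Path]:
    created = []
    for share in shares:
        folder = base / share
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "readme.txt").write_text(f"Sample file for the {share} share\n")
        created.append(folder)
    return created


# On the lab machine this would be: create_share_dirs(Path(r"C:\SMB"))
demo_base = Path(tempfile.mkdtemp())
print([p.name for p in create_share_dirs(demo_base)])  # ['Bob', 'Alice', 'Shared']
```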

Creating Local Windows Users
Let's add our SMB users to the system:
Bob -
Alice -
Right-click on the Start button.
Select Computer Management from the context menu.
Alternatively, press Win + X and select Computer Management.
In Computer Management, expand System Tools.
Click Local Users and Groups.
Select Users folder.
Right-click in the Users pane (right side).
Select New User...
Fill in the following details:
Username: bob
Full Name: Bob Smith
Password: password

Click Create
Click Close
Repeat the workflow to create: alice

Configure Folder Permissions
Open File Explorer and navigate to C:\SMB\Bob.
Right-click on the Bob folder.
Select Properties.
Click the Security tab.
Click Edit...
Click Add...
Type bob in the text box and click Check Names.
Click OK.
Select: bob in the permissions list.

Check the following permissions:
Click OK twice

Repeat the workflow for alice's folder.

SMB Share
Right-click on the C:\SMB\Bob folder.
Select Properties.
Click the Sharing tab.
Click Advanced Sharing...

☑ Check "Share this folder"
Share name: Bob (default is fine).

Click Permissions.
Remove "Everyone" if present (select and click Remove).
Click Add...

Add the following user:
Type bob, click Check Names, then click OK.

Set permissions for Bob Smith:
Select Bob Smith: ☑ Full Control

Click OK three times.
In the Sharing tab, note the Network Path: \\[Computer Name]\Bob
From another computer: \\[your-computer-ip]\Bob
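UNC paths follow the pattern \\host\share\path. A small helper that assembles them, where the host values are placeholders for your computer name or IP address:

```python
# Sketch: assemble a UNC path such as \\MYPC\Bob or \\192.168.1.20\Bob\data.


def unc_path(host: str, share: str, *parts: str) -> str:
    # Two leading backslashes, then host, share, and optional sub-path parts.
    return "\\\\" + "\\".join([host, share, *parts])


print(unc_path("MYPC", "Bob"))  # \\MYPC\Bob
```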

If you want to experiment with the SMB shares for Alice and 'Shared', repeat the workflow for each of them.
Test
Time to test.
Let's see if we can access C:\SMB\Bob from across the Network.
In File Explorer or the Run dialog (Win + R), enter the following UNC path to access C:\SMB\Bob:
You should see the following popup message displayed:

Follow the instructions outlined below to deploy an SMB server on Ubuntu 24.04.
Ensure all installed packages are up to date.
Install Samba server.
Make a copy of the existing configuration file and create a new /etc/samba/smb.conf configuration file.
Any user on the Samba user list must also exist in the /etc/passwd file.
Add the home directory share.
Copy and paste the following to the bottom of the file (private home and public access).
Save.
Create the directory that backs the public share and change its access permissions.
Restart the Samba server.
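The exact lines to paste are not reproduced here, so the following is only a sketch of what a private [homes] share plus an anonymous public share might look like, generated with Python's configparser. The share name public, the options chosen, and the /var/samba path (taken from the next section) are assumptions; check the result with testparm before restarting Samba.

```python
# Sketch: generate an smb.conf fragment with a private [homes] share and an
# anonymous [public] share. Share names and options are assumptions.

import configparser
import io


def smb_conf_fragment(public_path: str = "/var/samba") -> str:
    # interpolation=None so Samba macros such as %S are written literally.
    conf = configparser.ConfigParser(interpolation=None)
    conf["homes"] = {
        "comment": "Home Directories",
        "browseable": "no",
        "read only": "no",
        "valid users": "%S",  # only the owner may open their home share
    }
    conf["public"] = {
        "path": public_path,
        "browseable": "yes",
        "guest ok": "yes",  # anonymous access
        "read only": "no",
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


print(smb_conf_fragment())
```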
SMB Server
We're going to set up the Samba server with a shareable public directory, /var/samba/, that can be accessed anonymously.
Next, we give the 'pentaho' user access to the /pentaho/home directory. You'll need to be a registered user with a password to access this directory.
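Because every Samba user must also exist in /etc/passwd, it can help to check for the Unix account before adding the Samba password (for example with sudo smbpasswd -a pentaho). A Linux-only sketch using Python's pwd module; unix_user_exists is a hypothetical helper name:

```python
# Sketch: does a Unix account exist in the passwd database? Linux-only.

import pwd


def unix_user_exists(name: str) -> bool:
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False  # no such account


print("pentaho exists:", unix_user_exists("pentaho"))
```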
Let’s create some test files.
Public
A 'public' directory that can be accessed from any machine.
In File Explorer, select: + Other Locations.
Enter the following connection details:

Connect as: Anonymous.

You should see the public-share file.
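In the Ubuntu Files app, Other Locations accepts smb:// addresses of the form smb://host/share. The helper below formats one; the host is a placeholder and the share name public is an assumption, since the connection details are not reproduced here.

```python
# Sketch: format an smb:// address for "Other Locations" (optionally with a user).

from typing import Optional


def smb_uri(host: str, share: str, user: Optional[str] = None) -> str:
    prefix = f"{user}@" if user else ""
    return f"smb://{prefix}{host}/{share}"


print(smb_uri("192.168.1.50", "public"))  # smb://192.168.1.50/public
```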

Registered
Only registered users can access the /pentaho/home directory.
In File Explorer, select: + Other Locations.
Enter the following connection details:

Connect as: Registered User.
Username: pentaho
Domain: WORKGROUP
Password: password

You should see the public-share file somewhere in the /home directory.
Follow the instructions below to deploy:
