# Ubuntu Pentaho Lab

{% hint style="info" %}

#### **Pentaho Lab**

Pentaho Data Integration is a client-based tool commonly installed and configured to run on Windows 11.

There are several licensing options, for these workshops we will be installing a Enterprise Edition. This will give you the opportunity to try out building a complete solution - automated data pipelines + analytics ..
{% endhint %}

<figure><img src="https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2FaO9M8NuR6rzHnWiAvgEi%2Fimage.png?alt=media&#x26;token=135c03ed-25dd-465c-9b25-6c2abe6ed093" alt=""><figcaption><p>Pentaho Tiers</p></figcaption></figure>

{% hint style="danger" %}
The following steps are intended for setting up a Pentaho Lab environment and need to be completed in order to complete the Workshops.

Ensure you have downloaded the Workshop--Installation:

```bash
cd
git clone https://github.com/jporeilly/Workshop--Installation
```

To install git:

```bash
sudo apt install git
```

{% endhint %}

{% hint style="info" %}
**Prerequisites**

* Ubuntu 24.04 LTS system (physical or virtual machine)
* User account with sudo privileges
* Internet connection
* Basic familiarity with Linux command line
  {% endhint %}

{% tabs %}
{% tab title="Docker" %}
{% hint style="info" %}

#### Docker

Docker is a platform that enables developers to package applications and their dependencies into lightweight, portable containers. Containers ensure that applications run consistently across different computing environments, from development laptops to production servers. This workshop will guide you through the complete process of installing Docker Engine on Ubuntu 24.04 LTS (Noble Numbat).
{% endhint %}

1. Before installing Docker, update your existing package list.

```bash
sudo apt update && sudo apt upgrade
```

2. Install packages that allow apt to use repositories over HTTPS.

```bash
sudo apt install -y ca-certificates curl gnupg lsb-release
```

3. Create a directory for keyrings and add Docker's GPG key.

```bash
sudo install -m 0755 -d /etc/apt/keyrings
```

```bash
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
```

```bash
sudo chmod a+r /etc/apt/keyrings/docker.gpg
```

4. Add the Docker repository to your apt sources.

```bash
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
```

5. Now that the Docker repository is added, update the package index.

```bash
sudo apt update && sudo apt upgrade
```

6. Install Docker Engine, containerd, and Docker Compose.

```bash
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

7. Check that Docker is installed correctly by checking the version.

```bash
docker --version
```

You should see output similar to (Nov 2025):

```
Docker version 29.0.2, build 810xxxx
```

8. Verify that Docker Engine is running.

```bash
sudo systemctl status docker
```

The service should show as "active (running)".

9. Quit.

```bash
q
```

10. Test your Docker installation by running the hello-world container.

```bash
sudo docker run hello-world
```

{% hint style="info" %}
This command downloads a test image and runs it in a container. If successful, you'll see a message confirming that Docker is working correctly.
{% endhint %}

***

{% hint style="info" %}

#### Without Sudo

By default, Docker requires sudo privileges. To run Docker commands without sudo.
{% endhint %}

1. Add your user to the docker group.

```bash
sudo usermod -aG docker $USER
```

2. Apply the new group membership (or log out and back in).

```bash
newgrp docker
```

3. Verify you can run Docker without sudo.

```bash
docker run hello-world
```

4. Ensure Docker starts automatically when the system boots.

```bash
sudo systemctl enable docker.service
sudo systemctl enable containerd.service
```

***

{% hint style="info" %}

#### Verification & Testing

To confirm everything is working properly, run the following commands:
{% endhint %}

Check Docker version:

```bash
docker version
```

View Docker system information:

```bash
docker info
```

List running containers:

```bash
docker ps
```

List all containers (including stopped ones):

```bash
docker ps -a
```

List downloaded images:

```bash
docker images
```

***

{% hint style="info" %}

#### Common Commands

Here are essential Docker commands you'll use regularly:

* `docker pull <image>` - Download an image from Docker Hub
* `docker images` - List all local images
* `docker run <image>` - Create and start a container from an image
* `docker ps` - List running containers
* `docker ps -a` - List all containers
* `docker stop <container>` - Stop a running container
* `docker rm <container>` - Remove a stopped container
* `docker rmi <image>` - Remove an image
* `docker logs <container>` - View container logs
* `docker exec -it <container> bash` - Access a running container's shell
  {% endhint %}
  {% endtab %}

{% tab title="MySQL Container" %}
{% hint style="info" %}

#### **Docker Compose - MySQL**

The pentaho\_admin user only has READ permission for the Steel Wheels - sampledata database. The administrator account has been removed.

As you'll be running through CRUID database operations we need to deploy a sampledata database - Docker container, granting all privileges to an admin user.
{% endhint %}

{% file src="<https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2F7uyO1nOpAC1L5LImVpXw%2Fcopy_mysql.sh?alt=media&token=faf508ec-d5ac-485c-87a9-b168555f7d62>" %}

{% file src="<https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2FTNHQIpfuNWOLZ6ydxOJa%2Frun_mysql_compose.sh?alt=media&token=c059e221-8972-4825-aeec-744f682cfd14>" %}

{% file src="<https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2Fs3JdzrMjXvhWtw16Jx8v%2Fdocker-compose.yml?alt=media&token=8294fd3e-cb13-404a-b5b9-eb8fac9feea4>" %}

1. Run the following script to create a MySQL folder and copy the required files.

```bash
cd
cd ~/Workshop--Installation/MySQL
# Make it executable
chmod +x copy_mysql.sh
./copy_mysql.sh
```

2. Check the Directory has been created and the files copied over.
3. Execute the docker-compose script to create the container.

```bash
cd
cd ~/MySQL
# Make it executable
chmod +x run_mysql_compose.sh
./run_mysql_compose.sh
```

<figure><img src="https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2FpHkcDLcEdqMddzXATvqx%2Fimage.png?alt=media&#x26;token=e9c7e569-63fc-4e1c-9176-99e225a79f87" alt="" width="500"><figcaption><p>Deploy MySQL Docker container</p></figcaption></figure>

4. Check the container is up and running in Docker.

```
docker ps
```

<figure><img src="https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2FgKgy69kHeeNPNKDO1rxv%2Fimage.png?alt=media&#x26;token=4facea24-c8b8-4ca1-bfbd-d315a64a473b" alt=""><figcaption><p>Docker MySQL containers</p></figcaption></figure>
{% endtab %}

{% tab title="sampledata" %}
{% hint style="info" %}

#### **Sampledata**

Next on the list is to create the sampledata database.
{% endhint %}

<figure><img src="https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2FfHrm1HbmPxxV8bPSDUK9%2Fimage%20(62).png?alt=media&#x26;token=4d4eb534-9ee4-49eb-80d1-17c363ada1f1" alt=""><figcaption><p>Relationship Diagram</p></figcaption></figure>

{% tabs %}
{% tab title="1. sampledata\_schema.sql" %}
{% hint style="info" %}

#### **sampledata\_schema.sql**

This script creates a comprehensive relational database structure for a sample business application. It's designed to model a sales and order management system for a company that sells various products.
{% endhint %}

{% hint style="info" %}

#### **Database Setup**

* Creates a database named

  ```
  sampledata
  ```

  with UTF-8 character set
* Sets up users with appropriate permissions
* Configures SQL mode for better data integrity
  {% endhint %}

{% hint style="info" %}

#### **Tables**

**OFFICES**: Stores company office locations with address details

**EMPLOYEES**: Contains employee information with relationships to offices and reporting structure

**CUSTOMERS**: Stores customer information including contact details and credit limits

**PRODUCTS**: Contains product catalog with inventory and pricing information

**ORDERS**: Tracks customer orders with status and dates

**ORDERDETAILS**: Contains line items for each order with quantity and price

**PAYMENTS**: Records customer payments with amounts and dates

**ORDERFACT**: A fact table for order analytics

**CUSTOMER\_W\_TER**: Extended customer information with territory

**DIM\_TIME**: Time dimension table for reporting

**DEPARTMENT\_MANAGERS**: Stores department manager information

**QUADRANT\_ACTUALS**: Contains budget vs. actual financial data with a generated VARIANCE column

**TRIAL\_BALANCE**: Financial accounting data
{% endhint %}

{% hint style="info" %}

#### **Views**

**customer\_order\_summary**: Summarizes orders and spending by customer

**product\_performance**: Analyzes product sales metrics including revenue and profit

**employee\_sales\_performance**: Tracks sales performance by employee

**monthly\_sales\_trend**: Shows sales trends over time by month

**product\_inventory\_status**: Categorizes products by inventory levels

**customer\_payment\_history**: Summarizes customer payment activity and balances
{% endhint %}

{% hint style="info" %}

#### **Stored Procedures**

**GetCustomerOrders**: Retrieves orders for a specific customer

**UpdateProductStock**: Updates product inventory levels

**GetProductSalesByQuarter**: Analyzes quarterly product sales

**GetTopCustomersByRegion**: Identifies top customers by region

**GetInventoryValueByProductLine**: Calculates inventory metrics by product line

**Triggers**

**before\_order\_insert**: Validates date constraints on orders

**before\_payment\_insert**: Ensures payment amounts are positive
{% endhint %}

{% hint style="info" %}

#### **Events**

* **daily\_maintenance**: Scheduled task for database maintenance
  {% endhint %}

1. Execute the following command to create the schema.

```bash
cd 
cd ~/MySQL
cat sampledata_schema.sql | docker exec -i mysql-database-1 mysql -u root -p"password" sampledata
```

{% hint style="info" %}
This command is importing SQL schema data into a MySQL database running in a Docker container. Here's a breakdown:

This command reads the SQL file:

```powershell
cat sampledata_schema.sql
```

Pipes (forwards) the file contents to the next command:

```
|
```

This executes a command in a running Docker container:

```
docker exec -i mysql-database-1 mysql -u root -p"password" sampledata
```

{% endhint %}

2. You can check the sampledata database & tables with the following commands.

Show databases:

```powershell
docker exec -i mysql-database-1 mysql -u root -p"password" -e "SHOW DATABASES;"
```

Show tables:

```powershell
docker exec -i mysql-database-1 mysql -u root -p"password" sampledata -e "SHOW TABLES;"
```

Show table columns:

```powershell
docker exec -i mysql-database-1 mysql -u root -p"password" sampledata -e "DESCRIBE table_name;"
```

{% endtab %}

{% tab title="2. sampledata\_data.sql" %}
{% hint style="info" %}

#### **sampledata\_data.sql**

This script populates the database with sample data to demonstrate the functionality of the schema.
{% endhint %}

{% hint style="info" %}

#### **Reference Data**

* Office locations across different regions
* Employee hierarchy with job titles
* Product catalog organized by product lines
  {% endhint %}

{% hint style="info" %}

#### **Transactional Data**

* Customer records with contact information
* Order history with dates and status
* Order details with quantities and prices
* Payment records
  {% endhint %}

{% hint style="info" %}

#### **Financial Data**

* Budget vs. actual figures in QUADRANT\_ACTUALS
* Trial balance accounting data
  {% endhint %}

{% hint style="info" %}

#### **Data Characteristics**

* Realistic business scenarios with varied order statuses
* Comprehensive product catalog with descriptions and pricing
* Hierarchical employee structure with reporting relationships
* Time-based data spanning multiple years for trend analysis
* Financial data suitable for budgeting and variance analysis
  {% endhint %}

{% hint style="info" %}

#### **Notable Features**

* Data follows referential integrity constraints
* Proper handling of NULL values where appropriate
* Realistic pricing and quantity values
* Generated columns (like VARIANCE) are excluded from direct inserts
* Orders are sequenced to satisfy foreign key constraints
  {% endhint %}

1. Execute the following command to load the data into the sampledata tables.

```bash
cd 
cd ~/MySQL
cat sampledata_data.sql | docker exec -i mysql-database-1 mysql -u root -p"password" sampledata
```

2. You can use the following commands to check that the data has loaded.

To count the number of rows in a specific table:

```docker
docker exec -i mysql-database-1 mysql -u root -p"password" sampledata -e "SELECT COUNT(*) FROM table_name;"
```

To view the first few rows from a table:

```docker
docker exec -i mysql-database-1 mysql -u root -p"password" sampledata -e "SELECT * FROM table_name LIMIT 10;"
```

To check counts for all tables:

```docker
docker exec -i mysql-database-1 mysql -u root -p"password" sampledata -e "SELECT TABLE_NAME, TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'sampledata';"
```

To get a summary of tables and their statuses:

```docker
docker exec -i mysql-database-1 mysql -u root -p"password" sampledata -e "SHOW TABLE STATUS;"
```

<figure><img src="https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2FC2JqcCFFJZKg4RchJKQv%2Fimage.png?alt=media&#x26;token=52e97f01-a8a4-4971-a4b0-c02f08dcf832" alt=""><figcaption><p>Check Tables</p></figcaption></figure>
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="DBeaver" %}
{% hint style="info" %}

#### DBeaver

Your going to need a database management tool. DBeaver Community is a free, open-source database management tool for personal projects.
{% endhint %}

1. Simpliest option is to download & install from Snapstore.

```bash
cd
sudo apt update && sudo apt upgrade
sudo apt install snapd
sudo snap install dbeaver-ce
```

Or

Go to the official [DBeaver download page](https://dbeaver.io/download/)

{% embed url="<https://dbeaver.io/files/dbeaver-ce_latest_amd64.deb>" %}

Or

To install that DEB file.

```bash
wget https://dbeaver.io/debs/dbeaver.gpg.key -O /tmp/dbeaver.gpg.key
sudo gpg --dearmor -o /usr/share/keyrings/dbeaver.gpg /tmp/dbeaver.gpg.key
echo "deb [signed-by=/usr/share/keyrings/dbeaver.gpg] https://dbeaver.io/debs/dbeaver-ce /" | sudo tee /etc/apt/sources.list.d/dbeaver.list
sudo apt update
sudo apt install dbeaver-ce
```

4. Pin DBeaver to Dash - bottom toolbar.

***

{% hint style="info" %}

#### **MySQL Database**

If you have completed the previous 3 requirements, then you should have a MySQL Docker container, exposed on port:3306 with sampledata databse.
{% endhint %}

1. Launch DBeaver and Select: MySQL.

<figure><img src="https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2FdO4Wo4hKJzPuJW6OCI7C%2Fimage.png?alt=media&#x26;token=72d92d39-22dd-4603-a480-94495aa6a808" alt=""><figcaption><p>MySQL</p></figcaption></figure>

2. Configure the connection with the following properties:

Username: root or pentaho\_user

Password: password

<figure><img src="https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2Fx4hj27wSVxKqareREN8p%2Fimage.png?alt=media&#x26;token=1268171f-8274-4cdb-87af-f0f44c3a5392" alt=""><figcaption><p>Configure &#x26; Test MySQL connection - sampledata</p></figcaption></figure>

{% hint style="warning" %}
You may need to download the supported version of the database driver.

Also enable: allowPublicKeyRetrieval
{% endhint %}

<figure><img src="https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2FMcdTFaqU0LsctCgZfC7S%2Fimage%20(56).png?alt=media&#x26;token=2a981c76-44d2-416d-a6f1-f08152c14740" alt=""><figcaption><p>Enable: allowPublicKeyRetrieval</p></figcaption></figure>

3. Test the connection.

<figure><img src="https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2FMMdsCwDdInxb5tkHhfjS%2Fimage.png?alt=media&#x26;token=420b31e7-86a7-46f6-807f-255c607e3526" alt=""><figcaption><p>Test connection</p></figcaption></figure>

4. Expand: databases > sampledata > Tables

<figure><img src="https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2FE7sRaOXCFgJLUlfRmbyv%2Fimage.png?alt=media&#x26;token=ecf46ab4-f4f5-4f8c-bc48-15b85389da2a" alt=""><figcaption><p>Customer Data</p></figcaption></figure>

5. Open a SQL window and run a test query.

```sql
select * from CUSTOMERS
where COUNTRY = 'USA' and CITY = 'NYC';
```

<figure><img src="https://4179059538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5WjeDOi3MsT9tY3oU6mQ%2Fuploads%2FEiNfAWHg3ZxaJX6INeIC%2Fimage.png?alt=media&#x26;token=bec6eaec-6e08-488e-ab04-85001312c550" alt=""><figcaption><p>SQL query - NYC Customers</p></figcaption></figure>
{% endtab %}
{% endtabs %}

<details>

<summary>General Troubleshooting (click to expand)</summary>

**Issue: "permission denied" errors**

* Solution: Ensure your user is in the docker group and you've logged out/in or run `newgrp docker`

**Issue: Docker service won't start**

* Solution: Check logs with `sudo journalctl -u docker.service`

**Issue: Cannot connect to Docker daemon**

* Solution: Ensure Docker service is running with `sudo systemctl start docker`

</details>
