# Docker

{% hint style="info" %}

#### Docker Container&#x20;

Docker container deployment enables you to package and run Pentaho products within portable, production-ready containers. Containerization ensures consistent behavior across development, testing, and production environments while simplifying deployment and scaling operations.

You can create Docker containers for the Pentaho Server, which includes the complete Business Analytics and Data Integration platform with the Pentaho User Console, scheduling services, and repository management. The server supports enterprise database backends including PostgreSQL, MySQL, Oracle, and SQL Server.

For distributed ETL processing, you can deploy Carte server containers that execute transformations and jobs remotely. The Kitchen and Pan command-line tools are also available as containers, enabling integration with CI/CD pipelines and automated batch workflows.

Container deployments are particularly effective for cloud environments where you can quickly scale resources to match data processing demands. By running Pentaho workloads in containers, organizations can optimize infrastructure costs while maintaining the flexibility to move between on-premises and cloud platforms.
{% endhint %}

<figure><img src="/files/Bhejyo4bfBXKZLnXw0M7" alt=""><figcaption><p><em>Docker Container Architecture showing Pentaho Server and PostgreSQL containers</em></p></figcaption></figure>

{% tabs %}
{% tab title="Pentaho Server Container" %}
{% hint style="info" %}
This container runs the complete Pentaho Business Analytics and Data Integration platform on Apache Tomcat 10. It includes the Pentaho User Console (PUC), scheduling services, and all analytics capabilities.&#x20;
{% endhint %}
{% endtab %}

{% tab title="PostgreSQL Container" %}
{% hint style="info" %}
This container provides the relational database backend using PostgreSQL 17. It hosts three critical databases required by Pentaho: Jackrabbit (content repository), Quartz (scheduler), and Hibernate (security and audit). Data is persisted through a Docker volume to survive container restarts.
{% endhint %}

<figure><img src="/files/64dertqWlxEri5oIxnDf" alt=""><figcaption><p><em>PostgreSQL Database Architecture</em></p></figcaption></figure>

Pentaho Server requires three separate databases, each serving a distinct purpose:

<table><thead><tr><th width="128" valign="top">Database</th><th width="133" valign="top">Owner</th><th valign="top">Purpose &#x26; Contents</th></tr></thead><tbody><tr><td valign="top">jackrabbit</td><td valign="top">jcr_user</td><td valign="top">Java Content Repository (JCR) - Stores all Pentaho content including reports, dashboards, data sources, analysis schemas, and user files. This is the primary content storage for the Pentaho repository.</td></tr><tr><td valign="top">quartz</td><td valign="top">pentaho_user</td><td valign="top">Quartz Scheduler - Manages all scheduled jobs, triggers, and calendars. Contains tables for job definitions (QRTZ6_JOB_DETAILS), triggers (QRTZ6_TRIGGERS), execution history, and cluster coordination locks.</td></tr><tr><td valign="top">hibernate</td><td valign="top">hibuser</td><td valign="top">Hibernate Repository - Hosts security configuration, audit logging, user session data, and contains two additional schemas: pentaho_dilogs (ETL execution logging) and pentaho_operations_mart (analytics data mart).</td></tr></tbody></table>

{% hint style="info" %}
The hibernate database contains specialized schemas for operational monitoring:

**pentaho\_dilogs**: Captures detailed ETL execution information including job logs, transformation logs, step performance metrics, and error records. Essential for debugging data integration workflows and monitoring pipeline health.

**pentaho\_operations\_mart**: A dimensional data mart for analytics on Pentaho usage. Contains dimension tables (DIM\_DATE, DIM\_TIME, DIM\_EXECUTOR) and fact tables (FACT\_EXECUTION, FACT\_STEP\_EXECUTION) for analyzing platform utilization, performance trends, and user activity.

For production deployments, implement regular backups of the repository-data Docker volume. The jackrabbit database is the most critical as it contains all user content. Consider using pg\_dump for logical backups or volume snapshots for full recovery options.
{% endhint %}
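The backup advice above can be sketched as a small script. The service name `repository` matches the JDBC targets shown in this guide, and the database/owner pairs come from the table above; adjust both to your compose file. As written it is a dry run that only prints the commands for review:

```shell
#!/usr/bin/env bash
# Sketch: logical backups of the three Pentaho databases with pg_dump.
# Assumption: the PostgreSQL service is named "repository" in docker-compose.
# Dry run -- prints the commands instead of executing them.
set -euo pipefail

STAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR=./backups

backup_cmd() {
  local db=$1 user=$2
  # pg_dump runs inside the container; gzip compresses on the host side
  printf 'docker compose exec -T repository pg_dump -U %s %s | gzip > %s/%s-%s.sql.gz\n' \
    "$user" "$db" "$BACKUP_DIR" "$db" "$STAMP"
}

backup_cmd jackrabbit jcr_user
backup_cmd quartz     pentaho_user
backup_cmd hibernate  hibuser
```

Pipe the output through `sh` only after checking the generated commands against your actual service and user names.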
{% endtab %}

{% tab title="Network" %}
{% hint style="info" %}
Key points:

* The Pentaho container connects to PostgreSQL using the service name 'repository' as the hostname
* PostgreSQL listens on port 5432 internally (not exposed to host by default)
* Pentaho Server exposes port 8080, mapped to the host system
* All inter-container traffic remains within the Docker network for security&#x20;
  {% endhint %}

<figure><img src="/files/O24xb2E97EdrynJByGhj" alt=""><figcaption><p><em>Data flow showing HTTP requests and JDBC connections</em></p></figcaption></figure>

Tomcat manages connection pools defined in context.xml. Each pool serves a specific purpose:

<table><thead><tr><th valign="top">Pool Name</th><th valign="top">Connection Target</th><th valign="top">Used For</th></tr></thead><tbody><tr><td valign="top">jdbc/Hibernate</td><td valign="top">repository:5432/hibernate</td><td valign="top">Security, Users, Roles</td></tr><tr><td valign="top">jdbc/Quartz</td><td valign="top">repository:5432/quartz</td><td valign="top">Job Scheduling</td></tr><tr><td valign="top">jdbc/jackrabbit</td><td valign="top">repository:5432/jackrabbit</td><td valign="top">Content Repository</td></tr><tr><td valign="top">jdbc/Audit</td><td valign="top">repository:5432/hibernate</td><td valign="top">Audit Logging</td></tr><tr><td valign="top">jdbc/live_logging_info</td><td valign="top">repository:5432/hibernate</td><td valign="top">ETL Runtime Logs</td></tr><tr><td valign="top">jdbc/PDI_Operations_Mart</td><td valign="top">repository:5432/hibernate</td><td valign="top">Operations Analytics</td></tr></tbody></table>

{% hint style="info" %}
When a user accesses Pentaho Server:

1. The user's browser sends an HTTP request to localhost:8080
2. Docker forwards the request to the Pentaho container's port 8080
3. Tomcat receives the request and routes it to the pentaho.war web application
4. The application retrieves/stores data via the JDBC connection pools
5. JDBC connections route to `repository:5432` (the PostgreSQL container)
6. The response flows back through the same path to the user's browser
{% endhint %}
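The request path above can be probed end to end from the host with a one-line status check. The `/pentaho/` context path is standard; the port is an assumption taken from the flow described above, so substitute your mapped port if it differs:

```shell
# Sketch: print the HTTP status Pentaho returns, or 000 if unreachable.
http_code() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1" || true
}

echo "Pentaho returned: $(http_code http://localhost:8080/pentaho/)"
```

A `200` or a redirect (`302`) means the full chain (Docker port mapping, Tomcat, web app) is responding; `000` means the connection never reached Tomcat.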
{% endtab %}

{% tab title="Volume Mapping" %}
{% hint style="info" %}

#### Volume Mapping

The deployment uses both named Docker volumes and bind mounts for persistence and configuration:
{% endhint %}

```
Docker Volumes:
  vault_data             -> /vault/data
  pentaho_postgres_data  -> /var/lib/postgresql/data
  pentaho_solutions      -> /opt/pentaho/pentaho-server/pentaho-solutions
  pentaho_data           -> /opt/pentaho/pentaho-server/data
 
Bind Mounts:
  ./softwareOverride     -> /docker-entrypoint-init (ro)
  ./db_init_postgres     -> /docker-entrypoint-initdb.d (ro)
  ./postgres-config      -> /etc/postgresql/conf.d (ro)
  ./config/.kettle       -> /home/pentaho/.kettle
  ./config/.pentaho      -> /home/pentaho/.pentaho
  ./vault/config         -> /vault/config (ro)
  ./scripts              -> /scripts (ro)
```
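The named volumes listed above can also be backed up wholesale with a throwaway container, which complements the logical `pg_dump` backups. A dry-run sketch (the `alpine` image is an assumption; any image with `tar` works):

```shell
# Sketch: print a tar-based backup command for a named Docker volume.
# Dry run only -- review the printed command, then run it yourself.
volume_backup_cmd() {
  local vol=$1
  printf 'docker run --rm -v %s:/data -v "$PWD/backups":/backup alpine tar czf /backup/%s.tgz -C /data .\n' \
    "$vol" "$vol"
}

volume_backup_cmd pentaho_postgres_data
volume_backup_cmd pentaho_solutions
```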

{% endtab %}
{% endtabs %}

***

{% hint style="danger" %}
Before you begin the Docker deployment, ensure you have completed the Setup: [Pentaho Containers](/pentaho-11-installation-en/setup/pentaho-containers.md)
{% endhint %}

Run through the following steps to deploy Pentaho Server with a PostgreSQL 17 repository.

{% tabs %}
{% tab title="1. Prepare Environment" %}
{% hint style="info" %}

#### Prepare Environment

Before deploying, you will:

* Copy the Pentaho-Server-PostgreSQL assets
* Stage `pentaho-server-ee-11.0.0.0-237.zip`
* Verify the Docker & Docker Compose versions
* Check that the required ports are free
  {% endhint %}

{% hint style="danger" %}
Ensure you have downloaded: `pentaho-server-ee-11.0.0.0-237.zip`
{% endhint %}

1. Create project directory & copy over assets.

```bash
cd
cp -r ~/Workshop--Installation/Pentaho-Containers/On-Prem/Pentaho-Server-PostgreSQL .
```

2. Copy `pentaho-server-ee-11.0.0.0-237.zip` into the `docker/stagedArtifacts` directory.

{% hint style="info" %}
If you have already deployed an archive-based Pentaho Server, copy the package from:&#x20;

`/opt/pentaho/software/pentaho-server-ee-version`

Otherwise download package from the [Pentaho Customer Portal](https://support.pentaho.com/hc/en-us).
{% endhint %}

```bash
cd
cd ~/Pentaho-Server-PostgreSQL/docker/stagedArtifacts
cp /opt/pentaho/software/server/pentaho-server-ee-11.0.0.0-237.zip . 
```

3. Verify that the file is present.

```bash
cd
cd ~/Pentaho-Server-PostgreSQL/docker/stagedArtifacts
ls -al
```

4. Check the Docker version.

```bash
docker --version
# Expected output: Docker version 29.0.2 or higher
```

5. Check Docker Compose version.

```bash
docker compose version
# Expected output: Docker Compose version 2.40.3 or higher
```

If Docker Compose is not installed:

```bash
sudo apt install docker-compose-plugin
# installs the Docker Compose v2 plugin
```

6. Verify Docker daemon is running.

```bash
docker info
# Should display system-wide information without errors
```

7. Check that ports 8080 and 8090 are available on the host OS.

```bash
sudo lsof -i :8080 -i :8090
# No output means both ports are free
```

{% hint style="info" %}
If a port is in use by another application, you can change the `PENTAHO_HTTP_PORT` variable in the `.env` file to any available port (e.g. 8081, 9090).
{% endhint %}
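If `lsof` is not installed, bash's `/dev/tcp` pseudo-device gives a dependency-free port check; a sketch:

```shell
# Sketch: succeed if something is listening on localhost:<port>.
# bash-only -- uses the /dev/tcp pseudo-device, no lsof or netstat needed.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

for p in 8080 8090; do
  if port_in_use "$p"; then
    echo "port $p: in use"
  else
    echo "port $p: free"
  fi
done
```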

8. Pentaho Server requires a valid license. The `.env` file contains a LICENSE\_URL pointing to the Flexera license server. Ensure your license entitlements are active before deployment.

{% hint style="warning" %}
Without a valid license, Pentaho Server will start but many features will be disabled. Verify your license status before proceeding with production deployments.
{% endhint %}
{% endtab %}

{% tab title="2. Directory Layout" %}
{% hint style="info" %}

#### Directory Layout

This deployment configuration provides several important capabilities:

* Completely self-contained and portable deployment.
* Automated database initialization with SQL scripts.
* Health checks and proper startup ordering between services.
* Persistent data volumes for database and Pentaho content.
* HashiCorp Vault for secrets management with AppRole authentication.
* Read-only containers with tmpfs mounts for security.
* Resource limits (CPU/memory) for stability.
* Log rotation to prevent disk exhaustion.
* Software override system for customizing configurations without modifying core files.
* Production-ready configuration templates.
* PostgreSQL JDBC driver included
* Easy backup and restore procedures
  {% endhint %}

{% hint style="success" %}
Check out the other repository deployment options at:

```
~/Workshop--Installation/Pentaho-Containers/On-Prem/
```

{% endhint %}

***

**Root Directory Files**

```
Pentaho-Server-PostgreSQL/
├── README.md             # Main documentation file
├── ARCHITECTURE.md       # System architecture details
├── CONFIGURATION.md      # Configuration reference guide
├── TROUBLESHOOTING.md    # Problem solving guide
├── docker-compose.yml    # Docker Compose service definitions
├── Makefile              # Convenience targets (make help)
├── deploy.sh             # Automated deployment script
├── .env                  # Environment configuration (created)
├── .env.template         # Environment template with defaults
```

{% hint style="info" %}
**Documentation Files:**

**README.md** - The main entry point documentation providing project overview, quick start instructions, prerequisites, and general usage information for the workshop.

**ARCHITECTURE.md** - Detailed technical documentation covering the system architecture, component relationships, container design, networking, data flow, and architectural decisions for the Docker-based deployment.

**CONFIGURATION.md** - Comprehensive configuration reference guide detailing all available environment variables, configuration options, customization parameters, and settings for both Pentaho Server and PostgreSQL components.

**TROUBLESHOOTING.md** - Problem-solving guide with common issues, error messages, diagnostic procedures, and solutions for deployment and runtime problems you might encounter.

**Orchestration & Deployment:**

**docker-compose.yml** - The Docker Compose service definitions file that declares all containers (Pentaho Server, PostgreSQL, potentially Vault/other services), their configurations, networking, volumes, and dependencies.

**Makefile** - Contains convenience command targets for common operations like building, starting, stopping, and cleaning up the environment. Users can run `make help` to see available commands.

**deploy.sh** - Automated deployment script that likely handles the complete deployment workflow including environment validation, building images, starting services, and initial configuration.

**Environment Configuration:**

**.env** - The active environment configuration file (created from template) containing actual values for database passwords, ports, hostnames, and other environment-specific settings. This file is typically git-ignored.

**.env.template** - The template file with default values and placeholders that users copy to create their `.env` file, providing a reference for all configurable environment variables.
{% endhint %}

***

**Docker Build Context**

```
├── docker/
│   ├── Dockerfile                # Multi-stage Pentaho Server image build
│   ├── entrypoint/
│   │   └── docker-entrypoint.sh  # Container startup script
│   └── stagedArtifacts/
│       └── pentaho-server-ee-11.0.0.0-237.zip
```

{% hint style="info" %}
The **docker/** directory contains all the core components needed to build and run the Pentaho Server containerized deployment:

**Dockerfile** - This is the main build configuration using a multi-stage build approach to create the Pentaho Server container image. Multi-stage builds help optimize the final image size by separating the build environment from the runtime environment.

**entrypoint/** directory contains the **docker-entrypoint.sh** script, which is the initialization script that runs when the container starts. This typically handles tasks like environment setup, configuration management, health checks, and starting the Pentaho Server services.

**stagedArtifacts/** directory serves as the staging area for the Pentaho Server installation package. It currently contains **pentaho-server-ee-11.0.0.0-237.zip**, which is the Enterprise Edition version 11.0.0.0 build 237 that gets extracted and installed during the Docker image build process.
{% endhint %}

***

**PostgreSQL Repository Database Initialization**

```
├── db_init_postgres/
│   ├── 1_create_jcr_postgresql.sql         # Jackrabbit content repo
│   ├── 2_create_quartz_postgresql.sql      # Quartz scheduler
│   ├── 3_create_repository_postgresql.sql  # Hibernate repository
│   ├── 4_pentaho_logging_postgresql.sql    # Audit/DI logging schema
│   └── 5_pentaho_mart_postgresql.sql       # Operations mart schema
```

{% hint style="info" %}
The **db\_init\_postgres/** directory contains the PostgreSQL database initialization scripts that set up all the required schemas for Pentaho Server 11. These scripts are numbered to execute in a specific sequence:

**1\_create\_jcr\_postgresql.sql** - Creates the **Jackrabbit Content Repository (JCR)** schema, which stores the Pentaho repository content including solution files, schedules, reports, dashboards, and metadata. This is the core content management system for Pentaho.

**2\_create\_quartz\_postgresql.sql** - Sets up the **Quartz Scheduler** schema, which manages all scheduled jobs and tasks within Pentaho Server, including report generation, ETL executions, and other automated processes.

**3\_create\_repository\_postgresql.sql** - Creates the **Hibernate Repository** schema, which stores user authentication, authorization data, roles, permissions, and other security-related information managed by Pentaho's security subsystem.

**4\_pentaho\_logging\_postgresql.sql** - Establishes the **Audit and Data Integration (DI) Logging** schema for capturing execution logs, transformation/job metrics, and audit trail information from PDI processes running on the server.

**5\_pentaho\_mart\_postgresql.sql** - Creates the **Operations Mart** schema, which stores operational analytics data about Pentaho Server usage, performance metrics, and system monitoring information used by the Pentaho Operations Mart dashboard.
{% endhint %}

***

**PostgreSQL and Vault Configuration**

```
├── postgres-config/
│   ├── custom.conf             # PostgreSQL performance tuning
│   └── pg_hba.conf             # Client authentication config
├── vault/
│   ├── config/
│   │   └── vault.hcl           # Vault server configuration
│   └── policies/
│       └── pentaho-policy.hcl  # Pentaho access policy
├── secrets/
│   └── postgres_password.txt   # Docker secrets file
```

{% hint style="info" %}
**postgres-config/** - PostgreSQL Configuration

**custom.conf** - Custom PostgreSQL performance tuning parameters optimized for Pentaho Server workloads. This likely includes settings for shared buffers, work memory, connection limits, checkpoint configurations, and other performance-related parameters tailored to handle Pentaho's database requirements.

**pg\_hba.conf** - PostgreSQL Host-Based Authentication configuration file that controls client connection authentication methods, IP address access rules, and security policies for database connections from the Pentaho Server container.

***

**vault/** - HashiCorp Vault Integration

This directory supports incorporating Vault for secrets management:

**config/vault.hcl** - The HashiCorp Vault server configuration file defining storage backend, listener settings, API endpoints, seal/unseal behavior, and general Vault server operational parameters.

**policies/pentaho-policy.hcl** - Vault access control policy specifically for Pentaho Server, defining which secrets paths the Pentaho application can read, write, or manage. This enforces least-privilege access to sensitive credentials.

***

**secrets/** - Docker Secrets Management

**postgres\_password.txt** - A Docker secrets file containing the PostgreSQL password. When using Docker secrets (or Vault integration), this file provides the database credentials in a secure manner rather than passing them as plain environment variables. The file should have restricted permissions and is typically referenced by Docker Compose using the `secrets:` configuration.
{% endhint %}

The secrets managed for this deployment include:

<table><thead><tr><th valign="top">Key</th><th valign="top">Description</th></tr></thead><tbody><tr><td valign="top">postgres_password</td><td valign="top">PostgreSQL superuser password</td></tr><tr><td valign="top">pentaho_user</td><td valign="top">Pentaho database username</td></tr><tr><td valign="top">pentaho_password</td><td valign="top">Pentaho database password</td></tr><tr><td valign="top">jdbc_url</td><td valign="top">JDBC connection URL</td></tr></tbody></table>

***

**Pentaho Configuration Overrides**

```
├── softwareOverride/
│   ├── 1_drivers/                              # JDBC drivers
│   │   └── tomcat/lib/
│   ├── 2_repository/                           # Database configuration
│   │   ├── pentaho-solutions/system/
│   │   │   ├── hibernate/
│   │   │   ├── jackrabbit/
│   │   │   └── scheduler-plugin/quartz/
│   │   └── tomcat/webapps/pentaho/META-INF/
│   ├── 3_security/                             # Authentication settings
│   │   └── pentaho-solutions/system/
│   └── 4_others/                               # Tomcat and app settings
│       ├── pentaho-solutions/system/
│       └── tomcat/
```

{% hint style="info" %}
The **softwareOverride/** directory contains customized configuration files and components that override the default Pentaho Server installation. The numbered structure ensures a logical organization and potentially an ordered application during the Docker build process:

***

**1\_drivers/** - JDBC Database Drivers

**tomcat/lib/** - Contains JDBC driver JAR files (specifically the PostgreSQL JDBC driver) that get copied into Tomcat's library directory, enabling Pentaho Server to connect to PostgreSQL databases.

***

**2\_repository/** - Database Repository Configuration

This section configures Pentaho's connection to all PostgreSQL-backed repositories:

**pentaho-solutions/system/hibernate/** - Hibernate repository configuration files (repository.xml, hibernate-settings.xml) for user/role security data

**pentaho-solutions/system/jackrabbit/** - Jackrabbit JCR repository configuration (repository.xml) for content storage

**pentaho-solutions/system/scheduler-plugin/quartz/** - Quartz scheduler database configuration (quartz.properties) for job scheduling

**tomcat/webapps/pentaho/META-INF/** - Contains context.xml with JNDI datasource definitions for all Pentaho databases (Quartz, Jackrabbit, Hibernate, Audit, Operations Mart)

***

**3\_security/** - Authentication & Security Settings

**pentaho-solutions/system/** - Security configuration files including applicationContext-security.xml, security.properties, and potentially LDAP/SSO configurations for authentication and authorization.

***

**4\_others/** - Additional Tomcat & Application Settings

**pentaho-solutions/system/** - Other system-level configurations like pentaho.xml, pentaho-spring-beans.xml, log4j settings, and application behavior configurations

**tomcat/** - Tomcat server customizations including server.xml, web.xml, setenv.sh for JVM parameters, and other Tomcat-specific tuning
{% endhint %}

**Utility Scripts**

```
├── scripts/
│   ├── backup-postgres.sh      # Database backup utility
│   ├── restore-postgres.sh     # Database restore utility
│   ├── backup-vault.sh         # Vault credentials backup
│   ├── restore-vault.sh        # Vault credentials restore
│   ├── rotate-secrets.sh       # Password rotation script
│   ├── fetch-secrets.sh        # Secret retrieval helper
│   ├── vault-init.sh           # Vault initialization
│   └── validate-deployment.sh  # Deployment validation
```

{% hint style="info" %}
The **scripts/** directory contains operational and maintenance utilities for managing the Pentaho Server deployment, organized by functional area:

***

**Database Management:**

**backup-postgres.sh** - Automated PostgreSQL backup utility that creates dumps of all Pentaho databases (JCR, Quartz, Hibernate, Audit, Operations Mart). Likely includes timestamping, compression, and backup retention logic.

**restore-postgres.sh** - Database restoration utility to recover Pentaho databases from backup files, useful for disaster recovery, environment cloning, or migrating data between instances.

***

**Vault/Secrets Management:**

**backup-vault.sh** - HashiCorp Vault credentials and unseal keys backup script, ensuring recovery capability for the Vault instance containing sensitive Pentaho credentials.

**restore-vault.sh** - Vault restoration utility to recover Vault data and re-initialize the secrets management system from backup.

**rotate-secrets.sh** - Automated password rotation script that updates database passwords and other sensitive credentials in Vault, then propagates changes to Pentaho Server configuration - supporting security best practices.

**fetch-secrets.sh** - Helper utility to retrieve secrets from Vault programmatically, useful for scripts that need to access credentials without hardcoding them.

**vault-init.sh** - Initial Vault setup script that handles Vault initialization, unsealing, creating the Pentaho policy, and storing initial secrets for the deployment.

***

**Operations & Validation:**

**validate-deployment.sh** - Deployment validation script that performs health checks on all components (PostgreSQL connectivity, Pentaho Server startup, Vault accessibility, service availability), confirming the environment is properly configured and operational.
{% endhint %}

***

**User Configuration and Data Storage**

```
├── config/
│   ├── .kettle/                # PDI/Kettle configuration
│   │   └── kettle.properties
│   └── .pentaho/               # Pentaho user settings
├── backups/                    # Database backup storage
│   └── *.sql.gz                # Compressed SQL backups
└── logs/                       # Application logs (optional)
```

{% hint style="info" %}
**config/** - Application Configuration

This directory stores user-level and application-level configuration files:

**`.kettle/`** - PDI (Pentaho Data Integration) / Kettle configuration directory

* **kettle.properties** - Contains Kettle/PDI environment variables, connection parameters, system properties, and global settings used by transformation and job executions running on Pentaho Server.

**`.pentaho/`** - Pentaho user settings directory for storing user-specific preferences, cached metadata, and application state information.

***

**backups/** - Database Backup Storage

**`*.sql.gz`** - Repository for compressed PostgreSQL database backup files created by the `backup-postgres.sh` script. The gzip compression reduces storage requirements while maintaining complete database snapshots for disaster recovery, environment cloning, or rollback scenarios. Backup files are likely timestamped for version tracking.

***

**logs/** - Application Logging

Centralized logging directory for capturing runtime logs from all services. This likely includes:

* Pentaho Server application logs (catalina.out, pentaho.log)
* PostgreSQL database logs
* Vault service logs
* Docker container logs
* ETL execution logs

This supports the deployment's **log rotation configuration** and monitoring capabilities, making troubleshooting and auditing easier during workshops.
{% endhint %}

***

**Key Files**

<table><thead><tr><th valign="top">File</th><th valign="top">Purpose</th></tr></thead><tbody><tr><td valign="top">docker-compose.yml</td><td valign="top">Defines all services (pentaho-server, postgres), networks, and volumes</td></tr><tr><td valign="top">docker/Dockerfile</td><td valign="top">Multi-stage build using debian:trixie-slim with OpenJDK 21</td></tr><tr><td valign="top">docker-entrypoint.sh</td><td valign="top">Processes softwareOverride directories at container startup</td></tr><tr><td valign="top">.env</td><td valign="top">Environment-specific configuration (ports, passwords, memory)</td></tr><tr><td valign="top">deploy.sh</td><td valign="top">Automated deployment with pre-flight validation checks</td></tr><tr><td valign="top">db_init_postgres/*.sql</td><td valign="top">PostgreSQL database initialization scripts</td></tr><tr><td valign="top">vault-init.sh</td><td valign="top">Initializes Vault and stores secrets</td></tr><tr><td valign="top">rotate-secrets.sh</td><td valign="top">Rotates database passwords securely</td></tr></tbody></table>
{% endtab %}

{% tab title="3. Pre-flight Tasks" %}
{% hint style="info" %}

#### Pre-flight Tasks

The Pre-flight Tasks section outlines the essential preparation steps needed before deploying Pentaho Server 11 in Docker containers.&#x20;

First, you need to configure the environment variables by editing the `.env.template` file with your deployment-specific settings. This includes defining the Pentaho version and image details, PostgreSQL credentials and port configuration (defaulting to 5432), Pentaho HTTP and HTTPS ports (8090 and 8443), JVM memory allocation (minimum 4GB, maximum 8GB), the license server URL, and Vault port settings. Once configured, this template is saved as the active `.env` file.

PostgreSQL performance tuning is handled through the `postgres-config/custom.conf` file, where you can customize connection limits (defaulting to 200 max connections), memory allocation parameters including shared buffers and cache sizes, and other performance optimizations specifically tuned for containerized environments.

Finally, the `softwareOverride/` directory provides an optional mechanism for customizing Pentaho configurations without modifying core installation files. The PostgreSQL JDBC driver comes included by default, but you can optionally upgrade it by downloading from Maven Central or copying from the workshop's database drivers collection. This preparation ensures all required files, configurations, and credentials are properly staged before running the automated deployment script.
{% endhint %}

***

**Configure .env**

1. Edit the .env.template

```bash
cd
cd ~/Pentaho-Server-PostgreSQL
nano .env.template
```

2. Enter the following details:

<table><thead><tr><th valign="top">Variable</th><th valign="top">Default</th><th valign="top">Description</th></tr></thead><tbody><tr><td valign="top">PENTAHO_VERSION</td><td valign="top">11.0.0.0-237</td><td valign="top">Pentaho Server version</td></tr><tr><td valign="top">PENTAHO_IMAGE_NAME</td><td valign="top">pentaho/pentaho-server</td><td valign="top">Docker image name</td></tr><tr><td valign="top">PENTAHO_IMAGE_TAG</td><td valign="top">11.0.0.0-237</td><td valign="top">Docker image tag</td></tr><tr><td valign="top">POSTGRES_PASSWORD</td><td valign="top">password</td><td valign="top">PostgreSQL root password</td></tr><tr><td valign="top">POSTGRES_PORT</td><td valign="top">5432</td><td valign="top">PostgreSQL exposed port</td></tr><tr><td valign="top">PENTAHO_HTTP_PORT</td><td valign="top">8090</td><td valign="top">Pentaho HTTP port</td></tr><tr><td valign="top">PENTAHO_HTTPS_PORT</td><td valign="top">8443</td><td valign="top">Pentaho HTTPS port</td></tr><tr><td valign="top">PENTAHO_MIN_MEMORY</td><td valign="top">4096m</td><td valign="top">JVM minimum heap size</td></tr><tr><td valign="top">PENTAHO_MAX_MEMORY</td><td valign="top">8192m</td><td valign="top">JVM maximum heap size</td></tr><tr><td valign="top">LICENSE_URL</td><td valign="top">(empty)</td><td valign="top">EE license server URL</td></tr><tr><td valign="top">VAULT_PORT</td><td valign="top">8200</td><td valign="top">Vault API port</td></tr></tbody></table>

3. Save:

```
CTRL + o
Enter
CTRL + x
```

4. Create .env&#x20;

```bash
cd
cd ~/Pentaho-Server-PostgreSQL
cp .env.template .env
```
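Rather than keeping the default password from the table above, you can generate one and patch it into `.env`. A sketch, assuming `.env` contains a `POSTGRES_PASSWORD=` line as shown in the variable table:

```shell
# Sketch: generate a random password and patch a KEY=value line in an env file.
gen_password() {
  head -c 48 /dev/urandom | base64 | tr -d '/+=\n' | head -c 20
}

set_env_var() {
  # set_env_var FILE KEY VALUE -- rewrite the KEY=... line, keeping a .bak copy
  local file=$1 key=$2 value=$3
  sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
}
```

For example, `set_env_var .env POSTGRES_PASSWORD "$(gen_password)"` rewrites the password line and leaves a `.env.bak` backup.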

***

**Customize postgres-config/custom.conf**

1. Edit `custom.conf`:

```bash
cd
cd ~/Pentaho-Server-PostgreSQL/postgres-config
nano custom.conf
```

2. Enter the following details:

```conf
# Connection limits
max_connections = 200

# Memory (adjust based on available RAM)
shared_buffers = 256MB
effective_cache_size = 768MB
work_mem = 16MB

# Performance
random_page_cost = 1.1
effective_io_concurrency = 200
```

3. Save:

```
CTRL + o
Enter
CTRL + x
```
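The memory values in `custom.conf` follow common PostgreSQL rules of thumb (`shared_buffers` ≈ 25% of RAM, `effective_cache_size` ≈ 75%). A sketch that derives them from a RAM budget in MB; treat it as a starting point, not tuning advice:

```shell
# Sketch: rule-of-thumb PostgreSQL memory settings for a given RAM budget (MB).
pg_mem_hints() {
  local ram_mb=$1
  echo "shared_buffers = $(( ram_mb / 4 ))MB"            # ~25% of RAM
  echo "effective_cache_size = $(( ram_mb * 3 / 4 ))MB"  # ~75% of RAM
}

# For a 1 GiB budget this reproduces the 256MB / 768MB values used above.
pg_mem_hints 1024
```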

***

**softwareOverride**

{% hint style="info" %}
The `softwareOverride/` directory provides a powerful mechanism to customize Pentaho Server without modifying the core installation. Files are copied into the Pentaho installation during container startup, processed in alphabetical order by directory name.
{% endhint %}

```
softwareOverride/
├── 1_drivers/           # JDBC drivers and data connectors
│   ├── tomcat/lib/
│   │   └── postgresql-42.x.x.jar    # PostgreSQL JDBC driver (included)
│   └── pentaho-solutions/drivers/    # Big data drivers (.kar files)
├── 2_repository/        # Database repository configuration
│   ├── pentaho-solutions/system/
│   │   ├── hibernate/hibernate-settings.xml
│   │   ├── jackrabbit/repository.xml
│   │   └── scheduler-plugin/quartz/quartz.properties
│   └── tomcat/webapps/pentaho/META-INF/context.xml
├── 3_security/          # Authentication and authorization
│   └── pentaho-solutions/system/
│       ├── applicationContext-spring-security-hibernate.properties
│       └── applicationContext-spring-security-memory.xml
├── 4_others/            # Tomcat, defaults, and miscellaneous
│   ├── pentaho-solutions/system/
│   │   ├── defaultUser.spring.properties
│   │   ├── pentaho.xml
│   │   └── security.properties
│   └── tomcat/
│       ├── bin/startup.sh
│       └── webapps/pentaho/WEB-INF/web.xml
└── 99_exchange/         # User data exchange (not auto-processed)
```
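The overlay mechanism can be sketched as a simple copy loop (an illustration only; the actual entrypoint logic inside the image may differ, and `apply_software_override` is a hypothetical name):

```bash
# Simplified sketch of the startup overlay: copy the contents of each
# numbered directory into the install root, in alphabetical glob order.
# 99_exchange/ is skipped because it is not auto-processed.
apply_software_override() {
  local target="$1"
  local dir
  for dir in softwareOverride/*/; do
    [ -d "$dir" ] || continue
    case "$dir" in */99_exchange/) continue ;; esac
    cp -r "${dir}." "$target/"
  done
}

# Example: overlay onto the Pentaho install root
# apply_software_override /opt/pentaho
```

Because the glob expands alphabetically, `1_drivers/` is applied before `2_repository/`, and so on, which is why later directories can override files laid down by earlier ones.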

The PostgreSQL JDBC driver is included in the Pentaho distribution. If you need to upgrade:

1. Download from [Maven Central](https://repo1.maven.org/maven2/org/postgresql/postgresql/)
2. Place in `softwareOverride/1_drivers/tomcat/lib/`

Alternatively, copy the driver bundled with the workshop materials:

```bash
cd
cd ~/Workshop--Installation/'Database Drivers'
cp postgresql-42.7.8.jar ~/Pentaho-Server-PostgreSQL/softwareOverride/1_drivers/tomcat/lib
```

{% endtab %}

{% tab title="4. Deployment" %}
{% hint style="info" %}

#### Deployment

This section walks through the deployment process using either the automated script or manual commands.
{% endhint %}

Select Deployment option:

{% tabs %}
{% tab title="Automated" %}
{% hint style="info" %}

#### Automated Deployment

The `deploy.sh` script automates the entire deployment process with pre-flight validation:

`./deploy.sh`

The script performs the following actions:

* Validates Docker and Docker Compose installation
* Verifies Pentaho package exists in: `docker/stagedArtifacts/`
* Creates `.env` from template if missing
* Checks disk space (10GB minimum)
* Verifies required ports are available
* Builds the Pentaho Server Docker image
* Starts PostgreSQL and waits for health check
* Starts Pentaho Server and monitors startup
* Displays access URLs and credentials
  {% endhint %}

1\. Set execute permissions on the deployment scripts.

```bash
cd
cd ~/Pentaho-Server-PostgreSQL
chmod +x deploy.sh
chmod +x scripts/*.sh
```

2. Deploy the containers.

{% hint style="danger" %}
Ensure you don't have a local PostgreSQL service running, since it would occupy port 5432:

```bash
systemctl stop postgresql
```

{% endhint %}

```bash
cd
cd ~/Pentaho-Server-PostgreSQL && ./deploy.sh
```

{% tabs %}
{% tab title="Pre-flight & Building Phase" %}
{% hint style="info" %}
**Pre-Flight Checks ✓**

The script validates the environment before starting:

* Docker is installed
* Docker Compose is installed
* Docker daemon is running
* Pentaho package found
* `.env` file exists
* Sufficient disk space (10GB minimum)
* Port 8090 (Pentaho HTTP) is available
* Port 5432 (PostgreSQL) is available

**Building Phase**

A custom Docker image is built with **24 build steps** taking approximately **5-10 minutes**:

* Base image: `debian:trixie-slim`
* Installs system packages via `apt-get update` and `apt-get upgrade`
* Installs **OpenJDK 21 JRE headless** and `curl`, then cleans up the package caches
* Creates a `pentaho` user and group (GID 5000)
* (Optional) Installs Pentaho plugins (PAZ, PIR, PDD)
* Copies Pentaho installation to `/opt/pentaho/`
* Exports layers and manifests
* **Final image**: `pentaho/pentaho-server:11.0.0.0-237`
  {% endhint %}

<figure><img src="/files/DHr0ULb81axeVfO3xtLT" alt=""><figcaption><p>Pre-flight checks &#x26; Build</p></figcaption></figure>
{% endtab %}

{% tab title="Deployment & Final Status" %}
{% hint style="info" %}
**Starting PostgreSQL Database**

Pulls **PostgreSQL 17** image and related layers:

* Creates network: `pentaho-server-postgresql_pentaho-net`
* Creates volume: `pentaho-server-postgresql_pentaho_postgres_data`
* Creates container: `pentaho-postgres`
* Waits for PostgreSQL readiness: **✓ PostgreSQL is ready**

**Starting Pentaho Server**

This phase takes **2-3 minutes for first-time initialization**:

**Pulls HashiCorp Vault 1.15 image** for secrets management

**Creates volumes:**

* `pentaho-server-postgresql_pentaho_solutions`
* `pentaho-server-postgresql_pentaho_data`
* `pentaho-server-postgresql_vault_data`
  {% endhint %}

<figure><img src="/files/W6Bldy0xgCA17N3X7Ka8" alt=""><figcaption><p>Deploy Containers</p></figcaption></figure>

{% hint style="info" %}
**Final Status** 🎉

The deployment is successful and provides you with:

**Pentaho Server Access:**

* URL: `http://localhost:8090/pentaho`
* Login: `admin` / `password`

**PostgreSQL Database:**

* Host: `localhost:5432`
* Login: `postgres` / `password`&#x20;
  {% endhint %}

| Action           | Command                  |
| ---------------- | ------------------------ |
| View logs        | `docker compose logs -f` |
| Stop services    | `docker compose stop`    |
| Start services   | `docker compose start`   |
| Restart services | `docker compose restart` |
| Shutdown         | `docker compose down`    |

{% hint style="info" %}
**Helper scripts provided:**

* `./scripts/backup-postgres.sh` -- Backup database
* `./scripts/restore-postgres.sh <backup-file>` -- Restore database
* `./scripts/validate-deployment.sh` -- Validate deployment
  {% endhint %}

<figure><img src="/files/2SSnhsbBv6OkmRCQQkjF" alt=""><figcaption><p>Helper</p></figcaption></figure>
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="Manual Build" %}
{% hint style="info" %}

#### Manual Build

{% endhint %}

1. Build the Pentaho Server image.

```bash
docker compose build --no-cache pentaho-server
```

{% hint style="info" %}
This process takes approximately 5-10 minutes as it extracts the Pentaho package and configures the image.
{% endhint %}

2. Start PostgreSQL database

```bash
docker compose up -d postgres
 
# Wait for PostgreSQL to be healthy
docker compose logs -f postgres
```

{% hint style="info" %}
Watch for the message indicating PostgreSQL is ready to accept connections.
{% endhint %}

3. Start the Pentaho Server.

```bash
docker compose up -d pentaho-server
 
# Monitor startup progress
docker compose logs -f pentaho-server
```

{% hint style="info" %}
The Pentaho Server typically takes 2-3 minutes for first-time initialization. Watch for the message:

```
Server startup in [X] milliseconds
```

{% endhint %}
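Rather than tailing the logs, you can poll the server until it answers HTTP requests (a sketch; `wait_for_url` is a hypothetical helper, and the port assumes the default `PENTAHO_HTTP_PORT`):

```bash
# Hypothetical helper: poll a URL until curl can fetch it successfully.
wait_for_url() {
  local url="$1" interval="${2:-10}"
  until curl -sf -o /dev/null "$url"; do
    echo "Waiting for $url ..."
    sleep "$interval"
  done
  echo "$url is up"
}

# Example: block until the Pentaho login page responds
# wait_for_url http://localhost:8090/pentaho/Login
```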
{% endtab %}
{% endtabs %}

3. Verify container status.

```bash
docker compose ps
```

```bash
# Alternatively, check status via the Makefile target
cd
cd ~/Pentaho-Server-PostgreSQL
make status
```

4. Run validation script.

```bash
cd
cd ~/Pentaho-Server-PostgreSQL/scripts && ./validate-deployment.sh
```

<figure><img src="/files/S5KL94OfbrSIbxldNKZo" alt=""><figcaption><p>Validate Deployment</p></figcaption></figure>

5. Open a web browser and navigate to:

{% embed url="http://localhost:8090/pentaho" %}

6. Log in with the default credentials:

| Username | admin    |
| -------- | -------- |
| Password | password |

7. Enter the Licensing Server URL.

<figure><img src="/files/mPIIpegM6yi49gZneyuV" alt=""><figcaption><p>Enter licensing details</p></figcaption></figure>
{% endtab %}

{% tab title="5. Backup & Recovery" %}
{% hint style="info" %}

#### Backup & Recovery

Implement regular backups to protect your Pentaho data and configuration.
{% endhint %}

1. Create a compressed backup of Pentaho databases.

```bash
./scripts/backup-postgres.sh
 
# Backups are saved to backups/ directory with timestamp
# Example: backups/pentaho-postgres-backup-20260113-143022.sql.gz

```
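To make backups regular rather than ad hoc, the helper script can be scheduled from cron (a sketch; the 02:00 schedule and the repository path are assumptions for this lab environment):

```bash
# crontab -e entry: run the backup helper every night at 02:00.
# Adjust the path if you cloned the repository elsewhere.
0 2 * * * cd $HOME/Pentaho-Server-PostgreSQL && ./scripts/backup-postgres.sh
```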

<figure><img src="/files/qAkw6qnIbq7ekO3Qi32E" alt=""><figcaption><p>Backup script</p></figcaption></figure>
{% endtab %}
{% endtabs %}

