Docker
Docker deployment - Pentaho 11 + PostgreSQL 15 repository
Docker Container
Docker container deployment enables you to package and run Pentaho products within portable, production-ready containers. Containerization ensures consistent behavior across development, testing, and production environments while simplifying deployment and scaling operations.
You can create Docker containers for the Pentaho Server, which includes the complete Business Analytics and Data Integration platform with the Pentaho User Console, scheduling services, and repository management. The server supports enterprise database backends including PostgreSQL, MySQL, Oracle, and SQL Server.
For distributed ETL processing, you can deploy Carte server containers that execute transformations and jobs remotely. The Kitchen and Pan command-line tools are also available as containers, enabling integration with CI/CD pipelines and automated batch workflows.
Container deployments are particularly effective for cloud environments where you can quickly scale resources to match data processing demands. By running Pentaho workloads in containers, organizations can optimize infrastructure costs while maintaining the flexibility to move between on-premises and cloud platforms.

The Pentaho Server container runs the complete Pentaho Business Analytics and Data Integration platform on Apache Tomcat 10. It includes the Pentaho User Console (PUC), scheduling services, and all analytics capabilities.
The PostgreSQL container provides the relational database backend using PostgreSQL 15. It hosts three critical databases required by Pentaho: Jackrabbit (content repository), Quartz (scheduler), and Hibernate (security and audit). Data is persisted through a Docker volume to survive container restarts.

Pentaho Server requires three separate databases, each serving a distinct purpose:
| Database | User | Purpose |
| --- | --- | --- |
| jackrabbit | jcr_user | Java Content Repository (JCR) - Stores all Pentaho content including reports, dashboards, data sources, analysis schemas, and user files. This is the primary content storage for the Pentaho repository. |
| quartz | pentaho_user | Quartz Scheduler - Manages all scheduled jobs, triggers, and calendars. Contains tables for job definitions (QRTZ6_JOB_DETAILS), triggers (QRTZ6_TRIGGERS), execution history, and cluster coordination locks. |
| hibernate | hibuser | Hibernate Repository - Hosts security configuration, audit logging, user session data, and contains two additional schemas: pentaho_dilogs (ETL execution logging) and pentaho_operations_mart (analytics data mart). |
The hibernate database contains specialized schemas for operational monitoring:
pentaho_dilogs: Captures detailed ETL execution information including job logs, transformation logs, step performance metrics, and error records. Essential for debugging data integration workflows and monitoring pipeline health.
pentaho_operations_mart: A dimensional data mart for analytics on Pentaho usage. Contains dimension tables (DIM_DATE, DIM_TIME, DIM_EXECUTOR) and fact tables (FACT_EXECUTION, FACT_STEP_EXECUTION) for analyzing platform utilization, performance trends, and user activity.
For production deployments, implement regular backups of the repository-data Docker volume. The jackrabbit database is the most critical as it contains all user content. Consider using pg_dump for logical backups or volume snapshots for full recovery options.
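A logical backup of a single database can be sketched with pg_dump as follows. This is a minimal sketch, not the contents of backup-postgres.sh: it assumes the PostgreSQL container is named pentaho-postgres and that the postgres superuser is used, and the helper names backup_path and backup_database are hypothetical.

```shell
# Build a timestamped backup path, e.g. backups/jackrabbit_20250101-120000.sql.gz.
backup_path() {
  # $1 = database name, $2 = timestamp string
  printf 'backups/%s_%s.sql.gz\n' "$1" "$2"
}

# Dump one database from the PostgreSQL container and compress it on the host.
# Assumption: container name "pentaho-postgres", superuser "postgres".
backup_database() {
  out=$(backup_path "$1" "$(date +%Y%m%d-%H%M%S)")
  mkdir -p backups
  docker exec pentaho-postgres pg_dump -U postgres "$1" | gzip > "$out"
  echo "$out"
}
```

Calling `backup_database jackrabbit` would write a compressed dump of the content repository under backups/. The same call can be repeated for quartz and hibernate.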
Key points:
The Pentaho container connects to PostgreSQL using the service name 'repository' as the hostname
PostgreSQL listens on port 5432 internally (not exposed to host by default)
Pentaho Server listens on port 8080 inside its container, mapped to a port on the host system (PENTAHO_HTTP_PORT in .env, 8090 by default)
All inter-container traffic remains within the Docker network for security

Tomcat manages connection pools defined in context.xml. Each pool serves a specific purpose:
| JNDI Name | Database Target | Purpose |
| --- | --- | --- |
| jdbc/Hibernate | repository:5432/hibernate | Security, Users, Roles |
| jdbc/Quartz | repository:5432/quartz | Job Scheduling |
| jdbc/jackrabbit | repository:5432/jackrabbit | Content Repository |
| jdbc/Audit | repository:5432/hibernate | Audit Logging |
| jdbc/live_logging_info | repository:5432/hibernate | ETL Runtime Logs |
| jdbc/PDI_Operations_Mart | repository:5432/hibernate | Operations Analytics |
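The targets above are shorthand for full JDBC URLs. The PostgreSQL JDBC driver expects the form jdbc:postgresql://host:port/database, so the values in context.xml can be derived as in this small sketch. The helper name jdbc_url is hypothetical, and the defaults assume the 'repository' hostname and port 5432 described above.

```shell
# Print the PostgreSQL JDBC URL for a Pentaho database.
# $1 = database name, $2 = host (default: repository), $3 = port (default: 5432)
jdbc_url() {
  printf 'jdbc:postgresql://%s:%s/%s\n' "${2:-repository}" "${3:-5432}" "$1"
}

# List the URL for each of the three Pentaho databases.
for db in hibernate quartz jackrabbit; do
  jdbc_url "$db"
done
```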
When a user accesses Pentaho Server:
1. The user's browser sends an HTTP request to the host port, localhost:8090 by default
2. Docker forwards the request to the Pentaho container's port 8080
3. Tomcat receives the request and routes it to the pentaho.war web application
4. The application retrieves and stores data via the JDBC connection pools
5. JDBC connections route to 'repository:5432' (the PostgreSQL container)
6. The response flows back through the same path to the user's browser
Before you begin the Docker deployment, ensure you have completed Setup: Pentaho Containers.
Run through the following steps to deploy Pentaho Server with a PostgreSQL 15 repository.
Prepare Environment
Preparing the environment involves the following tasks:
Check Docker is up and running
Copy the Pentaho-Server-PostgreSQL assets
Copy over pentaho-server-ee-11.0.0.0-237.zip
Verify Docker & Docker Compose
Check ports
Ensure you have downloaded: pentaho-server-ee-11.0.0.0-237.zip
Create the project directory and copy over the assets.
Copy pentaho-server-ee-11.0.0.0-237.zip to the docker/stagedArtifacts directory.
If you have deployed an Archive Pentaho Server then copy from:
/opt/pentaho/software/pentaho-server-ee-version
Otherwise download package from the Pentaho Customer Portal.
Verify that the file is in place.
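The copy and verification can be sketched as a single helper. This is an illustration only: stage_package is a hypothetical name, and the source path depends on where the package was downloaded or installed.

```shell
# Copy the Pentaho package into the staging directory and list the result.
# $1 = path to the downloaded zip, $2 = staging directory
stage_package() {
  mkdir -p "$2" &&
  cp "$1" "$2"/ &&
  ls -lh "$2/$(basename "$1")"
}
```

From the project root, a call such as `stage_package /path/to/pentaho-server-ee-11.0.0.0-237.zip docker/stagedArtifacts` stages the package and prints its size, which is a quick way to spot a truncated download.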
Check the Docker version.
Check Docker Compose version.
Verify Docker daemon is running.
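The three checks above map onto standard Docker CLI commands. A minimal sketch, assuming Docker Compose v2 (the `docker compose` plugin); the helper names have_command and check_docker are hypothetical:

```shell
# Succeed if the named command is available on PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

# Print the Docker and Compose versions, then confirm the daemon responds.
check_docker() {
  have_command docker || { echo "docker not found on PATH" >&2; return 1; }
  # Client version, Compose v2 plugin version, then a daemon round-trip.
  docker --version && docker compose version && docker info >/dev/null
}
```

`check_docker` returns a non-zero status if any step fails. `docker info` is the step that fails when the client is installed but the daemon is not running.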
Check that port 8080 / 8090 is available on the host OS.
If the port is in use by another application, you can change the port variable (PENTAHO_HTTP_PORT) in the .env file to any available port (e.g. 8090, 8081, 9090).
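One way to check a port is to filter the list of listening sockets. The sketch below assumes a Linux host where `ss -ltn` is available and prints the local address in the fourth column; port_in_use is a hypothetical helper that reads that listing on standard input.

```shell
# Succeed if the given port appears as a listening local address.
# Reads `ss -ltn` style output on stdin; $1 = port number.
port_in_use() {
  awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}
```

For example, `ss -ltn | port_in_use 8090 && echo "port 8090 is busy"` reports a conflict before deployment. On hosts without ss, a tool such as lsof or netstat would have to be substituted.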
Pentaho Server requires a valid license. The .env file contains a LICENSE_URL pointing to the Flexera license server. Ensure your license entitlements are active before deployment.
Without a valid license, Pentaho Server will start but many features will be disabled. Verify your license status before proceeding with production deployments.
Directory Layout
This deployment configuration provides several important capabilities:
Completely self-contained and portable deployment.
Automated database initialization with SQL scripts.
Health checks and proper startup ordering between services.
Persistent data volumes for database and Pentaho content.
HashiCorp Vault for secrets management with AppRole authentication.
Read-only containers with tmpfs mounts for security.
Resource limits (CPU/memory) for stability.
Log rotation to prevent disk exhaustion.
Software override system for customizing configurations without modifying core files.
Production-ready configuration templates.
PostgreSQL JDBC driver included.
Easy backup and restore procedures.
Check out the other repository deployment options at:
Root Directory Files
Documentation Files:
README.md - The main entry point documentation providing project overview, quick start instructions, prerequisites, and general usage information for the workshop.
ARCHITECTURE.md - Detailed technical documentation covering the system architecture, component relationships, container design, networking, data flow, and architectural decisions for the Docker-based deployment.
CONFIGURATION.md - Comprehensive configuration reference guide detailing all available environment variables, configuration options, customization parameters, and settings for both Pentaho Server and PostgreSQL components.
TROUBLESHOOTING.md - Problem-solving guide with common issues, error messages, diagnostic procedures, and solutions for deployment and runtime problems you might encounter.
Orchestration & Deployment:
docker-compose.yml - The Docker Compose service definitions file that declares all containers (Pentaho Server, PostgreSQL, and Vault), their configurations, networking, volumes, and dependencies.
Makefile - Contains convenience command targets for common operations like building, starting, stopping, and cleaning up the environment. Users can run make help to see available commands.
deploy.sh - Automated deployment script that handles the complete deployment workflow including environment validation, building images, starting services, and reporting access details.
Environment Configuration:
.env - The active environment configuration file (created from template) containing actual values for database passwords, ports, hostnames, and other environment-specific settings. This file is typically git-ignored.
.env.template - The template file with default values and placeholders that users copy to create their .env file, providing a reference for all configurable environment variables.
Docker Build Context
The docker/ directory contains all the core components needed to build and run the Pentaho Server containerized deployment:
Dockerfile - This is the main build configuration using a multi-stage build approach to create the Pentaho Server container image. Multi-stage builds help optimize the final image size by separating the build environment from the runtime environment.
entrypoint/ directory contains the docker-entrypoint.sh script, which is the initialization script that runs when the container starts. This typically handles tasks like environment setup, configuration management, health checks, and starting the Pentaho Server services.
stagedArtifacts/ directory serves as the staging area for the Pentaho Server installation package. It currently contains pentaho-server-ee-11.0.0.0-237.zip, which is the Enterprise Edition version 11.0.0.0 build 237 that gets extracted and installed during the Docker image build process.
PostgreSQL Database Initialization
The db_init_postgres/ directory contains the PostgreSQL database initialization scripts that set up all the required schemas for Pentaho Server 11. These scripts are numbered to execute in a specific sequence:
1_create_jcr_postgresql.sql - Creates the Jackrabbit Content Repository (JCR) schema, which stores the Pentaho repository content including solution files, schedules, reports, dashboards, and metadata. This is the core content management system for Pentaho.
2_create_quartz_postgresql.sql - Sets up the Quartz Scheduler schema, which manages all scheduled jobs and tasks within Pentaho Server, including report generation, ETL executions, and other automated processes.
3_create_repository_postgresql.sql - Creates the Hibernate Repository schema, which stores user authentication, authorization data, roles, permissions, and other security-related information managed by Pentaho's security subsystem.
4_pentaho_logging_postgresql.sql - Establishes the Audit and Data Integration (DI) Logging schema for capturing execution logs, transformation/job metrics, and audit trail information from PDI processes running on the server.
5_pentaho_mart_postgresql.sql - Creates the Operations Mart schema, which stores operational analytics data about Pentaho Server usage, performance metrics, and system monitoring information used by the Pentaho Operations Mart dashboard.
PostgreSQL and Vault Configuration
postgres-config/ - PostgreSQL Configuration
custom.conf - Custom PostgreSQL performance tuning parameters optimized for Pentaho Server workloads. This likely includes settings for shared buffers, work memory, connection limits, checkpoint configurations, and other performance-related parameters tailored to handle Pentaho's database requirements.
pg_hba.conf - PostgreSQL Host-Based Authentication configuration file that controls client connection authentication methods, IP address access rules, and security policies for database connections from the Pentaho Server container.
vault/ - HashiCorp Vault Integration
This directory supports the use of Vault for secrets management:
config/vault.hcl - The HashiCorp Vault server configuration file defining storage backend, listener settings, API endpoints, seal/unseal behavior, and general Vault server operational parameters.
policies/pentaho-policy.hcl - Vault access control policy specifically for Pentaho Server, defining which secrets paths the Pentaho application can read, write, or manage. This enforces least-privilege access to sensitive credentials.
secrets/ - Docker Secrets Management
postgres_password.txt - A Docker secrets file containing the PostgreSQL password. When using Docker secrets (or Vault integration), this file provides the database credentials in a secure manner rather than passing them as plain environment variables. The file should have restricted permissions and is typically referenced by Docker Compose using the secrets: configuration.
| Secret | Description |
| --- | --- |
| postgres_password | PostgreSQL superuser password |
| pentaho_user | Pentaho database username |
| pentaho_password | Pentaho database password |
| jdbc_url | JDBC connection URL |
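Creating a secrets file with restricted permissions might look like the sketch below. The helper name write_secret is hypothetical, and the value shown in the usage note is a placeholder rather than a real credential.

```shell
# Write a secret to a file that only the owner can read or write.
# $1 = target file, $2 = secret value
write_secret() {
  # Create the file under a restrictive umask so it is never world-readable.
  ( umask 077 && printf '%s' "$2" > "$1" )
  chmod 600 "$1"
}
```

A call such as `write_secret secrets/postgres_password.txt 'change-me'` produces a file with mode 600 and no trailing newline. Keep in mind that a value typed on the command line may end up in shell history, so reading it from a prompt or a password manager is usually preferable.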
Pentaho Configuration Overrides
The softwareOverride/ directory contains customized configuration files and components that override the default Pentaho Server installation. The numbered structure keeps the overrides organized and determines the order in which they are applied, since the directories are processed in alphabetical order:
1_drivers/ - JDBC Database Drivers
tomcat/lib/ - Contains JDBC driver JAR files (specifically the PostgreSQL JDBC driver) that get copied into Tomcat's library directory, enabling Pentaho Server to connect to PostgreSQL databases.
2_repository/ - Database Repository Configuration
This section configures Pentaho's connection to all PostgreSQL-backed repositories:
pentaho-solutions/system/hibernate/ - Hibernate repository configuration files (repository.xml, hibernate-settings.xml) for user/role security data
pentaho-solutions/system/jackrabbit/ - Jackrabbit JCR repository configuration (repository.xml) for content storage
pentaho-solutions/system/scheduler-plugin/quartz/ - Quartz scheduler database configuration (quartz.properties) for job scheduling
tomcat/webapps/pentaho/META-INF/ - Contains context.xml with JNDI datasource definitions for all Pentaho databases (Quartz, Jackrabbit, Hibernate, Audit, Operations Mart)
3_security/ - Authentication & Security Settings
pentaho-solutions/system/ - Security configuration files including applicationContext-security.xml, security.properties, and potentially LDAP/SSO configurations for authentication and authorization.
4_others/ - Additional Tomcat & Application Settings
pentaho-solutions/system/ - Other system-level configurations like pentaho.xml, pentaho-spring-beans.xml, log4j settings, and application behavior configurations
tomcat/ - Tomcat server customizations including server.xml, web.xml, setenv.sh for JVM parameters, and other Tomcat-specific tuning
Utility Scripts
The scripts/ directory contains operational and maintenance utilities for managing the Pentaho Server deployment, organized by functional area:
Database Management:
backup-postgres.sh - Automated PostgreSQL backup utility that creates dumps of all Pentaho databases (JCR, Quartz, Hibernate, Audit, Operations Mart). Likely includes timestamping, compression, and backup retention logic.
restore-postgres.sh - Database restoration utility to recover Pentaho databases from backup files, useful for disaster recovery, environment cloning, or migrating data between instances.
Vault/Secrets Management:
backup-vault.sh - HashiCorp Vault credentials and unseal keys backup script, ensuring recovery capability for the Vault instance containing sensitive Pentaho credentials.
restore-vault.sh - Vault restoration utility to recover Vault data and re-initialize the secrets management system from backup.
rotate-secrets.sh - Automated password rotation script that updates database passwords and other sensitive credentials in Vault, then propagates changes to Pentaho Server configuration - supporting security best practices.
fetch-secrets.sh - Helper utility to retrieve secrets from Vault programmatically, useful for scripts that need to access credentials without hardcoding them.
vault-init.sh - Initial Vault setup script that handles Vault initialization, unsealing, creating the Pentaho policy, and storing initial secrets for the deployment.
Operations & Validation:
validate-deployment.sh - Deployment validation script that performs health checks on all components (PostgreSQL connectivity, Pentaho Server startup, Vault accessibility, service availability), confirming the environment is properly configured and operational.
User Configuration and Data Storage
config/ - Application Configuration
This directory stores user-level and application-level configuration files:
.kettle/ - PDI (Pentaho Data Integration) / Kettle configuration directory
kettle.properties - Contains Kettle/PDI environment variables, connection parameters, system properties, and global settings used by transformation and job executions running on Pentaho Server.
.pentaho/ - Pentaho user settings directory for storing user-specific preferences, cached metadata, and application state information.
backups/ - Database Backup Storage
*.sql.gz - Repository for compressed PostgreSQL database backup files created by the backup-postgres.sh script. The gzip compression reduces storage requirements while maintaining complete database snapshots for disaster recovery, environment cloning, or rollback scenarios. Backup files are likely timestamped for version tracking.
logs/ - Application Logging
Centralized logging directory for capturing runtime logs from all services. This likely includes:
Pentaho Server application logs (catalina.out, pentaho.log)
PostgreSQL database logs
Vault service logs
Docker container logs
ETL execution logs
Keeping logs in one place supports the deployment's log rotation and monitoring, making troubleshooting and auditing easier during workshops.
Key Files
| File | Purpose |
| --- | --- |
| docker-compose.yml | Defines all services (pentaho-server, postgres), networks, and volumes |
| docker/Dockerfile | Multi-stage build using debian:trixie-slim with OpenJDK 21 |
| docker-entrypoint.sh | Processes softwareOverride directories at container startup |
| .env | Environment-specific configuration (ports, passwords, memory) |
| deploy.sh | Automated deployment with pre-flight validation checks |
| db_init_postgres/*.sql | PostgreSQL database initialization scripts |
| vault-init.sh | Initializes Vault and stores secrets |
| rotate-secrets.sh | Rotates database passwords securely |
Pre-flight Tasks
The Pre-flight Tasks section outlines the essential preparation steps needed before deploying Pentaho Server 11 in Docker containers.
First, you need to configure the environment variables by editing the .env.template file with your deployment-specific settings. This includes defining the Pentaho version and image details, PostgreSQL credentials and port configuration (defaulting to 5432), Pentaho HTTP and HTTPS ports (8090 and 8443), JVM memory allocation (minimum 4GB, maximum 8GB), the license server URL, and Vault port settings. Once configured, this template is saved as the active .env file.
PostgreSQL performance tuning is handled through the postgres-config/custom.conf file, where you can customize connection limits (defaulting to 200 max connections), memory allocation parameters including shared buffers and cache sizes, and other performance optimizations specifically tuned for containerized environments.
Finally, the softwareOverride/ directory provides an optional mechanism for customizing Pentaho configurations without modifying core installation files. The PostgreSQL JDBC driver comes included by default, but you can optionally upgrade it by downloading from Maven Central or copying from the workshop's database drivers collection. This preparation ensures all required files, configurations, and credentials are properly staged before running the automated deployment script.
Configure .env
Edit the .env.template
Enter the following details:
| Variable | Default | Description |
| --- | --- | --- |
| PENTAHO_VERSION | 11.0.0.0-237 | Pentaho Server version |
| PENTAHO_IMAGE_NAME | pentaho/pentaho-server | Docker image name |
| PENTAHO_IMAGE_TAG | 11.0.0.0-237 | Docker image tag |
| POSTGRES_PASSWORD | password | PostgreSQL root password |
| POSTGRES_PORT | 5432 | PostgreSQL exposed port |
| PENTAHO_HTTP_PORT | 8090 | Pentaho HTTP port |
| PENTAHO_HTTPS_PORT | 8443 | Pentaho HTTPS port |
| PENTAHO_MIN_MEMORY | 4096m | JVM minimum heap size |
| PENTAHO_MAX_MEMORY | 8192m | JVM maximum heap size |
| LICENSE_URL | (empty) | EE license server URL |
| VAULT_PORT | 8200 | Vault API port |
Save:
Create .env
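Creating the active file amounts to copying the template. A minimal sketch, assuming it is run against the project root; create_env and env_value are hypothetical helper names and are not part of the deployment scripts:

```shell
# Create .env from .env.template unless an .env already exists.
# $1 = project directory
create_env() {
  [ -f "$1/.env" ] || cp "$1/.env.template" "$1/.env"
}

# Print the value assigned to a variable in an env file (last assignment wins).
# $1 = variable name, $2 = env file
env_value() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}
```

After `create_env .`, a call such as `env_value PENTAHO_HTTP_PORT .env` is a quick way to confirm which port the deployment will use. Leaving an existing .env untouched avoids overwriting values that were already customized.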
Customize postgres-config/custom.conf
Edit postgres-config/custom.conf
Enter the following details:
Save:
softwareOverride
The softwareOverride/ directory provides a powerful mechanism to customize Pentaho Server without modifying the core installation. Files are copied into the Pentaho installation during container startup, processed in alphabetical order by directory name.
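The ordered copy can be pictured with the following sketch. It is not the actual docker-entrypoint.sh, only an illustration of the idea: apply_overrides is a hypothetical function, and it relies on the shell expanding globs in sorted order.

```shell
# Copy each override directory into the installation, in sorted order,
# so that files from later directories replace files from earlier ones.
# $1 = softwareOverride directory, $2 = Pentaho installation directory
apply_overrides() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue
    # "$dir". copies the directory's contents rather than the directory itself.
    cp -R "$dir". "$2"/
  done
}
```

Because the order is alphabetical rather than numeric, a directory named 10_extra would sort before 2_repository. Zero-padded prefixes avoid that surprise if more than nine override directories are needed.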
The PostgreSQL JDBC driver is included in the Pentaho distribution. If you need to upgrade:
Download it from Maven Central and place it in softwareOverride/1_drivers/tomcat/lib/
Or
Copy it from Workshop--Installation/'Database Drivers'/
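For the Maven Central route, the driver is published under the org.postgresql group with the artifact name postgresql. A hedged sketch of the download, where the helper names are hypothetical and the version is supplied by the caller:

```shell
# Print the Maven Central download URL for a PostgreSQL JDBC driver version.
# $1 = driver version
pgjdbc_url() {
  printf 'https://repo1.maven.org/maven2/org/postgresql/postgresql/%s/postgresql-%s.jar\n' "$1" "$1"
}

# Download the driver into the override directory (run from the project root).
download_pgjdbc() {
  curl -fL -o "softwareOverride/1_drivers/tomcat/lib/postgresql-$1.jar" "$(pgjdbc_url "$1")"
}
```

For instance, `download_pgjdbc 42.7.3` fetches that release. Which version is appropriate for Pentaho 11 is not stated here, so check the supported driver versions before upgrading, and remove the older JAR so that only one driver ends up in Tomcat's lib directory.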
Deployment
This section walks through the deployment process using either the automated script or manual commands.
Select Deployment option:
Automated Deployment
The deploy.sh script automates the entire deployment process with pre-flight validation:
./deploy.sh
The script performs the following actions:
Validates Docker and Docker Compose installation
Verifies Pentaho package exists in docker/stagedArtifacts/
Creates .env from template if missing
Checks disk space (10GB minimum)
Verifies required ports are available
Builds the Pentaho Server Docker image
Starts PostgreSQL and waits for health check
Starts Pentaho Server and monitors startup
Displays access URLs and credentials
1. Set execute permissions on the deployment scripts.
2. Ensure you don't have a PostgreSQL service already up and running on the host, since it would occupy port 5432.
3. Deploy the containers.
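These preparation steps can be sketched as below. The helper names are hypothetical, and stop_host_postgres assumes a systemd host where the unit is called postgresql, which varies between distributions.

```shell
# Mark the deployment script and the helper scripts as executable.
make_executable() {
  chmod +x "$@"
}

# Stop a PostgreSQL service running directly on the host so that port 5432 is free.
# Assumption: systemd host, unit named "postgresql"; adjust as needed.
stop_host_postgres() {
  if systemctl is-active --quiet postgresql; then
    sudo systemctl stop postgresql
  fi
}
```

A typical sequence from the project root would be `make_executable deploy.sh scripts/*.sh`, then `stop_host_postgres`, then `./deploy.sh`.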
Pre-Flight Checks ✓
The script validates the environment before starting:
Docker is installed
Docker Compose is installed
Docker daemon is running
Pentaho package found
.env file exists
Sufficient disk space (414GB available)
Port 8090 (Pentaho HTTP) is available
Port 5432 (PostgreSQL) is available
Building Phase
A custom Docker image is built with 24 build steps taking approximately 5-10 minutes:
Base image: debian:trixie-slim
Installs system packages via apt-get update and apt-get upgrade
Installs OpenJDK 21 JRE headless with curl and rm
Creates a pentaho user and group (GID 5000)
(Optional) Installs Pentaho plugins (PAZ, PIR, PDD)
Copies Pentaho installation to /opt/pentaho/
Exports layers and manifests
Final image: pentaho/pentaho-server:11.0.0.0-237

Starting PostgreSQL Database
Pulls PostgreSQL 15 image and related layers
Creates network: pentaho-server-postgresql_pentaho-net
Creates volume: pentaho-server-postgresql_pentaho_postgres_data
Creates container: pentaho-postgres
Waits for PostgreSQL readiness: ✓ PostgreSQL is ready
Starting Pentaho Server
This phase takes 2-3 minutes for first-time initialization:
Pulls HashiCorp Vault 1.15 image for secrets management
Creates volumes: pentaho-server-postgresql_pentaho_solutions, pentaho-server-postgresql_pentaho_data, pentaho-server-postgresql_vault_data

Final Status 🎉
The deployment is successful and provides you with:
Pentaho Server Access:
URL: http://localhost:8090/pentaho
Login: admin/password
PostgreSQL Database:
Host: localhost:5432
Login: postgres/password
| Action | Command |
| --- | --- |
| View logs | docker compose logs -f |
| Stop services | docker compose stop |
| Start services | docker compose start |
| Restart services | docker compose restart |
| Shutdown | docker compose down |
Helper scripts provided:
./scripts/backup-postgres.sh - Backup database
./scripts/restore-postgres.sh <backup-file> - Restore database
./scripts/validate-deployment.sh - Validate deployment

Manual Deployment
Build the Pentaho Server image.
This process takes approximately 5-10 minutes as it extracts the Pentaho package and configures the image.
Start PostgreSQL database
Watch for the message indicating PostgreSQL is ready to accept connections.
Start the Pentaho Server.
The Pentaho Server typically takes 2-3 minutes for first-time initialization. Watch for the message:
Verify container status.
Run validation script.
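The commands for these manual steps did not survive on this page, so the following is a sketch under stated assumptions rather than the original listing. It assumes the Compose service names pentaho-server and postgres (as listed under Key Files), the container name pentaho-postgres, and the postgres superuser. The helpers wait_until and manual_deploy are hypothetical.

```shell
# Retry a command until it succeeds or the attempts run out.
# usage: wait_until <attempts> <delay-seconds> <command...>
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Build, start the database, wait for it, start Pentaho, then validate.
# Assumptions: services "pentaho-server" and "postgres", container "pentaho-postgres".
manual_deploy() {
  docker compose build pentaho-server &&
  docker compose up -d postgres &&
  wait_until 30 2 docker exec pentaho-postgres pg_isready -U postgres &&
  docker compose up -d pentaho-server &&
  docker compose ps &&
  ./scripts/validate-deployment.sh
}
```

Running `manual_deploy` from the project root stops at the first failing step. Pentaho startup progress can be followed with `docker compose logs -f`. Tomcat normally logs a "Server startup in" line once the web application has finished loading.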

Open a web browser and navigate to http://localhost:8090/pentaho
Log in with the default credentials:
Username: admin
Password: password
Enter the Licensing Server URL


