PDC Configuration Files

PDC configuration reference.

.env.default

The .env.default file holds the default configuration for Pentaho Data Catalog (PDC) version 10.2.8. It defines more than 200 environment variables that control the deployment.

By default, the COMPOSE_PROFILES setting enables all optional services, which gives a full PDC deployment. To deploy only the services you need, remove the unneeded profiles from that list.

The configuration is organized into logical sections covering Docker images, database connections, service configurations, and security settings.

Global Configuration

This section defines core system-wide settings that affect the entire PDC deployment.

| Variable | Value | Description |
| --- | --- | --- |
| GLOBAL_IMAGE_PREFIX | hitachi.jfrog.io/docker | Container registry prefix for all Docker images |
| GLOBAL_SERVER_HOST_NAME | ${GLOBAL_SERVER_HOST_NAME:?please set it in conf/.env} | Required hostname/FQDN for the PDC server - must be set in conf/.env |
| GLOBAL_PG_BACKUP_DAYS | 90 | Number of days to retain PostgreSQL backups |
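
The ${VAR:?message} form used for GLOBAL_SERVER_HOST_NAME is the shell-style "required variable" expansion that Docker Compose also supports: interpolation stops with the given message when the variable is unset or empty. The following is a minimal Python sketch of that behavior, not PDC code. The function name require_var and the host name pdc.example.com are hypothetical and used only for illustration.

```python
def require_var(env: dict, name: str, message: str) -> str:
    """Mimic ${NAME:?message}: fail when NAME is unset or empty."""
    value = env.get(name, "")
    if not value:
        # Compose aborts here; this sketch raises an exception instead.
        raise ValueError(f"{name}: {message}")
    return value


# Hypothetical override, as it might appear in conf/.env.
overrides = {"GLOBAL_SERVER_HOST_NAME": "pdc.example.com"}
host = require_var(overrides, "GLOBAL_SERVER_HOST_NAME", "please set it in conf/.env")
```

With an empty overrides dictionary, the same call raises an error that carries the "please set it in conf/.env" message.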

PDC Application Images

These variables define the specific Docker images and versions for all Pentaho Data Catalog core services.

| Variable | Value | Description |
| --- | --- | --- |
| PDC_FE_IMAGE | pentaho/pdc-app-server:10.2.8 | Frontend application server |
| PDC_PUBLIC_API_IMAGE | pentaho/pdc-public-api:10.2.8 | Public API service |
| PDC_BI_METADATA_API_IMAGE | pentaho/pdc-bi-metadata-api:10.2.8 | Business Intelligence metadata API |
| PDC_MONGODB_MIGRATIONS_IMAGE | pentaho/pdc-mongodb-migrations:10.2.8 | MongoDB database migrations |
| PDC_OPS_IMAGE | pentaho/pdc-ops-service:10.2.8 | Operations service for job management |
| PDC_TASK_RUNNER_IMAGE | pentaho/pdc-task-runner:10.2.8 | Background task execution service |
| PDC_WS_DEFAULT_IMAGE | pentaho/pdc-worker-service-bundle:10.2.8 | Default worker service bundle |
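
The image values in this reference are repository paths without a registry. The sketch below assumes that the Compose files join GLOBAL_IMAGE_PREFIX and an image value with a single slash to form the full reference. That join rule is an assumption, not something this reference confirms, and the helper names are hypothetical.

```python
def full_image_ref(prefix: str, image: str) -> str:
    """Join a registry prefix and an image value (assumed join rule)."""
    return f"{prefix.rstrip('/')}/{image}"


def image_tag(image: str) -> str:
    """Return the tag after the last colon, or an empty string if there is none."""
    name, sep, tag = image.rpartition(":")
    return tag if sep else ""


ref = full_image_ref("hitachi.jfrog.io/docker", "pentaho/pdc-app-server:10.2.8")
```

A helper like image_tag can be used to check that the PDC images in an override file all carry the same version tag.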

Specialized Service Images

These images provide specific functionality for different aspects of data management and governance.

| Variable | Value | Description |
| --- | --- | --- |
| PDC_MDS_IMAGE | pentaho/pdc-metadata-model:10.2.8 | Metadata model service |
| PDC_LINEAGE_IMAGE | pentaho/psc-lineage:10.2.8 | Data lineage tracking service |
| PDC_GLOSSARY_IMAGE | pentaho/psc-glossary:10.2.8 | Business glossary management |
| PDC_PHYSICAL_ASSETS_IMAGE | pentaho/psc-physical-assets:10.2.8 | Physical data asset management |
| PDC_MLMODELS_IMAGE | pentaho/psc-ml-models:10.2.8 | Machine learning model management |
| PDC_COLLAB_APP_IMAGE | pentaho/psc-collaboration-service:10.2.8 | Collaboration features |
| PDC_REFDATA_APP_IMAGE | pentaho/psc-reference-data:10.2.8 | Reference data management |

PDSO (Pentaho Data Storage Optimizer) Images

PDSO provides data optimization and management capabilities with separate API and UI components.

| Variable | Value | Description |
| --- | --- | --- |
| PDSO_API_IMAGE | pentaho/pdso-manager:10.2.8 | PDSO management API |
| PDSO_APP_IMAGE | pentaho/pdso-manager-ui:10.2.8 | PDSO management UI |
| PDSO_DATA_SERVICE_IMAGE | pentaho/pdso-data-service:10.2.8 | PDSO data processing service |
| CRYPTO_SERVICE_IMAGE | pentaho/pdso-crypto-service:10.2.8 | Cryptographic services |

User Management Images

These images handle authentication, authorization, and user administration using Keycloak as the identity provider.

| Variable | Value | Description |
| --- | --- | --- |
| USER_MGMT_IMAGE | pentaho/user-management:10.2.8 | Core user management service |
| USER_MGMT_UI_IMAGE | pentaho/user-management-ui:10.2.8 | User management interface |
| USER_MGMT_ADMIN_API_IMAGE | pentaho/um-admin-api:10.2.8 | Administrative API for user management |
| USER_MGMT_AUTH_PROXY_IMAGE | pentaho/um-auth-proxy:10.2.8 | Authentication proxy service |
| USER_MGMT_KEYCLOAK_IMAGE | pentaho/pdc-keycloak-themes:10.2.8 | Keycloak with PDC themes |

Master Data Management (MDM) Images

PDM (Pentaho Data Mastering) provides master data management capabilities with backend, frontend, and database initialization components.

| Variable | Value | Description |
| --- | --- | --- |
| MDM_BE_IMAGE | pentaho/pdm-be:10.2.8 | MDM backend service |
| MDM_FE_IMAGE | pentaho/pdm-fe:10.2.8 | MDM frontend interface |
| MDM_DB_INIT_IMAGE | pentaho/pdm-db:10.2.8 | MDM database initialization |

Supporting Service Images

These images provide essential infrastructure services like search, messaging, and data integration.

| Variable | Value | Description |
| --- | --- | --- |
| PDC_GLOBAL_SEARCH_IMAGE | pentaho/pdc-global-search:10.2.8 | Global search functionality |
| ACCESS_REQUEST_SERVICE_IMAGE | pentaho/access-request-service:10.2.8 | Data access request management |
| ML_GATEWAY_SERVICE_IMAGE | pentaho/ml-gateway-service:10.2.8 | Machine learning gateway |
| PDI_DATAPIPES_IMAGE | pentaho/datapipes:10.2.8 | Data integration pipelines |
| PDC_CHATBOT_BACKEND_IMAGE | pentaho/pdc-chatbot-backend:250820-bb72b27-release-v10.2.8 | AI chatbot backend |

Infrastructure Images

These are third-party infrastructure components required for PDC operations.

| Variable | Value | Description |
| --- | --- | --- |
| POSTGRESQL16_IMAGE | postgres:16.8-bookworm | PostgreSQL 16 database |
| REDIS_IMAGE | bitnami/redis:7.4.3-debian-12-r0 | Redis cache and session store |
| OPENSEARCH_IMAGE | bitnami/opensearch:2.19.2-debian-12-r0 | OpenSearch for full-text search |
| KAFKA_IMAGE | bitnami/kafka:3.9.0-debian-12-r13 | Apache Kafka message broker |
| MONGODB_IMAGE | mongodb/mongodb-enterprise-server:6.0.23-ubuntu2204 | MongoDB database |
| TRAEFIK_IMAGE | traefik:v2.11.24 | Reverse proxy and load balancer |

PDI (Pentaho Data Integration) Configuration

These settings configure the PDI service for data integration workflows.

| Variable | Value | Description |
| --- | --- | --- |
| PDI_DATAPIPES_USERNAME | | Default username for PDI operations |
| PDI_DATAPIPES_PASSWORD | Welcome123! | Default password for PDI (should be changed) |
| PDI_DATABASE_PASSWORD | password | Database password for PDI |
| PDC_USER_GUIDE_LINK | https://docs.hitachivantara.com/... | Link to user documentation |

Docker Compose Configuration

These settings control how Docker Compose deploys and manages the PDC stack.

| Variable | Value | Description |
| --- | --- | --- |
| COMPOSE_PROJECT_NAME | pdc | Docker Compose project name |
| COMPOSE_PROFILES | core,mongodb,collab,mdm,refdata... | Active service profiles to deploy |
| PDC_VENDOR_PATH | ./vendor | Path to vendor-specific configurations |
| PDC_CLIENT_PATH | ./conf | Path to client configuration files |
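
COMPOSE_PROFILES is a comma-separated list, so trimming a deployment amounts to removing entries from it. The sketch below builds a reduced value from the profile names visible in this reference. The default list is truncated above, so the names here are only a subset, and the helper build_profiles is hypothetical.

```python
def build_profiles(enabled: list, drop: set) -> str:
    """Return a COMPOSE_PROFILES value without the dropped profiles, keeping order."""
    return ",".join(p for p in enabled if p not in drop)


# Only the profile names visible in this reference; the real default list is longer.
visible = ["core", "mongodb", "collab", "mdm", "refdata"]

# Example: leave out the MDM and reference data services.
print("COMPOSE_PROFILES=" + build_profiles(visible, {"mdm", "refdata"}))
```

The printed line could then be placed in an override file such as conf/.env. Whether a given profile can be removed safely depends on service dependencies that this reference does not describe.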

Worker Service Configuration

These settings configure the default worker service behavior for data processing tasks.

| Variable | Value | Description |
| --- | --- | --- |
| PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE | 5 | Minimum job pool size |
| PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE | 5 | Maximum job pool size |
| PDC_WS_DEFAULT_JDBC_DEFAULT_POOL_SIZE | 10 | Default JDBC connection pool size |
| PDC_WS_DEFAULT_JDBC_DEFAULT_CONNECTION_TIMEOUT | 60 | JDBC connection timeout in seconds |

Database Sampling Algorithms

These configure how PDC samples data from various database types for profiling and analysis.

| Variable | Value | Description |
| --- | --- | --- |
| PDC_WS_DEFAULT_JDBC_ORACLE_PERCENTAGE_SAMPLING_ALGORITHM | DEFAULT | Oracle database sampling method |
| PDC_WS_DEFAULT_JDBC_POSTGRES_PERCENTAGE_SAMPLING_ALGORITHM | DEFAULT | PostgreSQL sampling method |
| PDC_WS_DEFAULT_JDBC_SNOWFLAKE_PERCENTAGE_SAMPLING_ALGORITHM | DEFAULT | Snowflake sampling method |
| PDC_WS_DEFAULT_JDBC_BIGQUERY_PERCENTAGE_SAMPLING_ALGORITHM | DEFAULT | BigQuery sampling method |

MongoDB Configuration

MongoDB serves as the primary metadata repository for PDC with separate databases for different services.

| Variable | Value | Description |
| --- | --- | --- |
| PDC_MONGO_MONGO_INITDB_ROOT_PASSWORD | broot | MongoDB root password |
| PDC_MONGODB_BACKUP_DAYS | 90 | MongoDB backup retention period in days |
| PDC_FE_MONGODB_SOCKET_TIMEOUT | 300000 | MongoDB socket timeout in milliseconds (5 minutes) |
| PDC_CRON_MONGODB_SCHEDULE | @daily | MongoDB backup schedule |
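
The socket timeout and the backup retention use different units, which is easy to get wrong when overriding them. The sketch below reads the timeout as milliseconds, consistent with the "5 minutes" note, and the retention as days. How PDC actually prunes backups is not described here, so retention_cutoff is a hypothetical illustration of the arithmetic only.

```python
from datetime import datetime, timedelta

SOCKET_TIMEOUT_MS = 300000  # PDC_FE_MONGODB_SOCKET_TIMEOUT
BACKUP_DAYS = 90            # PDC_MONGODB_BACKUP_DAYS


def timeout_minutes(ms: int) -> float:
    """Convert a millisecond timeout to minutes."""
    return ms / 1000 / 60


def retention_cutoff(now: datetime, days: int) -> datetime:
    """Backups older than the returned moment fall outside the retention window."""
    return now - timedelta(days=days)


print(timeout_minutes(SOCKET_TIMEOUT_MS))                    # 5.0
print(retention_cutoff(datetime(2025, 4, 1), BACKUP_DAYS))   # 2025-01-01 00:00:00
```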

Frontend Configuration

These settings control the PDC web frontend behavior and integration points.

| Variable | Value | Description |
| --- | --- | --- |
| PDC_FE_VERSION_MAIN | 10.2.8 | Frontend version identifier |
| PDC_FE_LOG_LEVEL | info | Frontend logging level |
| PDC_FE_DATAMASK_ENABLE | (empty) | Enable/disable data masking |
| PDC_FE_DATAMASK_CHAR | * | Character used for data masking |

Operations Service Configuration

These settings control job prioritization and operational behavior.

| Variable | Value | Description |
| --- | --- | --- |
| PDC_OPS_ADMIN_JOB_PRIORITY | 1 | Priority for admin jobs (1 is the highest) |
| PDC_OPS_DATA_STEWARD_JOB_PRIORITY | 3 | Priority for data steward jobs |
| PDC_OPS_BUSINESS_USER_JOB_PRIORITY | 5 | Priority for business user jobs |
| PDC_OPS_DEFAULT_JOB_PRIORITY | 3 | Default job priority |
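
Because admin jobs carry the value 1 and are described as highest priority, a lower number means the job runs earlier. The sketch below shows one way such priorities could order a queue, using a heap with first-in, first-out ordering among equal priorities. It is an illustration under that assumption, not the operations service's actual scheduler, and the role keys and job names are hypothetical.

```python
import heapq
from itertools import count

# Values from the table above; the role keys are hypothetical labels.
PRIORITY_BY_ROLE = {"admin": 1, "data_steward": 3, "business_user": 5}
DEFAULT_PRIORITY = 3  # PDC_OPS_DEFAULT_JOB_PRIORITY


def order_jobs(jobs):
    """Order (job_name, role) pairs by priority, FIFO within the same priority."""
    sequence = count()  # tie-breaker that preserves submission order
    heap = []
    for name, role in jobs:
        priority = PRIORITY_BY_ROLE.get(role, DEFAULT_PRIORITY)
        heapq.heappush(heap, (priority, next(sequence), name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


submitted = [
    ("profile-sales", "business_user"),
    ("reindex", "admin"),
    ("tag-review", "data_steward"),
    ("adhoc", "unknown_role"),  # falls back to the default priority
]
print(order_jobs(submitted))
```

In this sketch the admin job runs first, the data steward job and the default-priority job follow in submission order, and the business user job runs last.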

Kafka Configuration

Kafka handles asynchronous messaging between PDC services.

| Variable | Value | Description |
| --- | --- | --- |
| KAFKA_INIT_SEED_TOPIC_LIST | datasource-operation,entity-operation... | Pre-created Kafka topics |

Reference Data Service Configuration

These settings configure the reference data management service with its dedicated PostgreSQL database.

| Variable | Value | Description |
| --- | --- | --- |
| REFDATA_API_IS_PRODUCTION | true | Production mode flag |
| REFDATA_API_DB_HOST | refdata-postgres | Database host |
| REFDATA_API_DB_USERNAME | postgres | Database username |
| REFDATA_API_DB_PASSWORD | secret | Database password |

Rules Engine Configuration

The rules engine processes data governance and quality rules.

| Variable | Value | Description |
| --- | --- | --- |
| RULES_ENGINE_LOG_LEVEL | INFO | Rules engine logging level |
| RULES_ENABLE_INSTRUMENTATION | true | Enable performance monitoring |
| RULES_MDS_ENTITY_UPDATE_BATCH_SIZE | 20000 | Batch size for entity updates |

PDSO Configuration

PDSO (Pentaho Data Storage Optimizer) provides data optimization capabilities.

| Variable | Value | Description |
| --- | --- | --- |
| PDSO_ENABLE_HASH_CALCULATION | true | Enable hash-based data comparison |
| PDSO_ENABLE_STUB_CREATION | false | Enable stub creation (disabled by default) |
| PDSO_DEFAULT_DESTINATION_PATH | "pentaho_migration" | Default migration path |

Redis Configuration

Redis provides caching and session management.

| Variable | Value | Description |
| --- | --- | --- |
| REDIS_MASTER_PASSWORD | redis_master_broot | Master node password |
| REDIS_REPLICA_PASSWORD | redis_replica_broot | Replica node password |

Keycloak (Identity Management) Configuration

Keycloak provides authentication and authorization services.

| Variable | Value | Description |
| --- | --- | --- |
| KEYCLOAK_URL | https://${GLOBAL_SERVER_HOST_NAME}/keycloak | Keycloak access URL |
| KEYCLOAK_USER | admin | Keycloak admin username |
| KEYCLOAK_PASSWORD | admin | Keycloak admin password (should be changed) |
| EMAIL_DOMAINS | ["hv.com", "hitachivantara.com"] | Allowed email domains |
| TENANT_NAME | pdc | Default tenant name |
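
Two values in this section have structure worth noting: KEYCLOAK_URL is derived from GLOBAL_SERVER_HOST_NAME through variable interpolation, and EMAIL_DOMAINS is written as a JSON array. The sketch below reproduces the interpolation with Python's string.Template, which uses the same ${NAME} syntax, and parses the domain list as JSON. The exact matching rule PDC applies to email domains is not documented here, so email_allowed assumes a case-insensitive exact match on the domain. The host name is hypothetical.

```python
import json
from string import Template

KEYCLOAK_URL_TEMPLATE = "https://${GLOBAL_SERVER_HOST_NAME}/keycloak"
EMAIL_DOMAINS = '["hv.com", "hitachivantara.com"]'


def keycloak_url(host: str) -> str:
    """Substitute the server host name into the KEYCLOAK_URL template."""
    return Template(KEYCLOAK_URL_TEMPLATE).substitute(GLOBAL_SERVER_HOST_NAME=host)


def email_allowed(email: str, domains_json: str = EMAIL_DOMAINS) -> bool:
    """Assumed rule: the part after the last '@' must equal a listed domain."""
    allowed = {d.lower() for d in json.loads(domains_json)}
    return email.rpartition("@")[2].lower() in allowed


print(keycloak_url("pdc.example.com"))  # https://pdc.example.com/keycloak
```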

Master Data Management Database Configuration

MDM uses its own PostgreSQL database for master data storage.

| Variable | Value | Description |
| --- | --- | --- |
| MDM_API_DB_USERNAME | postgres | MDM database username |
| MDM_API_DB_PASSWORD | secret | MDM database password |
| MDM_API_DB_HOST | mdm-postgres | MDM database host |
| MDM_API_DB_PORT | 5432 | MDM database port |

Licensing Configuration

These settings configure license management and feature validation.

| Variable | Value | Description |
| --- | --- | --- |
| LICENSE_DATASOURCE_FEATURE_VERSION | 10.2 | Licensed version for data sources |
| LICENSE_USERS_COUNT_FEATURE_VERSION | 10.2 | Licensed version for user count |
| LICENSE_MDM_FEATURE_VERSION | 10.2 | Licensed version for MDM features |

Access Request Service Configuration

This service manages data access requests and integrates with external ticketing systems.

| Variable | Value | Description |
| --- | --- | --- |
| ACCESS_REQUEST_SERVICE_PROVIDER_TOOL | (empty) | External ticketing system (JIRA/ServiceNow) |
| ACCESS_REQUEST_SERVICE_JIRA_URL | (empty) | JIRA instance URL |
| ACCESS_REQUEST_SERVICE_SERVICENOW_URL | (empty) | ServiceNow instance URL |
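
All three values are empty by default, so enabling an external ticketing system means setting the provider together with its matching URL. The sketch below checks that pairing before deployment. The accepted spellings of the provider value are not given in this reference, so the lowercase keys "jira" and "servicenow" are assumptions, as are the function name and the example URL.

```python
from typing import Optional

# Assumed provider spellings mapped to the URL variable each one needs.
URL_VAR_BY_PROVIDER = {
    "jira": "ACCESS_REQUEST_SERVICE_JIRA_URL",
    "servicenow": "ACCESS_REQUEST_SERVICE_SERVICENOW_URL",
}


def missing_provider_url(env: dict) -> Optional[str]:
    """Return the name of the URL variable that is required but unset, if any."""
    provider = env.get("ACCESS_REQUEST_SERVICE_PROVIDER_TOOL", "").strip().lower()
    url_var = URL_VAR_BY_PROVIDER.get(provider)
    if url_var and not env.get(url_var):
        return url_var
    return None


# Hypothetical override set with the URL forgotten.
print(missing_provider_url({"ACCESS_REQUEST_SERVICE_PROVIDER_TOOL": "JIRA"}))
```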

Machine Learning Configuration

These settings configure AI/ML features including custom models and token limits.

| Variable | Value | Description |
| --- | --- | --- |
| ML_CUSTOM_TOKENS | 6000 | Token limit for ML operations |
| ML_TOKEN_WINDOW_SIZE | 8000 | Context window size for ML models |
| PDC_WS_DEFAULT_MAX_FILE_SIZE_FOR_ML | 10485760 | Maximum file size for ML processing, in bytes (10 MiB) |
| ML_LLM_MODEL | (empty) | Large Language Model identifier |
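
The file size limit is expressed in bytes: 10485760 is 10 × 1024 × 1024. The sketch below shows that arithmetic and a size check of the kind a worker might apply. Whether PDC treats the limit as inclusive is an assumption, and the function name is hypothetical.

```python
MAX_FILE_SIZE_FOR_ML = 10485760  # PDC_WS_DEFAULT_MAX_FILE_SIZE_FOR_ML, in bytes


def eligible_for_ml(size_bytes: int, limit: int = MAX_FILE_SIZE_FOR_ML) -> bool:
    """Assumed rule: files up to and including the limit are processed."""
    return size_bytes <= limit


def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to bytes when choosing a new limit."""
    return mib * 1024 * 1024


print(mib_to_bytes(10) == MAX_FILE_SIZE_FOR_ML)  # True
```

To raise the limit to 25 MiB, for example, the override value would be mib_to_bytes(25), which is 26214400.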

OpenSearch Configuration

OpenSearch provides full-text search and analytics capabilities.

| Variable | Value | Description |
| --- | --- | --- |
| OPENSEARCH_HEAP_SIZE | 2048m | JVM heap size for OpenSearch |
| OPENSEARCH_USERNAME | admin | OpenSearch admin username |
| OPENSEARCH_PASSWORD | Es3vweMuABJr | OpenSearch admin password |
| PDC_OPENSEARCH_BACKUP_DAYS | 90 | OpenSearch backup retention |
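
OPENSEARCH_HEAP_SIZE uses JVM-style size notation, where the suffixes k, m, and g stand for binary multiples, so 2048m is 2 GiB. The sketch below converts such a value to bytes, which can help when comparing the heap against the memory available to the container. It assumes the value is passed to the JVM unchanged, and the helper name is hypothetical.

```python
import re

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_bytes(value: str) -> int:
    """Parse a JVM-style size such as '2048m' or '2g' into bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized heap size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


print(heap_bytes("2048m"))  # 2147483648
```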

PostgreSQL Configuration

PostgreSQL serves as the primary relational database for various PDC services.

| Variable | Value | Description |
| --- | --- | --- |
| POSTGRES_USERNAME | postgres | PostgreSQL admin username |
| POSTGRES_PASSWORD | admin123# | PostgreSQL admin password |
| POSTGRES_BIDB_USER_NAME | bidb_ro | Read-only user for BI database |

Security Considerations

Several passwords and secrets appear in plain text in this default configuration. Before deploying to production:

  • Change all default passwords

  • Use environment-specific .env files to override sensitive values

  • Consider using Docker secrets or external secret management

  • Regenerate PDC_PDI_PRIVACY_ENCRYPTION_KEY for each environment
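
A simple pre-deployment check can catch default passwords that were never overridden. The sketch below compares a set of overrides against the default passwords listed in this reference and reports any variable whose effective value is still the default. It also shows one way to generate a replacement with Python's secrets module. The function names are hypothetical, and each service may impose its own rules on password length or allowed characters.

```python
import secrets

# Default passwords as listed in this reference.
DOCUMENTED_DEFAULTS = {
    "PDI_DATAPIPES_PASSWORD": "Welcome123!",
    "PDI_DATABASE_PASSWORD": "password",
    "PDC_MONGO_MONGO_INITDB_ROOT_PASSWORD": "broot",
    "REFDATA_API_DB_PASSWORD": "secret",
    "REDIS_MASTER_PASSWORD": "redis_master_broot",
    "REDIS_REPLICA_PASSWORD": "redis_replica_broot",
    "KEYCLOAK_PASSWORD": "admin",
    "MDM_API_DB_PASSWORD": "secret",
    "OPENSEARCH_PASSWORD": "Es3vweMuABJr",
    "POSTGRES_PASSWORD": "admin123#",
}


def unchanged_defaults(overrides: dict) -> list:
    """Names whose effective value still equals the documented default."""
    return sorted(
        name
        for name, default in DOCUMENTED_DEFAULTS.items()
        if overrides.get(name, default) == default
    )


def new_secret(nbytes: int = 24) -> str:
    """Generate a random URL-safe secret from nbytes of randomness."""
    return secrets.token_urlsafe(nbytes)


# Hypothetical override set in which only the Keycloak password was changed.
print(unchanged_defaults({"KEYCLOAK_PASSWORD": new_secret()}))
```

An empty result means every password listed here has been overridden. The check does not cover secrets absent from this reference, such as the privacy encryption key.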
