PDC Configuration Files
.env.default
This `.env.default` file is a comprehensive configuration for Pentaho Data Catalog version 10.2.8, containing over 200 environment variables that control every aspect of the deployment.
This configuration supports a full PDC deployment with all optional services enabled through the `COMPOSE_PROFILES` setting. The modular design allows selective deployment of only the required services by modifying the profiles list.
The configuration is organized into logical sections covering Docker images, database connections, service configurations, and security settings.
Global Configuration
This section defines core system-wide settings that affect the entire PDC deployment.

| Variable | Default value | Description |
| --- | --- | --- |
| `GLOBAL_IMAGE_PREFIX` | `hitachi.jfrog.io/docker` | Container registry prefix for all Docker images |
| `GLOBAL_SERVER_HOST_NAME` | `${GLOBAL_SERVER_HOST_NAME:?please set it in conf/.env}` | Required hostname or FQDN of the PDC server; must be set in `conf/.env` |
| `GLOBAL_PG_BACKUP_DAYS` | `90` | Number of days to retain PostgreSQL backups |

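The default for `GLOBAL_SERVER_HOST_NAME` uses the `${VAR:?message}` form of interpolation, which Docker Compose shares with POSIX shells: expansion fails with the message when the variable is unset or empty. The sketch below reproduces that behavior in a plain shell. The helper name `resolve_host` and the hostname `pdc.example.com` are placeholders for illustration, not values from the file.

```shell
#!/bin/sh
# Illustrates the ${VAR:?message} expansion used for GLOBAL_SERVER_HOST_NAME.

# resolve_host: print the hostname, or fail when it is unset or empty.
# The subshell keeps the failed expansion from terminating the calling script.
resolve_host() {
  ( printf '%s\n' "${GLOBAL_SERVER_HOST_NAME:?please set it in conf/.env}" )
}

unset GLOBAL_SERVER_HOST_NAME
if ! resolve_host 2>/dev/null; then
  echo "unset: expansion failed"
fi

GLOBAL_SERVER_HOST_NAME=pdc.example.com   # placeholder hostname
resolve_host
```

Run as written, the first call fails and the second prints the placeholder hostname, which mirrors what happens when `conf/.env` does or does not define the variable.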
PDC Application Images
These variables define the specific Docker images and versions for all Pentaho Data Catalog core services.

| Variable | Default value | Description |
| --- | --- | --- |
| `PDC_FE_IMAGE` | `pentaho/pdc-app-server:10.2.8` | Frontend application server |
| `PDC_PUBLIC_API_IMAGE` | `pentaho/pdc-public-api:10.2.8` | Public API service |
| `PDC_BI_METADATA_API_IMAGE` | `pentaho/pdc-bi-metadata-api:10.2.8` | Business Intelligence metadata API |
| `PDC_MONGODB_MIGRATIONS_IMAGE` | `pentaho/pdc-mongodb-migrations:10.2.8` | MongoDB database migrations |
| `PDC_OPS_IMAGE` | `pentaho/pdc-ops-service:10.2.8` | Operations service for job management |
| `PDC_TASK_RUNNER_IMAGE` | `pentaho/pdc-task-runner:10.2.8` | Background task execution service |
| `PDC_WS_DEFAULT_IMAGE` | `pentaho/pdc-worker-service-bundle:10.2.8` | Default worker service bundle |

Specialized Service Images
These images provide specific functionality for different aspects of data management and governance.

| Variable | Default value | Description |
| --- | --- | --- |
| `PDC_MDS_IMAGE` | `pentaho/pdc-metadata-model:10.2.8` | Metadata model service |
| `PDC_LINEAGE_IMAGE` | `pentaho/psc-lineage:10.2.8` | Data lineage tracking service |
| `PDC_GLOSSARY_IMAGE` | `pentaho/psc-glossary:10.2.8` | Business glossary management |
| `PDC_PHYSICAL_ASSETS_IMAGE` | `pentaho/psc-physical-assets:10.2.8` | Physical data asset management |
| `PDC_MLMODELS_IMAGE` | `pentaho/psc-ml-models:10.2.8` | Machine learning model management |
| `PDC_COLLAB_APP_IMAGE` | `pentaho/psc-collaboration-service:10.2.8` | Collaboration features |
| `PDC_REFDATA_APP_IMAGE` | `pentaho/psc-reference-data:10.2.8` | Reference data management |

PDSO (Pentaho Data Storage Optimizer) Images
PDSO provides data optimization and management capabilities with separate API and UI components.

| Variable | Default value | Description |
| --- | --- | --- |
| `PDSO_API_IMAGE` | `pentaho/pdso-manager:10.2.8` | PDSO management API |
| `PDSO_APP_IMAGE` | `pentaho/pdso-manager-ui:10.2.8` | PDSO management UI |
| `PDSO_DATA_SERVICE_IMAGE` | `pentaho/pdso-data-service:10.2.8` | PDSO data processing service |
| `CRYPTO_SERVICE_IMAGE` | `pentaho/pdso-crypto-service:10.2.8` | Cryptographic services |

User Management Images
These images handle authentication, authorization, and user administration using Keycloak as the identity provider.

| Variable | Default value | Description |
| --- | --- | --- |
| `USER_MGMT_IMAGE` | `pentaho/user-management:10.2.8` | Core user management service |
| `USER_MGMT_UI_IMAGE` | `pentaho/user-management-ui:10.2.8` | User management interface |
| `USER_MGMT_ADMIN_API_IMAGE` | `pentaho/um-admin-api:10.2.8` | Administrative API for user management |
| `USER_MGMT_AUTH_PROXY_IMAGE` | `pentaho/um-auth-proxy:10.2.8` | Authentication proxy service |
| `USER_MGMT_KEYCLOAK_IMAGE` | `pentaho/pdc-keycloak-themes:10.2.8` | Keycloak with PDC themes |

Master Data Management (MDM) Images
PDM (Pentaho Data Mastering) provides master data management capabilities with backend, frontend, and database initialization components.

| Variable | Default value | Description |
| --- | --- | --- |
| `MDM_BE_IMAGE` | `pentaho/pdm-be:10.2.8` | MDM backend service |
| `MDM_FE_IMAGE` | `pentaho/pdm-fe:10.2.8` | MDM frontend interface |
| `MDM_DB_INIT_IMAGE` | `pentaho/pdm-db:10.2.8` | MDM database initialization |

Supporting Service Images
These images provide essential infrastructure services like search, messaging, and data integration.

| Variable | Default value | Description |
| --- | --- | --- |
| `PDC_GLOBAL_SEARCH_IMAGE` | `pentaho/pdc-global-search:10.2.8` | Global search functionality |
| `ACCESS_REQUEST_SERVICE_IMAGE` | `pentaho/access-request-service:10.2.8` | Data access request management |
| `ML_GATEWAY_SERVICE_IMAGE` | `pentaho/ml-gateway-service:10.2.8` | Machine learning gateway |
| `PDI_DATAPIPES_IMAGE` | `pentaho/datapipes:10.2.8` | Data integration pipelines |
| `PDC_CHATBOT_BACKEND_IMAGE` | `pentaho/pdc-chatbot-backend:250820-bb72b27-release-v10.2.8` | AI chatbot backend |
| `POSTGRESQL16_IMAGE` | `postgres:16.8-bookworm` | PostgreSQL 16 database |
| `REDIS_IMAGE` | `bitnami/redis:7.4.3-debian-12-r0` | Redis cache and session store |
| `OPENSEARCH_IMAGE` | `bitnami/opensearch:2.19.2-debian-12-r0` | OpenSearch for full-text search |
| `KAFKA_IMAGE` | `bitnami/kafka:3.9.0-debian-12-r13` | Apache Kafka message broker |
| `MONGODB_IMAGE` | `mongodb/mongodb-enterprise-server:6.0.23-ubuntu2204` | MongoDB database |
| `TRAEFIK_IMAGE` | `traefik:v2.11.24` | Reverse proxy and load balancer |

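Because most image variables are pinned to the same release tag, it can help to list the ones that are not, for example when planning an upgrade. The following is a minimal sketch that assumes the file uses plain `KEY=value` lines; `list_image_tags` and `flag_unexpected_tags` are hypothetical helpers, and the sample file contains only a few entries copied from the tables above.

```shell
#!/bin/sh
# list_image_tags FILE: print "VARIABLE TAG" for each *_IMAGE entry in an env file.
list_image_tags() {
  grep -E '^[A-Z0-9_]+_IMAGE=' "$1" | while IFS='=' read -r name image; do
    printf '%s %s\n' "$name" "${image##*:}"   # text after the last colon is the tag
  done
}

# flag_unexpected_tags FILE VERSION: print entries whose tag is not exactly VERSION.
flag_unexpected_tags() {
  list_image_tags "$1" | while read -r name tag; do
    [ "$tag" = "$2" ] || printf '%s %s\n' "$name" "$tag"
  done
}

env_file=$(mktemp)
cat > "$env_file" <<'EOF'
GLOBAL_IMAGE_PREFIX=hitachi.jfrog.io/docker
PDC_FE_IMAGE=pentaho/pdc-app-server:10.2.8
PDC_CHATBOT_BACKEND_IMAGE=pentaho/pdc-chatbot-backend:250820-bb72b27-release-v10.2.8
TRAEFIK_IMAGE=traefik:v2.11.24
EOF

flag_unexpected_tags "$env_file" 10.2.8
rm -f "$env_file"
```

With this sample the chatbot backend and Traefik entries are reported, since their tags differ from `10.2.8`; third-party infrastructure images are expected to appear in such a report.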
PDI (Pentaho Data Integration) Configuration
These settings configure the PDI service for data integration workflows.

| Variable | Default value | Description |
| --- | --- | --- |
| `PDI_DATAPIPES_PASSWORD` | `Welcome123!` | Default password for PDI (should be changed) |
| `PDI_DATABASE_PASSWORD` | `password` | Database password for PDI |
| `PDC_USER_GUIDE_LINK` | `https://docs.hitachivantara.com/...` | Link to user documentation |

Docker Compose Configuration
These settings control how Docker Compose deploys and manages the PDC stack.

| Variable | Default value | Description |
| --- | --- | --- |
| `COMPOSE_PROJECT_NAME` | `pdc` | Docker Compose project name |
| `COMPOSE_PROFILES` | `core,mongodb,collab,mdm,refdata...` | Active service profiles to deploy |
| `PDC_VENDOR_PATH` | `./vendor` | Path to vendor-specific configurations |
| `PDC_CLIENT_PATH` | `./conf` | Path to client configuration files |

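Docker Compose reads `COMPOSE_PROFILES` as a comma-separated list, so a smaller deployment is selected by shortening that list. The sketch below checks whether a profile is present in the list. The reduced value `core,mongodb` only reuses names from the default above and is not a verified minimal deployment, and `has_profile` is a hypothetical helper.

```shell
#!/bin/sh
# has_profile NAME: succeed when NAME is an entry in the comma-separated
# COMPOSE_PROFILES list (exact match, not a substring match).
has_profile() {
  case ",$COMPOSE_PROFILES," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

COMPOSE_PROFILES=core,mongodb   # illustrative subset of the default list
has_profile mongodb && echo "mongodb enabled"
has_profile mdm || echo "mdm disabled"
```

Wrapping the list in commas before matching avoids false positives such as `mongo` matching `mongodb`.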
Worker Service Configuration
These settings configure the default worker service behavior for data processing tasks.

| Variable | Default value | Description |
| --- | --- | --- |
| `PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE` | `5` | Minimum job pool size |
| `PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE` | `5` | Maximum job pool size |
| `PDC_WS_DEFAULT_JDBC_DEFAULT_POOL_SIZE` | `10` | Default JDBC connection pool size |
| `PDC_WS_DEFAULT_JDBC_DEFAULT_CONNECTION_TIMEOUT` | `60` | JDBC connection timeout in seconds |

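When overriding the job pool sizes, a quick sanity check before deployment can catch a minimum that exceeds the maximum. This is a sketch under the assumption that the minimum should not be larger than the maximum; the source does not state how the worker service reacts to such a value, and `check_job_pool` is a hypothetical helper.

```shell
#!/bin/sh
# check_job_pool: compare the two pool-size variables, falling back to the
# documented default of 5 when a variable is not set.
check_job_pool() {
  min=${PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE:-5}
  max=${PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE:-5}
  if [ "$min" -gt "$max" ]; then
    echo "invalid: min $min exceeds max $max" >&2
    return 1
  fi
  echo "ok: job pool $min..$max"
}

check_job_pool
PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE=8   # illustrative bad override
PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE=5
check_job_pool || echo "fix the override before deploying"
```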
Database Sampling Algorithms
These configure how PDC samples data from various database types for profiling and analysis.

| Variable | Default value | Description |
| --- | --- | --- |
| `PDC_WS_DEFAULT_JDBC_ORACLE_PERCENTAGE_SAMPLING_ALGORITHM` | `DEFAULT` | Oracle database sampling method |
| `PDC_WS_DEFAULT_JDBC_POSTGRES_PERCENTAGE_SAMPLING_ALGORITHM` | `DEFAULT` | PostgreSQL sampling method |
| `PDC_WS_DEFAULT_JDBC_SNOWFLAKE_PERCENTAGE_SAMPLING_ALGORITHM` | `DEFAULT` | Snowflake sampling method |
| `PDC_WS_DEFAULT_JDBC_BIGQUERY_PERCENTAGE_SAMPLING_ALGORITHM` | `DEFAULT` | BigQuery sampling method |

MongoDB Configuration
MongoDB serves as the primary metadata repository for PDC with separate databases for different services.

| Variable | Default value | Description |
| --- | --- | --- |
| `PDC_MONGO_MONGO_INITDB_ROOT_PASSWORD` | `broot` | MongoDB root password |
| `PDC_MONGODB_BACKUP_DAYS` | `90` | MongoDB backup retention period in days |
| `PDC_FE_MONGODB_SOCKET_TIMEOUT` | `300000` | MongoDB socket timeout in milliseconds (5 minutes) |
| `PDC_CRON_MONGODB_SCHEDULE` | `@daily` | MongoDB backup schedule |
| `PDC_FE_VERSION_MAIN` | `10.2.8` | Frontend version identifier |
| `PDC_FE_LOG_LEVEL` | `info` | Frontend logging level |
| `PDC_FE_DATAMASK_ENABLE` | (empty) | Enable or disable data masking |
| `PDC_FE_DATAMASK_CHAR` | `*` | Character used for data masking |
| `PDC_OPS_ADMIN_JOB_PRIORITY` | `1` | Priority for admin jobs (highest) |
| `PDC_OPS_DATA_STEWARD_JOB_PRIORITY` | `3` | Priority for data steward jobs |
| `PDC_OPS_BUSINESS_USER_JOB_PRIORITY` | `5` | Priority for business user jobs |
| `PDC_OPS_DEFAULT_JOB_PRIORITY` | `3` | Default job priority |
| `KAFKA_INIT_SEED_TOPIC_LIST` | `datasource-operation,entity-operation...` | Pre-created Kafka topics |

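The job priority defaults describe `1` as the highest priority, which suggests that lower numbers rank first. The sketch below orders the role priorities on that assumption; how the operations service actually schedules jobs is not described here, and `order_by_priority` is a hypothetical helper.

```shell
#!/bin/sh
# order_by_priority: read NAME=PRIORITY lines on stdin and print them with the
# lowest number (assumed to be the highest priority) first.
order_by_priority() {
  sort -t= -k2,2n
}

order_by_priority <<'EOF'
PDC_OPS_BUSINESS_USER_JOB_PRIORITY=5
PDC_OPS_ADMIN_JOB_PRIORITY=1
PDC_OPS_DATA_STEWARD_JOB_PRIORITY=3
EOF
```

With the defaults, admin jobs sort ahead of data steward jobs, which sort ahead of business user jobs.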
Reference Data Service Configuration
These settings configure the reference data management service with its dedicated PostgreSQL database.

| Variable | Default value | Description |
| --- | --- | --- |
| `REFDATA_API_IS_PRODUCTION` | `true` | Production mode flag |
| `REFDATA_API_DB_HOST` | `refdata-postgres` | Database host |
| `REFDATA_API_DB_USERNAME` | `postgres` | Database username |
| `REFDATA_API_DB_PASSWORD` | `secret` | Database password |
| `RULES_ENGINE_LOG_LEVEL` | `INFO` | Rules engine logging level |
| `RULES_ENABLE_INSTRUMENTATION` | `true` | Enable performance monitoring |
| `RULES_MDS_ENTITY_UPDATE_BATCH_SIZE` | `20000` | Batch size for entity updates |
| `PDSO_ENABLE_HASH_CALCULATION` | `true` | Enable hash-based data comparison |
| `PDSO_ENABLE_STUB_CREATION` | `false` | Enable stub creation (disabled by default) |
| `PDSO_DEFAULT_DESTINATION_PATH` | `"pentaho_migration"` | Default migration path |
| `REDIS_MASTER_PASSWORD` | `redis_master_broot` | Redis master node password |
| `REDIS_REPLICA_PASSWORD` | `redis_replica_broot` | Redis replica node password |

Keycloak (Identity Management) Configuration
Keycloak provides authentication and authorization services.

| Variable | Default value | Description |
| --- | --- | --- |
| `KEYCLOAK_URL` | `https://${GLOBAL_SERVER_HOST_NAME}/keycloak` | Keycloak access URL |
| `KEYCLOAK_USER` | `admin` | Keycloak admin username |
| `KEYCLOAK_PASSWORD` | `admin` | Keycloak admin password (should be changed) |
| `EMAIL_DOMAINS` | `["hv.com", "hitachivantara.com"]` | Allowed email domains |
| `TENANT_NAME` | `pdc` | Default tenant name |

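`KEYCLOAK_URL` is built from `GLOBAL_SERVER_HOST_NAME` through `${...}` interpolation, so setting the hostname once is enough to update the URL. The lines below show the same substitution in a plain shell; `pdc.example.com` is a placeholder hostname, not a value from the file.

```shell
#!/bin/sh
# The same ${VAR} substitution that the KEYCLOAK_URL default relies on.
GLOBAL_SERVER_HOST_NAME=pdc.example.com   # placeholder hostname
KEYCLOAK_URL="https://${GLOBAL_SERVER_HOST_NAME}/keycloak"
echo "$KEYCLOAK_URL"   # prints https://pdc.example.com/keycloak
```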
Master Data Management Database Configuration
MDM uses its own PostgreSQL database for master data storage.

| Variable | Default value | Description |
| --- | --- | --- |
| `MDM_API_DB_USERNAME` | `postgres` | MDM database username |
| `MDM_API_DB_PASSWORD` | `secret` | MDM database password |
| `MDM_API_DB_HOST` | `mdm-postgres` | MDM database host |
| `MDM_API_DB_PORT` | `5432` | MDM database port |
| `LICENSE_DATASOURCE_FEATURE_VERSION` | `10.2` | Licensed version for data sources |
| `LICENSE_USERS_COUNT_FEATURE_VERSION` | `10.2` | Licensed version for user count |
| `LICENSE_MDM_FEATURE_VERSION` | `10.2` | Licensed version for MDM features |

Access Request Service Configuration
This service manages data access requests and integrates with external ticketing systems.

| Variable | Default value | Description |
| --- | --- | --- |
| `ACCESS_REQUEST_SERVICE_PROVIDER_TOOL` | (empty) | External ticketing system (JIRA/ServiceNow) |
| `ACCESS_REQUEST_SERVICE_JIRA_URL` | (empty) | JIRA instance URL |
| `ACCESS_REQUEST_SERVICE_SERVICENOW_URL` | (empty) | ServiceNow instance URL |

Machine Learning Configuration
These settings configure AI/ML features including custom models and token limits.
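
| Variable | Default value | Description |
| --- | --- | --- |
| `ML_CUSTOM_TOKENS` | `6000` | Token limit for ML operations |
| `ML_TOKEN_WINDOW_SIZE` | `8000` | Context window size for ML models, in tokens |
| `PDC_WS_DEFAULT_MAX_FILE_SIZE_FOR_ML` | `10485760` | Maximum file size for ML processing, in bytes (10 MiB) |
| `ML_LLM_MODEL` | (empty) | Large Language Model identifier |
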
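The default `PDC_WS_DEFAULT_MAX_FILE_SIZE_FOR_ML` of `10485760` is 10 × 1024 × 1024 bytes. As a sketch of how that number can be used, the hypothetical helper `fits_ml_limit` below compares a file's size with the limit; it is a pre-check you might run yourself, not a description of how the worker service enforces the setting.

```shell
#!/bin/sh
# fits_ml_limit FILE: succeed when FILE is no larger than the configured limit.
fits_ml_limit() {
  limit=${PDC_WS_DEFAULT_MAX_FILE_SIZE_FOR_ML:-10485760}
  size=$(( $(wc -c < "$1") ))   # arithmetic expansion strips any padding from wc
  [ "$size" -le "$limit" ]
}

echo $((10 * 1024 * 1024))      # 10485760, the default limit in bytes

sample=$(mktemp)
printf 'id,name\n1,example\n' > "$sample"
if fits_ml_limit "$sample"; then
  echo "within limit"
fi
rm -f "$sample"
```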
OpenSearch Configuration
OpenSearch provides full-text search and analytics capabilities.
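
| Variable | Default value | Description |
| --- | --- | --- |
| `OPENSEARCH_HEAP_SIZE` | `2048m` | JVM heap size for OpenSearch |
| `OPENSEARCH_USERNAME` | `admin` | OpenSearch admin username |
| `OPENSEARCH_PASSWORD` | `Es3vweMuABJr` | OpenSearch admin password |
| `PDC_OPENSEARCH_BACKUP_DAYS` | `90` | OpenSearch backup retention in days |
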
PostgreSQL Configuration
PostgreSQL serves as the primary relational database for various PDC services.

| Variable | Default value | Description |
| --- | --- | --- |
| `POSTGRES_USERNAME` | `postgres` | PostgreSQL admin username |
| `POSTGRES_PASSWORD` | `admin123#` | PostgreSQL admin password |
| `POSTGRES_BIDB_USER_NAME` | `bidb_ro` | Read-only user for BI database |

Security Considerations
Several passwords and secrets are visible in plain text in this default configuration. Recommended precautions:
- Change all default passwords before production deployment.
- Use environment-specific `.env` files to override sensitive values.
- Consider using Docker secrets or external secret management.
- Regenerate `PDC_PDI_PRIVACY_ENCRYPTION_KEY` for each environment.
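One way to act on the first recommendation is to scan an override file for values that still match the shipped defaults. The sketch below assumes plain, unquoted `KEY=value` lines and uses only the default values listed on this page; `find_default_secrets` is a hypothetical helper, and the sample file is illustrative.

```shell
#!/bin/sh
# Defaults copied from the tables on this page, one VARIABLE=value pair per line.
DEFAULT_SECRETS='PDI_DATAPIPES_PASSWORD=Welcome123!
PDI_DATABASE_PASSWORD=password
PDC_MONGO_MONGO_INITDB_ROOT_PASSWORD=broot
REFDATA_API_DB_PASSWORD=secret
REDIS_MASTER_PASSWORD=redis_master_broot
REDIS_REPLICA_PASSWORD=redis_replica_broot
KEYCLOAK_PASSWORD=admin
MDM_API_DB_PASSWORD=secret
OPENSEARCH_PASSWORD=Es3vweMuABJr
POSTGRES_PASSWORD=admin123#'

# find_default_secrets FILE: print each variable in FILE that still holds its default.
find_default_secrets() {
  printf '%s\n' "$DEFAULT_SECRETS" | while IFS= read -r pair; do
    # -F: fixed string, -x: whole line must match, -q: no output
    if grep -Fqx -- "$pair" "$1"; then
      printf '%s\n' "${pair%%=*}"
    fi
  done
}

env_file=$(mktemp)
cat > "$env_file" <<'EOF'
KEYCLOAK_PASSWORD=admin
POSTGRES_PASSWORD=a-long-generated-value
EOF

find_default_secrets "$env_file"   # prints KEYCLOAK_PASSWORD
rm -f "$env_file"
```

Replacement values can come from any secret generator you already trust; `openssl rand -base64 24` is one common option where OpenSSL is installed.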