Adventure Works
Data Catalog deployment configuration settings.
Disk Configuration
Follow these best practices for disk partitioning when installing Ubuntu 24.04.4 LTS, so that PDC 10.2.8, Docker, and other services have adequate space:
Choose "Custom" partitioning instead of automatic
Use LVM (Logical Volume Manager) - this allows easy resizing later
Leave 20-30% of disk space unallocated in the volume group for future expansion
Mount point | Size
/boot | 1GB
/boot/efi | 512MB
swap | 16GB
/root | 100GB
/var | 150GB (Docker data)
/ | 100GB
Unallocated | ~130GB
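As a rough check on the layout above, the sizes can be summed to confirm that the unallocated share falls in the recommended 20-30% range. This is a minimal sketch that treats the whole disk as one pool, which is a simplification because /boot and /boot/efi normally sit outside the LVM volume group. The names `PARTITIONS_GB` and `unallocated_fraction` are illustrative only.

```python
# Partition sizes in GB, copied from the layout above.
PARTITIONS_GB = {
    "/boot": 1,
    "/boot/efi": 0.5,
    "swap": 16,
    "/root": 100,
    "/var": 150,
    "/": 100,
}
UNALLOCATED_GB = 130  # approximate ("~130GB" in the layout)


def unallocated_fraction(partitions_gb, unallocated_gb):
    """Return the share of the total disk that is left unallocated."""
    total = sum(partitions_gb.values()) + unallocated_gb
    return unallocated_gb / total


share = unallocated_fraction(PARTITIONS_GB, UNALLOCATED_GB)
print(f"Allocated: {sum(PARTITIONS_GB.values())} GB, unallocated share: {share:.0%}")
```

For a different disk size, adjust the values in the dictionary and rerun the check before committing to the layout.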
Critical Settings
When deploying Pentaho Data Catalog, several configuration categories require careful consideration. This guide identifies the most critical settings that must be configured before production deployment, organized by priority and impact.
1. Mandatory
These settings must be configured before PDC can function properly. Failure to set these will prevent the system from starting or operating correctly.
Setting | Default | Status | Description
GLOBAL_SERVER_HOST_NAME | ${GLOBAL_SERVER_HOST_NAME:?please set it in conf/.env} | REQUIRED | The fully qualified domain name (FQDN) or IP address where PDC will be accessible. This affects URLs, SSL certificates, and service discovery.
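The `${VAR:?message}` form in the default is Docker Compose interpolation syntax: Compose stops with the given message when the variable is unset or empty. The same condition can be checked ahead of time. The sketch below assumes a simple `KEY=VALUE` file at `conf/.env` with `#` comments; `parse_env` and `missing_required` are hypothetical helper names, not part of PDC.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(env, required=("GLOBAL_SERVER_HOST_NAME",)):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Example usage (path taken from the message in the default value):
# with open("conf/.env", encoding="utf-8") as handle:
#     print(missing_required(parse_env(handle.read())))
```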
2. Security
These default passwords and security settings pose significant security risks and must be changed before production deployment.
Setting | Default | Risk | Action
PDC_MONGO_MONGO_INITDB_ROOT_PASSWORD | broot | HIGH | Change to complex password (min 12 chars, mixed case, numbers, symbols)
KEYCLOAK_PASSWORD | admin | CRITICAL | Change immediately - controls access to identity management
KEYCLOAK_USER | admin | HIGH | Consider changing username from default
POSTGRES_PASSWORD | admin123# | HIGH | Change to complex password for main database
POSTGRES_BIDB_USER_PASSWORD | 4aU94t8v+4d+W | MEDIUM | Change for BI database read-only user
REDIS_MASTER_PASSWORD | redis_master_broot | HIGH | Change for cache/session security
REDIS_REPLICA_PASSWORD | redis_replica_broot | HIGH | Change for cache replication security
OPENSEARCH_PASSWORD | Es3vweMuABJr | HIGH | Change for search engine security
PDI_DATAPIPES_PASSWORD | Welcome123! | HIGH | Change for data integration security
REFDATA_API_DB_PASSWORD | secret | HIGH | Change for reference data database
MDM_API_DB_PASSWORD | secret | HIGH | Change for master data management database
MONGODB_BKP_USER_PASSWORD | qdA9WMcw35WjzQl7ELhlHEF | MEDIUM | Change for backup user security
PDC_PDI_PRIVACY_ENCRYPTION_KEY | NUhHRk5VWkJBTkZTbWxDanR0enJLVU05UlF3M3RwWks= | CRITICAL | Generate new encryption key for data privacy
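One way to act on the table above is to generate replacements with a cryptographically secure random source and to flag any setting still at its shipped default. The sketch below is a set of assumptions, not PDC tooling: the symbol set deliberately avoids `$`, `#`, and quotes to sidestep `.env` parsing surprises, and the key format simply mirrors the shape of the default value (base64 of 32 alphanumeric characters). Confirm the expected key format with the product documentation before relying on it.

```python
import base64
import secrets
import string

SYMBOLS = "!@%^*-_=+"  # assumed safe inside a .env value
ALPHABET = string.ascii_letters + string.digits + SYMBOLS

# Shipped defaults, copied from the table above.
KNOWN_DEFAULTS = {
    "PDC_MONGO_MONGO_INITDB_ROOT_PASSWORD": "broot",
    "KEYCLOAK_PASSWORD": "admin",
    "POSTGRES_PASSWORD": "admin123#",
    "POSTGRES_BIDB_USER_PASSWORD": "4aU94t8v+4d+W",
    "REDIS_MASTER_PASSWORD": "redis_master_broot",
    "REDIS_REPLICA_PASSWORD": "redis_replica_broot",
    "OPENSEARCH_PASSWORD": "Es3vweMuABJr",
    "PDI_DATAPIPES_PASSWORD": "Welcome123!",
    "REFDATA_API_DB_PASSWORD": "secret",
    "MDM_API_DB_PASSWORD": "secret",
    "MONGODB_BKP_USER_PASSWORD": "qdA9WMcw35WjzQl7ELhlHEF",
    "PDC_PDI_PRIVACY_ENCRYPTION_KEY": "NUhHRk5VWkJBTkZTbWxDanR0enJLVU05UlF3M3RwWks=",
}


def generate_password(length=20):
    """Return a random password with lower, upper, digit, and symbol characters."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in SYMBOLS for c in candidate)):
            return candidate


def generate_privacy_key():
    """Base64-encode 32 random alphanumeric characters (assumed key shape)."""
    chars = string.ascii_letters + string.digits
    raw = "".join(secrets.choice(chars) for _ in range(32))
    return base64.b64encode(raw.encode("ascii")).decode("ascii")


def still_default(env):
    """Return the settings in env that still hold their shipped default."""
    return sorted(k for k, v in KNOWN_DEFAULTS.items() if env.get(k) == v)
```

`still_default` takes a dictionary of the values currently in use and returns the names that need attention, which makes it easy to fail a deployment script when the list is not empty.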
3. Environment
These settings should be customized based on your specific deployment environment and requirements.
Infrastructure Sizing
Setting | Default | Purpose | Recommendation
OPENSEARCH_HEAP_SIZE | 2048m | Memory allocation for search | Adjust based on data volume and available RAM
PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE | 5 | Worker pool sizing | Adjust based on expected workload
PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE | 5 | Worker pool sizing | Increase for high-volume environments
PDC_WS_DEFAULT_JDBC_DEFAULT_POOL_SIZE | 10 | Database connections | Adjust based on concurrent users
PDC_WS_DEFAULT_JDBC_DEFAULT_CONNECTION_TIMEOUT | 60 | Connection timeout | Adjust for network latency
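The sizing values above can be sanity-checked before deployment. The sketch below applies the general OpenSearch guidance of giving the heap at most half of the RAM available to the node, capped a little below 32 GB; on a host shared with the other PDC services the practical value will be lower, so treat the result as an upper bound. The function names and the 31 GB cap are assumptions for illustration.

```python
def max_opensearch_heap(ram_mb, cap_mb=31 * 1024):
    """Return an upper bound for OPENSEARCH_HEAP_SIZE in the '2048m' format."""
    heap_mb = min(ram_mb // 2, cap_mb)
    return f"{heap_mb}m"


def pool_sizes_consistent(env):
    """Check that the job pool minimum does not exceed the maximum."""
    minimum = int(env["PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE"])
    maximum = int(env["PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE"])
    return 1 <= minimum <= maximum
```

With 4096 MB available to OpenSearch, `max_opensearch_heap(4096)` yields the shipped default of `2048m`.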
Backup and Retention
Setting | Default | Purpose | Recommendation
GLOBAL_PG_BACKUP_DAYS | 90 | PostgreSQL backup retention | Align with data retention policies
PDC_MONGODB_BACKUP_DAYS | 90 | MongoDB backup retention | Align with data retention policies
PDC_OPENSEARCH_BACKUP_DAYS | 90 | OpenSearch backup retention | Align with data retention policies
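To see what a retention value means in practice, the cutoff date can be computed directly. This sketch assumes that backups older than the configured number of days become eligible for removal; how PDC actually prunes backups is not described here, and the function names are hypothetical.

```python
from datetime import date, timedelta


def backup_cutoff(today, retention_days=90):
    """Return the oldest date still inside the retention window."""
    return today - timedelta(days=retention_days)


def is_expired(backup_date, today, retention_days=90):
    """True when a backup falls outside the retention window."""
    return backup_date < backup_cutoff(today, retention_days)
```

Comparing the cutoff against your organization's retention policy shows quickly whether 90 days is too short or too long.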
4. Integration
These settings enable PDC to integrate with external systems and should be configured based on your infrastructure.
Email and Notifications
Setting | Default | Purpose | Recommendation
KEYCLOAK_SMTP | Pre-configured Office365 | Email server settings | Update with your SMTP server details
EMAIL_DOMAINS | ["hv.com", "hitachivantara.com"] | Allowed email domains | Update with your organization's domains
External System Integration
Setting | Default | Purpose | Recommendation
ACCESS_REQUEST_SERVICE_PROVIDER_TOOL | (empty) | Ticketing system | Set to "JIRA" or "SERVICENOW" if using
ACCESS_REQUEST_SERVICE_JIRA_URL | (empty) | JIRA integration | Configure if using JIRA for access requests
ACCESS_REQUEST_SERVICE_SERVICENOW_URL | (empty) | ServiceNow integration | Configure if using ServiceNow
LICENSING_SERVER_URL | (empty) | License server | Configure for license validation
PDC_FE_DASHBOARD_URL | (empty) | External dashboard | Set if integrating with external BI tools
PDC_FE_JUPYTER_URL | (empty) | Jupyter integration | Set if providing Jupyter notebook access
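The ticketing settings depend on each other: choosing a tool only makes sense when the matching URL is also set. A minimal consistency check might look like the following sketch, where `check_ticketing` and the URL test are illustrative assumptions rather than PDC behavior.

```python
# Maps each supported tool value to the URL setting it relies on.
URL_FOR_TOOL = {
    "JIRA": "ACCESS_REQUEST_SERVICE_JIRA_URL",
    "SERVICENOW": "ACCESS_REQUEST_SERVICE_SERVICENOW_URL",
}


def check_ticketing(env):
    """Return a list of problems with the access-request integration settings."""
    tool = env.get("ACCESS_REQUEST_SERVICE_PROVIDER_TOOL", "")
    if not tool:
        return []  # integration not in use
    if tool not in URL_FOR_TOOL:
        return [f"unknown ticketing tool: {tool!r}"]
    url_key = URL_FOR_TOOL[tool]
    url = env.get(url_key, "")
    if not url.startswith(("http://", "https://")):
        return [f"{url_key} must be an http(s) URL when the tool is {tool}"]
    return []
```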
5. Feature Enablement
These settings control which PDC features are enabled and how they behave.
Service Profiles
Setting | Default | Purpose | Recommendation
COMPOSE_PROFILES | core,mongodb,collab,mdm,refdata,refdata-postgres,mdm-postgres,physical-assets,ml-models,bi-metadata,ai-ml,access-request,policy-hierarchy,metadata-rules,pdso,datapipes | Determines which services are deployed | Remove unused services to reduce resource consumption
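Because `COMPOSE_PROFILES` is a comma-separated list, trimming it is a matter of filtering out the profiles you do not need. The sketch below shows one way to do that; it does not know which profiles depend on each other (for example, whether `mdm` and `mdm-postgres` must be removed together is an assumption to verify), and `drop_profiles` is a hypothetical helper name.

```python
DEFAULT_PROFILES = (
    "core,mongodb,collab,mdm,refdata,refdata-postgres,mdm-postgres,"
    "physical-assets,ml-models,bi-metadata,ai-ml,access-request,"
    "policy-hierarchy,metadata-rules,pdso,datapipes"
)


def drop_profiles(profiles, unwanted):
    """Return the profile list without the unwanted entries, order preserved."""
    unwanted = set(unwanted)
    kept = [p for p in profiles.split(",") if p and p not in unwanted]
    return ",".join(kept)


# Example: a deployment without the machine learning services.
print(drop_profiles(DEFAULT_PROFILES, ["ml-models", "ai-ml"]))
```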
Machine Learning Features
Setting | Default | Purpose | Recommendation
ML_LLM_MODEL | (empty) | AI/ML capabilities | Set if using external LLM services
ML_LLM_API_KEY | (empty) | AI/ML access | Configure API key for external AI services
ML_LLM_INFERENCE_BASE_URL | (empty) | AI service endpoint | Set inference service URL
PDC_WS_DEFAULT_MAX_FILE_SIZE_FOR_ML | 10485760 (10MB) | File processing limits | Adjust based on typical file sizes
Data Masking
Setting | Default | Purpose | Recommendation
PDC_FE_DATAMASK_ENABLE | (empty) | Data privacy | Set to enable data masking in UI
PDC_FE_DATAMASK_CHAR | * | Masking character | Customize masking character
PDC_FE_DATAMASK_TAG_NAMES | sensitive | Tag-based masking | Define which tags trigger masking
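To illustrate how the three masking settings relate, the sketch below masks a value whenever one of its tags appears in the configured tag list. It is only a conceptual model under assumed semantics: how the PDC UI actually applies masking, and whether the tag setting accepts several names, is not stated here.

```python
def mask_value(value, tags, mask_char="*", mask_tags=("sensitive",)):
    """Replace every character of value when any of its tags triggers masking."""
    if set(tags) & set(mask_tags):
        return mask_char * len(str(value))
    return value
```

For example, a value tagged `sensitive` is returned as a run of `*` characters of the same length, while an untagged value passes through unchanged.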
6. Performance Tuning
These settings should be optimized based on your performance requirements and system resources.
Job Processing
Setting | Default | Impact | Recommendation
RULES_MDS_ENTITY_UPDATE_BATCH_SIZE | 20000 | Rule processing speed | Increase for better throughput, decrease for lower memory
RULES_MDS_ENTITY_SCROLL_BATCH_SIZE | 1000 | Memory vs speed tradeoff | Adjust based on available memory
PDC_WS_DEFAULT_SE_BATCH_SIZE | 50 | Search indexing | Increase for faster indexing
PDC_WS_DEFAULT_DS_PERM_CREATION_BATCH_SIZE | 4 | Permission processing | Increase cautiously to avoid overwhelming auth system
Timeout Settings
Setting | Default | Impact | Recommendation
PDC_FE_MONGODB_SOCKET_TIMEOUT | 300000 (5 min) | Frontend responsiveness | Increase for slow networks
ML_BENTO_TIME_OUT | 6000 | ML processing timeout | Adjust based on ML workload complexity
7. Logging & Monitoring
These settings control system observability and should be configured for production monitoring.
Setting | Default | Purpose | Recommendation
PDC_FE_LOG_LEVEL | info | Frontend logging | Keep as info for production; use debug for troubleshooting
RULES_ENGINE_LOG_LEVEL | INFO | Rules engine logging | Keep as INFO for production
RULES_ENABLE_INSTRUMENTATION | true | Performance monitoring | Keep enabled for production monitoring
8. Network & SSL
These settings control how PDC handles network traffic and security.
Setting | Default | Purpose | Recommendation
PDC_IN_TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_TO | web-secure | HTTPS enforcement | Keep for HTTPS redirect, clear to disable
KEYCLOAK_PROTOCOL | http | Authentication security | Change to https for production
RULES_IAM_ENABLE_SSL | true | Rules engine security | Keep enabled for production
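The three recommendations above lend themselves to an automated check before going to production. The sketch below expects the effective values to be passed in explicitly and treats a missing key as not configured, which is an assumption; `network_findings` is a hypothetical helper name.

```python
REDIRECT_KEY = "PDC_IN_TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_TO"


def network_findings(env):
    """Return production-readiness warnings for the network and SSL settings."""
    findings = []
    if env.get("KEYCLOAK_PROTOCOL", "").lower() != "https":
        findings.append("KEYCLOAK_PROTOCOL should be https for production")
    if env.get("RULES_IAM_ENABLE_SSL", "").lower() != "true":
        findings.append("RULES_IAM_ENABLE_SSL should be true for production")
    if not env.get(REDIRECT_KEY):
        findings.append("HTTP-to-HTTPS redirect is not configured")
    return findings
```

With the shipped defaults, only the `KEYCLOAK_PROTOCOL` warning is reported.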
Configuration Priority Checklist
Before First Deployment:
✅ Set GLOBAL_SERVER_HOST_NAME
✅ Change all default passwords
✅ Generate new PDC_PDI_PRIVACY_ENCRYPTION_KEY
✅ Configure email settings
✅ Review and customize COMPOSE_PROFILES
Before Production:
✅ Size infrastructure components appropriately
✅ Configure backup retention policies
✅ Set up external system integrations
✅ Configure SSL/HTTPS settings
✅ Enable monitoring and logging
Performance Optimization:
✅ Tune worker pool sizes
✅ Adjust batch processing sizes
✅ Configure timeout values
✅ Optimize database connection pools
Security Best Practices
Use external secret management: Consider using Docker secrets, Kubernetes secrets, or external secret management systems instead of plain text passwords
Implement least privilege: Create dedicated database users with minimal required permissions
Regular rotation: Establish a schedule for rotating passwords and encryption keys
Network security: Use firewalls and network segmentation to protect database and internal service communications
SSL/TLS everywhere: Enable SSL for all service-to-service communications
Audit logging: Ensure comprehensive audit logging is enabled for compliance requirements
MailHog
A local mail server is useful for testing PDC notifications.
MailHog is an email testing tool for developers:
Configure your application to use MailHog for SMTP delivery
View messages in the web UI, or retrieve them with the JSON API
Optionally release messages to real SMTP servers for delivery
Configure Keycloak
Create a folder for MailHog and copy docker-compose.yml into it
Start MailHog using the docker-compose.yml
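Once MailHog is running, a quick way to confirm that SMTP delivery works is to send it a test message. The sketch below assumes MailHog's default ports (SMTP on 1025, web UI on 8025) on localhost; the addresses and function names are placeholders, so adjust them if your docker-compose.yml maps the ports differently.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a short plain-text message for a delivery test."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "PDC notification test"
    msg.set_content("If this appears in the MailHog UI, SMTP delivery works.")
    return msg


def send_via_mailhog(msg, host="localhost", port=1025):
    """Send the message to MailHog's SMTP listener (no TLS or login assumed)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)


# Example usage (requires a running MailHog instance):
# send_via_mailhog(build_test_message("pdc@example.com", "tester@example.com"))
```

After sending, the message should be visible in the MailHog web UI, typically at port 8025 on the same host.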