# PDC Configuration Files

{% tabs %}
{% tab title=".env.default" %}
{% hint style="info" %}

#### .env.default

The `.env.default` file is the default configuration for Pentaho Data Catalog version 10.2.8. It contains over 200 environment variables that control the deployment.

By default, the `COMPOSE_PROFILES` setting enables all optional services, which gives a full PDC deployment. To deploy only the services you need, remove the others from the profiles list.

The configuration is organized into logical sections covering Docker images, database connections, service configurations, and security settings.
{% endhint %}

{% hint style="info" %}

#### Global Configuration

This section defines core system-wide settings that affect the entire PDC deployment.
{% endhint %}

| Variable                  | Value                                                    | Description                                                          |
| ------------------------- | -------------------------------------------------------- | -------------------------------------------------------------------- |
| `GLOBAL_IMAGE_PREFIX`     | `hitachi.jfrog.io/docker`                                | Container registry prefix for all Docker images                      |
| `GLOBAL_SERVER_HOST_NAME` | `${GLOBAL_SERVER_HOST_NAME:?please set it in conf/.env}` | Required hostname/FQDN for the PDC server - must be set in conf/.env |
| `GLOBAL_PG_BACKUP_DAYS`   | `90`                                                     | Number of days to retain PostgreSQL backups                          |
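
The value of `GLOBAL_SERVER_HOST_NAME` uses the `${NAME:?message}` interpolation form, which Docker Compose treats as an error when the variable is unset or empty. The following Python sketch mimics that behavior so the effect is easier to see. The helper name `require_env` and the hostname `pdc.example.com` are illustrative assumptions, not part of PDC.

```python
def require_env(env: dict[str, str], name: str, message: str) -> str:
    """Return env[name]; fail if it is unset or empty, like ${NAME:?message}."""
    value = env.get(name, "")
    if not value:
        # Docker Compose stops with the message; this sketch raises instead.
        raise ValueError(f"{name}: {message}")
    return value


# Hypothetical values, standing in for what conf/.env would provide.
env = {"GLOBAL_SERVER_HOST_NAME": "pdc.example.com"}
host = require_env(env, "GLOBAL_SERVER_HOST_NAME", "please set it in conf/.env")
```

Calling `require_env` with an empty mapping raises an error that carries the same hint as the default value: set the variable in `conf/.env`.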

{% hint style="info" %}

#### PDC Application Images

These variables define the specific Docker images and versions for all Pentaho Data Catalog core services.
{% endhint %}

<table><thead><tr><th width="261">Variable</th><th>Value</th><th>Description</th></tr></thead><tbody><tr><td><code>PDC_FE_IMAGE</code></td><td><code>pentaho/pdc-app-server:10.2.8</code></td><td>Frontend application server</td></tr><tr><td><code>PDC_PUBLIC_API_IMAGE</code></td><td><code>pentaho/pdc-public-api:10.2.8</code></td><td>Public API service</td></tr><tr><td><code>PDC_BI_METADATA_API_IMAGE</code></td><td><code>pentaho/pdc-bi-metadata-api:10.2.8</code></td><td>Business Intelligence metadata API</td></tr><tr><td><code>PDC_MONGODB_MIGRATIONS_IMAGE</code></td><td><code>pentaho/pdc-mongodb-migrations:10.2.8</code></td><td>MongoDB database migrations</td></tr><tr><td><code>PDC_OPS_IMAGE</code></td><td><code>pentaho/pdc-ops-service:10.2.8</code></td><td>Operations service for job management</td></tr><tr><td><code>PDC_TASK_RUNNER_IMAGE</code></td><td><code>pentaho/pdc-task-runner:10.2.8</code></td><td>Background task execution service</td></tr><tr><td><code>PDC_WS_DEFAULT_IMAGE</code></td><td><code>pentaho/pdc-worker-service-bundle:10.2.8</code></td><td>Default worker service bundle</td></tr></tbody></table>

{% hint style="info" %}

#### Specialized Service Images

These images provide specific functionality for different aspects of data management and governance.
{% endhint %}

<table><thead><tr><th width="238">Variable</th><th width="318">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>PDC_MDS_IMAGE</code></td><td><code>pentaho/pdc-metadata-model:10.2.8</code></td><td>Metadata model service</td></tr><tr><td><code>PDC_LINEAGE_IMAGE</code></td><td><code>pentaho/psc-lineage:10.2.8</code></td><td>Data lineage tracking service</td></tr><tr><td><code>PDC_GLOSSARY_IMAGE</code></td><td><code>pentaho/psc-glossary:10.2.8</code></td><td>Business glossary management</td></tr><tr><td><code>PDC_PHYSICAL_ASSETS_IMAGE</code></td><td><code>pentaho/psc-physical-assets:10.2.8</code></td><td>Physical data asset management</td></tr><tr><td><code>PDC_MLMODELS_IMAGE</code></td><td><code>pentaho/psc-ml-models:10.2.8</code></td><td>Machine learning model management</td></tr><tr><td><code>PDC_COLLAB_APP_IMAGE</code></td><td><code>pentaho/psc-collaboration-service:10.2.8</code></td><td>Collaboration features</td></tr><tr><td><code>PDC_REFDATA_APP_IMAGE</code></td><td><code>pentaho/psc-reference-data:10.2.8</code></td><td>Reference data management</td></tr></tbody></table>

{% hint style="info" %}

#### PDSO (Pentaho Data Storage Optimizer) Images

PDSO provides data optimization and management capabilities with separate API and UI components.
{% endhint %}

| Variable                  | Value                                | Description                  |
| ------------------------- | ------------------------------------ | ---------------------------- |
| `PDSO_API_IMAGE`          | `pentaho/pdso-manager:10.2.8`        | PDSO management API          |
| `PDSO_APP_IMAGE`          | `pentaho/pdso-manager-ui:10.2.8`     | PDSO management UI           |
| `PDSO_DATA_SERVICE_IMAGE` | `pentaho/pdso-data-service:10.2.8`   | PDSO data processing service |
| `CRYPTO_SERVICE_IMAGE`    | `pentaho/pdso-crypto-service:10.2.8` | Cryptographic services       |

### User Management Images

These images handle authentication, authorization, and user administration using Keycloak as the identity provider.

<table><thead><tr><th width="246">Variable</th><th>Value</th><th>Description</th></tr></thead><tbody><tr><td><code>USER_MGMT_IMAGE</code></td><td><code>pentaho/user-management:10.2.8</code></td><td>Core user management service</td></tr><tr><td><code>USER_MGMT_UI_IMAGE</code></td><td><code>pentaho/user-management-ui:10.2.8</code></td><td>User management interface</td></tr><tr><td><code>USER_MGMT_ADMIN_API_IMAGE</code></td><td><code>pentaho/um-admin-api:10.2.8</code></td><td>Administrative API for user management</td></tr><tr><td><code>USER_MGMT_AUTH_PROXY_IMAGE</code></td><td><code>pentaho/um-auth-proxy:10.2.8</code></td><td>Authentication proxy service</td></tr><tr><td><code>USER_MGMT_KEYCLOAK_IMAGE</code></td><td><code>pentaho/pdc-keycloak-themes:10.2.8</code></td><td>Keycloak with PDC themes</td></tr></tbody></table>

{% hint style="info" %}

### Master Data Management (MDM) Images

PDM (Pentaho Data Management) provides master data management capabilities with backend, frontend, and database initialization components.
{% endhint %}

| Variable            | Value                   | Description                 |
| ------------------- | ----------------------- | --------------------------- |
| `MDM_BE_IMAGE`      | `pentaho/pdm-be:10.2.8` | MDM backend service         |
| `MDM_FE_IMAGE`      | `pentaho/pdm-fe:10.2.8` | MDM frontend interface      |
| `MDM_DB_INIT_IMAGE` | `pentaho/pdm-db:10.2.8` | MDM database initialization |

{% hint style="info" %}

### Supporting Service Images

These images provide essential infrastructure services like search, messaging, and data integration.
{% endhint %}

<table><thead><tr><th width="258">Variable</th><th>Value</th><th>Description</th></tr></thead><tbody><tr><td><code>PDC_GLOBAL_SEARCH_IMAGE</code></td><td><code>pentaho/pdc-global-search:10.2.8</code></td><td>Global search functionality</td></tr><tr><td><code>ACCESS_REQUEST_SERVICE_IMAGE</code></td><td><code>pentaho/access-request-service:10.2.8</code></td><td>Data access request management</td></tr><tr><td><code>ML_GATEWAY_SERVICE_IMAGE</code></td><td><code>pentaho/ml-gateway-service:10.2.8</code></td><td>Machine learning gateway</td></tr><tr><td><code>PDI_DATAPIPES_IMAGE</code></td><td><code>pentaho/datapipes:10.2.8</code></td><td>Data integration pipelines</td></tr><tr><td><code>PDC_CHATBOT_BACKEND_IMAGE</code></td><td><code>pentaho/pdc-chatbot-backend:250820-bb72b27-release-v10.2.8</code></td><td>AI chatbot backend</td></tr></tbody></table>

{% hint style="info" %}

### Infrastructure Images

These are third-party infrastructure components required for PDC operations.
{% endhint %}

| Variable             | Value                                                 | Description                     |
| -------------------- | ----------------------------------------------------- | ------------------------------- |
| `POSTGRESQL16_IMAGE` | `postgres:16.8-bookworm`                              | PostgreSQL 16 database          |
| `REDIS_IMAGE`        | `bitnami/redis:7.4.3-debian-12-r0`                    | Redis cache and session store   |
| `OPENSEARCH_IMAGE`   | `bitnami/opensearch:2.19.2-debian-12-r0`              | OpenSearch for full-text search |
| `KAFKA_IMAGE`        | `bitnami/kafka:3.9.0-debian-12-r13`                   | Apache Kafka message broker     |
| `MONGODB_IMAGE`      | `mongodb/mongodb-enterprise-server:6.0.23-ubuntu2204` | MongoDB database                |
| `TRAEFIK_IMAGE`      | `traefik:v2.11.24`                                    | Reverse proxy and load balancer |

{% hint style="info" %}

### PDI (Pentaho Data Integration) Configuration

These settings configure the PDI service for data integration workflows.
{% endhint %}

| Variable                 | Value                                 | Description                                  |
| ------------------------ | ------------------------------------- | -------------------------------------------- |
| `PDI_DATAPIPES_USERNAME` | `data_steward@hv.com`                 | Default username for PDI operations          |
| `PDI_DATAPIPES_PASSWORD` | `Welcome123!`                         | Default password for PDI (should be changed) |
| `PDI_DATABASE_PASSWORD`  | `password`                            | Database password for PDI                    |
| `PDC_USER_GUIDE_LINK`    | `https://docs.hitachivantara.com/...` | Link to user documentation                   |

### Docker Compose Configuration

These settings control how Docker Compose deploys and manages the PDC stack.

| Variable               | Value                                | Description                            |
| ---------------------- | ------------------------------------ | -------------------------------------- |
| `COMPOSE_PROJECT_NAME` | `pdc`                                | Docker Compose project name            |
| `COMPOSE_PROFILES`     | `core,mongodb,collab,mdm,refdata...` | Active service profiles to deploy      |
| `PDC_VENDOR_PATH`      | `./vendor`                           | Path to vendor-specific configurations |
| `PDC_CLIENT_PATH`      | `./conf`                             | Path to client configuration files     |

{% hint style="info" %}

### Worker Service Configuration

These settings configure the default worker service behavior for data processing tasks.
{% endhint %}

<table><thead><tr><th width="425">Variable</th><th width="98">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE</code></td><td><code>5</code></td><td>Minimum job pool size</td></tr><tr><td><code>PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE</code></td><td><code>5</code></td><td>Maximum job pool size</td></tr><tr><td><code>PDC_WS_DEFAULT_JDBC_DEFAULT_POOL_SIZE</code></td><td><code>10</code></td><td>Default JDBC connection pool size</td></tr><tr><td><code>PDC_WS_DEFAULT_JDBC_DEFAULT_CONNECTION_TIMEOUT</code></td><td><code>60</code></td><td>JDBC connection timeout in seconds</td></tr></tbody></table>

{% hint style="info" %}

### Database Sampling Algorithms

These configure how PDC samples data from various database types for profiling and analysis.
{% endhint %}

<table><thead><tr><th width="372">Variable</th><th width="105">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>PDC_WS_DEFAULT_JDBC_ORACLE_PERCENTAGE_SAMPLING_ALGORITHM</code></td><td><code>DEFAULT</code></td><td>Oracle database sampling method</td></tr><tr><td><code>PDC_WS_DEFAULT_JDBC_POSTGRES_PERCENTAGE_SAMPLING_ALGORITHM</code></td><td><code>DEFAULT</code></td><td>PostgreSQL sampling method</td></tr><tr><td><code>PDC_WS_DEFAULT_JDBC_SNOWFLAKE_PERCENTAGE_SAMPLING_ALGORITHM</code></td><td><code>DEFAULT</code></td><td>Snowflake sampling method</td></tr><tr><td><code>PDC_WS_DEFAULT_JDBC_BIGQUERY_PERCENTAGE_SAMPLING_ALGORITHM</code></td><td><code>DEFAULT</code></td><td>BigQuery sampling method</td></tr></tbody></table>

{% hint style="info" %}

### MongoDB Configuration

MongoDB serves as the primary metadata repository for PDC with separate databases for different services.
{% endhint %}

<table><thead><tr><th width="336">Variable</th><th width="103">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>PDC_MONGO_MONGO_INITDB_ROOT_PASSWORD</code></td><td><code>broot</code></td><td>MongoDB root password</td></tr><tr><td><code>PDC_MONGODB_BACKUP_DAYS</code></td><td><code>90</code></td><td>MongoDB backup retention period</td></tr><tr><td><code>PDC_FE_MONGODB_SOCKET_TIMEOUT</code></td><td><code>300000</code></td><td>MongoDB socket timeout (5 minutes)</td></tr><tr><td><code>PDC_CRON_MONGODB_SCHEDULE</code></td><td><code>@daily</code></td><td>MongoDB backup schedule</td></tr></tbody></table>

{% hint style="info" %}

### Frontend Configuration

These settings control the PDC web frontend behavior and integration points.
{% endhint %}

<table><thead><tr><th>Variable</th><th width="126">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>PDC_FE_VERSION_MAIN</code></td><td><code>10.2.8</code></td><td>Frontend version identifier</td></tr><tr><td><code>PDC_FE_LOG_LEVEL</code></td><td><code>info</code></td><td>Frontend logging level</td></tr><tr><td><code>PDC_FE_DATAMASK_ENABLE</code></td><td>(empty)</td><td>Enable/disable data masking</td></tr><tr><td><code>PDC_FE_DATAMASK_CHAR</code></td><td><code>*</code></td><td>Character used for data masking</td></tr></tbody></table>

{% hint style="info" %}

### Operations Service Configuration

These settings control job prioritization and operational behavior.
{% endhint %}

<table><thead><tr><th width="330">Variable</th><th width="88">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>PDC_OPS_ADMIN_JOB_PRIORITY</code></td><td><code>1</code></td><td>Highest priority for admin jobs</td></tr><tr><td><code>PDC_OPS_DATA_STEWARD_JOB_PRIORITY</code></td><td><code>3</code></td><td>Priority for data steward jobs</td></tr><tr><td><code>PDC_OPS_BUSINESS_USER_JOB_PRIORITY</code></td><td><code>5</code></td><td>Priority for business user jobs</td></tr><tr><td><code>PDC_OPS_DEFAULT_JOB_PRIORITY</code></td><td><code>3</code></td><td>Default job priority</td></tr></tbody></table>
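
To illustrate how these values relate, the sketch below orders a few jobs with a priority queue. It assumes that a lower number runs first (the table describes `1` as the highest priority) and that ties run in submission order. The role keys and job names are hypothetical, and the operations service may schedule differently.

```python
import heapq

# Defaults from .env.default; a lower number is assumed to run first.
PRIORITIES = {"admin": 1, "data_steward": 3, "business_user": 5}
DEFAULT_PRIORITY = 3  # PDC_OPS_DEFAULT_JOB_PRIORITY


def order_jobs(jobs: list[tuple[str, str]]) -> list[str]:
    """Return job names ordered by priority, then by submission order."""
    heap: list[tuple[int, int, str]] = []
    for seq, (name, role) in enumerate(jobs):
        priority = PRIORITIES.get(role, DEFAULT_PRIORITY)
        heapq.heappush(heap, (priority, seq, name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical jobs as (job name, submitting role) pairs.
ordered = order_jobs([
    ("scan", "business_user"),
    ("reindex", "admin"),
    ("profile", "data_steward"),
    ("export", "unknown_role"),
])
```

In this sketch a job from an unrecognized role falls back to the default priority of `3`, so it competes with data steward jobs rather than with admin jobs.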

{% hint style="info" %}

### Kafka Configuration

Kafka handles asynchronous messaging between PDC services.
{% endhint %}

| Variable                     | Value                                      | Description              |
| ---------------------------- | ------------------------------------------ | ------------------------ |
| `KAFKA_INIT_SEED_TOPIC_LIST` | `datasource-operation,entity-operation...` | Pre-created Kafka topics |

{% hint style="info" %}

### Reference Data Service Configuration

These settings configure the reference data management service with its dedicated PostgreSQL database.
{% endhint %}

<table><thead><tr><th width="242">Variable</th><th>Value</th><th>Description</th></tr></thead><tbody><tr><td><code>REFDATA_API_IS_PRODUCTION</code></td><td><code>true</code></td><td>Production mode flag</td></tr><tr><td><code>REFDATA_API_DB_HOST</code></td><td><code>refdata-postgres</code></td><td>Database host</td></tr><tr><td><code>REFDATA_API_DB_USERNAME</code></td><td><code>postgres</code></td><td>Database username</td></tr><tr><td><code>REFDATA_API_DB_PASSWORD</code></td><td><code>secret</code></td><td>Database password</td></tr></tbody></table>

{% hint style="info" %}

### Rules Engine Configuration

The rules engine processes data governance and quality rules.
{% endhint %}

<table><thead><tr><th width="315">Variable</th><th width="98">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>RULES_ENGINE_LOG_LEVEL</code></td><td><code>INFO</code></td><td>Rules engine logging level</td></tr><tr><td><code>RULES_ENABLE_INSTRUMENTATION</code></td><td><code>true</code></td><td>Enable performance monitoring</td></tr><tr><td><code>RULES_MDS_ENTITY_UPDATE_BATCH_SIZE</code></td><td><code>20000</code></td><td>Batch size for entity updates</td></tr></tbody></table>

{% hint style="info" %}

### PDSO Configuration

PDSO (Pentaho Data Storage Optimizer) provides data optimization capabilities.
{% endhint %}

<table><thead><tr><th width="276">Variable</th><th>Value</th><th>Description</th></tr></thead><tbody><tr><td><code>PDSO_ENABLE_HASH_CALCULATION</code></td><td><code>true</code></td><td>Enable hash-based data comparison</td></tr><tr><td><code>PDSO_ENABLE_STUB_CREATION</code></td><td><code>false</code></td><td>Disable stub creation</td></tr><tr><td><code>PDSO_DEFAULT_DESTINATION_PATH</code></td><td><code>"pentaho_migration"</code></td><td>Default migration path</td></tr></tbody></table>

{% hint style="info" %}

### Redis Configuration

Redis provides caching and session management.
{% endhint %}

| Variable                 | Value                 | Description           |
| ------------------------ | --------------------- | --------------------- |
| `REDIS_MASTER_PASSWORD`  | `redis_master_broot`  | Master node password  |
| `REDIS_REPLICA_PASSWORD` | `redis_replica_broot` | Replica node password |

{% hint style="info" %}

### Keycloak (Identity Management) Configuration

Keycloak provides authentication and authorization services.
{% endhint %}

<table><thead><tr><th width="178">Variable</th><th width="296">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>KEYCLOAK_URL</code></td><td><code>https://${GLOBAL_SERVER_HOST_NAME}/keycloak</code></td><td>Keycloak access URL</td></tr><tr><td><code>KEYCLOAK_USER</code></td><td><code>admin</code></td><td>Keycloak admin username</td></tr><tr><td><code>KEYCLOAK_PASSWORD</code></td><td><code>admin</code></td><td>Keycloak admin password (should be changed)</td></tr><tr><td><code>EMAIL_DOMAINS</code></td><td><code>["hv.com", "hitachivantara.com"]</code></td><td>Allowed email domains</td></tr><tr><td><code>TENANT_NAME</code></td><td><code>pdc</code></td><td>Default tenant name</td></tr></tbody></table>
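
`KEYCLOAK_URL` is built from `GLOBAL_SERVER_HOST_NAME` by variable interpolation. The sketch below reproduces the substitution with Python's `string.Template`, which understands the plain `${NAME}` form only (not `${NAME:?message}`). The hostname `pdc.example.com` is a hypothetical value.

```python
from string import Template


def expand(value: str, env: dict[str, str]) -> str:
    """Replace ${NAME} placeholders with values from env."""
    # substitute() raises KeyError if a referenced variable is missing.
    return Template(value).substitute(env)


keycloak_url = expand(
    "https://${GLOBAL_SERVER_HOST_NAME}/keycloak",
    {"GLOBAL_SERVER_HOST_NAME": "pdc.example.com"},  # hypothetical host
)
```

Because the URL is derived, changing `GLOBAL_SERVER_HOST_NAME` in `conf/.env` also changes the Keycloak address without editing `KEYCLOAK_URL` itself.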

{% hint style="info" %}

#### Master Data Management Database Configuration

MDM uses its own PostgreSQL database for master data storage.
{% endhint %}

| Variable              | Value          | Description           |
| --------------------- | -------------- | --------------------- |
| `MDM_API_DB_USERNAME` | `postgres`     | MDM database username |
| `MDM_API_DB_PASSWORD` | `secret`       | MDM database password |
| `MDM_API_DB_HOST`     | `mdm-postgres` | MDM database host     |
| `MDM_API_DB_PORT`     | `5432`         | MDM database port     |

{% hint style="info" %}

#### Licensing Configuration

These settings configure license management and feature validation.
{% endhint %}

<table><thead><tr><th width="325">Variable</th><th width="117">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>LICENSE_DATASOURCE_FEATURE_VERSION</code></td><td><code>10.2</code></td><td>Licensed version for data sources</td></tr><tr><td><code>LICENSE_USERS_COUNT_FEATURE_VERSION</code></td><td><code>10.2</code></td><td>Licensed version for user count</td></tr><tr><td><code>LICENSE_MDM_FEATURE_VERSION</code></td><td><code>10.2</code></td><td>Licensed version for MDM features</td></tr></tbody></table>

{% hint style="info" %}

#### Access Request Service Configuration

This service manages data access requests and integrates with external ticketing systems.
{% endhint %}

<table><thead><tr><th width="339">Variable</th><th width="114">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>ACCESS_REQUEST_SERVICE_PROVIDER_TOOL</code></td><td>(empty)</td><td>External ticketing system (JIRA/ServiceNow)</td></tr><tr><td><code>ACCESS_REQUEST_SERVICE_JIRA_URL</code></td><td>(empty)</td><td>JIRA instance URL</td></tr><tr><td><code>ACCESS_REQUEST_SERVICE_SERVICENOW_URL</code></td><td>(empty)</td><td>ServiceNow instance URL</td></tr></tbody></table>

{% hint style="info" %}

#### Machine Learning Configuration

These settings configure AI/ML features including custom models and token limits.
{% endhint %}

<table><thead><tr><th width="318">Variable</th><th width="124">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>ML_CUSTOM_TOKENS</code></td><td><code>6000</code></td><td>Token limit for ML operations</td></tr><tr><td><code>ML_TOKEN_WINDOW_SIZE</code></td><td><code>8000</code></td><td>Context window size for ML models</td></tr><tr><td><code>PDC_WS_DEFAULT_MAX_FILE_SIZE_FOR_ML</code></td><td><code>10485760</code></td><td>Max file size for ML processing (10MB)</td></tr><tr><td><code>ML_LLM_MODEL</code></td><td>(empty)</td><td>Large Language Model identifier</td></tr></tbody></table>

#### OpenSearch Configuration

OpenSearch provides full-text search and analytics capabilities.

<table><thead><tr><th width="250">Variable</th><th width="145">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>OPENSEARCH_HEAP_SIZE</code></td><td><code>2048m</code></td><td>JVM heap size for OpenSearch</td></tr><tr><td><code>OPENSEARCH_USERNAME</code></td><td><code>admin</code></td><td>OpenSearch admin username</td></tr><tr><td><code>OPENSEARCH_PASSWORD</code></td><td><code>Es3vweMuABJr</code></td><td>OpenSearch admin password</td></tr><tr><td><code>PDC_OPENSEARCH_BACKUP_DAYS</code></td><td><code>90</code></td><td>OpenSearch backup retention</td></tr></tbody></table>

{% hint style="info" %}

#### PostgreSQL Configuration

PostgreSQL serves as the primary relational database for various PDC services.
{% endhint %}

<table><thead><tr><th width="256">Variable</th><th width="136">Value</th><th>Description</th></tr></thead><tbody><tr><td><code>POSTGRES_USERNAME</code></td><td><code>postgres</code></td><td>PostgreSQL admin username</td></tr><tr><td><code>POSTGRES_PASSWORD</code></td><td><code>admin123#</code></td><td>PostgreSQL admin password</td></tr><tr><td><code>POSTGRES_BIDB_USER_NAME</code></td><td><code>bidb_ro</code></td><td>Read-only user for BI database</td></tr></tbody></table>

{% hint style="info" %}

#### Security Considerations

This default configuration contains several passwords and secrets in plain text. Before deploying to production:

* Change all default passwords.
* Use environment-specific `.env` files to override sensitive values.
* Consider using Docker secrets or an external secret manager.
* Regenerate `PDC_PDI_PRIVACY_ENCRYPTION_KEY` for each environment.
{% endhint %}
{% endtab %}

{% tab title="Critical Settings" %}
{% hint style="info" %}

## Critical Pentaho Data Catalog Configuration Settings

When deploying Pentaho Data Catalog, several configuration categories require careful consideration. This guide identifies the most critical settings that must be configured before production deployment, organized by priority and impact.
{% endhint %}

{% hint style="info" %}

### 1. MANDATORY CONFIGURATIONS

These settings **must** be configured before PDC can function properly. Failure to set these will prevent the system from starting or operating correctly.
{% endhint %}

| Variable                  | Default Value                                            | Required Action | Description                                                                                                                                    |
| ------------------------- | -------------------------------------------------------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `GLOBAL_SERVER_HOST_NAME` | `${GLOBAL_SERVER_HOST_NAME:?please set it in conf/.env}` | **REQUIRED**    | The fully qualified domain name (FQDN) or IP address where PDC will be accessible. This affects URLs, SSL certificates, and service discovery. |

{% hint style="info" %}

### 2. SECURITY-CRITICAL CONFIGURATIONS

These default passwords and security settings pose significant security risks and **must** be changed before production deployment.
{% endhint %}

| Variable                               | Default Value                                  | Security Risk | Recommended Action                                                      |
| -------------------------------------- | ---------------------------------------------- | ------------- | ----------------------------------------------------------------------- |
| `PDC_MONGO_MONGO_INITDB_ROOT_PASSWORD` | `broot`                                        | **HIGH**      | Change to complex password (min 12 chars, mixed case, numbers, symbols) |
| `KEYCLOAK_PASSWORD`                    | `admin`                                        | **CRITICAL**  | Change immediately - controls access to identity management             |
| `KEYCLOAK_USER`                        | `admin`                                        | **HIGH**      | Consider changing username from default                                 |
| `POSTGRES_PASSWORD`                    | `admin123#`                                    | **HIGH**      | Change to complex password for main database                            |
| `POSTGRES_BIDB_USER_PASSWORD`          | `4aU94t8v+4d+W`                                | **MEDIUM**    | Change for BI database read-only user                                   |
| `REDIS_MASTER_PASSWORD`                | `redis_master_broot`                           | **HIGH**      | Change for cache/session security                                       |
| `REDIS_REPLICA_PASSWORD`               | `redis_replica_broot`                          | **HIGH**      | Change for cache replication security                                   |
| `OPENSEARCH_PASSWORD`                  | `Es3vweMuABJr`                                 | **HIGH**      | Change for search engine security                                       |
| `PDI_DATAPIPES_PASSWORD`               | `Welcome123!`                                  | **HIGH**      | Change for data integration security                                    |
| `REFDATA_API_DB_PASSWORD`              | `secret`                                       | **HIGH**      | Change for reference data database                                      |
| `MDM_API_DB_PASSWORD`                  | `secret`                                       | **HIGH**      | Change for master data management database                              |
| `MONGODB_BKP_USER_PASSWORD`            | `qdA9WMcw35WjzQl7ELhlHEF`                      | **MEDIUM**    | Change for backup user security                                         |
| `PDC_PDI_PRIVACY_ENCRYPTION_KEY`       | `NUhHRk5VWkJBTkZTbWxDanR0enJLVU05UlF3M3RwWks=` | **CRITICAL**  | Generate new encryption key for data privacy                            |
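
One way to produce replacement values is Python's `secrets` module, sketched below. Two assumptions apply. First, the symbol set avoids `$` and `#`, which can be misread as interpolation or comments in `.env` files. Second, the key generator assumes that a Base64-encoded 32-byte value is acceptable, because the default key is a 44-character Base64 string of that size; confirm the expected format before using it.

```python
import base64
import secrets
import string

# Symbols chosen to avoid "$" and "#", which may be special in .env files.
ALPHABET = string.ascii_letters + string.digits + "!%+-_@"


def generate_password(length: int = 24) -> str:
    """Random password with lowercase, uppercase, digit, and symbol."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    while True:
        pw = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(c.islower() for c in pw) and any(c.isupper() for c in pw)
                and any(c.isdigit() for c in pw)
                and any(not c.isalnum() for c in pw)):
            return pw


def generate_key(num_bytes: int = 32) -> str:
    """Base64-encoded random key (the format is an assumption)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")
```

Generate a separate value for each variable in the table rather than reusing one password across services.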

{% hint style="info" %}

### 3. ENVIRONMENT-SPECIFIC CONFIGURATIONS

These settings should be customized based on your specific deployment environment and requirements.
{% endhint %}

#### Infrastructure Sizing

<table><thead><tr><th width="307">Variable</th><th width="98">Default Value</th><th>Consideration</th><th>Recommended Action</th></tr></thead><tbody><tr><td><code>OPENSEARCH_HEAP_SIZE</code></td><td><code>2048m</code></td><td>Memory allocation for search</td><td>Adjust based on data volume and available RAM</td></tr><tr><td><code>PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE</code></td><td><code>5</code></td><td>Worker pool sizing</td><td>Adjust based on expected workload</td></tr><tr><td><code>PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE</code></td><td><code>5</code></td><td>Worker pool sizing</td><td>Increase for high-volume environments</td></tr><tr><td><code>PDC_WS_DEFAULT_JDBC_DEFAULT_POOL_SIZE</code></td><td><code>10</code></td><td>Database connections</td><td>Adjust based on concurrent users</td></tr><tr><td><code>PDC_WS_DEFAULT_JDBC_DEFAULT_CONNECTION_TIMEOUT</code></td><td><code>60</code></td><td>Connection timeout</td><td>Adjust for network latency</td></tr></tbody></table>
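
When you change the job pool sizes, a quick consistency check can catch mistakes before deployment. The sketch below assumes that the minimum must be at least 1 and must not exceed the maximum. That is a reasonable reading of the variable names, not a documented rule, and `check_job_pool` is a hypothetical helper.

```python
def check_job_pool(env: dict[str, str]) -> list[str]:
    """Return a list of problems found in the job pool settings."""
    problems = []
    low = int(env["PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE"])
    high = int(env["PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE"])
    if low < 1:
        problems.append("minimum pool size should be at least 1")
    if high < low:
        problems.append("maximum pool size is smaller than the minimum")
    return problems


# The defaults (5 and 5) pass the check.
defaults = {
    "PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE": "5",
    "PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE": "5",
}
default_problems = check_job_pool(defaults)
```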

#### Backup and Retention

<table><thead><tr><th width="256">Variable</th><th width="111">Default Value</th><th>Consideration</th><th>Recommended Action</th></tr></thead><tbody><tr><td><code>GLOBAL_PG_BACKUP_DAYS</code></td><td><code>90</code></td><td>PostgreSQL backup retention</td><td>Align with data retention policies</td></tr><tr><td><code>PDC_MONGODB_BACKUP_DAYS</code></td><td><code>90</code></td><td>MongoDB backup retention</td><td>Align with data retention policies</td></tr><tr><td><code>PDC_OPENSEARCH_BACKUP_DAYS</code></td><td><code>90</code></td><td>OpenSearch backup retention</td><td>Align with data retention policies</td></tr></tbody></table>
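
To see what a retention period means in practice, the sketch below lists which backups would fall outside a given number of days. It is only an illustration: how PDC prunes backups, and whether the boundary day is kept, is not described here, and the dates are made up.

```python
from datetime import date, timedelta


def expired_backups(backup_dates: list[date], retention_days: int,
                    today: date) -> list[date]:
    """Return backup dates older than the retention period, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    # A backup taken exactly on the cutoff date is kept in this sketch.
    return sorted(d for d in backup_dates if d < cutoff)


# Hypothetical backup dates checked against the default 90-day retention.
expired = expired_backups(
    [date(2025, 6, 1), date(2025, 3, 31), date(2025, 4, 1)],
    retention_days=90,
    today=date(2025, 6, 30),
)
```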

{% hint style="info" %}

### 4. INTEGRATION CONFIGURATIONS

These settings enable PDC to integrate with external systems and should be configured based on your infrastructure.
{% endhint %}

#### Email and Notifications

| Variable               | Default Value                      | Configuration Need       | Action Required                         |
| ---------------------- | ---------------------------------- | ------------------------ | --------------------------------------- |
| `KEYCLOAK_SMTP`        | Pre-configured Office365           | Email server settings    | Update with your SMTP server details    |
| `KEYCLOAK_ADMIN_EMAIL` | `admin@hv.com`                     | Admin notification email | Change to your admin email              |
| `EMAIL_DOMAINS`        | `["hv.com", "hitachivantara.com"]` | Allowed email domains    | Update with your organization's domains |
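
The `EMAIL_DOMAINS` value is a JSON array, so it can be parsed with any JSON library. The sketch below checks an address against the list. It assumes a case-insensitive exact match on the part after `@`; the way PDC or Keycloak enforces the setting is not described here, and `is_allowed` is a hypothetical helper.

```python
import json


def is_allowed(email: str, email_domains_value: str) -> bool:
    """Check whether the email's domain appears in the EMAIL_DOMAINS list."""
    domains = {d.lower() for d in json.loads(email_domains_value)}
    _local, separator, domain = email.rpartition("@")
    # Reject strings that contain no "@" at all.
    return bool(separator) and domain.lower() in domains


DEFAULT_EMAIL_DOMAINS = '["hv.com", "hitachivantara.com"]'
steward_ok = is_allowed("data_steward@hv.com", DEFAULT_EMAIL_DOMAINS)
```

With the default list, addresses from your own organization are rejected until you add your domains.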

#### External System Integration

<table><thead><tr><th width="254">Variable</th><th width="109">Default Value</th><th>Integration Type</th><th>Configuration Required</th></tr></thead><tbody><tr><td><code>ACCESS_REQUEST_SERVICE_PROVIDER_TOOL</code></td><td>(empty)</td><td>Ticketing system</td><td>Set to "JIRA" or "SERVICENOW" if using</td></tr><tr><td><code>ACCESS_REQUEST_SERVICE_JIRA_URL</code></td><td>(empty)</td><td>JIRA integration</td><td>Configure if using JIRA for access requests</td></tr><tr><td><code>ACCESS_REQUEST_SERVICE_SERVICENOW_URL</code></td><td>(empty)</td><td>ServiceNow integration</td><td>Configure if using ServiceNow</td></tr><tr><td><code>LICENSING_SERVER_URL</code></td><td>(empty)</td><td>License server</td><td>Configure for license validation</td></tr><tr><td><code>PDC_FE_DASHBOARD_URL</code></td><td>(empty)</td><td>External dashboard</td><td>Set if integrating with external BI tools</td></tr><tr><td><code>PDC_FE_JUPYTER_URL</code></td><td>(empty)</td><td>Jupyter integration</td><td>Set if providing Jupyter notebook access</td></tr></tbody></table>

{% hint style="info" %}

### 5. FEATURE ENABLEMENT CONFIGURATIONS

These settings control which PDC features are enabled and how they behave.
{% endhint %}

#### Service Profiles

| Variable           | Default Value                                                                                                                                                             | Impact                                 | Customization Needed                                  |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------- | ----------------------------------------------------- |
| `COMPOSE_PROFILES` | `core,mongodb,collab,mdm,refdata,refdata-postgres,mdm-postgres,physical-assets,ml-models,bi-metadata,ai-ml,access-request,policy-hierarchy,metadata-rules,pdso,datapipes` | Determines which services are deployed | Remove unused services to reduce resource consumption |
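
Because `COMPOSE_PROFILES` is a comma-separated list, trimming it is a simple string operation. The sketch below removes two profiles from the default value. Choosing `pdso` and `datapipes` is only an example, and dependencies between profiles (for instance a service and its database profile) are not checked.

```python
DEFAULT_PROFILES = (
    "core,mongodb,collab,mdm,refdata,refdata-postgres,mdm-postgres,"
    "physical-assets,ml-models,bi-metadata,ai-ml,access-request,"
    "policy-hierarchy,metadata-rules,pdso,datapipes"
)


def without_profiles(profiles: str, unused: set[str]) -> str:
    """Return the profile list with the unused entries removed, order kept."""
    return ",".join(p for p in profiles.split(",") if p not in unused)


# Example only: drop the PDSO and datapipes services.
trimmed = without_profiles(DEFAULT_PROFILES, {"pdso", "datapipes"})
```

The resulting string can be set as `COMPOSE_PROFILES` in your environment-specific `.env` file.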

#### Machine Learning Features

<table><thead><tr><th width="255">Variable</th><th>Default Value</th><th>Feature Impact</th><th>Configuration Need</th></tr></thead><tbody><tr><td><code>ML_LLM_MODEL</code></td><td>(empty)</td><td>AI/ML capabilities</td><td>Set if using external LLM services</td></tr><tr><td><code>ML_LLM_API_KEY</code></td><td>(empty)</td><td>AI/ML access</td><td>Configure API key for external AI services</td></tr><tr><td><code>ML_LLM_INFERENCE_BASE_URL</code></td><td>(empty)</td><td>AI service endpoint</td><td>Set inference service URL</td></tr><tr><td><code>PDC_WS_DEFAULT_MAX_FILE_SIZE_FOR_ML</code></td><td><code>10485760</code> (10MB)</td><td>File processing limits</td><td>Adjust based on typical file sizes</td></tr></tbody></table>
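
`PDC_WS_DEFAULT_MAX_FILE_SIZE_FOR_ML` is expressed in bytes, and the default of `10485760` equals 10 × 1024 × 1024. A small helper such as the hypothetical `mib_to_bytes` below avoids arithmetic slips when choosing a different limit.

```python
def mib_to_bytes(mib: int) -> int:
    """Convert a size in binary megabytes (MiB) to bytes."""
    return mib * 1024 * 1024


default_limit = mib_to_bytes(10)   # the default file size limit
larger_limit = mib_to_bytes(25)    # example of a raised limit
```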

#### Data Masking

<table><thead><tr><th width="258">Variable</th><th>Default Value</th><th>Feature</th><th>Configuration</th></tr></thead><tbody><tr><td><code>PDC_FE_DATAMASK_ENABLE</code></td><td>(empty)</td><td>Data privacy</td><td>Set to enable data masking in UI</td></tr><tr><td><code>PDC_FE_DATAMASK_CHAR</code></td><td><code>*</code></td><td>Masking character</td><td>Customize masking character</td></tr><tr><td><code>PDC_FE_DATAMASK_TAG_NAMES</code></td><td><code>sensitive</code></td><td>Tag-based masking</td><td>Define which tags trigger masking</td></tr></tbody></table>
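
To see how the three masking settings relate, the sketch below models tag-based masking in plain Python. It is a conceptual illustration, not PDC's implementation: the function `mask_value`, the full-length replacement, and the representation of tags as a set are all assumptions made for the example.

```python
def mask_value(value: str, tags: set[str], mask_tags: set[str], mask_char: str = "*") -> str:
    """Replace every character with mask_char when the item carries a masking tag."""
    if tags & mask_tags:
        return mask_char * len(value)
    return value


# Mirrors the defaults above: masking character "*" and the tag "sensitive".
print(mask_value("555-0100", {"sensitive"}, {"sensitive"}))  # masked
print(mask_value("555-0100", {"public"}, {"sensitive"}))     # left unchanged
```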

{% hint style="info" %}

### 6. PERFORMANCE TUNING CONFIGURATIONS

These settings should be optimized based on your performance requirements and system resources.
{% endhint %}

#### Job Processing

<table><thead><tr><th width="217">Variable</th><th width="123">Default Value</th><th>Performance Impact</th><th>Tuning Guidance</th></tr></thead><tbody><tr><td><code>RULES_MDS_ENTITY_UPDATE_BATCH_SIZE</code></td><td><code>20000</code></td><td>Rule processing speed</td><td>Increase for better throughput, decrease for lower memory</td></tr><tr><td><code>RULES_MDS_ENTITY_SCROLL_BATCH_SIZE</code></td><td><code>1000</code></td><td>Memory vs speed tradeoff</td><td>Adjust based on available memory</td></tr><tr><td><code>PDC_WS_DEFAULT_SE_BATCH_SIZE</code></td><td><code>50</code></td><td>Search indexing</td><td>Increase for faster indexing</td></tr><tr><td><code>PDC_WS_DEFAULT_DS_PERM_CREATION_BATCH_SIZE</code></td><td><code>4</code></td><td>Permission processing</td><td>Increase cautiously to avoid overwhelming auth system</td></tr></tbody></table>
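
The trade-off behind these batch sizes is general: a larger batch means fewer round trips but more items held in memory at once. The sketch below counts the batches needed for a hypothetical workload of one million items at each default size. The workload figure and the helper `batches_needed` are assumptions for illustration, and the different settings apply to different kinds of work, so the numbers are not directly comparable in practice.

```python
import math


def batches_needed(total_items: int, batch_size: int) -> int:
    """Number of batches required to process total_items."""
    return math.ceil(total_items / batch_size)


TOTAL_ITEMS = 1_000_000  # hypothetical workload size

for batch_size in (4, 50, 1000, 20000):
    print(f"batch size {batch_size:>5}: {batches_needed(TOTAL_ITEMS, batch_size)} batches")
```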

#### Timeout Settings

<table><thead><tr><th width="283">Variable</th><th>Default Value</th><th>Impact</th><th>Adjustment Guidelines</th></tr></thead><tbody><tr><td><code>PDC_FE_MONGODB_SOCKET_TIMEOUT</code></td><td><code>300000</code> (5 min)</td><td>Frontend responsiveness</td><td>Increase for slow networks</td></tr><tr><td><code>ML_BENTO_TIME_OUT</code></td><td><code>6000</code></td><td>ML processing timeout</td><td>Adjust based on ML workload complexity</td></tr></tbody></table>

{% hint style="info" %}

### 7. LOGGING AND MONITORING CONFIGURATIONS

These settings control system observability and should be configured for production monitoring.
{% endhint %}

<table><thead><tr><th width="264">Variable</th><th>Default Value</th><th>Purpose</th><th>Recommended Setting</th></tr></thead><tbody><tr><td><code>PDC_FE_LOG_LEVEL</code></td><td><code>info</code></td><td>Frontend logging</td><td>Keep as <code>info</code> for production, use <code>debug</code> for troubleshooting</td></tr><tr><td><code>RULES_ENGINE_LOG_LEVEL</code></td><td><code>INFO</code></td><td>Rules engine logging</td><td>Keep as <code>INFO</code> for production</td></tr><tr><td><code>RULES_ENABLE_INSTRUMENTATION</code></td><td><code>true</code></td><td>Performance monitoring</td><td>Keep enabled for production monitoring</td></tr></tbody></table>

{% hint style="info" %}

### 8. NETWORK AND SSL CONFIGURATIONS

These settings control how PDC handles network traffic and security.
{% endhint %}

<table><thead><tr><th width="212">Variable</th><th>Default Value</th><th>Security Impact</th><th>Configuration Need</th></tr></thead><tbody><tr><td><code>PDC_IN_TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_TO</code></td><td><code>web-secure</code></td><td>HTTPS enforcement</td><td>Keep for HTTPS redirect, clear to disable</td></tr><tr><td><code>KEYCLOAK_PROTOCOL</code></td><td><code>http</code></td><td>Authentication security</td><td>Change to <code>https</code> for production</td></tr><tr><td><code>RULES_IAM_ENABLE_SSL</code></td><td><code>true</code></td><td>Rules engine security</td><td>Keep enabled for production</td></tr></tbody></table>
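
The recommendations in this table can be turned into a simple pre-production check. The following Python sketch takes already-parsed settings as a dictionary and returns a warning for each value that deviates from the guidance above. The function name `check_network_settings` is hypothetical, and the fallback values are the defaults listed in the table.

```python
REDIRECT_VAR = "PDC_IN_TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_TO"


def check_network_settings(env: dict[str, str]) -> list[str]:
    """Return warnings for settings that deviate from the production guidance."""
    warnings = []
    if env.get("KEYCLOAK_PROTOCOL", "http").lower() != "https":
        warnings.append("KEYCLOAK_PROTOCOL should be https for production")
    if env.get("RULES_IAM_ENABLE_SSL", "true").lower() != "true":
        warnings.append("RULES_IAM_ENABLE_SSL should stay enabled for production")
    if not env.get(REDIRECT_VAR, "web-secure"):
        warnings.append(REDIRECT_VAR + " is empty, so HTTP is not redirected to HTTPS")
    return warnings


# With the defaults untouched, only the Keycloak protocol is flagged.
for warning in check_network_settings({}):
    print("WARNING:", warning)
```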

#### Configuration Priority Checklist

#### Before First Deployment:

1. ✅ Set `GLOBAL_SERVER_HOST_NAME`
2. ✅ Change all default passwords
3. ✅ Generate new `PDC_PDI_PRIVACY_ENCRYPTION_KEY`
4. ✅ Configure email settings
5. ✅ Review and customize `COMPOSE_PROFILES`
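
Part of this checklist can be automated with a pre-flight script. The sketch below parses the text of an override file and reports which of the named variables are still unset. It assumes that overrides live in `conf/.env`, uses a deliberately simple `KEY=VALUE` parser (no quoting or variable interpolation), and checks only the two variables this checklist names explicitly. The helper names and sample values are hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


REQUIRED = ("GLOBAL_SERVER_HOST_NAME", "PDC_PDI_PRIVACY_ENCRYPTION_KEY")


def missing_settings(env: dict[str, str]) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]


# Illustrative file contents; in practice, read the text from conf/.env.
sample = """
# conf/.env (illustrative values)
GLOBAL_SERVER_HOST_NAME=pdc.example.com
COMPOSE_PROFILES=core,mongodb
"""
print(missing_settings(parse_env(sample)))  # prints ['PDC_PDI_PRIVACY_ENCRYPTION_KEY']
```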

#### Before Production:

1. ✅ Size infrastructure components appropriately
2. ✅ Configure backup retention policies
3. ✅ Set up external system integrations
4. ✅ Configure SSL/HTTPS settings
5. ✅ Enable monitoring and logging

#### Performance Optimization:

1. ✅ Tune worker pool sizes
2. ✅ Adjust batch processing sizes
3. ✅ Configure timeout values
4. ✅ Optimize database connection pools

{% hint style="info" %}

### Security Best Practices

* **Use external secret management**: Consider using Docker secrets, Kubernetes secrets, or an external secret management system instead of storing plain-text passwords in `.env` files
* **Implement least privilege**: Create dedicated database users with minimal required permissions
* **Regular rotation**: Establish a schedule for rotating passwords and encryption keys
* **Network security**: Use firewalls and network segmentation to protect database and internal service communications
* **SSL/TLS everywhere**: Enable SSL for all service-to-service communications
* **Audit logging**: Ensure comprehensive audit logging is enabled for compliance requirements
  {% endhint %}
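
For your own tooling around the deployment, one common way to avoid plain-text passwords is to read a secret from a mounted file and fall back to an environment variable only when no file is configured. The sketch below follows the `NAME_FILE` convention used by many container images. Whether PDC services themselves honor such variables is not stated on this page, so treat the helper `read_secret` and the variable name `EXAMPLE_DB_PASSWORD` as hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_secret(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer a secret file referenced by NAME_FILE over a plain-text NAME value."""
    file_path = env.get(name + "_FILE")
    if file_path:
        # Secret files typically end with a newline; strip it.
        return Path(file_path).read_text(encoding="utf-8").strip()
    return env.get(name)


# Hypothetical variable name; reports whether a value could be resolved.
print("secret available:", read_secret("EXAMPLE_DB_PASSWORD") is not None)
```

With Docker secrets, the file path would typically point under `/run/secrets/`.
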
  {% endtab %}
  {% endtabs %}

