# Adventure Works

x

x

{% tabs %}
{% tab title="PDC" %}
x

x

{% tabs %}
{% tab title="Installation" %}
{% hint style="info" %}

#### Disk Configuration

Follow these best practices for disk partitioning when installing Ubuntu 24.04.4 LTS, so that there is enough space for the PDC 10.2.8 Docker deployment and other services:
{% endhint %}

1. Choose "Custom" partitioning instead of automatic
2. Use LVM (Logical Volume Manager), which makes it easy to resize volumes later
3. Leave 20-30% of disk space unallocated in the volume group for future expansion

| Partition   | Size                |
| ----------- | ------------------- |
| /boot       | 1GB                 |
| /boot/efi   | 512MB               |
| swap        | 16GB                |
| /root       | 100GB               |
| /var        | 150GB (Docker data) |
| /           | 100GB               |
| Unallocated | \~130GB             |
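The rows above sum to about 497.5 GB, so the layout appears sized for a disk of roughly 500 GB. The following Python sketch (a planning aid, not part of PDC) recomputes the total and checks the unallocated share against the 20-30% guideline in step 3; substitute your own sizes.

```python
# Partition plan from the table above, in GB. The unallocated
# space is the "~130GB" row, kept separate from the allocated volumes.
LAYOUT_GB = {
    "/boot": 1,
    "/boot/efi": 0.5,
    "swap": 16,
    "/root": 100,
    "/var": 150,
    "/": 100,
}
UNALLOCATED_GB = 130


def summarize(layout: dict[str, float], unallocated: float) -> tuple[float, float]:
    """Return (total disk size in GB, unallocated share as a fraction)."""
    allocated = sum(layout.values())
    total = allocated + unallocated
    return total, unallocated / total


total_gb, free_share = summarize(LAYOUT_GB, UNALLOCATED_GB)
print(f"Total: {total_gb:g} GB, unallocated: {free_share:.0%}")

# Rule of thumb from step 3: leave 20-30% of the space unallocated.
assert 0.20 <= free_share <= 0.30, "unallocated share outside the 20-30% guideline"
```

With the table's values it prints `Total: 497.5 GB, unallocated: 26%`.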

4.

x

x

x
{% endtab %}

{% tab title="Configuration" %}
x
{% endtab %}
{% endtabs %}

x

x

{% tabs %}
{% tab title=".env.default" %}
{% hint style="info" %}

#### .env.default

{% endhint %}
{% endtab %}

{% tab title="Critical Settings" %}
{% hint style="info" %}

#### Critical Settings

When deploying Pentaho Data Catalog, several configuration categories require careful consideration. This section identifies the most critical settings that must be configured before production deployment, organized by priority and impact.
{% endhint %}

### 1. Mandatory

These settings **must** be configured before PDC can function properly. Failure to set these will prevent the system from starting or operating correctly.

<table><thead><tr><th width="224">Variable</th><th width="157">Default Value</th><th>Required Action</th><th>Description</th></tr></thead><tbody><tr><td><code>GLOBAL_SERVER_HOST_NAME</code></td><td><code>${GLOBAL_SERVER_HOST_NAME:?please set it in conf/.env}</code></td><td><strong>REQUIRED</strong></td><td>The fully qualified domain name (FQDN) or IP address where PDC will be accessible. This affects URLs, SSL certificates, and service discovery.</td></tr></tbody></table>
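The default value uses Docker Compose's `${VAR:?message}` interpolation form, which aborts with `message` when the variable is unset or empty. A minimal Python sketch of the same check, usable as a pre-flight step before starting the stack. It assumes `conf/.env` holds simple `KEY=VALUE` lines, and the host name `pdc.example.com` is hypothetical:

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Simplifying assumption: no quoting, escaping, or multi-line values.
    """
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def require(env: dict[str, str], name: str, message: str) -> str:
    """Mimic Compose's ${NAME:?message}: fail if NAME is unset or empty."""
    value = env.get(name, "")
    if not value:
        raise SystemExit(f"{name}: {message}")
    return value


# Hypothetical conf/.env contents; in practice, read the file from disk.
sample = "# PDC settings\nGLOBAL_SERVER_HOST_NAME=pdc.example.com\n"
host = require(parse_env(sample), "GLOBAL_SERVER_HOST_NAME", "please set it in conf/.env")
print(host)
```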

### 2. Security

These default passwords and security settings pose significant security risks and **must** be changed before production deployment.

<table><thead><tr><th width="248">Variable</th><th width="193">Default Value</th><th>Security Risk</th><th>Recommended Action</th></tr></thead><tbody><tr><td><code>PDC_MONGO_MONGO_INITDB_ROOT_PASSWORD</code></td><td><code>broot</code></td><td><strong>HIGH</strong></td><td>Change to a complex password (min. 12 characters, mixed case, numbers, symbols)</td></tr><tr><td><code>KEYCLOAK_PASSWORD</code></td><td><code>admin</code></td><td><strong>CRITICAL</strong></td><td>Change immediately; this account controls access to identity management</td></tr><tr><td><code>KEYCLOAK_USER</code></td><td><code>admin</code></td><td><strong>HIGH</strong></td><td>Consider changing the username from the default</td></tr><tr><td><code>POSTGRES_PASSWORD</code></td><td><code>admin123#</code></td><td><strong>HIGH</strong></td><td>Change to a complex password for the main database</td></tr><tr><td><code>POSTGRES_BIDB_USER_PASSWORD</code></td><td><code>4aU94t8v+4d+W</code></td><td><strong>MEDIUM</strong></td><td>Change for the BI database read-only user</td></tr><tr><td><code>REDIS_MASTER_PASSWORD</code></td><td><code>redis_master_broot</code></td><td><strong>HIGH</strong></td><td>Change for cache/session security</td></tr><tr><td><code>REDIS_REPLICA_PASSWORD</code></td><td><code>redis_replica_broot</code></td><td><strong>HIGH</strong></td><td>Change for cache replication security</td></tr><tr><td><code>OPENSEARCH_PASSWORD</code></td><td><code>Es3vweMuABJr</code></td><td><strong>HIGH</strong></td><td>Change for search engine security</td></tr><tr><td><code>PDI_DATAPIPES_PASSWORD</code></td><td><code>Welcome123!</code></td><td><strong>HIGH</strong></td><td>Change for data integration security</td></tr><tr><td><code>REFDATA_API_DB_PASSWORD</code></td><td><code>secret</code></td><td><strong>HIGH</strong></td><td>Change for the reference data database</td></tr><tr><td><code>MDM_API_DB_PASSWORD</code></td><td><code>secret</code></td><td><strong>HIGH</strong></td><td>Change for the master data management database</td></tr><tr><td><code>MONGODB_BKP_USER_PASSWORD</code></td><td><code>qdA9WMcw35WjzQl7ELhlHEF</code></td><td><strong>MEDIUM</strong></td><td>Change for backup user security</td></tr><tr><td><code>PDC_PDI_PRIVACY_ENCRYPTION_KEY</code></td><td><code>NUhHRk5VWkJBTkZTbWxDanR0enJLVU05UlF3M3RwWks=</code></td><td><strong>CRITICAL</strong></td><td>Generate a new encryption key for data privacy</td></tr></tbody></table>
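Replacement values can be generated with Python's `secrets` module. In this sketch, `generate_password` enforces the 12-character minimum and the mixed character classes from the table. The symbol set is an assumption: it leaves out characters that often need escaping in `.env` files, shells, or connection URIs (such as `$`, which Docker Compose can treat as a variable reference, plus `#`, `@`, `:`, `/`, `%`, quotes, and whitespace). The default `PDC_PDI_PRIVACY_ENCRYPTION_KEY` decodes to a 32-character alphanumeric string, so `generate_encryption_key` produces a value of the same shape; confirm the expected key format in the PDC documentation before relying on it.

```python
import base64
import secrets
import string

# Assumed symbol set: characters that are safe in .env files, shells and URIs.
SYMBOLS = "-._+="
ALPHABET = string.ascii_letters + string.digits + SYMBOLS


def generate_password(length: int = 20) -> str:
    """Random password containing lowercase, uppercase, digit and symbol characters."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every character class is present.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in SYMBOLS for c in candidate)):
            return candidate


def generate_encryption_key(n_chars: int = 32) -> str:
    """Base64-encode a random alphanumeric string, mirroring the default key's shape."""
    pool = string.ascii_letters + string.digits
    raw = "".join(secrets.choice(pool) for _ in range(n_chars))
    return base64.b64encode(raw.encode("ascii")).decode("ascii")


for name in ("KEYCLOAK_PASSWORD", "POSTGRES_PASSWORD", "REDIS_MASTER_PASSWORD"):
    print(f"{name}={generate_password()}")
print(f"PDC_PDI_PRIVACY_ENCRYPTION_KEY={generate_encryption_key()}")
```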

### 3. Environment

These settings should be customized based on your specific deployment environment and requirements.

#### Infrastructure Sizing

| Variable                                         | Default Value | Consideration                | Recommended Action                            |
| ------------------------------------------------ | ------------- | ---------------------------- | --------------------------------------------- |
| `OPENSEARCH_HEAP_SIZE`                           | `2048m`       | Memory allocation for search | Adjust based on data volume and available RAM |
| `PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE`              | `5`           | Worker pool sizing           | Adjust based on expected workload             |
| `PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE`              | `5`           | Worker pool sizing           | Increase for high-volume environments         |
| `PDC_WS_DEFAULT_JDBC_DEFAULT_POOL_SIZE`          | `10`          | Database connections         | Adjust based on concurrent users              |
| `PDC_WS_DEFAULT_JDBC_DEFAULT_CONNECTION_TIMEOUT` | `60`          | Connection timeout           | Adjust for network latency                    |
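A common OpenSearch rule of thumb is to give the JVM heap about half of the memory available to the OpenSearch node (leaving the rest for the file-system cache) and to stay below roughly 32 GB so the JVM can keep using compressed object pointers. On a single PDC host that memory is shared with every other container, so this sketch takes the memory you intend to reserve for OpenSearch rather than total host RAM. The 50% share and 31 GB cap are general guidance, not PDC-specific values.

```python
def opensearch_heap(reserved_mb: int, share: float = 0.5, cap_mb: int = 31 * 1024) -> str:
    """Suggest an OPENSEARCH_HEAP_SIZE value (e.g. "2048m") from the memory reserved for OpenSearch."""
    if reserved_mb <= 0:
        raise ValueError("reserved_mb must be positive")
    # Half of the reserved memory, capped to keep compressed object pointers.
    heap_mb = min(int(reserved_mb * share), cap_mb)
    return f"{heap_mb}m"


print(opensearch_heap(4 * 1024))   # 4 GB reserved -> "2048m", the shipped default
print(opensearch_heap(16 * 1024))  # 16 GB reserved -> "8192m"
```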

#### Backup and Retention

| Variable                     | Default Value | Consideration               | Recommended Action                 |
| ---------------------------- | ------------- | --------------------------- | ---------------------------------- |
| `GLOBAL_PG_BACKUP_DAYS`      | `90`          | PostgreSQL backup retention | Align with data retention policies |
| `PDC_MONGODB_BACKUP_DAYS`    | `90`          | MongoDB backup retention    | Align with data retention policies |
| `PDC_OPENSEARCH_BACKUP_DAYS` | `90`          | OpenSearch backup retention | Align with data retention policies |
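Retention multiplies storage: each store keeps roughly `retention days × backups per day × backup size`. This sketch estimates the total for the three stores. The one-backup-per-day schedule and the per-backup sizes are hypothetical placeholders to replace with figures measured on your system, and the result only affects disk sizing if the backups are written to the same volume as Docker data (for example `/var`).

```python
# Retention defaults from the table above.
RETENTION_DAYS = {
    "GLOBAL_PG_BACKUP_DAYS": 90,
    "PDC_MONGODB_BACKUP_DAYS": 90,
    "PDC_OPENSEARCH_BACKUP_DAYS": 90,
}


def retained_backup_mb(per_backup_mb: dict[str, int], backups_per_day: int = 1) -> int:
    """Estimate the storage (MB) held by backups once every retention window is full."""
    return sum(
        RETENTION_DAYS[var] * backups_per_day * size_mb
        for var, size_mb in per_backup_mb.items()
    )


# Hypothetical per-backup sizes in MB; measure your own.
sizes = {
    "GLOBAL_PG_BACKUP_DAYS": 500,
    "PDC_MONGODB_BACKUP_DAYS": 200,
    "PDC_OPENSEARCH_BACKUP_DAYS": 300,
}
total_mb = retained_backup_mb(sizes)
print(f"~{total_mb / 1024:.1f} GB retained")
```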

### 4. Integration

These settings enable PDC to integrate with external systems and should be configured based on your infrastructure.

#### Email and Notifications

| Variable               | Default Value                      | Configuration Need       | Action Required                         |
| ---------------------- | ---------------------------------- | ------------------------ | --------------------------------------- |
| `KEYCLOAK_SMTP`        | Pre-configured Office365           | Email server settings    | Update with your SMTP server details    |
| `KEYCLOAK_ADMIN_EMAIL` | `admin@hv.com`                     | Admin notification email | Change to your admin email              |
| `EMAIL_DOMAINS`        | `["hv.com", "hitachivantara.com"]` | Allowed email domains    | Update with your organization's domains |
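The default `EMAIL_DOMAINS` value is written as a JSON array, so a replacement should use double-quoted strings. Assuming PDC parses the value as JSON and compares the part after `@` exactly (subdomain handling is not documented here), this sketch validates a new value and tests addresses against it; `example.com` is a placeholder domain.

```python
import json


def parse_domains(value: str) -> set[str]:
    """Parse an EMAIL_DOMAINS value such as '["hv.com", "hitachivantara.com"]'."""
    domains = json.loads(value)  # raises json.JSONDecodeError on single quotes, etc.
    if not isinstance(domains, list) or not all(isinstance(d, str) for d in domains):
        raise ValueError("EMAIL_DOMAINS must be a JSON array of strings")
    return {d.lower() for d in domains}


def is_allowed(email: str, domains: set[str]) -> bool:
    """Exact, case-insensitive match on the domain part of the address."""
    _, _, domain = email.rpartition("@")
    return domain.lower() in domains


allowed = parse_domains('["example.com"]')
print(is_allowed("admin@example.com", allowed), is_allowed("admin@hv.com", allowed))
```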

#### External System Integration

| Variable                                | Default Value | Integration Type       | Configuration Required                      |
| --------------------------------------- | ------------- | ---------------------- | ------------------------------------------- |
| `ACCESS_REQUEST_SERVICE_PROVIDER_TOOL`  | (empty)       | Ticketing system       | Set to `JIRA` or `SERVICENOW` if you use one |
| `ACCESS_REQUEST_SERVICE_JIRA_URL`       | (empty)       | JIRA integration       | Configure if using JIRA for access requests |
| `ACCESS_REQUEST_SERVICE_SERVICENOW_URL` | (empty)       | ServiceNow integration | Configure if using ServiceNow               |
| `LICENSING_SERVER_URL`                  | (empty)       | License server         | Configure for license validation            |
| `PDC_FE_DASHBOARD_URL`                  | (empty)       | External dashboard     | Set if integrating with external BI tools   |
| `PDC_FE_JUPYTER_URL`                    | (empty)       | Jupyter integration    | Set if providing Jupyter notebook access    |
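The ticketing variables depend on each other: once `ACCESS_REQUEST_SERVICE_PROVIDER_TOOL` names a provider, that provider's URL must be set too. A small consistency check, assuming the tool value is exactly `JIRA` or `SERVICENOW` as the table states; the URL in the test values is hypothetical:

```python
# Maps each provider value from the table to the URL variable it requires.
PROVIDER_URL_VARS = {
    "JIRA": "ACCESS_REQUEST_SERVICE_JIRA_URL",
    "SERVICENOW": "ACCESS_REQUEST_SERVICE_SERVICENOW_URL",
}


def check_access_request(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    tool = env.get("ACCESS_REQUEST_SERVICE_PROVIDER_TOOL", "").strip()
    if not tool:
        return []  # access-request integration not in use
    if tool not in PROVIDER_URL_VARS:
        return [f"unknown provider {tool!r}; expected JIRA or SERVICENOW"]
    url_var = PROVIDER_URL_VARS[tool]
    if not env.get(url_var, "").strip():
        return [f"{url_var} must be set when the provider is {tool}"]
    return []


print(check_access_request({"ACCESS_REQUEST_SERVICE_PROVIDER_TOOL": "JIRA"}))
```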

### 5. Feature Enablement

These settings control which PDC features are enabled and how they behave.

#### Service Profiles

| Variable           | Default Value                                                                                                                                                             | Impact                                 | Customization Needed                                  |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------- | ----------------------------------------------------- |
| `COMPOSE_PROFILES` | `core,mongodb,collab,mdm,refdata,refdata-postgres,mdm-postgres,physical-assets,ml-models,bi-metadata,ai-ml,access-request,policy-hierarchy,metadata-rules,pdso,datapipes` | Determines which services are deployed | Remove unused services to reduce resource consumption |
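`COMPOSE_PROFILES` is a comma-separated list, and Docker Compose starts only the services whose profiles appear in it (plus any services that declare no profile). This sketch drops unused profiles from the default list while keeping the original order. It assumes `core` must always remain, and the profiles removed in the example (`mdm` and its `mdm-postgres` database) are only an illustration; check which services depend on each other before trimming.

```python
DEFAULT_PROFILES = (
    "core,mongodb,collab,mdm,refdata,refdata-postgres,mdm-postgres,physical-assets,"
    "ml-models,bi-metadata,ai-ml,access-request,policy-hierarchy,metadata-rules,pdso,datapipes"
)


def trim_profiles(profiles: str, unused: set[str]) -> str:
    """Remove unused profiles while preserving the original order."""
    kept = [p.strip() for p in profiles.split(",") if p.strip() and p.strip() not in unused]
    if "core" not in kept:
        raise ValueError("assumed requirement: the 'core' profile must stay enabled")
    return ",".join(kept)


print("COMPOSE_PROFILES=" + trim_profiles(DEFAULT_PROFILES, {"mdm", "mdm-postgres"}))
```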

#### Machine Learning Features

| Variable                              | Default Value     | Feature Impact         | Configuration Need                         |
| ------------------------------------- | ----------------- | ---------------------- | ------------------------------------------ |
| `ML_LLM_MODEL`                        | (empty)           | AI/ML capabilities     | Set if using external LLM services         |
| `ML_LLM_API_KEY`                      | (empty)           | AI/ML access           | Configure API key for external AI services |
| `ML_LLM_INFERENCE_BASE_URL`           | (empty)           | AI service endpoint    | Set inference service URL                  |
| `PDC_WS_DEFAULT_MAX_FILE_SIZE_FOR_ML` | `10485760` (10 MiB) | File processing limits | Adjust based on typical file sizes         |
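The default `10485760` is exactly 10 × 1024 × 1024 bytes (10 MiB), which suggests the variable takes a byte count. To change the limit, compute the byte value rather than typing a rounded decimal figure; the 25 MiB below is an arbitrary example:

```python
def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    return mib * 1024 * 1024


assert mib_to_bytes(10) == 10485760  # matches the shipped default
print(f"PDC_WS_DEFAULT_MAX_FILE_SIZE_FOR_ML={mib_to_bytes(25)}")
```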

#### Data Masking

| Variable                    | Default Value | Feature           | Configuration                     |
| --------------------------- | ------------- | ----------------- | --------------------------------- |
| `PDC_FE_DATAMASK_ENABLE`    | (empty)       | Data privacy      | Set to enable data masking in the UI |
| `PDC_FE_DATAMASK_CHAR`      | `*`           | Masking character | Customize masking character       |
| `PDC_FE_DATAMASK_TAG_NAMES` | `sensitive`   | Tag-based masking | Define which tags trigger masking |
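Conceptually, tag-based masking replaces the value of any column that carries one of the configured tags with the masking character. This sketch illustrates that idea only; it is not PDC's implementation, and the comma-separated tag list and length-preserving masking are assumptions:

```python
def mask_row(
    row: dict[str, str],
    column_tags: dict[str, set[str]],
    tag_names: str = "sensitive",  # PDC_FE_DATAMASK_TAG_NAMES default
    mask_char: str = "*",          # PDC_FE_DATAMASK_CHAR default
) -> dict[str, str]:
    """Mask values in columns tagged with any of the configured tag names."""
    mask_tags = {t.strip() for t in tag_names.split(",") if t.strip()}
    return {
        col: mask_char * len(val) if column_tags.get(col, set()) & mask_tags else val
        for col, val in row.items()
    }


row = {"name": "Ada", "ssn": "123-45-6789"}
print(mask_row(row, {"ssn": {"sensitive", "pii"}}))
```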

### 6. Performance Tuning

These settings should be optimized based on your performance requirements and system resources.

#### Job Processing

| Variable                                     | Default Value | Performance Impact       | Tuning Guidance                                           |
| -------------------------------------------- | ------------- | ------------------------ | --------------------------------------------------------- |
| `RULES_MDS_ENTITY_UPDATE_BATCH_SIZE`         | `20000`       | Rule processing speed    | Increase for better throughput, decrease for lower memory |
| `RULES_MDS_ENTITY_SCROLL_BATCH_SIZE`         | `1000`        | Memory vs speed tradeoff | Adjust based on available memory                          |
| `PDC_WS_DEFAULT_SE_BATCH_SIZE`               | `50`          | Search indexing          | Increase for faster indexing                              |
| `PDC_WS_DEFAULT_DS_PERM_CREATION_BATCH_SIZE` | `4`           | Permission processing    | Increase cautiously to avoid overwhelming the authentication system |
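Batch size trades memory for round trips: each batch is held in memory at once, and the number of batches (and therefore round trips to the backing store) is the item count divided by the batch size, rounded up. This generic sketch shows that arithmetic for the batch-size defaults in the table; it says nothing about how PDC's rules engine batches internally, and the one million entities are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def round_trips(n_items: int, batch_size: int) -> int:
    """Number of batches needed (integer ceiling division)."""
    return -(-n_items // batch_size)


print(round_trips(1_000_000, 20_000))  # RULES_MDS_ENTITY_UPDATE_BATCH_SIZE default -> 50
print(round_trips(1_000_000, 1_000))   # RULES_MDS_ENTITY_SCROLL_BATCH_SIZE default -> 1000
```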

#### Timeout Settings

| Variable                        | Default Value    | Impact                  | Adjustment Guidelines                  |
| ------------------------------- | ---------------- | ----------------------- | -------------------------------------- |
| `PDC_FE_MONGODB_SOCKET_TIMEOUT` | `300000` (5 min) | Frontend responsiveness | Increase for slow networks             |
| `ML_BENTO_TIME_OUT`             | `6000`           | ML processing timeout   | Adjust based on ML workload complexity |

### 7. Logging & Monitoring

These settings control system observability and should be configured for production monitoring.

| Variable                       | Default Value | Purpose                | Recommended Setting                                            |
| ------------------------------ | ------------- | ---------------------- | -------------------------------------------------------------- |
| `PDC_FE_LOG_LEVEL`             | `info`        | Frontend logging       | Keep as `info` for production, use `debug` for troubleshooting |
| `RULES_ENGINE_LOG_LEVEL`       | `INFO`        | Rules engine logging   | Keep as `INFO` for production                                  |
| `RULES_ENABLE_INSTRUMENTATION` | `true`        | Performance monitoring | Keep enabled for production monitoring                         |

### 8. Network & SSL

These settings control how PDC handles network traffic and security.

| Variable                                                         | Default Value | Security Impact         | Configuration Need                        |
| ---------------------------------------------------------------- | ------------- | ----------------------- | ----------------------------------------- |
| `PDC_IN_TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_TO` | `web-secure`  | HTTPS enforcement       | Keep for HTTPS redirect, clear to disable |
| `KEYCLOAK_PROTOCOL`                                              | `http`        | Authentication security | Change to `https` for production          |
| `RULES_IAM_ENABLE_SSL`                                           | `true`        | Rules engine security   | Keep enabled for production               |
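Before go-live, the three settings above can be checked together. This sketch compares a dictionary of configured values with the production recommendations from the table. A variable missing from the dictionary is reported, and with the defaults listed above only `KEYCLOAK_PROTOCOL` is flagged.

```python
# Production recommendations from the table above.
RECOMMENDED = {
    "PDC_IN_TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_TO": "web-secure",
    "KEYCLOAK_PROTOCOL": "https",
    "RULES_IAM_ENABLE_SSL": "true",
}


def tls_findings(env: dict[str, str]) -> list[str]:
    """List settings that differ from the recommended production values."""
    findings = []
    for name, expected in RECOMMENDED.items():
        actual = env.get(name, "").strip().lower()
        if actual != expected:
            findings.append(f"{name}={actual!r}; recommended {expected!r}")
    return findings


defaults = {
    "PDC_IN_TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_TO": "web-secure",
    "KEYCLOAK_PROTOCOL": "http",
    "RULES_IAM_ENABLE_SSL": "true",
}
print(tls_findings(defaults))
```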

### Configuration Priority Checklist

#### Before First Deployment:

1. ✅ Set `GLOBAL_SERVER_HOST_NAME`
2. ✅ Change all default passwords
3. ✅ Generate new `PDC_PDI_PRIVACY_ENCRYPTION_KEY`
4. ✅ Configure email settings
5. ✅ Review and customize `COMPOSE_PROFILES`

#### Before Production:

1. ✅ Size infrastructure components appropriately
2. ✅ Configure backup retention policies
3. ✅ Set up external system integrations
4. ✅ Configure SSL/HTTPS settings
5. ✅ Enable monitoring and logging

#### Performance Optimization:

1. ✅ Tune worker pool sizes
2. ✅ Adjust batch processing sizes
3. ✅ Configure timeout values
4. ✅ Optimize database connection pools
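The first two checklist items can be audited automatically. This sketch flags any password or key from the security table that still has its shipped default. It assumes `conf/.env` uses simple `KEY=VALUE` lines and that a variable absent from `conf/.env` falls back to its `.env.default` value, so absent variables are reported as well. The sample file contents are hypothetical.

```python
# Shipped defaults from the security table above.
KNOWN_DEFAULTS = {
    "PDC_MONGO_MONGO_INITDB_ROOT_PASSWORD": "broot",
    "KEYCLOAK_PASSWORD": "admin",
    "KEYCLOAK_USER": "admin",
    "POSTGRES_PASSWORD": "admin123#",
    "POSTGRES_BIDB_USER_PASSWORD": "4aU94t8v+4d+W",
    "REDIS_MASTER_PASSWORD": "redis_master_broot",
    "REDIS_REPLICA_PASSWORD": "redis_replica_broot",
    "OPENSEARCH_PASSWORD": "Es3vweMuABJr",
    "PDI_DATAPIPES_PASSWORD": "Welcome123!",
    "REFDATA_API_DB_PASSWORD": "secret",
    "MDM_API_DB_PASSWORD": "secret",
    "MONGODB_BKP_USER_PASSWORD": "qdA9WMcw35WjzQl7ELhlHEF",
    "PDC_PDI_PRIVACY_ENCRYPTION_KEY": "NUhHRk5VWkJBTkZTbWxDanR0enJLVU05UlF3M3RwWks=",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, stripping one pair of matching surrounding quotes."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def leftover_defaults(env: dict[str, str]) -> list[str]:
    """Names whose effective value (absent means default) still equals the shipped default."""
    return sorted(
        name for name, default in KNOWN_DEFAULTS.items()
        if env.get(name, default) == default
    )


# Hypothetical conf/.env excerpt: only POSTGRES_PASSWORD has been changed.
sample = "KEYCLOAK_PASSWORD=admin\nPOSTGRES_PASSWORD='r7-Kp2_x9Qz.mL4w'\n"
for name in leftover_defaults(parse_env(sample)):
    print("still default:", name)
```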

{% hint style="info" %}

### Security Best Practices

* **Use external secret management**: Consider using Docker secrets, Kubernetes secrets, or external secret management systems instead of plain text passwords
* **Implement least privilege**: Create dedicated database users with minimal required permissions
* **Regular rotation**: Establish a schedule for rotating passwords and encryption keys
* **Network security**: Use firewalls and network segmentation to protect database and internal service communications
* **SSL/TLS everywhere**: Enable SSL for all service-to-service communications
* **Audit logging**: Ensure comprehensive audit logging is enabled to meet compliance requirements
{% endhint %}
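One way to keep passwords out of plain-text `.env` files is Docker secrets, which Compose mounts as files under `/run/secrets/`. Many official images (for example `postgres`) accept a `*_FILE` variable that points at such a file. Whether each PDC service honours a `_FILE` variant is not covered here, so treat this Python helper as a pattern for your own scripts: it prefers `NAME_FILE`, then a file in the secrets directory (the lower-case file name is an assumption), and finally falls back to the plain environment variable.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Resolve a secret from NAME_FILE, then <secrets_dir>/<name>, then $NAME."""
    candidates = []
    file_var = os.environ.get(f"{name}_FILE")
    if file_var:
        candidates.append(Path(file_var))
    # Assumed naming convention: secret file named after the lower-case variable.
    candidates.append(Path(secrets_dir) / name.lower())
    for path in candidates:
        if path.is_file():
            return path.read_text(encoding="utf-8").strip()
    return os.environ.get(name)  # plain-text fallback; None if unset
```

For example, `read_secret("POSTGRES_PASSWORD")` returns the contents of `/run/secrets/postgres_password` when that file is mounted, and the environment value otherwise.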
{% endtab %}

{% tab title="docker-compose.yaml" %}
{% hint style="info" %}

#### docker-compose.yaml

{% endhint %}

x

x

x

x
{% endtab %}

{% tab title="" %}
x

x

x

x

x
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="Mailhog" %}
{% hint style="info" %}

#### Mailhog

A local mail server is useful for testing PDC notifications.

MailHog is an email testing tool for developers:

* Configure your application to use MailHog for SMTP delivery
* View messages in the web UI, or retrieve them with the JSON API
* Optionally release messages to real SMTP servers for delivery
{% endhint %}

1. Configure Keycloak to send email through MailHog

{% embed url="<https://github.com/mailhog/MailHog>" %}

2. Create a folder for MailHog and copy the `docker-compose.yml` into it

```
```

3. Start MailHog from that folder with `docker compose up -d`

x
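By default MailHog accepts SMTP on port 1025 and serves its web UI and JSON API on port 8025, with no authentication or TLS, so those are the values Keycloak's SMTP settings should point at. This Python sketch sends a test message and reads the message count back from the API. It assumes MailHog is reachable on `localhost` and that the v2 messages endpoint returns a `total` field; the addresses are placeholders.

```python
import json
import smtplib
import urllib.request
from email.message import EmailMessage

MAILHOG_HOST = "localhost"  # assumption: MailHog runs on this machine
SMTP_PORT = 1025            # MailHog's default SMTP port
API_URL = f"http://{MAILHOG_HOST}:8025/api/v2/messages"  # default UI/API port


def build_message(recipient: str) -> EmailMessage:
    """Create a plain-text test message."""
    msg = EmailMessage()
    msg["From"] = "pdc-test@example.com"
    msg["To"] = recipient
    msg["Subject"] = "PDC notification test"
    msg.set_content("If you can read this in MailHog, SMTP delivery works.")
    return msg


def send_and_count(recipient: str) -> int:
    """Send the test message, then return the total number of messages MailHog holds."""
    with smtplib.SMTP(MAILHOG_HOST, SMTP_PORT, timeout=10) as smtp:
        smtp.send_message(build_message(recipient))
    with urllib.request.urlopen(API_URL, timeout=10) as resp:
        return json.load(resp)["total"]
```

Call `send_and_count("admin@example.com")` and check that the count increases, or open `http://localhost:8025` to read the message in the web UI.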

{% hint style="info" %}

{% endhint %}

x

x
{% endtab %}
{% endtabs %}
