APIs


  1. Run the following commands to verify system requirements.

```bash
# Check Python version (must be 3.8+)
python3 --version

# Check pip availability
pip --version

# Verify project directory access
ls -la /home/pdc/Projects/APIs/Key_Metrics/
```
  2. Ensure the required packages are installed.

```bash
# Navigate to project directory
cd ~/Projects/APIs/Key_Metrics

# Install package in development mode (recommended)
pip install -e .

# Alternative: Using uv package manager
# uv sync

# Verify installation
extract-entities --help
bulk-update-api --help
bulk-update-opensearch --help
```
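If any of the `--help` checks above fail, a quick way to see which console scripts are missing from `PATH` is to probe for them with the standard library. This is a minimal sketch; `missing_tools` is an illustrative helper, not part of the project:

```python
import shutil

# Console scripts expected after `pip install -e .`
TOOLS = ("extract-entities", "bulk-update-api", "bulk-update-opensearch")

def missing_tools(tools):
    """Return the subset of console scripts not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

if __name__ == "__main__":
    missing = missing_tools(TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All console scripts found.")
```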
  3. Generate a JWT bearer token.

Ensure the credentials used have the required permissions (james.lock has the admin / system_administrator role).

The JWT bearer token handles authentication, while PDC authorizes the OpenSearch API calls.

The bearer token is time-limited, so you may need to regenerate it periodically.

```bash
curl -k -L -X POST 'https://pdc.pentaho.lab/keycloak/realms/pdc/protocol/openid-connect/token' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'client_id=pdc-client' \
  --data-urlencode 'grant_type=password' \
  --data-urlencode '[email protected]' \
  --data-urlencode 'password=Welcome123!' | jq -r '.access_token'
```

This returns the JWT as the `.access_token` field:

```
eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJoTTRKdGZzc0tnWUdXOUJPMEVFeGNISWdDZ0FsWUFnOENQS1JvcWYzbUVvIn0.eyJle
```
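Because the token is time-limited, it can help to read its `exp` claim before reusing a cached token. The sketch below decodes the JWT payload without verifying the signature; `jwt_expiry`, `auth_headers`, and `token_is_fresh` are illustrative helpers, not part of the project:

```python
import base64
import json
import time

def jwt_expiry(token: str) -> int:
    """Return the `exp` claim (Unix seconds) from a JWT, without verifying it."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]

def auth_headers(token: str) -> dict:
    """Build the Authorization header used for bearer-token requests."""
    return {"Authorization": f"Bearer {token}"}

def token_is_fresh(token: str, leeway: int = 60) -> bool:
    """True if the token has at least `leeway` seconds of validity left."""
    return jwt_expiry(token) - time.time() > leeway
```

If `token_is_fresh` returns `False`, rerun the curl command above to fetch a new token.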
  4. Create a `config.py`:

```python
# Data Catalog API Configuration
API_CONFIG = {
    "base_url": "https://pdc.pentaho.lab",
    "auth_token": "your-bearer-token-here",
    "timeout": 30,
    "max_retries": 3
}

# OpenSearch Configuration
OPENSEARCH_CONFIG = {
    "url": "http://localhost:9200",
    "username": "admin",  # Add if authentication required
    "password": "Es3vweMuABJr",  # located in .env.default
    "verify_ssl": False
}

# File Paths
FILES = {
    "entity_extraction": "data/output/entity_extraction.csv",
    "calculated_input": "data/input/calculated_metrics.csv",
    "joined_output": "data/output/bulk_update_ready.csv",
}

# Processing Options
PROCESSING = {
    "batch_size": 50,
    "delay_between_batches": 1,  # seconds
    "dry_run": False  # Set True for testing
}
```
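The `PROCESSING` options control how the bulk-update scripts throttle their work. The scripts' internals aren't shown here, but the intent of `batch_size`, `delay_between_batches`, and `dry_run` can be sketched as follows (`batched` and `process` are hypothetical helper names, not the project's actual functions):

```python
import time

# Mirrors the PROCESSING dict from config.py
PROCESSING = {"batch_size": 50, "delay_between_batches": 1, "dry_run": False}

def batched(rows, batch_size):
    """Yield successive slices of `rows` with at most `batch_size` items each."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def process(rows, send, options=PROCESSING):
    """Send rows in batches, pausing between batches; a dry run only reports."""
    for batch in batched(rows, options["batch_size"]):
        if options["dry_run"]:
            print(f"[dry run] would send {len(batch)} rows")
        else:
            send(batch)
        time.sleep(options["delay_between_batches"])
```

Setting `dry_run` to `True` lets you verify batch sizes and row counts before any API or OpenSearch writes happen.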
  5. Run a diagnostic check.

```bash
# Run diagnostic check
python3 << 'EOF'
import sys
import importlib

required_modules = ['requests', 'pandas', 'urllib3', 'csv', 'json']
missing = []

for module in required_modules:
    try:
        importlib.import_module(module)
        print(f"✅ {module} - OK")
    except ImportError:
        print(f"❌ {module} - MISSING")
        missing.append(module)

if missing:
    print(f"\n⚠️  Install missing modules: pip install {' '.join(missing)}")
else:
    print("\n✅ All required modules installed!")
EOF
```
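The module check above can be paired with a check against the Python 3.8+ requirement from step 1. A small sketch; `check_python` is an illustrative helper, not part of the project:

```python
import sys

def check_python(minimum=(3, 8)):
    """Print and return whether the running interpreter meets `minimum`."""
    ok = sys.version_info[:2] >= minimum
    mark = "✅" if ok else "❌"
    print(f"{mark} Python {sys.version.split()[0]} (requires {minimum[0]}.{minimum[1]}+)")
    return ok
```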

We're now ready to kick off the project!
