Sensitivity Level & Trust Score

Dataflow

The solution consists of three main components:

  1. Entity Extraction Tool - Extracts all entities with hierarchical names from OpenSearch

  2. Pentaho Data Integration - Joins your calculated values with entity data

  3. Bulk Update Tool - Updates Trust Score and Sensitivity via API or OpenSearch

Expected Outcomes

  • Automated bulk updates of Trust Score (0-100) and Sensitivity (HIGH/MEDIUM/LOW)

  • Support for schema, table, and column level updates

  • Validation and error reporting

  • Scalable solution for thousands of entities
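
For orientation, the final update step can also be performed directly against OpenSearch. Below is a minimal, hypothetical sketch using the standard _bulk API; the pdc_entities index name appears later in this guide, but the trust_score and sensitivity field names are assumptions about the document mapping, not something the extraction tool confirms.

# Hypothetical sketch: update one entity's scores via the OpenSearch _bulk API.
# The trust_score/sensitivity field names are assumed, not verified.
curl -s -X POST "http://localhost:9200/pdc_entities/_bulk" \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary $'{"update":{"_id":"ef60e629-4261-4ce6-8635-961ca4b1b420"}}\n{"doc":{"trust_score":90,"sensitivity":"LOW"}}\n'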


Entity Extraction

The extraction process retrieves all entities from your data catalog with their hierarchical relationships intact.

What Gets Extracted:

  • Entity unique identifiers (UUIDs)

  • Entity types (SCHEMA/TABLE/COLUMN)

  • Hierarchical names for joining

  • Current Trust Score and Sensitivity values

  • Fully qualified names (FQDNs)
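
The extraction tool pages through the catalog in batches (the sample output below shows 100 entities per batch). As a rough illustration, not the tool's actual implementation, one such page could be fetched like this:

# Illustrative sketch only: fetch one page of 100 entities from the catalog
# index (pdc_entities is the index name used later in this guide). A real
# extraction pages onward by increasing "from", or uses scroll/search_after.
curl -s "http://localhost:9200/pdc_entities/_search?size=100&from=0" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match_all": {}}}'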

Learning Objectives:

  • Understand the entity extraction process

  • Extract all entities with hierarchical names from your data catalog

  • Analyze the extracted data structure

  • Prepare data for joining with calculated metrics

  1. Run the extraction script:

# Change to Key_Metrics directory
cd /home/pdc/Projects/APIs/Key_Metrics

# Extract all entities with full details
extract-entities \
    --opensearch-url http://localhost:9200 \
    --output data/output/entity_extraction.csv \
    --verbose

Expected output:

🔍 Extracting all entities from data catalog...
📊 Processing batch 1/25 (100 entities)...
📊 Processing batch 2/25 (100 entities)...
...
✅ Extracted 1,247 entities to data/output/entity_extraction.csv

📈 Entity Summary:
   - COLUMN: 892
   - TABLE: 343
   - SCHEMA: 12

  2. Take a look at entity_extraction.csv. The new_trust_score and new_sensitivity columns are left empty for the values you will calculate:

| Column | Description | Example |
| --- | --- | --- |
| entity_id | Unique identifier (needed for bulk updates) | ef60e629-4261-4ce6-8635-961ca4b1b420 |
| entity_type | Type of entity, used for filtering | SCHEMA, TABLE, COLUMN |
| entity_name | The entity's actual name | Employee |
| schema_name | Schema name for joining | HumanResources |
| table_name | Table name for joining (empty for schemas) | Employee |
| column_name | Column name for joining (empty for schemas/tables) | FirstName |
| fqdn | Internal fully qualified name | 688cc7b9c5759eae5fdcba07/... |
| fqdn_display | Human-readable path | mssql:adventureworks2022/... |
| current_trust_score | Existing trust score (if any) | 48 |
| current_sensitivity | Existing sensitivity (if any) | HIGH |
| new_trust_score | Empty; for your calculated values | (empty) |
| new_sensitivity | Empty; for your calculated values | (empty) |
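
Concretely, the header row plus one column-level record look like this (values taken from the examples above; the fqdn values are truncated):

entity_id,entity_type,entity_name,schema_name,table_name,column_name,fqdn,fqdn_display,current_trust_score,current_sensitivity,new_trust_score,new_sensitivity
ef60e629-4261-4ce6-8635-961ca4b1b420,COLUMN,FirstName,HumanResources,Employee,FirstName,688cc7b9c5759eae5fdcba07/...,mssql:adventureworks2022/...,48,HIGH,,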

  3. Run a data quality analysis:

# Check that the file was created, then review the raw CSV
ls -la data/output/entity_extraction.csv
head -5 data/output/entity_extraction.csv
wc -l data/output/entity_extraction.csv

# Analyze extracted data
echo "=== Entity Type Distribution ==="
cut -d',' -f2 data/output/entity_extraction.csv | tail -n +2 | sort | uniq -c

echo -e "\n=== Top 5 Schemas ==="
cut -d',' -f4 data/output/entity_extraction.csv | tail -n +2 | sort | uniq -c | sort -rn | head -5

echo -e "\n=== Entities with Complete Hierarchy ==="
grep -v ',,,,' data/output/entity_extraction.csv | wc -l

echo -e "\n=== Entities with Missing Names ==="
grep -c ',,,' data/output/entity_extraction.csv

echo -e "\n=== Sample Complete Entities ==="
grep -v ',,,,' data/output/entity_extraction.csv | head -3

# Count by schema
cut -d',' -f4 data/output/entity_extraction.csv | tail -n +2 | sort | uniq -c | head -10

# Find all schemas
awk -F',' '$2=="SCHEMA" {print $4}' data/output/entity_extraction.csv | sort | uniq

# Find all tables in HumanResources schema
awk -F',' '$2=="TABLE" && $4=="HumanResources" {print $5}' data/output/entity_extraction.csv

# Find all columns in Employee table
awk -F',' '$2=="COLUMN" && $4=="HumanResources" && $5=="Employee" {print $6}' data/output/entity_extraction.csv
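
One caveat about the cut/awk recipes above: splitting on every comma with -F',' miscounts fields if a quoted value contains an embedded comma (long fqdn_display paths are plausible candidates). If your export quotes such fields and GNU awk is available, FPAT offers a safer split; this is an optional, assumed alternative rather than part of the toolchain:

# Safer CSV splitting with GNU awk: FPAT treats a quoted field as one column
# even when it contains commas; quotes are stripped before matching.
gawk -v FPAT='([^,]*)|("[^"]*")' \
     '{gsub(/"/, "", $2); gsub(/"/, "", $4)} $2 == "SCHEMA" {print $4}' \
     data/output/entity_extraction.csv | sort -u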

  4. Prepare sample data for testing.

Create Test Dataset:

# Extract the first 50 entities (51 lines including the header row)
head -51 data/output/entity_extraction.csv > data/output/test_entities.csv

Create Sample Join Data:

# Create a sample calculated metrics CSV for testing
cat > data/input/sample_calculated_metrics.csv << 'EOF'
schema_name,table_name,column_name,calculated_trust_score,calculated_sensitivity
HumanResources,,,75,HIGH
HumanResources,Employee,,85,MEDIUM
HumanResources,Employee,FirstName,90,LOW
HumanResources,Employee,LastName,90,LOW
HumanResources,Employee,EmailAddress,70,HIGH
Sales,,,80,MEDIUM
Sales,Customer,,85,LOW
Sales,Customer,CustomerID,95,LOW
EOF
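
As a quick preview of the join Pentaho will perform in the next section, you can match the two files on the (schema_name, table_name, column_name) key with awk. This is only a sketch and assumes simple CSVs with no quoted or embedded commas:

# Sketch of the hierarchical-name join: map (schema,table,column) to the
# calculated values, then attach them to matching entity IDs.
awk -F',' 'NR==FNR && FNR>1 {m[$1","$2","$3]=$4","$5; next}
           FNR>1 {k=$4","$5","$6; if (k in m) print $1","k","m[k]}' \
    data/input/sample_calculated_metrics.csv \
    data/output/entity_extraction.csv | head -5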

  5. Validate the extraction results.

Check Data Completeness:

# Verify all expected columns are present
head -1 data/output/entity_extraction.csv | tr ',' '\n' | nl

# Check for any parsing errors
grep -n 'ERROR\|WARN' data/output/entity_extraction.csv || echo "No errors found"

Verify Hierarchical Names:

# Check schema-level entities
echo "=== SCHEMA ENTITIES ==="
awk -F',' '$2=="SCHEMA" {print "Schema: " $4 " (ID: " $1 ")"}' data/output/entity_extraction.csv | head -5

# Check table-level entities  
echo "=== TABLE ENTITIES ==="
awk -F',' '$2=="TABLE" {print "Table: " $4 "." $5 " (ID: " $1 ")"}' data/output/entity_extraction.csv | head -5

# Check column-level entities
echo "=== COLUMN ENTITIES ==="
awk -F',' '$2=="COLUMN" {print "Column: " $4 "." $5 "." $6 " (ID: " $1 ")"}' data/output/entity_extraction.csv | head -5
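
As a final cross-check, the per-type counts in the file should match the summary the extraction tool printed (COLUMN: 892, TABLE: 343, SCHEMA: 12 in the example above):

# Cross-check: recompute entity counts per type directly from the CSV.
awk -F',' 'NR>1 {c[$2]++} END {for (t in c) print t": "c[t]}' \
    data/output/entity_extraction.csv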

Common Issues & Solutions

Issue: "No entities found"

# Check OpenSearch connection
curl -s "http://localhost:9200/pdc_entities/_search?size=1"

# Check if OpenSearch container is running
docker ps | grep opensearch
Issue: "Empty schema/table/column names"
This is normal for some entity types
Focus on entities with complete hierarchical names for joining
Issue: "Extraction takes too long"
bash
# Extract smaller subset for testing
extract-entities --opensearch-url http://localhost:9200 --output data/output/small_test.csv
# Then manually limit the query size in the extraction tool
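
If you just want to gauge total volume before a full run, a document count against the index is cheap (same pdc_entities index as the connection check above):

# Quick volume check: total document count in the catalog index.
curl -s "http://localhost:9200/pdc_entities/_count"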

Section 3 Checklist

  • Full entity extraction completed successfully

  • CSV file created with the expected structure

  • Entity counts match expected numbers (schemas, tables, columns)

  • Hierarchical names (schema_name, table_name, column_name) populated correctly

  • Sample calculated metrics CSV created for testing

  • Data quality checks completed

  • No critical errors in the extraction process

Ready for Next Section

With entity extraction complete, you now have:

  • A complete entity inventory with IDs

  • Hierarchical names for joining

  • Current Trust Score and Sensitivity values

  • Test data for validation

Next: Section 4 - Pentaho Data Integration (Joining Process)
