# Sensitivity Level & Trust Scores

{% hint style="success" %}
#### Sensitivity Level & Trust Score

This hands-on workshop teaches you how to implement an automated solution for bulk updating Trust Scores and Sensitivity levels across your entire data catalog. You'll learn to extract entity data, calculate metrics, join data using Pentaho, and perform bulk updates efficiently.

By the end of this workshop, you will be able to:

* Extract entity data from your data catalog with hierarchical names
* Join calculated Trust Score and Sensitivity values using Pentaho Data Integration
* Perform bulk updates across all schemas, tables, and columns
* Validate and monitor the update process
* Troubleshoot common issues
{% endhint %}

{% hint style="info" %}
The solution consists of three main components:

1. **Entity Extraction Tool** - Extracts all entities with hierarchical names from OpenSearch
2. **Pentaho Data Integration** - Joins your calculated values with entity data
3. **Bulk Update Tool** - Updates Trust Score and Sensitivity via API or OpenSearch

#### Expected Outcomes

* Automated bulk updates of Trust Score (0-100) and Sensitivity (HIGH/MEDIUM/LOW)
* Support for schema, table, and column level updates
* Validation and error reporting
* Scalable solution for thousands of entities
{% endhint %}

***

{% tabs %}
{% tab title="Entity Extraction" %}
{% hint style="info" %}
#### Entity Extraction

The extraction process retrieves all entities from your data catalog with their hierarchical relationships intact.

**What Gets Extracted**:

* Entity unique identifiers (UUIDs)
* Entity types (SCHEMA/TABLE/COLUMN)
* Hierarchical names for joining
* Current Trust Score and Sensitivity values
* Fully qualified domain names (FQDNs)

**Learning Objectives:**

* Understand the entity extraction process
* Extract all entities with hierarchical names from your data catalog
* Analyze the extracted data structure
* Prepare data for joining with calculated metrics
{% endhint %}

1. Run the extraction script:

```bash
# Change to Key_Metrics directory
cd /home/pdc/Projects/APIs/Key_Metrics

# Extract all entities with full details
extract-entities \
  --opensearch-url http://localhost:9200 \
  --output data/output/entity_extraction.csv \
  --verbose
```

Expected output:

```
🔍 Extracting all entities from data catalog...
📊 Processing batch 1/25 (100 entities)...
📊 Processing batch 2/25 (100 entities)...
...
✅ Extracted 1,247 entities to data/output/entity_extraction.csv
📈 Entity Summary:
- COLUMN: 892
- TABLE: 343
- SCHEMA: 12
```

2. Take a look at the entity\_extraction.csv:

| Column | Description | Example |
| --- | --- | --- |
| entity\_id | Unique identifier | ef60e629-4261-4ce6-8635-961ca4b1b420 |
| entity\_type | Type of entity | SCHEMA, TABLE, COLUMN |
| entity\_name | Entity's actual name | Employee |
| schema\_name | Schema name for joining | HumanResources |
| table\_name | Table name (empty for schemas) | Employee |
| column\_name | Column name (empty for schemas/tables) | FirstName |
| fqdn | Internal fully qualified name | 688cc7b9c5759eae5fdcba07/... |
| fqdn\_display | Human-readable path | mssql:adventureworks2022/... |
| current\_trust\_score | Existing trust score | 48 |
| current\_sensitivity | Existing sensitivity | HIGH |
| new\_trust\_score | For your calculated values | (empty) |
| new\_sensitivity | For your calculated values | (empty) |

3. Run a Data Quality Analysis:

```bash
# Analyze extracted data (tail -n +2 skips the CSV header row)
echo "=== Entity Type Distribution ==="
cut -d',' -f2 data/output/entity_extraction.csv | tail -n +2 | sort | uniq -c

echo -e "\n=== Top 5 Schemas ==="
cut -d',' -f4 data/output/entity_extraction.csv | tail -n +2 | sort | uniq -c | sort -rn | head -5

# "Complete hierarchy" here means schema, table, and column names are all populated
echo -e "\n=== Entities with Complete Hierarchy ==="
awk -F',' 'NR>1 && $4!="" && $5!="" && $6!=""' data/output/entity_extraction.csv | wc -l

echo -e "\n=== Sample Complete Entities ==="
awk -F',' 'NR>1 && $4!="" && $5!="" && $6!=""' data/output/entity_extraction.csv | head -3

# Count by schema
cut -d',' -f4 data/output/entity_extraction.csv | tail -n +2 | sort | uniq -c | head -10

# Find all schemas
awk -F',' '$2=="SCHEMA" {print $4}' data/output/entity_extraction.csv | sort | uniq

# Find all tables in HumanResources schema
awk -F',' '$2=="TABLE" && $4=="HumanResources" {print $5}' data/output/entity_extraction.csv

# Find all columns in Employee table
awk -F',' '$2=="COLUMN" && $4=="HumanResources" && $5=="Employee" {print $6}' data/output/entity_extraction.csv
```

When run without `--verbose`, the extraction prints the same summary and also reminds you to edit the `new_trust_score` and `new_sensitivity` columns with your desired values.
#### Step 2: Analyze Extracted Data

**View Sample Data**

```bash
# Check file was created
ls -la data/output/entity_extraction.csv

# View first 5 rows
head -5 data/output/entity_extraction.csv

# Count total entities
wc -l data/output/entity_extraction.csv
```
**Understanding the CSV Structure**

```csv
entity_id,entity_type,entity_name,schema_name,table_name,column_name,fqdn,fqdn_display,current_trust_score,current_sensitivity,new_trust_score,new_sensitivity
```

Column Explanations:

* entity\_id: Unique ID needed for bulk updates
* entity\_type: SCHEMA/TABLE/COLUMN for filtering
* entity\_name: The actual name of the entity
* schema\_name: Schema name for joining
* table\_name: Table name for joining (empty for schema-level)
* column\_name: Column name for joining (empty for schema/table-level)
* fqdn: Internal fully qualified name
* fqdn\_display: Human-readable path
* current\_trust\_score: Existing trust score (if any)
* current\_sensitivity: Existing sensitivity (if any)
* new\_trust\_score: Empty - for your calculated values
* new\_sensitivity: Empty - for your calculated values
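Because every later step addresses columns by position, it can help to confirm the header before going further. The following is a minimal sketch, not part of the workshop tooling: `check_header` and `EXPECTED_HEADER` are hypothetical names, and the expected value is simply the header line shown above.

```bash
#!/usr/bin/env bash
# Expected header, copied from the CSV structure shown above.
EXPECTED_HEADER='entity_id,entity_type,entity_name,schema_name,table_name,column_name,fqdn,fqdn_display,current_trust_score,current_sensitivity,new_trust_score,new_sensitivity'

# check_header FILE
# Succeeds only if the first line of FILE equals EXPECTED_HEADER.
check_header() {
  local actual
  # Drop a trailing carriage return in case the file has Windows line endings.
  actual=$(head -n 1 "$1" | tr -d '\r')
  if [ "$actual" = "$EXPECTED_HEADER" ]; then
    echo "header OK"
  else
    echo "header mismatch: $actual" >&2
    return 1
  fi
}

# Usage (hypothetical): check_header data/output/entity_extraction.csv
```

If the check fails, the field numbers used by the `cut` and `awk` commands in this section would need adjusting.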
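#### Step 3: Data Quality Checks

**Check for Missing Names**

```bash
# Count entities with missing schema names (field 4 is schema_name)
awk -F',' 'NR>1 && $4=="" {n++} END {print n+0}' data/output/entity_extraction.csv

# View entities with complete hierarchical names (schema, table, and column all populated)
awk -F',' 'NR>1 && $4!="" && $5!="" && $6!=""' data/output/entity_extraction.csv | head -10
```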
#### Step 4: Prepare Sample Data for Testing

**Create Test Dataset**

```bash
# Extract first 50 entities for testing
head -51 data/output/entity_extraction.csv > data/output/test_entities.csv
```

**Create Sample Join Data**

```bash
# Create a sample calculated metrics CSV for testing
cat > data/input/sample_calculated_metrics.csv << 'EOF'
schema_name,table_name,column_name,calculated_trust_score,calculated_sensitivity
HumanResources,,,75,HIGH
HumanResources,Employee,,85,MEDIUM
HumanResources,Employee,FirstName,90,LOW
HumanResources,Employee,LastName,90,LOW
HumanResources,Employee,EmailAddress,70,HIGH
Sales,,,80,MEDIUM
Sales,Customer,,85,LOW
Sales,Customer,CustomerID,95,LOW
EOF
```
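Before feeding a metrics file into the join, its values can be checked against the ranges this workshop states: Trust Score 0-100 and Sensitivity HIGH/MEDIUM/LOW. This is a sketch under assumptions: `validate_metrics` is a hypothetical helper, it expects the five-column layout of the sample file above, it accepts decimal scores, and it splits on commas without handling quoted fields.

```bash
#!/usr/bin/env bash
# validate_metrics FILE
# Prints one line per invalid value and returns 1 if any were found.
# Assumes columns: schema_name,table_name,column_name,<trust score>,<sensitivity>
validate_metrics() {
  awk -F',' '
    NR == 1 { next }                                  # skip the header row
    # Trust score must be a non-negative number no greater than 100
    $4 != "" && ($4 !~ /^[0-9]+(\.[0-9]+)?$/ || $4 + 0 > 100) {
      print "line " NR ": invalid trust score " $4
      bad = 1
    }
    # Sensitivity must be one of the three allowed labels
    $5 != "" && $5 != "HIGH" && $5 != "MEDIUM" && $5 != "LOW" {
      print "line " NR ": invalid sensitivity " $5
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Usage (hypothetical): validate_metrics data/input/sample_calculated_metrics.csv
```

Catching a bad value here is cheaper than finding it after the join, where the same checks appear again on the bulk update file.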
#### Step 5: Validate Extraction Results

**Check Data Completeness**

```bash
# Verify all expected columns are present
head -1 data/output/entity_extraction.csv | tr ',' '\n' | nl

# Check for any parsing errors
grep -n 'ERROR\|WARN' data/output/entity_extraction.csv || echo "No errors found"
```

**Verify Hierarchical Names**

```bash
# Check schema-level entities
echo "=== SCHEMA ENTITIES ==="
awk -F',' '$2=="SCHEMA" {print "Schema: " $4 " (ID: " $1 ")"}' data/output/entity_extraction.csv | head -5

# Check table-level entities
echo "=== TABLE ENTITIES ==="
awk -F',' '$2=="TABLE" {print "Table: " $4 "." $5 " (ID: " $1 ")"}' data/output/entity_extraction.csv | head -5

# Check column-level entities
echo "=== COLUMN ENTITIES ==="
awk -F',' '$2=="COLUMN" {print "Column: " $4 "." $5 "." $6 " (ID: " $1 ")"}' data/output/entity_extraction.csv | head -5
```
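The spot checks above print samples for visual inspection. A stricter variant flags every row whose name columns do not fit its entity type. This sketch assumes the rule implied by the column descriptions (schemas carry only `schema_name`, tables also carry `table_name`, columns carry all three); `check_hierarchy` is a hypothetical helper and other entity types are left unchecked.

```bash
#!/usr/bin/env bash
# check_hierarchy FILE
# Reports rows whose schema/table/column names do not match their entity type.
# Returns 1 if any such row exists. Fields: $2=type, $4=schema, $5=table, $6=column.
check_hierarchy() {
  awk -F',' '
    NR == 1 { next }                                  # skip the header row
    $2 == "SCHEMA" { ok = ($4 != "" && $5 == "" && $6 == "") }
    $2 == "TABLE"  { ok = ($4 != "" && $5 != "" && $6 == "") }
    $2 == "COLUMN" { ok = ($4 != "" && $5 != "" && $6 != "") }
    # Any other entity type is not covered by this rule, so let it pass
    $2 != "SCHEMA" && $2 != "TABLE" && $2 != "COLUMN" { ok = 1 }
    !ok {
      print "line " NR ": " $2 " " $1 " has unexpected name columns"
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Usage (hypothetical): check_hierarchy data/output/entity_extraction.csv
```

Rows reported here would not match any key in the join, so they are worth reviewing before moving on.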
#### Common Issues & Solutions

**Issue: "No entities found"**

```bash
# Check OpenSearch connection
curl -s "http://localhost:9200/pdc_entities/_search?size=1"

# Check if OpenSearch container is running
docker ps | grep opensearch
```

**Issue: "Empty schema/table/column names"**

* This is normal for some entity types
* Focus on entities with complete hierarchical names for joining

**Issue: "Extraction takes too long"**

```bash
# Extract smaller subset for testing
extract-entities --opensearch-url http://localhost:9200 --output data/output/small_test.csv
# Then manually limit the query size in the extraction tool
```
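If a full extraction already exists, a smaller working file can also be cut from it instead of re-running the tool. This is a minimal sketch, assuming the column layout described earlier (`schema_name` in field 4); `subset_by_schema` is a hypothetical helper, not a feature of `extract-entities`.

```bash
#!/usr/bin/env bash
# subset_by_schema SCHEMA FILE
# Prints the header row plus every row whose schema_name (field 4) equals SCHEMA.
subset_by_schema() {
  awk -F',' -v schema="$1" 'NR == 1 || $4 == schema' "$2"
}

# Usage (hypothetical):
#   subset_by_schema HumanResources data/output/entity_extraction.csv > data/output/small_test.csv
```

Working on one schema keeps later dry runs short while still covering schema, table, and column rows.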
#### Section 3 Checklist

* [ ] Full entity extraction completed successfully
* [ ] CSV file created with expected structure
* [ ] Entity counts match expected numbers (schemas, tables, columns)
* [ ] Hierarchical names (schema\_name, table\_name, column\_name) populated correctly
* [ ] Sample calculated metrics CSV created for testing
* [ ] Data quality checks completed
* [ ] No critical errors in extraction process

#### Ready for Next Section

With entity extraction complete, you now have:

* Complete entity inventory with IDs
* Hierarchical names for joining
* Current Trust Score and Sensitivity values
* Test data for validation

Next: Section 4 - Pentaho Data Integration (Joining Process)
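The join that the next section builds in Pentaho can be previewed on the command line as a cross-check. This sketch is an assumption-laden stand-in, not the PDI transformation: `preview_join` is a hypothetical helper that matches on the combined `schema_name`, `table_name`, `column_name` key (empty parts included), expects the metrics file to have the score in field 4 and the sensitivity in field 5, and ignores quoted fields.

```bash
#!/usr/bin/env bash
# preview_join METRICS_CSV ENTITIES_CSV
# Prints entity_id,entity_type,schema,table,column,fqdn_display,score,sensitivity
# for every entity whose schema/table/column key appears in the metrics file.
preview_join() {
  awk -F',' -v OFS=',' '
    FNR == 1 { next }                       # skip the header row of each file
    NR == FNR {                             # first file: calculated metrics
      score[$1 FS $2 FS $3] = $4
      sens[$1 FS $2 FS $3]  = $5
      next
    }
    {                                       # second file: extracted entities
      key = $4 FS $5 FS $6
      if (key in score)
        print $1, $2, $4, $5, $6, $8, score[key], sens[key]
    }
  ' "$1" "$2"
}

# Usage (hypothetical):
#   preview_join data/input/sample_calculated_metrics.csv data/output/test_entities.csv
```

Because a schema-level row has empty table and column names on both sides, one combined key covers the schema, table, and column cases that the transformation handles with three separate lookups.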

{% endtab %}

{% tab title="Data Integration" %}
{% hint style="info" %}
#### Data Integration

**Learning Objectives**

* Set up Pentaho Data Integration transformation
* Join extracted entities with calculated metrics
* Handle different entity types (schema, table, column)
* Output properly formatted CSV for bulk updates
* Validate join results
{% endhint %}

#### Step 1: Prepare Input Files

**Verify Your Input Files**

```bash
# Check entity extraction file
ls -la data/output/entity_extraction.csv
head -3 data/output/entity_extraction.csv

# Check your calculated metrics file (adjust path as needed)
ls -la data/input/your_calculated_metrics.csv
head -3 data/input/your_calculated_metrics.csv
```

**Expected File Structures**

Entity Extraction CSV:

```csv
entity_id,entity_type,entity_name,schema_name,table_name,column_name,fqdn,fqdn_display,current_trust_score,current_sensitivity,new_trust_score,new_sensitivity
ef60e629-4261-4ce6-8635-961ca4b1b420,SCHEMA,HumanResources,HumanResources,,,688cc7b9c5759eae5fdcba07/AdventureWorks2022/HumanResources,mssql:adventureworks2022/AdventureWorks2022/HumanResources,48,HIGH,,
```

Your Calculated Metrics CSV (example format):

```csv
schema_name,table_name,column_name,trust_score,sensitivity
HumanResources,,,75,HIGH
HumanResources,Employee,,85,MEDIUM
HumanResources,Employee,FirstName,90,LOW
```

#### Step 2: Create PDI Transformation

**Open Pentaho Data Integration**

1. Launch PDI/Spoon
2. Create New Transformation
3. Save as: trust\_sensitivity\_join.ktr

**Step 2.1: Add CSV File Input for Entity Extraction**

1. Drag "CSV file input" step to canvas
2. Configure:
   * Step name: Entity\_Extraction\_Input
   * Filename: /home/pdc/Projects/APIs/Key\_Metrics/data/output/entity\_extraction.csv
   * Delimiter: ,
   * Enclosure: "
   * Header: Y (Yes)
3. Get Fields to auto-detect structure
4. Preview to verify data loads correctly

**Step 2.2: Add CSV File Input for Calculated Metrics**

1. Drag another "CSV file input" step
2. Configure:
   * Step name: Calculated\_Metrics\_Input
   * Filename: /home/pdc/Projects/APIs/Key\_Metrics/data/input/your\_calculated\_metrics.csv
   * Delimiter: ,
   * Header: Y (Yes)
3. Get Fields and Preview

#### Step 3: Configure Joins by Entity Type

**Understanding Join Logic**

* Schema-level: Join on schema\_name only
* Table-level: Join on schema\_name + table\_name
* Column-level: Join on schema\_name + table\_name + column\_name

**Step 3.1: Add Filter Steps for Entity Types**

Add one "Filter rows" step per entity type, each connected from Entity\_Extraction\_Input:

* Filter for Schemas: Step name Schema\_Filter, Condition: entity\_type = "SCHEMA"
* Filter for Tables: Step name Table\_Filter, Condition: entity\_type = "TABLE"
* Filter for Columns: Step name Column\_Filter, Condition: entity\_type = "COLUMN"

**Step 3.2: Create Separate Calculated Metrics Streams**

Add three more "Filter rows" steps, each connected from Calculated\_Metrics\_Input:

* Schema-level Metrics: Step name Schema\_Metrics\_Filter, Condition: table\_name IS NULL AND column\_name IS NULL
* Table-level Metrics: Step name Table\_Metrics\_Filter, Condition: table\_name IS NOT NULL AND column\_name IS NULL
* Column-level Metrics: Step name Column\_Metrics\_Filter, Condition: column\_name IS NOT NULL

#### Step 4: Configure Stream Lookups

**Step 4.1: Schema-level Join**

1. Drag "Stream lookup" step
2. Connect main stream: Schema\_Filter → Schema\_Lookup
3. Connect lookup stream: Schema\_Metrics\_Filter → Schema\_Lookup
4. Configure lookup:
   * Lookup step: Schema\_Metrics\_Filter
   * Keys: schema\_name = schema\_name
   * Retrieve fields: trust\_score, sensitivity
   * Rename retrieved fields: new\_trust\_score, new\_sensitivity

**Step 4.2: Table-level Join**

1. Drag "Stream lookup" step
2. Configure:
   * Main stream: Table\_Filter
   * Lookup stream: Table\_Metrics\_Filter
   * Keys: schema\_name = schema\_name, table\_name = table\_name
   * Retrieve: trust\_score → new\_trust\_score, sensitivity → new\_sensitivity

**Step 4.3: Column-level Join**

1. Drag "Stream lookup" step
2. Configure:
   * Main stream: Column\_Filter
   * Lookup stream: Column\_Metrics\_Filter
   * Keys: schema\_name = schema\_name, table\_name = table\_name, column\_name = column\_name
   * Retrieve: trust\_score → new\_trust\_score, sensitivity → new\_sensitivity

#### Step 5: Combine Results

**Step 5.1: Union All Streams**

1. Drag "Append streams" step
2. Connect all lookup results:
   * Schema\_Lookup → Append\_All
   * Table\_Lookup → Append\_All
   * Column\_Lookup → Append\_All

"Append streams" takes two inputs (head and tail), so to combine three lookups either chain two "Append streams" steps or send all three hops into a single "Dummy (do nothing)" step named Append\_All.

**Step 5.2: Clean and Format Output**

1. Drag "Select values" step
2. Connect: Append\_All → Select\_Output\_Fields
3. Select only needed fields:
   * entity\_id
   * entity\_type
   * schema\_name
   * table\_name
   * column\_name
   * fqdn\_display (rename to fqdn)
   * new\_trust\_score (rename to trust\_score)
   * new\_sensitivity (rename to sensitivity)

**Step 5.3: Filter Non-null Updates**

1. Drag "Filter rows" step
2. Connect: Select\_Output\_Fields → Filter\_Valid\_Updates
3. Condition: trust\_score IS NOT NULL OR sensitivity IS NOT NULL

#### Step 6: Output Results

**Step 6.1: Add CSV Output**

1. Drag "Text file output" step
2. Connect: Filter\_Valid\_Updates → CSV\_Output
3. Configure:
   * Filename: /home/pdc/Projects/APIs/Key\_Metrics/data/output/bulk\_update\_ready.csv
   * Extension: csv
   * Separator: ,
   * Enclosure: "
   * Header: Y (Yes)
   * Format: Unix

**Expected Output Format**

```csv
entity_id,entity_type,schema_name,table_name,column_name,fqdn,trust_score,sensitivity
ef60e629-4261-4ce6-8635-961ca4b1b420,SCHEMA,HumanResources,,,mssql:adventureworks2022/AdventureWorks2022/HumanResources,75,HIGH
table-id-123,TABLE,HumanResources,Employee,,mssql:adventureworks2022/AdventureWorks2022/HumanResources/Employee,85,MEDIUM
```

#### ✅ Step 7: Test and Validate

**Step 7.1: Run Transformation**

1. Save transformation
2. Click "Run" button
3. Monitor execution logs
4. Check for errors

**Step 7.2: Validate Output**

```bash
# Check output file was created
ls -la data/output/bulk_update_ready.csv

# Count joined records
wc -l data/output/bulk_update_ready.csv

# Preview results
head -10 data/output/bulk_update_ready.csv

# Check for entity types
cut -d',' -f2 data/output/bulk_update_ready.csv | sort | uniq -c
```

**Step 7.3: Quality Checks**

```bash
# Verify all records have entity_id (NR>1 skips the header row)
awk -F',' 'NR>1 && $1=="" {print "Missing entity_id on line " NR}' data/output/bulk_update_ready.csv

# Check trust_score range (should be 0-100)
awk -F',' 'NR>1 && $7!="" && ($7<0 || $7>100) {print "Invalid trust_score: " $7 " on line " NR}' data/output/bulk_update_ready.csv

# Check sensitivity values (should be HIGH/MEDIUM/LOW)
awk -F',' 'NR>1 && $8!="" && $8!="HIGH" && $8!="MEDIUM" && $8!="LOW" {print "Invalid sensitivity: " $8 " on line " NR}' data/output/bulk_update_ready.csv
```

#### Common Issues & Solutions

**Issue: "No join results"**

* Check field names match exactly between files
* Verify data types (text vs numeric)
* Check for extra spaces in field values
* Use "Preview" extensively to debug

**Issue: "Duplicate records"**

* Check join keys are unique in lookup stream
* Add "Sort rows" before lookup if needed
* Use "Group by" to deduplicate if necessary

**Issue: "Missing entity\_ids"**

* Verify entity extraction completed successfully
* Check filter conditions aren't too restrictive
* Ensure entity\_type values match exactly

#### Section 4 Checklist

* [ ] PDI transformation created and saved
* [ ] Both input files load correctly
* [ ] Entity type filters configured
* [ ] Stream lookups configured for all three levels
* [ ] Join keys properly mapped
* [ ] Output fields selected and renamed
* [ ] CSV output configured with correct path
* [ ] Transformation runs without errors
* [ ] Output file created with expected format
* [ ] Quality checks passed
* [ ] Join statistics look reasonable

#### Ready for Next Section

You now have a properly formatted CSV file ready for bulk updates containing:

* Entity IDs for API calls
* Calculated Trust Scores and Sensitivity values
* Proper field names and formats

Next: Section 5 - Bulk Updates (API and OpenSearch)

{% endtab %}

{% tab title="Bulk Updates" %}
{% hint style="info" %}
#### Bulk Updates (API and OpenSearch)

**Learning Objectives**

* Understand the two update methods (API vs OpenSearch)
* Configure authentication and connection settings
* Perform bulk updates with validation
* Monitor update progress and handle errors
* Verify updates were applied successfully
{% endhint %}
#### ⚖️ Step 1: Choose Update Method

**API Method (Recommended)**

**Pros:**

* Uses official data catalog API
* Respects business rules and validation
* Maintains audit trails
* Safer for production use

**Cons:**

* Slower for large datasets (rate limited)
* Requires valid API authentication

**OpenSearch Direct Method**

**Pros:**

* Faster bulk operations
* No API rate limits
* Direct database updates

**Cons:**

* Bypasses business logic
* Requires OpenSearch access
* Less audit trail
* Higher risk if misconfigured

#### Step 2: Configure Authentication

**API Authentication Setup**

```bash
# Verify your API token is configured
grep auth_token config.py

# Test API connectivity
curl -H "Authorization: Bearer YOUR_TOKEN_HERE" \
  -H "Content-Type: application/json" \
  "https://pdc.pentaho.lab/api/v1/entities/ef60e629-4261-4ce6-8635-961ca4b1b420"
```

**OpenSearch Authentication Setup**

```bash
# Test OpenSearch connectivity
curl -s "http://localhost:9200/_cluster/health"

# Test entity index access
curl -s "http://localhost:9200/pdc_entities/_search?size=1"
```

#### Step 3: Dry Run Testing

**Step 3.1: Prepare Test Data**

```bash
# Create small test dataset (first 10 records)
head -11 data/output/bulk_update_ready.csv > data/output/test_update.csv

# Verify test data
cat data/output/test_update.csv
```

**Step 3.2: API Dry Run**

```bash
cd /home/pdc/Projects/APIs/Key_Metrics

# Run API dry run (no actual updates)
bulk-update-api \
  --base-url https://pdc.pentaho.lab \
  --auth-token YOUR_TOKEN_HERE \
  --csv-file data/output/test_update.csv \
  --dry-run
```

**Expected Dry Run Output**

```
Starting bulk update process...
Loaded 10 entities from CSV
DRY RUN MODE - No actual updates will be made
✅ Entity ef60e629-4261-4ce6-8635-961ca4b1b420 (SCHEMA): Would update trust_score=75, sensitivity=HIGH
✅ Entity table-id-123 (TABLE): Would update trust_score=85, sensitivity=MEDIUM
⚠️ Entity column-id-456 (COLUMN): Current values match - no update needed

Summary:
- Total entities: 10
- Would update: 8
- Skipped (no changes): 2
- Errors: 0
```

**Step 3.3: OpenSearch Dry Run**

```bash
# Run OpenSearch dry run
bulk-update-opensearch \
  --opensearch-url http://localhost:9200 \
  --csv-file data/output/test_update.csv \
  --dry-run
```

#### Step 4: Production Bulk Updates

**Step 4.1: API Bulk Update (Recommended)**

```bash
# Full production update via API
bulk-update-api \
  --base-url https://pdc.pentaho.lab \
  --auth-token YOUR_TOKEN_HERE \
  --csv-file data/output/bulk_update_ready.csv \
  --batch-size 50 \
  --delay 1
```

**API Update Parameters**

* `--batch-size 50`: Process 50 entities at a time
* `--delay 1`: Wait 1 second between batches (rate limiting)
* `--max-retries 3`: Retry failed requests up to 3 times
* `--timeout 30`: Request timeout in seconds

**Step 4.2: OpenSearch Bulk Update (Advanced)**

```bash
# Direct OpenSearch bulk update
bulk-update-opensearch \
  --opensearch-url http://localhost:9200 \
  --csv-file data/output/bulk_update_ready.csv \
  --batch-size 100
```

**OpenSearch Parameters**

* `--batch-size 100`: Larger batches for faster processing
* `--refresh wait_for`: Wait for index refresh after updates

#### Step 5: Monitor Update Progress

**Real-time Monitoring**

The update tools provide real-time progress:

```
Starting bulk update process...
Loaded 1,247 entities from CSV
Batch 1/25 (50 entities):
✅ ef60e629-4261-4ce6-8635-961ca4b1b420: Updated trust_score=75, sensitivity=HIGH
✅ table-id-123: Updated trust_score=85, sensitivity=MEDIUM
❌ column-id-789: Error - Entity not found
⚠️ schema-id-456: Skipped - No changes needed
Progress: [████████████████████████████████████████] 100% (1,247/1,247)

Final Summary:
- Total entities: 1,247
- Successfully updated: 1,198
- Skipped (no changes): 35
- Errors: 14
- Duration: 4m 23s
```

**Log Files**

```bash
# Check detailed logs
ls -la logs/
tail -f logs/bulk_update_$(date +%Y%m%d).log
```

#### Step 6: Verify Updates

**Step 6.1: API Verification**

```bash
# Verify specific entity was updated
curl -H "Authorization: Bearer YOUR_TOKEN_HERE" \
  "https://pdc.pentaho.lab/api/v1/entities/ef60e629-4261-4ce6-8635-961ca4b1b420" \
  | jq '.attributes.trustScore, .attributes.features.sensitivity'
```

**Step 6.2: OpenSearch Verification**

```bash
# Check entity in OpenSearch
curl -s "http://localhost:9200/pdc_entities/_doc/ef60e629-4261-4ce6-8635-961ca4b1b420" \
  | jq '._source.attributes.trustScore, ._source.attributes.features.sensitivity'
```

**Step 6.3: Bulk Verification Script**

```bash
# Create verification script
cat > verify_updates.sh << 'EOF'
#!/bin/bash
echo "Verifying random sample of updates..."

# Get 10 random entity IDs from update file
tail -n +2 data/output/bulk_update_ready.csv | \
  shuf -n 10 | \
  cut -d',' -f1 | \
  while read -r entity_id; do
    echo "Checking entity: $entity_id"
    curl -s -H "Authorization: Bearer YOUR_TOKEN_HERE" \
      "https://pdc.pentaho.lab/api/v1/entities/$entity_id" | \
      jq -r '.attributes.trustScore // "null", .attributes.features.sensitivity // "null"'
  done
EOF

chmod +x verify_updates.sh
./verify_updates.sh
```

#### Step 7: Error Handling

**Common Errors and Solutions**

**Error: "Entity not found"**

```bash
# Check if entity ID exists in OpenSearch
curl -s "http://localhost:9200/pdc_entities/_doc/ENTITY_ID_HERE"
# Solution: Remove invalid entity IDs from CSV
```

**Error: "Authentication failed"**

```bash
# Test token validity
curl -H "Authorization: Bearer YOUR_TOKEN_HERE" \
  "https://pdc.pentaho.lab/api/v1/user/profile"
# Solution: Generate new API token
```

**Error: "Rate limit exceeded"**

```bash
# Solution: Increase delay between requests
bulk-update-api --delay 2 --batch-size 25 ...
```

**Error: "Invalid trust score value"**

```bash
# Check for invalid values in CSV (NR>1 skips the header row)
awk -F',' 'NR>1 && $7!="" && ($7<0 || $7>100)' data/output/bulk_update_ready.csv
# Solution: Fix values in CSV and re-run
```

**Error Recovery**

```bash
# Generate failed entities report
grep "❌" logs/bulk_update_$(date +%Y%m%d).log > failed_entities.txt

# Create retry CSV from failed entities
# (Manual process - extract entity IDs and create new CSV)
```

#### Step 8: Performance Optimization

**For Large Datasets (>10,000 entities)**

API Method Optimization:

```bash
# Use smaller batches with longer delays
bulk-update-api \
  --batch-size 25 \
  --delay 2 \
  --max-retries 5 \
  --timeout 60
```

OpenSearch Method Optimization:

```bash
# Use larger batches for faster processing
bulk-update-opensearch \
  --batch-size 500 \
  --refresh false
```

**Parallel Processing (Advanced)**

```bash
# Split the data rows into chunks, then give every chunk its own header row
header=$(head -1 data/output/bulk_update_ready.csv)
tail -n +2 data/output/bulk_update_ready.csv | split -l 1000 - chunk_
for chunk in chunk_*; do
  { echo "$header"; cat "$chunk"; } > "$chunk.csv" && rm "$chunk"
done

# Process chunks in parallel (be careful with rate limits)
for chunk in chunk_*.csv; do
  bulk-update-api --csv-file "$chunk" --batch-size 25 --delay 3 &
done
wait
```

#### Section 5 Checklist

* [ ] Update method chosen (API or OpenSearch)
* [ ] Authentication configured and tested
* [ ] Dry run completed successfully on test data
* [ ] Production bulk update executed
* [ ] Update progress monitored
* [ ] Error handling implemented
* [ ] Sample of updates verified
* [ ] Performance optimized for dataset size
* [ ] Logs and reports generated
* [ ] Failed entities identified for retry

#### Ready for Next Section

Your bulk updates are now complete! You should have:

* Successfully updated Trust Scores and Sensitivity levels
* Detailed logs of the update process
* Verification of applied changes
* Error reports for any failed updates

Next: Section 6 - Testing & Validation (Comprehensive Verification)

{% endtab %}

{% tab title="Testing & Validation" %}
{% hint style="info" %}
#### Testing & Validation

**Learning Objectives**

* Perform comprehensive validation of bulk updates
* Test data catalog UI to confirm changes are visible
* Validate data integrity and consistency
* Create automated validation scripts
* Document test results and generate reports
{% endhint %}

#### Step 1: Pre-Update Baseline Capture

**Create Baseline Report (if not done before updates)**

```bash
# Extract current state for comparison
extract-entities --output data/validation/baseline_before_update.csv

# Count entities by current values
echo "=== BASELINE TRUST SCORE DISTRIBUTION ==="
awk -F',' 'NR>1 && $9!="" {print $9}' data/validation/baseline_before_update.csv | sort -n | uniq -c

echo "=== BASELINE SENSITIVITY DISTRIBUTION ==="
awk -F',' 'NR>1 && $10!="" {print $10}' data/validation/baseline_before_update.csv | sort | uniq -c
```

#### ✅ Step 2: Post-Update Validation

**Step 2.1: Extract Current State**

```bash
# Extract entities after updates
extract-entities --output data/validation/state_after_update.csv

# Compare record counts
echo "Before update: $(wc -l < data/validation/baseline_before_update.csv) entities"
echo "After update: $(wc -l < data/validation/state_after_update.csv) entities"
```

**Step 2.2: Validate Updated Entities**

```bash
# Create validation script
cat > validate_updates.sh << 'EOF'
#!/bin/bash
echo "Validating bulk updates..."

# Count successful updates (both inputs sorted on entity_id, as join requires)
UPDATED_COUNT=$(join -t',' -1 1 -2 1 \
  <(tail -n +2 data/output/bulk_update_ready.csv | sort -t',' -k1,1) \
  <(tail -n +2 data/validation/state_after_update.csv | sort -t',' -k1,1) | \
  wc -l)
echo "✅ Validated $UPDATED_COUNT entity updates"

# Check for mismatched values
echo "Checking for value mismatches..."
join -t',' -1 1 -2 1 \
  <(tail -n +2 data/output/bulk_update_ready.csv | cut -d',' -f1,7,8 | sort -t',' -k1,1) \
  <(tail -n +2 data/validation/state_after_update.csv | cut -d',' -f1,9,10 | sort -t',' -k1,1) | \
  awk -F',' '$2!=$4 || $3!=$5 {print "❌ Mismatch for entity " $1 ": Expected " $2 "," $3 " Got " $4 "," $5}' | \
  head -10

echo "✅ Validation complete"
EOF

chmod +x validate_updates.sh
./validate_updates.sh
```

#### Step 3: Data Catalog UI Testing

**Step 3.1: Manual UI Verification**

1. Open Data Catalog: https://pdc.pentaho.lab
2. Navigate to a test entity (use entity from your update CSV)
3. Check Key Metrics section:
   * Trust Score should match your CSV value
   * Sensitivity should match your CSV value
4. Test different entity types:
   * Schema-level entity
   * Table-level entity
   * Column-level entity

**Step 3.2: Browser-based Validation Script**

```bash
# Create UI validation checklist
cat > ui_validation_checklist.md << 'EOF'
# UI Validation Checklist

## Test Entities (Sample from your CSV)
- [ ] Schema: HumanResources (Expected: Trust=75, Sensitivity=HIGH)
- [ ] Table: HumanResources.Employee (Expected: Trust=85, Sensitivity=MEDIUM)
- [ ] Column: HumanResources.Employee.FirstName (Expected: Trust=90, Sensitivity=LOW)

## Validation Steps for Each Entity
1. [ ] Navigate to entity page
2. [ ] Scroll to "Key Metrics" section
3. [ ] Verify Trust Score displays correctly
4. [ ] Verify Sensitivity level displays correctly
5. [ ] Check that values are not cached/outdated
6. [ ] Verify no UI errors or broken displays

## Cross-browser Testing
- [ ] Chrome/Chromium
- [ ] Firefox
- [ ] Edge (if available)

## Notes:
- Record any discrepancies
- Note any UI performance issues
- Check for proper formatting of values
EOF

echo "UI validation checklist created: ui_validation_checklist.md"
```

#### Step 4: Data Integrity Validation

**Step 4.1: Trust Score Range Validation**

```bash
# Check all trust scores are in valid range (0-100)
echo "Validating Trust Score ranges..."
awk -F',' 'NR>1 && $9!="" && ($9<0 || $9>100) {
  print "❌ Invalid Trust Score: " $9 " for entity " $1
}' data/validation/state_after_update.csv

# Count entities by trust score ranges
echo "Trust Score Distribution:"
awk -F',' 'NR>1 && $9!="" {
  if ($9 >= 0 && $9 <= 25) print "LOW"
  else if ($9 <= 50) print "MEDIUM-LOW"
  else if ($9 <= 75) print "MEDIUM-HIGH"
  else print "HIGH"
}' data/validation/state_after_update.csv | sort | uniq -c
```

**Step 4.2: Sensitivity Value Validation**

```bash
# Check all sensitivity values are valid
echo "Validating Sensitivity values..."
awk -F',' 'NR>1 && $10!="" && $10!="HIGH" && $10!="MEDIUM" && $10!="LOW" {
  print "❌ Invalid Sensitivity: " $10 " for entity " $1
}' data/validation/state_after_update.csv

# Count sensitivity distribution
echo "Sensitivity Distribution:"
awk -F',' 'NR>1 && $10!="" {print $10}' data/validation/state_after_update.csv | sort | uniq -c
```

**Step 4.3: Entity Type Distribution**

```bash
# Verify updates across all entity types
echo "Updates by Entity Type:"
join -t',' -1 1 -2 1 \
  <(tail -n +2 data/output/bulk_update_ready.csv | cut -d',' -f1,2 | sort -t',' -k1,1) \
  <(tail -n +2 data/validation/state_after_update.csv | cut -d',' -f1,2 | sort -t',' -k1,1) | \
  cut -d',' -f2 | sort | uniq -c
```

#### Step 5: API Consistency Testing

**Step 5.1: API vs OpenSearch Consistency**

```bash
# Create API consistency test
cat > test_api_consistency.sh << 'EOF'
#!/bin/bash
echo "Testing API vs OpenSearch consistency..."

# Test 5 random entities
tail -n +2 data/validation/state_after_update.csv | shuf -n 5 | cut -d',' -f1 | while read -r entity_id; do
  echo "Testing entity: $entity_id"

  # Get from API
  API_RESULT=$(curl -s -H "Authorization: Bearer YOUR_TOKEN_HERE" \
    "https://pdc.pentaho.lab/api/v1/entities/$entity_id" | \
    jq -r '[.attributes.trustScore // "null", .attributes.features.sensitivity // "null"] | @csv')

  # Get from OpenSearch
  OS_RESULT=$(curl -s "http://localhost:9200/pdc_entities/_doc/$entity_id" | \
    jq -r '[._source.attributes.trustScore // "null", ._source.attributes.features.sensitivity // "null"] | @csv')

  if [ "$API_RESULT" = "$OS_RESULT" ]; then
    echo "✅ Consistent: $API_RESULT"
  else
    echo "❌ Inconsistent - API: $API_RESULT, OpenSearch: $OS_RESULT"
  fi
done
EOF

chmod +x test_api_consistency.sh
# Update YOUR_TOKEN_HERE before running
# ./test_api_consistency.sh
```

#### Step 6: Performance Impact Testing

**Step 6.1: Query Performance Test**

```bash
# Test search performance after updates
echo "Testing search performance..."
# Time entity searches
# -G with --data-urlencode encodes the spaces and brackets in the range query
time curl -s -G "http://localhost:9200/pdc_entities/_search" \
    --data-urlencode "q=attributes.trustScore:[70 TO 100]" \
    --data-urlencode "size=100" > /dev/null
time curl -s -G "http://localhost:9200/pdc_entities/_search" \
    --data-urlencode "q=attributes.features.sensitivity:HIGH" \
    --data-urlencode "size=100" > /dev/null

# Test API response times
time curl -s -H "Authorization: Bearer YOUR_TOKEN_HERE" \
    "https://pdc.pentaho.lab/api/v1/entities?limit=100" > /dev/null
```

**Step 6.2: Index Health Check**

```bash
# Check OpenSearch index health
curl -s "http://localhost:9200/_cluster/health/pdc_entities?pretty"

# Check index statistics
curl -s "http://localhost:9200/pdc_entities/_stats?pretty" | jq '.indices.pdc_entities.total.docs'
```

#### Step 7: Generate Validation Report

**Step 7.1: Comprehensive Validation Report**

```bash
# Create comprehensive validation report
cat > generate_validation_report.sh << 'EOF'
#!/bin/bash

REPORT_FILE="data/validation/validation_report_$(date +%Y%m%d_%H%M%S).md"

# Unquoted delimiter so $(date) and the entity count expand when the report is written
cat > "$REPORT_FILE" << REPORT
# Trust Score & Sensitivity Update Validation Report

## Executive Summary
- **Update Date**: $(date)
- **Total Entities Processed**: $(tail -n +2 data/output/bulk_update_ready.csv | wc -l)
- **Validation Status**: ✅ PASSED / ❌ FAILED

## Update Statistics
REPORT

echo "### Entities by Type" >> "$REPORT_FILE"
awk -F',' 'NR>1 {print $2}' data/output/bulk_update_ready.csv | sort | uniq -c | \
    awk '{print "- " $2 ": " $1 " entities"}' >> "$REPORT_FILE"

echo "" >> "$REPORT_FILE"
echo "### Trust Score Distribution" >> "$REPORT_FILE"
awk -F',' 'NR>1 && $9!="" {
    if ($9 <= 25) print "LOW (0-25)"
    else if ($9 <= 50) print "MEDIUM-LOW (26-50)"
    else if ($9 <= 75) print "MEDIUM-HIGH (51-75)"
    else print "HIGH (76-100)"
}' data/validation/state_after_update.csv | sort | uniq -c | \
    awk '{print "- " $2 " " $3 ": " $1 " entities"}' >> "$REPORT_FILE"

echo "" >> "$REPORT_FILE"
echo "### Sensitivity Distribution" >> "$REPORT_FILE"
awk -F',' 'NR>1 && $10!="" {print $10}' data/validation/state_after_update.csv | sort | uniq -c | \
    awk '{print "- " $2 ": " $1 " entities"}' >> "$REPORT_FILE"

echo "" >> "$REPORT_FILE"
echo "## Validation Results" >> "$REPORT_FILE"
echo "- [ ] All trust scores in valid range (0-100)" >> "$REPORT_FILE"
echo "- [ ] All sensitivity values valid (HIGH/MEDIUM/LOW)" >> "$REPORT_FILE"
echo "- [ ] UI displays updated values correctly" >> "$REPORT_FILE"
echo "- [ ] API and OpenSearch data consistent" >> "$REPORT_FILE"
echo "- [ ] No performance degradation observed" >> "$REPORT_FILE"

echo "Validation report generated: $REPORT_FILE"
EOF

chmod +x generate_validation_report.sh
./generate_validation_report.sh
```

#### Step 8: Rollback Preparation

**Step 8.1: Create Rollback Data**

```bash
# Prepare rollback CSV if needed
echo "Preparing rollback data..."

# Extract original values for updated entities
# (both inputs are sorted on the entity ID column so join can match them)
join -t',' -1 1 -2 1 \
    <(tail -n +2 data/output/bulk_update_ready.csv | cut -d',' -f1 | sort) \
    <(tail -n +2 data/validation/baseline_before_update.csv | sort -t',' -k1,1) | \
    awk -F',' '{print $1 "," $2 "," $3 "," $4 "," $5 "," $6 "," $9 "," $10}' \
    > data/validation/rollback_data.csv

# Add header
sed -i '1i entity_id,entity_type,schema_name,table_name,column_name,fqdn,trust_score,sensitivity' \
    data/validation/rollback_data.csv

echo "Rollback data prepared: data/validation/rollback_data.csv"
```

#### Section 6 Checklist

* Baseline data captured before updates
* Post-update state extracted and compared
* Updated entities validated against expected values
* Data catalog UI manually tested for sample entities
* Trust score ranges validated (0-100)
* Sensitivity values validated (HIGH/MEDIUM/LOW)
* Entity type distribution verified
* API and OpenSearch consistency tested
* Search performance impact assessed
* OpenSearch index health verified
* Comprehensive validation report generated
* Rollback data prepared (if needed)
* All validation tests passed

#### Ready for Next Section

Your validation is complete! You should have:

* Confirmed all updates were applied correctly
* Verified data integrity and consistency
* Tested UI functionality
* Generated comprehensive reports
* Prepared for any necessary rollbacks

Next: Section 7 - Troubleshooting (Common Issues and Solutions)
{% endtab %}

{% tab title="Untitled" %}
## Section 7: Troubleshooting

#### Learning Objectives

* Identify and resolve common issues during implementation
* Understand error messages and their solutions
* Implement monitoring and alerting
* Create maintenance procedures
* Establish best practices for ongoing operations

#### Common Issues & Solutions

**Environment Setup Issues**

**Issue: "Command not found: extract-entities"**

```bash
# Diagnosis
which extract-entities
echo $PATH

# Solutions
# 1. Reinstall package
pip uninstall key-metrics-updater
pip install -e .

# 2. Check if installed in user space
pip show key-metrics-updater

# 3. Use full path if needed
python -m key_metrics.cli extract-entities --help
```

**Issue: "Permission denied accessing files"**

```bash
# Diagnosis
ls -la /home/pdc/Projects/APIs/Key_Metrics/data/

# Solutions
sudo chown -R "$USER:$USER" /home/pdc/Projects/APIs/Key_Metrics/
chmod -R 755 /home/pdc/Projects/APIs/Key_Metrics/
mkdir -p data/{input,output,validation,backup}
```

**Issue: "Python module import errors"**

```bash
# Diagnosis
python -c "import requests, pandas, urllib3"

# Solutions
pip install requests pandas urllib3
# Or if using conda:
conda install requests pandas urllib3
```

**Entity Extraction Issues**

**Issue: "No entities found in OpenSearch"**

```bash
# Diagnosis
curl -s "http://localhost:9200/_cat/indices?v" | grep pdc_entities
curl -s "http://localhost:9200/pdc_entities/_count"

# Solutions
# 1. Check OpenSearch container
docker ps | grep opensearch
docker start pdc-opensearch-1

# 2. Verify index name
curl -s "http://localhost:9200/_cat/indices?v"

# 3. Check network connectivity
docker network ls
docker inspect pdc-opensearch-1 | grep NetworkMode
```

**Issue: "Extraction returns empty schema/table/column names"**

```bash
# Diagnosis - Check FQDN parsing
head -5 data/output/entity_extraction.csv | cut -d',' -f7

# This is normal for some entities
# Focus on entities with complete hierarchical names
grep -v ',,,,' data/output/entity_extraction.csv | head -5
```

**Issue: "Extraction is very slow"**

```bash
# Solutions
# 1. Limit extraction for testing
extract-entities --output test.csv --max-entities 100

# 2. Use scroll API more efficiently
# (Modify extract.py to use smaller scroll sizes)

# 3. Extract specific entity types only
# (Modify extract.py to filter by entity_type)
```

**Pentaho Data Integration Issues**

**Issue: "CSV input step shows no data"**

Diagnosis in PDI:

1. Check that the file path is absolute
2. Verify delimiter and enclosure settings
3. Use the "Preview" button to test
4. Check file permissions

Solutions:

* Use full absolute paths: `/home/pdc/Projects/APIs/Key_Metrics/data/...`
* Verify CSV format with: `head -3 your_file.csv`
* Check for BOM or encoding issues: `file your_file.csv`

**Issue: "Join produces no results"**

Diagnosis:

1. Preview both input streams before the join
2. Check that field names match exactly
3. Verify data types are compatible
4. Look for extra spaces or special characters

Solutions:

* Add a "Trim spaces" step before the join
* Use "String operations" to clean field values
* Add debug output steps to see intermediate data

**Issue: "Duplicate records in output"**

Diagnosis:

* Check if the lookup stream has duplicate keys
* Verify join conditions are correct

Solutions:

* Add a "Sort rows" step before the lookup
* Use "Group by" to deduplicate the lookup stream
* Add a "Unique rows" step after the join

**Bulk Update Issues**

**Issue: "Authentication failed - 401 Unauthorized"**

```bash
# Diagnosis
curl -H "Authorization: Bearer YOUR_TOKEN" \
    "https://pdc.pentaho.lab/api/v1/user/profile"

# Solutions
# 1. Generate new API token
# 2. Check token format (no extra spaces/characters)
# 3. Verify token permissions
# 4. Check if token expired
```

**Issue: "Rate limit exceeded - 429 Too Many Requests"**

```bash
# Solutions
# 1. Increase delay between requests
bulk-update-api --delay 3 --batch-size 20

# 2. Use smaller batch sizes
bulk-update-api --batch-size 10

# 3. Implement exponential backoff
# (Modify api_updater.py to add retry logic)
```

**Issue: "Entity not found - 404 errors"**

```bash
# Diagnosis
# Check if entity IDs are valid
curl -s "http://localhost:9200/pdc_entities/_doc/ENTITY_ID_HERE"

# Solutions
# 1. Re-extract entities to get current IDs
extract-entities --output fresh_entities.csv

# 2. Filter out invalid IDs before update
grep -v "404" logs/bulk_update.log | grep "entity_id" > valid_entities.txt
```

**Issue: "Trust Score update fails but Sensitivity succeeds"**

```bash
# This is a known issue - Trust Score has different storage/permissions

# Solutions
# 1. Update Trust Score and Sensitivity separately
bulk-update-api --csv-file trust_scores_only.csv
bulk-update-api --csv-file sensitivity_only.csv

# 2. Use OpenSearch direct method for Trust Score
bulk-update-opensearch --csv-file trust_scores.csv
```

**UI Display Issues**

**Issue: "Updated values not showing in UI"**

Diagnosis:

1. Check browser cache (Ctrl+F5 to hard refresh)
2. Verify updates were actually applied (check API/OpenSearch)
3. Check if the UI is caching data

Solutions:

* Clear browser cache completely
* Try incognito/private browsing mode
* Wait 5-10 minutes for cache refresh
* Check if CDN caching is involved

**Issue: "Values show as 'null' or empty in UI"**

```bash
# Diagnosis
curl -s -H "Authorization: Bearer YOUR_TOKEN" \
    "https://pdc.pentaho.lab/api/v1/entities/ENTITY_ID" | \
    jq '.attributes.trustScore, .attributes.features.sensitivity'
```

Solutions:

* Verify correct field paths in the update payload
* Check if the entity type supports these attributes
* Ensure values are not being overwritten by business rules

#### Diagnostic Tools

**Create Diagnostic Script**

```bash
cat > diagnose_system.sh << 'EOF'
#!/bin/bash
# Make a pipeline fail when curl fails, so the "not accessible" messages can trigger
set -o pipefail

echo "System Diagnostics for Key Metrics Updater"
echo "============================================="

echo "Package Installation:"
pip show key-metrics-updater || echo "❌ Package not installed"

echo "Python Environment:"
python --version
which python

echo "File Permissions:"
ls -la /home/pdc/Projects/APIs/Key_Metrics/data/ | head -5

echo "OpenSearch Connectivity:"
curl -s "http://localhost:9200/_cluster/health" | jq '.status' || echo "❌ OpenSearch not accessible"

echo "API Connectivity:"
curl -s -H "Authorization: Bearer YOUR_TOKEN" \
    "https://pdc.pentaho.lab/api/v1/user/profile" | jq '.username' || echo "❌ API not accessible"

echo "Data Files:"
echo "Entity extraction: $(ls -la data/output/entity_extraction.csv 2>/dev/null | wc -l) files"
echo "Update ready: $(ls -la data/output/bulk_update_ready.csv 2>/dev/null | wc -l) files"

echo "Docker Containers:"
docker ps | grep -E "(opensearch|mongodb)" || echo "❌ No relevant containers running"

echo "✅ Diagnostics complete"
EOF

chmod +x diagnose_system.sh
# Update YOUR_TOKEN before running
```

#### Monitoring & Alerting

**Create Monitoring Script**

```bash
cat > monitor_updates.sh << 'EOF'
#!/bin/bash

LOG_FILE="logs/monitoring_$(date +%Y%m%d).log"
mkdir -p logs

echo "$(date): Starting monitoring \
check" >> "$LOG_FILE"

# Check entity counts
CURRENT_COUNT=$(curl -s "http://localhost:9200/pdc_entities/_count" | jq '.count')
echo "$(date): Total entities: $CURRENT_COUNT" >> "$LOG_FILE"

# Check for entities with trust scores
TRUST_COUNT=$(curl -s "http://localhost:9200/pdc_entities/_search" \
    -H "Content-Type: application/json" \
    -d '{"query":{"exists":{"field":"attributes.trustScore"}},"size":0}' | \
    jq '.hits.total.value // 0')
echo "$(date): Entities with trust scores: $TRUST_COUNT" >> "$LOG_FILE"

# Check for entities with sensitivity
SENS_COUNT=$(curl -s "http://localhost:9200/pdc_entities/_search" \
    -H "Content-Type: application/json" \
    -d '{"query":{"exists":{"field":"attributes.features.sensitivity"}},"size":0}' | \
    jq '.hits.total.value // 0')
echo "$(date): Entities with sensitivity: $SENS_COUNT" >> "$LOG_FILE"

# Alert if counts drop significantly
# (defaults to 0 so the comparison still works when the query returned nothing)
if [ "${TRUST_COUNT:-0}" -lt 100 ]; then
    echo "$(date): ⚠️ WARNING: Low trust score count: $TRUST_COUNT" >> "$LOG_FILE"
fi

echo "$(date): Monitoring check complete" >> "$LOG_FILE"
EOF

chmod +x monitor_updates.sh

# Add to crontab for regular monitoring
# crontab -e
# The script writes to a relative logs/ directory, so change into the project first
# Add: 0 */6 * * * cd /home/pdc/Projects/APIs/Key_Metrics && ./monitor_updates.sh
```

#### Maintenance Procedures

**Regular Maintenance Checklist**

```bash
cat > maintenance_checklist.md << 'EOF'
# Monthly Maintenance Checklist

## Data Quality Checks
- [ ] Run entity extraction and compare counts with previous month
- [ ] Check for entities with invalid trust scores (outside 0-100)
- [ ] Verify sensitivity values are only HIGH/MEDIUM/LOW
- [ ] Review entities with null/empty values

## System Health
- [ ] Check OpenSearch cluster health
- [ ] Verify API authentication tokens are valid
- [ ] Review error logs for patterns
- [ ] Test sample updates with dry-run mode

## Performance Monitoring
- [ ] Check extraction time trends
- [ ] Monitor bulk update performance
- [ ] Review API response times
- [ ] Check OpenSearch index size and performance

## Security Review
- [ ] Rotate API tokens if needed
- [ ] Review access logs
- [ ] Check for unauthorized access attempts
- [ ] Verify backup procedures

## Documentation Updates
- [ ] Update any changed API endpoints
- [ ] Review and update troubleshooting guides
- [ ] Update entity count statistics
- [ ] Document any new issues discovered
EOF
```

#### Emergency Procedures

**Rollback Procedure**

```bash
cat > emergency_rollback.sh << 'EOF'
#!/bin/bash

echo "EMERGENCY ROLLBACK PROCEDURE"
echo "==============================="

if [ ! -f "data/validation/rollback_data.csv" ]; then
    echo "❌ No rollback data found! Cannot proceed."
    exit 1
fi

# Skip the header row so the count reflects entities only
echo "Rollback will affect $(tail -n +2 data/validation/rollback_data.csv | wc -l) entities"
read -r -p "Continue with rollback? (yes/no): " confirm

if [ "$confirm" = "yes" ]; then
    echo "Starting rollback..."

    # Create backup of current state first
    extract-entities --output "data/backup/pre_rollback_$(date +%Y%m%d_%H%M%S).csv"

    # Execute rollback
    bulk-update-api \
        --base-url https://pdc.pentaho.lab \
        --auth-token YOUR_TOKEN_HERE \
        --csv-file data/validation/rollback_data.csv \
        --batch-size 25 \
        --delay 2

    echo "✅ Rollback complete"
else
    echo "❌ Rollback cancelled"
fi
EOF

chmod +x emergency_rollback.sh
```

#### Support Contacts

**When to Escalate Issues**

**Level 1: Self-Service (use this guide)**

* Environment setup problems
* CSV formatting issues
* Basic authentication problems
* Standard error messages covered in this guide

**Level 2: System Administrator**

* OpenSearch cluster issues
* Docker container problems
* Network connectivity issues
* Permission and security problems

**Level 3: Data Catalog Team**

* API endpoint changes
* Business rule modifications
* Data model changes
* Performance degradation

**Level 4: Emergency**

* Data corruption
* System-wide outages
* Security breaches
* Mass data loss

**Emergency Contact Information**

* **System Admin**: [Your contact info]
* **Data Catalog Team**: [Your contact info]
* **Emergency Hotline**: [Your contact info]

#### Section 7 Checklist

* Common issues and solutions documented
* Diagnostic tools created and tested
* Monitoring scripts implemented
* Maintenance procedures established
* Emergency rollback procedure prepared
* Support escalation paths defined
* All troubleshooting scripts tested
* Documentation complete and accessible

#### Workshop Complete!

Congratulations! You have successfully completed the Trust Score & Sensitivity Bulk Update Workshop. You now have:

* ✅ Complete automated solution for bulk updates
* ✅ Comprehensive troubleshooting guide for common issues
* ✅ Monitoring and maintenance procedures for ongoing operations
* ✅ Emergency procedures for critical situations
* ✅ Production-ready tools for your data catalog

#### Next Steps

* Implement regular monitoring using the provided scripts
* Schedule monthly maintenance using the checklist
* Train team members on the procedures
* Customize scripts for your specific environment
* Document any new issues you encounter

Happy automating!

#### Additional Resources

```
Project README: README.md
Example configurations: /home/pdc/Projects/APIs/Key_Metrics/examples/
Source code: /home/pdc/Projects/APIs/Key_Metrics/src/key_metrics/
```
{% endtab %}
{% endtabs %}
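
The troubleshooting entry for "429 Too Many Requests" suggests exponential backoff but leaves the retry logic as a change to `api_updater.py`. As a rough illustration of the idea only, the sketch below wraps any command in a shell retry loop that doubles the wait after each failure. `retry_with_backoff` is a hypothetical helper, not part of `key-metrics-updater`, and it retries on any non-zero exit status rather than on 429 specifically.

```shell
#!/bin/bash
# Hypothetical helper (not part of key-metrics-updater): retry a command,
# doubling the wait after each failed attempt.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    while true; do
        # Run the wrapped command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "Attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Example (assumes the profile endpoint used in the diagnosis steps above;
# curl --fail turns HTTP errors such as 429 into a non-zero exit status):
# retry_with_backoff 5 2 curl --fail -s \
#     -H "Authorization: Bearer YOUR_TOKEN" \
#     "https://pdc.pentaho.lab/api/v1/user/profile"
```

Because the loop cannot tell a rate limit from an authentication or not-found error, it is best kept to a small number of attempts; a 401 or 404 will simply fail again after each wait.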