LLM Integration

Integrate LLMs into your pipelines.


Overview

The workshops establish a repeatable architectural pattern: PDI acts as the orchestration layer, while a locally-hosted LLM (served via Ollama) handles the "intelligent" processing that rule-based ETL cannot do well. The six use cases covered - Sentiment Analysis, Data Quality, Data Enrichment, Named Entity Recognition, Text Summarization, and Multi-Stage Processing - all share the same fundamental skeleton. Once you understand that skeleton, you can adapt it to virtually any AI-enrichment task.


Everything hinges on making a clean POST request to Ollama's /api/generate endpoint at http://localhost:11434. Because this is a local REST API, there is no external dependency, no API key, and no internet round-trip latency.

The request payload structure is:

{
  "model": "llama3.2:3b",
  "prompt": "Your instruction here...",
  "stream": false,
  "format": "json",
  "keep_alive": "30m",
  "options": {
    "temperature": 0.1,
    "num_predict": 300,
    "num_ctx": 2048,
    "num_thread": 0
  }
}

Parameters:

model — Which Ollama model to invoke. The workshops use llama3.2:3b as the default: compact enough to run on CPU-only hardware but capable enough for structured tasks. You can scale up to llama3.1:8b for harder problems at the cost of speed.

stream: false — This is critical for PDI. Streaming sends tokens incrementally, which is great for chat UIs but useless in an ETL row-by-row context. Setting this to false tells Ollama to generate the entire response, then return it as one complete JSON object — which is what PDI needs to write into a field.

format: "json" — Forces the model's output to be valid JSON. Without this, the model might add conversational preamble, markdown fences, or explanations around its JSON answer, all of which break downstream parsing.

keep_alive — This is the single biggest performance optimization for batch processing. By default Ollama unloads the model from memory after each request. Reloading takes 10–30 seconds depending on model size and disk speed. Setting keep_alive: "30m" keeps the model resident in RAM across all rows in a batch run, turning a multi-hour job into a fraction of that time.

temperature: 0.1 — Controls randomness. Values near 0 produce near-deterministic output, meaning the same review will produce the same sentiment classification on repeated runs. This is essential for reproducible ETL — you don't want results changing every time the pipeline runs.

num_predict: 300 — Caps the maximum number of tokens generated. This prevents runaway generation, protects processing time, and avoids unexpectedly large response payloads overwhelming downstream steps.

num_ctx: 2048 — Sets the context window size in tokens, i.e. how much prompt plus generated text the model can consider at once. 2048 is sufficient for single-review prompts; raise it for longer documents at the cost of memory.

num_thread: 0 — Number of CPU threads used for generation. The workshops leave it at 0 so Ollama auto-detects a suitable value for the host machine.
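The full request/response cycle can be sketched in a few lines of standard-library Python. Field names follow the payload documented above; the sample response below is illustrative, not captured from a live server. Note that with format: "json", the generated text inside Ollama's "response" field is itself a JSON string and must be parsed a second time.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="llama3.2:3b"):
    """Assemble the request body described above."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,        # return one complete JSON object, not tokens
        "format": "json",       # force the model output to be valid JSON
        "keep_alive": "30m",    # keep the model resident in RAM between rows
        "options": {
            "temperature": 0.1, # near-deterministic, reproducible output
            "num_predict": 300, # cap generated tokens
            "num_ctx": 2048,    # context window in tokens
            "num_thread": 0,    # 0 = let Ollama auto-detect
        },
    }

def call_ollama(prompt):
    """POST the payload and return the model's text (requires a running Ollama)."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Ollama wraps the generated text in a "response" field; parse it twice.
sample = '{"response": "{\\"sentiment\\": \\"positive\\", \\"score\\": 0.9}", "done": true}'
result = json.loads(json.loads(sample)["response"])
print(result["sentiment"], result["score"])
```

In PDI the same double-parse happens in two steps: a REST Client step returns the outer JSON, and a JSON Input step extracts the fields from the inner string.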

Select a workshop:


Sentiment Analysis

Workflow: sentiment_analysis_optimized

Run through the following steps to build sentiment_analysis_optimized.ktr:

  1. Verify Ollama installation.


What is Sentiment Analysis?

Sentiment Analysis is the process of computationally identifying and categorizing opinions expressed in text to determine whether the writer's attitude toward a particular topic, product, or service is positive, negative, or neutral.

Example Input:

Example Output (Sentiment Analysis):


Why is Sentiment Analysis Important?

Business Applications:

  1. Customer Feedback Analysis - Automatically categorize thousands of reviews to identify satisfaction trends

  2. Brand Monitoring - Track public sentiment about your brand across social media and review sites

  3. Product Improvement - Identify which features customers love and which need improvement

  4. Customer Support Prioritization - Route angry customers to experienced support agents first

  5. Market Research - Understand customer opinions about competitor products

  6. Crisis Detection - Quickly identify negative sentiment spikes that require immediate attention


Real-World Example: A company receives 10,000 product reviews per month. Manual analysis would take weeks. With sentiment analysis:

  • Instant categorization: 7,500 positive, 1,800 neutral, 700 negative

  • Identify issues: Negative reviews mention "battery life" 450 times → product team investigates

  • Measure satisfaction: 75% positive sentiment score → track over time

  • Prioritize responses: Route the 200 most negative reviews to customer service
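The arithmetic behind this example is straightforward; a quick sketch using the counts from the bullets above:

```python
counts = {"positive": 7500, "neutral": 1800, "negative": 700}

total = sum(counts.values())                     # 10,000 reviews per month
positive_pct = 100 * counts["positive"] / total  # positive sentiment score

print(f"{positive_pct:.0f}% positive sentiment across {total} reviews")
```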


Types of Sentiment

1. Polarity (Basic)

  • Positive: "This product is amazing!"

  • Negative: "Terrible quality, waste of money"

  • Neutral: "The product arrived on Tuesday"

2. Granular Sentiment (Scored)

  • Very Positive: +0.8 to +1.0 ("Best purchase ever!")

  • Positive: +0.3 to +0.7 ("Good value for money")

  • Neutral: -0.2 to +0.2 ("It works as described")

  • Negative: -0.7 to -0.3 ("Not what I expected")

  • Very Negative: -1.0 to -0.8 ("Complete garbage, requesting refund")

3. Emotion-Based Sentiment (Advanced)

  • Joy: "So happy with this purchase!"

  • Anger: "This company has the worst customer service!"

  • Frustration: "Why doesn't this feature work properly?"

  • Disappointment: "Expected better quality for the price"
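The granular score bands above can be expressed as a simple lookup. This is a sketch: the band edges follow the list, and the small gaps between bands (e.g. +0.2 to +0.3) are resolved by the threshold order shown here.

```python
def sentiment_band(score: float) -> str:
    """Map a -1.0..+1.0 sentiment score to the granular labels above."""
    if score >= 0.8:
        return "very positive"   # +0.8 to +1.0
    if score >= 0.3:
        return "positive"        # +0.3 to +0.7
    if score > -0.3:
        return "neutral"         # -0.2 to +0.2
    if score > -0.8:
        return "negative"        # -0.7 to -0.3
    return "very negative"       # -1.0 to -0.8

print(sentiment_band(0.9), sentiment_band(0.0), sentiment_band(-0.5))
```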


How LLMs Improve Sentiment Analysis

Traditional Methods (Rule-Based/ML):


Problems with Traditional Methods:

❌ Can't handle context: "This isn't bad" → Detected as negative (contains "bad")

❌ Misses sarcasm: "Oh great, another software bug" → Detected as positive (contains "great")

❌ Ignores negation: "Not good at all" → Detected as positive (contains "good")

❌ Limited to trained categories

❌ Requires extensive labeled training data
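The failure modes above are easy to reproduce with a toy keyword lexicon. This is a deliberately naive sketch, not any production library: it counts lexicon hits and ignores word order, negation, and sarcasm entirely.

```python
POSITIVE = {"good", "great", "amazing", "love"}
NEGATIVE = {"bad", "terrible", "awful", "garbage"}

def naive_sentiment(text: str) -> str:
    """Bag-of-words polarity: count lexicon hits, nothing more."""
    words = set(text.lower().replace("!", "").replace(",", "").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(naive_sentiment("Not good at all"))                 # negation ignored
print(naive_sentiment("Oh great, another software bug"))  # sarcasm missed
print(naive_sentiment("This isn't bad"))                  # negated "bad" misread
```

All three examples from the list above come back with exactly the wrong label, which is the gap an LLM-based approach closes.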


LLM-Based Sentiment Analysis:


Advantages of LLMs:

✅ Understands context and nuance

✅ Detects sarcasm and irony

✅ Handles negation correctly

✅ Provides explanations and reasoning

✅ Extracts key phrases automatically

✅ Works in multiple languages (multilingual models)

✅ No training data required (zero-shot learning)

✅ Customizable output format (JSON, XML, etc.)



Sentiment Analysis Output Components

In this workshop, our LLM will extract:

1. Sentiment Classification

  • Category: positive, negative, or neutral

  • Example: "sentiment": "positive"

2. Sentiment Score

  • Numeric value from -1.0 (very negative) to +1.0 (very positive)

  • Example: "score": 0.9 (strongly positive)

3. Confidence Level

  • How certain is the LLM about this classification (0-100%)

  • Example: "confidence": 95 (very confident)

  • Low confidence (<60%) might indicate mixed or ambiguous sentiment

4. Key Phrases

  • Important words/phrases that influenced the sentiment

  • Example: ["exceeded expectations", "incredible battery", "blazing fast"]

  • Useful for identifying specific strengths or weaknesses

5. Summary

  • One-sentence summary of the review's main point

  • Example: "Customer extremely satisfied with laptop performance and battery life"

  • Helps quickly understand what the review is about
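Putting the five components together, the parsed record can be flattened for a CSV row and flagged for manual review. A sketch with an illustrative sample value; the field names match the component list above.

```python
import json

# Illustrative model output containing the five components described above.
llm_output = json.dumps({
    "sentiment": "positive",
    "score": 0.9,
    "confidence": 95,
    "key_phrases": ["exceeded expectations", "incredible battery"],
    "summary": "Customer extremely satisfied with laptop performance.",
})

record = json.loads(llm_output)

# Flatten the list field so the row fits a flat CSV schema,
# and flag low-confidence classifications (<60%) for manual review.
record["key_phrases"] = "; ".join(record["key_phrases"])
record["needs_review"] = record["confidence"] < 60

print(record["sentiment"], record["needs_review"])
```

In the PDI transformation, the same flattening is done with field operations before the Text file output step writes the final CSV.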


Use Cases in This Workshop

We'll analyze 3 customer reviews with varying sentiments:

Review 1 (Positive):

Review 2 (Negative):

Review 3 (Neutral/Mixed):


Expected Results

After running this workshop's transformation, you'll have:

  • Original review text

  • AI-determined sentiment (positive/negative/neutral)

  • Numeric score (-1.0 to 1.0)

  • Confidence percentage

  • Key phrases extracted

  • One-sentence summary

All in a structured CSV file ready for analysis, visualization, or database import!


Key Takeaways

  1. Sentiment analysis automatically categorizes opinions in customer feedback

  2. LLMs provide context-aware analysis that traditional methods can't match

  3. Structured JSON output makes results easy to process in ETL pipelines

  4. Confidence scores help identify reviews that need manual review

  5. Key phrases identify specific strengths and weaknesses

  6. Scalable processing - analyze thousands of reviews in minutes
