Observability
Observability Stack Overview
Our observability stack uses several tools to capture and analyze telemetry data so that we can monitor the health and performance of our systems. It is organized around three primary data types: traces, logs, and metrics.
Metrics
Metrics are collected by the OpenTelemetry (OTel) Collector together with a suite of exporters included in Pentaho Data Catalog (PDC), such as the Node Exporter, MongoDB Exporter, and cAdvisor. Together, they provide a comprehensive view of system performance and resource usage.
Traces
For tracing, we employ the OpenTelemetry (Otel) Collector, which facilitates capturing and managing trace data across our distributed systems. This component is key for understanding request lifecycles and inter-service dependencies.
Logs
Log aggregation and management are handled by Fluent Bit, a lightweight log processor. Enhancements to log analysis and storage are planned as future work.

cAdvisor
cAdvisor (short for Container Advisor) analyzes and exposes resource usage and performance data from running containers. It exposes Prometheus metrics out of the box.
cAdvisor is not enabled by default.
To enable cAdvisor:
Navigate to the deployment folder:
Scroll down to the mon-cadvisor: section
Uncomment all the lines:

Save.
Note the profile: mon_enhanced. The cAdvisor service only starts when this profile is enabled.
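Docker Compose starts a service that declares a `profiles:` key only when one of those profiles is active, while services without `profiles:` always start. The following is a minimal Python model of that rule, assuming a simplified view of the compose file where each service maps to its list of profiles. Only `mon-cadvisor` and `mon_enhanced` come from this page; `pdc-web-client` is used here as a hypothetical always-on service.

```python
def active_services(services: dict[str, list[str]], active_profiles: set[str]) -> set[str]:
    """Return the services Docker Compose would start for the given active profiles.

    Simplified model: a service with no profiles always starts; a service with
    one or more profiles starts only if at least one of them is active.
    """
    return {
        name
        for name, profiles in services.items()
        if not profiles or any(p in active_profiles for p in profiles)
    }


# Hypothetical excerpt of the deployment: only mon-cadvisor is taken from this page.
services = {
    "pdc-web-client": [],               # no profile: always started
    "mon-cadvisor": ["mon_enhanced"],   # started only when mon_enhanced is active
}

print(sorted(active_services(services, set())))
print(sorted(active_services(services, {"mon_enhanced"})))
```

This is why uncommenting the service alone is not enough: the profile must also be enabled.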
Enable the OTel Collector to scrape the cAdvisor metrics.
Navigate to the deployment folder:
Uncomment the cadvisor section.

Save.
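For reference, an OTel Collector `prometheus` receiver that scrapes cAdvisor has roughly the shape below. This Python sketch builds that structure and prints it as JSON so it can be compared with the YAML in the collector configuration file. It assumes the compose service name `mon-cadvisor`, cAdvisor's default port 8080, and a hypothetical 15s scrape interval; the actual file in the workshop may differ.

```python
import json


def cadvisor_scrape_config(target: str = "mon-cadvisor:8080", interval: str = "15s") -> dict:
    """Build an OTel Collector prometheus receiver block that scrapes cAdvisor.

    The target assumes the compose service name mon-cadvisor and cAdvisor's
    default port 8080; adjust both to match your deployment.
    """
    return {
        "receivers": {
            "prometheus": {
                "config": {
                    "scrape_configs": [
                        {
                            "job_name": "cadvisor",
                            "scrape_interval": interval,
                            "static_configs": [{"targets": [target]}],
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the structure so it can be compared with the YAML in the collector config.
    print(json.dumps(cadvisor_scrape_config(), indent=2))
```

A receiver only takes effect when it is also listed in a metrics pipeline under `service.pipelines`.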
COMPOSE_PROFILES
Ensure the mon_enhanced profile has been enabled.
Navigate to the deployment folder:
Add mon_enhanced to COMPOSE_PROFILES.

Save.
You will need to restart PDC to deploy the cAdvisor container.
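COMPOSE_PROFILES is a comma-separated list, so adding a profile means appending it to the existing value rather than replacing it. The helper below is a minimal sketch of that edit, assuming COMPOSE_PROFILES is set in a `.env`-style `KEY=value` file in the deployment folder (the exact file is not shown here). The profile name `core` in the example is hypothetical.

```python
def ensure_profile(env_text: str, profile: str = "mon_enhanced") -> str:
    """Return env_text with `profile` present in the COMPOSE_PROFILES entry.

    COMPOSE_PROFILES holds a comma-separated list. Existing profiles are kept,
    surrounding double quotes are dropped, and the profile is appended only if
    missing. If the variable is absent, a new COMPOSE_PROFILES line is added.
    Commented-out lines (starting with #) are left untouched.
    """
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        key, sep, value = line.partition("=")
        if sep and key.strip() == "COMPOSE_PROFILES":
            raw = value.strip().strip('"')
            profiles = [p.strip() for p in raw.split(",") if p.strip()]
            if profile not in profiles:
                profiles.append(profile)
            lines[i] = "COMPOSE_PROFILES=" + ",".join(profiles)
            break
    else:
        # No COMPOSE_PROFILES entry found: add one.
        lines.append(f"COMPOSE_PROFILES={profile}")
    return "\n".join(lines) + "\n"
```

Read the file, pass its contents through `ensure_profile`, write it back, and then restart PDC.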
Check that the cAdvisor container is up and running.

Log into Portainer either by clicking on the bookmark or
Enter credentials.
Username: admin
Password: Portainer123
Click the 'Live Connect' option.
Make a note of the mon_cadvisor-1 container IP address & port.

In your browser, enter: http://[mon_cadvisor IP]:8080
Be aware that the cAdvisor IP address and port are exposed.

OpenObserve is an observability platform that provides insight into the health and performance of IT systems. It ingests and analyzes metrics, traces, and logs, the three primary pillars of observability.
This section is for reference only.
OpenObserve is enabled by default.
Edit the .env.default file.
Check that the following parameter has been added.
Save.
You will need to restart PDC to deploy the container.
OpenObserve
Open your web browser and navigate to the following URL: http://localhost/internal/openobserve/web
Log in using the credentials provided below:

Username:
Password: Complexpass123
Ensure that the organisation selector at the top right of the home page is set to: pdc

Logs
OpenObserve provides a centralized log management interface that allows users to easily search, filter, and analyze log data from various sources. This aids in troubleshooting issues and understanding the system's behavior over time.
Select: Logs from the left-hand menu.
You can filter logs by specific fields or keywords and visualize the log events in chronological order. This enables quick identification of patterns and anomalies in the log data.
As you type the SELECT statement, you will be prompted with suggestions.

Ensure you're in SQL mode and type in the following query.
A simpler method is to select the field/value from the list.
Delete the query and select the required fields.

This will also give us an idea of the number of records.
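The same kind of query can also be sent programmatically. The sketch below assumes OpenObserve's search API (`POST /api/{org}/_search`, with start and end times in microseconds) is reachable under the same `/internal/openobserve` prefix as the web UI, and uses the `pdc` organisation and `default` log stream mentioned on this page. The example SQL, which counts records over the last hour, is illustrative.

```python
import base64
import json
import urllib.request
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer microseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def build_search_body(sql: str, start: datetime, end: datetime, size: int = 100) -> dict:
    """Request body for the search API; start_time and end_time are in microseconds."""
    return {
        "query": {
            "sql": sql,
            "start_time": to_micros(start),
            "end_time": to_micros(end),
            "from": 0,
            "size": size,
        }
    }


def search(base_url: str, org: str, user: str, password: str, body: dict) -> dict:
    """POST the body to {base_url}/api/{org}/_search using HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{base_url}/api/{org}/_search",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

A possible call: `now = datetime.now(timezone.utc)`, then `search("http://localhost/internal/openobserve", "pdc", user, password, build_search_body('SELECT COUNT(*) AS records FROM "default"', now - timedelta(hours=1), now))`, with your own username and password.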

Metrics
Each exporter uses its own naming convention for its metrics, which makes it easy to identify their source.
For instance, metrics from the Node Exporter, which collects server-level data, start with node_*, while metrics collected by cAdvisor, which covers all Docker containers, begin with container_*.
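Because of this convention, the prefix before the first underscore is usually enough to tell which exporter produced a metric. A minimal sketch, assuming metric names follow the `exporter_rest_of_name` pattern described above:

```python
from collections import defaultdict


def group_by_prefix(metric_names: list[str]) -> dict[str, list[str]]:
    """Group metric names by the text before the first underscore.

    Relies on the exporter naming convention, e.g. node_* for the
    Node Exporter and container_* for cAdvisor.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in metric_names:
        prefix = name.split("_", 1)[0]
        groups[prefix].append(name)
    return {prefix: sorted(names) for prefix, names in groups.items()}
```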
Select: Metrics from the left-hand menu.
In this example: node_memory_MemAvailable_bytes
The screenshot displays the available memory (in bytes) over the last 4 days.

Choose any metric and confirm that:
A time series graph can be produced for any metric stored in Prometheus
Metric values are being recorded in real time
The PromQL editor can be used to drill down into metric values by filtering on a specific label.
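The same PromQL can be run outside the UI through a Prometheus-style `query_range` request. This sketch assumes OpenObserve exposes a Prometheus-compatible endpoint at `/api/{org}/prometheus/api/v1/query_range` behind the same `/internal/openobserve` prefix as the web UI; the 60s step is an illustrative choice. Requests need the same credentials as the UI.

```python
from urllib.parse import urlencode


def query_range_url(base_url: str, org: str, promql: str,
                    start: int, end: int, step: str = "60s") -> str:
    """Build a Prometheus-style query_range URL for OpenObserve.

    start and end are Unix timestamps in seconds; the PromQL expression is
    URL-encoded, so label filters such as {instance="..."} are safe to pass.
    """
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base_url}/api/{org}/prometheus/api/v1/query_range?{params}"
```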
Traces
Traces are one of the key components of OpenObserve.
Traces help you understand the flow of requests across multiple services in a distributed system. This is crucial for identifying bottlenecks and optimizing performance.
By linking errors to specific traces, you can quickly identify the root cause of issues and the context in which they occurred.
Traces reveal how different services interact, helping you understand and manage dependencies in your system.
With trace data, you can identify areas for potential optimization, such as reducing unnecessary API calls or improving database queries.
Enable Tracing
Log into OpenObserve and select Ingestion -> Traces (OpenTelemetry)

Make a note of the OTLP gRPC settings (you will have a different Auth token).
Edit the following OTel configuration file.

Restart PDC
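For reference, an OTel Collector `otlp` exporter that sends traces to OpenObserve carries the Auth token and organisation as gRPC headers. This Python sketch builds that structure; it assumes the OpenObserve container is reachable as `openobserve` on its default gRPC port 5081 and that the stream is `default`. It also shows how a Basic token is formed (Base64 of `user:password`), although copying the token from the Ingestion page is the simpler option.

```python
import base64


def basic_auth_token(user: str, password: str) -> str:
    """Base64-encode "user:password", the form used in an HTTP Basic Authorization header."""
    return base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")


def otlp_exporter(token: str, endpoint: str = "openobserve:5081",
                  org: str = "pdc", stream: str = "default") -> dict:
    """Build an OTel Collector otlp exporter block that sends traces to OpenObserve."""
    return {
        "exporters": {
            "otlp/openobserve": {
                "endpoint": endpoint,
                "headers": {
                    "Authorization": f"Basic {token}",
                    "organization": org,
                    "stream-name": stream,
                },
                # Plain-text gRPC inside the Docker network; enable TLS for remote endpoints.
                "tls": {"insecure": True},
            }
        }
    }
```

The exporter must also be listed in the traces pipeline under `service.pipelines` before restarting PDC.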
Searching for a Specific Trace
To locate a specific trace by its traceID, you can refine your search by editing the query in the query editor. Use the field name trace_id to direct your search to a particular trace.
On the left side of the page, you'll find a list of field names that assist in filtering traces. For instance, to explore traces originating from the front end of the PDC, you can input the following query into the editor:
This query retrieves all traces associated with the pdc-web-client service, allowing for a focused analysis of front-end activities.
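Filter conditions like these are easy to mistype when values contain quotes. The helper below is a small sketch that builds SQL-style conditions with correctly escaped string literals, assuming the traces query editor accepts WHERE-clause conditions and that the service name is stored in a `service_name` field (`trace_id` comes from this page).

```python
def sql_literal(value: str) -> str:
    """Quote a value as an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def trace_filter(field: str, value: str) -> str:
    """Build a single field='value' condition for the traces query editor."""
    return f"{field}={sql_literal(value)}"


def all_of(*conditions: str) -> str:
    """Combine several conditions so that all of them must match."""
    return " AND ".join(conditions)
```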

Query Functions
OpenObserve supports a variety of functions to manipulate and analyze data effectively. These functions can be used within queries to perform operations like aggregations, calculations, and transformations on collected metrics, traces, and logs.
Aggregation Functions: Functions such as SUM(), AVG(), and COUNT() allow for the aggregation of data points over a specified interval.
Transformation Functions: Functions like TOPK(), PERCENTILE(), and RATE() help in transforming raw data into useful insights.
Math Functions: Basic arithmetic functions (+, -, *, /) can be applied to metrics for custom calculations.
String Functions: Functions such as str_match() and str_replace() aid in manipulating text-based log and trace data.
Combining these functions within queries can help in deriving meaningful and actionable insights from your telemetry data.
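RATE() is the least intuitive of these. It turns an ever-increasing counter into a per-second rate. The Python function below is a simplified sketch of PromQL `rate()`: it handles counter resets the way Prometheus does (a drop in value is treated as a restart from zero) but omits Prometheus's extrapolation to the edges of the query window, so real results can differ slightly.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp_seconds, value) samples.

    Simplified rate(): resets are handled, extrapolation is omitted.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A lower value means the counter was reset; count it from zero.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

For samples `[(0, 100), (15, 130), (30, 160)]` the counter grows by 60 over 30 seconds, so the rate is 2.0 per second.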
VRL Functions
OpenObserve integrates VRL (Vector Remap Language) for complex data transformations and log manipulations. VRL functions provide a powerful way to convert, enrich, and process logs, metrics, and traces with ease.
Basic Functions: Use functions like parse_json(), to_string(), and to_int() for basic data type conversions.
Conditional Functions: Implement conditional logic with if and else expressions.
String Functions: Manipulate text data using upcase(), downcase(), strip_whitespace(), and slice() functions.
Log Enrichment: Enrich logs with metadata or additional context by assigning new fields or by using merge().
As a simple example, we're going to add a formatted date field to the logs.
Let's take a look at the different fields in the default log stream.
Click on a row and select table to view the fields.

We're going to add a formatted_date field to the logs to help with selecting date ranges in our dashboard.
The following VRL function will add a field: formatted_date to the logs.
Copy and paste the following into the VRL Function Editor.
Execute the VRL function & check the log file.
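The VRL function itself is not reproduced here. As a rough illustration of the transformation it performs, the Python sketch below derives a `formatted_date` field from OpenObserve's `_timestamp` field, which holds microseconds since the Unix epoch. The `%Y-%m-%d` format and the use of UTC are assumptions; this is not the workshop's VRL code.

```python
from datetime import datetime, timezone


def add_formatted_date(record: dict, fmt: str = "%Y-%m-%d") -> dict:
    """Return a copy of a log record with a formatted_date field added.

    _timestamp is in microseconds since the Unix epoch; the date is rendered
    in UTC using the given strftime format.
    """
    seconds = record["_timestamp"] // 1_000_000
    date = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)
    return {**record, "formatted_date": date}
```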

Save the function so that you can apply the function to all incoming pdc logs.

Click on the function option in the sidebar & select: Stream Association.
Highlight the default logs stream & associate the formatted_date function.

The function will now be applied to all incoming pdc logs.
Dashboards in OpenObserve
Dashboards in OpenObserve offer a consolidated view of metrics, logs, and traces, enabling a holistic perspective on system health and performance.
Let's create a simple dashboard that monitors OS resources.
Click on the "Dashboards" option from the left-hand menu.
Click on 'New Folder' and enter the following details:

Click 'Save'.
Click on 'New Dashboard' and enter the following details:

Click 'Save'.
Start by clicking on 'ADD PANEL'

Use the editor to add widgets for visualizing telemetry data.
Graphs: Plot time-series data for real-time monitoring.
Tables: Display logs or metrics in tabular form.
Heatmaps: Identify patterns and anomalies.
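A typical OS-resource panel shows memory utilisation as a percentage. The sketch below records one possible PromQL expression for such a panel, assuming Node Exporter metrics are available, and reproduces the same arithmetic in Python for a single pair of samples so the formula can be checked by hand.

```python
# PromQL a memory-utilisation panel could use (assumes Node Exporter metrics).
MEMORY_USED_PCT = "(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100"


def memory_used_pct(available_bytes: float, total_bytes: float) -> float:
    """Same calculation as MEMORY_USED_PCT, applied to one pair of samples."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (1 - available_bytes / total_bytes) * 100
```

With 2 GiB available out of 8 GiB total, the panel would show 75% used.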
Community Dashboards
Explore and import pre-built dashboards shared by the OpenObserve community. These can serve as a quick start for common monitoring scenarios.
To import a dashboard, browse to:
~/Workshop--Pentaho-Data-Catalog/Dashboards
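Before importing, it can help to confirm that every dashboard file in that folder is valid JSON. The sketch below checks each `*.json` file and reports its `title` if present; that the dashboard files are JSON with a top-level `title` key is an assumption about the export format.

```python
import json
from pathlib import Path

DASHBOARD_DIR = Path("~/Workshop--Pentaho-Data-Catalog/Dashboards").expanduser()


def check_dashboards(folder: Path = DASHBOARD_DIR) -> tuple[dict[str, str], list[str]]:
    """Parse every *.json file in folder before importing it into OpenObserve.

    Returns ({filename: title}, [files that could not be parsed]). The title
    key is read if present; "(untitled)" is used otherwise.
    """
    valid: dict[str, str] = {}
    invalid: list[str] = []
    for path in sorted(folder.glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            invalid.append(path.name)
            continue
        title = data.get("title") if isinstance(data, dict) else None
        valid[path.name] = title or "(untitled)"
    return valid, invalid
```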


