# Canvas & Collections

{% hint style="success" %}

#### Data Canvas & Collections

Before we start processing and profiling the data, let's cover the main features of the Data Canvas, the main viewer for displaying various aspects of the data.

The catalog consists of categorized data assets, such as files, tables, or schemas. These can be grouped and defined as a Collection of assets related by purpose or business context.

Once enriched with, for example, quality scores and sensitivity levels, the Collection can be published as a Data Product. This publishing action changes its lifecycle state and makes it available in global search results and for broader consumption. For example, a Customer Insights Dataset under the Marketing Category can be published as a Data Product once it includes profiling results, sensitivity tagging, and trust scoring.
{% endhint %}

1. Log in to Data Catalog:

{% embed url="<https://pdc.pentaho.lab>" %}

Username: <james.lock@adventureworks.com>

Password: Welcome123!

{% hint style="warning" %}

#### **Security Advisory: Handling Login Credentials**

For enhanced security, it is strongly recommended that users avoid saving their login details directly in web browsers. Browsers may inadvertently autofill these credentials in unrelated fields, posing a security risk.

**Best Practice**

* **Disable Autofill:** To mitigate potential risks, users should disable the autofill functionality for login credentials in their browser settings. This preventive measure ensures that sensitive information is not unintentionally exposed or misused.
{% endhint %}

{% tabs %}
{% tab title="1. Data Canvas" %}
{% hint style="info" %}

#### Data Canvas

Use the Data Canvas page to explore and investigate your data.
{% endhint %}

1. Highlight the connection: mssql:adventureworks2022.
2. Click on the Details tab:

<figure><img src="/files/G1CWOPt0DG5DsfHI5SrZ" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
As we know, the adventureworks2022 database has 5 schemas:

* **HumanResources**: Employee data, departments, payroll
* **Sales**: Orders, customers, territories, sales staff
* **Production**: Products, inventory, manufacturing
* **Purchasing**: Vendors, purchase orders, procurement
* **Person**: Contact information, addresses, demographics
  {% endhint %}

3. To view schema details, highlight a schema and click View:

<figure><img src="/files/oaQpik2drJ5wPRjs1wDs" alt=""><figcaption><p>View Schemas</p></figcaption></figure>

{% hint style="info" %}
We've only ingested the schema metadata, so the Details are limited until we process the data.
{% endhint %}

<figure><img src="/files/Q8gJ7akuHV8KIdg78u41" alt=""><figcaption><p>Schema Details</p></figcaption></figure>

{% hint style="warning" %}
At this stage we have only completed Stage 1: Metadata Ingestion.

{% endhint %}

{% hint style="info" %}
In addition, you can enter a search term in the **Search** field to search for resources such as folders, schemas, tables, files, or fields within the navigation pane.
{% endhint %}

<figure><img src="/files/wgMt6IbrJUjLcdK6M4Ti" alt=""><figcaption></figcaption></figure>
{% endtab %}

{% tab title="2. Collections" %}
{% hint style="info" %}

#### Collections

A **Collection** is a way to logically group data assets, such as schemas, tables, and files, so that you can work with them more efficiently. Whether you are analyzing similar datasets or combining diverse data sources, Collections allow you to organize and manage your data entities based on structure or business use case.

**Key Characteristics**

Collections are designed for business users, not database administrators. They can pull tables from multiple database schemas and present them with business-friendly names and descriptions. Different user roles see different Collections based on their needs - sales teams see sales data, HR sees employee data.

Data Catalog supports two types of Collections:

* **Dataset**: A Dataset is a group of homogeneous data assets, such as tables or files that share the same schema.
* **Data Collection**: A Data Collection is a group of heterogeneous data assets, such as files, tables, or schemas, with different structures.

**How is this different?**

Instead of writing complex SQL joins across multiple schemas, users browse pre-organized Collections and drag-and-drop the data they need. The system automatically handles the technical complexity behind the scenes.

**Business Benefits**

Collections enable self-service analytics by making data accessible to business users without technical expertise. They ensure consistent data definitions across teams, improve data governance, and dramatically reduce the time from question to insight. Teams can focus on analysis rather than data preparation.
{% endhint %}
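The distinction between the two Collection types can be sketched in a few lines of Python. This is an illustration only, not how Data Catalog decides internally: the `Asset` class, the `collection_type` function, and the sample file and column names are all hypothetical, and "same schema" is simplified here to "same ordered column names".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for a catalogued table or file."""
    name: str
    columns: tuple[str, ...]  # ordered column names


def collection_type(assets: list[Asset]) -> str:
    """Return 'Dataset' if all assets share one structure, else 'Data Collection'."""
    if not assets:
        raise ValueError("a Collection needs at least one asset")
    # Identical column tuples collapse to a single entry in the set.
    signatures = {asset.columns for asset in assets}
    return "Dataset" if len(signatures) == 1 else "Data Collection"


# Two monthly exports with the same columns: homogeneous.
monthly_exports = [
    Asset("orders_2024_01.csv", ("order_id", "customer_id", "total")),
    Asset("orders_2024_02.csv", ("order_id", "customer_id", "total")),
]

# Tables from different schemas with different columns: heterogeneous.
mixed_tables = [
    Asset("Person.Person", ("BusinessEntityID", "FirstName", "LastName")),
    Asset("Sales.Customer", ("CustomerID", "PersonID", "TerritoryID")),
]

print(collection_type(monthly_exports))
print(collection_type(mixed_tables))
```

The first group would be a candidate for a Dataset, while the second, like the Customer Analytics example in this lab, fits a Data Collection.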

***

{% tabs %}
{% tab title="1. Create Customer Analytics Collection" %}
{% hint style="info" %}

#### Customer Analytics Collection

A **Customer Analytics Collection** might include:

* `Person.Person` - Basic personal information
* `Person.Address` - Customer addresses
* `Sales.Customer` - Customer business data
* `Sales.SalesOrderHeader` - Purchase history summary

**Business Value:** This gives marketing teams everything they need to analyze customer behavior without understanding table relationships.
{% endhint %}
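To see the table relationships that this Collection spares business users from learning, here is a rough sketch of the kind of cross-schema join they would otherwise write by hand. It is not part of the lab: it uses Python's built-in `sqlite3` as a stand-in for SQL Server, with attached in-memory databases playing the role of the `Person` and `Sales` schemas. The key columns follow the standard AdventureWorks sample schema, the tables are cut down to a few columns, and the sample rows are invented.

```python
import sqlite3

# SQLite stand-in for the SQL Server database: each attached in-memory
# database plays the role of one schema (Person, Sales).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS Person")
conn.execute("ATTACH DATABASE ':memory:' AS Sales")

# Cut-down tables with key columns only; the sample rows are made up.
conn.executescript("""
CREATE TABLE Person.Person (
    BusinessEntityID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Person.Address (AddressID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Sales.Customer (CustomerID INTEGER PRIMARY KEY, PersonID INTEGER);
CREATE TABLE Sales.SalesOrderHeader (
    SalesOrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
    BillToAddressID INTEGER, TotalDue REAL);

INSERT INTO Person.Person VALUES (1, 'Ana', 'Silva');
INSERT INTO Person.Address VALUES (10, 'Lisbon');
INSERT INTO Sales.Customer VALUES (100, 1);
INSERT INTO Sales.SalesOrderHeader VALUES
    (1000, 100, 10, 100.0),
    (1001, 100, 10, 50.0);
""")

# Total spend per customer, joined across both schemas.
CUSTOMER_SPEND_SQL = """
SELECT p.FirstName, p.LastName, a.City, SUM(soh.TotalDue) AS TotalSpend
FROM Sales.Customer AS c
JOIN Person.Person AS p ON p.BusinessEntityID = c.PersonID
JOIN Sales.SalesOrderHeader AS soh ON soh.CustomerID = c.CustomerID
JOIN Person.Address AS a ON a.AddressID = soh.BillToAddressID
GROUP BY p.FirstName, p.LastName, a.City
"""

rows = conn.execute(CUSTOMER_SPEND_SQL).fetchall()
print(rows)
```

Three joins are needed just to connect a customer's name, city, and order totals, which is the kind of detail a curated Collection is meant to hide.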

1. Select Collections & Create a New Category.

<figure><img src="/files/qYvq2qaIu88U53kQrB2w" alt=""><figcaption></figcaption></figure>

2. Select: Create Category & Create.

<figure><img src="/files/sxRsTymMpFyC5qzheqN0" alt=""><figcaption></figcaption></figure>

3. Repeat to create a Group: Customer Analytics.

<figure><img src="/files/8iU1oHwNzwLNaaxO3NWq" alt=""><figcaption></figcaption></figure>

4. Select: Data Canvas
5. Expand Person & Sales schemas & select:

| Schema | Table                  |
| ------ | ---------------------- |
| Person | Person.Person          |
|        | Person.Address         |
| Sales  | Sales.Customer         |
|        | Sales.SalesOrderHeader |

6. Click: 'Add to Cart'

<figure><img src="/files/xr6UJs77rWHgAxUPE3qt" alt=""><figcaption><p>Select Schema - Tables</p></figcaption></figure>

7. Save as Collection.

<figure><img src="/files/sJ2raD5EdNOqlVfFZXvi" alt=""><figcaption><p>Save as Collection</p></figcaption></figure>

8. Select: Customer - Parent Group & Create.

<figure><img src="/files/7UEFqAmgM2WZr8szLxEa" alt=""><figcaption></figcaption></figure>

{% hint style="warning" %}
Remember to select: Collection (not Dataset), because the tables span several schemas and have different structures.
{% endhint %}

9. View the Details of your Collection.

<figure><img src="/files/xex1oHrz1ng5DIFU3e7I" alt=""><figcaption></figcaption></figure>

10. Click on: Actions.

{% hint style="info" %}
Once the data has been curated, it can be published as a [Data Product](#id-3.-data-product).

In practice, Collection creation is typically a collaborative effort:

* **Business Stewards** define the business logic and organize Collections around business processes and analytical needs
* **Data Stewards** handle the technical curation and ensure data quality within Collections
* **Administrators** provide the system setup, user permissions, and governance framework
* **Business Users** provide requirements and feedback on Collection usefulness
  {% endhint %}
  {% endtab %}

{% tab title="Suggested Collections" %}
{% hint style="info" %}

#### Customer 360

For marketing teams building a complete view of each customer:

**Tables Included:**

* `Person.Person` - Basic personal information
* `Sales.Customer` - Customer business data
* `Person.Address` - Customer addresses
* `Person.EmailAddress` - Contact information
* `Person.PersonPhone` - Phone numbers
* `Sales.CustomerAddress` - Address relationships
* `Sales.SalesOrderHeader` - Purchase history summary
* `Sales.SalesTerritory` - Geographic context

**Business Value:** This gives marketing teams everything they need to analyze customer behavior without understanding table relationships.
{% endhint %}

{% hint style="info" %}

#### Financial Performance

For finance teams analyzing revenue, costs, and profitability:

**Tables Included:**

* `Sales.SalesOrderHeader` - Order totals and dates
* `Sales.SalesOrderDetail` - Line item details
* `Production.Product` - Product costs and pricing
* `Sales.SalesTerritory` - Regional performance
* `Sales.SalesPerson` - Salesperson commissions
* `Purchasing.PurchaseOrderHeader` - Cost data
* `Production.TransactionHistory` - Inventory costs

**Business Value:** Finance can analyze profit margins, sales trends, and cost analysis across products and regions.
{% endhint %}

{% hint style="info" %}

#### Supply Chain & Inventory

For operations teams managing inventory and procurement:

**Tables Included:**

* `Production.Product` - Product specifications
* `Production.ProductInventory` - Current stock levels
* `Purchasing.PurchaseOrderHeader` - Purchase orders
* `Purchasing.PurchaseOrderDetail` - Purchase details
* `Purchasing.Vendor` - Supplier information
* `Production.Location` - Warehouse locations
* `Production.TransactionHistory` - Inventory movements
* `Production.WorkOrder` - Manufacturing orders

**Business Value:** Operations teams can monitor stock levels, supplier performance, and production scheduling.
{% endhint %}
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="3. Data Product" %}
{% hint style="info" %}

#### Data Product

A Data Product is the published, production-ready version of a Collection or Dataset in Pentaho Data Catalog. It transforms working data assets into verified, discoverable resources available for broader organizational use through a formal publishing process.

**From Collections to Data Product**

Collections start as working drafts where data stewards organize related tables and add metadata. Once enriched with quality scores, sensitivity tags, business terms, and trust ratings, they can be published as Data Products. This publishing changes their lifecycle state and makes them globally searchable.

**Customer Analytics Collection**

The Customer Analytics Collection containing tables:

* `Person.Person` - Basic personal information
* `Person.Address` - Customer addresses
* `Sales.Customer` - Customer business data
* `Sales.SalesOrderHeader` - Purchase history summary

would undergo enrichment - receiving a 75% quality score, PII sensitivity tagging, high trust ratings, and business term mapping. Once these standards are met, it becomes the "Customer Analytics Data Product."

**Publishing Criteria**

Organizations typically set minimum thresholds like 60% data quality, assigned sensitivity levels, and acceptable trust scores. However, Collections can still be published even if they don't meet all criteria, giving organizations flexibility in their governance approach.

**Key Benefits**

Published Data Products become globally searchable, carry verified status indicating business readiness, and include enhanced metadata for filtering and discovery. This ensures only validated, well-documented data becomes widely available while maintaining proper access controls based on sensitivity levels.

**Business Impact**

The publishing process creates a clear distinction between work-in-progress Collections and trusted business assets. This helps users identify reliable data for decision-making while ensuring sensitive information like customer PII is properly governed and protected.
{% endhint %}
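The publishing criteria described above can be expressed as a simple readiness check. The sketch below is illustrative only and does not use any Data Catalog API: `CollectionMetrics`, `unmet_criteria`, the 0-100 trust scale, and the trust threshold of 50 are assumptions, while the 60% quality threshold and the 75% / PII example values come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectionMetrics:
    """Hypothetical summary of a Collection's enrichment results."""
    quality_score: float              # percent, 0-100
    sensitivity_level: Optional[str]  # None means not yet assigned
    trust_score: float                # assumed 0-100 scale


def unmet_criteria(
    metrics: CollectionMetrics,
    min_quality: float = 60.0,  # example threshold from the text
    min_trust: float = 50.0,    # assumed threshold, for illustration
) -> list[str]:
    """List the criteria a Collection does not yet meet (empty means ready)."""
    issues = []
    if metrics.quality_score < min_quality:
        issues.append(
            f"quality score {metrics.quality_score:.0f}% is below {min_quality:.0f}%"
        )
    if metrics.sensitivity_level is None:
        issues.append("no sensitivity level assigned")
    if metrics.trust_score < min_trust:
        issues.append(
            f"trust score {metrics.trust_score:.0f} is below {min_trust:.0f}"
        )
    return issues


# Values from the Customer Analytics example; the trust score is invented.
customer_analytics = CollectionMetrics(75.0, "PII", 80.0)
draft = CollectionMetrics(40.0, None, 80.0)

print(unmet_criteria(customer_analytics))
print(unmet_criteria(draft))
```

Because Collections can still be published without meeting every criterion, a check like this is best read as advisory: it reports gaps rather than blocking publication.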

{% endtab %}
{% endtabs %}


