Reference Data
Reference data sets contain relatively static data values that are commonly used across an organization. In Pentaho Data Catalog, you can create reference data sets that contain valid data values for your organization to reference.
Some examples of common reference data sets include:
Branch numbers
Country codes
Currencies
Exchange codes
Language codes
Measurement units
Postal codes
Product codes
Regions
Transaction codes
Import the 'Medical' Reference Dataset
Ensure you are logged in as the Data Steward.
Username
Password
Welcome123!
Click Reference Data in the left navigation menu, then select Import from the Actions drop-down menu.

Browse to:
~/Workshop--Pentaho-Data-Catalog/Reference Dataset/import_assets_1710938718263.csv

Click: Open

Click: Submit
The parent category, Medical, is added together with a data set named Antibiotics.

Import Antibiotics Data Set
Highlight the 'Antibiotics Data Set'.
Click on: Data Values (currently there are no data values).

Note the schema: Sr, ID, Antibiotic Name
Scroll along and select: Import

Proceed with Import

Complete the steps:

Choose file type.

Import Data Set.

Review & assign version.

Check Data Values.

Commit Data Values
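Before starting the import wizard, it can help to confirm that the CSV file's header row matches the schema noted above (Sr, ID, Antibiotic Name). The following is a minimal sketch of such a check. It is not part of Data Catalog; the function name `check_header` and the sample rows are hypothetical and used only for illustration.

```python
import csv
import io

# Column names taken from the schema noted in the steps above.
EXPECTED_COLUMNS = ["Sr", "ID", "Antibiotic Name"]


def check_header(csv_text, expected=EXPECTED_COLUMNS):
    """Return True if the first CSV row matches the expected column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return [name.strip() for name in header] == list(expected)


# Hypothetical sample content, for illustration only.
sample = "Sr,ID,Antibiotic Name\n1,A001,Amoxicillin\n"
header_ok = check_header(sample)
```

To check a real file, read its text with `open(path, newline="").read()` and pass the result to `check_header`.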

Create a reference data set to categorize enterprise data and maintain organizational consistency.
If you need a new category to contain the reference data set, you must create the category before creating the reference data set.
Click Reference Data in the left navigation menu.
In the Reference Data menu, click Actions -> Add New DataSet.

In the DataSet Name box, enter a name for the new reference data set.
In the Parent list, select the category or reference data set that you want to be the parent of the new reference data set.
Select a reference data set as a parent only for organizational purposes. Reference data sets do not inherit any properties or information from parent reference data sets.
Click Create. A new, empty reference data set is created and the Summary tab for the new reference data set opens.
In the Description box, enter a description for the reference data set.
In the Purpose box, enter an explanation of the purpose for the reference data set.
(Optional) In the Properties box, update one or more of the following properties:
Property: Value options
Sensitivity: Unknown (default), Low, Medium, High
Status: Info (default), Valid, Warning, Expired
Version: 1.0 (default)
Note: The version number can only be increased.
Click Save. The reference data set is created.
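The property rules above (a fixed list of options for Sensitivity and Status, and a version number that can only be increased) can be sketched as a small in-memory model. This is an illustration only, not a Data Catalog API: the class `DataSetProperties`, its `update` method, and the assumption that version numbers are dotted integers such as 1.0 are all hypothetical.

```python
from dataclasses import dataclass

# Value options as listed in the properties table above.
SENSITIVITY_OPTIONS = ("Unknown", "Low", "Medium", "High")
STATUS_OPTIONS = ("Info", "Valid", "Warning", "Expired")


def parse_version(text):
    """Convert a dotted version string such as '1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class DataSetProperties:
    """Hypothetical model of the Properties box; not a Data Catalog API."""
    sensitivity: str = "Unknown"
    status: str = "Info"
    version: str = "1.0"

    def update(self, sensitivity=None, status=None, version=None):
        """Validate every requested change first, then apply them together."""
        if sensitivity is not None and sensitivity not in SENSITIVITY_OPTIONS:
            raise ValueError(f"unknown sensitivity: {sensitivity!r}")
        if status is not None and status not in STATUS_OPTIONS:
            raise ValueError(f"unknown status: {status!r}")
        if version is not None and parse_version(version) <= parse_version(self.version):
            raise ValueError("the version number can only be increased")
        if sensitivity is not None:
            self.sensitivity = sensitivity
        if status is not None:
            self.status = status
        if version is not None:
            self.version = version


props = DataSetProperties()
props.update(sensitivity="High", status="Valid", version="1.1")
```

A later call such as `props.update(version="1.0")` raises `ValueError`, which mirrors the rule that the version number cannot be decreased.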
Add a schema to a reference data set so that you can maintain data quality by standardizing and controlling what data values can be entered in the reference data set.
For example, you can add a schema to specify that the value for a type of information is selected from a pre-defined list, and then specify the list of valid values.
It is possible to add a schema that has the same values in all columns as an existing schema but is assigned its own unique identifier in the system. If the duplicate schemas are used in different parts of an organization and one of them is updated, the reference data values that the schemas are meant to control might no longer be consistent across the organization.
Verify that a schema with all the same values does not already exist before adding a new schema.
You can also import reference data schema and values in a CSV file or from a Data Catalog table by clicking Import to open the Import Reference Data wizard.
Perform the following steps to add a schema to a reference data set:
Click Reference Data in the left navigation menu.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Schema tab.
In the Reference Data Schema table, click + Add Row.
In the new table row, update the following fields:
Field: Description
Column Name: A column name that represents the type of data that the schema controls.
Data Type: The type of data that can be entered as a value. Data Type options include Text, String, Integer, Float, and Binary.
Length: The number of characters that can be entered for the value.
Input Type: The input method that can be used to enter a value. Input Type options include Pre-defined and Free text.
Valid Value: A comma-separated list of values that are valid as input. You must update the Valid Value field when the schema Input Type is Pre-defined. For example, to create a list of colors that a user can select from, you might enter the following list of valid values: red, yellow, blue.
Editable: A switch that can be toggled to specify whether the schema can be edited. Editable options are no and yes. You must have the Admin user role to specify whether a schema can be edited.
On the right side of the new table row, click Save.
The new schema is saved to the Reference Data Schema table and is added as a column to the Reference Data Values table on the Data Values tab.
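To make the effect of the schema fields concrete, the sketch below checks a candidate value against one schema row: its Length, its Data Type, and, for the Pre-defined input type, its Valid Value list. It is a simplified illustration, not how Data Catalog enforces schemas. The names `SchemaColumn`, `parse_valid_values`, and `is_valid`, the default length, and the Color and Quantity columns are hypothetical.

```python
from dataclasses import dataclass, field


def parse_valid_values(raw):
    """Split a comma-separated Valid Value string into trimmed entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


@dataclass
class SchemaColumn:
    """Hypothetical stand-in for one row of the Reference Data Schema table."""
    column_name: str
    data_type: str = "String"      # Text, String, Integer, Float, or Binary
    length: int = 50               # assumed maximum number of characters
    input_type: str = "Free text"  # "Pre-defined" or "Free text"
    valid_values: list = field(default_factory=list)


def is_valid(column, value):
    """Return True if the value fits the column's length, type, and value list."""
    text = str(value)
    if len(text) > column.length:
        return False
    if column.data_type == "Integer":
        try:
            int(text)
        except ValueError:
            return False
    elif column.data_type == "Float":
        try:
            float(text)
        except ValueError:
            return False
    if column.input_type == "Pre-defined":
        return text in column.valid_values
    return True


# The color list is the example given in the Valid Value description.
color = SchemaColumn("Color", length=10, input_type="Pre-defined",
                     valid_values=parse_valid_values("red, yellow, blue"))
quantity = SchemaColumn("Quantity", data_type="Integer", length=3)
```

With these definitions, `is_valid(color, "red")` passes while `is_valid(color, "green")` does not, because green is not in the list of valid values.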
Populate a reference data set with values to serve as authoritative lookup references for fields that are governed by the reference data set.
A reference data value can be added that has the same values in all columns as an existing reference data value, but has a unique identifier assigned to it in the system. If the duplicate values are used in different parts of an organization and one value is updated, then the reference data is no longer consistent across the organization.
Verify that a reference data value with all the same values does not already exist before adding a new reference data value.
Perform the following steps to add values to a reference data set:
Click Reference Data in the left navigation menu. The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Data Values tab.
Click + Add Row.
Note: If the value already exists in a row that is disabled, you can re-enable that row by toggling the Status switch to the Enabled position.
A row is added to the Reference Data Values table. Columns in the table correspond to the schemas that are defined on the Schema tab.
Update the new table row with values that adhere to the schema that controls each column.
On the right side of the new table row, click Save.
The new values are saved to the Reference Data Values table.
If you made multiple modifications to the Reference Data Values table, consider committing a new version of the reference data set.
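The check recommended above, verifying that no existing row has the same values in all columns, could be sketched as follows for data values exported to a list of rows. This is an assumption-laden illustration rather than a Data Catalog feature: the function `find_duplicate_rows` and the sample rows are hypothetical.

```python
def find_duplicate_rows(rows):
    """Return (first_index, duplicate_index) pairs for rows whose values all match."""
    seen = {}
    duplicates = []
    for index, row in enumerate(rows):
        # Sort by column name so that column order does not affect the comparison.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates


# Hypothetical rows using the Antibiotics schema columns, for illustration only.
rows = [
    {"Sr": "1", "ID": "A001", "Antibiotic Name": "Amoxicillin"},
    {"Sr": "2", "ID": "A002", "Antibiotic Name": "Doxycycline"},
    {"Sr": "1", "ID": "A001", "Antibiotic Name": "Amoxicillin"},
]
duplicate_pairs = find_duplicate_rows(rows)
```

Each pair identifies the first occurrence of a row and a later row that repeats it, so the later row can be reviewed before it is added.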
Add a business term to a reference data set to clarify the context for using the data and to enhance organizational understanding of the data.
Perform the following steps to add a business term to a reference data set:
Click Reference Data in the left navigation menu. The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Business Terms tab.
In the Business Terms tab, click Add Terms. The Add Business Terms dialog box opens.
Navigate to the business term that you want to add to the reference data set and select it.
Click Add.
The business term is added to the reference data set and appears in the Business Terms table.
