# Other Data Sources

{% hint style="info" %}

#### **Data Source Configuration Guide**

Before integrating your data sources, it's essential to collect all the necessary configuration details. This guide outlines the key pieces of information required to establish a connection to your data sources. Your database administrator (DBA) will be a valuable resource in providing this configuration information.

* **URI (Uniform Resource Identifier):** This unique identifier is used to locate your data source. You'll typically need a username and password to authenticate your connection.
* **Driver:** Ensure you have the appropriate driver for your data source. This is crucial for enabling your application to communicate with the database.
  {% endhint %}

{% embed url="<https://dbschema.com/databases.html>" %}
Link to download database drivers
{% endembed %}

{% hint style="info" %}

* **Credentials:** Username and password are fundamental for authentication.
* **Host Name and Port Number:** These are required to point your application towards the right server and port where your database is running.
* **Key Store and Trust Store Configuration:**
  * **Key Store Type and Location:** Specifies the type (e.g., JKS) and location of the key store file.
  * **Key Store Password:** The password to access the key store.
  * **Trust Store Type and Location:** Defines the type and location of the trust store file.
  * **Trust Store Password:** The password required to access the trust store.
  * **Cipher Suite:** A list of encryption algorithms supported for SSL/TLS connections.
* **Encryption Type:**
  * *Encryption Only:* Data is encrypted during transmission, but no authentication is required.
  * *Encryption with Server and Client Authentication:* Both the client and server authenticate each other, providing a higher security level.
* **SSL Configuration:** For secure connections, SSL configuration details are essential, including the cipher suite, key and trust store information.
* **Data Source Type:** Identifying the type of data source (e.g., SQL database, NoSQL database, file system) is crucial for selecting the correct driver and configuration settings.
* **Configuration Method:** Decide on the approach for configuring your data source. This can be via direct credentials, SSL, or a URI.

Gathering this information beforehand will streamline the process of adding your data sources. Remember, your DBA is your go-to person for obtaining most of this configuration detail.
{% endhint %}

{% hint style="warning" %}
For Amazon Web Services (AWS) data source types, a configuration method isn't specified. You must have information such as AWS region, account number, IAM username, access key ID, and secret access key to configure these data source types.
{% endhint %}
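As a rough illustration of the AWS checklist above, the following Python sketch reports which details are still missing before you start adding the data source. The labels (`region`, `account_number`, and so on) are hypothetical names chosen for this example; they are not Data Catalog settings.

```python
# Hypothetical labels for the AWS details listed in the warning above.
REQUIRED_AWS_DETAILS = (
    "region",
    "account_number",
    "iam_username",
    "access_key_id",
    "secret_access_key",
)


def missing_aws_details(details: dict) -> list:
    """Return the required labels that are absent or blank, in checklist order."""
    return [
        name
        for name in REQUIRED_AWS_DETAILS
        if not str(details.get(name) or "").strip()
    ]


# Example: only two of the five details have been collected so far.
collected = {"region": "us-east-1", "access_key_id": "EXAMPLEKEY"}
print(missing_aws_details(collected))
```

With the sample input, the function lists `account_number`, `iam_username`, and `secret_access_key` as still outstanding.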

{% embed url="<https://docs.hitachivantara.com/r/en-us/pentaho-data-catalog/10.0.x/mk-95pdc002/manage-data-sources>" %}
Link to: Data Sources
{% endembed %}

***

#### Accessing Your Catalog

To access your catalog, please follow these steps:

1. Open the Google Chrome web browser and click the bookmark, or navigate to: [**https://pdc.pdc.lab/**](https://pdc.pentaho.example/)
2. Enter the following email and password, then click Sign In.

<table data-header-hidden><thead><tr><th width="156"></th><th></th></tr></thead><tbody><tr><td>Username</td><td>data_steward@hv.com</td></tr><tr><td>Password</td><td>Welcome123!</td></tr></tbody></table>

{% hint style="warning" %}

#### **Security Advisory: Handling Login Credentials**

For enhanced security, it is strongly recommended that users avoid saving their login details directly in web browsers. Browsers may inadvertently autofill these credentials in unrelated fields, posing a security risk.

**Best Practice**

• **Disable Autofill:** To mitigate potential risks, users should disable the autofill functionality for login credentials in their browser settings. This preventive measure ensures that sensitive information is not unintentionally exposed or misused.
{% endhint %}

3. Click on: Management -> Resources tile.

<figure><img src="/files/hgJTfpoL0QTS659kt9bW" alt=""><figcaption><p>Resources</p></figcaption></figure>

4. Click on: Add Data Source.
5. Specify the following basic information for the connection to your data source (you'll find the connection details in the table below these descriptions):

<table><thead><tr><th width="211">Field</th><th>Description</th></tr></thead><tbody><tr><td>Data Source Name</td><td><p>Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize. </p><p>Names must start with a letter, and must contain only letters, digits, and underscores. White spaces in names are not supported.</p></td></tr><tr><td>Data Source ID (Optional)</td><td><p>Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.</p><p>You cannot modify Data Source ID for this data source after you specify or generate it.</p></td></tr><tr><td>Description (Optional)</td><td>Specify a description of your data source.</td></tr><tr><td>Data Source Type</td><td>Select the database type of your source. You are then prompted to specify additional connection information based on the file system or database type you are trying to access.</td></tr></tbody></table>
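The naming rule in the table can be checked before you type a name into the form. The sketch below is one possible reading of that rule, assuming "letters" means ASCII letters; Data Catalog performs its own validation, which may differ.

```python
import re

# Rule from the table above: start with a letter, then only letters,
# digits, and underscores (no white space). ASCII letters are an assumption.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_data_source_name(name: str) -> bool:
    """Return True when the whole name satisfies the documented naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ("synthea_db", "2024_sales", "sales data"):
    print(candidate, is_valid_data_source_name(candidate))
```

Here `synthea_db` passes, while `2024_sales` (starts with a digit) and `sales data` (contains a space) do not.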

After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

<table><thead><tr><th width="214">Field</th><th>Description</th></tr></thead><tbody><tr><td>Affinity</td><td>This default setting specifies which agents should be associated with the data source in a multi-agent deployment.</td></tr><tr><td>Configuration Method</td><td>Select Credentials or URI as a configuration method.</td></tr><tr><td>Configuration Method: Credentials</td><td><p>• Username/Password: Credentials that provide access to the specified database.</p><p>• Host: The address of the machine where the Microsoft SQL database server is running. It can be an IP address or a domain name.</p><p>• Port: The port number on which the Microsoft SQL server is listening for incoming connections. The default port for Microsoft SQL Server is 1433.</p></td></tr><tr><td>Configuration Method: URI</td><td><p>• Username/Password: Credentials that provide access to the specified database.</p><p>• Service URI: The connection string for the data source. For example: Server=myServerAddress;Database=myDatabase;User Id=myUsername;Password=myPassword;Port=1433;Integrated Security=False;Connection Timeout=30;</p></td></tr><tr><td>Driver</td><td>Select an existing driver or upload a new driver to ensure that the communication between the application and the database is efficient, secure, and follows the required standards.</td></tr><tr><td>Database Name</td><td>The name of the database within the Microsoft SQL server that you want to connect with.</td></tr></tbody></table>
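To see how the sample Service URI in the table breaks down, the following sketch splits a semicolon-delimited `key=value` string into its parts. It is an illustration only: the function name is hypothetical, and it assumes that no value contains a semicolon.

```python
def parse_connection_string(text: str) -> dict:
    """Split a semicolon-delimited 'key=value' string into a dictionary."""
    settings = {}
    for part in text.split(";"):
        if not part.strip():
            continue  # skip the empty piece after a trailing semicolon
        key, _, value = part.partition("=")
        settings[key.strip()] = value.strip()
    return settings


example = (
    "Server=myServerAddress;Database=myDatabase;User Id=myUsername;"
    "Password=myPassword;Port=1433;Integrated Security=False;"
    "Connection Timeout=30;"
)
settings = parse_connection_string(example)
print(settings["Server"], settings["Port"])
```

Reading the string this way makes it easier to confirm that the server, port, and database match the values your DBA supplied.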

***

{% hint style="info" %}

#### Connect to Demo Data Sources

Follow the steps below to connect to one of the demo datasets. In this workshop we're going to connect to the Synthea dataset, stored on a PostgreSQL database:
{% endhint %}

1. To install the 'Synthea' demo datasource, click on the PostgreSQL tab below:

{% tabs %}
{% tab title="1. PostgreSQL" %}
{% hint style="info" %}
Synthea™ is an open-source tool that generates synthetic patient data, simulating individuals' complete medical histories. This encompasses medications, allergies, encounters, and social health determinants for each mock patient.
{% endhint %}

<figure><img src="/files/OjkytTmuJNcqrmNC8xn7" alt=""><figcaption><p>Synthea</p></figcaption></figure>

The generated data is free from legal and privacy concerns.

<figure><img src="/files/RrUfIkMobxfEO66wT9eW" alt=""><figcaption><p>synthea - ERD</p></figcaption></figure>

{% hint style="warning" %}
To watch the videos please copy and paste the website URL into your **host** Chrome browser.
{% endhint %}

{% tabs %}
{% tab title="English" %}
{% hint style="info" %}
Create a connection to the Synthea dataset, then ingest the database schema.
{% endhint %}

{% embed url="<https://www.loom.com/share/b622643492b0415787c75f7cec6ea16b?hideEmbedTopBar=true&hide_share=true&hide_title=true&sid=08c81f8a-8b47-4e8a-91a9-1b18ad36c95c&hide_owner=true>" %}
Synthea dataset
{% endembed %}
{% endtab %}

{% tab title="El Español" %}

{% endtab %}

{% tab title="Français" %}

{% endtab %}

{% tab title="Italiano" %}

{% endtab %}

{% tab title="Deutsch" %}

{% endtab %}

{% tab title="日本語" %}

{% endtab %}

{% tab title="简体中文" %}

{% endtab %}
{% endtabs %}

***

Follow the steps below to connect and ingest the schema metadata:

{% tabs %}
{% tab title="1.1 Ingest Metadata" %}
{% hint style="info" %}

#### Test Connection and Ingest Metadata Schema

After you have specified the detailed information according to your data source type, test the connection to the data source and add the data source.
{% endhint %}

1. Enter the following details to connect to: PostgreSQL business\_apps\_db (Synthea) database.

<table><thead><tr><th width="226">Field</th><th>Setting</th></tr></thead><tbody><tr><td>Data Source Name</td><td>postgresql:synthea</td></tr><tr><td>Data Source ID</td><td>Leave Blank to autogenerate</td></tr><tr><td>Description</td><td>Demo dataset of patients medical records</td></tr><tr><td>Data Source Type</td><td>PostgreSQL</td></tr><tr><td>Affinity</td><td>Default</td></tr><tr><td>Configuration Method</td><td>Credentials</td></tr><tr><td>   Username</td><td>sqlreader</td></tr><tr><td>   Password</td><td>2Petabytes</td></tr><tr><td>   *Host</td><td>pdc.pdc.lab</td></tr><tr><td>   Port</td><td>5432</td></tr><tr><td>**Driver</td><td>postgresql-42.7.1.jar</td></tr><tr><td>Database Name</td><td>business_apps_db</td></tr></tbody></table>

{% hint style="warning" %}
\*Enter server IP address or FQDN.

\*\*PDC does not ship with any database drivers.

To upload JDBC drivers follow the instructions in tab: [**1.2 Upload JDBC drivers**](#step-1.2-upload-jdbc-drivers)
{% endhint %}

2. After you have specified the detailed information according to your data source type, test the connection to the data source and add the data source.
3. Click Test Connection to test your connection to the specified data source.
4. Take a look at the 'workers' to check for any issues.

<figure><img src="/files/dVQmI7zL24lXwIYYNenP" alt=""><figcaption><p>Worker - Test Connection</p></figcaption></figure>

{% hint style="warning" %}
Prior to completing and saving your new data source setup, it's essential to execute the 'Ingest Schemas' process. This step is crucial for importing the database schema and associated metadata into the system.
{% endhint %}

5. Click Ingest Schema, select the 'synthea' schema, and then click Ingest Schemas.

<figure><img src="/files/r1FjiiFk66byojgjCetM" alt=""><figcaption><p>Select schemas</p></figcaption></figure>

{% hint style="info" %}
While you have the option to select all schemas, it is advisable to exclude system-related schemas that are not relevant to your requirements.
{% endhint %}

<figure><img src="/files/gXo6eDA0627g5N0tZghf" alt=""><figcaption><p>Ingesting Schemas</p></figcaption></figure>

6. (Optional) Enter a Note for any information you need to share with others who might access this data source.
7. Click: Create Data Source to establish your data source connection.

<figure><img src="/files/sSALCeVy24YfWsumpnBM" alt=""><figcaption><p>postgresql:synthea connection</p></figcaption></figure>

<figure><img src="/files/18NGx1CqTgojhG3G42my" alt=""><figcaption><p>Connection details</p></figcaption></figure>
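The earlier advice to leave out system-related schemas can be sketched in a few lines. The example below assumes a PostgreSQL source, where schema names beginning with `pg_` are reserved for the system and `information_schema` holds the standard catalog views; the list of available schemas is made up for illustration.

```python
def user_schemas(schema_names):
    """Keep only schemas that are not PostgreSQL system schemas."""
    return [
        name
        for name in schema_names
        if not name.startswith("pg_") and name != "information_schema"
    ]


# Illustrative list; your database may expose different schemas.
available = ["information_schema", "pg_catalog", "pg_toast", "public", "synthea"]
print(user_schemas(available))
```

For this workshop you would then pick `synthea` from the remaining candidates.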
{% endtab %}

{% tab title="1.2 Upload JDBC Drivers" %}
{% hint style="warning" %}
PDC does not ship with JDBC drivers. You will need to download the required driver from the vendor site.
{% endhint %}

**To upload JDBC drivers**

1. Click on Manage Drivers.

<figure><img src="/files/HPW8PqOvb7EfARqQNhbp" alt=""><figcaption><p>Manage Drivers</p></figcaption></figure>

2. Click on 'Add New'.

<figure><img src="/files/MdNUPraoakAkNLrXQpWl" alt=""><figcaption><p>Add New</p></figcaption></figure>

3. Select Database type: POSTGRES

<figure><img src="/files/xlldaYxcVnCqpbyKp2YE" alt=""><figcaption><p>Drag &#x26; Drop JDBC driver to upload</p></figcaption></figure>

{% hint style="info" %}
The postgresql-42.7.1.jar driver is located in:

\~/Workshop--Pentaho-Data-Catalog/Database-Drivers
{% endhint %}

4. Click Add Driver.
5. Click Close & return to: [**Ingest Metadata**](#id-1.1-ingest-metadata)
   {% endtab %}

{% tab title="1.3 pgAdmin" %}

#### Install & Configure pgAdmin4

{% hint style="info" %}
pgAdmin is an open-source administration and development platform for PostgreSQL.
{% endhint %}

1. Ensure all the existing packages are up-to-date.

```bash
sudo apt update && sudo apt upgrade -y
```

2. Install the public key for the pgAdmin 4 repository.

```bash
curl -fsS https://www.pgadmin.org/static/packages_pgadmin_org.pub | sudo gpg --dearmor -o /usr/share/keyrings/packages-pgadmin-org.gpg
```

3. Create the repository configuration file.

```bash
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/packages-pgadmin-org.gpg] https://ftp.postgresql.org/pub/pgadmin/pgadmin4/apt/$(lsb_release -cs) pgadmin4 main" > /etc/apt/sources.list.d/pgadmin4.list && apt update'
```

4. Choose your preferred mode for the pgAdmin 4 installation.

{% hint style="info" %}
Installing desktop mode only is recommended.
{% endhint %}

• For both desktop and web modes:

```bash
sudo apt install pgadmin4
```

• For desktop mode only:

```bash
sudo apt install pgadmin4-desktop
```

• For web mode only:

```bash
sudo apt install pgadmin4-web
```

***

#### Connect to Synthea database

1. Start pgAdmin desktop.

<figure><img src="/files/usd04Kj9c1wkUUsLFpf7" alt=""><figcaption><p>pgAdmin4</p></figcaption></figure>

2. Click the Add New Server button and enter the details of your remote server.

<figure><img src="/files/wrYj4h21SEVFhefuoe0f" alt=""><figcaption><p>Register server</p></figcaption></figure>

<table><thead><tr><th width="189">Field</th><th>Setting</th></tr></thead><tbody><tr><td>Name</td><td>Synthea</td></tr><tr><td>Host name</td><td>localhost</td></tr><tr><td>Port</td><td>5432</td></tr><tr><td>Username</td><td>sqlreader</td></tr><tr><td>Password</td><td>2Petabytes</td></tr></tbody></table>

3. View the data in the synthea schema.

<figure><img src="/files/ltuu0D9IqWTvInUmspTf" alt=""><figcaption><p>Synthea schema - patients table</p></figcaption></figure>
{% endtab %}

{% tab title="1.4 DBeaver" %}
{% hint style="info" %}
DBeaver Community is a free cross-platform database tool for developers, database administrators, analysts, and everyone working with data. It supports all popular SQL databases like MySQL, MariaDB, PostgreSQL, SQLite, Apache Family, and more.
{% endhint %}

1. The easiest way to install DBeaver Community (dbeaver-ce) is with Snap.

```bash
sudo snap install dbeaver-ce
```

To create a connection to the Synthea PostgreSQL database:

1. Select PostgreSQL database & click Next.

<figure><img src="/files/wEFL2Nm4nWgJgx8C0ij8" alt=""><figcaption><p>PostgreSQL connection to Synthea database</p></figcaption></figure>

2. Enter the following connection details:

<table><thead><tr><th width="148"></th><th></th></tr></thead><tbody><tr><td>Connect by URL</td><td>jdbc:postgresql://pdc.pentaho.example:5432/business_apps_db</td></tr><tr><td>Username</td><td>sqlreader</td></tr><tr><td>Password</td><td>2Petabytes</td></tr></tbody></table>

3. Click 'Test Connection' and download driver version 42.7.2.

<figure><img src="/files/04j99Nn3AOnZn7Aog10Z" alt=""><figcaption><p>PostgreSQL driver</p></figcaption></figure>

4. Click Finish.

<figure><img src="/files/msE6RSLorjy1aK2B3mNz" alt=""><figcaption><p>Test Connection</p></figcaption></figure>

5. Click OK.

<figure><img src="/files/JyoNZtdZ4L8WCakzydkv" alt=""><figcaption><p>Connection Test</p></figcaption></figure>

6. Expand Databases -> Schemas.

<figure><img src="/files/ypaS63PTUf39bv1QIOpN" alt=""><figcaption><p>patients table</p></figcaption></figure>
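The "Connect by URL" value follows the PostgreSQL JDBC driver's `jdbc:postgresql://host:port/database` form. The sketch below builds such a URL from the host, port, and database name listed in the Ingest Metadata tab, and splits one back into its parts. Both function names are hypothetical helpers written for this example.

```python
from urllib.parse import urlsplit


def postgresql_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a URL in the PostgreSQL JDBC driver's host:port/database form."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def split_postgresql_jdbc_url(url: str) -> tuple:
    """Recover (host, port, database) from a jdbc:postgresql:// URL."""
    if not url.startswith("jdbc:postgresql://"):
        raise ValueError("not a PostgreSQL JDBC URL")
    # Drop the "jdbc:" prefix so the rest parses like an ordinary URL.
    parts = urlsplit(url[len("jdbc:"):])
    return parts.hostname, parts.port, parts.path.lstrip("/")


url = postgresql_jdbc_url("pdc.pdc.lab", 5432, "business_apps_db")
print(url)
print(split_postgresql_jdbc_url(url))
```

Splitting a URL this way is a quick check that the host, port, and database name are the ones you intended before you paste it into a client.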
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="2. Microsoft SQL Server" %}
{% hint style="info" %}
The AdventureWorks database supports standard online transaction processing scenarios for a fictitious bicycle manufacturer, Adventure Works Cycles.

Scenarios include:

* Human Resources - HumanResources
* Contact Information - Person
* Manufacturing - Production
* Purchasing - Purchasing
* Sales - Sales
  {% endhint %}

<figure><img src="/files/9xrCvPKR1SC5nNAdy4T2" alt=""><figcaption><p>AdventureWorks2019 - ERD</p></figcaption></figure>

{% hint style="warning" %}
To watch the videos please copy and paste the website URL into your **host** Chrome browser.
{% endhint %}

{% tabs %}
{% tab title="English" %}
{% hint style="info" %}
Create a connection to the AdventureWorks2019 dataset, then ingest the schema.
{% endhint %}
{% endtab %}

{% tab title="El Español" %}

{% endtab %}

{% tab title="Français" %}

{% endtab %}

{% tab title="Italiano" %}

{% endtab %}

{% tab title="Deutsch" %}

{% endtab %}

{% tab title="日本語" %}

{% endtab %}

{% tab title="简体中文" %}

{% endtab %}
{% endtabs %}

***

Follow the steps below to connect and ingest the schema metadata:

{% tabs %}
{% tab title="1. Ingest Metadata" %}

#### Test Connection and Ingest Metadata Schema

After you have specified the detailed information according to your data source type, test the connection to the data source and add the data source.

1. Enter the following details to connect to: MSSQL AdventureWorks2019 database.

<table><thead><tr><th width="226">Field</th><th>Setting</th></tr></thead><tbody><tr><td>Data Source Name</td><td>mssql:adventureworks2019</td></tr><tr><td>Data Source ID</td><td>Leave Blank to autogenerate</td></tr><tr><td>Description</td><td>Demo dataset of fictitious bicycle manufacturer</td></tr><tr><td>Data Source Type</td><td>Microsoft SQL Server</td></tr><tr><td>Affinity</td><td>Default</td></tr><tr><td>Configuration Method</td><td>Credentials</td></tr><tr><td>   Username</td><td>sqlreader</td></tr><tr><td>   Password</td><td>2Petabytes</td></tr><tr><td>   Host</td><td>pdc.pdc.lab</td></tr><tr><td>   Port</td><td>1433</td></tr><tr><td>Driver</td><td>mssql-jdbc-9.2.1.jre15.jar</td></tr><tr><td>Database Name</td><td>AdventureWorks2019</td></tr></tbody></table>


2. Click Test Connection to test your connection to the specified data source.

{% hint style="info" %}
Prior to completing and saving your new data source setup, it's essential to execute the 'Ingest Schemas' process. This step is crucial for importing the database schema and associated metadata into the system.
{% endhint %}

3. Click Ingest Schema, select the following 5 schemas, and then click Ingest Schemas.

<figure><img src="/files/182bctjZjwgp1T0NZq49" alt=""><figcaption><p>Select schemas</p></figcaption></figure>

{% hint style="info" %}
While you have the option to select all schemas, it is advisable to exclude system-related schemas that are not relevant to your requirements.
{% endhint %}

4. (Optional) Enter a Note for any information you need to share with others who might access this data source.
5. Click: Create Data Source to establish your data source connection.

<figure><img src="/files/Hx7HhryJTEcmvFek2MpD" alt=""><figcaption><p>mssql:adventureworks2019 connection</p></figcaption></figure>
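The driver used above is a JDBC driver, and a JDBC client would address the same database with a `jdbc:sqlserver://` URL whose connection properties are appended as `;name=value` pairs. The sketch below assembles one from the settings in the table. The helper name is hypothetical, and the `encrypt` and `trustServerCertificate` values are assumptions that suit a lab server with a self-signed certificate, not a production recommendation.

```python
def sqlserver_jdbc_url(host: str, port: int, **properties: str) -> str:
    """Build a jdbc:sqlserver:// URL with ';name=value' connection properties."""
    url = f"jdbc:sqlserver://{host}:{port}"
    for name, value in properties.items():
        url += f";{name}={value}"
    return url


print(
    sqlserver_jdbc_url(
        "pdc.pdc.lab",
        1433,
        databaseName="AdventureWorks2019",
        encrypt="true",                 # assumption: encrypted lab connection
        trustServerCertificate="true",  # assumption: self-signed certificate
    )
)
```

Note that, unlike the PostgreSQL form, the database is passed as the `databaseName` property instead of a path segment.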
{% endtab %}

{% tab title="2. Azure Data Studio" %}
On Linux, you can access the MSSQL AdventureWorks2019 database with Azure Data Studio.

1. Ensure all the existing packages are up-to-date.

```bash
sudo apt update && sudo apt upgrade
```

2. Install the required dependency.

```bash
sudo apt install libunwind8
```

3. Download the .deb package from the official website: [**Azure Data Studio**](https://learn.microsoft.com/en-us/azure-data-studio/download-azure-data-studio?tabs=win-install%2Cwin-user-install%2Credhat-install%2Cwindows-uninstall%2Credhat-uninstall)
4. Install the .deb package.

```bash
cd ~
sudo dpkg -i ./Downloads/azuredatastudio-linux-<version string>.deb
```

***

#### Connect to AdventureWorks2019 database

1. Start Azure Data Studio.

```bash
azuredatastudio
```

2. Select: Connections (first icon in left menu).

<figure><img src="/files/WD9M6gLxtkHHlwmoiS5F" alt=""><figcaption></figcaption></figure>

3. Select SQL Login
4. Enter the following details:

<table><thead><tr><th width="247">Field</th><th>Setting</th></tr></thead><tbody><tr><td>Connection type</td><td>Microsoft SQL Server</td></tr><tr><td>Input type</td><td>Parameters</td></tr><tr><td>*Server</td><td>localhost,1433</td></tr><tr><td>Authentication type</td><td>SQL Login</td></tr><tr><td>User name</td><td>sqlreader</td></tr><tr><td>Password</td><td>2Petabytes</td></tr><tr><td>Database</td><td>AdventureWorks2019</td></tr><tr><td>Encrypt</td><td>Mandatory (True)</td></tr><tr><td>Trust server certificate</td><td>True</td></tr><tr><td>Server group</td><td>&#x3C;Default></td></tr><tr><td>Name (optional)</td><td>AdventureWorks2019</td></tr></tbody></table>

{% hint style="warning" %}
\*Enter server IP address or FQDN.
{% endhint %}

5. Click: Connect.

<figure><img src="/files/AYuNu679OcEfn6kJTAd1" alt=""><figcaption><p>Person Schema - Person Address</p></figcaption></figure>
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="3. Oracle" %}
{% hint style="info" %}
Synthea™ is an open-source tool that generates synthetic patient data, simulating individuals' complete medical histories. This encompasses medications, allergies, encounters, and social health determinants for each mock patient.

{% endhint %}

x

x

x
{% endtab %}

{% tab title="4. MySQL" %}
Arlojet database is an airline demo dataset. You can query the data based on:

* Passengers
* Ticketing
* Weather
* Aircraft
* Catering

<figure><img src="/files/wB0Y3WKqTbBjNpLJuopL" alt=""><figcaption><p>Arlojet database</p></figcaption></figure>

{% hint style="warning" %}
To watch the videos please copy and paste the website URL into your **host** Chrome browser.
{% endhint %}

{% tabs %}
{% tab title="English" %}
{% hint style="info" %}
Create a connection to the Arlojet dataset, then ingest the schema.
{% endhint %}

x
{% endtab %}
{% endtabs %}

***

Follow the steps below to connect and ingest the schema metadata:

{% tabs %}
{% tab title="Ingest Metadata" %}
{% hint style="info" %}

#### Test Connection and Ingest Metadata Schema

After you have specified the detailed information according to your data source type, test the connection to the data source and add the data source.
{% endhint %}

1. Enter the following details to connect to: MySQL arlojet database.

<table><thead><tr><th width="226">Field</th><th>Setting</th></tr></thead><tbody><tr><td>Data Source Name</td><td>mysql:arlojet</td></tr><tr><td>Data Source ID</td><td>Leave Blank to autogenerate</td></tr><tr><td>Description</td><td>Demo dataset  of airline / passenger data</td></tr><tr><td>Data Source Type</td><td>MySQL</td></tr><tr><td>Affinity</td><td>Default</td></tr><tr><td>Configuration Method</td><td>Credentials</td></tr><tr><td>   Username</td><td>sqlreader</td></tr><tr><td>   Password</td><td>2Petabytes</td></tr><tr><td>   Host</td><td>pdc.pdc.lab</td></tr><tr><td>   Port</td><td>3306</td></tr><tr><td>Driver</td><td>mysql-connector-j-8.2.0.jar</td></tr></tbody></table>


2. Click Test Connection to test your connection to the specified data source.

{% hint style="info" %}
Before you finalize and save your new data source configuration, you need to perform a process called Ingest Schemas. This process loads fundamental information about the database schema and related metadata into the system.
{% endhint %}

3. Click Ingest Schema, select the 'arlojet' schema, and then click Ingest Schemas.

<figure><img src="/files/8WWFrXiUTjJyu7MLxIX1" alt=""><figcaption><p>Select arlojet schema</p></figcaption></figure>

{% hint style="info" %}
Although you can select all schemas, it is recommended to avoid selecting certain system-related schemas that are unnecessary for your needs.
{% endhint %}

4. (Optional) Enter a Note for any information you need to share with others who might access this data source.
5. Click Create Data Source to establish your data source connection.

<figure><img src="/files/TJEL3DMQzuarmkhoyfXv" alt=""><figcaption><p>mysql:arlojet connection</p></figcaption></figure>
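If Test Connection fails, one thing worth ruling out is basic network reachability of the database port. The following sketch, which is not part of Data Catalog, simply attempts a TCP connection. It says nothing about credentials or drivers; it only shows whether something is listening on the given host and port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Host and port from the MySQL table above; adjust for your environment.
    print(port_is_open("pdc.pdc.lab", 3306))
```

The same check works for the other demo sources by swapping in their ports (5432 for PostgreSQL, 1433 for Microsoft SQL Server).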

{% endtab %}

{% tab title="MySQL Workbench" %}

#### Install & Configure MySQL Workbench

MySQL Workbench is a graphical MySQL database management tool.

1. Ensure all the existing packages are up-to-date.

```bash
sudo apt update && sudo apt upgrade
```

2. Install MySQL Workbench.

```bash
sudo snap install mysql-workbench-community
```

***

#### Connect to MySQL Workbench

1. Select “Applications” from the menu.
2. Search for the MySQL Workbench application, and then launch it.
3. Edit the default connection.

<figure><img src="/files/07TNE5u1JhhRX61qvNVV" alt=""><figcaption><p>Default connection</p></figcaption></figure>

4. Enter the following connection details:

<table><thead><tr><th width="205"></th><th></th></tr></thead><tbody><tr><td>Connection Name</td><td>arlojet</td></tr><tr><td>   Username</td><td>sqlreader</td></tr><tr><td>   Password</td><td>2Petabytes</td></tr><tr><td>Default Schema</td><td>arlojet</td></tr></tbody></table>

5. Click 'Test Connection'.

<figure><img src="/files/SRCqyRc67x5etLxVzOMN" alt=""><figcaption><p>Test Connection</p></figcaption></figure>

6. Click Close.

***

#### Connect to Arlojet database

1. Check for arlojet database.

<figure><img src="/files/Ms5lTJNpK8XfW5hLlXAQ" alt=""><figcaption><p>SHOW DATABASES</p></figcaption></figure>

2. Select the option: Schemas & expand Tables

<figure><img src="/files/Q6ztZ6A3xXE5C3UNHYCk" alt=""><figcaption><p>View data</p></figcaption></figure>
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="5. Object Store" %}
MinIO is a high-performance, Kubernetes-native object storage service that is designed for cloud-native and containerized applications. It is open-source and allows enterprises to build Amazon S3-compatible data storage solutions on-premises, integrating smoothly with a wide range of cloud-native ecosystems.

* **Banking -** Chat bot data.
* **Football -**
* **IoT Sensor -**

<figure><img src="/files/pC6E93wSg4XJbz6zZV5w" alt=""><figcaption><p>MinIO Object Store</p></figcaption></figure>

{% hint style="warning" %}
To watch the videos please copy and paste the website URL into your **host** Chrome browser.
{% endhint %}

{% tabs %}
{% tab title="English" %}
{% hint style="info" %}
Create a connection to the datasets in the MinIO object store, then scan the data.
{% endhint %}

x
{% endtab %}
{% endtabs %}

***

Follow the steps below to connect and ingest the schema metadata:

{% tabs %}
{% tab title="Ingest Metadata" %}

#### Test Connection and Ingest Metadata Schema

After you have specified the detailed information according to your data source type, test the connection to the data source and add the data source.

<table><thead><tr><th width="189">Field</th><th>Description</th></tr></thead><tbody><tr><td>Affinity</td><td>This default setting specifies which agents should be associated with the data source in a multi-agent deployment.</td></tr><tr><td>Region</td><td>Geographical location where AWS maintains a cluster of data centers.</td></tr><tr><td>Endpoint</td><td>Location of the bucket. For example, s3.&#x3C;region containing S3 bucket>.amazonaws.com</td></tr><tr><td>Access Key</td><td>User credential to access data on the bucket.</td></tr><tr><td>Secret Key</td><td>Password credential to access data on the bucket.</td></tr><tr><td>Bucket Name</td><td><p>The name of the S3 bucket in which the data resides. For S3 access from non-EMR file systems, Data Catalog uses the AWS command line interface to access S3 data.</p><p>These commands send requests using access keys, which consist of an access key ID and a secret access key.</p><p>You must specify the logical name for the cluster root.</p><p>This value is defined by dfs.nameservices in the hdfs-site.xml configuration file.</p><p>For S3 access from AWS S3 and MapR file systems, you must identify the root of the MapR file system with maprfs:///.</p></td></tr><tr><td>Path</td><td>Directory where this data source is included.</td></tr></tbody></table>

1. Enter the following details to connect to: minIO 'Banking' Object Store.

<table><thead><tr><th width="226">Field</th><th>Setting</th></tr></thead><tbody><tr><td>Data Source Name</td><td>minIO:sensor</td></tr><tr><td>Data Source ID</td><td>Leave Blank to autogenerate</td></tr><tr><td>Description</td><td>Demo IoT sensor-data</td></tr><tr><td>Data Source Type</td><td>AWS S3</td></tr><tr><td>Affinity</td><td>Default</td></tr><tr><td>Region</td><td>us-east-1</td></tr><tr><td>Endpoint</td><td><a href="http://172.17.0.1:9000">http://172.17.0.1:9000</a></td></tr><tr><td>Access Key</td><td>minioadmin</td></tr><tr><td>Secret Key</td><td>minioadmin</td></tr><tr><td>Bucket Name</td><td>iot-sensors-data-lake</td></tr><tr><td>Path</td><td>/</td></tr></tbody></table>


2. Click Test Connection to test your connection to the specified data source.

{% hint style="info" %}
Before you finalize and save your new data source configuration, you need to perform a process that scans the datasource. This process loads metadata and related information into the system.
{% endhint %}

3. Click: Scan Files.

{% hint style="info" %}
Because this is a flat-file data source, a 'Lite' scan retrieves only the file metadata.
{% endhint %}

<figure><img src="/files/Vih0cDIuf1hxDUdtM4Vg" alt=""><figcaption><p>Scans</p></figcaption></figure>

4. (Optional) Enter a Note for any information you need to share with others who might access this data source.
5. Click Create Data Source to establish your data source connection.

<figure><img src="/files/7p5GltQOBgGpzuV6EVTD" alt=""><figcaption><p>minIO:banking connection</p></figcaption></figure>
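The Endpoint field is what distinguishes an S3-compatible store such as MinIO from AWS itself. As a small sketch of that choice, the hypothetical helper below returns a custom endpoint when one is supplied and otherwise falls back to the regional `s3.<region>.amazonaws.com` form described in the field table.

```python
from typing import Optional


def s3_endpoint(region: str, custom_endpoint: Optional[str] = None) -> str:
    """Return the custom endpoint if given, else the regional AWS S3 host name."""
    if custom_endpoint:
        return custom_endpoint
    return f"s3.{region}.amazonaws.com"


print(s3_endpoint("us-east-1"))                            # AWS-hosted bucket
print(s3_endpoint("us-east-1", "http://172.17.0.1:9000"))  # workshop MinIO server
```

In this workshop the region is still filled in, but the MinIO address in the Endpoint field is what the connection actually uses.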
{% endtab %}

{% tab title="Log into MinIO" %}

#### MinIO

The MinIO Console displays a login screen for unauthenticated users. The Console defaults to providing a username and password prompt for a MinIO-managed user.

1. Either click on the bookmark or enter the following URL to log in to MinIO.

{% embed url="<http://172.17.0.1:9001/login>" %}
Link to MinIO
{% endembed %}

<figure><img src="/files/1w7pDPvXxWgbFP45cb2A" alt=""><figcaption><p>minIO - Log In</p></figcaption></figure>

| Username | minioadmin |
| -------- | ---------- |
| Password | minioadmin |

***

#### Managing Objects

The Object Browser lists the buckets and objects the authenticated user has access to on the deployment.

After logging in or navigating to the tab, the object browser displays a list of the user’s buckets, which the user can filter.

1. Select 'Buckets' from the left hand menu.
2. Browse 'banking-data' bucket to show a list of objects in the bucket.

<figure><img src="/files/pC6E93wSg4XJbz6zZV5w" alt=""><figcaption><p>MinIO - Buckets</p></figcaption></figure>

3. Highlight: banking77.csv

<figure><img src="/files/Q847hNC5Y1IH57u7yVjP" alt=""><figcaption><p>CSV file</p></figcaption></figure>

{% hint style="info" %}
The user can perform actions on the bucket’s objects, depending on the policies and permissions that apply. Example actions the user may be able to perform include:

• Rewind to a previous version

• Create prefixes

• View deleted objects

• Upload objects

• Download objects

• Share

• Preview

• Manage legal holds

• Manage retention

• Manage tags

• Inspect

• Display versions

• Delete
{% endhint %}

{% embed url="<https://min.io/docs/minio/container/administration/minio-console.html>" %}
Link to MinIO documentation
{% endembed %}
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="6. Snowflake" %}
{% hint style="info" %}

{% endhint %}

x

x

x

x

x

{% endtab %}
{% endtabs %}

