Prerequisite tasks

Configure Google Colab and Pentaho Data Integration (PDI) for machine learning.


You will set up:

  • Python (and common ML libraries)

  • A Google Colab account

  • R (optional) and rJava (for R steps in PDI)

  • PDI environment variables for R integration

Google Colab

Colab runs Jupyter notebooks in your browser. It includes preconfigured runtimes for common ML libraries.



These steps configure your environment to run ML pipelines in PDI.

Linux (Ubuntu/Debian)

Step 1: Install or verify Python

  1. Update packages:

  2. Verify Python:
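
The version check can also be made from inside Python. The following is a minimal sketch: the `MIN_VERSION` value of 3.8 and the helper name `python_is_recent_enough` are assumptions for illustration, since this page does not state a required version.

```python
import sys

# Assumption: (3, 8) is an illustrative minimum only; this page does not
# state which Python version PDI or the ML libraries require.
MIN_VERSION = (3, 8)


def python_is_recent_enough(version_info=None, minimum=MIN_VERSION):
    """Return True when the given (or running) interpreter meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) so patch releases do not matter.
    return tuple(version_info[:2]) >= tuple(minimum[:2])


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("Meets minimum:", python_is_recent_enough())
```

Adjust `MIN_VERSION` to whatever your PDI and library versions actually require.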

Optional: install a newer Python version (deadsnakes PPA)

Only do this if you must upgrade Python.

If you need multiple versions, use update-alternatives:

Step 2: Install ML libraries (venv)

Install these libraries:

  • h2o

  • pandas

  • numpy

  • matplotlib

  • py4j
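
Once the libraries are installed in the virtual environment, a quick check from Python can confirm that the interpreter can find them. This is a minimal sketch using the standard-library `importlib.util.find_spec`; it assumes each package's import name matches the name in the list above, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Package names are taken from the list above.
REQUIRED = ["h2o", "pandas", "numpy", "matplotlib", "py4j"]


def missing_packages(names=REQUIRED):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the virtual environment's interpreter so the result reflects that environment rather than the system Python.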

Step 3: Install R and rJava

R is required only if you plan to run R steps in PDI.

  1. Install R:

  2. Verify:

  3. Install and configure rJava:

R CMD javareconf detects the current Java installation on your system and updates R's configuration files to match it.

  4. Install rJava from source:

  5. Verify rJava:
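
The rJava check can also be scripted from Python, which is convenient when the same machine is being prepared for both Python and R steps. The sketch below assumes `Rscript` is on `PATH` and rJava is installed; the helper name `rjava_java_version` is hypothetical. It loads rJava, starts the JVM with `.jinit()`, and asks Java for its version through `.jcall`.

```python
import shutil
import subprocess

# R code run through Rscript: load rJava, start the JVM, print the Java version.
R_SNIPPET = (
    'library(rJava); .jinit(); '
    'cat(.jcall("java/lang/System", "S", "getProperty", "java.version"))'
)


def rjava_java_version(rscript=None):
    """Return the Java version rJava reports, or None if the check cannot run."""
    rscript = rscript or shutil.which("Rscript")
    if rscript is None:
        return None  # R is not installed or not on PATH
    try:
        result = subprocess.run(
            [rscript, "-e", R_SNIPPET],
            capture_output=True,
            text=True,
            timeout=120,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None  # for example, rJava is missing or the JVM failed to start
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = rjava_java_version()
    print("rJava sees Java", version if version else "(check failed)")
```

A `None` result only says the check failed; run the R code interactively to see the underlying error message.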

Optional: install RStudio

Use RStudio only if you want a dedicated IDE.

  1. Download a .deb from the RStudio downloads page.

  2. Install it:

Step 4: Configure PDI for R integration

Set environment variables

You can get the paths from R:

In a new terminal, edit /etc/environment and set the values for your system:

Example:

Ensure PATH includes your R bin directory (for example, /usr/lib/R/bin):

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/R/bin"
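
A small Python check can confirm that a new session sees these settings. This is a sketch under assumptions: `R_HOME` is R's standard home-directory variable and is assumed to be among the values you set, and `/usr/lib/R/bin` is the directory from the `PATH` example above. Adjust both for your system; `check_r_environment` is a hypothetical helper name.

```python
import os


def check_r_environment(env=None, r_bin="/usr/lib/R/bin"):
    """Report whether R_HOME is set and the R bin directory is on PATH."""
    if env is None:
        env = os.environ
    # /etc/environment is a Linux file, so PATH entries are colon-separated.
    entries = [p.rstrip("/") for p in env.get("PATH", "").split(":") if p]
    return {
        "R_HOME set": bool(env.get("R_HOME")),
        "R bin on PATH": r_bin.rstrip("/") in entries,
    }


if __name__ == "__main__":
    for name, ok in check_r_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Passing a dictionary as `env` lets you test a candidate configuration before writing it to /etc/environment.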

Copy libjri.so into Spoon’s native lib directory:
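
The copy can be scripted so that a missing source file or target directory fails loudly. This is a minimal sketch: both paths are parameters because they differ per installation, and `copy_libjri` is a hypothetical helper name. Here `jri_dir` is the `jri` directory inside your installed rJava package, and `spoon_native_dir` is Spoon's native library directory for your platform.

```python
import shutil
import sys
from pathlib import Path


def copy_libjri(jri_dir, spoon_native_dir):
    """Copy libjri.so from rJava's jri directory into Spoon's native lib directory."""
    source = Path(jri_dir) / "libjri.so"
    target_dir = Path(spoon_native_dir)
    if not source.is_file():
        raise FileNotFoundError(f"libjri.so not found in {jri_dir}")
    if not target_dir.is_dir():
        raise NotADirectoryError(f"target directory does not exist: {target_dir}")
    # copy2 preserves file metadata such as timestamps and permission bits.
    return Path(shutil.copy2(source, target_dir / source.name))


if __name__ == "__main__":
    # Usage: python copy_libjri.py <jri_dir> <spoon_native_dir>
    if len(sys.argv) == 3:
        print("Copied to", copy_libjri(sys.argv[1], sys.argv[2]))
    else:
        print("usage: copy_libjri.py <jri_dir> <spoon_native_dir>")
```

Run it with write access to the PDI installation directory.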

Step 5: Validate with a simple R transformation

  1. Start PDI:

  2. Create a transformation with an R Script Executor step:

  3. Use this script:

  4. Click Test Script:

