Prerequisite Tasks
Configure Colab & Data Integration for ML

This section is for reference only.
The following tasks configure Pentaho Data Integration in a Linux environment.
Make sure all installed packages are up to date.
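As a minimal sketch of this step, assuming an APT-based distribution such as Ubuntu (the page later refers to APT and the Ubuntu repository), the update might look like the following. The function name update_packages is illustrative, and nothing runs until you call it.

```shell
# Refresh the package index, then upgrade every installed package.
# Assumption: an APT-based distribution (Debian/Ubuntu).
update_packages() {
    sudo apt update && sudo apt upgrade -y
}
```

Run update_packages from a terminal to apply the updates.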
Check to see if Python is installed.
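One way to perform this check is sketched below. The helper name check_python is hypothetical; it only reports what is on the PATH and changes nothing.

```shell
# Print the installed Python 3 version, or report that none was found.
check_python() {
    if command -v python3 >/dev/null 2>&1; then
        python3 --version
    else
        echo "python3 not found"
        return 1
    fi
}
```

A non-zero return status means that no python3 executable was found.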
Install the latest Python version
Update Python to the latest version only if required.
Install dependencies.
Import the key for the deadsnakes PPA.
Add the repository.
Refresh the package cache, then find the current Python version.
Install the latest version.
Create a symlink.
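The steps above can be sketched as a single function, assuming Ubuntu and the deadsnakes PPA. The function name, the example version number, and the symlink location /usr/local/bin/python are assumptions made for illustration and are not taken from the original page.

```shell
# Install a newer Python from the deadsnakes PPA.
# Usage: install_python_from_ppa <major.minor>   e.g. 3.12 (example value only)
install_python_from_ppa() {
    ver="${1:-}"
    [ -n "$ver" ] || { echo "usage: install_python_from_ppa <major.minor>" >&2; return 2; }

    # software-properties-common provides add-apt-repository.
    sudo apt install -y software-properties-common || return 1
    # Adding a PPA this way normally fetches its signing key as well.
    sudo add-apt-repository -y ppa:deadsnakes/ppa || return 1
    # Refresh the package cache so the new repository is visible.
    sudo apt update || return 1
    sudo apt install -y "python${ver}" || return 1
    # Assumption: expose the new interpreter as "python" via /usr/local/bin.
    sudo ln -sf "/usr/bin/python${ver}" /usr/local/bin/python
}
```

Calling install_python_from_ppa 3.12 would run the whole sequence. Check which versions the PPA currently offers before choosing a number.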
Different Python versions
The default version of Python has been set to 3.10, as required by Apache Airflow.
List the Python versions:
Set the Python version:
Then set the required version:
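Switching between installed interpreters is commonly done with update-alternatives. The sketch below assumes that tool and manages the /usr/bin/python link under the name python. Both choices, and the function names, are assumptions rather than the original commands.

```shell
# List the interpreters registered so far.
list_pythons() {
    update-alternatives --list python
}

# Register an interpreter with the alternatives system.
# Usage: register_python <major.minor> <priority>
register_python() {
    sudo update-alternatives --install /usr/bin/python python "/usr/bin/python$1" "$2"
}

# Select a registered interpreter without the interactive prompt.
# Usage: select_python <major.minor>
select_python() {
    sudo update-alternatives --set python "/usr/bin/python$1"
}
```

For example, register_python 3.10 1 followed by select_python 3.10 would make 3.10 the version behind the python command. Leaving /usr/bin/python3 untouched avoids interfering with system tools that depend on the distribution's own interpreter.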
Ensure pip is installed.
Install ML libraries.
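The two steps above might look like the sketch below. The original page does not say which ML libraries it installs, so numpy, pandas, and scikit-learn are placeholders chosen for illustration, as are the function names.

```shell
# Install pip from the distribution repository and confirm that it works.
ensure_pip() {
    sudo apt install -y python3-pip && python3 -m pip --version
}

# Install example ML libraries for the current user.
# Assumption: this package list is illustrative, not the original one.
install_ml_libraries() {
    python3 -m pip install --user numpy pandas scikit-learn
}
```

On newer Ubuntu releases pip may refuse to install outside a virtual environment, in which case creating one with python3 -m venv first is the usual workaround.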
Install R from Ubuntu Repository
Update APT packages.
Install the R base package and its dependencies.
Check version.
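The three steps above can be sketched as follows, assuming the r-base package from the Ubuntu repository. The function name install_r_base is illustrative.

```shell
# Update APT, install R and its dependencies, then print the R version.
install_r_base() {
    sudo apt update || return 1
    sudo apt install -y r-base || return 1
    R --version
}
```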
Type R and press Enter to verify that R has been installed.
Using the R command without sudo creates a personal library for your user. To install packages available to every user on the system, run the R command as root by typing sudo -i R.
Type q() to exit the R console.
Install missing dependency.
Reboot.
rJava
Check whether Java is installed. If it is, move on to step 4.
Install the Java Runtime Environment (JRE).
Install the Java Development Kit (JDK).
Update where R expects to find various Java files.
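A possible shape for these four steps is sketched below. It assumes the default-jre and default-jdk packages, which are a common choice on Ubuntu but are not named in the original. The function name is hypothetical.

```shell
# Install Java only when it is missing, then let R rediscover the Java paths.
setup_java_for_r() {
    if ! command -v java >/dev/null 2>&1; then
        # Assumption: the distribution's default JRE and JDK are acceptable.
        sudo apt install -y default-jre || return 1
        sudo apt install -y default-jdk || return 1
    fi
    # Step 4: update where R expects to find the Java files.
    sudo R CMD javareconf
}
```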
In an R terminal.
Check rJava has successfully installed.
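The same install and check can also be driven from a shell with Rscript, as in the sketch below. The CRAN mirror URL and the function name are assumptions. sudo is used so that the package lands in the system library, in line with the earlier note about running R as root.

```shell
# Install rJava system-wide, then load it and ask the JVM for its version.
install_rjava() {
    sudo Rscript -e 'install.packages("rJava", repos = "https://cloud.r-project.org")' || return 1
    Rscript -e 'library(rJava); .jinit(); print(.jcall("java/lang/System", "S", "getProperty", "java.version"))'
}
```

If the second command prints a Java version string, R was able to start a JVM through rJava.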
randomForest

In an R terminal.
Install randomForest package:
Type q() to quit the R console.
Click Yes to close the workspace image.
Close R.
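For a non-interactive alternative to the console session above, a sketch using Rscript is shown here. The mirror URL and function name are assumptions.

```shell
# Install randomForest system-wide and print the installed version.
install_random_forest() {
    sudo Rscript -e 'install.packages("randomForest", repos = "https://cloud.r-project.org")' || return 1
    Rscript -e 'print(packageVersion("randomForest"))'
}
```

Because Rscript exits on its own, there is no console to quit and no workspace prompt.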
RStudio
Visit the RStudio downloads page to grab the latest release.
Install Package.
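Assuming the download is a .deb file, installing it with APT might look like the sketch below. The helper name is hypothetical, and the file name depends on the release you downloaded.

```shell
# Install a downloaded RStudio .deb package.
# Usage: install_rstudio_deb <path-to-deb>
install_rstudio_deb() {
    deb="${1:-}"
    [ -f "$deb" ] || { echo "no such file: $deb" >&2; return 1; }
    # apt only treats the argument as a local file when it looks like a path.
    case "$deb" in
        /*) ;;
        *) deb="./$deb" ;;
    esac
    sudo apt install -y "$deb"
}
```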
Once installed, in a Terminal.

Set Environmental Variables
R_HOME
Path to the root directory of your R installation. Enter Sys.getenv("R_HOME") in the R console to get the path.
R_LIBS_USER
Path to the directory where R installs your packages.
Enter Sys.getenv("R_LIBS_USER") in the R console to get the path.
LD_LIBRARY_PATH
Used to locate shared libraries at load time, in this case libjri.so.
PATH
Append the PATH variable with the directory that contains the R executable.
In an R terminal.
Edit /etc/environment.
Copy & paste the values.
Ensure the path to the R/bin is added to PATH.
Save.
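Because /etc/environment holds plain KEY="value" lines and does not expand references such as $PATH, it can help to generate the lines first and paste them in. The helper below is a sketch; its name and argument order are assumptions, and it only prints text.

```shell
# Print /etc/environment entries for an R installation.
# Usage: format_r_environment <R_HOME> <R_LIBS_USER> <dir-with-libjri.so> <existing-PATH>
format_r_environment() {
    r_home="$1"; r_libs_user="$2"; jri_dir="$3"; base_path="$4"
    printf 'R_HOME="%s"\n' "$r_home"
    printf 'R_LIBS_USER="%s"\n' "$r_libs_user"
    printf 'LD_LIBRARY_PATH="%s"\n' "$jri_dir"
    # The R executable lives in R_HOME/bin, so append that directory to PATH.
    printf 'PATH="%s:%s/bin"\n' "$base_path" "$r_home"
}
```

Pass in the values reported by Sys.getenv("R_HOME") and Sys.getenv("R_LIBS_USER"), together with the PATH value already present in /etc/environment.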
libjri.so
Copy libjri.so to the PDI ../libswt/linux directory.
Reboot.
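The libjri.so copy described above could be scripted as in this sketch. The function name is hypothetical, and both directories are passed in because their exact locations depend on your R and PDI installations.

```shell
# Copy libjri.so from the rJava package into a PDI library directory.
# Usage: copy_libjri <jri-dir> <pdi-libswt-dir>
copy_libjri() {
    jri_dir="${1:-}"; dest_dir="${2:-}"
    [ -f "$jri_dir/libjri.so" ] || { echo "libjri.so not found in $jri_dir" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "destination $dest_dir does not exist" >&2; return 1; }
    cp "$jri_dir/libjri.so" "$dest_dir/"
}
```

The source directory can usually be found with Rscript -e 'cat(system.file("jri", package = "rJava"))'. Use sudo if the PDI directory is not writable by your user.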
Test - R
Start Pentaho Data Integration.
Create the following transformation.

Copy and paste the following R script into the R Executor step.
Click on the 'Test Script' button.

