PDI to Jupyter Notebook
Workshop - PDI to Jupyter Notebook
This workshop demonstrates how to create a Pentaho Data Integration (PDI) pipeline that processes sales data and automatically triggers analysis in Jupyter Notebook when the output file is saved.
The topics we're going to cover:
Creating a Jupyter Notebook
Installing required Python packages: jupyter, watchdog, xlsxwriter
Creating a PDI pipeline: sales_data.csv file
Creating a File Watcher script
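To make the file-watching idea concrete, here is a minimal sketch that waits for a file to appear or change. It is a standard-library polling stand-in, not the watchdog-based script the workshop builds, and the names wait_for_change, poll_seconds and timeout are hypothetical.

```python
import os
import time


def wait_for_change(path, last_mtime=None, poll_seconds=1.0, timeout=None):
    """Poll `path` until it exists with a modification time other than `last_mtime`.

    Returns the new modification time, or None if `timeout` seconds pass first.
    All names here are illustrative; the workshop itself uses watchdog.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            mtime = os.stat(path).st_mtime
        except FileNotFoundError:
            mtime = None  # the pipeline has not written the file yet

        if mtime is not None and mtime != last_mtime:
            return mtime  # file appeared or was rewritten

        if deadline is not None and time.monotonic() >= deadline:
            return None  # gave up waiting

        time.sleep(poll_seconds)
```

A caller would loop on wait_for_change, passing back the last returned value, and start the analysis each time a new modification time comes back. How the analysis is started (for example by executing the notebook from the command line) is left open here, since the text does not specify it.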



Please ensure you have completed the following setup: Jupyter Notebook.
Remember that Jupyter Notebook is running in a Docker container, so commands and file paths refer to the container unless stated otherwise.
Install required Python packages:
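The install command is not shown on this page; it would typically be something like pip install jupyter watchdog xlsxwriter, run inside the container (an assumption). After installing, a quick check along these lines can confirm the packages are importable. The function name missing_modules is hypothetical, and the import names jupyter, watchdog and xlsxwriter are assumed to match the package names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None  # None means "not installed"
    ]
```

Calling missing_modules(["jupyter", "watchdog", "xlsxwriter"]) in a notebook cell should return an empty list when everything is installed.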
Check that test_sales_data.csv and sales_analysis.ipynb are present (still inside the container):
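One way to run this check from Python is sketched below. The helper name missing_files is hypothetical, and the directory to inspect depends on how the container is set up, so it is passed in rather than assumed.

```python
from pathlib import Path


def missing_files(directory, filenames):
    """Return the names from `filenames` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in filenames if not (base / name).is_file()]
```

For example, missing_files(".", ["test_sales_data.csv", "sales_analysis.ipynb"]) returns an empty list when both files are in the current working directory.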
Open the sales_analysis.ipynb notebook and run each section in order.

Check for reports: C:\Jupyter-Notebook\reports\sales_analysis_timestamp.xlsx

Check you have 2 sheets: Summary & Detailed Data.
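The sheet check can also be done without opening Excel. The sketch below relies on the fact that an .xlsx file is a ZIP archive whose xl/workbook.xml lists the sheets; it uses only the standard library. The function name sheet_names is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the main SpreadsheetML workbook part.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def sheet_names(xlsx_path):
    """Return the worksheet names of an .xlsx file, in workbook order."""
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(MAIN_NS + "sheet")]
```

For a report generated by the notebook, the result would be expected to contain Summary and Detailed Data.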
Start Pentaho Data Integration.
Create a New Transformation:
Drag & drop a CSV File input step onto the canvas.
Double-click the step and configure its properties.
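Before filling in the step's delimiter and field list, it can help to look at the file itself. The sketch below guesses the delimiter from the first line and returns the header fields. It is an illustrative helper, not part of PDI; the name inspect_csv, the candidate delimiters and the UTF-8 default are all assumptions.

```python
import csv


def inspect_csv(path, candidates=",;\t|", encoding="utf-8"):
    """Guess the delimiter of a CSV file and return (delimiter, header_fields).

    The delimiter is the candidate character that occurs most often in the
    first line; ties fall back to the earliest candidate (comma by default).
    """
    with open(path, newline="", encoding=encoding) as handle:
        first_line = handle.readline()
        delimiter = max(candidates, key=first_line.count)
        handle.seek(0)  # re-read from the start with the csv parser
        header = next(csv.reader(handle, delimiter=delimiter), [])
    return delimiter, header
```

Running inspect_csv on sales_data.csv gives values to compare against what the CSV File input step detects.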
