
PDI to Jupyter Notebook

Workshop - PDI to Jupyter Notebook

Pipeline

Quick overview of the pipeline:

  • Execute a PDI pipeline on the sample sales_data.csv from the datasets folder

  • The file written to the pdi-output folder triggers the Jupyter Notebook, which loads the CSV files from pdi-output, then analyzes and visualizes the results

  • Export the results to the reports folder
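
The trigger step in this overview can be sketched as a simple polling loop. This is a minimal sketch using only the Python standard library, not the workshop's own watcher (the container installs watchdog, which is presumably what the workshop relies on). The function names find_new_csvs and watch_folder, the poll interval, and the max_polls parameter are hypothetical.

```python
import time
from pathlib import Path


def find_new_csvs(folder, seen):
    """Return CSV files in `folder` not yet in `seen`, and record them in `seen`."""
    new_files = sorted(p for p in Path(folder).glob("*.csv") if p not in seen)
    seen.update(new_files)
    return new_files


def watch_folder(folder, on_new_file, poll_seconds=2.0, max_polls=None):
    """Poll `folder` and call `on_new_file(path)` once for each CSV that appears.

    `max_polls=None` polls forever; a number stops after that many polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_csvs(folder, seen):
            on_new_file(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)


# Example (assumed folder name from the overview above):
# watch_folder("pdi-output", lambda p: print(f"New file: {p}"))
```

Polling can pick up a file while PDI is still writing it, so a real watcher would likely wait until the file size stops changing before loading it.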


Create a new Transformation

Either of these actions opens a new Transformation tab where you can begin designing your transformation:

  • Click File > New > Transformation

  • Press the CTRL-N hot key



Setup Verification

Before building the PDI pipeline, verify everything works by running the sample notebook.

  1. Verify Python Packages are Installed.

Python packages (watchdog, xlsxwriter) are automatically installed when the container starts via the post-start.sh startup script.

If packages are missing, check the container logs: docker logs jupyter-datascienc
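
The package check can also be done from a notebook cell. The following is a minimal sketch that assumes the two packages import under the names watchdog and xlsxwriter; the helper name check_packages is hypothetical.

```python
import importlib.util


def check_packages(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Packages the startup script is expected to install.
status = check_packages(["watchdog", "xlsxwriter"])
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'MISSING'}")
```

This only confirms that the modules can be located, not that they import without errors.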

  2. Verify Test Files Exist (Inside the Container).
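
One way to run this check is from a notebook cell inside the container. This is a minimal sketch: the dataset path is the one the sales analysis notebook loads from, while the notebook path under /home/jovyan/notebooks/ is an assumption, and the helper name missing_files is hypothetical.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths (as strings) that do not exist as regular files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


expected = [
    "/home/jovyan/datasets/sales_data.csv",         # dataset path used by the notebook
    "/home/jovyan/notebooks/sales_analysis.ipynb",  # assumed location of notebooks/
]
missing = missing_files(expected)
print("All test files found" if not missing else f"Missing: {missing}")
```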


Run the Sales Analysis Notebook

  1. In Jupyter Lab, navigate to notebooks/ in the file browser

  2. Open sales_analysis.ipynb

  3. Run each cell in order (Shift+Enter or use the Run menu)

  4. The notebook will:

    • Load sales_data.csv from /home/jovyan/datasets/

    • Generate a 4-panel Sales Analysis Dashboard

    • Calculate Key Metrics (revenue, average order value, profit margin)

    • Export an Excel report to /home/jovyan/reports/
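
To make the key metrics concrete, here is a minimal sketch of how they could be computed. It is not the notebook's code: it uses only the standard library, the column names order_id, revenue, and cost are hypothetical, and it assumes one row per order, with average order value defined as total revenue divided by the number of orders and profit margin as (revenue - cost) / revenue.

```python
import csv
import io


def key_metrics(rows):
    """Compute summary metrics from rows with 'revenue' and 'cost' fields (hypothetical names)."""
    revenues = []
    costs = []
    for row in rows:
        revenues.append(float(row["revenue"]))
        costs.append(float(row["cost"]))
    total_revenue = sum(revenues)
    total_cost = sum(costs)
    orders = len(revenues)
    return {
        "total_revenue": total_revenue,
        # Assumes one row per order.
        "average_order_value": total_revenue / orders if orders else 0.0,
        "profit_margin": (total_revenue - total_cost) / total_revenue if total_revenue else 0.0,
    }


# Tiny in-memory sample standing in for sales_data.csv (illustrative values only).
sample = io.StringIO("order_id,revenue,cost\n1,100.0,60.0\n2,300.0,180.0\n")
metrics = key_metrics(csv.DictReader(sample))
print(metrics)
```

With the real file, the same function could be fed csv.DictReader(open(path, newline="")) once the column names are adjusted to match the dataset.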

  5. Check the Output Report

Open the Excel file and verify it has two sheets:

  • Summary - Key metrics (Total Revenue, Average Order Value, etc.)

  • Detailed Data - Full processed dataset
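
The sheet check can also be scripted. An .xlsx file is a ZIP archive, so the sheet names can be read with the standard library alone. This is a minimal sketch that assumes the workbook part is stored at xl/workbook.xml, the conventional location; the helper name sheet_names and the report filename in the usage comment are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML workbook part.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def sheet_names(xlsx_path):
    """Return the worksheet names of an .xlsx file in workbook order."""
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(f"{MAIN_NS}sheet")]


# Usage (the filename is a placeholder; use the report the notebook exported):
# names = sheet_names("/home/jovyan/reports/<report>.xlsx")
# print(names == ["Summary", "Detailed Data"])
```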
