PDI to Jupyter Notebook

Workshop - PDI to Jupyter Notebook

Pipeline

Quick overview of the pipeline:

  • Execute a PDI pipeline on the sample sales_data.csv file from the datasets folder.

  • The pipeline writes its output file to the pdi-output folder, which triggers the Jupyter Notebook.

  • The notebook loads the CSV files from pdi-output, then analyzes and visualizes the results.

  • The results are exported to the reports folder.
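The trigger step in the overview can be sketched as follows. The workshop installs watchdog for this purpose; the sketch below swaps in simple standard-library polling, so it illustrates the idea rather than the workshop's own implementation. The function names find_new_csv_files and watch_folder, and their parameters, are hypothetical.

```python
import time
from pathlib import Path


def find_new_csv_files(output_dir, already_seen):
    """Return names of CSV files in output_dir not yet in already_seen, sorted."""
    current = {p.name for p in Path(output_dir).glob("*.csv")}
    return sorted(current - set(already_seen))


def watch_folder(output_dir, on_new_file, interval_seconds=2.0, max_polls=None):
    """Poll output_dir and call on_new_file(path) once per new CSV file.

    max_polls=None polls forever; a number stops after that many polls.
    Returns the set of file names that were handled.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for name in find_new_csv_files(output_dir, seen):
            seen.add(name)
            on_new_file(Path(output_dir) / name)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_seconds)
    return seen
```

Calling watch_folder("pdi-output", print) would print the path of each CSV file as it appears. A real callback would start the analysis instead of printing.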


Create a new Transformation

Either of these actions opens a new Transformation tab where you can begin designing your transformation:

  • Click File > New > Transformation

  • Press the CTRL-N hot key

Select the Host Docker OS:

PDI to Jupyter Notebook

Setup Verification

Before building the PDI pipeline, verify everything works by running the sample notebook.

  1. Verify Python Packages are Installed.

The post-start.sh startup script installs the Python packages (watchdog, xlsxwriter) automatically when the container starts.

If packages are missing, check the container logs: docker logs jupyter-datascienc
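A quick way to confirm the packages from a notebook cell or a Python shell inside the container is to ask the import system whether it can find them. This is a minimal sketch; the helper name missing_packages is hypothetical, and it only checks that the packages can be located, not which version is installed.

```python
import importlib.util


def missing_packages(names):
    """Return the names from the list that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # watchdog and xlsxwriter are the two packages named in this workshop
    missing = missing_packages(["watchdog", "xlsxwriter"])
    print("Missing packages:", missing if missing else "none")
```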

  2. Verify Test Files Exist (Inside the Container).
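The file check can also be done from Python inside the container. The sketch below assumes that the notebooks folder sits next to datasets under /home/jovyan. That layout is an assumption based on the paths used elsewhere in this workshop, and the helper name missing_files is hypothetical.

```python
from pathlib import Path


def missing_files(base_dir, relative_paths):
    """Return the relative paths that are not regular files under base_dir."""
    base = Path(base_dir)
    return [rel for rel in relative_paths if not (base / rel).is_file()]


if __name__ == "__main__":
    # Assumed layout: both folders live under the container home /home/jovyan
    expected = ["datasets/sales_data.csv", "notebooks/sales_analysis.ipynb"]
    print("Missing files:", missing_files("/home/jovyan", expected) or "none")
```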


Run the Sales Analysis Notebook

  1. In Jupyter Lab, navigate to notebooks/ in the file browser

  2. Open sales_analysis.ipynb

  3. Run each cell in order (Shift+Enter or use the Run menu)

  4. The notebook will:

    • Load sales_data.csv from /home/jovyan/datasets/

    • Generate a 4-panel Sales Analysis Dashboard

    • Calculate Key Metrics (revenue, average order value, profit margin)

    • Export an Excel report to /home/jovyan/reports/
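To make the Key Metrics step concrete, here is a minimal sketch of how the three metrics could be computed. It is not the notebook's own code: it uses only the standard library, and the column names order_id, revenue, and cost are hypothetical, so adjust them to the actual headers in sales_data.csv.

```python
import csv
from collections import defaultdict


def key_metrics(csv_path):
    """Compute total revenue, average order value, and profit margin.

    Assumes hypothetical columns: order_id, revenue, cost (one row per line item).
    """
    revenue_by_order = defaultdict(float)
    total_cost = 0.0
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            revenue_by_order[row["order_id"]] += float(row["revenue"])
            total_cost += float(row["cost"])

    total_revenue = sum(revenue_by_order.values())
    order_count = len(revenue_by_order)
    return {
        "total_revenue": total_revenue,
        # Average order value: revenue divided by the number of distinct orders
        "average_order_value": total_revenue / order_count if order_count else 0.0,
        # Profit margin: profit as a fraction of revenue
        "profit_margin": (total_revenue - total_cost) / total_revenue if total_revenue else 0.0,
    }
```

Profit margin is returned as a fraction (0.5 means 50%), and empty input yields zeros rather than a division error.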

  5. Check the Output Report

Open the Excel file and verify it has two sheets:

  • Summary - Key metrics (Total Revenue, Average Order Value, etc.)

  • Detailed Data - Full processed dataset
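The sheet check can also be scripted. An .xlsx file is a ZIP archive whose xl/workbook.xml lists the sheets, so the names can be read with the standard library alone. This is a sketch under the assumption that the report uses the common (transitional) spreadsheet namespace, which is what xlsxwriter writes. The helper name sheet_names is hypothetical, and the report path is left for you to supply.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by workbook.xml in standard .xlsx files (assumed here)
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def sheet_names(xlsx_path):
    """Return the sheet names of an .xlsx workbook in workbook order."""
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(MAIN_NS + "sheet")]
```

For the report described above, sheet_names(path_to_report) should return the two names Summary and Detailed Data.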
