
# Jupyter Notebook

{% hint style="info" %}

#### **Jupyter Notebook**

Jupyter Notebook's cell-based interface is well suited to consuming and analyzing data processed through Pentaho Data Integration (PDI). Data scientists run code one cell at a time and see the results directly below each executed cell, so they can explore and visualize PDI outputs immediately while documenting their analysis in Markdown cells. This makes Jupyter a natural downstream tool for PDI's data preparation work, taking engineered datasets into advanced analytics and model development.
{% endhint %}

<figure><img src="/files/3mhqfMbQaO6GAlCyqRdR" alt=""><figcaption><p>Jupyter Notebook</p></figcaption></figure>

{% hint style="info" %}
Combining PDI with Jupyter Notebook divides enterprise data science work between two specialized tools. PDI serves as the data preparation engine, handling data blending, cleansing, and feature engineering in pipelines that can be scaled and deployed to production environments.

The prepared datasets are then loaded into Jupyter Notebook, where data scientists can focus on their core expertise: model exploration, hyperparameter tuning, and advanced machine learning techniques. The notebook format complements PDI's structured outputs by providing an interactive workspace for hypothesis testing, visualization, and iterative model development.

This PDI-to-Jupyter workflow separates concerns between teams. Data engineers can optimize data pipelines in PDI while data scientists develop models in Jupyter in parallel, using previously processed datasets, which shortens delivery time.

Solution quality improves because each task is handled by the tool designed for it, and collaboration improves because PDI's standardized outputs can be shared and consumed across multiple Jupyter environments. Most importantly, the integration reduces the data preparation burden on data scientists, giving them more time for advanced analytics while ensuring that data engineering work is reused throughout the organization's analytical workflows.
{% endhint %}
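
As an illustration of the hand-off described above, the following is a minimal sketch of a notebook cell that reads a delimited file prepared by PDI and computes a quick summary. It rests on several assumptions that are not part of this page: the transformation is assumed to write a semicolon-delimited text file with a header row, the column names (`customer_id`, `region`, `order_total`) are invented for the example, and the file contents are inlined as a string so the cell runs on its own.

```python
import csv
import io
from statistics import mean

# Hypothetical stand-in for a file written by a PDI transformation.
# The delimiter and all column names are assumptions for this sketch.
PDI_OUTPUT = """customer_id;region;order_total
1001;EMEA;250.00
1002;APAC;120.50
1003;EMEA;99.50
"""


def load_pdi_rows(source, delimiter=";"):
    """Read delimited rows into a list of dicts, converting order_total to float."""
    reader = csv.DictReader(source, delimiter=delimiter)
    return [dict(row, order_total=float(row["order_total"])) for row in reader]


def mean_by_region(rows):
    """Group rows by region and return the mean order_total per region."""
    totals = {}
    for row in rows:
        totals.setdefault(row["region"], []).append(row["order_total"])
    return {region: mean(values) for region, values in totals.items()}


# In a real notebook, `source` would be an open file handle on the PDI output.
rows = load_pdi_rows(io.StringIO(PDI_OUTPUT))
summary = mean_by_region(rows)
print(summary)
```

In practice, a notebook would more likely load the file with `pandas.read_csv(path, sep=";")` and continue from the resulting DataFrame. The standard library is used here only to keep the sketch free of external dependencies.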

