# Jupyter Notebook

{% hint style="info" %}

#### **Jupyter Notebook**

Jupyter Notebook is an open-source web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text. It grew out of the IPython project (the name refers to Julia, Python, and R) and now supports over 40 programming languages.
{% endhint %}

<figure><img src="/files/E2nC6gmHjRNO6dIGX9Nn" alt=""><figcaption><p>Jupyter Notebook</p></figcaption></figure>

{% tabs %}
{% tab title="Linux" %}
{% hint style="info" %}

#### Jupyter Notebook - Docker

Running Jupyter Notebook in a Linux Docker container provides a portable, reproducible environment for interactive data science and development work. By pulling an official Jupyter Docker image (such as `jupyter/scipy-notebook` or `jupyter/minimal-notebook`) or building a custom one from a base Linux image with Jupyter installed via pip, you can launch a fully functional notebook server isolated from your host system.

The container typically exposes port 8888, which you map to your host using `docker run -p 8888:8888`, and Jupyter outputs a token-based URL for browser access. This approach ensures consistent dependencies across machines, simplifies environment setup, and allows easy teardown and recreation - making it ideal for reproducible research, team collaboration, and CI/CD pipelines where environment parity matters.
{% endhint %}

{% tabs %}
{% tab title="1. Directories" %}
{% hint style="info" %}

#### Directories

**`~/Jupyter-Notebook/`**

* **`datasets/`**
  * `sales_data.csv` — Sample sales transaction data for analysis exercises
  * `orders.csv` — Sample order data used in workshop scenarios
* **`notebooks/`**
  * `sales_analysis.ipynb` — Jupyter notebook with sales data analysis examples
  * `welcome.ipynb` — Introductory notebook to verify the environment is working
* **`pdi-output/`**
  * `README.md` — Placeholder documenting the purpose of this directory for PDI output files
* **`reports/`** — Directory for generated report outputs
* **`scripts/`**
  * `docker-compose.yml` — Defines the Jupyter container configuration, port mappings, and volume mounts
  * `run-docker-jupyter.sh` — Shell script to start, stop, and check status of the Jupyter container (auto-detects `docker compose` vs legacy `docker-compose`)
  * `file_watcher.py` — Python script that monitors directories for file changes
  * `post-start.sh` — Script that runs automatically after the container starts for additional setup tasks
* **`workshop-data/`** — Directory for additional workshop-related data files
  {% endhint %}

1. Update APT package index.

```bash
sudo apt update
```

2. Run setup script.

```bash
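# Change to the Linux setup directory (the backslash escapes the space in the folder name)
cd ~/Workshop--Data-Integration/Setup/Jupyter\ Notebook/linux/
# Make the setup script executable, then run it
chmod +x ./copy-jupyter.sh && ./copy-jupyter.sh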
```

<figure><img src="/files/W01O9jnDBPrgFtkMGmy5" alt=""><figcaption><p>copy-jupyter.sh</p></figcaption></figure>

{% hint style="info" %}
**What the script does:**

1. Creates `~/Jupyter-Notebook/` with sub-directories: `datasets/`, `notebooks/`, `pdi-output/`, `reports/`, `scripts/`, `workshop-data/`
2. Copies `sales_data.csv` and `orders.csv` into `datasets/`
3. Copies `sales_analysis.ipynb` and `welcome.ipynb` into `notebooks/`
4. Copies `docker-compose.yml`, `run-docker-jupyter.sh`, `file_watcher.py`, and `post-start.sh` into `scripts/`
5. Creates a `README.md` inside `pdi-output/`
6. Sets correct file permissions (755 for scripts, 644 for data)
   {% endhint %}
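
To confirm that the layout listed above is in place after the script finishes, a small check along the following lines can help. This is a sketch and not part of the workshop scripts: the function name `check_layout` is hypothetical, and the list of sub-directories is taken from the directory listing on this page.

```bash
#!/bin/sh
# Hypothetical helper (not part of the workshop scripts): verify that the
# expected sub-directories exist under a base directory.
check_layout() {
    base="${1:-$HOME/Jupyter-Notebook}"
    missing=0
    for dir in datasets notebooks pdi-output reports scripts workshop-data; do
        if [ ! -d "$base/$dir" ]; then
            echo "missing: $base/$dir"
            missing=1
        fi
    done
    # Return 0 when every directory was found, 1 otherwise
    return "$missing"
}

# Usage: check the default location and report the result
if check_layout "$HOME/Jupyter-Notebook"; then
    echo "layout OK"
else
    echo "layout incomplete"
fi
```

Any `missing:` line points at a directory the copy step did not create, which usually means the setup script should be run again.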
   {% endtab %}

{% tab title="2. Notebook Docker Container" %}
{% hint style="info" %}

#### Notebook Docker Container

The `run-docker-jupyter.sh` script locates the `docker-compose.yml` file inside the `scripts/` subdirectory of the Jupyter working directory, then delegates all container management to Docker Compose. It auto-detects whether the system uses the modern `docker compose` plugin or the legacy standalone `docker-compose` binary.
{% endhint %}
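
The auto-detection described above can be sketched as follows. This is an assumption about how such a check could be written, not a copy of `run-docker-jupyter.sh`; the function name `detect_compose` and the variable names are hypothetical. The usage part only prints the command it would run.

```bash
#!/bin/sh
# Sketch of compose auto-detection (an assumption about how such a script
# could work; the actual run-docker-jupyter.sh may differ).
detect_compose() {
    if docker compose version >/dev/null 2>&1; then
        echo "docker compose"      # modern plugin (Compose v2)
    elif command -v docker-compose >/dev/null 2>&1; then
        echo "docker-compose"      # legacy standalone binary
    else
        return 1                   # neither variant is installed
    fi
}

# Usage: resolve the command, then point it at the compose file.
COMPOSE_FILE_PATH="$HOME/Jupyter-Notebook/scripts/docker-compose.yml"
if COMPOSE_CMD="$(detect_compose)"; then
    echo "Would run: $COMPOSE_CMD -f $COMPOSE_FILE_PATH up -d"
else
    echo "Docker Compose not found" >&2
fi
```

Checking the plugin first means that systems with both variants installed use the newer one.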

1. Start the Jupyter Notebook container.

<pre class="language-bash"><code class="lang-bash"># Navigate to the scripts directory, make the script executable, and start the container
<strong>cd ~/Jupyter-Notebook/scripts/
</strong>chmod +x run-docker-jupyter.sh &#x26;&#x26; ./run-docker-jupyter.sh start
</code></pre>

{% hint style="info" %}
**What happens:**

* Docker pulls the `jupyter/scipy-notebook:latest` image (first time only, \~3 GB)
* Creates and starts a container named `jupyter-datascience`
* Maps port 8888 on the host to port 8888 in the container
* Bind-mounts the host directories into the container
  {% endhint %}
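
The bullets above map roughly onto a single `docker run` command, which the sketch below prints without executing. It rests on assumptions: the container path `/home/jovyan/work` follows the Jupyter Docker Stacks convention, passing the token through the `JUPYTER_TOKEN` environment variable is one way to set it, and only two of the host directories are shown. The workshop's `docker-compose.yml` may mount more directories or configure the token differently. The function name `build_run_cmd` is hypothetical.

```bash
#!/bin/sh
# Print (not run) a `docker run` command that approximates the behaviour
# described above. Mount paths and the token mechanism are assumptions.
HOST_DIR="$HOME/Jupyter-Notebook"

build_run_cmd() {
    # $1: host directory that holds notebooks/ and datasets/
    printf '%s ' \
        docker run -d \
        --name jupyter-datascience \
        -p 8888:8888 \
        -e JUPYTER_TOKEN=datascience \
        -v "$1/notebooks:/home/jovyan/work/notebooks" \
        -v "$1/datasets:/home/jovyan/work/datasets" \
        jupyter/scipy-notebook:latest
    printf '\n'
}

build_run_cmd "$HOST_DIR"
```

Keeping these settings in a compose file instead of a long command line is what lets the wrapper script offer simple `start` and `status` subcommands.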

<figure><img src="/files/tNqXoYwTH2Vt9hwiIUZR" alt=""><figcaption><p>run-docker-jupyter.sh</p></figcaption></figure>

2. Verify the container is running.

```bash
# Check container status
./run-docker-jupyter.sh status

# Or use Docker directly
docker ps --filter name=jupyter-datascience
```

{% hint style="info" %}
You should see the container listed with status `Up` and port `0.0.0.0:8888->8888/tcp`.
{% endhint %}
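
If the check needs to run from another script rather than by eye, the same `docker ps` filter can be wrapped in a small function. This is a sketch; the function name `container_running` is hypothetical, and it relies only on the `-q`, `--filter name=` and `--filter status=` options of `docker ps`.

```bash
#!/bin/sh
# Hypothetical helper: succeed only if a running container matches the name.
container_running() {
    # -q prints just container IDs; an empty result means no running match
    id="$(docker ps -q --filter "name=$1" --filter "status=running" 2>/dev/null || true)"
    [ -n "$id" ]
}

if container_running jupyter-datascience; then
    echo "jupyter-datascience is running"
else
    echo "jupyter-datascience is not running"
fi
```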

***

**Access JupyterLab.**

1. Open your web browser and navigate to:

{% embed url="<http://localhost:8888>" %}

{% hint style="info" %}
When prompted:

* **Token:** `datascience`
* **Password:** Set a password of your choice (optional)

You should see the JupyterLab interface with the mounted directories visible in the file browser sidebar.
{% endhint %}

<figure><img src="/files/a7ZP94MrRfzW4JqzXHN8" alt=""><figcaption><p>Enter Token &#x26; Password: datascience / password</p></figcaption></figure>

<figure><img src="/files/OtBHSYjwjFPdu3Y2nBaI" alt=""><figcaption><p>Jupyter Notebook</p></figcaption></figure>
{% endtab %}
{% endtabs %}
{% endtab %}

{% tab title="Windows" %}
1. Create a Jupyter Notebook folder and copy the required files.

{% hint style="info" %}
**Create directory & copy - PowerShell**

```powershell
cd \
& "C:\Workshop--Data-Integration\Setup\Jupyter-Notebook\copy-jupyter.ps1"
```

{% endhint %}

2. Check that the directory has been created and the files have been copied over.
3. Run the `run-docker-jupyter.ps1` script to create the container.

{% hint style="info" %}
**Jupyter Notebook Container - PowerShell**

```powershell
# Change to the scripts directory
cd "C:\Jupyter-Notebook\scripts"
.\run-docker-jupyter.ps1
```

{% endhint %}

4. Check that the container is up and running in Docker Desktop.

<figure><img src="/files/wzCZkePAoi26No0za2Np" alt=""><figcaption><p>Deploy container - condensed screenshot</p></figcaption></figure>

5. Log in to the Jupyter Notebook UI:

{% embed url="<http://localhost:8888/lab>" %}
Link to Jupyter Notebook UI
{% endembed %}

6. When prompted, enter:

* **Token:** `datascience`
* **Password:** `password`

<figure><img src="/files/q0GG7SrqaSvVxXreK5or" alt=""><figcaption><p>Enter Token and set password</p></figcaption></figure>

<figure><img src="/files/GwiVtYPtySBC1AtZc0Xp" alt=""><figcaption><p>Jupyter UI</p></figcaption></figure>

***

#### Test

1. Expand the `notebooks` folder and open the `welcome.ipynb` notebook.

<figure><img src="/files/pN8CxtdmuntsE6pb2bf3" alt=""><figcaption><p>Open: welcome.ipynb</p></figcaption></figure>

{% hint style="info" %}
Check that a Python kernel is available.

To set it as the default, click the Python 3 kernel | idle link:

<img src="/files/eebYYPNvbiZFLd8Wh3jY" alt="" data-size="original">
{% endhint %}

<figure><img src="/files/9tsZV8UeLegeo4snJG2e" alt=""><figcaption><p>welcome.ipynb</p></figcaption></figure>
{% endtab %}
{% endtabs %}

