A full example of this repository can be found on my GitHub here.
I often find myself exploring data using Python, Jupyter Lab, Pandas, and various graphing programs. This has made me want a lightweight, easy-to-use, and portable way to stand up these tools. In the past, I have used tools like pyenv to install the desired version of Python for each project. While there is nothing wrong with this approach, I generally prefer to have my project packaged up with Docker and Docker Compose. This post shows how I’ve achieved just that.
Right out of the gate, we have to decide whether we’re going to build our own base Docker image or leverage one that someone has already built and published. Fortunately, we do not have to create our own from scratch, as the Jupyter project has created a number of public images that you can choose from here. In this example, I’ve decided to use the image titled jupyter/minimal-notebook, which contains:
- Everything in jupyter/base-notebook
- Common useful utilities like curl, git, nano (actually nano-tiny), tzdata, unzip, and vi (actually vim-tiny),
- TeX Live for notebook document conversion
If I were going to leverage a GPU, I’d look at the CUDA-enabled variants, as they have CUDA preinstalled.
From there, we’re able to start writing our docker-compose.yaml file, which will be the meat of our configuration. Let’s look at the entire file, and then break it down:
version: "3"
services:
  jupyterlab:
    image: jupyter/minimal-notebook
    volumes:
      - .:/home/jovyan/work
      - ./configure_environment.sh:/usr/local/bin/before-notebook.d/configure_environment.sh
    ports:
      - 8888:8888
In this simple file, we define a single service called jupyterlab and point it to the image described above. We take advantage of two volumes entries:
- To ensure that the files we’re working on in our host OS are accessible in the container, we mount the entire project directory at /home/jovyan/work, inside the home directory of the image’s default user.
- To tie into this image’s Startup Hooks, we define and mount a bash script called configure_environment.sh. This shell script is run automatically by the container before the Jupyter Lab process starts, allowing us to perform some pre-execution steps like installing our dependencies and preparing environment variables. That bash script looks like this:
#!/bin/bash
set -eo pipefail
IFS=$'\n\t'
# From: http://redsymbol.net/articles/unofficial-bash-strict-mode/
# Not using `set -u`, as this will not play nice with the image's usage of `JUPYTER_ENV_VARS_TO_UNSET`
# To be run inside the Docker container automatically.
# Ensures that the dependencies are installed when the container spins up
pip install -r ~/work/requirements.txt
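# Append the project directory to PYTHONPATH so notebooks can import local modules
export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}/home/jovyan/work"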
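The last line of the script is what lets a notebook import modules that sit next to it in the project directory. As a rough illustration of that mechanism, the sketch below starts a fresh interpreter with a directory appended to PYTHONPATH and reports where a module was loaded from. The function name import_with_pythonpath is hypothetical and not part of the Jupyter image; it only demonstrates the effect of the environment variable.

```python
import os
import subprocess
import sys


def import_with_pythonpath(project_dir: str, module_name: str) -> str:
    """Import module_name in a fresh interpreter with project_dir appended to
    PYTHONPATH, and return the path of the file it was loaded from."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH", "")
    # Mirrors the intent of the startup script: append the project directory
    # to whatever PYTHONPATH already holds, skipping empty entries.
    env["PYTHONPATH"] = os.pathsep.join(p for p in (existing, project_dir) if p)
    code = f"import {module_name}; print({module_name}.__file__)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Inside the container, the equivalent is simply that a file such as /home/jovyan/work/helpers.py (a made-up name) can be imported from any notebook with import helpers.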
Finally, we describe the ports we want to use. The entry 8888:8888 publishes port 8888 on the host OS and maps it to port 8888 in the container, which is where Jupyter Lab listens.
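Because this mapping binds port 8888 on the host, starting the container will fail if something else is already using that port. A minimal sketch, using only Python’s standard library, for checking the host port beforehand. The helper name port_is_free is hypothetical, it is meant to be run on the host rather than in the container, and it only checks the loopback address.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # Binding succeeds only when no other socket holds the port.
            probe.bind((host, port))
        except OSError:
            return False
        return True
```

If port_is_free(8888) returns False, one option is to change the left-hand side of the mapping (for example 8889:8888) and browse to that host port instead.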
We also expect to have a requirements.txt file, which describes the dependencies that we’ll leverage in our Notebook. For example:
pandas==2.2.1
python-dotenv==1.0.1
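To confirm from inside a notebook that the running environment matches these pins, something like the following could be used. It is only a sketch: parse_pins and find_mismatches are hypothetical helper names, and the parser handles plain name==version lines only (no extras, environment markers, or version ranges).

```python
from importlib import metadata


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Return {package: version} for every `name==version` line."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, str]:
    """Map each package that is missing or at a different version to a short
    description of the problem. An empty dict means everything matches."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, pinned {wanted}"
    return problems
```

In a notebook cell, this would be called as find_mismatches(parse_pins(open("/home/jovyan/work/requirements.txt").read())).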
To start our container, we can just run $ docker compose up, which will emit a message giving us a URL with an embedded token to authenticate with our service. For example, it might look like this:
jupyterlab-1 | To access the server, open this file in a browser:
jupyterlab-1 | file:///home/jovyan/.local/share/jupyter/runtime/jpserver-7-open.html
jupyterlab-1 | Or copy and paste one of these URLs:
jupyterlab-1 | http://c6bcfb18cb93:8888/lab?token=somefaketoken
jupyterlab-1 | http://127.0.0.1:8888/lab?token=somefaketoken
Then you’d navigate to http://127.0.0.1:8888/lab?token=somefaketoken or localhost:8888/lab?token=somefaketoken in your web browser.
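If the startup message has scrolled out of view, the same lines can be recovered from the container’s log output (for instance, by saving the output of docker compose logs to a file). The sketch below pulls the 127.0.0.1 URL out of such text. It assumes the log format matches the example above, which may differ between image versions, and find_local_url is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches the loopback URL as it appears in the example startup message.
_LOCAL_URL = re.compile(r"http://127\.0\.0\.1:\d+/lab\?token=\S+")


def find_local_url(log_text: str) -> Optional[str]:
    """Return the first 127.0.0.1 Jupyter Lab URL found in log_text, or None."""
    match = _LOCAL_URL.search(log_text)
    return match.group(0) if match else None
```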
Disclaimers
- You’ll notice that you have one directory labeled work. Please be aware that only the files you write into this directory will actually be saved, as this is the only directory that is mounted into the container by our docker-compose.yaml file.
- If you want to install a new dependency, the easiest way is to modify the requirements.txt and restart the container. If that is too heavy-handed, you can modify the requirements.txt, which will set you up for future executions, and then open a terminal inside the Jupyter Lab instance to install the dependency for this session.
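For the second approach, the edit to requirements.txt can also be made from a notebook cell. The following is a minimal sketch with a hypothetical helper, add_pin, that appends an exact pin only when the package is not already listed. The package and version used in the usage note are illustrative, not something the original project depends on.

```python
from pathlib import Path


def add_pin(requirements_path: str, pin: str) -> bool:
    """Append pin (e.g. "name==1.2.3") to the requirements file unless a line
    for that package already exists. Returns True if the file was changed."""
    path = Path(requirements_path)
    name = pin.split("==", 1)[0].strip().lower()
    lines = path.read_text().splitlines() if path.exists() else []
    for line in lines:
        if line.split("==", 1)[0].strip().lower() == name:
            return False  # already listed; leave the existing pin alone
    lines.append(pin)
    path.write_text("\n".join(lines) + "\n")
    return True
```

After calling, say, add_pin("/home/jovyan/work/requirements.txt", "matplotlib==3.8.3"), running pip install -r ~/work/requirements.txt in a Jupyter Lab terminal installs it for the current session, and the startup script picks it up on the next restart.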
Conclusion
Overall, this should provide a simple, non-production approach to using Jupyter Lab and Python for data exploration.