Deploying JupyterHub with Kubernetes: A Step-by-Step Guide
Author: Harsh Patel
JupyterHub is a powerful tool for deploying and managing Jupyter Notebooks at scale. With JupyterHub, you can give multiple users access to Jupyter Notebooks through a single shared hub, with each user getting their own notebook server. This can be useful in a variety of settings, such as classrooms, research groups, or companies that use Jupyter Notebooks for data analysis and modeling.
Kubernetes is an open-source container orchestration platform that can help you manage and scale your JupyterHub deployment. By using Kubernetes to deploy JupyterHub, you can easily scale your deployment up or down as needed while ensuring it is highly available and resilient.
Below we’ll walk you through each step of deploying JupyterHub with Kubernetes so that you can get the most out of your deployment.
Prerequisites
Before we get started, you’ll need to have the following:
- A running Kubernetes cluster
- Docker Desktop and kubectl installed on your machine. You will be running both docker and kubectl commands locally.
- A DockerHub account (or another container registry where you can store your JupyterHub Docker images)
- Basic knowledge of Kubernetes concepts, such as Pods, Deployments, and Services. For more information, see the documentation (https://kubernetes.io/docs/concepts/)
Step 1: Create a JupyterHub Configuration File
The first step is to create a configuration file that tells JupyterHub how to set itself up. You can use a default configuration file as a starting point and modify it as needed. Here’s an example configuration file:
auth:
  type: dummy
hub:
  cookie_secret: "YOUR_SECRET_KEY"
  db:
    url: postgresql://jupyterhub:jupyterhub@jupyterhub-db/jupyterhub
  service:
    type: ClusterIP
    url: http://jupyterhub:8000
proxy:
  secretToken: "YOUR_SECRET_TOKEN"
singleuser:
  image:
    name: "YOUR_JUPYTER_NOTEBOOK_IMAGE"
    tag: "latest"
  storage:
    type: none
In this configuration file, we’re using a dummy authentication system (which allows user access with any combination of username and password) for simplicity, but you can use any authentication system that JupyterHub supports, such as OAuth or LDAP. We’re also using a PostgreSQL database for storing user information, so you’ll need to set up a PostgreSQL database separately (more on this later).
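The YOUR_SECRET_KEY and YOUR_SECRET_TOKEN placeholders in the file above should be replaced with long random values rather than hand-picked strings. The following is a minimal sketch of one way to generate them, assuming a 32-byte hex string is acceptable for both fields; the function name generate_hex_secret is made up for this illustration.

```python
import secrets


def generate_hex_secret(num_bytes: int = 32) -> str:
    """Return a random hex string built from num_bytes random bytes.

    Assumption: a 32-byte (64 hex character) value is suitable for both
    the cookie secret and the proxy token placeholders.
    """
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Print one fresh value per placeholder; paste them into the config file.
    print("cookie_secret:", generate_hex_secret())
    print("secretToken:  ", generate_hex_secret())
```

Each run produces different values, so generate them once and keep them somewhere safe; changing them later invalidates existing login cookies.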
Step 2: Create a Docker Image for JupyterHub
The next step is to create a Docker image for JupyterHub that includes your configuration file. Here’s an example Dockerfile:
FROM jupyterhub/jupyterhub:1.4
COPY jupyterhub_config.yaml /srv/jupyterhub/jupyterhub_config.yaml
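This Dockerfile starts with the official JupyterHub Docker image and copies your configuration file to /srv/jupyterhub/jupyterhub_config.yaml. Keep in mind that the jupyterhub command itself reads a Python configuration file (jupyterhub_config.py, covered in Step 5) rather than YAML, so the YAML file is not picked up automatically; values of this shape are normally consumed by a Helm chart such as Zero to JupyterHub. You can build this image and push it to your container registry. The image name should include your registry or DockerHub namespace so that the push knows where to go: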
docker build -t YOUR_IMAGE_NAME .
docker push YOUR_IMAGE_NAME
Step 3: Set up a PostgreSQL Database
As mentioned earlier, we’re using a PostgreSQL database to store user information. You’ll need to set up a PostgreSQL database separately and create a user and database for JupyterHub. Here’s an example kubectl command for creating a PostgreSQL database:
kubectl create -f postgres.yaml
# postgres.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: jupyterhub-db
spec:
  ports:
    - name: postgresql
      port: 5432
  selector:
    app: postgres
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:13
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: jupyterhub-db-secrets
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: jupyterhub-db-secrets
                  key: password
            - name: POSTGRES_DB
              value: jupyterhub
          volumeMounts:
            - name: postgres-pvc
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-pvc
          persistentVolumeClaim:
            claimName: postgres-pvc
---
apiVersion: v1
kind: Secret
metadata:
  name: jupyterhub-db-secrets
type: Opaque
data:
  username: <base64-encoded-postgres-username>
  password: <base64-encoded-postgres-password>
In this example, we’re using a Persistent Volume Claim to create persistent storage for our database. We’ve also created a Service and Deployment for the PostgreSQL database, plus a Secret that holds the database username and password.
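`base64-encoded-postgres-username` refers to a PostgreSQL database username that has been encoded in Base64 format, and `base64-encoded-postgres-password` is the password encoded the same way. Base64 represents binary data as ASCII text, which is the form Kubernetes expects in the data field of a Secret. It is an encoding, not encryption, so anyone who can read the Secret can recover the original values.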
To encode a PostgreSQL username in Base64, you can use a command-line tool or an online Base64 encoder. Here is an example of how to encode a PostgreSQL username using the `base64` command in a Linux terminal:
$ echo -n "postgres_username" | base64
This command will output the Base64-encoded version of the PostgreSQL username, which you can then use in your configuration files or scripts that require this value.
Note that the -n option is used with the echo command to prevent adding a newline character at the end of the string, which could cause issues when decoding the value later on.
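If you would rather script the encoding than run it by hand, the same step can be sketched in Python using only the standard library. The helper names to_secret_value and from_secret_value are made up for this illustration, and "postgres_username" is the same placeholder used in the shell example.

```python
import base64


def to_secret_value(plaintext: str) -> str:
    """Base64-encode a string for use in the data field of a Secret."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def from_secret_value(encoded: str) -> str:
    """Decode a Base64 value back to the original string."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    encoded = to_secret_value("postgres_username")
    print("username:", encoded)
    # Round-trip check: decoding must give back exactly what was encoded.
    print("decoded: ", from_secret_value(encoded))
```

Because the string is passed in directly, there is no trailing newline to worry about, which is the same problem the -n option avoids in the shell version.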
Step 4: Deploy JupyterHub
Now that you have a JupyterHub Docker image and a PostgreSQL database set up, you can deploy JupyterHub to your Kubernetes cluster. Here’s an example kubectl command for deploying JupyterHub:
kubectl create -f jupyterhub.yaml
# jupyterhub.yaml
apiVersion: v1
kind: Service
metadata:
  name: jupyterhub
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 8000
  selector:
    app: jupyterhub
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyterhub
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jupyterhub
  template:
    metadata:
      labels:
        app: jupyterhub
    spec:
      containers:
        - name: jupyterhub
          image: YOUR_IMAGE_NAME
          imagePullPolicy: Always
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: jupyterhub-db-secrets
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: jupyterhub-db-secrets
                  key: password
            - name: POSTGRES_HOST
              value: jupyterhub-db
          command: ["jupyterhub"]
          args: ["--config", "/etc/jupyterhub/jupyterhub_config.py"]
          volumeMounts:
            - name: jupyterhub-cfg
              mountPath: /etc/jupyterhub/
            - name: jupyterhub-data
              mountPath: /data
      volumes:
        - name: jupyterhub-cfg
          configMap:
            name: jupyterhub-config
        - name: jupyterhub-data
          persistentVolumeClaim:
            claimName: jupyterhub-pvc
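In this example, we’ve created a Service and Deployment for JupyterHub. The Deployment specifies the Docker image to use, sets the environment variables for the PostgreSQL database connection, and mounts the configuration file and data volume. Note that the manifest refers to a ConfigMap named jupyterhub-config and a PersistentVolumeClaim named jupyterhub-pvc but does not define them, so both must exist in the same namespace before the Pod can start. The ConfigMap can be created from the file written in Step 5, for example with `kubectl create configmap jupyterhub-config --from-file=jupyterhub_config.py`, and the claim can follow the same pattern as postgres-pvc in Step 3.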
Step 5: Configure JupyterHub
You’ll need to configure JupyterHub to use the PostgreSQL database for storing its own state (users, servers, and tokens); authentication itself is handled by whichever authenticator you configure. To do this, create a configuration file called jupyterhub_config.py and mount it to the JupyterHub container. Here’s an example jupyterhub_config.py file:
# Authenticate users against system accounts in the Hub container (via PAM).
c.JupyterHub.authenticator_class = 'jupyterhub.auth.PAMAuthenticator'

# Store the Hub's state (users, servers, tokens) in the Postgres database.
c.JupyterHub.db_url = 'postgresql://jupyterhub:jupyterhub@jupyterhub-db/jupyterhub'

# Use the DockerSpawner to start user containers.
# DockerSpawner ships in the separate "dockerspawner" package, which must be
# installed in the Hub image.
from dockerspawner import DockerSpawner

class MyDockerSpawner(DockerSpawner):
    def _options_form_default(self):
        # HTML shown to the user before their server starts.
        return '''
        <label for="cpu">CPU limit (in cores):</label>
        <input type="text" name="cpu" placeholder="1">
        <br>
        <label for="mem">Memory limit (in GB):</label>
        <input type="text" name="mem" placeholder="1">
        <br>
        <label for="gpu">GPU count:</label>
        <input type="text" name="gpu" placeholder="0">
        <br>
        <label for="image">Docker image:</label>
        <input type="text" name="image" value="jupyter/minimal-notebook">
        <br>
        <label for="name">Container name:</label>
        <input type="text" name="name" placeholder="{username}-notebook">
        '''

c.JupyterHub.spawner_class = MyDockerSpawner

# Listen on all interfaces inside the Pod, and tell single-user servers to
# reach the Hub through the "jupyterhub" Service name.
c.JupyterHub.hub_ip = '0.0.0.0'
c.JupyterHub.hub_connect_ip = 'jupyterhub'
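In this example, we’re using the PAMAuthenticator, which authenticates users against system accounts in the Hub container; the PostgreSQL database only stores JupyterHub’s own state. We’re also using DockerSpawner to start user containers, and we’ve customized the spawner with a form that asks for CPU and memory limits, GPU count, Docker image, and container name. The form only collects these values; the spawner still has to read them from its user options before they affect the container. DockerSpawner also needs access to a Docker daemon, so on Kubernetes clusters KubeSpawner is the more common choice.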
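To make use of the submitted form, the raw values first have to be validated and converted. The sketch below shows one way that step could look as a plain function, assuming the form data arrives as a dictionary mapping each field name to a list of strings, which is how JupyterHub hands it to a spawner’s options_from_form hook. The function name parse_spawn_options and the returned keys are hypothetical, and wiring the result into the spawner is left out.

```python
from typing import Dict, List


def parse_spawn_options(form_data: Dict[str, List[str]]) -> dict:
    """Convert raw form fields (name -> list of strings) into typed options.

    The returned keys are illustrative names, not DockerSpawner settings.
    """

    def first(name: str, default: str) -> str:
        # Use the first submitted value; fall back when missing or blank.
        values = form_data.get(name) or [default]
        return values[0].strip() or default

    cpu = float(first("cpu", "1"))
    mem_gb = float(first("mem", "1"))
    gpu = int(first("gpu", "0"))
    if cpu <= 0 or mem_gb <= 0 or gpu < 0:
        raise ValueError("cpu and mem must be positive, gpu must be >= 0")

    return {
        "cpu_limit": cpu,
        "mem_limit": f"{mem_gb:g}G",  # e.g. 0.5 -> "0.5G"
        "gpu_count": gpu,
        "image": first("image", "jupyter/minimal-notebook"),
    }


if __name__ == "__main__":
    print(parse_spawn_options({"cpu": ["2"], "mem": ["4"], "gpu": [""]}))
```

Rejecting bad input at this point means a typo in the form produces a clear error instead of a container that fails to start. Letting users type an arbitrary image name is convenient for a demo, but a real deployment would probably restrict it to an approved list.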
Note that the host name in the db_url option (jupyterhub-db here) should match the value of the POSTGRES_HOST environment variable in the JupyterHub Deployment, and the username and password should match the values stored in the jupyterhub-db-secrets Secret.
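One way to keep these values from drifting apart is to build the connection URL from the environment variables that the Deployment already injects, instead of hardcoding it. The following is a minimal sketch under that assumption; the helper name build_db_url is made up for this illustration, and the database name defaults to the POSTGRES_DB value used in Step 3.

```python
import os
from typing import Mapping
from urllib.parse import quote


def build_db_url(env: Mapping[str, str] = os.environ, database: str = "jupyterhub") -> str:
    """Assemble a PostgreSQL URL from POSTGRES_USER/PASSWORD/HOST variables."""
    # Percent-encode credentials so characters like "@" or "/" cannot break the URL.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    # Fall back to the Service name from Step 3 if no host is provided.
    host = env.get("POSTGRES_HOST", "jupyterhub-db")
    return f"postgresql://{user}:{password}@{host}/{database}"


if __name__ == "__main__":
    example_env = {
        "POSTGRES_USER": "jupyterhub",
        "POSTGRES_PASSWORD": "example-password",
        "POSTGRES_HOST": "jupyterhub-db",
    }
    print(build_db_url(example_env))
```

Inside jupyterhub_config.py, the result could then be assigned with c.JupyterHub.db_url = build_db_url(), so the credentials live only in the Secret.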
Step 6: Start JupyterHub
Now that you have everything set up, you can start JupyterHub by running the following command:
kubectl apply -f jupyterhub.yaml
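Because the Service and Deployment were already created in Step 4, kubectl apply updates them in place with any changes you have made since (and creates them if they do not exist yet). Once the JupyterHub Pod is running, the Hub starts a single-user Jupyter Notebook server for each user that logs in.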
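The Pod can take a little while to pull the image and start, so it helps to wait until the Hub actually responds before sending users to it. The sketch below polls a URL until it answers with HTTP 200. It assumes the Hub exposes a health endpoint at /hub/health and that it is reachable at a node address and NodePort you look up yourself; NODE_IP and NODE_PORT are placeholders, and both function names are made up for this illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def hub_is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30, delay: float = 2.0) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Placeholder address: replace NODE_IP and NODE_PORT with real values.
    url = "http://NODE_IP:NODE_PORT/hub/health"
    ready = wait_until_ready(lambda: hub_is_healthy(url), attempts=3, delay=1.0)
    print("JupyterHub ready:", ready)
```

Passing the probe in as a function keeps the retry loop independent of how readiness is checked, so the same loop works with a different check if the health endpoint assumption does not hold in your setup.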
Conclusion
Deploying JupyterHub on Kubernetes can be a powerful way to provide a collaborative data science environment for your team or organization. With Kubernetes, you can easily scale JupyterHub to handle a large number of users and keep your environment up and running even in the face of hardware failures or other issues.
By following the steps outlined in this guide, you should be able to deploy JupyterHub on Kubernetes and start using it to enable collaborative data science in your organization. If you want to learn more about this or other data science related topics, follow our blog or contact us today.