Jupyter and JupyterHub

Jupyter notebooks and the Jupyter ecosystem

You may have heard of Jupyter -- an open computing "ecosystem" developed by Project Jupyter. This ecosystem is described succinctly and effectively in the online open book, Teaching and Learning with Jupyter:

Project Jupyter is three things: a collection of standards, a community, and a set of software tools. Jupyter Notebook, one part of Jupyter, is software that creates a Jupyter notebook. A Jupyter notebook is a document that supports mixing executable code, equations, visualizations, and narrative text. Specifically, Jupyter notebooks allow the user to bring together data, code, and prose, to tell an interactive, computational story. ("2.2 But first, what is Jupyter Notebook?")

We will use the JupyterLab software to create, manage and run Jupyter notebooks. You will be exposed to Jupyter notebooks throughout the hackweek, including in most tutorials. To learn more about Jupyter, Jupyter notebooks and JupyterLab:

  • Check out several sections in the Teaching and Learning with Jupyter online open book, especially Chapter 5 Jupyter Notebook ecosystem.
  • See the OceanHackWeek 2020 pre-hackweek tutorial "Jupyter and Scientific Python basics: numpy, pandas, matplotlib", which demonstrates effective Jupyter use both on your computer ("locally") and on JupyterHub: Jupyter notebooks tutorial video. The video ends with a Q&A session that covers common questions you may find yourself asking.
  • See the resources at the end of this page.

Why are we using a shared cloud computing environment?

Teaching software to a diverse group of participants, each with different computers and operating systems, can be challenging. There are specific ways to configure our software for the tutorials to be successful, so it takes time to get everyone set up consistently. Our solution to this is to give everyone access to a cloud computing environment that is pre-configured for the specific software we will deploy. This cloud computing instance can be accessed from any web browser, which eliminates the need for configuring each person's individual computer. For this hackweek we have created virtual computing instances that can be deployed on demand in a parallel computing environment. We use JupyterHub as a way to give a Jupyter Notebook server (JupyterLab) to each person in a group. These (slightly old) slides give a nice overview of what JupyterHub is all about. JupyterHub enables us to quickly begin working with code without spending time to get the necessary libraries and dependencies set up on everyone's individual computers.

We encourage you to use our shared JupyterHub resources for running all the tutorials and for your projects. We also hope you will practice installing Python libraries locally on your laptop so that you can continue working after leaving our event.

How do I access the shared JupyterHub cloud environment?

Access to our shared JupyterHub cloud environment is easy. Just open https://jupyterhub.cuahsi.org in your web browser.

hub-opening

Assuming you set up your HydroShare credentials correctly, you can now click on the "Sign in with HydroShare" button (after accepting the Terms of Use), then on the next screen click on "Authorize" to grant JupyterHub the required permissions. Next you'll be presented with a list of "Server Options". Select "WaterHackWeek 2020":

hub-serverimage

then click the "Start" button. You'll see something like this while the JupyterHub WaterHackWeek 2020 server environment is loading:

hub-loading

It will take a little bit of time for this to load - be patient! Once things are spun up you will see your very own instance of a JupyterLab graphical user interface:

jupyterlab

How do I get my code in and out of JupyterHub?

When you start your own instance of JupyterHub you will have access to your own virtual drive space. No other JupyterHub users will be able to see or access your data files. Next we will explain how you can upload files to your virtual drive space and how to save files from JupyterHub back to another location, such as GitHub or your own local laptop drive.

First we'll show you how to pull some files from GitHub into your virtual drive space. This will be a common task during the hackweek: at the start of most tutorials we'll ask you to "clone" (make a copy of) the GitHub repository corresponding to the specific tutorial being taught into your JupyterHub drive space.

To do this, we will need to work with the JupyterHub file system. JupyterHub is deployed on a Linux operating system, so we will open a terminal within the JupyterLab interface to manage our files. There are two ways to do this: (1) navigate to the "File" menu, choose "New" and then "Terminal", or (2) click on the "terminal" button in JupyterLab:

terminal-button

This will open a new terminal tab in your JupyterLab interface:

terminal-tab

You can issue Linux commands in this terminal to manage the files in your JupyterHub drive space.
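As a minimal sketch of the kind of file management you can do from the terminal, the commands below create a scratch folder, add a small file, copy it and rename the copy. The folder and file names (hackweek-scratch, example.csv and so on) are hypothetical and used only for illustration; nothing here is specific to the hackweek setup.

```shell
# Work in a scratch directory so nothing else in your drive space is touched.
# "hackweek-scratch" is a hypothetical name used only for this example.
mkdir -p hackweek-scratch/data
cd hackweek-scratch

# Show where you are and what is here.
pwd
ls -l

# Create a small text file, copy it, then rename the copy.
echo "station,flow" > data/example.csv
cp data/example.csv data/example_backup.csv
mv data/example_backup.csv data/example_copy.csv

# List the result, then return to the starting directory.
ls data
cd ..
```

When you no longer need the scratch folder, you can delete it with rm -r hackweek-scratch. Be careful with rm: files removed this way do not go to a trash folder.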

Now let's clone a repository (see the Git Setup and Basics page). We'll illustrate this with the waterdata tutorials. First, navigate in a browser on your own computer to the repository link https://github.com/waterhackweek/waterdata. Next, click on the green "clone or download" button and then copy the URL into your clipboard by clicking the copy button (the screenshot below is not from the same repository, but the steps and button are the same):

clone

Now navigate back to your command line in JupyterLab. Type git clone and then paste in the URL:

git clone https://github.com/waterhackweek/waterdata.git

After issuing the git clone command you should see something like this (again, the screenshot below is for a different repo, but the concept is identical):

clone-result
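Getting work back out of JupyterHub to a place like GitHub usually follows the git cycle of staging, committing and pushing. The sketch below is an illustration under stated assumptions, not the hackweek's prescribed workflow: a local "bare" repository stands in for GitHub so the commands run without network access or credentials, and every name in it (demo-remote.git, demo-work, notes.txt, the user name and email) is hypothetical.

```shell
# A local bare repository stands in for GitHub so this sketch runs offline.
# All names here ("demo-remote.git", "demo-work", "notes.txt") are hypothetical.
git init --quiet --bare demo-remote.git
git clone --quiet demo-remote.git demo-work
cd demo-work

# Identify yourself to git for this repository only (placeholder values).
git config user.name "Hackweek Participant"
git config user.email "participant@example.com"

# Create a file, stage it, and record it in a commit.
echo "my analysis notes" > notes.txt
git add notes.txt
git commit --quiet -m "Add analysis notes"

# Send the commit to the remote named "origin".
git push --quiet origin HEAD

cd ..
```

With a real GitHub repository that you have write access to, the add, commit and push steps are the same, but the push will ask you to authenticate. Cloning the empty stand-in repository prints a warning that the repository is empty, which is expected here.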

How do I end my JupyterHub session? Will I lose all of my work?

When you are finished working for the day it is important to explicitly log out of your JupyterHub session, to reduce the load on our cloud infrastructure.

To log out and stop the server, select the menu item File > Log Out.

logging out

Logging out will NOT cause any of your work to be lost or deleted. It simply shuts down some resources. It would be equivalent to turning off your desktop computer at the end of the day.

References and Resources