Jupyter and JupyterHub
Jupyter notebooks and the Jupyter ecosystem
You may have heard of Jupyter, an open computing "ecosystem" developed by Project Jupyter. This ecosystem is described succinctly and effectively in the online open book Teaching and Learning with Jupyter:
Project Jupyter is three things: a collection of standards, a community, and a set of software tools. Jupyter Notebook, one part of Jupyter, is software that creates a Jupyter notebook. A Jupyter notebook is a document that supports mixing executable code, equations, visualizations, and narrative text. Specifically, Jupyter notebooks allow the user to bring together data, code, and prose, to tell an interactive, computational story. ("2.2 But first, what is Jupyter Notebook?")
We will use the JupyterLab software to create, manage and run Jupyter notebooks. You will be exposed to Jupyter notebooks throughout the hackweek, including in most tutorials. To learn more about Jupyter, Jupyter notebooks and JupyterLab:
- Check out several sections of the Teaching and Learning with Jupyter online open book, especially Chapter 5, "Jupyter Notebook ecosystem".
- See the OceanHackWeek 2020 pre-hackweek tutorial "Jupyter and Scientific Python basics: numpy, pandas, matplotlib", which demonstrates effective Jupyter use both on your own computer ("locally") and on JupyterHub: Jupyter notebooks — tutorial video. The video ends with a Q&A session that covers common questions you may find yourself asking.
- See the resources at the end of this page.
Why are we using a shared cloud computing environment?
Teaching software to a diverse group of participants, each with different computers and operating systems, can be challenging. The tutorials depend on software configured in specific ways, and getting everyone set up consistently takes time. Our solution is to give everyone access to a cloud computing environment that is pre-configured with the software we will use. This environment can be accessed from any web browser, so nobody needs to configure their own computer. For this hackweek we have created virtual computing instances that are deployed on demand in a parallel computing environment. We use JupyterHub to give each person in the group their own Jupyter Notebook server (JupyterLab). These (slightly old) slides give a nice overview of what JupyterHub is all about. JupyterHub lets us start working with code right away, without spending time installing the necessary libraries and dependencies on everyone's individual computers.
We encourage you to use our shared JupyterHub resources for running all the tutorials and for your projects. We also hope you will practice installing Python libraries locally on your laptop so that you can continue working after leaving our event.
How do I access the shared JupyterHub cloud environment?
Access to our shared JupyterHub cloud environment is easy: just open https://jupyterhub.cuahsi.org in your web browser.
Assuming you have set up your HydroShare credentials correctly, click the "Sign in with HydroShare" button (after accepting the Terms of Use), then click "Authorize" on the next screen to grant JupyterHub the required permissions. You will then be presented with a list of "Server Options". Select "WaterHackWeek 2020":
then click the "Start" button. You'll see something like this while the JupyterHub WaterHackWeek 2020 server environment is loading:
It may take a little while to load, so be patient! Once everything is spun up, you will see your very own instance of the JupyterLab graphical user interface:
How do I get my code in and out of JupyterHub?
When you start your own JupyterHub server, you will have access to your own virtual drive space. No other JupyterHub users will be able to see or access your data files. Next we will explain how to upload files to your virtual drive space and how to save files from JupyterHub to another location, such as GitHub or your own laptop drive.
First we'll show you how to pull some files from GitHub into your virtual drive space. This will be a common task during the hackweek: at the start of most tutorials we'll ask you to "clone" (make a copy of) the GitHub repository corresponding to the specific tutorial being taught into your JupyterHub drive space.
To do this, we need to interact with the JupyterHub file system. JupyterHub runs on a Linux operating system, and we will open a terminal within the JupyterLab interface to manage our files. There are two ways to do this: (1) go to the "File" menu, choose "New" and then "Terminal", or (2) click the "Terminal" button in JupyterLab:
This will open a new terminal tab in your JupyterLab interface:
From this terminal you can use standard Linux commands to manage the files in your JupyterHub drive space.
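As a minimal sketch, here are a few everyday commands you might type in the JupyterLab terminal. The folder and file names (hackweek-scratch, notes.txt, day1-notes.txt) are hypothetical examples, not part of the hackweek setup.

```shell
# Show the directory the terminal is currently in
# (typically your home directory on JupyterHub).
pwd

# List the files here, including hidden ones, with sizes and dates.
ls -la

# Create a folder for your own work (hypothetical name);
# -p means "no error if it already exists".
mkdir -p hackweek-scratch

# Create an empty file in that folder, then rename it.
touch hackweek-scratch/notes.txt
mv hackweek-scratch/notes.txt hackweek-scratch/day1-notes.txt

# Confirm the result.
ls hackweek-scratch
```

The same commands work in a terminal on macOS or Linux, so this is also useful practice for working locally after the hackweek.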
Now let's clone a repository (see the Git Setup and Basics page). We'll illustrate this with the waterdata tutorials. First, in a browser on your own computer, navigate to the repository at https://github.com/waterhackweek/waterdata. Next, click the green "clone or download" button (labeled "Code" on current versions of GitHub) and copy the URL to your clipboard by clicking the copy button (the screenshot below is from a different repository, but the steps and button are the same):
Now go back to your command line in JupyterLab. Type git clone and then paste in the URL:
git clone https://github.com/waterhackweek/waterdata.git
After issuing the git clone command, you should see something like this (again, the screenshot below is for a different repository, but the concept is identical):
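Cloning brings files in from GitHub. For the other direction, saving work to your own laptop, you can right-click a file in the JupyterLab file browser and choose "Download". To take a whole folder home at once, one common approach (a sketch, not a hackweek requirement) is to bundle the folder into a single compressed archive in the terminal and then download that one file. The folder name hackweek-scratch is a hypothetical example.

```shell
# Create a small example folder so there is something to archive
# (skip these two lines if you already have a folder you want to save).
mkdir -p hackweek-scratch
echo "example output" > hackweek-scratch/example.txt

# Bundle the folder into one compressed archive:
#   -c create, -z gzip-compress, -f write to the named file.
tar -czf hackweek-scratch.tar.gz hackweek-scratch

# List the archive contents to check that everything was included.
tar -tzf hackweek-scratch.tar.gz
```

After downloading hackweek-scratch.tar.gz, running tar -xzf hackweek-scratch.tar.gz on macOS or Linux unpacks it.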
How do I end my JupyterHub session? Will I lose all of my work?
When you are finished working for the day, it is important to explicitly log out of your JupyterHub session to reduce the load on our cloud infrastructure.
To log out and stop the server, select the menu item File > Log Out.
Logging out will NOT cause any of your work to be lost or deleted. It simply shuts down some resources, much like turning off your desktop computer at the end of the day.
References and Resources
- Why Jupyter is data scientists’ computational notebook of choice. An improved architecture and enthusiastic user base are driving uptake of the open-source web tool (Nature, 2018-10)
- Teaching and Learning with Jupyter, an online open book.
- OceanHackWeek 2020 pre-hackweek tutorial "Jupyter and Scientific Python basics: numpy, pandas, matplotlib": Jupyter notebooks — tutorial video.
- Getting Started with JupyterLab (Dataquest, https://dataquest.io)
- Lesson 0b: Introduction to JupyterLab (Justin Bois), Introduction to Data Analysis in the Biological Sciences, Caltech
- Jupyter Lab: Evolution of the Jupyter Notebook. An overview of JupyterLab, the next generation of the Jupyter Notebook.