image-research-computing-tutorial

CloudBank solution: Creating and using machine images as customized research environments, featuring a Jupyter notebook server.

View the Project on GitHub cloudbank-project/image-research-computing-tutorial

MSE544 Creating a VM + image on the Azure cloud

Overview

The Plan

Blog on the distinctions between VMs and Containers

Read through this overview, then proceed to the walkthrough activities for Virtual Machines (VMs) on Azure. A VM, also called an instance, is a self-contained computer. A single physical computer may host more than one VM. On the cloud, an instance type is a VM specification: how much CPU power, memory, storage, and networking speed it provides. We will use a fairly light VM that costs about $0.11 per hour.

We pay for a VM at an hourly rate until we stop it, and here Azure's language is confusing. A VM that is Stopped (deallocated) is equivalent to a computer that is turned off: we do not pay compute charges for it, although its disks still incur storage charges, and it remains available to be turned back on. A VM that is merely Stopped (not deallocated) still holds its hardware allocation and continues to cost money. Shutting down from inside the VM with sudo shutdown -h now stops the VM but does not deallocate it.

A VM is distinct from a container. A container makes use of the host computer’s underlying operating system and starts up very quickly; it is a (possibly very substantial) program that runs on a host computer. A VM includes its own complete operating system, plus anything we choose to install on it, plus our code and our data. So a VM is a computer running on a computer. A VM can also have an IP address and can exist on the web as a server. It could host a non-serverless function.

On Azure we have root access on any VM we create. We log in to the VM through a bash shell and, when necessary, exercise root access by means of the sudo (super-user-do) command.

On the cloud we select a VM by choosing both an instance type and an operating system. The instance type matches the computer’s purpose in processing power, memory, network speed and other features.

Technical detail: The operating system choice in fact selects a machine image that includes this operating system. Once the VM starts we are free to log on and customize it. We then save a new image, which is a snapshot of the modified VM. This new image is tied to our Azure account, so at this point we can delete the VM (all signs of it are gone) and create a fresh VM from our saved image, or several such VMs if we like. This is the central idea of VM images: they serve as backups of our computing environment and as a basis for scaling. A given image can be restored to a larger, more powerful VM or to a smaller, less powerful VM, depending on its intended use. We can also use a VM image as the building block for an Azure ‘Scale Set’, a veritable herd of identical VMs useful for batch processing.

As with many Azure resources there is a logical ‘box’ for VM images called a Gallery.

The VM we use costs about $0.11 per hour. A good rule of thumb: set up an automatic shutdown (stop plus deallocate) for the VM every evening. We do this in the setup process. Azure sends us an email shortly before the machine is stopped, with an option to keep it running a bit longer.
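
To make the rule of thumb concrete, here is a minimal sketch of the arithmetic. The $0.11 hourly rate comes from the text above; the 30-day month and the 8-hours-per-day schedule are illustrative assumptions, and the estimate ignores disk storage charges.

```python
# Rough monthly compute cost for a VM billed by the hour.
HOURLY_RATE_USD = 0.11   # rate quoted in this tutorial
DAYS_PER_MONTH = 30      # assumption: a 30-day month


def monthly_cost(hours_per_day, rate=HOURLY_RATE_USD, days=DAYS_PER_MONTH):
    """Return the compute cost in dollars for one month of use."""
    return round(rate * hours_per_day * days, 2)


always_on = monthly_cost(24)   # VM never deallocated
workday = monthly_cost(8)      # assumption: deallocated outside an 8-hour day
print(f"Always on: ${always_on:.2f}")
print(f"8 h/day:   ${workday:.2f}")
print(f"Saved:     ${always_on - workday:.2f}")
```

Under these assumptions an always-on VM costs roughly $79 per month in compute, against roughly $26 when it runs only eight hours a day.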

VM Day 1 Monday May 1

VM Day 2 Tuesday May 3

Azure Cloud Shell

VMs on Azure

Object and block storage on the cloud

Python environments

To review: Python features a level of virtualization (specialization) via virtual environments. The Python base environment is the Python interpreter and libraries that comprise the basic Python installation in the operating system. From this base or default environment a Python virtual environment is often created as a dedicated space in which to install packages and customize the workspace without altering the base installation.
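
As an illustration of the idea, the following sketch creates a throwaway virtual environment with the standard-library venv module. The helper name make_env and the directory name demo-env are hypothetical, chosen only for this example.

```python
import sys
import tempfile
import venv
from pathlib import Path


def make_env(parent_dir, name="demo-env"):
    """Create a virtual environment under parent_dir and return its path."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps the sketch fast; use with_pip=True for real work.
    venv.create(str(env_dir), with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as scratch:
    env = make_env(scratch)
    # pyvenv.cfg records which base interpreter the environment derives from.
    print((env / "pyvenv.cfg").read_text())
    print("Base interpreter prefix:", sys.base_prefix)
```

The pyvenv.cfg file is what ties the new environment back to the base installation it was created from.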

git and GitHub

To review: GitHub is a provider of Internet hosting for software development and version control using git. git in turn is a version control utility, originally written for Linux kernel development, that runs on all major operating systems. GitHub and similar hosting sites facilitate open sharing of software, part of the larger picture of reproducible research.

Jupyter

Jupyter is an interactive coding environment. Here are some of the key terms defined.

Jupyter notebook code execution is managed by a language-specific program called a kernel (for example ‘Python kernel’, ‘R kernel’, ‘Julia kernel’). The kernel operates “behind the scenes” to maintain the notebook environment and run blocks of code as requested. We use Python; the other two primary Jupyter-supported languages are Julia and R (hence ‘JuPyt(e)R’). In the spirit of extensibility many other kernels have been developed: more than 100 Jupyter kernels are available at this time.

Walkthrough for VMs day 1

1 Start a VM on Azure



The Azure portal has a gear icon for changing the account configuration.



Problems here? Check the Potential Issues section below.

Potential issues

2 Log in to the VM

If you added the (optional) data disk to your VM you can use the guide at this link to format and mount the disk. This will take a few minutes. If you like you can also simply check that the disk is attached using the following command.

lsblk -o NAME,HCTL,SIZE,MOUNTPOINT | grep -i "sd"

Here the 4GiB data disk is listed last as sdc.

sda     0:0:0:0      30G
├─sda1             29.9G /
├─sda14               4M
└─sda15             106M /boot/efi
sdb     1:0:1:0      14G
└─sdb1               14G /mnt
sdc     3:0:0:0       4G
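
The same check can be scripted. The sketch below parses the sample listing shown above (pasted in as a string rather than read from a live system) and reports whole disks that have neither partitions nor a mount point. The function name unused_disks is hypothetical, and the parsing assumes exactly the NAME, HCTL, SIZE, MOUNTPOINT columns requested by the command above.

```python
SAMPLE = """\
sda     0:0:0:0      30G
├─sda1             29.9G /
├─sda14               4M
└─sda15             106M /boot/efi
sdb     1:0:1:0      14G
└─sdb1               14G /mnt
sdc     3:0:0:0       4G
"""


def unused_disks(lsblk_text):
    """Return names of whole disks with no partitions and no mount point."""
    disks = {}       # disk name -> True while the disk still looks unused
    current = None
    for line in lsblk_text.splitlines():
        if not line.strip():
            continue
        if line[0] in "├└│ ":
            # Tree-drawing prefix: this row is a partition of the current disk.
            if current is not None:
                disks[current] = False
            continue
        fields = line.split()
        current = fields[0]
        # Columns are NAME, HCTL, SIZE, then MOUNTPOINT only when mounted.
        disks[current] = len(fields) < 4
    return [name for name, unused in disks.items() if unused]


print(unused_disks(SAMPLE))
```

For the sample listing this reports only sdc, the data disk that has not yet been partitioned or mounted.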

3 Create a machine image from the VM

We will need a ‘box’ for images, which is called an Azure Compute Gallery. This link goes into more detail on Azure machine images.

4 Terminate your original VM and start a new VM from the image

Once you have created a VM image you should be able to safely delete your VM and then create a new one, same as the original, from the image.

Walkthrough for Jupyter

Jupyter overview

As noted up at the top: Jupyter is an interactive programming (and story-telling) environment. We access this environment via a web browser. The center of mass of the Jupyter project is UC Berkeley.

Here our objective is to install a Jupyter Hub on a single (small) Azure VM. This small-format Jupyter Hub is intended for just a few people. You might set this up for yourself and four colleagues, for example. In such a case, this Littlest Jupyter Hub provides five distinct Jupyter environments, one for each User. The ‘big brother’ full-scale version of a Jupyter Hub is built on a cluster of machines and can serve anywhere from a handful of users to a thousand. (That is an example of cloud scaling.)

Building a Littlest Jupyter Hub is almost identical to the Monday activity of starting an Azure VM. The small differences include adding some Custom data that makes the Create step go the extra mile and set up the Jupyter Hub service.

After doing this Jupyter Hub build on the Azure portal you log in as the system administrator and make some modifications to the environment. You then clone a data science repository and take a look at a Jupyter notebook therein.

A Jupyter notebook consists of text boxes called cells. Some contain code and others contain markdown (formatted text). Cells are ‘executed’ individually using ctrl + enter or shift + enter.
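
On disk, a notebook is a JSON file holding a list of these cells. The sketch below builds a minimal two-cell notebook as a Python dictionary in the nbformat 4 layout and tallies its cells by type. The cell contents and the helper name count_cells are invented for illustration.

```python
import json
from collections import Counter

# A minimal notebook in the nbformat 4 JSON layout:
# one markdown cell followed by one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Hello\n", "This cell holds formatted text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}


def count_cells(nb):
    """Tally the cells in a notebook dictionary by cell type."""
    return Counter(cell["cell_type"] for cell in nb["cells"])


# Round-trip through JSON text, as if reading a .ipynb file from disk.
loaded = json.loads(json.dumps(notebook))
print(dict(count_cells(loaded)))
```

The same count_cells helper works on a real notebook once it has been loaded with json.load from its .ipynb file.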

Website instructions

Caution 1: Since those instructions were created the VM Wizard has changed format, so just about everything is still there but the order is a little scrambled. To help you navigate, the next section consists of screencaps of the various tabs in the wizard taken yesterday.

Caution 2: Many of the Wizard choices are filled in properly by default. If it looks reasonable to you as-is just leave it that way; only change it if it contradicts what the instructions are asking you to modify.

Screencaps from the build

We begin in familiar territory: making our way on the Azure portal to Virtual Machines and the Create+ button.



Once the wizard starts we see again there are eight tabs to work through.



VM Wizard: Basics tab

Click the link to see all images so you can select Ubuntu Server 22.04 LTS.

Instead of a key file (.pem) we will use a username (Example: mynetidadmin) and a password.



VM Wizard: Disks tab

Nothing to do here.

VM Wizard: Networking tab

Use the dropdown to select HTTP (80) and HTTPS (443) in addition to SSH (22).



VM Wizard: Management tab



[Screencap: Management tab]

VM Wizard: Monitoring tab



[Screencap: Monitoring tab]

VM Wizard: Advanced tab

Here at last is where the magic happens, in the Custom data script text box. Copy the script from the Littlest Jupyter Hub instructions into this box, and be sure to change the admin username to the one you entered on the Basics tab. The script starts to run once your VM is operational and installs the Jupyter Hub on your VM. Be warned that this takes about ten minutes.



VM Wizard: Tags tab

Nothing to do here.

VM Wizard: Review and Create tab

Make sure everything looks ok and click Create. There is no key file to download because we switched to using a password.



Once the Jupyter Hub installation is complete (remember, this takes about 10 minutes), paste the IP address of your VM into the address bar of a browser tab. If nothing useful happens, the installation is not finished yet. You can monitor install progress from your VM resource page: choose Boot diagnostics in the left sidebar, then the Serial log tab.
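
Rather than reloading the browser tab by hand, you could poll the address from your own computer. This is a minimal sketch using only the Python standard library; the function names, the retry counts, and the example address (taken from a range reserved for documentation) are all assumptions, not part of the Littlest Jupyter Hub instructions.

```python
import time
import urllib.error
import urllib.request


def hub_is_up(url, timeout=5):
    """Return True if an HTTP server answers at url, False if nothing is listening."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the install is probably still running.
        return False


def wait_for_hub(url, attempts=20, delay=30):
    """Poll url until it responds or the attempts run out."""
    for _ in range(attempts):
        if hub_is_up(url):
            return True
        time.sleep(delay)
    return False


# Example (the address is a placeholder for your VM's public IP):
# wait_for_hub("http://203.0.113.10")
```

With the default settings the loop gives up after about ten minutes, which matches the install time quoted above.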



At this point, when the Littlest Jupyter Hub install is done, you should be able to log in as the administrator User. Try starting a Jupyter notebook; then create and run a cell.

Install libraries

The objective is now to modify the Jupyter Hub environment by installing some packages.
At this point the instructions and screencaps are great; no need for ‘updated’ screencaps. In addition to the gdal and there library installs in those instructions, also install these packages:

sudo -E pip install matplotlib
sudo -E pip install "xarray[complete]"
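
To confirm that the installs took effect, you might run a check like the following in a notebook cell on the hub. It is a sketch using the standard-library importlib module; the helper name check_packages is hypothetical.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# matplotlib and xarray are the packages installed above; this reports
# whether the running kernel can see each of them.
for name, found in check_packages(["matplotlib", "xarray"]).items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

A package reported as missing usually means the install ran in a different Python environment from the one the notebook kernel uses.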

Clone and examine a data science repository

This is a stretch activity. Start a terminal (still logged in as the admin) and run this git command:

cd ~
git clone https://github.com/robfatland/ocean

Now you should have a folder called ocean in the navigator. Navigate to this folder and start the notebook called Biooptics.ipynb. Use the Run menu to run all of the cells in this notebook. This will take a minute or two; and when it is done you might take a moment to look through the results. You can also see the markdown behind the rendering by double-clicking on a text cell.