CloudBank solution: Creating and using machine images as customized research environments; featuring a Jupyter notebook server.
View the Project on GitHub cloudbank-project/image-research-computing-tutorial
Blog on the distinctions between VMs and Containers
Read through this overview and proceed to the walkthrough activities for
Virtual Machines (VMs) on Azure. VMs are self-contained computers; also called instances.
A single physical computer may host more than one Virtual Machine.
On the cloud an instance type means a VM with specifications: how much CPU power, memory, storage, and networking speed. We will use a fairly light VM that costs about $0.11 per hour.
We pay at some rate for a VM ‘per hour’ until we Stop it. Unfortunately this immediately runs into some confusing language on Azure: A VM that is Stopped and Deallocated is equivalent to a computer that is turned off; we do not pay compute charges for it on Azure, although its disks still incur storage charges, and it is still available to be turned back on. A VM that is merely stopped on Azure (not deallocated) is still sitting there costing money. `sudo shutdown -h now` will stop, but not deallocate, a VM.
A VM is distinct from a container: A container makes use of a computer’s underlying operating system; and it starts up very quickly. It is a (possibly very substantial) program that runs on a host computer. A VM includes its own complete operating system; plus anything we choose to install on it; plus our code and our data. So a VM is a computer running on a computer. A VM can also have an IP address; and can exist on the web as a server. It could host a non-serverless function.
On Azure we have root access on any VMs we create. We log in to the VM via a bash shell and then when necessary wield root access by means of the `sudo` (super-user-do) command.
On the cloud we select a VM by choosing both an instance type and an operating system. The instance type matches the computer’s purpose in processing power, memory, network speed and other features.
Technical detail: The operating system choice in fact selects a machine image that includes this operating system. Once the VM starts we are free to log on and customize it. Then we will save a new image which is a snapshot of the modified VM. This new image is tied to our Azure account; so we can terminate the VM at this point (all signs of it are gone) and restart our saved image. This creates a new VM or if we like, even multiple such VMs. This is the central idea of VM images as backups of our computing environment; and as a basis for scaling. In fact a given image can be restored to a larger, more powerful VM; or a smaller, less powerful VM depending on its intended use. We can also use a VM image as a building block for an Azure ‘Scale Set’, a veritable herd of identical VMs useful for doing batch processing.
As with many Azure resources there is a logical ‘box’ for VM images called a Gallery.
The VM we use costs $0.11 per hour. A good rule of thumb is: Establish an alarm that shuts down (stop plus deallocate) the VM every evening. We do this in the setup process. Azure sends us an email that the machine is going to be stopped soon and provides an option to keep it running a bit longer.
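To put the $0.11 hourly rate in perspective, here is a rough sketch comparing a VM left running around the clock with one that is deallocated outside working hours. The 30-day month and the 8 hours of daily use are illustrative assumptions, and the estimate covers compute charges only.

```python
HOURLY_RATE_USD = 0.11  # approximate rate quoted above for the VM size used here


def monthly_cost(hours_per_day, days=30, rate=HOURLY_RATE_USD):
    """Estimate the compute cost in USD for a VM running hours_per_day each day."""
    return round(hours_per_day * days * rate, 2)


always_on = monthly_cost(24)     # VM never deallocated
working_hours = monthly_cost(8)  # assumed 8 hours of use per day

print(f"Always on:     ${always_on:.2f} per month")
print(f"8 hours a day: ${working_hours:.2f} per month")
```

The gap between the two figures is the motivation for the nightly auto-shutdown rule of thumb.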
The `bash` shell on the Azure portal is called Cloud Shell and it has these features:
- `>_` in the title bar just right of center
- `bash` shell computing environment complete with a filesystem
- `bash` or PowerShell mode
- `python --version` to see which Python runs
- `python -m pip list` to see installed libraries: Notice `pandas` is not listed.
- `python -m pip install pandas` and verify it is installed now
- `pandas` is still installed
- Storage account
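As an alternative to scanning the `pip list` output by eye, the check for a library can be done from Python itself. This is a minimal sketch using only the standard library; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the current interpreter can locate the top-level package."""
    return importlib.util.find_spec(package_name) is not None


# json ships with Python; pandas is present only after it has been installed.
for name in ("json", "pandas"):
    status = "installed" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

Running this before and after `python -m pip install pandas` should show the status of `pandas` change.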
To review: Python features a level of virtualization (specialization) via virtual environments. The Python base environment is the Python interpreter and libraries that comprise the basic Python installation in the operating system. From this base or default environment a Python virtual environment is often created as a dedicated space to further customize the workspace.
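One way to tell whether the running interpreter is the base installation or a virtual environment is to compare two attributes of the `sys` module. This is a small sketch; the function name is illustrative.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```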
To review: GitHub is a provider of Internet hosting for software development and version control using `git`. `git` is in turn a Linux software version control utility. GitHub and similar hosting sites facilitate open sharing of software, part of the larger picture of reproducible research. The `git clone` command can be used to clone GitHub repositories, thematic collections of files in a directory tree. `git` comes with a learning curve.
Improper use of GitHub, for example committing access keys or passwords to a public repository, can grant cloud access to Bad Actors. This in turn can lead to lost time and money.
Jupyter is an interactive coding environment. Here are some of the key terms defined.
IPython is short for Interactive Python, a command shell for interactive computing. It supports multiple languages including, of course, Python.
Jupyter notebook: A file with extension `.ipynb` that is hosted by a Jupyter notebook server, viewed in a browser, and consisting of text blocks called cells that contain either code or formatted text
Jupyter notebook server: An interactive development environment that hosts Jupyter notebooks
Jupyter Book: A wrapper around a collection of tools in the Python ecosystem that make it easier to publish computational documents
Jupyter notebook code execution is managed by a language-specific program called a kernel (for example ‘Python kernel’, ‘R kernel’, ‘Julia kernel’).
The kernel operates “behind the scenes” to maintain the notebook environment and run blocks of code as requested. We use Python, and the two other primary Jupyter-supported languages are Julia and R (hence ‘JuPyt(e)R’). In the spirit of expansibility many other kernels have been developed: There are more than 100 Jupyter kernels available at this time.
Above: The Azure portal has a gear icon for changing the account configuration.
Problems here? Check the Potential Issues section below.
- `+Create`
- `virtual machine` and select Virtual Machine
- `Create a virtual machine` tabbed wizard to customize a VM
- `see all sizes` and select `Standard_D2as_v4 - 2 vcpus, 8 GiB memory`
- `Next: Disks >`
- `Create and attach a new disk`
- `Change size`, select 4 GiB (much cheaper than 1024 GiB); click Ok
- `Next: Networking >`
- `Next: Management >`
- `Next: Monitoring >`
- `Next: Advanced >`
- `Next: Tags >`
- `Next: Review + create >`
- `Create`
- `Download private key and create resource`
- `>_` button at center-right on the Azure title bar
- `.ssh`: `mkdir .ssh`
- `mv keyfile.pem .ssh` moves the file to the `.ssh` directory
- `cd .ssh`
- `chmod 400 keyfile.pem` modifies the file permissions so that only the owner can read the file
- `.ssh` folder from `~`
- Names beginning with `.` are not visible to plain `ls` commands
- `cd ~`
- use `ls -al` to list all folders
- `.ssh` is where you place VM access key files (`.pem` file extension)
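The key-file steps above (`mkdir`, `mv`, `chmod 400`) can also be sketched in Python. The permission step matters because `ssh` refuses to use a private key file that other users can read. The helper name `install_key` is hypothetical, and the demonstration works in a throwaway directory with a placeholder file rather than a real key.

```python
import os
import shutil
import stat
import tempfile


def install_key(key_path, ssh_dir):
    """Move a key file into ssh_dir and make it readable by its owner only."""
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)  # like mkdir .ssh, owner-only access
    destination = os.path.join(ssh_dir, os.path.basename(key_path))
    shutil.move(key_path, destination)               # like mv keyfile.pem .ssh
    os.chmod(destination, 0o400)                     # like chmod 400 keyfile.pem
    return destination


# Demonstration in a temporary directory so that no real key is touched.
with tempfile.TemporaryDirectory() as fake_home:
    key = os.path.join(fake_home, "keyfile.pem")
    with open(key, "w") as f:
        f.write("placeholder, not a real key\n")
    installed = install_key(key, os.path.join(fake_home, ".ssh"))
    mode = stat.S_IMODE(os.stat(installed).st_mode)
    print(installed, oct(mode))
```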
`python3 -m pip` does not find `pip`. How to install `pip`?
sudo apt update
sudo apt install python3-pip
python3 -m pip
- `Go to resource`
- `Start` if your VM is stopped
- `Connect`
- `12.23.34.45`.
- `cd ~`
- `ssh -i .ssh/keyfile.pem azureuser@12.23.34.45`
- `azureuser@yournetid-mse544-vm`
- `ps -p $$` to confirm you are using the bash shell
- `python`
- `python3` and this does run
- `python3 --version` shows Python 3.8.10
- `python3 -m pip list` shows that `requests` is installed
- `https://rob5-function-app.azurewebsites.net/api/afunction?n=1234000`
- `python3` and enter the following 3 lines of code
    >>> import requests
    >>> url = 'https://mynetid-function-app.azurewebsites.net/api/azurefunction?n=1234'
    >>> print(requests.get(url).text)
- `exit()` to halt the Python interpreter
- `cd ~`
- `touch fingerprint.txt`
- `ls`
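The three interpreter lines in the steps above can be turned into a small reusable script. This sketch swaps in the standard library `urllib` in place of `requests` so that it runs without extra packages; the endpoint is the placeholder URL from the example and must be replaced with your own function app, and the function names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoint copied from the example above; substitute your own function app.
BASE_URL = "https://mynetid-function-app.azurewebsites.net/api/azurefunction"


def build_url(n, base_url=BASE_URL):
    """Attach the query parameter n to the function URL."""
    return f"{base_url}?{urlencode({'n': n})}"


def call_function(n, timeout=10):
    """Send a GET request to the function and return the response body as text."""
    with urlopen(build_url(n), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(build_url(1234))
# call_function(1234) performs the actual request; it needs a deployed function app.
```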
If you added the (optional) data disk to your VM you can use the guide at this link to make the disk available. This will take a few minutes. If you like you can also simply check that the disk is available using the following command.
lsblk -o NAME,HCTL,SIZE,MOUNTPOINT | grep -i "sd"
Here the 4 GiB data disk is listed last as `sdc`.
sda     0:0:0:0      30G
├─sda1             29.9G /
├─sda14               4M
└─sda15             106M /boot/efi
sdb     1:0:1:0      14G
└─sdb1               14G /mnt
sdc     3:0:0:0       4G
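The new data disk can be picked out of this listing automatically: it is the whole disk that has no partitions and no mount point. Here is a minimal sketch that parses the sample output shown above; the function name is made up, and the parsing assumes exactly the four columns requested in the `lsblk` command.

```python
SAMPLE = """\
sda     0:0:0:0      30G
├─sda1             29.9G /
├─sda14               4M
└─sda15             106M /boot/efi
sdb     1:0:1:0      14G
└─sdb1               14G /mnt
sdc     3:0:0:0       4G
"""


def unpartitioned_disks(lsblk_text):
    """Return whole disks that have neither partitions nor a mount point."""
    disks = []
    for line in lsblk_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        if name[0] in "├└│":
            # A partition row: it belongs to the most recently seen disk.
            if disks:
                disks[-1]["partitions"] += 1
            continue
        # A whole-disk row: NAME HCTL SIZE [MOUNTPOINT]
        disks.append({"name": name, "size": fields[2],
                      "mounted": len(fields) > 3, "partitions": 0})
    return [d for d in disks if d["partitions"] == 0 and not d["mounted"]]


for disk in unpartitioned_disks(SAMPLE):
    print(disk["name"], disk["size"])
```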
We will need a ‘box’ for images; and this is called an Azure compute gallery. This link goes into more detail on Azure machine images.
- `Azure compute gallery`
- `Azure compute galleries` > `+Create`
- `netid-compute-gallery`
- `Review + create` > `Create`
- `Target VM image definition`: `Create new`
- `yournetid-image-definition`
- Capturing a virtual machine image will make the virtual machine unusable. This action cannot be undone.
Once you have created a VM image you should be able to safely delete your VM and then create a new one, same as the original, from the image.
- `Create VM` and `Create VMSS`
- `Create VMSS` (where `SS` stands for ‘Scale Set’)
- `Create VM`
- `Review + create` > `Create` > download a new key file
- `~/.ssh` directory
- `chmod 400 key.pem`
- `ssh -i ~/.ssh/key.pem azureuser@12.23.34.45`
As noted up at the top: Jupyter is an interactive programming (and story-telling) environment. We access this environment via a web browser. The center of mass of the Jupyter project is UC Berkeley.
Here our objective is to install a Jupyter Hub on a single (small) Azure VM. This small-format Jupyter Hub is intended for just a few people. You might set this up for yourself and four colleagues, for example. In such a case, this Littlest Jupyter Hub provides five distinct Jupyter environments, one for each User. The ‘big brother’ full-scale version of a Jupyter Hub is built on a cluster of machines and can serve six or a dozen or twenty or even a hundred or a thousand users. (That is an example of cloud scaling.)
Building a Littlest Jupyter Hub is going to be almost identical to the Monday activity of starting an Azure VM. The small differences include adding in some Custom information that will result in the `Create` step going the extra mile to set up the Jupyter Hub service.
After doing this Jupyter Hub build on the Azure portal you log in as the system administrator and make some modifications to the environment. You then clone a data science repository and take a look at a Jupyter notebook therein.
A Jupyter notebook consists of text boxes called cells. Some contain code and others contain markdown (formatted text). Cells are ‘executed’ individually using ctrl + enter or shift + enter.
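Behind the browser view, a `.ipynb` file is a JSON document holding a list of cells, each labeled as code or markdown. The sketch below builds a minimal two-cell notebook as a Python dictionary and counts its cell types; the cell contents and the function name are made up for illustration.

```python
import json

# A minimal notebook with one markdown cell and one code cell (nbformat 4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# A title\n", "Some *formatted* text."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('hello')"]},
    ],
}


def summarize_cells(nb):
    """Count the cells of each type in a notebook dictionary."""
    counts = {}
    for cell in nb["cells"]:
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
    return counts


text = json.dumps(notebook, indent=1)  # the text that would be saved in a .ipynb file
print(summarize_cells(json.loads(text)))
```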
Caution 1: Since those instructions were created the VM Wizard has changed format; so just about everything is still there but the order is a little scrambled. To help you navigate this the next section consists of screencaps of the various tabs in the wizard taken yesterday.
Caution 2: Many of the Wizard choices are filled in properly by default. If it looks reasonable to you as-is just leave it that way; only change it if it contradicts what the instructions are asking you to modify.
We begin in familiar territory: Making our way on the Azure portal to Virtual Machines and the `+Create` button
Once the wizard starts we see again there are eight tabs to work through.
Click the link to see all images so you can select Ubuntu Server 22.04 LTS
Instead of a key file (.pem) we will use a username (Example: `mynetidadmin`) and a password.
Nothing to do here.
Use the dropdown to co-select `http` and `https` in addition to `ssh`.
Here at last is where the magic happens, in the `Custom data` script text box. The script you copy from the Littlest Jupyter instructions to this box (and be sure to modify the admin username to be the one you entered on the Basics tab) will start to run once your VM is operational. This will install the Jupyter Hub on your VM; but be warned it takes about ten minutes.
Nothing to do here.
Make sure everything looks ok and click Create. There is no key file to download because we switched to using a password.
Once the Jupyter Hub installation is completed (remember this takes about 10 minutes): Paste the IP address of your VM into a browser tab address window. If you try entering the IP address and nothing useful happens, the install is not done yet. You can monitor your install progress at this location: Your VM resource page, `boot diagnostics` (left sidebar), `Serial log` tab.
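Rather than reloading the browser tab by hand, the wait for the install can be sketched as a polling loop that checks whether anything answers at the VM's address yet. The function names, the 30 second delay, and the attempt limit are illustrative assumptions; the address in the comment is the placeholder IP used earlier.

```python
import time
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_responds(url, timeout=5):
    """Return True if a web server answers at url, even with an error status."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        return True   # the server answered, just not with a success status
    except (URLError, OSError):
        return False  # nothing is listening yet, or the connection timed out


def wait_until_ready(url, probe=http_responds, attempts=30,
                     delay_seconds=30, sleep=time.sleep):
    """Poll url until probe succeeds; return the attempt number, or None."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)
    return None


# Example call, using the placeholder address from earlier:
# wait_until_ready("http://12.23.34.45")
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a network or real waiting.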
At this point, when the Littlest Jupyter Hub install is done, you should be able to log in as the administrator User. Try starting a Jupyter notebook; then create and run a cell.
The objective is now to modify the Jupyter Hub environment by installing some packages.
At this point the instructions and screencaps are great; no need for ‘updated’ screencaps.
In addition to the `gdal` and `there` library installs, also install these packages:
sudo -E pip install matplotlib
sudo -E pip install xarray[complete]
This is a stretch activity.
Start a terminal (still logged in as the admin) and run this `git` command:
cd ~
git clone https://github.com/robfatland/ocean
Now you should have a folder called `ocean` in the navigator. Navigate to this folder and start the notebook called `Biooptics.ipynb`. Use the Run menu to run all of the cells in this notebook. This will take a minute or two; and when it is done you might take a moment to look through the results. You can also see the markdown behind the rendering by double-clicking on a text cell.