Machine Learning in the Cloud

The cloud is particularly well suited to training neural networks (or any other form of ML model); it can provide as much computational power as you need, when you need it, and you pay for only what you use.

There is substantial overlap between the needs of machine learning projects and more general HPC workloads. That said, cloud platforms tend to provide separate services for general HPC workloads and for ML-specific work. These ML services offer additional tools that make training and inspecting models easier, and they often let you build on pre-trained models provided by the platform. The rest of this article goes into detail about such dedicated ML services.

The major pieces of infrastructure to think about for ML workloads include:

Guides

Notebooks

There are two routes for serving notebooks from the cloud:

Containers

Cloud platforms often provide services that run Docker containers without requiring you to set up a full virtual machine (this is one of the things commonly referred to as “serverless” computing). Because Docker containers are often used to run web app infrastructure, cloud services and their documentation are geared towards that use case. That said, containers work just as well for long-running machine learning jobs. Cloud platforms are also beginning to offer ML-focused container services, usually under different names from their generic container offerings.
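
To make the contrast with web-app containers concrete, here is a minimal sketch of what a batch-style container entrypoint for a training job might look like: it runs to completion rather than serving requests, takes its settings from environment variables, and saves its progress so a restarted container can resume. The variable names `EPOCHS` and `CHECKPOINT_DIR`, the `checkpoint.json` file, and the placeholder training loop are all assumptions made for illustration, not part of any particular platform's API.

```python
import json
import os
import tempfile
from pathlib import Path


def load_config(env=os.environ):
    """Read job settings from environment variables (names are hypothetical)."""
    return {
        "epochs": int(env.get("EPOCHS", "3")),
        "checkpoint_dir": env.get("CHECKPOINT_DIR", tempfile.gettempdir()),
    }


def run_job(config):
    """Run a stand-in training loop, resuming from the last checkpoint if any."""
    ckpt = Path(config["checkpoint_dir"]) / "checkpoint.json"
    state = json.loads(ckpt.read_text()) if ckpt.exists() else {"epoch": 0}

    for epoch in range(state["epoch"], config["epochs"]):
        # A real job would train the model for one epoch here.
        state = {"epoch": epoch + 1}
        # Persist progress so a restarted container can pick up where it left off.
        ckpt.write_text(json.dumps(state))

    return state


if __name__ == "__main__":
    final = run_job(load_config())
    print(f"finished at epoch {final['epoch']}")
```

In practice the checkpoint directory would likely point at mounted or remote storage rather than the container's own filesystem, since the latter is usually discarded when the container exits.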

Guides:

Storage

Broadly, cloud platforms provide two different kinds of data storage service:

Case Studies