
Science Gateways and Dataset Dissemination

The cloud can be a big help in making a dataset available to others in your field. The primary challenge is dealing with ongoing storage fees and the extra egress charges that cloud platforms levy when your data is downloaded. There are a few strategies for dealing with this:

  • Encouraging cloud-native, storage-adjacent computation
  • Taking advantage of cheaper object storage, which can be tiered and discounted based on frequency of access
  • Taking advantage of vendor-specific discount programs for publicly hosted scientific data

The rest of this article will go into these strategies in detail.

Data storage and costing

TODO: object glacier storage

Discount programs


Data APIs

TODO: zero to API solution

Storage-adjacent Computation

Many cloud-hosted gateways disseminate data by providing their users with an experimentation platform, usually a JupyterHub. This way, rather than every user of the dataset downloading what they need to their own storage, users simply run their code or use tools hosted on cloud machines that have free access to the central dataset.
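To make the trade-off concrete, here is a minimal sketch comparing the monthly egress bill when every user downloads the data against the bill when users compute next to the data and download only their results. The class and function names, the dataset size, the user count, and the per-GB rate are all hypothetical values chosen for illustration; substitute your provider's published pricing.

```python
from dataclasses import dataclass


@dataclass
class EgressScenario:
    dataset_gb: float         # size of the data each user needs (hypothetical)
    users_per_month: int      # users accessing the data each month (hypothetical)
    egress_usd_per_gb: float  # per-GB download rate (hypothetical, not a real quote)


def download_cost(s: EgressScenario) -> float:
    """Monthly egress bill if every user downloads their own copy."""
    return s.dataset_gb * s.users_per_month * s.egress_usd_per_gb


def adjacent_cost(s: EgressScenario, result_gb_per_user: float) -> float:
    """Monthly egress bill if users compute in the cloud and download only results."""
    return result_gb_per_user * s.users_per_month * s.egress_usd_per_gb


# Illustrative numbers only.
scenario = EgressScenario(dataset_gb=500, users_per_month=40, egress_usd_per_gb=0.09)
print(f"download everything:   ${download_cost(scenario):,.2f}")
print(f"download results only: ${adjacent_cost(scenario, result_gb_per_user=0.5):,.2f}")
```

The sketch counts egress only. Running a hub has its own compute costs, so a full comparison would need to add those to the storage-adjacent side.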

For help getting things set up, check out our CloudBank Solutions for setting up a JupyterHub or running a hosted web application.

Case studies