Minus80 now allows you to push and pull datasets from the cloud!

4 years ago


In another post recapping Minus80 (v1.0.0), I talk about adding persistence to Minus80 datasets using the freeze and unfreeze commands. These tools ensure that datasets can be traced back to specific tags, and they make it easy to pivot between and compare results from different experimental runs or points in time.

For example, when you are ready to publish your data you can freeze it, assigning it a distinguishable tag, such that later access guarantees that you are working with the exact set of files you published with. In the meantime, you are free to continue updating and analyzing the data without worrying about losing or inadvertently changing anything related to when it was published.

This would be great were it not for one thing: your dataset, along with all of its frozen snapshots, is local to the computer you are working on. Until now! Version v1.1.0 of Minus80 adds several new commands that allow you to move tagged datasets to and from the cloud.

New cloud commands in Minus80 v1.1.0

Four simple commands have been added to support cloud-powered datasets.

Logging into your minus80 cloud account

Before you can push and pull datasets, you need to log in. Behind the scenes, this is securely managed by Google FireBase, which I chose for several reasons. First, it is rather difficult to implement your own user authentication scheme (AKA, don't reinvent the wheel). I'd rather not run the risk of accidentally exposing attack vectors and potentially leaking passwords or user information. By using this battle-tested library, passwords are exchanged for tokens using the FireBase authentication API, ensuring your account and information are safe and sound. Second, I decided to leverage other tools available through FireBase for other aspects of the project, so adopting this ecosystem just made sense. There are also JavaScript APIs, which will come into play later in the project. Finally, FireBase comes with some bells and whistles, including a very nice dashboard as well as some email-based password reset options, which are nice add-ons.

The FireBase authentication dashboard (with test users).
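To make the password-for-token exchange a little more concrete, here is a minimal sketch of an email/password sign-in against the public Firebase Auth REST endpoint. It is not minus80's actual login code, which may well go through a client library instead. The function names and the `api_key` argument are assumptions made for illustration; a real key comes from the Firebase project settings.

```python
import json
import urllib.request

# Public Firebase Auth REST endpoint for email/password sign-in.
SIGN_IN_URL = "https://identitytoolkit.googleapis.com/v1/accounts:signInWithPassword"


def build_sign_in_request(email, password, api_key):
    """Build (but do not send) a Firebase email/password sign-in request.

    Hypothetical helper for illustration; not part of the minus80 API.
    """
    payload = {
        "email": email,
        "password": password,
        "returnSecureToken": True,  # ask for an ID token and a refresh token
    }
    return urllib.request.Request(
        f"{SIGN_IN_URL}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def sign_in(email, password, api_key):
    """Send the sign-in request and return the parsed JSON response."""
    request = build_sign_in_request(email, password, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

On success, the response includes an `idToken` and a `refreshToken`, so a client can hold on to those tokens for later requests rather than storing the password.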

For the time being, minus80 cloud accounts are still only being created for beta-testing users. If you'd like to participate in beta testing minus80 cloud functionality, contact me at rob@linkage.io!

From the minus80 CLI, logging in is pretty straightforward:

Logging in requires a username and a password.

Additionally, simple account management features are included. Forget your password? No problem. Reset your password by using a secure link sent to your account email address.

Seeing what's up there

The list command shows you what datasets are available in the cloud.

The cloud list command mimics the same interface as the local minus80 list command, including the dtype and name search and filter options (not shown here). Since we don't have any datasets in the cloud, we get the message Nothing here yet!. To get some data up into the cloud, let's check out the push command.

Pushing a dataset to the cloud

Assume we have a tagged local minus80 dataset called Project.foobar:v1 that we want to push to the cloud. We can see our local dataset using the minus80 list --tags command, which shows the frozen tag name, the checksum, and the timestamp. Then, using the same tag (v1), we can send the dataset (Project.foobar) to the cloud using the push command. Putting it all together, using a syntax similar to the freeze command, we have: minus80 cloud push Project.foobar:v1.

Push a local dataset to the cloud. 
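The argument to push packs three pieces of information into one string: the dtype (Project), the dataset name (foobar), and the tag (v1). As a rough illustration of how such a string breaks apart, here is a small sketch. The `Slug` type and `parse_slug` function are hypothetical names for this example, not minus80's own parser, and treating the tag as optional is an assumption.

```python
from typing import NamedTuple, Optional


class Slug(NamedTuple):
    """The three parts of a 'dtype.name:tag' string (illustrative only)."""
    dtype: str
    name: str
    tag: Optional[str]


def parse_slug(slug: str) -> Slug:
    """Split 'dtype.name:tag' into its parts; the tag is optional here."""
    dataset, colon, tag = slug.partition(":")
    dtype, dot, name = dataset.partition(".")
    if not dot or not dtype or not name:
        raise ValueError(f"expected 'dtype.name[:tag]', got {slug!r}")
    if colon and not tag:
        raise ValueError(f"empty tag in {slug!r}")
    return Slug(dtype, name, tag or None)


print(parse_slug("Project.foobar:v1"))
# Slug(dtype='Project', name='foobar', tag='v1')
```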

Behind the scenes, minus80 communicates with the cloud server and calculates which files need to be uploaded. Files are uploaded in parallel, up to 5 at a time. Once the files are server-side, checksums are calculated and a status is returned, in this case SUCCESS!
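The general pattern described here (compare checksums, then upload only what is missing with a cap on concurrency) can be sketched in a few lines. This is a sketch under stated assumptions, not minus80's implementation: SHA-256 is an assumed hash, `remote_checksums` stands in for whatever manifest the server returns, and `upload_one` is a hypothetical callable that sends a single file.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MAX_PARALLEL_UPLOADS = 5  # the post says up to 5 files go up at a time


def file_checksum(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_upload(local_dir, remote_checksums):
    """List files that are missing from, or differ from, the remote manifest.

    remote_checksums maps relative POSIX paths to hex digests (assumed shape).
    """
    root = Path(local_dir)
    needed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        if remote_checksums.get(relative) != file_checksum(path):
            needed.append(path)
    return needed


def push(local_dir, remote_checksums, upload_one):
    """Upload the needed files, at most MAX_PARALLEL_UPLOADS at a time."""
    needed = files_to_upload(local_dir, remote_checksums)
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_UPLOADS) as pool:
        # list() waits for every upload and re-raises the first error, if any
        list(pool.map(upload_one, needed))
    return needed
```

A thread pool suits this kind of work because uploads spend most of their time waiting on the network, and `max_workers` gives a simple upper bound on how many run at once.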

If we retry our cloud list command, we can see our dataset in the cloud!

The cloud list command once we have data in our cloud repo.

Pulling a dataset from the cloud

Pulling a dataset is just as straightforward. The tagged dataset we just pushed still exists locally, and a tag that already exists locally cannot be pulled, so assume it was either deleted with the minus80 remove command or that the following demo is happening on a different computer.

Pulling data down from the cloud.

When we list the local datasets, we can see that a Project called foobar has been created for us along with the tag we requested.

What's next?

This was a difficult post to write. So much went into making the minus80 cloud push/pull API as simple as possible that it's hard not to dive deep into the implementation details. Luckily for you, I have several smaller blog posts planned that focus on some fun, nitty-gritty details and peel back what's going on behind the scenes.

Until then, stay tuned for more updates, check out our GitHub page for incremental changes, and connect on Twitter.

Acknowledgements

Rob Schaefer
