Previous versions of Minus80 (up to v0.3.3) focused on adding persistence to biological datasets, especially objects such as gene networks or reference genomes that require a lot of computational power to build. This philosophy of "build once, reuse many" allows researchers to quickly use large, complex datasets in their everyday analyses. Like pulling a stock solution from the deep freeze, it takes just a moment to get going with the analysis you want to perform. Similarly, with datasets backed by Minus80, it's easy to pull up a frozen instance and start analyzing your data.
This is great for mature or clear-cut datasets, but with many biological datasets, a lot of effort and many iterations go into developing a dataset to the point where persistence is useful. In other words, while persistence and reusability are the end goals, datasets first and foremost need to be stable and reproducible.

Through ongoing support from a Mozilla Science Mini-Grant, we've taken big steps toward adding features that help researchers use Minus80 to track, manage, and share core genetic datasets. A short summary of the v1.0.0 changelog to date includes:
- Updated command line interface
- Simpler data management and tracking through Projects
- Version history through freezing tagged datasets
Here is a sneak peek at our latest development version (v1.0.0-dev), showcasing some of the newest features in Minus80.
Brand New CLI
In addition to major changes behind the scenes, Minus80 has a new, improved CLI. Let's check it out:

The --help flag prints out the available commands. Assuming a fresh install, Minus80 should not be tracking anything yet.

Let's create a trackable dataset using the new Project data type.
Tracking data directories with Projects
Starting in v1.0.0, you can track your own custom data directories using Minus80 Projects. The init command creates a project directory, which Minus80 then tracks. Here is a basic workflow:

First, we use the list command to see what's currently in Minus80. [Nothing here yet] indicates that no Minus80 datasets are available, so let's make one. The init command defaults to creating a new Minus80 Project, called foobar here. When we list the projects again using the list command, we can see an entry for foobar under the Project heading. We can also see that a new directory with the same name was created. Minus80 is now tracking the contents of this directory!
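To make "tracking" a little more concrete, here is a minimal Python sketch of what an init/list pair could look like. It is hypothetical and is not how Minus80 is implemented: a JSON file stands in for wherever Minus80 records its tracked datasets, and the names init_project, list_tracked, and registry.json are invented for illustration. The output format loosely mimics the tree shown by the list command.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sketch, NOT Minus80's internals: a JSON file maps each
# data type (e.g. "Project") to the names of the datasets being tracked.


def load_registry(registry: Path) -> dict:
    """Return {dtype: [names, ...]}, or an empty dict if nothing is tracked."""
    if registry.exists():
        return json.loads(registry.read_text())
    return {}


def init_project(name: str, workdir: Path, registry: Path) -> Path:
    """Create workdir/<name> and record it under the 'Project' heading."""
    project_dir = workdir / name
    project_dir.mkdir(parents=True, exist_ok=True)
    tracked = load_registry(registry)
    names = tracked.setdefault("Project", [])
    if name not in names:
        names.append(name)
    registry.write_text(json.dumps(tracked))
    return project_dir


def list_tracked(registry: Path) -> str:
    """Render tracked datasets as a small tree, one heading per data type."""
    tracked = load_registry(registry)
    if not tracked:
        return "[Nothing here yet]"
    lines = []
    for dtype in sorted(tracked):
        lines.append(dtype)
        names = sorted(tracked[dtype])
        for i, name in enumerate(names):
            # The last child gets a closing branch, earlier ones a tee.
            branch = "└──" if i == len(names) - 1 else "├──"
            lines.append(f"{branch}{name}")
    return "\n".join(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        registry = root / "registry.json"
        print(list_tracked(registry))  # [Nothing here yet]
        init_project("foobar", root, registry)
        print(list_tracked(registry))
```

Keeping the registry separate from the project directory means the tracked data itself stays untouched; only the bookkeeping file changes when a project is added.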
Version history through freezing tagged datasets
Any data in the foobar/ directory can now be indexed and tracked by Minus80. Note that Minus80 is not meant to manage your raw data; it is better suited to smaller, day-to-day, curated datasets. Suppose, for instance, you processed your raw RNA-Seq reads (perhaps hundreds of gigabytes of data) into a much smaller gene expression matrix. This resulting dataset is a perfect candidate for Minus80, as most downstream analyses require a lot of pivoting, iterating, and analyzing.
Suppose you just read about a fancy new normalization method you'd like to try on your data. However, you want to make a checkpoint you can go back to in case things don't pan out. You can freeze the current state of your Minus80 Project using the freeze command.

Here, we first run list to see the foobar Project directory from above. We then create a file called foobar/data.txt, put some data in it (1234), and use the freeze command to create a snapshot in time, "tagging" it with the string version_1. Let's take a closer look at the --help entry for the freeze command to better understand what is going on.
```
$ minus80 freeze --help
Usage: minus80 freeze [OPTIONS] <slug>

  Freeze a minus80 dataset

Options:
  --help  Show this message and exit.
```
The freeze command takes a single positional argument called <slug>. Since Minus80 can track more than just Project objects, a more verbose notation is needed to tell Minus80 exactly what you'd like to freeze. The syntax is <dtype>.<name>:<tag>, where dtype is a data type supported by Minus80 (Project in this case), name is the name of the dataset, and tag is a short, user-defined label that describes what the snapshot contains. Here, we use the string "version_1", so the full slug is Project.foobar:version_1.
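The slug notation splits cleanly into its three parts. The Python sketch below is only an illustration of the <dtype>.<name>:<tag> format, not Minus80's own parser: the Slug type and parse_slug function are hypothetical, and the assumptions that the tag is required and that the first dot separates dtype from name are ours.

```python
import re
from typing import NamedTuple


class Slug(NamedTuple):
    dtype: str  # data type, e.g. "Project"
    name: str   # dataset name, e.g. "foobar"
    tag: str    # snapshot tag, e.g. "version_1"


# Assumption: dtype contains no '.' or ':', the first '.' ends the dtype,
# the last ':' starts the tag, and a tag is always present.
_SLUG_RE = re.compile(r"^(?P<dtype>[^.:]+)\.(?P<name>[^:]+):(?P<tag>[^:]+)$")


def parse_slug(slug: str) -> Slug:
    """Split a '<dtype>.<name>:<tag>' string into its components."""
    match = _SLUG_RE.match(slug)
    if match is None:
        raise ValueError(f"expected <dtype>.<name>:<tag>, got {slug!r}")
    return Slug(**match.groupdict())


if __name__ == "__main__":
    print(parse_slug("Project.foobar:version_1"))
    # Slug(dtype='Project', name='foobar', tag='version_1')
```

Encoding the data type in the slug is what lets a single freeze (or thaw) command work across every kind of dataset Minus80 tracks.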
We can see what tags a dataset has using the --tags flag with the list command.
```
$ minus80 list --tags
Project
└──foobar
    └──version_1 1a0e22dcdd (11:40AM - Sep 17, 2019)
```
Under the foobar heading is a version_1 entry, along with a checksum designator (1a0e22dcdd) and a timestamp.
Let's make some changes to our data.txt file, representing some sort of analysis (e.g., a new normalization technique).

First, we cat the current data file, which shows 1234. Next, we append the string abcd to the data file and freeze it with the tag version_2. We can see the new tag using the minus80 list --tags command. In addition to the updated timestamp, we can see a different checksum digest.
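Why does the checksum change? A digest computed over a directory's files changes whenever any byte changes. The sketch below is an assumption-laden illustration, not Minus80's actual checksum: it hashes every file's relative path and contents with SHA-256 and truncates the hex digest to 10 characters, only because the digest shown above happens to be 10 characters long. The function name directory_digest is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def _update_framed(hasher, chunk: bytes) -> None:
    # Length-prefix each chunk so different (path, content) pairs
    # cannot concatenate into the same byte stream.
    hasher.update(len(chunk).to_bytes(8, "big"))
    hasher.update(chunk)


def directory_digest(root: Path, length: int = 10) -> str:
    """Short SHA-256 digest of all file paths and contents under root.

    Hypothetical scheme for illustration; Minus80 may hash differently.
    """
    hasher = hashlib.sha256()
    # Sort so the digest does not depend on filesystem iteration order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        _update_framed(hasher, path.relative_to(root).as_posix().encode("utf-8"))
        _update_framed(hasher, path.read_bytes())
    return hasher.hexdigest()[:length]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "foobar"
        project.mkdir()
        data = project / "data.txt"
        data.write_text("1234\n")
        v1 = directory_digest(project)
        with data.open("a") as fh:
            fh.write("abcd\n")  # the "analysis" step from the walkthrough
        v2 = directory_digest(project)
        print(v1, v2, v1 != v2)  # the last value is True
```

Including the relative path in the hash means that renaming a file also produces a new digest, even when no file contents change.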
Suppose you are not happy with this new modification and want to revert to the way things were in version_1. We can do just that using the thaw command!

Here, we can again see that our data.txt file contains both 1234 and abcd. Thawing our previous version reverts the data files to the state they were in when that version was frozen. Like the freeze command, thaw takes a slug of the form <dtype>.<name>:<tag>; in this case, Project.foobar:version_1, since we want to go back to version_1.
The SUCCESS! output indicates that our operation worked, and when we look inside data.txt we see only 1234. We're making history!
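The freeze/thaw round trip can be pictured as saving and restoring whole-directory snapshots. The Python sketch below only models that behavior with plain directory copies. It is hypothetical and says nothing about how Minus80 actually stores frozen versions: the freeze and thaw function names, the snapshot_root location, and the choice to refuse reusing an existing tag are all assumptions made for illustration. The snapshot directory must live outside the project directory so that copying the project does not copy its own snapshots.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical model of freeze/thaw using directory copies; not Minus80's
# storage format. snapshot_root must be outside project_dir.


def freeze(project_dir: Path, snapshot_root: Path, tag: str) -> Path:
    """Copy the project's current contents to snapshot_root/<tag>."""
    target = snapshot_root / tag
    if target.exists():
        # Design choice for this sketch: tags are immutable once frozen.
        raise FileExistsError(f"tag {tag!r} already exists")
    shutil.copytree(project_dir, target)  # also creates snapshot_root
    return target


def thaw(project_dir: Path, snapshot_root: Path, tag: str) -> None:
    """Replace the project's contents with the snapshot stored under tag."""
    source = snapshot_root / tag
    if not source.is_dir():
        raise FileNotFoundError(f"no snapshot tagged {tag!r}")
    shutil.rmtree(project_dir)
    shutil.copytree(source, project_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "foobar"
        snapshots = Path(tmp) / "snapshots"
        project.mkdir()
        data = project / "data.txt"
        data.write_text("1234\n")
        freeze(project, snapshots, "version_1")
        with data.open("a") as fh:
            fh.write("abcd\n")
        freeze(project, snapshots, "version_2")
        thaw(project, snapshots, "version_1")
        print(data.read_text())  # 1234
```

Full copies keep the sketch easy to follow but waste space for large projects; a real tool would more likely deduplicate unchanged files between versions.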
What's next?
Data becomes exceedingly useful when it can be shared, whether with yourself (perhaps on a new computer), with colleagues, or with the rest of the world.
Similar to the relationship between git and GitHub, or docker and Docker Hub, the next step for Minus80 will be connecting it to the cloud, where you can push and pull your datasets or share them with friends via a URL. Some of this functionality is already available in the v1.0.0-dev version of Minus80, but that will be the subject of another blog update. Check out the rest of the development on GitHub or connect with us on Twitter.