How Do You Manage a Terabyte?

This post originally appeared on the Software Carpentry website.

This question has come up a couple of times, and I'd welcome feedback from readers. Suppose you have a large, but not enormous, amount of scientific data to manage (say, a terabyte): too much to easily keep a copy on every researcher's laptop, but not enough to justify buying special-purpose storage hardware or hiring a full-time sysadmin. What do you do? Break it into pieces, compress them with gzip or its moral equivalent, put the chunks on a central server, and create an index so that people can download and uncompress what they need, when they need it? Or...or what? What do you do, and why?
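To make the chunk-compress-index idea concrete, here is a minimal sketch of what the archiving side might look like in Python. The chunk size, file names, and index format are all assumptions for illustration, not a recommendation from anyone who has actually done this at scale:

```python
import gzip
import hashlib
import json
from pathlib import Path

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per chunk; an arbitrary choice


def chunk_and_index(source: Path, out_dir: Path) -> None:
    """Split `source` into gzip-compressed chunks and write a JSON index.

    The index records each chunk's name, byte offset, uncompressed size,
    and SHA-256 checksum, so that researchers can fetch and verify only
    the pieces they need.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    index = {"source": source.name, "chunks": []}
    with source.open("rb") as f:
        offset = 0
        for n, data in enumerate(iter(lambda: f.read(CHUNK_SIZE), b"")):
            name = f"{source.stem}.{n:05d}.gz"
            with gzip.open(out_dir / name, "wb") as gz:
                gz.write(data)
            index["chunks"].append({
                "name": name,
                "offset": offset,
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            })
            offset += len(data)
    (out_dir / "index.json").write_text(json.dumps(index, indent=2))


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    chunk_and_index(Path("measurements.dat"), Path("archive"))
```

The retrieval side would presumably mirror this: fetch `index.json` from the central server, download only the chunks covering the byte range you care about, check each SHA-256 after download, and gunzip. Whether that beats just buying more disk is exactly the question.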

Dialogue & Discussion

Comments must follow our Code of Conduct.

Edit this page on Github