Code and Data for the Social Sciences

This post originally appeared on the Software Carpentry website.

Matthew Gentzkow and Jesse Shapiro have written an excellent guide, "Code and Data for the Social Sciences". It's short (only 38 pages), very readable, and full of practical advice for scientists of all stripes:

  • Automate everything that can be automated.
  • Write a single shell script that executes all code from beginning to end.
  • Store code and data under version control.
  • Run the whole directory before checking it back in.
  • Separate directories by function.
  • Separate files into inputs and outputs.
  • Make directories portable.
  • Store cleaned data in tables with unique, non-missing keys.
  • Keep data normalized as far into your code pipeline as you can.
  • Abstract to eliminate redundancy.
  • Abstract to improve clarity.
  • Otherwise, don't abstract.
  • Don't write documentation you will not maintain.
  • Code should be self-documenting.
  • Manage tasks with a task management system.
  • E-mail is not a task management system.
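
To make one of these rules concrete, here is a minimal sketch of what checking for "unique, non-missing keys" might look like. It is not taken from the guide: the function name `find_key_problems`, the list-of-dicts table layout, and the `county`/`year` columns are all assumptions made up for illustration.

```python
from collections import Counter


def find_key_problems(rows, key_fields):
    """Return (missing, duplicates) for a table given as a list of dicts.

    missing: indices of rows where any key field is absent, None, or "".
    duplicates: sorted key tuples that appear in more than one row.
    """
    missing = []
    counts = Counter()
    for index, row in enumerate(rows):
        key = tuple(row.get(field) for field in key_fields)
        if any(value is None or value == "" for value in key):
            missing.append(index)
        else:
            counts[key] += 1
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    return missing, duplicates


# Hypothetical cleaned table that is supposed to be keyed on (county, year).
rows = [
    {"county": "A", "year": 2010, "population": 100},
    {"county": "A", "year": 2011, "population": 110},
    {"county": "A", "year": 2011, "population": 111},  # duplicate key
    {"county": None, "year": 2010, "population": 90},  # missing key
]

missing, duplicates = find_key_problems(rows, ["county", "year"])
print(missing)     # [3]
print(duplicates)  # [('A', 2011)]
```

A check like this could run at the end of the data-cleaning step, so that a bad key stops the pipeline instead of silently propagating into later joins.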
