Solution to Data Merging with Dictionaries
This post originally appeared on the Software Carpentry website.
This week's tutorial problem was to merge the data from a set of input files to show how often different species were observed on different dates. The shell pipeline, Python code, and two sample input files follow the video.
{% assign video_title="Data Merging with Dictionaries" %} {% assign video_slug="S-dqlYWs4S0" %} {% assign video_time="00:09:18" %} {% include youtube %}

shell command

```shell
grep -h -v '#' *.txt | sort | uniq -c
```
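The pipeline depends on `uniq -c` counting only *adjacent* identical lines, which is why `sort` has to come first. As a rough illustration (not part of the original solution), the sketch below mirrors the same steps in Python, assuming the lines have already been read and stripped of newlines. `itertools.groupby` likewise groups only consecutive equal items. The function names `uniq_count` and `pipeline` are hypothetical. Two differences are worth knowing: `grep -v '#'` drops any line that *contains* `#`, whereas merge.py skips only lines that *start* with `#`, and the real `sort` may order lines by locale rather than by code point.

```python
# Illustrative sketch of `grep -v '#' | sort | uniq -c` in Python.
# uniq only collapses *adjacent* identical lines; itertools.groupby
# behaves the same way, so sorting first is what makes the counts correct.
import itertools


def uniq_count(lines):
    """Return (count, line) pairs for runs of identical adjacent lines, like `uniq -c`."""
    return [(len(list(group)), line) for line, group in itertools.groupby(lines)]


def pipeline(lines):
    """Mimic `grep -v '#' | sort | uniq -c` on an iterable of stripped lines."""
    kept = [line for line in lines if '#' not in line]  # grep -v '#'
    return uniq_count(sorted(kept))                      # sort | uniq -c


if __name__ == '__main__':
    sample = ['# comment', '2012-03-29 tuna', '2012-03-27 marlin', '2012-03-29 tuna']
    data = [line for line in sample if '#' not in line]
    # Without sorting, the two tuna lines are not adjacent, so they are counted separately.
    print(uniq_count(data))
    # With sorting, they are adjacent and merged into a single count.
    print(pipeline(sample))
```

Run as a script, the first line of output lists each tuna line with a count of 1, while the second reports `(2, '2012-03-29 tuna')`.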
merge.py

```python
import sys

# Read and merge data.
results = {}
filenames = sys.argv[1:]
for f in filenames:
    reader = open(f, 'r')
    for line in reader:
        if line.startswith('#'):
            pass  # skip comment lines
        else:
            date, species = line.split()
            key = (date, species)
            if key not in results:
                results[key] = 1
            else:
                results[key] += 1
    reader.close()

# Format output: sorting the (date, species) keys orders by date, then species.
all_combos = sorted(results.keys())
for key in all_combos:
    count = results[key]
    print(count, key[0], key[1])
```
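For comparison, the counting loop in merge.py can also be written with `collections.Counter`, a `dict` subclass whose missing keys read as zero, so the `if key not in results` test disappears. This is a swapped-in technique rather than the approach shown in the video. The file name `merge_counter.py` and the helpers `count_pairs` and `merge_files` are hypothetical. This sketch also skips blank lines, which merge.py would reject with a `ValueError` when unpacking `line.split()`.

```python
# merge_counter.py (hypothetical name): a sketch of merge.py using collections.Counter
# instead of a plain dict. Not the version from the video.
import sys
from collections import Counter


def count_pairs(lines, results):
    """Add one to results[(date, species)] for every data line."""
    for line in lines:
        # Skip comment lines and blank lines.
        if line.startswith('#') or not line.strip():
            continue
        date, species = line.split()
        results[(date, species)] += 1  # missing keys start at 0 in a Counter


def merge_files(filenames):
    """Merge the counts from all the named files into one Counter."""
    results = Counter()
    for filename in filenames:
        with open(filename, 'r') as reader:
            count_pairs(reader, results)
    return results


def main():
    results = merge_files(sys.argv[1:])
    # Sorting the items orders them by (date, species), as merge.py does.
    for (date, species), count in sorted(results.items()):
        print(count, date, species)


if __name__ == '__main__':
    main()
```

Invoked as `python merge_counter.py cousteau.txt haddock.txt`, it should print the same nine lines as merge.py, from `1 2012-03-27 marlin` to `1 2012-03-31 turtle`, with `3 2012-03-29 turtle` as the largest count.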
cousteau.txt

```
# Jacques Cousteau
2012-03-27 marlin
2012-03-29 tuna
2012-03-29 tuna
2012-03-29 turtle
```
haddock.txt

```
# Steve Haddock
2012-03-28 squid
2012-03-28 marlin
2012-03-28 marlin
2012-03-29 eel
2012-03-29 squid
2012-03-29 turtle
2012-03-29 turtle
2012-03-30 squid
2012-03-31 turtle
```