Solution to Data Merging with Dictionaries
This post originally appeared on the Software Carpentry website.
This week's tutorial problem was to merge the data from a set of input files to show how often different species were observed on different dates. The shell pipeline, Python code, and two sample input files follow the video.
{% assign video_title="Data Merging with Dictionaries" %} {% assign video_slug="S-dqlYWs4S0" %} {% assign video_time="00:09:18" %} {% include youtube %}

shell command

```shell
grep -h -v '#' *.txt | sort | uniq -c
```
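Why does the pipeline sort before counting? `uniq -c` only collapses runs of *adjacent* identical lines, so `sort` has to bring matching records from different files next to each other first. As a rough illustration, not part of the original solution, the stages can be mimicked in Python. The function name `pipeline_counts` is hypothetical, and the sketch assumes the whole input fits in memory.

```python
from itertools import groupby


def pipeline_counts(lines):
    """Rough Python analogue of: grep -v '#' | sort | uniq -c.

    Returns a list of (count, line) pairs, one per distinct line.
    """
    # grep -v '#': keep only lines that do not contain '#' anywhere.
    kept = [line.rstrip("\n") for line in lines if "#" not in line]
    # sort: bring identical lines next to each other.
    kept.sort()
    # uniq -c: count each run of adjacent identical lines.
    return [(len(list(run)), line) for line, run in groupby(kept)]


if __name__ == "__main__":
    # The two tuna lines are deliberately not adjacent in the input.
    sample = [
        "# Jacques Cousteau\n",
        "2012-03-29 tuna\n",
        "2012-03-27 marlin\n",
        "2012-03-29 tuna\n",
    ]
    for count, line in pipeline_counts(sample):
        print(count, line)
```

Dropping the `kept.sort()` step would report the tuna sightings as two separate runs of 1. This is the same mistake as leaving `sort` out of the shell pipeline.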
merge.py
```python
import sys

# Read and merge data: map each (date, species) pair to its count.
results = {}
filenames = sys.argv[1:]
for f in filenames:
    with open(f, 'r') as reader:
        for line in reader:
            # Skip comment lines (e.g. the observer's name).
            if line.startswith('#'):
                continue
            date, species = line.split()
            key = (date, species)
            if key not in results:
                results[key] = 1
            else:
                results[key] += 1

# Format output: sorting the tuple keys orders by date, then species.
all_combos = sorted(results.keys())
for key in all_combos:
    count = results[key]
    print(count, key[0], key[1])
```

cousteau.txt
```
# Jacques Cousteau
2012-03-27 marlin
2012-03-29 tuna
2012-03-29 tuna
2012-03-29 turtle
```
haddock.txt
```
# Steve Haddock
2012-03-28 squid
2012-03-28 marlin
2012-03-28 marlin
2012-03-29 eel
2012-03-29 squid
2012-03-29 turtle
2012-03-29 turtle
2012-03-30 squid
2012-03-31 turtle
```
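For comparison, the counting loop in merge.py can be condensed with `collections.Counter` from Python's standard library. `Counter` is a dict subclass that treats a missing key as a count of zero, so the "first time seen" branch disappears. This is a swapped-in alternative, not the version from the video. The helper name `count_observations` is hypothetical, and the sample files are inlined as strings (an assumption made only so the sketch runs on its own).

```python
from collections import Counter

# Inlined copies of the two sample files (assumption: the real script
# reads them from disk via sys.argv instead).
COUSTEAU = """\
# Jacques Cousteau
2012-03-27 marlin
2012-03-29 tuna
2012-03-29 tuna
2012-03-29 turtle
"""

HADDOCK = """\
# Steve Haddock
2012-03-28 squid
2012-03-28 marlin
2012-03-28 marlin
2012-03-29 eel
2012-03-29 squid
2012-03-29 turtle
2012-03-29 turtle
2012-03-30 squid
2012-03-31 turtle
"""


def count_observations(texts):
    """Count (date, species) pairs across the given file contents."""
    counts = Counter()
    for text in texts:
        for line in text.splitlines():
            # Skip comment lines, as merge.py does.
            if line.startswith("#"):
                continue
            date, species = line.split()
            # Missing keys start at 0, so no membership test is needed.
            counts[(date, species)] += 1
    return counts


if __name__ == "__main__":
    counts = count_observations([COUSTEAU, HADDOCK])
    # Sorting the items orders them by date first, then species.
    for (date, species), count in sorted(counts.items()):
        print(count, date, species)
```

With the two sample files, this prints the same lines as `python merge.py cousteau.txt haddock.txt`:

```
1 2012-03-27 marlin
2 2012-03-28 marlin
1 2012-03-28 squid
1 2012-03-29 eel
1 2012-03-29 squid
2 2012-03-29 tuna
3 2012-03-29 turtle
1 2012-03-30 squid
1 2012-03-31 turtle
```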