
Solution to Data Merging with Dictionaries

This post originally appeared on the Software Carpentry website.

This week's tutorial problem was to merge the data from a set of input files to show how often different species were observed on different dates. The shell pipeline, Python code, and two sample input files follow the video.

{% assign video_title="Data Merging with Dictionaries" %} {% assign video_slug="S-dqlYWs4S0" %} {% assign video_time="00:09:18" %} {% include youtube %}

shell command

grep -h -v '^#' *.txt | sort | uniq -c

merge.py

import sys

# Read and merge data.
results = {}
filenames = sys.argv[1:]
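for f in filenames:
    # open() replaces the Python 2-only file(); 'with' closes the file for us.
    with open(f, 'r') as reader:
        for line in reader:
            # Skip comment lines and blank lines.
            if line.startswith('#') or not line.strip():
                pass
            else:
                date, species = line.split()
                key = (date, species)
                if key not in results:
                    results[key] = 1
                else:
                    results[key] += 1

# Format output.
# sorted() orders the (date, species) keys by date, then by species.
all_combos = sorted(results.keys())
for key in all_combos:
    count = results[key]
    print(count, key[0], key[1])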
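
The counting loop in merge.py can also be written with `collections.Counter` from the standard library, which treats missing keys as zero so the explicit "is the key already there?" test goes away. The following is a minimal sketch of that alternative, not the code from the video; the function name `count_sightings` and the short `sample` list are made up for illustration.

```python
from collections import Counter


def count_sightings(lines):
    """Count (date, species) pairs, skipping comment and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith('#') or not line.strip():
            continue
        date, species = line.split()
        counts[(date, species)] += 1  # missing keys start at 0
    return counts


# Hypothetical input: the first few lines of cousteau.txt.
sample = [
    "# Jacques Cousteau",
    "2012-03-27 marlin",
    "2012-03-29 tuna",
    "2012-03-29 tuna",
]

for (date, species), count in sorted(count_sightings(sample).items()):
    print(count, date, species)
```

Because the function accepts any iterable of lines, it could be handed an open file object as easily as a list of strings.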

cousteau.txt

# Jacques Cousteau
2012-03-27 marlin
2012-03-29 tuna
2012-03-29 tuna
2012-03-29 turtle

haddock.txt

# Steve Haddock
2012-03-28 squid
2012-03-28 marlin
2012-03-28 marlin
2012-03-29 eel
2012-03-29 squid
2012-03-29 turtle
2012-03-29 turtle
2012-03-30 squid
2012-03-31 turtle
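
To check a merge by hand, it helps to know what the two sample files should produce together. The sketch below inlines both files as strings (the variable names `cousteau`, `haddock`, and `merged` are assumptions for illustration) and tallies each date and species pair; turtle on 2012-03-29 is the only pair seen in both files, so its counts of 1 and 2 add up to 3.

```python
from collections import Counter

# Contents of the two sample files, inlined so the example is self-contained.
cousteau = """# Jacques Cousteau
2012-03-27 marlin
2012-03-29 tuna
2012-03-29 tuna
2012-03-29 turtle
"""

haddock = """# Steve Haddock
2012-03-28 squid
2012-03-28 marlin
2012-03-28 marlin
2012-03-29 eel
2012-03-29 squid
2012-03-29 turtle
2012-03-29 turtle
2012-03-30 squid
2012-03-31 turtle
"""

merged = Counter()
for text in (cousteau, haddock):
    for line in text.splitlines():
        # Ignore blank lines and the '#' header naming the observer.
        if line and not line.startswith('#'):
            merged[tuple(line.split())] += 1

for (date, species), count in sorted(merged.items()):
    print(count, date, species)
# 1 2012-03-27 marlin
# 2 2012-03-28 marlin
# 1 2012-03-28 squid
# 1 2012-03-29 eel
# 1 2012-03-29 squid
# 2 2012-03-29 tuna
# 3 2012-03-29 turtle
# 1 2012-03-30 squid
# 1 2012-03-31 turtle
```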