...or the cutting and pasting of unverifiable theories for fun and profit

Documenting climatology's fascination with regurgitation. Here is a popular example to get you started: Luterbacher and Jones borrow their text from the Mann.

Sunday, May 13, 2012

Are There Fingerprints in the Climategate Archives?

Are there differences in the recipients of the Climategate I (CG I) and Climategate II (CG II) email messages? To find out, I ran the following commands over the .txt files of each archive:
grep -iE '^To:' *.txt |\
       tr -cs 'A-Za-z0-9' '\012' |\
       tr 'A-Z' 'a-z' |\
       awk '{ count[$1]++ } END { for (w in count) print w " " count[w] }' |\
       sort -n -k2
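These commands pull out the 'To:' line(s) from each email file, split those lines into individual words, convert the words to lowercase, and count how often each word occurs. Finally, the results are sorted numerically on the count field. The result is a long screenful of output containing some extraneous strings, such as 'txt', each with a count. Because grep is given many files at once, it prefixes every matching line with the file name, which is presumably where the 'txt' count (5736, close to the 5729 for 'to' from the header itself) comes from. The end of the output looks like this: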
phil 603
keith 624
ucar 640
de 645
t 791
hulme 815
osborn 885
com 903
mann 918
m 970
k 1086
p 1181
briffa 1548
jones 1753
gov 1892
edu 2746
uea 4654
to 5729
txt 5736
ac 6404
uk 7232
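As a hedged sketch, here is a variant of the pipeline wrapped in a shell function. The name count_to_tokens is hypothetical, and its two changes are my assumptions rather than the method used above: grep's -h flag suppresses the file-name prefix, so 'txt' and file-name digits should no longer show up in the counts, and sed strips the 'To:' header word itself. Like the original, it only reads the first line of each To: header, so recipients on folded continuation lines are missed.

```shell
# count_to_tokens: hypothetical helper, a variant of the pipeline above.
# Runs in a subshell so the LC_ALL=C setting (stable tr ranges and sort
# order) does not leak into the caller's environment.
count_to_tokens() (
    LC_ALL=C
    export LC_ALL
    # -h: no file-name prefix on matches, so no 'txt' tokens from file names
    grep -h -i -E '^To:' "$@" |
        # drop the header word so 'to' is not counted (any letter case)
        sed 's/^[Tt][Oo]://' |
        # one alphanumeric token per line
        tr -cs 'A-Za-z0-9' '\n' |
        tr 'A-Z' 'a-z' |
        # NF skips the empty line tr emits for leading punctuation/space
        awk 'NF { count[$1]++ } END { for (w in count) print w, count[w] }' |
        # ascending by count, ties broken alphabetically
        sort -k2,2n -k1,1
)
```

Usage, assuming the CG II messages sit in a directory named cg2 (a hypothetical path): count_to_tokens cg2/*.txt | tail -n 25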
Running this on each archive (the output above is from CG II) and filtering by hand to remove non-name strings such as 'txt' and 'uk' yields the following counts for the most frequent recipient surnames:
CG I
mann 183
osborn 200
jones 313
briffa 373
CG II
hulme 815
osborn 885
mann 918
briffa 1548
jones 1753
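The hand filtering could also be scripted. The sketch below is a hypothetical helper, top_recipients, that reads the "word count" lines produced by the pipeline above on standard input, drops single-character tokens and any word in a space-separated stoplist, and keeps the N highest counts. The stoplist in the usage line is an assumption built from strings visible in the output above; deciding which tokens are noise, and whether first names such as 'phil' and 'keith' count as separate entries, remains a judgment call.

```shell
# top_recipients STOPLIST N: hypothetical helper.
#   STOPLIST  space-separated words to discard (e.g. domain fragments)
#   N         how many of the most frequent remaining words to keep
# Input:  "word count" lines on stdin.
# Output: the N largest counts, smallest first (same order as the lists above).
top_recipients() (
    LC_ALL=C
    export LC_ALL
    awk -v stoplist="$1" '
        # split on runs of blanks and mark each stoplist word
        BEGIN { n = split(stoplist, w, " "); for (i = 1; i <= n; i++) stop[w[i]] = 1 }
        # keep multi-character words that are not in the stoplist
        length($1) > 1 && !($1 in stop)' |
        sort -k2,2n -k1,1 |
        tail -n "$2"
)
```

For example, appending | top_recipients "to txt ac uk uea edu gov com de ucar phil keith" 5 to the pipeline above prints the five most frequent remaining words.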
So Keith Briffa was the leading recipient of emails in CG I, and Phil Jones was the leading recipient in CG II. These are raw counts from archives of different sizes, so it is the ranking rather than the totals that differs. Whether this reflects significantly different sources for the two archives I cannot say for sure, but it is suggestive.
