
Graph of recorded file access and modification times for files in FOI2009.zip

December 27th, 2009 (12:54 pm)

Here's a graph of file access times versus file modification times as recorded in the FOI2009.zip archive of the cracked CRU files. Note that the graph doesn't cover the entire range of times spanned by the files, but only the more 'active' time periods.

* * *
Update 2009-12-27, 2010-05-16: I've put up the semi-raw data from which the graph was produced.

Update 2010-08-20: The old graph image was nicked, so I've replaced it with a new one -- which should give the same information.

Update 2010-09-04: OK, here's the gnuplot program for generating the graph from the semi-processed data:
#!/usr/bin/env gnuplot

set terminal postscript color
set output "vomit-zip.ps"
set title "File Access Times vs. File Modification Times in FOI2009.zip"
set xlabel "File modification time (UTC)"
set ylabel "File access time (UTC)"
set xtics rotate
set xdata time
set ydata time
set ytics 864000    # 864000 s = one y tic every 10 days
set grid xtics
set grid ytics
set timefmt "%Y-%m-%d,%H:%M:%S"
set format xy "%Y-%m-%d"
set xrange ["1988-01-01,00:00:00":"2009-12-01,00:00:00"]
set yrange ["2009-09-10,00:00:00":"2009-11-29,00:00:00"]
set key box
plot "vomit-zip-FOI2009.out.txt" using 4:6 with points pt 7 \
        title "FOI2009.zip", \
     "vomit-zip-mbh98.out.txt" using 4:6 with points pt 7 \
        title "mbh98-osborn.zip in FOI2009.zip"


Posted by: https://www.google.com/accounts/o8/id?id=AItOawmhLJyQlNujh7gcccLct1ecxJ3y0J5MCm8 (ext_219820)
Posted at: December 28th, 2009 08:18 pm (UTC)
Atimes are very consistent.

The atimes are one of the more interesting features of this whole file. For the 4662 entries in the zip file, only 34 different atimes are present. And all but two with atimes before 2009-09-16 have a timestamp different from Midnight EST, on January 1st, of some year. In the case of those two files (Skagerrak-Foram-2010.doc and tduch.pdf), the atime exactly matches the mtime of the same file.

The atimes of many files being the same is consistent with being extracted from a tar file by GNU tar. Most variants of tar don't record the atime of the file when it's stored, so GNU tar sets the atime to the point when tar is run. (mtime=atime is consistent with the behavior of unzip on Unix when it extracts files from a zip file that doesn't record the atime, as well.) So from this I would conclude that data exfiltration probably began at or around September 16th of 2009.
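A minimal sketch of how counts like the ones above could be reproduced, assuming the per-entry times have already been pulled out of the archive into (name, mtime, atime) tuples. The function name and the sample records are hypothetical and are not taken from FOI2009.zip.

```python
from collections import Counter

def summarize_atimes(records):
    """Summarize access times.

    records: iterable of (name, mtime, atime), times as UTC epoch seconds.
    """
    records = list(records)
    # How many entries share each access time.
    atime_counts = Counter(atime for _, _, atime in records)
    # Entries whose access time exactly matches their modification time.
    same = [name for name, mtime, atime in records if mtime == atime]
    return {
        "entries": len(records),
        "distinct_atimes": len(atime_counts),
        "most_common": atime_counts.most_common(3),
        "atime_equals_mtime": same,
    }

# Made-up sample records, for illustration only.
sample = [
    ("a.txt", 900000000, 1253059200),
    ("b.txt", 910000000, 1253059200),
    ("c.pdf", 1230786000, 1230786000),
]
summary = summarize_atimes(sample)
```

A small number of distinct atimes shared by many entries is what batch extraction would look like in such a summary.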

Posted by: Decoding SwiftHack (ijish)
Posted at: December 31st, 2009 09:39 pm (UTC)
Re: Atimes are very consistent.

Hmm. So I guess we can probably say that the files were extracted from smaller .tar (or .zip) files in batches, and then added to form the huge .zip file?

For my part, I'm curious to know why some files are dated 1 Jan 2009 while others are not...

Posted by: ((Anonymous))
Posted at: February 20th, 2010 04:33 pm (UTC)
mbh98-osborn.zip- two different Zip versions present

Looks like the mbh98-osborn archive was zipped over time on two different machines. The Zip version used to encode (2.3) is unchanged across all 2175 files.
However, I get differing values for the minimum version required to extract.

There are 159 files (mbh98-osborn.zip) showing version 1.0 (same as foi2009.zip.)

There are 2016 (mbh98-osborn.zip) files showing version 2.0. File count is short 10 files, probably because my regex command is wacky. :)

I used the "zipinfo -v" command to generate the file that was grepped.

Incorporating this info into your timeline might reveal something interesting. Who knows?

Posted by: Decoding SwiftHack (ijish)
Posted at: February 20th, 2010 05:05 pm (UTC)
Re: mbh98-osborn.zip- two different Zip versions present

Seems there's a similar phenomenon in FOI2009.zip itself, but it doesn't seem to be anything unusual -- the files where the minimum required version is 1.0 are simply those which are either directory names, or aren't compressed (i.e. are "stored"). At least that's what seems to be going on for the files I've looked at so far.
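A rough sketch of how that check could be automated with Python's standard zipfile module, instead of grepping "zipinfo -v" output. The helper names and the in-memory demo archive are hypothetical; extract_version is stored as the version times ten (10 for 1.0, 20 for 2.0), and an archive written by Python itself may record different minimum versions than Info-ZIP would.

```python
import io
import zipfile
from collections import Counter

def tally_extract_versions(zf):
    """Count entries by (minimum version needed to extract, kind of entry)."""
    tally = Counter()
    for info in zf.infolist():
        if info.is_dir():
            kind = "directory"
        elif info.compress_type == zipfile.ZIP_STORED:
            kind = "stored"
        else:
            kind = "compressed"
        tally[(info.extract_version, kind)] += 1
    return tally

def build_demo_archive():
    """Build a tiny in-memory zip with one stored and one deflated entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        stored = zipfile.ZipInfo("stored.txt")
        stored.compress_type = zipfile.ZIP_STORED
        zf.writestr(stored, b"hello")
        deflated = zipfile.ZipInfo("deflated.txt")
        deflated.compress_type = zipfile.ZIP_DEFLATED
        zf.writestr(deflated, b"hello " * 100)
    buf.seek(0)
    return buf

with zipfile.ZipFile(build_demo_archive()) as demo:
    demo_tally = tally_extract_versions(demo)
```

Run against a real archive by opening it with zipfile.ZipFile(path); if the explanation above holds, every entry tallied under version 10 should be of kind "directory" or "stored".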

Posted by: Decoding SwiftHack (ijish)
Posted at: November 24th, 2011 05:53 pm (UTC)
My attempt to explain this stuff in layman terms...

In reply to TrueSceptic's comment at RealClimate --

"Surely the same applied to the original hack, where all the file dates had been set to 1 Jan 2009?"

Not so. Only the e-mails in FOI2009.zip (under FOIA/mail/) were all made to read 1 Jan 2009.

Many of the code and data files (in FOIA/documents/) retained their original modification times (e.g. the 1990s), which probably correspond to the times the code and data were originally written by their authors (e.g. Briffa).

What's more, FOI2009.zip uses an enhancement of the .zip format. The classical .zip format only stores modification times in the local time zone of the machine creating the .zip. But with the "UX" feature (which is enabled by default in certain zip programs), each .zip entry also records the file's modification time as UTC, as well as an access time in UTC (roughly, the time the file was last written or read).

The access times mostly range from Sep 2009 to Nov 2009, and the difference between the UTC and local modification times suggests a time zone of -0500/-0400. More here and here.
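To make the time zone inference concrete, here is a minimal sketch under the assumption that one entry's local modification time (the classical field) and its UTC modification time (the extra field) are both already known. The function name and the example timestamps are hypothetical.

```python
import calendar

def utc_offset_hours(local_time, utc_epoch):
    """Estimate the zone offset in hours from one entry's two timestamps.

    local_time: (year, month, day, hour, minute, second) in local wall-clock time.
    utc_epoch:  the same instant as UTC epoch seconds.
    """
    # Read the local wall-clock tuple as if it were UTC; the gap is the offset.
    local_as_epoch = calendar.timegm(tuple(local_time)[:6])
    diff = local_as_epoch - utc_epoch
    # The classical field has 2-second resolution, so round to a quarter hour.
    return round(diff / 900) / 4

# Hypothetical entries: one in summer time, one in winter time.
summer = utc_offset_hours((2009, 10, 1, 8, 0, 0),
                          calendar.timegm((2009, 10, 1, 12, 0, 1)))
winter = utc_offset_hours((2009, 1, 1, 0, 0, 0),
                          calendar.timegm((2009, 1, 1, 5, 0, 0)))
```

With these made-up inputs the summer entry comes out at -4.0 and the winter entry at -5.0, which is the -0400/-0500 pattern described above.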

The latest SwiftHack 2.0 release (at least the front half of it) avoids these information leakages by suppressing this particular .zip format enhancement -- only local modification times are stored throughout, so not even time zone information can be gleaned.

-- frank

Posted by: Decoding SwiftHack (ijish)
Posted at: November 25th, 2011 05:00 pm (UTC)
Re: My attempt to explain this stuff in layman terms...

Erratum: "UT", not "UX". (FOI2009.zip uses "UT" and "Ux" (small "x") extra fields; a "UT" extra field contains the modification and access times as UTC.)
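For the curious, a hedged sketch of decoding a "UT" extra field by hand, based on my reading of the Info-ZIP extended timestamp layout: header ID 0x5455, a length, one flags byte (bit 0 = modification time, bit 1 = access time, bit 2 = creation time), then one 32-bit little-endian time per flag. The function name and the sample bytes are made up. As far as I know the access time is only present in the local-header copy of the field, so the central-directory copy may carry the flags but only the modification time.

```python
import struct

UT_HEADER_ID = 0x5455  # the two bytes "UT", read little-endian

def parse_ut_extra(extra):
    """Return the times found in the first UT block of a zip extra-field blob."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        if header_id != UT_HEADER_ID or not body:
            continue  # some other extra field (e.g. "Ux"); skip it
        flags = body[0]
        times = {}
        offset = 1
        for bit, name in ((1, "mtime"), (2, "atime"), (4, "ctime")):
            # Only read a time if it is flagged AND the bytes are really there.
            if flags & bit and offset + 4 <= len(body):
                (times[name],) = struct.unpack_from("<i", body, offset)
                offset += 4
        return times
    return {}

# Made-up local-header style field: flags = 3 (mtime + atime), two times.
local_blob = struct.pack("<HHBii", UT_HEADER_ID, 9, 3, 1000000000, 1253059200)
# Made-up central-directory style field: same flags, but only mtime stored.
central_blob = struct.pack("<HHBi", UT_HEADER_ID, 5, 3, 1000000000)

local_times = parse_ut_extra(local_blob)
central_times = parse_ut_extra(central_blob)
```

The length check in the loop is what lets the same function cope with both copies of the field.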
