About a month ago I downloaded a .gz file that was supposed to be the iso image of a software installation DVD, from the software company’s own site. I won’t disclose the company name in case this is some weird security measure (which it shouldn’t be, since I have legitimate access to that file and they should tell me how to open it). Anyhow, to be on the safe side I won’t say which company or which product. That’s not important anyway.

After downloading 1.5GB of data over my 1024kbit ADSL line, I failed to open the file with WinRAR, which can normally open gz files. I downloaded the gzip utility, thinking that the file might be in a new gz format that WinRAR is not aware of… Nope! That didn’t work either. The third thing I tried was renaming the file from xxx.iso.gz to xxx.iso, hoping that the file had somehow been uncompressed on the way (I read that this is possible: the client can decompress the gz file during download). That approach failed too.

So there was only one thing to do: re-download the file, as it was obviously broken. I did a second download (many more hours). And guess what? The new file was broken too!!! GRRR! I did a binary comparison of the first and second files and they were identical (WinHex is a very powerful life-saver tool BTW). So the file is either broken on the server, or it’s being corrupted (in a consistent way) by something on the way or on my end. Since I had already spent so much time, I changed the plan and downloaded a public trial version of the software, and that was enough for me.
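The binary comparison was done in WinHex, but the same check can be sketched in a few lines of Python for anyone without it. This is only an illustration: the function names `sha256_of` and `same_content` are made up for this example, and comparing SHA-256 digests is a swapped-in technique, not what WinHex does internally.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """True if both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Reading in chunks matters here because the downloads are well over a gigabyte and should not be loaded into memory at once.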

That was 1 month ago. Then today…

Today one of my co-workers downloaded the next version of that software, again from that legit location. This time it’s a 2.3GB download, and guess what? Yup: she could not open the gz file. I was curious, so I started downloading it too. Meanwhile I started poking at the old file that I had failed to open last month. I had kept that file in case I could open it someday. I’m glad that I did.

First I opened the file with WinHex. I did a spectral analysis on the file (counting how often each byte value occurs).
[Figure: byte-frequency spectrum of the gz file]
Almost all byte values occurred with roughly the same frequency, which is a good indicator that the file contains compressed data (a very basic check for high entropy). For comparison, here is the spectrum of the uncompressed iso file (with obviously lower entropy):
[Figure: byte-frequency spectrum of the uncompressed iso file]
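The byte counting behind those two spectra can be approximated without WinHex. The following is a rough sketch, not a reproduction of the WinHex feature: `byte_histogram` and `shannon_entropy` are names invented here, and the entropy figure is just a convenient one-number summary of how flat the histogram is.

```python
import math
from collections import Counter

def byte_histogram(path, chunk_size=1 << 20):
    """Count how many times each byte value (0..255) occurs in a file."""
    counts = Counter()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            counts.update(chunk)  # iterating bytes yields integers 0..255
    return counts

def shannon_entropy(counts):
    """Entropy in bits per byte: 8.0 for a perfectly flat histogram, 0.0 for one repeated byte."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Compressed data should come out close to 8 bits per byte, while an uncompressed image with lots of padding and text should land noticeably lower.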
Since the file looked like a compressed file, I decided to check whether it matched the gz file format. I searched the web and found this gz file format description.

The first few bytes of my gz file were:

3C 8B 08 08 2B 3C FC 46 00 03

I started cross-checking.

offset:0      2 bytes  magic header  0x1f, 0x8b (\037 \213)

My file has “3C 8B” instead of “1F 8B”. Weird. The first byte didn’t match, but since the second one did, I went on…

offset:2      1 byte   compression method
0: store (copied)
1: compress
2: pack
3: lzh
4..7: reserved
8: deflate

My file has 08 which is logical (deflate)

offset:3      1 byte   flags
bit 0 set: file probably ascii text
bit 1 set: continuation of multi-part gzip file, part number present
bit 2 set: extra field present
bit 3 set: original file name present
bit 4 set: file comment present
bit 5 set: file is encrypted, encryption header present
bit 6,7: reserved

I have 08 (only bit 3 set), which means “original file name present”, and yes, I can see the iso file name a few bytes later. Hmm.

offset:4      4 bytes  file modification time in Unix format

I have 2B 3C FC 46, which read as a little-endian value is 0x46FC3C2B, or 1190935595 in decimal. That converts to “Fri Sep 28 2007 02:26:35 GMT+0300 (GTB Daylight Time)”, which makes sense.
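The timestamp arithmetic is easy to double-check. A small sketch using only the standard library; the variable names are mine, and the result is printed in UTC, which is three hours behind the GMT+0300 value quoted above:

```python
import struct
from datetime import datetime, timezone

raw = bytes.fromhex("2B 3C FC 46")   # the four bytes at offset 4, in file order
(mtime,) = struct.unpack("<I", raw)  # "<I" = little-endian unsigned 32-bit integer
stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)

print(mtime)              # 1190935595
print(stamp.isoformat())  # 2007-09-27T23:26:35+00:00
```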

offset:8      1 byte   extra flags (depend on compression method)

I have 00, which is plausible too. And lastly:

offset:9      1 byte   OS type

I got 03 there, which means Unix and makes complete sense too. After this point comes a null-terminated string holding the iso file name.
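The whole cross-check can be condensed into a few lines. This is a sketch under assumptions, not a full gzip parser: `parse_gzip_header` and `OS_NAMES` are names made up for this example, the OS table is deliberately partial, and the file name is only looked up in the simple layout seen here (name flag set, no optional field in front of it).

```python
import struct

# Partial table; 3 is the value found in my file.
OS_NAMES = {0: "FAT (MS-DOS)", 3: "Unix", 11: "NTFS"}

def parse_gzip_header(data):
    """Decode the fixed 10-byte gzip header plus the original file name, if easy to find."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    # 2-byte magic, method, flags, 4-byte little-endian mtime, extra flags, OS type
    magic, method, flags, mtime, xflags, os_type = struct.unpack("<2sBBIBB", data[:10])
    name = None
    # Only the simple case: bit 3 (name present) set, bits 1 and 2 clear,
    # so the null-terminated name starts right at offset 10.
    if flags & 0x08 and not flags & 0x06:
        end = data.find(b"\x00", 10)
        if end != -1:
            name = data[10:end].decode("latin-1")
    return {
        "magic_ok": magic == b"\x1f\x8b",
        "method": method,
        "flags": flags,
        "mtime": mtime,
        "extra_flags": xflags,
        "os": OS_NAMES.get(os_type, "unknown (%d)" % os_type),
        "name": name,
    }
```

Fed the ten bytes listed above followed by a name such as `xxx.iso` and a null byte, it should report a bad magic, method 8, flags 8, OS "Unix" and the name, which matches the manual walkthrough.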

So it seems that only the first byte of the file is wrong, and the rest of the file is indeed a gzip file. Using WinHex, I changed that first byte to 1F so that my file starts with the bytes

1F 8B 08 08 2B 3C FC 46 00 03

instead of

3C 8B 08 08 2B 3C FC 46 00 03 

… and YAY! WinRAR can now open and uncompress the file happily. I did a brief search on the net to see whether there is a gzip variant with a 3C 8B magic header instead of 1F 8B, but I couldn’t find any such info. I still wonder how this happened to both me and my colleagues. But anyhow, we have a solution now. Since my co-worker didn’t have a WinHex license, I sketched this Python code, which changes the first byte of the given file to 0x1f:

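# Open for in-place binary update and overwrite the first byte with 0x1F.
with open('C:\\foo.gz', 'r+b') as f:
    f.seek(0)
    f.write(b"\x1F")  # bytes literal: a plain str fails on a binary file in Python 3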
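A slightly more careful variant, for anyone nervous about patching a multi-gigabyte download blindly. This is a sketch with invented names (`fix_gzip_magic`, `can_decompress`): it only touches the file if the second and third bytes already look like a gzip header with the deflate method, and afterwards it tries to decompress a little data with Python’s `gzip` module as a sanity check. It still edits the file in place, so keep a copy if the download was expensive.

```python
import gzip
import zlib

def fix_gzip_magic(path):
    """Set the first byte to 0x1F if bytes 1-2 look like a gzip header. Returns True if patched."""
    with open(path, "r+b") as f:
        head = f.read(3)
        if head[:2] == b"\x1f\x8b":
            return False  # magic is already correct, nothing to do
        if head[1:3] != b"\x8b\x08":
            raise ValueError("does not look like a gzip file with a bad first byte")
        f.seek(0)
        f.write(b"\x1f")
    return True

def can_decompress(path, probe=1 << 16):
    """Try to decompress the first `probe` bytes; False if gzip rejects the file."""
    try:
        with gzip.open(path, "rb") as g:
            g.read(probe)
        return True
    except (OSError, EOFError, zlib.error):
        return False
```

Usage would be something like `fix_gzip_magic('C:\\foo.gz')` followed by `can_decompress('C:\\foo.gz')`. The probe only reads the beginning of the stream, so it confirms the header and the start of the data, not the whole archive.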

If anybody knows anything about those weird gz files, I’d like to hear about it.

© 2012 Notes to Shelf