Yet Another Windows Rant

Posted in Technology by Thomas Themel on November 5, 2008.

My current ThinkPad came with an 80GB hard disk. Early this summer, it started to fill up dangerously under the load of a recovery partition, Windows Vista, my Debian system and the CERN Scientific Linux I had added. A favorable dollar exchange rate and technological progress had conspired to make storage cheap, so I got myself a 200GB replacement drive. Once it made its way to Switzerland, all I had to do was copy the contents of the old disk to a backup drive, plug in the new one and copy everything back. This actually worked and even yielded a bootable Vista.

Trouble started a bit later, when I wanted to resize the file systems on the new disk to actually take advantage of all the new space. The old layout had been a bit short-sighted in that it used all four available primary partition slots for actual partitions, so I needed to shift things around a bit. No problem for the Linux partitions, just some more copying and writing back (ext3 online resizing amazed me the first time I saw it in action). Surprisingly, no problem for the NTFS volume either: ntfsclone even manages to clone volumes onto new partitions of a different size, after which I could mount the result just fine, and everything seemed to be there.
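The four-slot limit comes from the classic MBR layout: the partition table at offset 0x1BE of the disk's first sector holds exactly four 16-byte primary entries, followed by the 0x55 0xAA boot signature at bytes 510 and 511. As a rough sketch (the struct and function names are made up for illustration, and the legacy CHS fields are ignored), counting the used slots in an in-memory copy of that sector might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic MBR layout: four 16-byte primary entries at 0x1BE,
 * boot signature 0x55 0xAA at offsets 510-511. */
#define MBR_SIZE         512
#define MBR_TABLE_OFFSET 0x1BE
#define MBR_ENTRY_SIZE   16
#define MBR_PRIMARY_MAX  4

/* Hypothetical decoded view of one partition table entry. */
struct mbr_entry {
    uint8_t  type;        /* partition type byte; 0x00 means the slot is unused */
    uint32_t start_lba;   /* first sector of the partition */
    uint32_t num_sectors; /* length of the partition in sectors */
};

/* Decode a little-endian 32-bit value byte by byte (host-endianness independent). */
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fill out[] with the four primary entries and return how many slots are
 * in use, or -1 if the boot signature is missing. */
int mbr_read_primaries(const uint8_t mbr[MBR_SIZE],
                       struct mbr_entry out[MBR_PRIMARY_MAX]) {
    assert(mbr != NULL && out != NULL);
    if (mbr[510] != 0x55 || mbr[511] != 0xAA)
        return -1;
    int used = 0;
    for (int i = 0; i < MBR_PRIMARY_MAX; i++) {
        const uint8_t *e = mbr + MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE;
        out[i].type = e[4];               /* type byte at entry offset 4 */
        out[i].start_lba = le32(e + 8);   /* start LBA at entry offset 8 */
        out[i].num_sectors = le32(e + 12);/* sector count at entry offset 12 */
        if (out[i].type != 0x00)
            used++;
    }
    return used;
}
```

With all four slots reporting a non-zero type, there is no free entry left for a new partition, which is why the layout had to be reshuffled rather than simply extended.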

Of course, having the entire file system back doesn’t mean you can make it boot. Researching this, I stumbled across a program called ntfsreloc, which was supposed to fix the boot sector after moving an NTFS volume. Sadly, the only thing it did for me was corrupt the file system so that it wasn’t even mountable in Linux any more. No problem: I copied it back from the backup, but Windows still wouldn’t boot.

The ThinkPad doesn’t have an optical drive, but I figured I’d just go out, buy a Vista CD (yay Studenten Software Service!), boot the installer over the network and use it to do ye olde fixboot and fixmbr from there.

After arriving home with my Vista CD, I was in for surprise number 1: for Vista, Microsoft has ditched the old, almost-standards-based approach of supplying a PXE boot image on the install CD. Instead, it appears that PXE installations now require a running Windows system with something called “Windows Deployment Services” installed on it. Back to the drawing board, as they say.

I had noticed a USB CD drive sitting at a client’s office, so I took the ThinkPad and my Vista CD there. Booting from the CD gave me a nice Windowsy GUI repair thing that even recognized my NTFS volume as a Windows partition. A couple of clicks later, I had reached the “repair startup process” option in the installer and was cheerfully told that Windows could find nothing wrong with my startup process. I cursed and resorted to messing a bit with the command line tools detailed in KB927392, all the while shaking my head at the idea of replacing the venerable boot.ini with a 32k binary blob in, of all things, registry hive format. It took me a while to figure out that when bootrec /scanos or bootrec /rebuildbcd finds 0 operating systems, this is actually a good thing, because they only count those which are not already registered in the BCD (a fact that is documented for /scanos, but not for /rebuildbcd, AFAICT). Even after I had convinced myself that my BCD was in order, I still couldn’t boot.
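The “registry hive format” remark is easy to check for yourself: registry hive files begin with the four ASCII bytes regf, and the BCD store (typically \Boot\BCD on the boot partition) is such a file. A minimal sketch with hypothetical helper names that tells a hive apart from, say, a plain-text boot.ini:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Registry hive files (the format the BCD store uses) start with "regf". */
int looks_like_registry_hive(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    return len >= 4 && memcmp(buf, "regf", 4) == 0;
}

/* Read the first four bytes of a file and apply the signature check.
 * Returns 1 for a hive, 0 for anything else, -1 if the file can't be opened. */
int file_is_registry_hive(const char *path) {
    unsigned char head[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(head, 1, sizeof head, f);
    fclose(f);
    return looks_like_registry_hive(head, n);
}
```

A boot.ini, by contrast, starts with a text line such as [boot loader] and fails the check.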

I exhausted the options of bcdedit and bootrec and ran chkdsk /r, but the result was either just a blinking cursor when trying to chainload Windows from grub, or “A disk read error has occurred. Press Ctrl+Alt+Del to reboot.” when trying to boot via the Microsoft-supplied MBR. More research was in order. I found the “disk read error” message in the boot sector, so I figured it wasn’t the MBR’s fault (this saved me a lot of time, because every time I tried bootrec /fixmbr, I had to restore grub afterwards to get back into my Linux for further research).

Via the ntfsreloc page, I found a link to an amazing resource, An Examination of the NTFS Boot Record, where someone has gone to the trouble of documenting every byte of the NTFS boot sector, including a disassembly of the boot code with lovingly detailed comments! Reading it made me wonder why one would want to put all this stuff into a boot sector: the number of heads of the hard disk, the sector offset of the volume from the start of the disk? Stupid. Decoding my BPB (BIOS Parameter Block), I found that the number of heads was wrong (00F0 instead of the actually-faked-by-BIOS 00FF), and that the sector offset of the partition was set to some fantasy number (A41000 instead of 1DCDA5). I fixed that using LDE (the Linux Disk Editor) and was “rewarded” with a booting Vista.
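The fields in question sit in the BPB of the NTFS boot sector: sectors per track at offset 0x18, number of heads at 0x1A and hidden sectors (the partition’s starting sector on the disk) at 0x1C, all stored little-endian. The sketch below decodes and rewrites them in an in-memory copy of the 512-byte sector, roughly what the hex-editor session amounted to; reading and writing the actual device is left out, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets of the geometry fields in the NTFS boot sector's BPB. */
#define BPB_SECTORS_PER_TRACK 0x18 /* 16-bit */
#define BPB_HEADS             0x1A /* 16-bit */
#define BPB_HIDDEN_SECTORS    0x1C /* 32-bit: partition start in sectors */

struct bpb_geometry {
    uint16_t sectors_per_track;
    uint16_t heads;
    uint32_t hidden_sectors;
};

/* Explicit little-endian helpers, so the code works on any host. */
static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t get_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void put_le16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}
static void put_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* 1 if the sector carries the "NTFS    " OEM ID at offset 3. */
int is_ntfs_boot_sector(const uint8_t sector[512]) {
    return memcmp(sector + 3, "NTFS    ", 8) == 0;
}

struct bpb_geometry bpb_read(const uint8_t sector[512]) {
    assert(sector != NULL);
    struct bpb_geometry g;
    g.sectors_per_track = get_le16(sector + BPB_SECTORS_PER_TRACK);
    g.heads = get_le16(sector + BPB_HEADS);
    g.hidden_sectors = get_le32(sector + BPB_HIDDEN_SECTORS);
    return g;
}

void bpb_write(uint8_t sector[512], struct bpb_geometry g) {
    assert(sector != NULL);
    put_le16(sector + BPB_SECTORS_PER_TRACK, g.sectors_per_track);
    put_le16(sector + BPB_HEADS, g.heads);
    put_le32(sector + BPB_HIDDEN_SECTORS, g.hidden_sectors);
}
```

In terms of the values above, the repair means changing heads from 0x00F0 to 0x00FF and hidden sectors from 0xA41000 to 0x1DCDA5. The hidden-sectors value should match the partition’s start LBA in the partition table, which is what a relocation tool such as ntfsreloc is meant to write after a volume has moved.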

On review, it turns out that the problem is not that bootrec /fixboot doesn’t KNOW about these strange values in the boot sector; it just seems to write the wrong values there (I confirmed this by breaking the working boot sector with /fixboot). Armed with the knowledge of what ntfsreloc was supposed to do, I was also able to figure out why it had broken my file system earlier:

struct ntfs_geometry {
	unsigned short sectors;
	unsigned short heads;
	unsigned long start;
};
[...]
	const int geomsize = sizeof(struct ntfs_geometry);
[...]
	if (read(device, &fs_geom, geomsize) != geomsize) {

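Apparently, I was the first idiot to try this code on an AMD64 system. There, unsigned long is 64 bits wide, so the compiler pads struct ntfs_geometry to 16 bytes instead of the 8 bytes the on-disk fields occupy; start no longer lines up with the hidden-sectors field, and the tool reads (and, presumably, later writes back) the wrong bytes. Now excuse me while I wander off to savour the rich irony that the latest stage in Windows evolution has finally succeeded in sending me back to the kind of byte-editing drudgery I first learned in DEBUG.EXE, a long, long time ago.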
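The underlying mistake is using unsigned long for a field whose on-disk width is fixed at 32 bits. A short C11 sketch contrasts the struct from the excerpt with a fixed-width version; ntfs_geometry_fixed is a name made up for this sketch, not anything from ntfsreloc. Note that reading raw bytes into even the fixed struct still assumes a little-endian host such as x86; decoding the fields byte by byte avoids that as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The struct as in the ntfsreloc excerpt. On LP64 platforms such as
 * AMD64 Linux, unsigned long is 8 bytes, so 4 bytes of padding follow
 * 'heads' and the struct grows to 16 bytes. */
struct ntfs_geometry {
    unsigned short sectors;
    unsigned short heads;
    unsigned long start;
};

/* Hypothetical corrected layout: exact-width types matching the 8 bytes
 * of the on-disk fields (sectors per track, heads, hidden sectors). */
struct ntfs_geometry_fixed {
    uint16_t sectors;
    uint16_t heads;
    uint32_t start;
};

/* Fail the build rather than silently reading the wrong bytes. */
_Static_assert(sizeof(struct ntfs_geometry_fixed) == 8,
               "ntfs_geometry_fixed must match the 8-byte on-disk layout");

/* Where 'start' lands inside each struct: 8 vs. 4 on AMD64. */
size_t start_offset_original(void) { return offsetof(struct ntfs_geometry, start); }
size_t start_offset_fixed(void) { return offsetof(struct ntfs_geometry_fixed, start); }
```

With the fixed-width struct, sizeof(struct ntfs_geometry_fixed) is 8 on both 32-bit and 64-bit builds, so a read(device, &fs_geom, geomsize) of that size covers exactly the geometry fields.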