The long and the short of it.

Okay. As promised, here is my more detailed outline of what happened a few days ago. Although my memory of the details has faded a little bit, so has my shock – so it all evens out. I’m also leaving out all of the really technical details, and explaining things in a fashion that, hopefully, will make some kind of sense to people who don’t know all that much about computers. Suffice it to say that I could have written much more on the subject. Also, any mild kind of surprise that a technically savvy person might pick up from this is nothing in comparison to the actual details that I’ll be leaving out.

I’d been working on a Web page that made use of PHP to query and display parts of a database. It was the first time I’d actually done this, and I have to admit that it was far easier than I’d thought it would be. I’ll definitely be making use of this sort of thing again. Not only does it dynamically construct the page from the database (so all you need to do is add/remove/edit items in the database and the Web code takes care of displaying all of those changes on its own) but you can also create “multiple” pages out of a single file using various conditional statements and a PHP file’s ability to refer back to itself. (My previous methods of doing this relied on multiple files or multiple directories.)
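For the curious, here is a minimal sketch of that single-file idea. It is not my actual page: it's written in Python rather than PHP, it uses a throwaway in-memory SQLite database, and the table name `items`, the `item` query parameter, and the function names are all made up for illustration. The point is only to show one script producing an index page or a detail page depending on a conditional, with links that point back at the script itself.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs


def make_db():
    """Create a small in-memory database standing in for the real one (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO items (title, body) VALUES (?, ?)",
        [("First post", "Hello there."), ("Second post", "More text.")],
    )
    return conn


def render(conn, query_string=""):
    """Return the HTML for one 'page', chosen by the query string."""
    params = parse_qs(query_string)
    item_id = params.get("item", [None])[0]

    if item_id is None:
        # No parameter: build the index from whatever is in the database.
        # Each link points back at this same script with a parameter added.
        rows = conn.execute("SELECT id, title FROM items ORDER BY id").fetchall()
        links = "".join(
            f'<li><a href="?item={row_id}">{escape(title)}</a></li>'
            for row_id, title in rows
        )
        return f"<ul>{links}</ul>"

    # A parameter was given: show that single item instead.
    try:
        wanted = int(item_id)
    except ValueError:
        return "<p>No such item.</p>"
    row = conn.execute(
        "SELECT title, body FROM items WHERE id = ?", (wanted,)
    ).fetchone()
    if row is None:
        return "<p>No such item.</p>"
    title, body = row
    return f'<h1>{escape(title)}</h1><p>{escape(body)}</p><p><a href="?">Back</a></p>'
```

Calling `render(conn)` gives the index and `render(conn, "item=2")` gives the second item's page. Add a row to the table and the index picks it up on the next call, with no change to the code.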

In any case, I’d backed up my regular (HTML) Web page as “Welcome.bak”, while I was working on the “Welcome.php” file. In case it didn’t work out, I could always return to my previous work. When it did work out, I decided to delete the old file by typing the Linux command “rm Welcome.bak”. This would have been fine – except that immediately after I’d done so I realised that I hadn’t typed that at all. Instead, what I’d actually typed was “rm Welcome.php”!

Knowing that it would take me at least a couple of hours to redo what I’d just finished accidentally deleting, I decided I’d investigate the possibility of somehow undeleting the file. While I’ve done this sort of thing in Windows before, I’d never done it with Linux. I got on the phone with my best friend, Glen, who quickly pointed me to “The Coroner’s Toolkit” – which contains such utilities as “grave-robber” and “lazarus”. (I just love people who come up with cool names like this.) It’s meant primarily as a tool to recover data from systems that have been hacked into and compromised, rather than just as a way of recovering a single, small file, but I figured I’d try it out anyway. The odds seemed good that I could get the file back relatively quickly.

The first thing that happened was that I needed to generate a file containing all of the (now) free space on the hard drive (using the “unrm” command). Actually, to be more precise, the space on the Linux “partition” containing the /www directory where all of the Web site files are kept. I didn’t think that would be a problem since I had another portion of the hard drive just as big, and I didn’t think that there really was all that much free space. However, I was wrong. The “undelete” file (on which I was then supposed to run “lazarus”) took up all of the space I had before unrm stopped. Not only was I not sure if it stopped because it was actually done or because it had simply run out of room to write any more, but I certainly didn’t have any free space left over to run the additionally required “lazarus” command to go on to the next step.

No problem. I had enough free space on my Windows computer. I thought I’d just share my Windows hard drive with my Linux server and have it write everything out to it. This seemed simple enough. Except for, seemingly, a few initial problems. First of all, I didn’t have the “smbmount” command that I needed to do this on the Linux side. Downloading and installing it was no problem. But when I ran it nothing happened. Finally, I realised that I’d disabled all of the required networking components on my Windows computer. (I like to not have more enabled on my Windows computer than necessary for what I normally do. Keeping things to a minimum means that it can run faster.) I spent some time reinstalling / reconfiguring things there. After this, that unrm utility finally started to create the “undelete” file that I needed on my Windows computer.

Then it stopped when the file was 2 Gig in size. (It needed to be around 4G.) Apparently “smbmount” has a self-imposed limit of working with files that are no bigger than 2G in size. So close – but not there yet. I investigated and found that there were some patches that could be applied to smbmount to get it to bypass this problem. After trying to install these to no avail, I discovered that the failure may, in part, have been due to not having the most up-to-date version of the Linux operating system “kernel” in place. So I downloaded that and installed it.
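As an aside on why 2G in particular: I never confirmed the cause, but a ceiling at exactly that size is what you'd expect if file sizes are being tracked in a signed 32-bit number. Treat that as an assumption. The arithmetic, sketched in Python with a made-up helper name, looks like this:

```python
# Assumption (not confirmed above): the limit comes from holding file sizes
# and offsets in a signed 32-bit integer.
MAX_SIGNED_32 = 2**31 - 1   # largest value a signed 32-bit integer can hold
GIB = 2**30                 # bytes in one "Gig" (binary gigabyte)


def fits_in_signed_32(size_bytes):
    """True if a file of this size can be described by a signed 32-bit number."""
    return size_bytes <= MAX_SIGNED_32
```

Under that assumption a file can grow to one byte short of 2G and no further, so `fits_in_signed_32(4 * GIB)` is false and a 4G file has no chance.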

At which point my workstation was no longer able to connect to the Internet (although my server could). Apparently some kind of incompatibility with the newer kernel I’d just installed was at work. So I now had two problems. My initial issue with the file I wanted back (which had led me down a series of unexpected obstacles in trying to get my server to have access to enough hard drive space) and the fact that I couldn’t get on the Internet from my workstation in order to further investigate things properly.

I spent some time booting back and forth between the old kernel (with Internet access but without the >2G support I needed) and the new kernel (with the >2G support but without Internet access) – and quickly got tired of that dance. So I focused on the Internet problem. I spent a fair amount of time downloading, installing, and configuring various things before I finally gave up and called it a lost cause.

Back to the hard drive space issue. I gave up on smbmount and tried a different utility (mount.cifs for those who care) that should have worked. Many more hours of trial and error ensued on this before I finally got it working. Although I really shouldn’t say “working” since, while I could make a connection and start the file transfer (theoretically without that 2G limit), it would always die on me relatively quickly with a generic write error for some reason that I still can’t identify.

By this time over 24 hours had gone by since I’d deleted that file. But I was still determined not to be beaten by everything going on, so I decided to move the hard drive (with its free space) from my Windows computer to the server directly. Before doing that I had to resize the portion of the drive on which I’d installed Windows, so that there was actually enough free space, not being used by Windows, that the server could access without causing all sorts of damage. (Linux and Windows don’t co-exist very well, if at all, in terms of using the same portions of a hard drive.) I discovered that I didn’t have quite enough free space when taking the resize into account, so I backed up some MP3 music files onto tape using the utility Arkeia, then deleted them to free up the room. After which I resized the hard drive and transferred it over to the Linux server.

Only to discover that the computer that’s home to the Linux server is too old to understand that the drive I’d just installed was a 40G drive. (It thought it was only an 8G drive.) While I was careful not to intentionally let anything write any data to it, something must have happened in the process I went through because when I gave up on that idea (and on the whole undelete thing) it was to discover that my Windows hard drive was unusable. All data seemingly gone.

Now, around this point, I would have given almost anything to turn the clock back 36 hours and just accept the fact that the one file was gone, and that I should just swear a little bit about that but then get on with the 1-2 hour business of recreating it. However, that was not to be. About the only thing that gave me any kind of hope at all at this point was the fact that Windows actually did start to boot and show the initial screen before giving me the famous blue screen of death. If the drive had really been nothing more than a boat anchor, I should have got something like a “Non system disk” error when turning the computer on. So something was still on the disk, despite the fact that it didn’t seem usable. (Disk utilities informed me, variously, that the resizing I’d done previously was now “undone”, that the name of the disk was “?????”, that it didn’t exist at all, and that its capacity, rather than being 40G, was something like 100P, which is 100,000,000G.) I had a pretty good idea that a place like Action Front Data Recovery Labs would be able to retrieve everything. They can get back data when a drive has been formatted (as well as worse), and what had happened to this disk seemed a lot less serious. But – making use of them would cost me at least a thousand dollars and there was no way I could afford that.

I ended up going out to various pubs with Michelle and getting myself a little drunk. Not that I actually felt all that drunk, since I was still so wound up about what had happened.

In the end, it turned out to be Glen who saved the day. He brought with him a utility that connected to my Windows computer (with its questionable drive) from his laptop. This utility (a variation of one I already had but different in its approach) saw the drive with its correct size, and was able to run Windows’ own “chkdsk” program against it. It turned out (as I’d expected / hoped) that none of the actual data was corrupt or missing, just the “index” on the drive that told it where everything was located. (Similar to reading a book in which every word has been printed on the page in random order.) So, the “index” was successfully repaired and everything was back to normal.

All, that is, except for a couple of things. My backup program had “forgotten” (somehow) that I’d backed up those MP3 music files. I still had the backup data on tape, but it didn’t know about it. Nor could I figure out how to “scan” the tape so that it would remember. Currently, I still don’t have those files restored. Even though they’re backed up on tape, I can’t do anything about restoring them. This is not a major disaster (certainly not in comparison to having lost everything) but it is pretty ironic when you’ve got a backup that you can’t figure out how to restore (it makes the whole enterprise pretty useless, despite your good intentions). If I can’t figure out how to scan the contents of that tape, I may have to forget about restoring it, and simply copy the music from the original CDs back onto my computer again, as I did the first time.

Further, to add insult to injury, I can’t figure out how to actually create a backup using this utility any more. When I’d first used it, the important files I’d marked had only taken up a single tape. Now, for some reason, the program isn’t using all of the tape I put in and is insisting on using two tapes. But when it prompts me to insert a 2nd tape, and I do so, it doesn’t do anything except sit there. When I click on the “Okay” button it has, it closes out the job and I’m left with just a partial backup. This utility, Arkeia, I’ve always found to be much more complex than necessary, and I’ve constantly struggled with it since I started using it – but, in the end, had (I thought) figured it out so that it was usable. Now, it just seems to be a waste of time – or worse, since it’s given me a false sense of security. Unfortunately, I don’t know of any other (free) utility that will let me back up both my Linux and Windows computers from the tape drive in my Linux server.

Also, that long-deleted file was still deleted. When I got back to recreating it from scratch, I discovered that I actually did have another, older, version of it in another directory. So I was already halfway there. It only took me an hour to get things back to where they were before. In fact, they ended up being better because, in the process, I fixed a problem I’d had with the original version!

The strangest thing about all of this is that, even in hindsight, there was no single point in time at which I could have realistically thought that it would be faster to just recreate this file from scratch than to go through the seemingly simple steps of attempting the restore. Even as time stretched on, the (to my mind) remaining steps of resolving the most current roadblock and continuing on should have taken less time than the file recreation. There’s no way that, given what I knew then, I could have thought that it would be such an involved or, in the end, futile process. Glen agrees with me on this – he would have done exactly what I did. With the one possible exception that he would have made sure he had a full backup of the Windows hard drive before moving it over to the Linux server. (Of course, even if I had done that – and it’s something I’ll make sure I do from now on, even if I have no reason to suspect anything will happen to warrant needing it – it might not have done me any good since I bet I wouldn’t have been able to get this utility to restore anything off of it…)

My first priority at the moment? Make sure I get some kind of working backup / restore solution for my two computers. At this particular point, I think it will just be to copy all of the important files from one to the other. So long as both computers don’t die on me I’ll be fine. After that I really need to look into a usable backup utility. (Or, at least, figure out how in the world to use the one I currently have.)
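Since the lesson here is that a backup you can’t read back is no backup at all, the stopgap copy ought to check itself. Here is a minimal sketch of that idea in Python, assuming the other computer’s disk is reachable as an ordinary directory. The function names and paths are hypothetical, and it is a starting point rather than a finished backup tool.

```python
import filecmp
import shutil
from pathlib import Path


def verify(source, destination):
    """Return the relative paths of files that are missing or differ in the copy."""
    source, destination = Path(source), Path(destination)
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        # shallow=False compares file contents, not just size and timestamp.
        if not dst_file.is_file() or not filecmp.cmp(src_file, dst_file, shallow=False):
            problems.append(relative.as_posix())
    return problems


def mirror_and_verify(source, destination):
    """Copy a directory tree, then read everything back to confirm it matches."""
    # dirs_exist_ok requires Python 3.8 or later.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return verify(source, destination)
```

An empty list back from `mirror_and_verify` means every file was copied and compares identical. Running `verify` on its own later is a cheap way to find out whether the copy has since gone bad.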

Errant Musings

By Jason Bassford
