Just last week, I was locked in a life-and-death battle with my Internet server for several days. During that time, my connection to the outside world was cut off. Here’s what happened.
About a week ago (it was a Friday), Michelle noticed a message on my workstation saying that “a network cable has been unplugged”, and she could no longer get on the Internet. I came home, power-cycled the workstation, and everything seemed to be okay after that. Then, on Sunday, it happened again. No amount of power-cycling, or anything else, could fix it this time. The light on the hub for the NIC simply refused to come on. In other words – my onboard Marvell Gigabit NIC had died.
We went out shopping – but all of the proper computer stores were closed at that time of day. At Future Shop, asking whether they had any Intel Gigabit NICs got me a young sales guy saying, “Oh, yes, we used to – but I don’t see them here now – and, gigabit – that’s awfully fast – why do you need something that fast?” (No doubt implying that the stock of 10/100 Mb cards on the shelves would be fast enough for anybody. Or, at least, fast enough for him to make a sale.) We quickly left and went to our regular store – Staples. I’m not sure why we went to Future Shop in the first place.
I found a Gigabit card there: a D-Link DGE-530T. At that point, I paused mentally and considered things. There was also a 5-port Gigabit switch, plus a couple more of the NICs. Perusing the back of the box, I saw that the NIC was also supposed to work with Linux. I decided to go ahead and upgrade my entire network to Gigabit speed, so I bought a second NIC (for the Linux server) and the high-speed switch. In addition – because I knew I was already throwing down a bunch of cash I hadn’t intended to spend when I woke up that morning, and I was in a “What the hell?” kind of mood – I grabbed one of the new Microsoft ergonomic 4000 keyboards. (The keyboard ended up being a wonderful purchase. I’ve been using the old Microsoft ergonomic keyboard for years and have always preferred it to a standard one. This new version is, I feel, as much better than the old one as the old one was better than a normal keyboard. I love it. I’ve requested one for work and, if it’s not approved, will simply buy another one for myself there.)
I got home, swapped out my old switch for the new one, disabled my defective on-board NIC, plugged in my new one, and booted up Windows. Everything went just fine. It found the new NIC, installed drivers for it, and I was back on the Internet again. (With a slight sour taste in my mouth that my onboard NIC had died in the first place – but it does happen from time to time and I just had to live with it.)
Now I had a switch with one green light and two yellow lights. The green indicated a Gigabit connection, the yellow a 100 Megabit connection. The slower connections were coming from my Linux server and from Cogeco’s cable connection. I couldn’t do anything about my ISP, obviously, but I still had a second D-Link in a box to put into the server. So, I shut the server down, put in the card, and then tried to compile the driver module for it. That didn’t work – all I got was compile errors. I went to the D-Link site and downloaded the module source directly, but that produced the same errors. Some Googling led me to believe that a newer kernel might fix this.
So – I upgraded from the stock 2.4 kernel to the latest 2.6. This gave me a different set of errors, but it still failed to work. I thought that, perhaps, the OS itself was simply too old. (I’d had Red Hat Enterprise Linux 3 on there.) So – I decided to upgrade it to Fedora Core 5. The virtual machine that carries my real Internet presence, running on the physical box, uses Fedora Core 3 – and I’d already planned on upgrading it to Fedora Core 5 at some point. I also knew that VMware GSX Server 3.2.1 wouldn’t properly support FC5, either as host or guest, so I would have to switch over to the second beta of VMware Server. But the first thing I had to do was upgrade the host to FC5.
I had 7 CD-RWs and 2 DVD-RWs. I’d already downloaded all of the CD ISOs for FC5, and the server didn’t have a DVD-ROM drive, just a regular CD-ROM drive. I didn’t feel like having to rip my DVD-ROM drive out of my workstation and put it into my server. (Anything involving wholesale cannibalism of parts, and multiple machines being operated on, I try to avoid if at all possible.) So, I started to burn CDs. The first one worked just fine – none of the others did. I kept getting write errors. So, I decided to go with DVD after all. But I couldn’t get that to burn either.
In the end, Glen convinced me to give Nero a try, thinking that the problem wasn’t the media but the burning software I’d been using (CDBurnerXP Pro). It turned out that he was right – Nero had no problem burning at all. I immediately uninstalled my old software, although with some regret, since I always prefer to use freeware where possible. Now I could burn both CDs and DVDs for FC5. For some reason (a kind of “Well, why shouldn’t I be able to do this?” attitude), rather than just using a series of CDs, I started to investigate how I could use the DVD image without having to rip things apart.
I started by discovering that I could put the ISO directly onto the server’s hard drive and boot (using the first CD, which I’d already burned) with “linux askmethod” to tell the installer to grab the full disc image off the hard drive. Unfortunately, this required that the image not be on any of the partitions the install would write to. I knew I’d have to resize a partition and create a new one. The System Rescue CD utility saw the partitions, but wouldn’t let me resize them. Partition Magic saw them and did let me resize. I thought I had things well in hand – until, at the beginning of the install, it aborted, saying it couldn’t write to /proc and would have to reboot the system.
At that point, I had a non-functional server. It would boot, but only into a minimal configuration without various filesystems mounted. Further, as I attempted to “undo” my disk repartitioning, I discovered that there were now errors on it. Essentially, I’d screwed up my hard drive. Luckily, I still had access to the data on the disk. I booted using the System Rescue CD, mounted the partition with my data, gave the server an IP address on my internal network, and copied my virtual machine (the only thing I really cared about on the server) over to my workstation.
Once that was done, I knew I was going to have to wipe everything on the server and install from scratch. Around this time, I discovered that, rather than installing from the hard drive, I could have installed over HTTP, pointing the installer at my workstation’s Web server, which was sharing out the FC5 files. So – I did this (rather than burning the remaining set of CDs or ripping things apart to use the physical DVD). This started to work quite well – until it aborted because it couldn’t read a file. It turned out that my original DVD ISO download was, itself, corrupt. (I’m thankful I hadn’t ripped things apart to get a DVD-ROM drive into the server – it wouldn’t have done me much good.) I copied all of the files from the various CD ISOs onto my Web site instead. This finally worked. I only wish I’d realized I could have done this at the beginning of the whole process.
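A corrupt ISO is the kind of problem a checksum check catches before anything gets burned or served. Fedora’s download mirrors have typically published SHA-1 checksums alongside the ISOs; the following is a minimal Python sketch that verifies downloaded images against such a file, assuming it uses the usual `sha1sum` output format (`<digest>  <filename>`). The function names are hypothetical and not part of any Fedora tooling.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in 1 MiB chunks
    so a multi-gigabyte DVD image never has to fit in memory at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_file(text):
    """Parse lines in `sha1sum` format: '<hex digest>  <filename>'.
    A '*' before the filename (binary mode marker) is stripped."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, name = line.partition(" ")
        expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify_images(checksum_path):
    """Return {filename: True/False} for every listed image that exists
    in the same directory as the checksum file; missing files are skipped."""
    checksum_path = Path(checksum_path)
    expected = parse_checksum_file(checksum_path.read_text())
    results = {}
    for name, digest in expected.items():
        image = checksum_path.parent / name
        if image.exists():
            results[name] = sha1_of_file(image) == digest
    return results
```

Calling `verify_images()` on the checksum file in the download directory flags any image whose digest doesn’t match. On a Linux box, `sha1sum -c` does the same job from the command line; a script like this is mainly handy on a Windows workstation that lacks that tool.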
I got Fedora installed on the server. I then compiled a kernel from source and got VMware Server installed and working. I copied my virtual machine files back to the server and was able to get everything else working. (It wasn’t quite as simple as I make it sound – I had to do several different things to make it all function properly – but, compared to everything else, it wasn’t that difficult.)
So, I was back on the Internet – but I still didn’t have the Gigabit card activated. Compiling the module still failed. In the end, I discovered that I needed to build in support for the “skge” driver, rather than the “sk98lin” driver that originally supported this device. I had to dig through the kernel .config file to determine that using “skge” required “New SysKonnect GigaEthernet” support to be enabled. (You would think that there’d be better documentation for this, either in the material provided with the card or on the D-Link Web site itself.) Ironically, I’m sure I could have compiled this in while still running the old OS on the host, so I hadn’t needed to go through with the upgrade in the first place. Of course, I hadn’t realized that at the time.
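To see which of the two drivers a kernel tree is configured for, one can look for their symbols in .config: in 2.6-era kernels, the skge driver corresponds to `CONFIG_SKGE` (the “New SysKonnect GigaEthernet support” option) and the older driver to `CONFIG_SK98LIN`. The Python sketch below is a minimal, hypothetical helper that parses a .config, treating `=y` as built in, `=m` as a module, and the `# CONFIG_... is not set` comments the config tools write as disabled.

```python
import re

# An enabled or valued option, e.g. "CONFIG_SKGE=m" or "CONFIG_SKGE=y".
SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# The comment form the kernel config tools write for a disabled option.
UNSET_RE = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kconfig(text):
    """Map each CONFIG_ symbol in a .config to its value ('n' if unset)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def driver_status(options, symbol):
    """Describe how a driver symbol is configured in the parsed options."""
    value = options.get(symbol)
    if value == "y":
        return "built in"
    if value == "m":
        return "module"
    if value == "n":
        return "disabled"
    return "not present in this .config"
```

Feeding it the contents of the kernel tree’s .config and asking about `CONFIG_SKGE` shows whether skge will be built; if it comes back disabled, the option can be switched on through `make menuconfig` before rebuilding.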
But, in the end, this was a success, and I now have both my workstation and server operating at Gigabit speed. I didn’t have to go through all of this grief (things would have kept working as before if all I’d done was replace my workstation NIC) but, as with most difficult processes, I learned a lot of very useful information by going through it all.