Yesterday around lunchtime an entire subnet at the school went 'dark'. This is not good. The session ('semester' for my friends in the northern hemisphere) starts on Monday. This is the absolute busiest time of year for faculty and staff because we've got a lot of stuff to prepare for next week when the hammer falls.
The curious thing is that only one subnet went 'dark', while another subnet sharing the same wire kept working fine. So only a seemingly random smattering of machines was affected. That made it very difficult even to track down what had happened, because it looked like a random cluster of machines had suddenly stopped routing packets. As it turns out, they all share the same third byte of their IP address. What made it even harder to troubleshoot is that two of the machines that went dark are our primary DNS servers, so when they vanished, nobody could reach anything 'by name' until we patched a couple of machines over to an alternate.
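Spotting that pattern by eye is what took so long. Here is a minimal sketch of the grouping, assuming the subnets are /24s (so the third byte identifies the subnet) and using made-up 10.x addresses, not the school's real ones. It buckets a list of unreachable hosts by network so a shared third octet stands out immediately.

```python
import ipaddress
from collections import defaultdict


def group_by_third_octet(addresses):
    """Group IPv4 address strings by their /24 network (first three bytes).

    Assumes a /24 netmask; adjust the prefix length for other layouts.
    """
    groups = defaultdict(list)
    for addr in addresses:
        ip = ipaddress.IPv4Address(addr)
        # strict=False allows building the network from a host address.
        net = ipaddress.IPv4Network(f"{ip}/24", strict=False)
        groups[net].append(addr)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical list of hosts that stopped responding.
    dead = ["10.1.20.5", "10.1.30.17", "10.1.20.42", "10.1.20.53"]
    for net, hosts in group_by_third_octet(dead).items():
        print(net, len(hosts), hosts)
```

If one network accounts for nearly every dead host, the problem is most likely that subnet's router or gateway rather than the wire itself.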
Trying to get anything done by IP address is a minefield, because even if you don't use any hostnames directly, you might accidentally touch a server or service that does, and then you're stuck waiting for it to time out (if you're lucky). Some services just hang until you get tired of looking at the hourglass icon, and then you have to go find an already logged-in session somewhere else to keep working. You can't start any new sessions either, because logging in mounts home directories, which touches the name servers and hangs.
So I'm chugging the morning coffee and heading off to work an hour early this morning to try to recover from this disaster. I spent half the night awake formulating a plan, after the entire evening went into pinning down exactly where the problem was. The problem is a router that's locked in a closet, and only the main campus IT folks have keys. Coincidentally, they made some configuration changes in that closet yesterday. Around lunchtime.
Gentlemen, get over here right now and unlock that door.
Unfortunately it isn't that easy, as there are layers of bureaucracy to contend with. My backup plan is to move two of our absolutely critical machines out of the darkness and into the light. One of them I can unplug and carry away. The other is a virtual machine attached to both networks; it can run elsewhere, but I've got to find a wire in another room or building that can reach both subnets. Oh, and a cooperative host with enough disk and memory that won't mind being loaded up with an alien machine.