Sunday, August 9, 2009

Information Rot on the Webbertubes

One of the roughest parts of working on the Internet is dealing with information rot.

I've recently been digging into what it would take to create a high-availability cluster of Linux systems: essentially, taking two or more systems and configuring them to provide a service (a web site, for example) that appears to come from one computer. In reality, when one computer fails (say, its motherboard bursts into flames), the other takes over and keeps providing the service with so little interruption that most users never notice there was a problem.

Putting it like that makes it sound simple, but it's not.

Information rot is the name I coined for the issue I ran into while researching the task. I've come to see it as a reason for techies to be a little more understanding before they spew vitriol at newbies who don't RTFM or Google for a solution before asking a mailing list about an issue that has already been answered three times in the past two weeks (though, to be fair, it seems many still don't do that first).

Here's an example of what I ran into. Some people who set up clusters write up what they did in what's called a HOWTO. Just as the name implies, it's a document that tells you HOW TO do what they did; in this case, a recipe for setting up a cluster like theirs.

For computer B to know that computer A has had a problem (and so B must take over A's role), there is a bit of software called Heartbeat whose job is to periodically check the other machine's health over the network. When the other machine stops answering, it's assumed to have failed.
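For anyone who hasn't seen one, the basic idea behind that kind of health check can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of a timeout-based check, not how the real Heartbeat package is implemented or configured: the `HeartbeatMonitor` class, the one-second interval, and the three-missed-beats threshold are all assumptions made up for the example.

```python
from typing import Optional


class HeartbeatMonitor:
    """Hypothetical sketch: computer B tracks when computer A last answered
    and decides whether to take over A's service.

    The interval and missed-beat threshold are illustrative values, not
    settings from the real Heartbeat software.
    """

    def __init__(self, interval: float = 1.0, max_missed: int = 3) -> None:
        self.interval = interval          # seconds between expected answers
        self.max_missed = max_missed      # how many answers we tolerate missing
        self.last_seen: Optional[float] = None
        self.active = False               # True once this node has taken over

    def record_beat(self, now: float) -> None:
        """Call whenever the peer answers a health check."""
        self.last_seen = now

    def peer_is_dead(self, now: float) -> bool:
        """The peer counts as failed after max_missed intervals of silence."""
        if self.last_seen is None:
            return False  # never heard from the peer yet; don't guess
        return now - self.last_seen > self.interval * self.max_missed

    def check(self, now: float) -> bool:
        """Take over if the peer looks dead; return whether this node is active."""
        if not self.active and self.peer_is_dead(now):
            # A real cluster would start the service here (hypothetical step).
            self.active = True
        return self.active


if __name__ == "__main__":
    mon = HeartbeatMonitor(interval=1.0, max_missed=3)
    mon.record_beat(now=0.0)
    print(mon.check(now=2.0))  # False: only 2 seconds of silence
    print(mon.check(now=3.5))  # True: 3.5 s > 3 x 1.0 s, so B takes over
```

Waiting for several missed answers instead of reacting to a single one is my own guess at a sensible design: one dropped packet shouldn't make computer B grab the service while A is still perfectly healthy.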

Since its first release, Heartbeat has gone through a series of changes: there was version one, then a major second version, and then it became another program called Pacemaker.

Now, when you're trying to build a cluster to meet your own set of needs, it means finding information on getting a lot of related-but-not-tied-together programs to work toward a common goal (your project's needs). It means wading through a lot of HOWTOs, home pages, and man pages.

But clusters aren't found in everyone's basement. Of all the people who create a cluster, only a fraction bother sharing their experiences and information as tutorials, and those who do often stop updating them. Many of the documents (instructions, HOWTOs, mailing list anecdotes) come with disclaimers like, "This is how you do it with version one... version two has this ability built in, so click on this link to see how to configure that. Note that with Pacemaker you should disregard these and go to this link telling you how to do it the 'right' way with Pacemaker, unless you're running in compatibility mode with version one Heartbeat..."

That's if you're lucky.

Sometimes you find instructions that refer to software so out of date that the author never included any notes about versions at all, leaving you scratching your head over why the commands aren't working for you the way they worked for the author.

It's not just cluster configuration where I've run into this. I had the same issues setting up proxy servers (speeding up and filtering web browsing for a large site) and mail filtering for a large number of users (passing email through a filter that decided what was spam and what wasn't, while blocking certain senders and certain attachments). The filtering technology, and even the mechanisms for rerouting network traffic in Linux, changed over time, so I had to puzzle out which documentation was relevant to the version of software, and the mix of software, I was trying to use for the task! Part of the problem with working in technology is that this shift happens constantly and we're expected to change with it.

It's not just a matter of knowing what options are available; you also need to understand each component well enough to truly apply the solution to the needs of the client, whether that client is yourself, another business, or another department. Users often seem to think that tech people just have a store of knowledge that can be tapped like a human Google, and it simply doesn't work like that.

Ever asked a tech person a question that seems very simple on the surface but is far more complicated in reality? My favorite is when someone asks me what computer they should get. It seems as simple as asking their favorite color, but if I'm being honest, I can't give them a simple answer.

I'd need to know what you're going to use the computer for, what your experience level is, and what software you really think you absolutely have to use. Are you a gamer? Just browsing the Internet? What's your budget? All of those are factors in the decision. A Mac can't be beat for people doing audio and video editing at home, or just web browsing and email. Windows systems are cheaper (in price and, usually, in quality). Will you be traveling a lot? Or do you just commute from home to the local bookshop with wifi? A netbook might fit... or do you need a true desktop-replacement notebook? What about data redundancy or backup?

While this may seem like a detour from the concept of information rot, the two are related: both reflect how hard it is to gain an in-depth understanding of the solution you're seeking.

Google can only point you to knowledge. It's up to you to find wisdom.
