Sunday, November 8, 2009

Documenting Configurations

I had an incident that reminded me of an aspect of system administration that we as system administrators don't often address.

It's a "dirty thought", the thing that ends up being on our minds without usually being said. An elephant in the room, if you will.

That thought is just how much of our jobs is to protect users from themselves.

I had a user call up to say their program wasn't working. I'll call it Widgetapp. She is the only one who uses Widgetapp. It's an older program (not extremely old, but about five years old or so), and it's used to track a vital bit of data on a couple thousand of our users for HR purposes.

Since she's the only user that uses Widgetapp she is the only one with a PC that has the application installed.

I viewed her desktop and found that the program was opening a "sample database" meant for training purposes. Oh...no problem. I used File->Open to open the other database with our live data.

I couldn't find it on her PC.

Hmm...this could be bad.

Her desktop doesn't have a backup agent of any sort on it; users are instructed to save all data to their home directories, and the servers are then backed up regularly (when someone double-checks that the backup server is working properly, that is). I looked at what kind of file the database was and started searching her PC for similar files. Nothing.

At this point I was getting irritated. I had worked with this application before (it rarely needed fixing or alterations), and it allows you to specify a location for the database file, so I couldn't imagine why I wouldn't have stuck the database onto the server.

I started looking for a backup of the database on the server used for her department. I hoped that there would be a database that was at most a few weeks old.

Instead I found an oddly named folder containing an uncompressed database file. I created a new folder just above it with a more obvious name (Widgetapp_database), copied the suspicious contents into it, and pointed the program at that copy. Then I had the user check the database; her most recent entries were there!

From what I could piece together, my suspicion was right: I had originally pointed the program to a database on the server (where it would be backed up regularly). At some point, when the company made an upgrade to Widgetapp, they moved the folder (still on the server) to another location.

The user probably had a network issue or some other problem where she ended up pointing the program to a default "training" database on her local hard disk. She had no idea that the data was actually residing on a shared folder, so it was up to us to know this...and we didn't.

Lessons?

A) Keep application data centralized. Programs that don't allow you to point to network shares, UNC paths, or IP addresses of application servers are crap. Centralizing the data allows you to centralize your backup management.
B) Document your applications. Document your changes. Document your configurations. Document everything.
C) Users won't have a freakin' clue what you're talking about.
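
Lesson A is the kind of thing a script could check for you. Here is a minimal sketch in Python, assuming you already have a list of applications and the data path each one is configured to use. The function names, server name, and paths are made up for illustration; nothing here comes from Widgetapp itself. It only recognizes UNC paths, so a mapped drive letter like H: would still look "local" to it.

```python
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """Return True if the path is a UNC path (starts with a server and share)."""
    # For a UNC path, PureWindowsPath reports the drive as "\\server\share".
    drive = PureWindowsPath(path).drive
    return drive.startswith("\\\\")


def find_local_data(app_paths: dict[str, str]) -> list[str]:
    """Return the names of apps whose data path is not on a UNC share."""
    return sorted(name for name, path in app_paths.items() if not is_unc_path(path))


if __name__ == "__main__":
    # Hypothetical inventory: application name -> configured data path.
    inventory = {
        "Widgetapp (live)": r"\\hr-server\dept\Widgetapp_database\live.db",
        "Widgetapp (training)": r"C:\Widgetapp\sample.db",
    }
    for name in find_local_data(inventory):
        print(f"{name}: data is on a local disk and probably not backed up")
```

Something like this would have flagged the training database on the user's hard disk, but only if somebody had written down where the configuration lives in the first place.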

Our organization doesn't do a lot of documentation. We don't have the manpower to properly handle it, and it's a situation that isn't going to change in the near future.

We expect users with specialized software needs to keep track of certain things with those applications. Again, we're extremely shorthanded, and so we make the unreasonable assumption that users will take responsibility for applications they insist they need. In the end they don't. I consider this another elephant in the room...we know we're doing wrong by it but do it anyway. What normally happens is that we spin our wheels for a time because we're re-learning how to use the application or figuring out how something was configured, instead of having an up-to-date reference that spells it out. Then we sometimes end up creating a new workaround or fix that counters what one of our coworkers initially did. Hilarity ensues if that coworker is the one called in to fix the next mess.


I guess the biggest fail here is lack of documentation. We are shorthanded, so we take shortcuts. This means we don't keep track of changes made to systems, we're only just starting to document procedures, and no work has gone into making documentation properly available. By available I don't mean collected in some tome on a shelf; I mean being able to actually find the information you need. That means leveraging a wiki or issue tracking database that troubleshooters can use to get user and system history and to track configuration issues.
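
Even without a wiki, "findable" can start small. The following is a rough sketch, not a description of anything we actually run: a configuration note as a small record, plus a search that looks through every field. The field names, the user name, and the share path are assumptions invented for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ConfigNote:
    app: str         # application the note is about
    setting: str     # what was configured
    value: str       # where it points or what it was set to
    changed_by: str  # who made the change
    note: str        # free-form explanation for the next person


def to_json_line(entry: ConfigNote) -> str:
    """Serialize one note as a single JSON line, suitable for appending to a log file."""
    return json.dumps(asdict(entry), sort_keys=True)


def search(entries: list[ConfigNote], term: str) -> list[ConfigNote]:
    """Return notes where any field contains the term, ignoring case."""
    term = term.lower()
    return [
        e for e in entries
        if any(term in str(v).lower() for v in asdict(e).values())
    ]


if __name__ == "__main__":
    notes = [
        ConfigNote(
            app="Widgetapp",
            setting="database location",
            value=r"\\server\share\Widgetapp_database",  # hypothetical path
            changed_by="admin1",                          # hypothetical user
            note="Live data is on the server share, not the local sample database.",
        ),
    ]
    for hit in search(notes, "widgetapp"):
        print(to_json_line(hit))
```

One line per change in a shared file is far from a real issue tracker, but it would answer "where did I put that database?" in seconds instead of an afternoon.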

In this case there was a happy ending. The user's database was found and the application worked once again. The user was happy. And I re-discovered how the application was set up, so I managed to solve my puzzle of the day. The next time I may not be so lucky.
