Friday, June 12, 2009

DD and Netcat to Clone Hard Disks

Normally, when I need to clone a hard disk image out to a set of systems, I boot them from my RIP Linux CDs, mount a remote share, and save an image with Partimage, a utility that writes the contents of a hard disk partition to a compressed file (in this case, on a remote system). I can then copy the image down to the other systems, limited only by my network bandwidth.

But some systems get indigestion from this. They don't like RIP, or they don't like Partimage for reasons only the silicon gods fully comprehend.

We have a few systems like this that I have to work with. At times like this, I use two basic utilities together to make a literal block-for-block copy of one hard disk onto another over the network.

You need two machines identical in hardware.

The source disk must be the same size or smaller than the target disk on which you want to place the image.
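
A minimal sketch of that size check, assuming you have read each drive's size in bytes on its own machine (on most Linux live CDs, blockdev --getsize64 /dev/(drive) run as root should report it). The function name fits_on_target and the sizes below are made up for illustration.

```shell
#!/bin/sh
# fits_on_target is a hypothetical helper, not part of any tool.
# It succeeds (exit 0) when an image of src_bytes fits on a drive of dst_bytes.
fits_on_target() {
    src_bytes=$1
    dst_bytes=$2
    [ "$src_bytes" -le "$dst_bytes" ]
}

# Made-up sizes: an 80 GB source drive and a 160 GB target drive.
# On real machines, take these numbers from: blockdev --getsize64 /dev/(drive)
if fits_on_target 80000000000 160000000000; then
    echo "source fits on target"
else
    echo "target is too small"
fi
```

The two numbers come from two different machines, so this is a compare-by-hand step rather than something the pipeline does for you.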

Boot RIP (or any other bootable Linux live CD, if RIP won't work on that system, can't see the network card, or shows some other goofy anomaly). From a command prompt on the target system, run

nc -l -p 7000 | dd of=/dev/(drive)

OR

nc -l -p 7000 | pv | dd of=/dev/(drive)

On the source system which has the working installation, run

dd if=/dev/(drive) | nc -q 10 (ip of target system) 7000

Okay...some explanations.

(drive) means the hard drive. Probably /dev/hda or /dev/sda...use dmesg to find what drives are detected during bootup. And you want the drive, not the partition (such as /dev/hda1, which is the first partition on /dev/hda).
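
As an alternative to reading through dmesg, the kernel's own list in /proc/partitions can be filtered down to whole drives. This is only a sketch: the function name list_whole_disks is made up, and the pattern assumes the classic hdX/sdX naming, so it will miss drives that are named differently.

```shell
#!/bin/sh
# list_whole_disks is a hypothetical helper. It reads text in the format of
# /proc/partitions on stdin and prints whole drives only (sda, hda, ...),
# skipping partitions such as sda1. The size column is in 1 KiB blocks.
list_whole_disks() {
    awk '$4 ~ /^[hs]d[a-z]+$/ { print "/dev/" $4, $3, "KiB" }'
}

# Run it against the live system if the file is there.
if [ -r /proc/partitions ]; then
    list_whole_disks < /proc/partitions
fi
```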

(ip of target system) is the IP address of the system on which you want an image put. You get that on your target machine after booting it up and getting networking working...ifconfig should tell you that information.

dd is a command that will read the raw disk device. Handy, but dangerous! Make sure you are careful...one slip of if (input file) vs. of (output file) and you could screw up your working system with an image of a blank drive...use with care!

nc is netcat. It's like a Swiss Army knife of network tools in that it lets you do a variety of things over the network. Here it's acting as a conduit for sending information from point A to point B. You're telling it to use port 7000. On one machine it's listening for a connection (-l), and on the other it waits 10 seconds after dd runs out of data to send (end of file on its input) before quitting, so that the last of the data gets flushed and nothing is lost to a connection glitch (-q; not every build of netcat has this option).

The pv command is optional. It lets you "view pipeline" (you're piping data from one command to another) so you can see how much progress is being made. Ordinarily when you're doing this it looks like nothing is happening...the systems just sit there. I have to watch whether the drive LEDs on the front of the case are blinking to tell if there's a connection hiccup or still activity. pv tells you in real time what is going on. I should also mention that while I found that RIP has pv, whether it's included depends on the live CD you choose to use.

There are variations to this setup...I am only giving the simplest form. For example, you can stick gzip in there to compress the data on the sender and decompress it at the receiver to, theoretically, speed things up a bit. Remember that dd reads every block on the drive, whether there's data in it or not, so this can take hours. But it's going to read every single block, fragmented or not, no matter the file system or partitioning. It's reliable but takes a while.
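
Here is a rough sketch of the gzip variation, rehearsed on scratch files so no real drive is touched. The file names are made up, and a megabyte of zeros stands in for the empty blocks of a disk; the comments show where netcat would sit on real hardware. How much time it saves, if any, depends on how much of the drive is compressible and on how fast the CPUs are compared to the network.

```shell
#!/bin/sh
# Rehearsal only: scratch files stand in for the drives.
# On real hardware the two halves would be joined by netcat, roughly:
#   source: dd if=/dev/(drive) | gzip -c | nc -q 10 (ip of target system) 7000
#   target: nc -l -p 7000 | gunzip -c | dd of=/dev/(drive)
work=$(mktemp -d)
src="$work/source.img"   # stands in for the source drive
wire="$work/wire.gz"     # stands in for what crosses the network
dst="$work/target.img"   # stands in for the target drive

# 1 MiB of zeros, like the unused blocks dd reads anyway.
dd if=/dev/zero of="$src" bs=1024 count=1024 2>/dev/null

# Sender half: read raw blocks and compress them.
dd if="$src" 2>/dev/null | gzip -c > "$wire"

# Receiver half: decompress and write raw blocks.
gunzip -c < "$wire" | dd of="$dst" 2>/dev/null

raw_bytes=$(($(wc -c < "$src")))
wire_bytes=$(($(wc -c < "$wire")))
if cmp -s "$src" "$dst"; then copy_result=identical; else copy_result=different; fi
echo "raw: $raw_bytes bytes, compressed: $wire_bytes bytes, copy: $copy_result"
rm -rf "$work"
```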

I also give no guarantees. I'm telling you what works for me. If you screw up your system...how many posts have I put up about backups already?? Also, this is a totally free solution. There are commercial cloning programs out there that can run hundreds or thousands of dollars depending on how much you need to do. This one takes elbow grease because it doesn't automate a lot. For example, if systems are set up on a domain network, I have to remove the source machine from the domain first, then clone it, boot Windows, change the system's name, use NewSID to give it a new ID (although adding it to the domain should take care of that already), then put both machines back on the domain. I believe some commercial offerings automate this.

The upshot is that once you understand the process, it's as flexible as a circus acrobat. You can, for example, mount a remote share (or use Netcat) and use dd to copy an image of the disk to another computer for storage, then use zip (or 7zip) to compress it down. It takes a few hours of downtime, but you'll have an image from which you can restore your computer later, and best of all it's a block-by-block copy. The significance of this is that some filesystems allow for little tricks to "hide" files (like, say, viruses) on your disk where you can't see them without special tools, and some software (like certain CAD software) will do funky stuff with the boot sector or other areas of the drive to prevent you from pirating it. DD doesn't care. It copies the raw data, so you'd be able to restore those programs (except the malware...for that, you'd hopefully have taken a clean copy before the infection happened). The possibilities are far too many to enumerate here. I'll leave it to your sysadmin's imagination.
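
To make the store-and-restore idea concrete, here is a sketch rehearsed on scratch files. It uses gzip rather than zip or 7zip because gzip works directly in a pipe, and it adds a cksum comparison that is not part of the procedure above; all file names are made up. The "replacement drive" is deliberately larger than the original, so only the first part of it is compared.

```shell
#!/bin/sh
# Rehearsal only: scratch files stand in for real drives.
work=$(mktemp -d)
disk="$work/disk"           # stands in for /dev/(drive)
image="$work/disk.img.gz"   # the stored, compressed image
bigger="$work/bigger"       # stands in for a larger replacement drive

dd if=/dev/urandom of="$disk" bs=1024 count=256 2>/dev/null
dd if=/dev/zero of="$bigger" bs=1024 count=512 2>/dev/null

# Take the image and note the size and checksum of the raw data.
dd if="$disk" 2>/dev/null | gzip -c > "$image"
disk_bytes=$(($(wc -c < "$disk")))
saved_sum=$(cksum < "$disk")

# Later: restore onto the replacement. conv=notrunc keeps the scratch file
# at its full size, the way a real drive would stay its full size.
gunzip -c < "$image" | dd of="$bigger" conv=notrunc 2>/dev/null

# Compare only the first disk_bytes bytes; the rest of the larger
# drive is left over and is not part of the image.
restored_sum=$(head -c "$disk_bytes" "$bigger" | cksum)
bigger_bytes=$(($(wc -c < "$bigger")))
if [ "$saved_sum" = "$restored_sum" ]; then
    verify_result=ok
else
    verify_result=mismatch
fi
echo "restore check: $verify_result"
rm -rf "$work"
```

On real drives, reading the whole disk a second time for the checksum roughly doubles the time, so whether the check is worth it is a judgment call.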
