June 18, 2009, 6:06 p.m.
posted by void
Synchronize Files Across Machines
If you switch between multiple machines, it's a real hassle to keep track of which system has the latest version of a file. Use the free Unison File Synchronizer to keep all your systems in sync, even Windows and Mac machines.
There's an old saying that everyone should take to heart: "There are two kinds of computer users: those who have lost data and those who are about to." I belong in that first camp, but it only took one nasty experience with an accidentally deleted term paper to force me to resolve never to let that happen again. That night long ago, when I lost the fruits of my labors analyzing the great film Raising Arizona, taught me the hard way that backing up is essential. And while traditional backups [Hack #79] are vital, keeping two or more machines in sync not only gives you some extra security, but it also makes it easy to switch from system to system without having to remember which one has the latest copy of a file.
If a user new to Linux starts asking more experienced users about a good way to sync his data, he will soon hear about a wonderful tool named rsync. rsync was developed by Andrew "Tridge" Tridgell, the same man behind Samba; in fact, Tridge has stated that he believes he'll be remembered by posterity for rsync far more than for Samba, and he just may be right. However, rsync, while absolutely excellent, is not necessarily the best software to use for syncing two repositories that you're actively working with.
The Problem with rsync
rsync is truly awesome software, and it really is worth your time to check it out. However, I don't use rsync for my day-to-day sync needs. Instead, I use Unison, which transfers changes using a protocol similar to rsync's. Why Unison instead of rsync? Because Unison synchronizes in two directions at one time, while rsync goes only in one direction. Let me give an example from my own setup. At home I have a desktop machine, but I also use a laptop. 
I work a lot outside of my home, so my laptop travels with me constantly; when I'm home, though, I sometimes leave my laptop in my backpack and use my desktop instead. I have a large amount of data that I need to have available to me at all times: web pages I've read, instruction manuals, articles I'm working on, photos, and so on. All told, it's about 10 GB worth of stuff. I keep all of this work on my desktop, and I keep a mirror of the same thing on my laptop. Obviously, I need to keep all 10 GB in sync between my two machines. If I delete a file on the desktop, I need to delete it on the laptop. If I change a file on the laptop, it wouldn't do to open the original, unchanged file on the desktop and make changes there. That way lies madness. So why can't I use rsync? Because my goal is to synchronize between two machines, each of which acts as a source and a destination at the same time. Here's one example: on my laptop, I move the file widgets from the directory foo to the directory bar. On the desktop, however, widgets is still in foo. If I run rsync with the laptop as the source and the desktop as the destination, so far so good: widgets is now in bar on both the laptop and the desktop. However, widgets is still in foo on the desktop. The next time I run rsync with the desktop as the source and the laptop as the destination, widgets gets copied over from foo on the desktop back into foo on the laptop, leaving me with widgets in both foo and bar, which produces a real mess. Of course, I could run rsync with the --delete option set, so that files deleted on the source are also deleted on the destination, but that can be a very dangerous practice. What if I delete foo on the laptop but change foo on the desktop? If I run rsync with the laptop as source, then foo is deleted on the desktop, which is not what I want. Instead, I have to remember to run rsync with the desktop as source so that the changed foo is copied over to the laptop. 
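The widgets scenario described above can be reproduced on a single machine. This is only a sketch: two scratch directories stand in for the laptop and the desktop, the `one_way` helper is a hypothetical name, and it falls back to `cp -R` when rsync is not installed, since a plain recursive copy also leaves extra files on the destination untouched.

```shell
#!/bin/sh
# Reproduce the widgets example with two local directories standing in
# for the laptop and the desktop (scratch paths under a temp directory).
set -eu
work=$(mktemp -d)
laptop="$work/laptop"
desktop="$work/desktop"
mkdir -p "$laptop/foo" "$laptop/bar" "$desktop/foo" "$desktop/bar"
echo "widget data" > "$laptop/foo/widgets"
cp "$laptop/foo/widgets" "$desktop/foo/widgets"

# One-way copy without deletion: rsync -a if installed, otherwise cp -R,
# which leaves extra files on the destination in the same way.
one_way() {
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$1/" "$2/"
    else
        cp -R "$1/." "$2/"
    fi
}

mv "$laptop/foo/widgets" "$laptop/bar/widgets"   # the move made on the laptop
one_way "$laptop" "$desktop"                     # laptop -> desktop
one_way "$desktop" "$laptop"                     # desktop -> laptop

# List every copy of widgets that now exists on the "laptop".
find "$laptop" -name widgets | sort
```

After both passes, the listing shows widgets under both foo and bar on the laptop side, which is exactly the duplication described above.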
But what if there were files I changed on the laptop and deleted on the desktop? Then I need to run rsync with the laptop as source...and on and on, ad infinitum, back and forth.
Unison solves this problem. You define a source and a destination, and then you run Unison. After some length of time, Unison starts asking you questions about the files it has found: copy this file from laptop to desktop? Copy this other file from desktop to laptop? Delete this file on the laptop since it was deleted on the desktop? You can accept Unison's guesses, specify a direction for the copy to go, or even tell Unison to skip the file entirely until another day. With Unison, synchronizing two directories, each on a different machine, becomes a far simpler task. And, as icing on the cake, Unison is cross-platform, as it runs on Linux, Mac OS X (with a few caveats; see "Further Information About Unison" at the end of this hack), Unix, and Windows machines.
Synchronize Files on Two Machines with Unison Using SSH
Installing Unison is easy as pie (however, make sure you have the universe repository enabled [Hack #60]). Just run the following command, and you're done:
$ sudo apt-get install unison unison-gtk
Now you have both the basic, command-line-only Unison program and the GTK-based GUI. It's time to sync a group of files with another copy of those same files on another machine. For added safety, I'm going to use SSH for the transport, a capability built into Unison.
Unfortunately, Unison doesn't create entries in the GNOME Applications menu or the KDE K menu, so you'll need to add those yourself. In the meantime, you can type unison-gtk on the command line. When the program opens, you'll see the window shown in the figure below, which asks you to either select or create a profile.
Figure: Select or create a Unison profile
Click "Create new profile," and you're asked to give a name for the new profile. Since I'm going to sync up my folder that contains the work I've done on this book, I'm going to cleverly name this profile ubuntu_hacks and click OK. Now both profiles, default and ubuntu_hacks, appear. I double-click on ubuntu_hacks, and a small window opens, shown in the figure below, asking me to enter the local directory that I want to sync.
Figure: Select the local directory you want to sync
Since I don't like to type when I can avoid it, I instead click Browse and navigate to the folder:
/home/rsgranne/documents/clientele/aaa_current/OReilly/ubuntu_hacks
I click OK, and another window opens, shown in the figure below. This one wants to know about the remote folder.
Figure: Select the remote directory you want to sync
Notice that I can sync to a local folder (which could also be a folder accessed through Samba or NFS that I've mounted on this machine), or to another machine via SSH, RSH, or a socket. In this case, I'm using SSH, so I enter the address of the remote host (in this case, 192.168.0.14) and the username, which is rsgranne. Finally, I type in the remote directory I want to sync:
/media/data/zzz_rsg/documents/clientele/aaa_current/OReilly/ubuntu_hacks
I click OK, and since this is the first time I've synched these two folders, Unison presents me with a warning that it is going to have to create archives for those two folders from scratch. No problem, so I click OK. The main Unison window opens, and Unison goes to work, comparing the two directories (and all their subdirectories as well, since the program automatically recurses). Be patient during this process. Unison may appear to be locked up, but it's just taking its time to digest all the files and folders. The more data, the longer Unison will take. Eventually, the main window shown in the figure below appears.
Figure: You can sync files in a number of directions
Unison's interface is pretty self-explanatory, with the arrows showing the direction in which copying will occur. If I want to switch the direction in which a file is copied, I simply select it and then click the "Right to Left" or "Left to Right" button. If I want Unison to ignore a file, I click Skip. If I want to invoke a diff-type program to actually compare the contents of the two files (this obviously works only with text files and the like) and then selectively merge the differences, I click Diff and then Merge. When I'm sure that everything is the way I want it, I click the Go button to tell Unison to copy files and folders. There are actually lots of other options available through the Actions menu, so it's a good idea to check it out.
Tweaking Your Profiles
When I first started Unison, I had to create a profile, which keeps track of the directories I want to compare and the transport method, among other configurations. Profiles live in a hidden directory (.unison), which itself is in your home directory, and they're just text files that end with the .prf extension. Before I finish looking at Unison, I want to mention a few other tweaks that you should add manually to those files in order to make your sync processes work a little more smoothly.
In order to add these tweaks, you're going to need to open your profiles with your favorite text editor. In my case, I'm going to open and change ubuntu_hacks.prf, located in /home/rsgranne/.unison. To start, I'll add the following to the file (it doesn't matter where you put these preferences, but I like to put them at the top of the file, after my two roots and before any "ignore" lines):
times = true
If you use Unison for any length of time, you're going to notice that a lot of files need their "props" synchronized. Complaining about "props" is just Unison's way of telling you that the date- and timestamps for your files, which indicate when the files were modified, do not match exactly. 
By setting times to true, you are copying over not just the contents of the files, but their modification dates and times as well, which will vastly reduce any "props" complaints that you'll see.
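As a rough illustration of where that preference sits, the following sketch writes a minimal profile containing the two roots from this hack followed by times = true. It is an assumption about layout, not a copy of what the GUI generates: the profile is written to a scratch directory unless the hypothetical `PROFILE_DIR` variable points elsewhere, and the file the GUI created for you may contain other lines.

```shell
#!/bin/sh
# Sketch: write a minimal Unison profile with two roots and times = true.
# PROFILE_DIR is a hypothetical override; by default a scratch directory
# is used so that a real ~/.unison is never touched by this example.
set -eu
profile_dir=${PROFILE_DIR:-$(mktemp -d)}
profile="$profile_dir/ubuntu_hacks.prf"

# Quoted heredoc: the text is written exactly as shown, with no expansion.
cat > "$profile" <<'EOF'
root = /home/rsgranne/documents/clientele/aaa_current/OReilly/ubuntu_hacks
root = ssh://rsgranne@192.168.0.14//media/data/zzz_rsg/documents/clientele/aaa_current/OReilly/ubuntu_hacks
times = true
EOF

# Show the resulting profile.
cat "$profile"
```

Note the double slash after the host in the second root, which marks the remote path as absolute. Once a profile like this lives in the .unison directory, it can be loaded by name, for example with unison-gtk ubuntu_hacks.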
In a similar vein, you may want to add the following:
owner = true
group = true
When you add these two settings to your profile, you order Unison to synchronize the owners and groups of files, along with their contents, which can really help to keep things straight as you move files around between machines. If you want to synchronize users and groups by the real numbers that your Ubuntu system uses instead of the names that we humans use (in other words, by 1000 instead of rsgranne; if you don't know what I'm talking about, take a look at your /etc/passwd file and run the command man 5 passwd), then add this preference to your profile as well:
numericids = true
If you're using Unison to sync only Linux boxes, then you can skip this next preference. But if you're involving Windows machines in your sync (say, from a Windows workstation to a Linux machine, or from a Windows PC to a Linux workstation), then you probably want to use fastcheck (but read the following paragraphs carefully):
fastcheck = yes
You'll probably want to use fastcheck = yes most of the time, but every once in a while you'll want to comment out that line with a pound sign in front of it (#), run Unison, and then remove the pound sign. Let me explain why.
You may already have asked yourself a pretty important question: how the heck does Unison know that a file has changed? For files on a Linux box, Unison looks at their inode numbers and modtimes; if either of those is different, then Unison thinks that the file has changed. Windows boxes, of course, don't have inode numbers, so Unison's default for Windows is to scan the contents of each file to see if anything has changed. This is certainly a safe method, but if you have a lot of files to scan on a Windows machine, it's going to take a long, long time. Failing to set the fastcheck preference, or setting it to fastcheck = default, results in this behavior, which is probably not what you want.
If you set fastcheck to yes, however, then Unison acts differently on your Windows machines: it looks at the file modification times instead of scanning the contents of each file (Linux files still get the same treatment: their inode numbers and modtimes are checked). This results in a much faster scan of your Windows machine, but in very rare circumstances (a file has been changed, but you have somehow managed not to change the create time, the modification time, and the length of the file, which is pretty hard to do), Unison may fail to notice an update to some of the files on the Windows machine.
Now, practically speaking, you don't have to worry about this very much. First, you'd have to really jump through hoops to fool Unison, and second, even if Unison thought that it should possibly overwrite such a file, it won't, as the Unison manual explains: "Unison will never overwrite such an update with a change from the other replica, since it always does a safe check for updates just before propagating a change."
Since this is the case, I would recommend leaving fastcheck set to yes most of the time, but, every once in a while, comment out that preference with a pound sign, so that the full contents of your Windows files will be checked on the next run. Then, after you've satisfied yourself that things are copacetic, uncomment fastcheck = yes and go back to the faster method.
Changing the Location of Your Logfile Directory
When you run Unison, the program generates a logfile, which can be a very useful thing. The default, however, is to create this logfile, named unison.log, in your home directory, which I find annoying since I try to keep my home directory neat and tidy. If you don't mind this, then leave the logfile preference alone. But if you, like me, want to change the location in which the logfile goes, then set this preference:
logfile = $HOME/.unison/unison.log
I like the logfile to go into the hidden .unison directory, but you can change this any way you like. 
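The occasional full check recommended earlier (comment out fastcheck, run Unison, then uncomment it) can be scripted rather than done by hand. The following is a minimal sketch under stated assumptions: it works on a sample profile in a scratch directory, `PROFILE_DIR` is a hypothetical override, and the helper names `rewrite`, `disable_fastcheck`, and `enable_fastcheck` are invented for illustration.

```shell
#!/bin/sh
# Sketch: toggle the fastcheck line in a profile for an occasional full scan.
set -eu
profile_dir=${PROFILE_DIR:-$(mktemp -d)}
profile="$profile_dir/ubuntu_hacks.prf"

# Sample profile fragment. The heredoc is unquoted, so the shell expands
# $profile_dir and an absolute logfile path is written into the file.
cat > "$profile" <<EOF
fastcheck = yes
logfile = $profile_dir/unison.log
EOF

# Apply a sed expression to the profile via a temp file
# (this avoids relying on the non-portable sed -i).
rewrite() {
    sed "$1" "$profile" > "$profile.tmp" && mv "$profile.tmp" "$profile"
}
disable_fastcheck() { rewrite 's/^fastcheck[[:space:]]*=/#&/'; }
enable_fastcheck()  { rewrite 's/^#\(fastcheck[[:space:]]*=\)/\1/'; }

disable_fastcheck
grep fastcheck "$profile"    # the line is now commented out
# ... run Unison here for the slow, full-content check ...
enable_fastcheck
grep fastcheck "$profile"    # the line is active again
```

Writing the logfile path through the shell also sidesteps any question of whether a given Unison version expands $HOME inside a profile, since the file ends up with a literal absolute path.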
Further Information About Unison
The main site for all things Unison can be found at http://www.cis.upenn.edu/~bcpierce/unison/. Hopefully, it'll get a cooler URL someday. On the main Unison site, you can read the very complete and very informative Unison documentation (if you really get into Unison, then you should look into running it sans GUI; the documentation will tell you everything you need to know, although the GUI actually provides a good way to learn the basics). If that doesn't help, there is a good FAQ that covers several important questions, especially one that I was wondering about myself: "Does Unison work on Mac OS X?" Be sure to read that answer in full before trying to use Unison on that OS!
For further support, there are a few listservs you can join (or just search, if you don't want to join) at Yahoo! Groups. Unison-users is just that: a list for users of Unison. It currently has over 600 members, and they produce in the neighborhood of 100 messages a month, so it's not too overwhelming. You can get more information about the list at http://groups.yahoo.com/group/unison-users/. If you're a developer, then you might want to check out Unison-hackers at http://lists.seas.upenn.edu/mailman/listinfo/unison-hackers. If all you want are the announcements when a new version is released, then you should subscribe to the low-volume (one message per month) Unison-announce, which you can find at http://groups.yahoo.com/group/unison-announce, or just send a blank email to unison-announce-subscribe@groups.yahoo.com.
Finally, Open magazine, a "weekly e-zine for Linux and Open Source computing in the enterprise," has a nice article about Unison available at http://www.open-mag.com/features/Vol_53/synch/synch.htm. It focuses on getting Unison to work with Windows as well as Linux, and includes some technical details about fail-safe provisions that Unison provides.
Scott Granneman
