Backit Down
the file synchronizer

You do not have to read this chapter, but you are strongly encouraged to do so in order to avoid data loss and corruption.

A Bit of Theory

The idea of backing up and synchronizing files is not new, and a number of well-known tools do the job. Some of them, such as SVN and CVS, maintain a file versioning system for team collaboration, with the ability to roll back. Others, such as the Windows Briefcase, are aimed at everyday personal use. The former are too heavy and too flexible for the tasks Backit Down handles, while the latter are too simplistic.

Backit Down occupies a well-defined niche among file synchronization utilities. Its main task is to transfer only those files that have been modified from one computer to another (or to a third one, if you wish). Files can also be transferred in the opposite direction, so that only the freshest versions are kept wherever you use them.

The so-called media (i.e. external storage devices) serve as the links between computers. You can use USB drives, external hard disks, and even CD-RWs or floppy disks (if you like extremes) as Backit media.

The synchronization process looks like this. Suppose you want to transfer the newest files between computers A and B. You need to copy the fresh files from computer A to the medium, and then from the medium to computer B:

A -> USB-drive -> B

The second stage of the process mirrors the first one. So every action that Backit can perform involves a computer on one side and a medium on the other.

Possible actions are the following:

Copying from computer to medium
Files that are missing on the medium, or older there than on the computer, are copied from the computer to the medium.
Same as above, with sync-deletion on the medium
As above; in addition, files that exist on the medium but not on the computer are deleted from the medium.
Moving from computer to medium
Older or missing files on the medium are replaced with newer ones; then all files that match the file selection rules (see below) are removed from the computer.
Same as above, with sync-deletion on the medium
All the actions of the previous option, but first the files that are missing on the computer are removed from the medium.
The four actions described above in the other direction
All the actions listed above are also available in the reverse direction, where the medium is the source and the computer is the destination.
Mutual copying
Files that are missing or older on the medium are copied from the computer, and vice versa: files that are missing or outdated on the computer are copied from the medium.
Mutual copying with sync-deletion on the medium
The same actions as above, plus removal from the medium of files that are missing on the computer.
Mutual copying with sync-deletion on the computer
The same actions, but sync-deletion is performed on the computer side.
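
The copying actions listed above can be sketched in a few lines of Python. This is only an illustration, not Backit Down's actual code: the function names and the dictionary of relative path to modification time are assumptions made for the example, and moving and the timestamp tolerance are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: relative path -> last modification time (seconds).
FileTimes = Dict[str, float]


def plan_copy(source: FileTimes, dest: FileTimes, sync_delete: bool = False
              ) -> Tuple[List[str], List[str]]:
    """Return (files to copy from source to dest, files to delete on dest)."""
    to_copy = sorted(
        path for path, mtime in source.items()
        if path not in dest or mtime > dest[path]   # missing or older on dest
    )
    # Sync-deletion: whatever exists only on the destination is removed there.
    to_delete = sorted(set(dest) - set(source)) if sync_delete else []
    return to_copy, to_delete


def plan_mutual(computer: FileTimes, medium: FileTimes
                ) -> Tuple[List[str], List[str]]:
    """Return (files to copy computer -> medium, files to copy medium -> computer)."""
    to_medium, _ = plan_copy(computer, medium)
    to_computer, _ = plan_copy(medium, computer)
    return to_medium, to_computer


computer = {"a.txt": 100.0, "b.txt": 200.0}
medium = {"a.txt": 150.0, "old.txt": 50.0}

print(plan_copy(computer, medium, sync_delete=True))  # (['b.txt'], ['old.txt'])
print(plan_mutual(computer, medium))                  # (['b.txt'], ['a.txt', 'old.txt'])
```

The reverse-direction actions fall out of the same function by swapping the arguments.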

How the files are compared

There are two criteria for deciding to copy (or move) a file. First, the file exists on the synchronization source but is absent on the destination. Second, the source file is newer than the target one. File sizes are not compared.

The age of a file is defined by its last modification time. Different operating systems (OS) and filesystems (FS) store file timestamps in different structures, but despite the variations the file time is meaningful everywhere.

Some of the representation differences encountered while writing Backit Down are described below, but those are details for the curious.

Synchronization projects

Clicking the "Run synchronization" button is all it takes to do the job. But that makes sense only once a synchronization project has been created and set up. Don't be afraid, it is not too complicated!

A project combines one or several folders that are to be synchronized in one go when the button is clicked. The main purpose of a project is your convenience. You may give projects whatever names you wish. Besides its name, a project has other attributes: the maximum size of a file to be considered, and the maximum size of a file that may still be compressed while being copied to the medium.

A project includes synchronization folders, each of which is mapped to a real directory that may differ from computer to computer. A folder can be named as you wish, but the name may contain only characters allowed in file names. All files belonging to a synchronization project are stored on a medium within a single folder. Every direct subdirectory of this "project folder" corresponds to a "virtual folder" of the project and bears its name.
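
As a rough illustration, the project elements described so far could be modelled as below. All class, field and function names are hypothetical and are not taken from Backit Down; in particular, the name of the project folder on the medium is passed in explicitly because the text does not say how it is chosen.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, List


@dataclass
class VirtualFolder:
    name: str                       # also the subdirectory name on the medium
    # computer name -> real directory on that computer (hypothetical layout)
    real_dirs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Project:
    name: str
    max_file_size: int              # larger files are not considered
    max_compress_size: int          # larger files are copied uncompressed
    folders: List[VirtualFolder] = field(default_factory=list)


def medium_path(medium_root: str, project_folder: str,
                folder: VirtualFolder, relative: str) -> str:
    """Path on the medium: root / project folder / virtual folder / file."""
    return str(PurePosixPath(medium_root) / project_folder / folder.name / relative)


src = VirtualFolder("src", {"office-pc": "D:/work/src", "laptop": "/home/me/src"})
print(medium_path("/media/usb", "work", src, "lib/a.pas"))
# /media/usb/work/src/lib/a.pas
```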

A set of rules is assigned to every project folder. By checking the rules, Backit Down decides whether a given file should be copied (moved, deleted) or ignored. The reason for this approach is, well, the trash that may pile up on a filesystem: backup copies and other files that are irrelevant to the project or have no value.

The general picture of a project's key elements is this:

File selection rules

The rules by which Backit Down decides whether to bother with a given file are attached to a virtual folder of a project. A rule is a file mask with an option that says whether it allows or rejects matching files.

The order in which the rules are listed matters. During synchronization every file is checked against the rules from the top of the list downwards. The first matching rule decides what to do with the file: leave it alone or synchronize it.

A file that does not match any rule will not be synced.
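
The first-match evaluation can be sketched as follows. This is an illustration under assumptions, not the program's real matcher: it assumes that a mask without a path separator is compared with the file name only, that a mask with a separator is compared with the path relative to the virtual folder, and that "*" also matches across subdirectories. Python's fnmatch is used for the wildcard matching, so a mask like "*.*" will not match a name without a dot.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, bool]   # (file mask, True = allow / False = deny)


def matches(mask: str, rel_path: str) -> bool:
    # Assumption: a mask without "/" is compared with the file name only,
    # a mask with "/" with the whole path relative to the virtual folder.
    target = rel_path if "/" in mask else rel_path.rsplit("/", 1)[-1]
    return fnmatchcase(target, mask)


def is_synced(rules: List[Rule], rel_path: str) -> bool:
    for mask, allow in rules:          # top of the list first
        if matches(mask, rel_path):
            return allow               # the first matching rule decides
    return False                       # no rule matched: not synced


rules = [("Cypher.exe", True), ("*.exe", False), ("*.pas", True)]
print(is_synced(rules, "bin/Cypher.exe"))   # True
print(is_synced(rules, "bin/other.exe"))    # False
print(is_synced(rules, "notes.txt"))        # False
```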

Please, consider the following:

In addition, optimizing the filesystem analysis may call for rules of a quite different kind: those that forbid descending into a certain directory. The point is that analysing folders is a resource-consuming procedure (though not as much as the actual copying). If there is a folder containing tens of thousands of files that you do not need to sync, you can safely block this folder and thus save a couple of minutes.

A rule that forbids the analysis of a directory must not contain wildcards (* or ?), must start from the root of the virtual folder, and must begin and end with the path separator. The position of such rules in the list does not matter: if there is a "folder blocking" rule, it will always be applied. That means no rule allowing some files within such a folder will be considered, even if it is higher in the list!
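
A small sketch of how folder-blocking rules might be recognized and applied before any other rule is looked at. The function names are hypothetical. The text asks for a leading path separator while the rule listings omit it, so this sketch accepts both forms.

```python
from typing import Iterable, List


def is_blocking_mask(mask: str) -> bool:
    """True if the mask has the shape of a folder-blocking rule."""
    # No wildcards, ends with the path separator, and names a real directory.
    return (mask.endswith("/") and "*" not in mask and "?" not in mask
            and mask.strip("/") != "")


def blocked_prefixes(deny_masks: Iterable[str]) -> List[str]:
    """Directory prefixes such as "VCL/RX/" taken from the deny masks."""
    return [m.lstrip("/") for m in deny_masks if is_blocking_mask(m)]


def is_blocked(rel_path: str, prefixes: Iterable[str]) -> bool:
    """True if rel_path lies inside a blocked directory."""
    return any(rel_path.startswith(p) for p in prefixes)


prefixes = blocked_prefixes(["*.*", "VCL/*", "VCL/RX/", "/tmp/"])
print(prefixes)                              # ['VCL/RX/', 'tmp/']
print(is_blocked("VCL/RX/a.dll", prefixes))  # True
print(is_blocked("VCL/a.dll", prefixes))     # False
```

A directory walk would call is_blocked before descending into a subdirectory, which is what saves the analysis time.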

Some samples:

1.
   *.*    allow
   *.pas  deny       <- this rule won't work since it's lower in the list
2.
   Cypher.exe allow
   *.exe      deny   <- all EXE-files will be ignored except Cypher.exe
3.
   RX/*.dpk allow
   *.dpk    deny     <- all DPK files will be ignored except those in RX and its subdirectories
4.
   VCL/*    deny     <- all files in the VCL will be studied but ignored
5.
   VCL/     deny     <- better option: now the contents of the directory won't be studied at all
6.
   VCL/*.pas allow
   VCL/*     deny    <- "VCL/*" is justified here, but "VCL/" is not, because you need some files from its subfolders
7.
   *.dll     allow
   *.*       deny
   VCL/RX/   deny    <- ignore all files except DLLs; do not even analyse "VCL/RX/"

You can use the folder rules testbed to check your rule set. The tool offers filtering and sorting.

Where the projects are stored

Information about your synchronization projects is stored in an SQLite database. Each user of the computer has a separate database. It is located in \Documents and Settings\User\Application Data\backit (Windows) or in ~/.backit (Linux).

That means the project information exists on every computer where Backit Down is installed. The databases may differ, so there must be a mechanism for bringing them up to date across computers. Export and import of project metainfo serve this purpose.

The files to be synchronized are kept on the media, in their project folders.

The picture makes it clear that Backit media are connected to projects directly, while computers are connected to project folders. To tell computers (and media) apart, an identification mechanism is needed. The computer's domain name serves as its identity. That means that if you change a computer's name, you will have to reconnect its project folders for this computer.

The serial number of a storage device would make a good identifier. But Linux requires root credentials to read a device's serial number, which is not acceptable. So Backit Down assigns a UID to every medium it knows. The UID is stored in the /backit.txt file on the medium.
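
The idea of a UID file in the root of the medium can be sketched like this. The actual contents and format of backit.txt are not described here, so the single-line hexadecimal UID and the function name are assumptions made purely for illustration.

```python
import uuid
from pathlib import Path


def medium_uid(medium_root: str) -> str:
    """Return the UID stored on the medium, creating one if it is missing."""
    uid_file = Path(medium_root) / "backit.txt"
    if uid_file.is_file():
        return uid_file.read_text(encoding="ascii").strip()
    uid = uuid.uuid4().hex             # hypothetical UID format
    uid_file.write_text(uid + "\n", encoding="ascii")
    return uid
```

Calling the function twice on the same medium returns the same value, which is what lets a medium be recognized on any computer without reading its hardware serial number.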

Timestamp differences

The FILETIME structure on Windows stores a timestamp with a resolution of 0.1 microsecond (100 nanoseconds). Linux uses the time_t type. According to the C library documentation, the representation of time_t depends on the implementation. On Linux it holds an integer: the count of seconds from a fixed starting point.
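
To make the difference concrete, here is a minimal conversion sketch between the two representations. It assumes the usual definitions: FILETIME counts 100-nanosecond intervals since 1 January 1601 (UTC), and the Linux seconds count starts on 1 January 1970 (UTC). The function names are illustrative only.

```python
# 100-nanosecond intervals between 1601-01-01 (FILETIME epoch)
# and 1970-01-01 (Unix epoch).
EPOCH_DIFF_100NS = 116_444_736_000_000_000
TICKS_PER_SECOND = 10_000_000


def filetime_to_unix(filetime: int) -> int:
    """Convert a 64-bit FILETIME value to whole Unix seconds (fraction dropped)."""
    return (filetime - EPOCH_DIFF_100NS) // TICKS_PER_SECOND


def unix_to_filetime(seconds: int) -> int:
    """Convert whole Unix seconds to a FILETIME value."""
    return seconds * TICKS_PER_SECOND + EPOCH_DIFF_100NS


print(filetime_to_unix(EPOCH_DIFF_100NS))   # 0
```

Any sub-second part of a FILETIME value is lost in the conversion, which is one reason why two copies of the same file may carry slightly different times.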

So a naive timestamp comparison of files that have been handled on different OSes will give errors.

Another observation. In my experience, files copied to a USB flash drive (formatted as FAT32) acquire an error of, say, two seconds in their modification time, no matter how the files were copied: by the file explorer, the shell or calls to API functions.

Revision control systems deal with these inaccuracies by giving a revision number to each file version. Such an approach won't do for Backit Down: it would badly complicate the whole business.

It was decided that Backit Down ignores file time differences that do not exceed a certain value, for instance three seconds. One can hardly imagine a situation in which this interval causes a real error. Indeed, one would have to perform a synchronization, quickly modify a file and synchronize back, all within these same three seconds!
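
A tolerant comparison of this kind might look as follows. It is a sketch only: the function name is hypothetical, and the three-second value is simply the example given above, not necessarily the value the program uses.

```python
TOLERANCE_SECONDS = 3   # example value from the text


def is_newer(source_mtime: float, target_mtime: float,
             tolerance: float = TOLERANCE_SECONDS) -> bool:
    """True if the source is newer than the target by more than the tolerance."""
    return source_mtime - target_mtime > tolerance


print(is_newer(102, 100))   # False: a two-second difference is ignored
print(is_newer(104, 100))   # True
```

A tolerance of this size also covers FAT file systems, which store modification times with a two-second resolution.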