The following is the design for the backupmanager.
The first version will focus solely on rsyncing data. The roadmap includes the ability to generate incremental tarballs and to age them out over time.
It will run as a systemd-managed service installed directly on the host whose data you want to back up to any number of remote hosts. It will be a Python 3.7 program that runs scheduled jobs through APScheduler's cron-style scheduling, with no dependency on cron jobs configured on the host.
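As a rough illustration of the scheduling approach, the sketch below registers one job on an APScheduler scheduler using a crontab-style expression. It assumes APScheduler 3.x is installed; `run_backup_job` and `build_scheduler` are hypothetical names chosen for this example, not part of the backupmanager itself.

```python
import logging

log = logging.getLogger("backupmanager")


def run_backup_job(job_id):
    """Hypothetical placeholder for the real rsync work of one job."""
    log.info("running job %s", job_id)
    return job_id


def build_scheduler(cron_schedule, job_id):
    """Return a scheduler that fires run_backup_job on a crontab expression."""
    # Imported inside the function so this module still loads on a machine
    # where APScheduler is not installed (an assumption made for the sketch).
    from apscheduler.schedulers.blocking import BlockingScheduler
    from apscheduler.triggers.cron import CronTrigger

    scheduler = BlockingScheduler()
    scheduler.add_job(
        run_backup_job,
        CronTrigger.from_crontab(cron_schedule),  # e.g. '* * * * *'
        args=[job_id],
        id=job_id,
    )
    return scheduler
```

Calling `build_scheduler('* * * * *', 'local_desktop_to_backup_a').start()` would then block in the foreground, which suits a process whose lifetime is owned by systemd.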
Multiple instances can be chained together across multiple hosts so that rsync operations which would otherwise run independently of one another are coordinated.
For example, suppose you have a desktop (local_desktop) holding your music, photos, email, and other files that you want backed up. You also have a backup server (backup_a) at the same location as the desktop, to which you rsync data from a number of different directories daily. Finally, you have another backup server at a remote location (backup_b), to which you rsync data from backup_a across the Internet every four days.
In this scenario there are two instances of the backupmanager running: the first on local_desktop and the second on backup_a. An instance can be configured to block on another instance so that it does not overwrite data that is already being rsynced.
Continuing with this example, when backupmanager instance-2 (on backup_a) is running, rsyncing data to backup_b, it writes a lock file in a known location. Instance-1 (on local_desktop) is configured to look for that lock file when it runs its regularly scheduled rsync job. If instance-1 finds the lock file when it starts its rsync job, it waits a configurable amount of time before checking again. It keeps checking until the lock file is gone and then proceeds to rsync its data from local_desktop to backup_a.
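The blocking behaviour described above can be sketched roughly as follows. This is a minimal, local-only illustration: checking a lock file on a remote host over SSH is not covered, and the names `wait_for_lock_release` and `held_lock` are hypothetical.

```python
import contextlib
import os
import time


def wait_for_lock_release(lock_file_path, wait_time,
                          exists=os.path.exists, sleep=time.sleep):
    """Block until lock_file_path is gone; return how many times we waited."""
    waits = 0
    while exists(lock_file_path):
        sleep(wait_time)  # wait_time is in seconds
        waits += 1
    return waits


@contextlib.contextmanager
def held_lock(path):
    """Create a lock file for the duration of a job, then remove it."""
    # O_EXCL makes creation fail if another process already holds the lock.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield path
    finally:
        # The file may already be gone if someone cleaned it up by hand.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```

In this sketch, instance-2 would wrap its rsync run in `with held_lock(...)`, while instance-1 would call `wait_for_lock_release(...)` before starting its own job. The `exists` and `sleep` parameters exist only so the loop can be exercised without real delays.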

    # Schedule on which we want these jobs to run. Uses the same
    # syntax as a standard Linux cron job.
    cron_schedule: '* * * * *'

    # The rsync implementation to use. Currently, this only
    # runs on a (modern) Linux distro. If anyone needs it, I can
    # include a pure Python implementation of rsync.
    # Default: linux_native
    # rsync_impl: linux_native

    # Directory into which we will write a pid file. This will
    # prevent two instances of the application running at the same
    # time. Default is /var/run/backupmanager
    pid_file_dir: /var/run/backupmanager

    # List of rsync jobs. Each job will contain a list of source
    # and destination dirs/files to be rsynced.
    jobs:
      - id: local_desktop_to_backup_a
        # The user that we will use to make the rsync connection to
        # the host
        user: root
        host: backup_a.example.com
        # Optional overriding SSH port
        port: 22000

        # Definitions to any arbitrary number of lock files that
        # this process will create and manage on either the
        # localhost or any remote. The lockfiles will indicate that
        # this process is running the rsync job defined
        lock_files:
          - type: local
            path: /var/run/backupmanager/local_desktop_to_backup_a
          - type: remote
            host: backup_a.example.com
            user: root
            # Optional overriding SSH port
            port: 22000
            path: /var/run/backupmanager/local_desktop_to_backup_a

        # Optional configuration that tells the backup manager that
        # it will block on a lockfile on the remote host.
        blocks_on:
          - type: remote
            # The user name that we will use to connect to the
            # remote host
            user: root
            host: backup_a.example.com
            # Optional overriding SSH port
            port: 22000
            lock_file_path: /var/run/some-lock-file
            # Amount of time in seconds to wait to retry
            wait_time: 300
          # There can also be any number of other lock files on
          # the localhost on which we will also block
          - type: local
            lock_file_path: /lockfile/on/localhost
            wait_time: 300

        # List of source:dest dirs/files to rsync
        syncs:
          - source: /source/path
            dest: /path/on/remote/host
            opts:
              - "-av"
              - "--delete"
              - "--exclude '.cache'"
              - "--exclude 'Downloads'"
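To show how one entry under `jobs` in the configuration above might translate into rsync invocations, here is a hedged sketch. The function name `build_rsync_commands` is hypothetical, and passing the optional SSH port through rsync's `-e` option is an assumption about how the port would be applied, not something the design states.

```python
import shlex

# A job entry mirroring the example configuration, as it might look after
# the YAML file has been loaded into plain Python data.
EXAMPLE_JOB = {
    "id": "local_desktop_to_backup_a",
    "user": "root",
    "host": "backup_a.example.com",
    "port": 22000,
    "syncs": [
        {
            "source": "/source/path",
            "dest": "/path/on/remote/host",
            "opts": ["-av", "--delete",
                     "--exclude '.cache'", "--exclude 'Downloads'"],
        }
    ],
}


def build_rsync_commands(job):
    """Return one rsync argument list per entry in job['syncs']."""
    remote_shell = []
    if "port" in job:
        # Assumption: a non-default SSH port is handed to rsync via -e.
        remote_shell = ["-e", f"ssh -p {job['port']}"]

    commands = []
    for sync in job["syncs"]:
        opts = []
        for opt in sync.get("opts", []):
            # "--exclude '.cache'" becomes ["--exclude", ".cache"].
            opts.extend(shlex.split(opt))
        dest = f"{job['user']}@{job['host']}:{sync['dest']}"
        commands.append(["rsync", *opts, *remote_shell, sync["source"], dest])
    return commands
```

Each returned list could be handed to `subprocess.run` without a shell, which is why the quoted `--exclude` options are split into separate arguments first.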