Rclone continued - III (two years later)
May 27, 2023 | Stack
Now, two years later, it is time to set up the backup server again. I need to do it all over: the other Pi was erased and repurposed as part of a cluster, I have lost data since then, and that confirmed the reason to keep backups of your data.
This is basically a small tutorial putting everything into practice.
Documenting your backup
When stuff breaks or data is erased, it is essential to have documents describing everything from where and what is backed up, to schedules of backups and recovery tests. The purpose of having a good description of these things is of course to have an overview and a manual to follow when things go south. If you run a business it can also be a compliance requirement, and customers will want to see that you have processes in place to recover data if lightning strikes the server.
How would you do that?
- Use docs, spreadsheets, wikis, git or whatever you like. I usually have a local doc that is synced to external storage, and maybe a wiki, in particular if other people need access to this info. This gives redundancy in the information, and some of it you want available under all circumstances.
- Secondly, you want to know how to debug, back up and recover. This covers everything from where to find the passwords for decrypting, to which scripts are used and how, to the process of backup and recovery and where the scripts are found.
- Also describe some facts:
- Precise location(s) of the files that are backed up
- Precise location of the backup on the backup storage
- How often backups are taken
- The retention
- The type of backup
- The recovery test schedule and results
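One way to keep these facts in a versioned file next to your scripts. This is only a sketch: the values below are taken from the setup described later in this article, and `BACKUP.md` is just a name I picked.

```shell
# Write a minimal backup fact sheet you can keep in git or on a wiki.
# Values are illustrative, based on this article's setup.
cat > BACKUP.md <<'EOF'
# Backup fact sheet
- Source: pi-nfs:/mnt/nfs (tarball of the whole mount)
- Destination: encrypted rclone S3 remote
- Schedule: tarball daily at 17:00, rclone upload at 21:00
- Retention: 90 days rolling; months 01 and 07 kept for 365 days
- Type: full (tarball) plus incremental sync of changed files
- Recovery test: quarterly, lightweight restore + diff
EOF
cat BACKUP.md
```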
There is a tendency to skip this step, but when things go wrong this document is usually the first thing we look at, and it should provide an easy step-by-step guide of what to do. Don't underestimate it.
Crontab and scripts
Besides the scripts mentioned in "Part II" of this series, I'll try to make some scripts that put these into practice.
One of the big hurdles I have had, in particular with the sync script, is that Rclone in my setup runs on a central server that reaches out from behind a firewall (I want this server to be very difficult to penetrate, since it holds a lot of vital credentials). Rclone could be set up in a more decentralized fashion, where rclone is installed on each server we need to back up and given a single access point to a remote backup storage. But I have chosen the first approach.
Problem: the user we set up on the external server we want to access should not be root, but something like "pi-backup" with limited privileges. So when we try to copy/back up a file owned by e.g. root, with read access for the owner only (e.g. 700), we can't copy that file as the "pi-backup" user.
There are two workarounds for the above situation (which is pretty common in UNIX-like systems).
- Read through the logs of the rclone script and find out which files are inaccessible. Often, files set to e.g. 740 are actually readable by users that are part of the group, so just include the "pi-backup" user in that group. Some permissions may also be changed (be careful here, keys etc. should not be touched), as there may be files you don't consider necessary to keep in owner-only read mode. You can change these to be group-readable as well (600 -> 640).
- The second solution is not suited for the sync script, where we only back up changed files to get more fine-grained control and limit the size of the backup. Instead, we use a cron job on the server to make a tarball (one compressed file); this file is then accessible to the "pi-backup" user when it SSHes into the server. But we take a full backup each time, so with this solution we have to take retention on the backup drive into consideration.
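The permission change behind the first workaround can be seen in isolation. The commented `find` line is a sketch for locating offending files (the `pi-backup` user and `/mnt/nfs` path are from this article); the rest is a self-contained demo of the 700 -> 740 change.

```shell
# Locate files the backup user cannot read (run this on the server):
#   sudo -u pi-backup find /mnt/nfs ! -readable 2>/dev/null
# Self-contained demo of the 700 -> 740 change on a scratch directory:
dir=$(mktemp -d)
mkdir "$dir/bin"
chmod 700 "$dir/bin"
stat -c '%a' "$dir/bin"   # prints 700: owner-only access
chmod 740 "$dir/bin"
stat -c '%a' "$dir/bin"   # prints 740: group members can now read
rm -rf "$dir"
```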
Say you get some kind of error running the sync script; you look in "all_backup_logs.log" and see something like:
2023/05/22 20:38:36 ERROR : portainer-portainer-pvc-8c636919-e813-4dc0-8336-c7e02920d929/bin: error reading source directory: error listing "portainer-portainer-pvc-8c636919-e813-4dc0-8336-c7e02920d929/bin": permission denied
You then look further into the issue: SSH into the server and list the directory with "ls -la" (or, on some systems, just "ll"). In this case it is "portainer-portainer-pvc-8c636919-e813-4dc0-8336-c7e02920d929/", and you figure out that the bin directory is set to:
drwx------ 2 root root 4096 May 17 14:44 bin/
you change the directory rights to:
sudo chmod 740 bin
and add the "pi-backup" to the root group:
sudo usermod -aG root pi-backup
Now, this error from the backup log should be gone.
First we go to the server we want a backup from; it could be a web server. Here we test whether we can create a tarball:
sudo tar -czvf /mnt/nfs/backup.tar.gz --mode=640 --exclude=/mnt/nfs/backup.tar.gz -C /mnt/nfs .
In the above example I put the tarball in the same mount as the data I am backing up, and exclude the tarball itself. Be aware that you need room for the backup file. In this case I am backing up a 2TB NFS device, and the root system runs on a 32GB SD card, so I would run into trouble if I put the backup file anywhere else.
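Checking free space before the tarball is written can save a failed backup. The commented lines use the `/mnt/nfs` path from this article; the last line is a portable variant you can point at any path.

```shell
# Sanity-check free space before writing the tarball next to the data:
#   df -h /mnt/nfs    # free space on the mount holding the tarball
#   du -sh /mnt/nfs   # rough upper bound on the compressed size
# Portable one-liner that prints the free space for a given path:
df -P / | awk 'NR==2 {print "free KB:", $4}'
```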
When that works, edit the crontab and add the following line at the bottom:
0 17 * * * sudo tar -czvf /mnt/nfs/backup.tar.gz --exclude=/mnt/nfs/backup.tar.gz -C /mnt/nfs . && sudo chmod 640 /mnt/nfs/backup.tar.gz
The "--mode=640" option didn't seem to take effect on the first "backup.tar.gz" file; it applies to the files inside the archive, not to the archive itself, which is created according to the process umask. Solution: manually change the permissions on the existing backup.tar.gz file once, and they will persist for new tarballs created with the same name.
sudo chmod 640 /mnt/nfs/backup.tar.gz
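Alternatively, since the archive file's mode comes from the umask, you can get a 640 archive directly by setting the umask for the tar invocation. A small self-contained demonstration (temp directories stand in for the real paths):

```shell
# tar's --mode sets permissions on the files *inside* the archive; the
# archive file itself gets its mode from the umask. With umask 027, a
# freshly created file comes out as 666 & ~027 = 640:
src=$(mktemp -d); out=$(mktemp -d)
echo hello > "$src/file.txt"
( umask 027 && tar -czf "$out/backup.tar.gz" -C "$src" . )
stat -c '%a' "$out/backup.tar.gz"   # prints 640
rm -rf "$src" "$out"
```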
It is also important to make sure that your "pi-backup" user has read access to the backup.tar.gz file; since the file is owned by root with mode 640, this means including that user in the "root" group:
sudo usermod -aG root pi-backup
So now we want to fetch our compressed backup file. Log in to your backup server; there we make a script that will run this backup. It looks something like this:
# Our sync parameters are set
source="pi-nfs:/mnt/nfs/backup.tar.gz" # Whatever path to the source
dest="myremote-s3storage-encrytedPi-nfs:" # Path to the destination
date_for_backup="01,08,15,22" # Two-digit days of the month to run the script, ex. 01 for the first of each month
del_after="90" # Will delete everything older than x days in bckp
keep_mnt="01,07" # Keep these backup months in the old_dir, ex. 01 or 01,04,07,10 (comma separated)
del_all_after="365" # Will delete everything older than x days in old_dir
job_name="$(basename $0)" # This file's name, but you can change it to whatever you decide
options="--dry-run" # Remove the --dry-run flag when you have tested the file
email="" # Your email
/full/path/to/backup.sh "$source" "$dest" "$date_for_backup" "$del_after" "$keep_mnt" "$del_all_after" "$job_name" "$options" "$email"
This script should be executable and run on a schedule. Remember we set the server to generate the tarball at 5 pm, so setting this script to run at 9 pm would probably make sense (keep in mind the two machines may not be in the same timezone).
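"Executable" concretely means the x bit is set with chmod +x. A throwaway script demonstrates the step (on your server the path would be your wrapper script, e.g. the backup-pi-nfs.sh used below):

```shell
# Create a stand-in script, mark it executable, and run it:
script=$(mktemp)
printf '#!/bin/sh\necho "backup wrapper ran"\n' > "$script"
chmod +x "$script"
"$script"   # prints: backup wrapper ran
rm -f "$script"
```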
This sets up a cron job for the current user. Somehow /etc/profile is not automatically sourced by the cron session, so we have to source the decryption password from /etc/profile first (". /etc/profile"):
0 21 * * * . /etc/profile && /home/rune79/backup/config/backup-pi-nfs.sh
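You can see why the ". /etc/profile" part is needed by reproducing cron's stripped-down environment with env -i. A stand-in profile file is used here, and MY_SECRET is a made-up variable name:

```shell
# cron starts jobs with a near-empty environment, so anything exported
# in /etc/profile is absent unless the job sources it first:
profile=$(mktemp)
echo 'export MY_SECRET=hunter2' > "$profile"
env -i sh -c 'echo "before: [$MY_SECRET]"'                   # prints: before: []
env -i sh -c ". $profile && echo \"after: [\$MY_SECRET]\""   # prints: after: [hunter2]
rm -f "$profile"
```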
This step can often be cumbersome, which is why it is worth at least establishing that this first backup looks right.
If this were a company setup, you would schedule a disaster-recovery test, maybe every quarter or so, often with at least a sanity check to see whether all data and functionality can be recovered.
But we could do a more lightweight test to see if everything looks like it should.
Copy the backup.tar.gz back from the backup storage:
rclone copy Myremote-s3storage-pi-encrypt:month_backup/2023-05-28_111135/backup.tar.gz pi-nfs:
It should now reside in the "pi-backup" user's home directory. Untar it there, choosing a different directory than the original (we don't want to overwrite those files):
sudo tar -xzf backup.tar.gz -C /home/pi-backup/test
First we iterate through some of our files and directories in both the original and the test directory, to confirm that owners and permissions are preserved.
Use "ll" or "ls -la" and make sure that everything looks like it should.
You can also use diff or checksum tools to see differences between directories and files: do they have the same content and size? This will give you a clear idea of the data's consistency and accuracy in case of recovery. Also make sure the total size is similar (no missing files or directories).
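A sketch of such a comparison, with temp directories standing in for the original data and the restored copy (on the real system they would be /mnt/nfs and /home/pi-backup/test):

```shell
# diff -r is silent when two trees match; sha256sum -c verifies
# per-file content against a recorded checksum list:
a=$(mktemp -d); b=$(mktemp -d)
echo "same data" > "$a/f.txt"
cp "$a/f.txt" "$b/f.txt"
diff -r "$a" "$b" && echo "trees match"
( cd "$a" && sha256sum f.txt > sums.txt )
( cd "$b" && sha256sum -c "$a/sums.txt" )   # prints: f.txt: OK
rm -rf "$a" "$b"
```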