Back The Site Up

So now that I have committed not only to skill development, but also to writing projects, I am going to need a way to back up my various projects. Right now, I am running everything on my own server because I have no justification to buy a lot of high-end VMs and manage them on Azure, AWS, or any of a number of cloud-based server providers.

Given that I like to do my own development and am pretty opinionated about how I like things, I have my own server set up with my various projects on it, including this web site, a couple of others, and the development work I am doing for Probo.CI.

Because of this, I need to do my own backups. I’ve been working on a backup script that is designed to be narrow in scope, but extendable as I expand my server. It will do all the backing up for this web site as well as Bagnall.IO and Powertools.sh.

My base script exports MySQL databases using mysqldump and archives file directories which are then uploaded to an Amazon S3 bucket. It lives in a Docker container but has access to files and databases via networking and mounted volumes.

The entire backup process works as follows:

  1. Export databases and compress the exported backup files with a filename that indicates the date of the backup.
  2. Compress asset folders for the relevant web sites. This includes any asset folders for content management systems (WordPress, Drupal, etc.).
  3. Remove backups older than 14 days.
  4. Upload the new backups.
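
In rough terms, the main run of the script is just those four steps in order. As a sketch only (backup_databases() and compress_asset_folders() are illustrative names, not the script's actual functions; delete_files() and upload_files() are covered below):

// Illustrative run order only; the real script wires these steps together itself.
backup_databases();                                // 1. mysqldump + gzip, date-stamped filenames
compress_asset_folders();                          // 2. tar up the CMS asset directories
delete_files($s3, $paths, $aws_bucket_subfolder);  // 3. prune backups older than 14 days
upload_files($s3, $paths, $aws_bucket_subfolder);  // 4. push the new backups to S3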

I use environment variables to store usernames, passwords, host names, and other secrets used by the script. So when you see a variable in the snippets below, its value comes from either a calculated value (the date) or an environment variable (credentials).
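
As a concrete illustration, those values might be gathered near the top of the script along these lines (the environment variable names and date format here are assumptions, not necessarily the script's actual ones):

// Credentials and hosts come from the container's environment (assumed names).
$user = getenv('MYSQL_USER');
$pass = getenv('MYSQL_PASSWORD');
$host = getenv('MYSQL_HOST');

// Calculated value used to date stamp the backup filenames (assumed format).
$today = date('Y-m-d');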

Backing Up Databases
Using PHP, we shell out to the mysqldump command with the applicable parameters for our username and password as well as our database name. The other variables in the command date stamp the file for storage purposes.

The --single-transaction and --quick flags in the command remediate issues with large databases and table locking. By default, mysqldump locks tables while it exports, so data cannot be written to them during the dump; this can take a web site offline while the backup is being generated. In the case of large web sites, the outage can run anywhere from several seconds to several minutes. If you do not have the option of generating the backup from a replica database server, these flags help mitigate possible outages during the export.

$mysql_query = "mysqldump --user='$user' --password='$pass' --single-transaction --quick -h$host $database > /app/backups/$database-$data_prefix.$today.sql";
$mysql_backup = exec($mysql_query);
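
Step 1 also calls for compressing the exported dump, since the upload step below looks for *.sql.gz files. A minimal sketch of that step, assuming gzip is available inside the container:

// Compress the dump in place, producing a .sql.gz and removing the original .sql.
$gzip = exec("gzip -f /app/backups/$database-$data_prefix.$today.sql");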

Compressing Asset Folders
When working with sites that store assets on disk, such as content management systems (WordPress/Drupal), you will want to include the asset folder as part of the backup process. This is baked into my image.

$files = `cd $files_folder_parent; tar -czf /app/backups/$files_prefix.$today.tar.gz $files_folder_name`;
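
For a Drupal site, as an example, those variables might resolve to something like the following (the paths and prefix here are illustrative only):

// Illustrative values only.
$files_folder_parent = '/var/www/html/sites/default'; // parent of the Drupal files directory
$files_folder_name = 'files';                         // directory to archive
$files_prefix = 'drupal-files';                       // label used in the archive filename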

Interacting with Amazon S3
Once your database has been exported and your assets compressed, the next step is to put them in your backup location. In my case, that is Amazon S3. In my PHP script, I leverage the AWS SDK for PHP to handle all of the file operations: I remove any backups older than 14 days and then upload the new ones.

To do this, we define the paths and prefixes we are going to upload. This keeps the local file system and S3 in sync, and it lets us direct the script to upload every file with a .sql.gz or .tar.gz extension. The paths are defined as an associative array keyed by the AWS bucket name. The path element is the location on the local file system, while the glob is the filename pattern to match.

$aws_bucket = 'our_aws_bucket'; // normally assigned from an environment variable
$paths = [
  $aws_bucket => [
    [
      'path' => '/app/backups',
      'glob' => '*.sql.gz',
    ],
    [
      'path' => '/app/backups',
      'glob' => '*.tar.gz',
    ],
  ],
];
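
The functions below assume an S3 client object ($s3) has already been constructed. With version 3 of the AWS SDK for PHP installed via Composer, that setup might look roughly like this (the region and credential environment variable names are assumptions):

require 'vendor/autoload.php';

use Aws\S3\S3Client;
use Aws\S3\Exception\S3Exception;
use Aws\S3\ObjectUploader;
use Aws\S3\MultipartUploader;
use Aws\Exception\MultipartUploadException;

// Build the client from environment-provided credentials (assumed variable names).
$s3 = new S3Client([
  'version' => 'latest',
  'region' => getenv('AWS_REGION'),
  'credentials' => [
    'key' => getenv('AWS_ACCESS_KEY_ID'),
    'secret' => getenv('AWS_SECRET_ACCESS_KEY'),
  ],
]);

If AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are already set in the environment, the SDK will pick them up automatically and the credentials block can be omitted.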

With the paths defined, we can clear out the old files and upload the new ones.

/**
 * delete_files()
 * 
 * @param S3Client $s3
 *  The S3 client object.
 * @param array $paths
 *  An array of the paths for this bucket to be evaluated and processed.
 * @param string $aws_bucket_subfolder
 *  The subfolder within the bucket to check for deletions. Without this,
 *  the script will delete every matching file in the bucket. Could be bad.
 */
function delete_files($s3, $paths, $aws_bucket_subfolder = NULL) {
  // Get a list of all files in the bucket. Note that this is
  // not related to the subfolder. Because these are objects, the
  // "subfolder" paradigm is more of a construct: the subfolder
  // name is just part of the object key. We will filter down to
  // the subfolders after we have the full list.
  foreach ($paths as $bucket => $info) {
    // Reset the paginator for each bucket so a failed listing is not reused.
    $results = NULL;
    try {
      $results = $s3->getPaginator('ListObjects', [
        'Bucket' => $bucket
      ]);
    } catch (S3Exception $e) {
      echo $e->getMessage();
    }
    if (!empty($results)) {
      foreach ($results as $result) {
        foreach ($result['Contents'] ?? [] as $content) {
          // We can search for a "subfolder" in a bucket to further segment our
          // file removals. This way we're not removing files outside the
          // scope of this backup.
          if (!empty($aws_bucket_subfolder)) {
            if (strpos($content['Key'], $aws_bucket_subfolder) !== FALSE) {
              $delete = TRUE;
            }
            else {
              $delete = FALSE;
            }
          }
          else {
            $delete = TRUE;
          }
          $last_modified = strtotime($content['LastModified']->__toString());
          // Entries older than 14 days are expired and deleted.
          $expires = time() - (60 * 60 * 24 * 14);
          if ($delete === TRUE && $last_modified < $expires) {
            $s3->deleteObject([
              'Bucket' => $bucket,
              'Key' => $content['Key'],
            ]);
          }
        }
      }
    }
  }
}

Finally, we have our upload step. Because backups can be very large, we stick to the SDK's ObjectUploader, which switches to a multipart upload for big files. We only upload files that do not already exist, though; collisions are detected by filename, not contents, so this is where the calculated filenames come into play.

function upload_files($s3, $paths, $aws_bucket_subfolder = NULL) {
  foreach ($paths as $bucket => $buckets) {
    foreach ($buckets as $key => $info) {
      // Go into our path and look for files that fit our patterns
      chdir($info['path']);
      $filenames = glob($info['glob']);
      if (!empty($filenames)) {
        foreach ($filenames as $filename) {
          // Check to see if the file already exists in the bucket and skip it if it does.
          $exists = $s3->doesObjectExist($bucket, $aws_bucket_subfolder . $filename);
          if (!$exists) {
            $filepath = $info['path'] . '/' . $filename;
            if (!file_exists($filepath)) {
              continue;
            }
            // Upload the file using the ObjectUploader, which switches to a
            // multipart upload for large files.
            $source = fopen($filepath, 'rb');
            $uploader = new ObjectUploader(
              $s3,
              $bucket,
              $aws_bucket_subfolder . $filename,
              $source
            );
            do {
              try {
                $result = $uploader->upload();
              } catch (MultipartUploadException $e) {
                rewind($source);
                $uploader = new MultipartUploader($s3, $source, [
                  'state' => $e->getState(),
                ]);
              }
            } while (!isset($result));
            // Release the file handle and reset state for the next file.
            fclose($source);
            unset($result);
          }
        }
      }
    }
  }
}
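
With the client, the paths array, and the two functions in place, the S3 portion of a run might be wired up like this (the subfolder name is illustrative):

// Keep this backup's files under their own prefix in the bucket (illustrative name).
$aws_bucket_subfolder = 'coderefactored/';

// Prune anything older than 14 days, then push the new backups.
delete_files($s3, $paths, $aws_bucket_subfolder);
upload_files($s3, $paths, $aws_bucket_subfolder);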

You can view the entire process, including the Dockerfile used to construct the container, on the Bitbucket page for this project, which I worked on for my employer and clients. It is open source, so feel free to look around.

https://bitbucket.org/itcon_drupal/itcon_backup/src/master/

