Scheduling EBS Snapshots

Take Backups of EC2 Instances Automatically

Posted by Ryan S. Brown on Sat, Oct 31, 2015
In Mini-Project
Tags: python, ebs, ec2, backup, scheduling, cron

Update: Part two of this series shows how to expire old snapshots.

Backups aren’t usually the first thing that comes to mind when you hear about a new service. After the Lambda blog announced task scheduling, I had a few prime tasks in mind. Backups, health checks, and periodic cleanup tasks are all great candidates.

To demonstrate, we’ll build a Lambda function that takes an EBS snapshot of every instance that has a tag named “Backup” or “backup”. This saves a bit of money compared to backing up every EC2 instance in your account, and in the future you could add a value like “daily” or “weekly” to control how often snapshots are taken.

Set Up IAM Permissions

Before we write the code, we need to make an IAM role called ebs-backup-worker. You can do this with either the Management Console or the AWS CLI; the policy is the same either way. In the console, create a new “service role” for AWS Lambda.

Using the AWS CLI

If you’re using the CLI, you’ll need to supply a trust document when creating the IAM role. The trust document allows code running in AWS Lambda to authenticate and use the policies associated with the role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}

Copy the trust document, or download it as snapshot-trust.json. Then use the iam create-role command to associate it with the ebs-backup-worker role.

aws iam create-role --role-name ebs-backup-worker \
    --assume-role-policy-document file://snapshot-trust.json

Building an IAM Policy

The policy needs to allow the Lambda function to:

  1. Write CloudWatch logs, so you can debug the function.
  2. Read EC2 information about instances, tags, and snapshots.
  3. Take new snapshots using the ec2:CreateSnapshot call.

In policy form, we express that (in order) as:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["logs:*"],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:Describe*",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateSnapshot",
        "ec2:ModifySnapshotAttribute",
        "ec2:ResetSnapshotAttribute"
      ],
      "Resource": ["*"]
    }
  ]
}

In the console, copy the policy above into an inline policy on the role. For how to do that, see Inline and Managed policies. If you want to use the command line, copy the policy, or download it as snapshot-policy.json.

Once you have snapshot-policy.json, attach it to the role you created earlier.

aws iam put-role-policy --role-name ebs-backup-worker \
    --policy-name TakeSnapshots \
    --policy-document file://snapshot-policy.json
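
If you’d like to confirm the policy actually made it onto the role (an optional sanity check, not something the rest of the walkthrough depends on), you can read it back:

aws iam get-role-policy --role-name ebs-backup-worker \
    --policy-name TakeSnapshots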

Great! Now that we’ve got the authorization policy for our function, we can build the script that will grab the volumes for all our instances.

Create the Lambda Function

Finally, it’s time to write some code. In the AWS Lambda management console, create a new function using the ebs-backup-worker role from the last section. I’m doing the code examples here in Python since I love the boto3 library; it makes working with the AWS APIs a joy. Pick the Python 2.7 runtime when prompted.
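
A quick note on structure: Lambda calls into your code through a handler function that receives an event and a context object. The snippets below are meant to live inside that handler. A minimal skeleton, assuming the conventional lambda_handler name (use whatever handler name you configure for your function), looks like this:

import boto3

def lambda_handler(event, context):
    # the filtering and snapshot code from the snippets below goes here
    pass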

Before we can take EBS snapshots, we have to find the instances we’re backing up. We’ll use Resource Tags to mark which instances should be snapshotted. Thanks to boto3, finding instances with a “backup” or “Backup” tag is easy.

import boto3

ec = boto3.client('ec2')
reservations = ec.describe_instances(
        Filters=[
            {'Name': 'tag-key', 'Values': ['backup', 'Backup']},
        ]
    )['Reservations']
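
One caveat the snippet above glosses over: describe_instances returns results in pages once you have a lot of instances. If your fleet is big enough to hit that, a paginator-based variant (a sketch, same filter as above) collects every reservation:

paginator = ec.get_paginator('describe_instances')
reservations = []
for page in paginator.paginate(
        Filters=[{'Name': 'tag-key', 'Values': ['backup', 'Backup']}]):
    reservations += page['Reservations']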

Then we have to flatten the list of instances in every reservation, since a single reservation can contain huge blocks of instances. The shortest way is to use sum and a list comprehension.

# concatenate the Instances lists from every reservation into one flat list
instances = sum(
    [r['Instances'] for r in reservations],
    [])

Finally, we iterate over all the volumes and snapshot them.

for instance in instances:
    for dev in instance['BlockDeviceMappings']:
        if dev.get('Ebs', None) is None:
            # skip non-EBS volumes
            continue
        vol_id = dev['Ebs']['VolumeId']
        print "Found EBS volume %s on instance %s" % (
            vol_id, instance['InstanceId'])

        # CreateSnapshot is asynchronous; the call returns as soon as the
        # snapshot is started
        ec.create_snapshot(
            VolumeId=vol_id,
        )

Copy the code from schedule-ebs-snapshot-backups.py into the code editor in the management console and save it.

Run It!

Once you’ve saved your function, you’re ready to test it out. Make sure that at least one of your instances has a tag named “Backup” or “backup”, with any value you like. Now, if you use the “Test” button in the Lambda function console, you’ll be able to see the output.

START RequestId: f7e285bb-801e-11e5-a861-5dc9c75ff3b3 Version: $LATEST
Found EBS volume vol-2f8620e7 on instance i-1b04b1e2
... (snipped out more instances) ...
END RequestId: f7e285bb-801e-11e5-a861-5dc9c75ff3b3
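
If none of your instances had the tag yet and the output came up empty, you can add the tag from the CLI and run the test again. The instance ID and tag value here are just placeholders:

aws ec2 create-tags --resources i-xxxxxxxx \
    --tags Key=Backup,Value=daily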

Now that the function works, it’s time to put it on a schedule.

Head over to the “Event Sources” tab and add a new scheduled event.

You can schedule your function to run every day (or hour, if you just want to see the scheduling work).

There are default intervals for every 5 minutes, every 15 minutes, every hour, or every day. Pick an interval that makes sense for your application. If you want, you can also define a more specific schedule using cron syntax.
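
For reference, the cron expressions Lambda accepts have six fields: minutes, hours, day-of-month, month, day-of-week, and year. As an example (the times are arbitrary), a daily run at 06:00 UTC or an hourly run at the top of the hour would look like:

cron(0 6 * * ? *)
cron(0 * * * ? *)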

Recap

In short, we’ve taken a quick tour of everything you need to improve your backups on EC2.

  1. Use IAM permissions to scope down what your code can do to only what it needs.
  2. Query EC2 to get all the instances you’ve tagged to be backed up.
  3. Schedule the code to run daily.

In part two of this series we’ll learn how to prune old snapshots to save on storage costs.

Update: kashiwagi commented on the Reddit thread that snapshots only capture data that has already been written to the EBS volume at the time of the snapshot, which is totally correct (see the EBS documentation). Anything still buffered in memory and not yet flushed to disk is missed, so backups can be inconsistent with the actual instance state. This is especially true for databases and other high-write-load applications.

Thanks for reading! Keep up with posts via RSS. As always, you can send me an email at ryan@serverlesscode.com if you have an idea, question, comment, or want to say hi.

