Scheduling EBS Snapshots Part II

Delete Old Backups Automatically With Lambda

Posted by Ryan S. Brown on Thu, Nov 5, 2015
In Mini-Project
Tags: python, ebs, ec2, backup, scheduling, cron

In part one of this series on scheduling EBS snapshots, we learned how to use Lambda’s task scheduler to back up EC2 instances on a daily basis. If you haven’t yet, read it here.

By the end of this post, you’ll have daily backups of your EBS volumes that are kept for a number of days you choose and then pruned. We’ll build two Lambda functions: one to snapshot instances, and one to delete expired EBS snapshots.

The function that takes snapshots will find instances that have a “Backup” tag and, by default, mark their snapshots to be kept for 7 days. It will also read an optional “Retention” tag so we can tune backups for cost or regulatory reasons. Then we’ll schedule a second function to delete snapshots after their time is up.

New IAM Permissions

Before doing anything else, we need to add more permissions to the ebs-backup-worker IAM role from Part 1 of the series. The role will need permissions to add tags and to delete EBS snapshots.

Go to the AWS Console and open the ebs-backup-worker policy in the policy editor. We’ll need to add permission for the ec2:DeleteSnapshot and ec2:CreateTags actions. Add them to the “Action” list of one of the existing policy statements.

When you’re done, the full policy should look like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["logs:*"],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": "ec2:Describe*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot",
                "ec2:CreateTags",
                "ec2:ModifySnapshotAttribute",
                "ec2:ResetSnapshotAttribute"
            ],
            "Resource": ["*"]
        }
    ]
}

Great, now we’re ready to update the schedule-ebs-snapshot-backups.py code from last post.

New Snapshotting Code

The existing code available here finds all instances with a tag named “Backup” and takes a snapshot. We need to add a way for it to save the retention time for the snapshot.

The number of days a snapshot will be kept for is going to be saved as the value of a “Retention” tag, and we’ll make sure there’s a default.

import collections
import datetime

# maps a retention time in days to the list of snapshot IDs to tag
to_tag = collections.defaultdict(list)

# instances is a list of `instance` info from ec2.describe_instances()
for instance in instances:
    try:
        retention_days = [
            int(t.get('Value')) for t in instance['Tags']
            if t['Key'] == 'Retention'][0]
    except IndexError:
        # no "Retention" tag on this instance, so keep snapshots for a week
        retention_days = 7

This code tries to read a “Retention” tag if it exists, and if not defaults to one week.
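The lookup above only covers the case where the tag is missing; a tag value that isn’t a whole number would raise a ValueError instead. Here is a minimal sketch of the same lookup as a standalone helper. The function name get_retention_days and the fallback for non-numeric values are assumptions of this sketch, not part of the published function.

```python
DEFAULT_RETENTION_DAYS = 7  # same default as the snippet above


def get_retention_days(tags, default=DEFAULT_RETENTION_DAYS):
    """Return the integer value of the "Retention" tag, or `default`.

    `tags` is a list of {'Key': ..., 'Value': ...} dicts, the shape of
    instance['Tags'] in the describe_instances() response. The helper name
    and the fallback on bad values are assumptions of this sketch.
    """
    for tag in tags or []:
        if tag.get('Key') == 'Retention':
            try:
                return int(tag.get('Value'))
            except (TypeError, ValueError):
                # tag exists but is not a whole number, e.g. "two weeks"
                return default
    return default


print(get_retention_days([{'Key': 'Backup', 'Value': ''},
                          {'Key': 'Retention', 'Value': '14'}]))
print(get_retention_days([{'Key': 'Backup', 'Value': ''}]))
```

The first call prints 14 and the second prints 7, since the second instance has no “Retention” tag.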

After taking the snapshot we add a “DeleteOn” tag to it that contains the day the snapshot should be deleted. The date is formatted as YYYY-MM-DD (2015-11-05). To make the function run faster, I’m saving all the snapshots that will be deleted on a certain day as a list and tagging them all at once.

Remember, execution time is money: 2.08 microdollars per second for the 128MB memory environment.
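To put that rate in perspective, here is a rough back-of-the-envelope sketch. It uses only the per-second figure quoted above and ignores per-request charges and any free tier; the 5-second run time is a made-up example, not a measurement.

```python
PRICE_PER_SECOND = 2.08e-6  # dollars, the 128MB rate quoted above


def monthly_cost(seconds_per_run, runs_per_month=30):
    """Estimated monthly compute charge in dollars for one daily function."""
    return seconds_per_run * runs_per_month * PRICE_PER_SECOND


# a hypothetical 5-second run, once a day for 30 days
print("%.6f" % monthly_cost(5))
```

That works out to about $0.000312 a month, so the savings from batching are small in absolute terms, but they come for free.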

# take the snapshot, and save it in a list with others in the same retention
# time category
snap = ec.create_snapshot(VolumeId=vol_id)
to_tag[retention_days].append(snap['SnapshotId'])

With these lists, it’s time to save the tags so the snapshots are deleted on time. This is why we needed the ec2:CreateTags permission added to the IAM role.

for retention_days in to_tag.keys():
    # get the date X days in the future
    delete_date = datetime.date.today() + datetime.timedelta(days=retention_days)
    # format the date as YYYY-MM-DD
    delete_fmt = delete_date.strftime('%Y-%m-%d')
    ec.create_tags(
        Resources=to_tag[retention_days],
        Tags=[
            {'Key': 'DeleteOn', 'Value': delete_fmt},
        ]
    )

In just one call we can tag all the snapshots to be deleted on a given day, saving a few HTTP round trips. Get the new version of schedule-ebs-snapshot-backups.py and copy it into the Lambda console to replace the version from last time. You can compare the old and new versions if you want to see the exact differences.
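If you want to check the date math without touching AWS, the grouping step can be pulled out into a pure function. This is a sketch under assumptions: build_tag_requests is a hypothetical helper name, the snapshot IDs are made up, and the `today` parameter exists only so the output is reproducible.

```python
import collections
import datetime


def build_tag_requests(to_tag, today=None):
    """Turn {retention_days: [snapshot IDs]} into create_tags() arguments.

    Returns one dict of keyword arguments per retention group, so each
    group still needs only a single create_tags() call.
    """
    today = today or datetime.date.today()
    requests = []
    for retention_days, snapshot_ids in sorted(to_tag.items()):
        delete_date = today + datetime.timedelta(days=retention_days)
        requests.append({
            'Resources': snapshot_ids,
            'Tags': [{'Key': 'DeleteOn',
                      'Value': delete_date.strftime('%Y-%m-%d')}],
        })
    return requests


to_tag = collections.defaultdict(list)
to_tag[7].extend(['snap-aaaa1111', 'snap-bbbb2222'])  # made-up IDs
to_tag[30].append('snap-cccc3333')

for kwargs in build_tag_requests(to_tag, today=datetime.date(2015, 11, 5)):
    print("%s %s" % (kwargs['Tags'][0]['Value'], kwargs['Resources']))
```

With the date pinned to 2015-11-05, the 7-day group gets 2015-11-12 and the 30-day group gets 2015-12-05. Each returned dict could then be passed along as ec.create_tags(**kwargs).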

Delete Snapshots

Now that we’re taking snapshots with an expiration date, we need a Lambda function that enforces that rule. This is one of the places where Lambda functions shine: adding more complex logic on top of existing services.

To query only the snapshots for your account (rather than all the public snapshots) you’ll need to get your AWS account number. It’s a 12-digit number that you can find on your bill, or using this code.

import boto3

iam = boto3.client('iam')
# the account ID is the fifth colon-separated field of the user's ARN
print(iam.get_user()['User']['Arn'].split(':')[4])

If you have multiple AWS accounts for different business units or prod/stage environments, put the full list in the account_ids list in the function.
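The split(':')[4] trick works because the account ID is the fifth colon-separated field of an ARN. Here is a small sketch of that parsing step on its own, so it can be tried without credentials. The helper name account_id_from_arn and the example ARN are placeholders, not values from a real account.

```python
def account_id_from_arn(arn):
    """Return the account ID, the fifth colon-separated field of an ARN."""
    fields = arn.split(':')
    if len(fields) < 6 or fields[0] != 'arn':
        raise ValueError('not an ARN: %r' % arn)
    return fields[4]


# placeholder ARN with a documentation-style account number
print(account_id_from_arn('arn:aws:iam::123456789012:user/example'))
```

This prints 123456789012. The region field is empty for IAM, which is why two colons sit next to each other in the ARN.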

Before expiring snapshots, we need to figure out what day it is so we can filter for all the snapshots expiring today. Snapshots created by our EBS snapshot worker have a “DeleteOn” tag containing the YYYY-MM-DD formatted expiration date.

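delete_on = datetime.date.today().strftime('%Y-%m-%d')
# "tag:DeleteOn" matches the key and the value on the same tag; separate
# tag-key and tag-value filters are evaluated independently of each other
filters = [
    {'Name': 'tag:DeleteOn', 'Values': [delete_on]},
]

ec = boto3.client('ec2')
# replace with your own 12-digit account ID(s)
account_ids = ['12345']
snapshot_response = ec.describe_snapshots(OwnerIds=account_ids, Filters=filters)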

Now that we have the snapshots, deleting them is a simple loop over all the IDs.

for snap in snapshot_response['Snapshots']:
    print("Deleting snapshot %s" % snap['SnapshotId'])
    ec.delete_snapshot(SnapshotId=snap['SnapshotId'])
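One thing to keep in mind: the query only matches snapshots whose “DeleteOn” date is exactly today, so a snapshot would linger if the function missed a day. Here is a sketch of one possible variation, not what the published function does, that selects anything dated today or earlier. It assumes you fetch every snapshot that has a “DeleteOn” tag and that the tag values are well-formed YYYY-MM-DD strings; the helper name and sample data are made up.

```python
import datetime


def expired_snapshot_ids(snapshots, today=None):
    """Return IDs of snapshots whose DeleteOn tag is today or earlier.

    `snapshots` has the shape of the 'Snapshots' list returned by
    describe_snapshots(). Snapshots without a DeleteOn tag are skipped.
    """
    today_fmt = (today or datetime.date.today()).strftime('%Y-%m-%d')
    expired = []
    for snap in snapshots:
        tags = dict((t['Key'], t['Value']) for t in snap.get('Tags', []))
        delete_on = tags.get('DeleteOn')
        # YYYY-MM-DD strings sort in date order, so comparing strings works
        if delete_on is not None and delete_on <= today_fmt:
            expired.append(snap['SnapshotId'])
    return expired


# made-up sample data in the describe_snapshots() response shape
sample = [
    {'SnapshotId': 'snap-old',
     'Tags': [{'Key': 'DeleteOn', 'Value': '2015-11-03'}]},
    {'SnapshotId': 'snap-today',
     'Tags': [{'Key': 'DeleteOn', 'Value': '2015-11-05'}]},
    {'SnapshotId': 'snap-future',
     'Tags': [{'Key': 'DeleteOn', 'Value': '2015-11-12'}]},
    {'SnapshotId': 'snap-untagged'},
]
print(expired_snapshot_ids(sample, today=datetime.date(2015, 11, 5)))
```

With the date pinned to 2015-11-05 this selects snap-old and snap-today. The trade-off is that the function has to look at every tagged snapshot on each run instead of letting the EC2 API do the filtering.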

Make a new Lambda function in the AWS Lambda management console, choosing the ebs-backup-worker role that is used for the other function. Be sure to choose the Python 2.7 runtime, and call your function “ebs-snapshot-expiration-worker.”

In the console, copy the ebs-snapshot-janitor.py code and save the function. You have a full backup lifecycle now; all that’s left is to set it and forget it.

Schedule Everything

In the AWS Lambda management console, go to the Event Sources tab for both functions and add a one-day interval.

Boom, done! Now you can rest easy and know that you can restore yesterday’s version of an instance without fuss.

Recap

In this post we added to the IAM policy for our backup functions, added expiration dates to backups that are configurable for every instance, and wrote code to delete old snapshots. We also added a schedule so backups will run daily, no matter what.

Thanks for reading! Keep up with future posts via RSS.

As always, if you have an idea, question, or comment, or just want to say hi, hit me up on Twitter @ryan_sb or email me at ryan@serverlesscode.com.

