Interview: Lambda in Production with Andrew Templeton -- Serverless Code

This week we have a real treat – Andrew Templeton from Tuple Labs is here to unpack the story behind this tweet:

There’s tons of great stuff in this case study – cost savings, faster time to market, single-command dev/test environments, continuous deployment pipelines, and lower ongoing maintenance costs. This is huge for the customer, who was spending $1,150 per month and is now under $80 per month for an app that’s easier to deploy and manage. The app is written in Node.js, and the whole stack is deployed with CloudFormation.

That should be enough intro. If you have follow-on questions, tweet me @ryan_sb or Andrew @ayetempleton and I’ll add updates here.

The Interview

Our client had a mobile application that needed a REST API backend. There were about 10 kinds of business objects that we needed to model in the API. The system needed to be highly secure, scalable, and easy to manage with almost no ongoing effort, as Tuple Labs, the company I work with, is a small firm and we want to keep our cost of maintenance down.

Other than Lambda, on the backend we used API Gateway to convert incoming HTTP requests into Lambda calls without running any servers. We used DynamoDB as the persistence store for application data, along with S3 and CloudFront for delivering static content. For email and push notifications, we use SES and SNS, called from Lambdas that are subscribed to DynamoDB Streams and look for certain patterns in table writes to trigger pushes. On the front end, the setup uses iOS for the mobile app and AngularJS for stripped-down functionality via desktop and mobile web.
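To make the DynamoDB Streams step concrete, here is a minimal sketch of a stream-triggered Lambda that scans table writes for a pattern and publishes to SNS. It is an illustration only, not Tuple Labs' code (their app is Node.js): the `status` attribute, the `SHIPPED` value, and the `PUSH_TOPIC_ARN` environment variable are all hypothetical, and the sketch assumes the stream is configured to include both old and new item images.

```python
import json
import os


def should_notify(record):
    """Return True when a stream record matches the hypothetical pattern:
    an item whose 'status' attribute has just changed to 'SHIPPED'."""
    if record.get("eventName") not in ("INSERT", "MODIFY"):
        return False
    change = record.get("dynamodb", {})
    # Stream images use DynamoDB's typed JSON, e.g. {"status": {"S": "SHIPPED"}}.
    new_status = change.get("NewImage", {}).get("status", {}).get("S")
    old_status = change.get("OldImage", {}).get("status", {}).get("S")
    return new_status == "SHIPPED" and old_status != "SHIPPED"


def handler(event, context, publish=None):
    """Lambda entry point. `publish` is injectable so the matching logic
    can be exercised locally without AWS credentials."""
    if publish is None:
        import boto3  # imported lazily; only needed when running on AWS

        topic_arn = os.environ["PUSH_TOPIC_ARN"]  # hypothetical variable name
        sns = boto3.client("sns")

        def publish(message):
            sns.publish(TopicArn=topic_arn, Message=message)

    sent = 0
    for record in event.get("Records", []):
        if should_notify(record):
            keys = record["dynamodb"].get("Keys", {})
            publish(json.dumps({"keys": keys, "status": "SHIPPED"}))
            sent += 1
    return sent
```

Keeping the match rule in its own function means the pattern can be unit tested with hand-built stream records, while the handler stays a thin loop around it.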

Yes, it was a rewrite. The client wanted to replace the existing system mostly to decrease Total Cost of Ownership (TCO) and ongoing distractions of maintenance. The client approached us to consult on how best to alleviate these ongoing concerns, and Tuple decided that the application and use case fit the serverless model well. We got the client excited about Lambda when we told them how much cheaper low-use time periods could be with Lambda, that they would not need to worry about scaling, and that they could provision and redeploy so quickly.

We tackled this project with a team of two (!). I was the only one with existing experience using Lambda, but my partner during completion has a lot of cloud experience in general, so picking up on the abstractions Lambda provides was a natural step.

On actual business logic and application development for the API, we spent about 4 calendar weeks, amounting to about 250 man-hours, since we did not work on the project full-time. We did not write the iOS application from scratch so I cannot comment on the amount of time that took.

For deployment, we have a simple script which deploys static assets and provisions layers of the infrastructure using CloudFormation, along with some of the custom resources I wrote for my open source projects, namely those that support API Gateway and DynamoDB Streams (note: they are now supported natively and we are in the process of migrating).

We have some custom closed source tooling to automatically provision CI and CD pipelines on AWS. Sometime soon we will open source it, but generally, we use AWS CloudFormation, CodePipeline, Lambda, SQS, and ECS. We rolled our own because we wanted to be able to dynamically provision the full pipelines whenever we launch new projects, which no strong hosted CI/CD service offers right now.

Everything is fully monitored with 1 minute resolution. A large proportion of this is from the CloudWatch monitoring that API Gateway, Lambda, and DynamoDB allow you to use. Because we use lots of custom CloudFormation resources for provisioning, this monitoring is very easy to roll out and is fully automatically set up. The main things we monitor are Latency, Queries per Second, Error Rates, and DynamoDB read/write throughput.
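As a rough idea of what rolling monitoring out with the stack can look like, the sketch below builds a CloudFormation `AWS::CloudWatch::Alarm` resource for a Lambda function's error count at a 60-second period. The function name, threshold, and logical ID scheme are assumptions for illustration; the interview does not describe the actual alarm definitions or the custom resources involved.

```python
import json


def error_alarm(function_name, threshold=1):
    """Build a CloudFormation alarm resource that fires when a Lambda
    function reports `threshold` or more errors within one minute."""
    return {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {
            "AlarmDescription": f"Errors on {function_name}",
            "Namespace": "AWS/Lambda",
            "MetricName": "Errors",
            "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
            "Statistic": "Sum",
            "Period": 60,  # one-minute resolution
            "EvaluationPeriods": 1,
            "Threshold": threshold,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        },
    }


def build_template(function_names):
    """Assemble a template with one error alarm per function."""
    resources = {}
    for name in function_names:
        # Logical IDs must be alphanumeric, so strip other characters.
        logical_id = "".join(ch for ch in name if ch.isalnum()) + "ErrorAlarm"
        resources[logical_id] = error_alarm(name)
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    # "api-handler" is a hypothetical function name.
    print(json.dumps(build_template(["api-handler"]), indent=2))
```

Generating the alarms from the same list of functions that the stack deploys is one way to make sure every new function is monitored from its first deployment.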

Because we use CloudFormation for everything, all testing is fully automatic, and it takes us about 5 minutes to fully rebuild new environments. We can run any kind of testing we want on new environments we create. Because we are only testing the API, the testing suite is relatively simple, including basic API tests plus some load testing.

It varies a lot. The peak real throughput we have seen hit the service is 400 queries per second. During peak-to-average spike testing, we have been able to go from 0 queries per second to 3,000 queries per second in about 10 seconds before having elasticity problems. During more normal ramp up over the course of an hour, we have been able to get it to go to 7,000 queries per second.

They were spending about $1,150 per month in hard infrastructure costs before, just for production, plus a lot of soft spend in time spent maintaining the infrastructure. Now they pay under $80 per month for dev, test, and prod all together, and spend maybe 1/10th of the time they used to on managing things like deployment and scaling.

Several million requests per day. The objects are small so API Gateway’s cache at minimum size saved us a lot of money, costing roughly $15/month. We also used direct Lambda invocations in several places to avoid the cost of API Gateway’s $3.50/million requests. We also dynamically provision throughput of DynamoDB tables using a custom Lambda-backed version of dynamic-dynamodb. Unfortunately, we will not be open sourcing that toolkit.
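Since the scaling toolkit itself is not public, the following is only a guess at the kind of decision rule a Lambda-backed throughput scaler might apply on each run: compare consumed capacity against provisioned capacity and step up or down. The thresholds, step factor, and limits are hypothetical defaults, not values from the project.

```python
import math


def next_capacity(provisioned, consumed, upper=0.8, lower=0.3,
                  step=1.5, floor=1, ceiling=1000):
    """Return the capacity units to provision next for one table dimension
    (reads or writes).

    Scale up by `step` when utilization reaches `upper`, scale down by
    `step` when it falls to `lower`, and otherwise leave capacity alone.
    """
    if provisioned < 1:
        raise ValueError("provisioned capacity must be at least 1")
    utilization = consumed / provisioned
    if utilization >= upper:
        target = math.ceil(provisioned * step)
    elif utilization <= lower:
        target = math.floor(provisioned / step)
    else:
        target = provisioned
    # Clamp so a quiet table never drops below the floor and a spike
    # never provisions more than the budgeted ceiling.
    return max(floor, min(ceiling, target))
```

In a real scaler, `consumed` would come from CloudWatch's consumed capacity metrics and the result would be applied with DynamoDB's `UpdateTable` call. Because DynamoDB limits how often throughput can be decreased, a production version would likely scale down more conservatively than it scales up.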

For this project in particular, everything was fairly simple because we had all of the tools on hand ahead of time. Had we not already been toying with this configuration before, the most difficult parts would have been:

Automating deployment of API Gateway with CloudFormation
Implementing the elastic scaling of DynamoDB with Lambdas
Writing test logic for the deployments

You can use the custom resources I open sourced, which are available on npm.
You can either roll your own equivalent scaling logic Lambda package, write an adapter for DynamicDynamoDB to use in Lambda now that Python is supported, or use the package as-is and launch a t2.nano instance with CloudFormation and monitor your tables that way.
I would suggest using a hosted continuous integration service like CircleCI or CloudBees, and making sure to set the timeout for your builds to at least 7 minutes, to allow for 4 minutes of build time when you launch your CloudFormation stacks.
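For the test logic in the last suggestion, a build step usually has to wait for the CloudFormation stack to finish before running API tests. The sketch below is one way to poll with a deadline that matches the 7-minute timeout mentioned above. The `get_status` callable is a stand-in supplied by the caller, and the polling interval is an assumption.

```python
import time

IN_PROGRESS = {
    "CREATE_IN_PROGRESS",
    "UPDATE_IN_PROGRESS",
    "UPDATE_COMPLETE_CLEANUP_IN_PROGRESS",
}
SUCCESS = {"CREATE_COMPLETE", "UPDATE_COMPLETE"}


def wait_for_stack(get_status, timeout_s=420, poll_s=10,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll `get_status()` until the stack succeeds, fails, or times out.

    `get_status` returns the current CloudFormation stack status string;
    with boto3 it could wrap
    describe_stacks(StackName=...)["Stacks"][0]["StackStatus"].
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in SUCCESS:
            return status
        if status not in IN_PROGRESS:
            # Rollbacks and failures end the wait immediately.
            raise RuntimeError(f"stack ended in {status}")
        if clock() >= deadline:
            raise TimeoutError(f"stack still {status} after {timeout_s}s")
        sleep(poll_s)
```

Failing fast on rollback states keeps a broken deployment from consuming the whole build timeout before the tests are skipped.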

Wrapping Up

Thanks again to Andrew for agreeing to be interviewed and for publishing his tools where possible.

Disclosure: I have no relationship to Tuple Labs, but they build cool projects and this interview covers just one of them.

Keep up with future posts via RSS. If you have suggestions, questions, or comments, feel free to email me at ryan@serverlesscode.com.
