There’s a lot of great material in this case study: cost savings, faster time to market, single-command dev/test environments, continuous deployment pipelines, and lower ongoing maintenance costs. That is a big deal for a customer that was spending $1,150 per month and now pays under $80 per month for an app that is also easier to deploy and manage. The app is written in Node.js, and the whole stack is deployed with CloudFormation.
Tell me a little about the app, what problem is it solving?
Our client had a mobile application that needed a REST API backend. There were about 10 kinds of business objects that we needed to model in the API. The system needed to be highly secure, scalable, and easy to manage with almost no ongoing effort, as Tuple Labs, the company I work with, is a small firm and we want to keep our cost of maintenance down.
Other than Lambda, what other technologies is the app using? This includes frontend, mobile, databases, whatever you can share.
On the backend, besides Lambda, we used API Gateway to convert incoming HTTP requests into Lambda calls without running any servers. We used DynamoDB as the persistence store for application data, along with S3 and CloudFront for delivering static content. Email and push notifications go out through SES and SNS, sent from Lambdas subscribed to DynamoDB Streams; those Lambdas look for certain patterns in table writes to trigger pushes. On the front end, there is an iOS mobile app, plus an AngularJS app that offers stripped-down functionality on desktop and mobile web.
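To make the stream-triggered notification pattern concrete, here is a minimal sketch in Python (the production app is Node.js, and this is not Tuple Labs' code). It relies on the standard DynamoDB Streams event shape (`Records`, `eventName`, `dynamodb.NewImage`). The attribute names `kind`, `recipient`, and `text`, and the rule "notify on newly inserted messages", are hypothetical. The `publish` argument stands in for a real SNS or SES call.

```python
def matches_pattern(record):
    """Return True for inserted items whose (hypothetical) 'kind' is 'message'."""
    if record.get("eventName") != "INSERT":
        return False
    new_image = record.get("dynamodb", {}).get("NewImage", {})
    # Stream images use DynamoDB's typed format, e.g. {"S": "some string"}.
    return new_image.get("kind", {}).get("S") == "message"


def build_notifications(event):
    """Turn matching stream records into plain notification dicts."""
    notifications = []
    for record in event.get("Records", []):
        if matches_pattern(record):
            image = record["dynamodb"]["NewImage"]
            notifications.append({
                "recipient": image.get("recipient", {}).get("S"),
                "text": image.get("text", {}).get("S"),
            })
    return notifications


def handler(event, context, publish=print):
    """Lambda-style entry point; 'publish' is a stand-in for an SNS/SES call."""
    notifications = build_notifications(event)
    for notification in notifications:
        publish(notification)
    return {"sent": len(notifications)}
```

Keeping the pattern match in a small pure function makes it possible to test the trigger rules without touching AWS.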
Was this a rewrite of an existing product? If yes, why did the client want to replace the existing solution? If no, how was the decision to use Lambda made?
Yes, it was a rewrite. The client wanted to replace the existing system mostly to decrease Total Cost of Ownership (TCO) and ongoing distractions of maintenance. The client approached us to consult on how best to alleviate these ongoing concerns, and Tuple decided that the application and use case fit the serverless model well. We got the client excited about Lambda when we told them how much cheaper low-use time periods could be with Lambda, that they would not need to worry about scaling, and that they could provision and redeploy so quickly.
How large was the team? Did any/all of them have experience with Lambda already, or were they coming from other areas of expertise?
We tackled this project with a team of two (!). I was the only one with existing Lambda experience, but my partner on the project has a lot of cloud experience in general, so picking up the abstractions Lambda provides was a natural step.
About how long did you spend developing the app? Was it any faster than if you were writing in another framework? (Express, Rails, whatever your "home turf" is).
On actual business logic and application development for the API, we spent about 4 calendar weeks, amounting to about 250 man-hours, since we did not work on the project full-time. We did not write the iOS application from scratch so I cannot comment on the amount of time that took.
How are you deploying the application?
For deployment, we have a simple script that deploys static assets and provisions the infrastructure in layers with CloudFormation. It relies on some of the custom resources I wrote for my open source projects, namely the ones that support API Gateway and DynamoDB Streams (note: both are now supported natively, and we are in the process of migrating).
Are you using a CI/CD service?
We have some custom closed source tooling to automatically provision CI and CD pipelines on AWS. Sometime soon we will open source it, but generally, we use AWS CloudFormation, CodePipeline, Lambda, SQS, and ECS. We rolled our own because we wanted to be able to dynamically provision the full pipelines whenever we launch new projects, which no strong hosted CI/CD service offers right now.
What monitoring do you have in place? Is there anything you want to monitor but don't/can't yet?
Everything is fully monitored at 1-minute resolution. A large proportion of this comes from the CloudWatch monitoring that API Gateway, Lambda, and DynamoDB provide. Because we use lots of custom CloudFormation resources for provisioning, this monitoring is very easy to roll out and is set up fully automatically. The main things we monitor are latency, queries per second, error rates, and DynamoDB read/write throughput.
How are you testing changes before they go to production? Do you have testing/staging environments?
Because we use CloudFormation for everything, all testing is fully automatic, and it takes us about 5 minutes to fully rebuild new environments. We can run any kind of testing we want on new environments we create. Because we are only testing the API, the testing suite is relatively simple, including basic API tests plus some load testing.
What kind of traffic have you been seeing since the app went live?
It varies a lot. The peak real throughput we have seen hit the service is 400 queries per second. During peak-to-average spike testing, we have been able to go from 0 queries per second to 3,000 queries per second in about 10 seconds before having elasticity problems. During more normal ramp up over the course of an hour, we have been able to get it to go to 7,000 queries per second.
You mentioned that the low bill surprised your client, what were they paying to run the app before? What are they paying now?
They were spending about $1,150 per month in hard infrastructure costs before, just for production, plus a lot of soft cost in time spent maintaining the infrastructure. Now they pay under $80 per month for dev, test, and prod all together, and spend maybe 1/10th of the time they used to on things like deployment and scaling.
Can you share how many page views/API hits the app is serving for that price point?
Several million requests per day. The objects are small so API Gateway’s cache at minimum size saved us a lot of money, costing roughly $15/month. We also used direct Lambda invocations in several places to avoid the cost of API Gateway’s $3.50/million requests. We also dynamically provision throughput of DynamoDB tables using a custom Lambda-backed version of dynamic-dynamodb. Unfortunately, we will not be open sourcing that toolkit.
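To see why routing some calls around API Gateway matters at this volume, here is a back-of-the-envelope sketch. The $3.50 per million requests figure is from the answer above. The daily request count, the 30-day month, and the fraction of calls made as direct Lambda invocations are hypothetical inputs, not numbers from the interview.

```python
API_GATEWAY_PRICE_PER_MILLION = 3.50  # USD per million requests, as quoted above


def monthly_api_gateway_cost(requests_per_day, days=30, direct_fraction=0.0):
    """Estimate monthly API Gateway request charges in USD.

    direct_fraction is the (hypothetical) share of calls that invoke Lambda
    directly and therefore never pass through API Gateway.
    """
    billable_requests = requests_per_day * days * (1.0 - direct_fraction)
    return billable_requests / 1_000_000 * API_GATEWAY_PRICE_PER_MILLION


# Assumed volume: 3 million requests per day, all through API Gateway.
print(monthly_api_gateway_cost(3_000_000))  # 315.0

# Same assumed volume with 90% of calls made as direct Lambda invocations.
print(round(monthly_api_gateway_cost(3_000_000, direct_fraction=0.9), 2))
```

Under these assumed inputs, request charges alone would exceed the total bill quoted earlier, which suggests why the direct invocations were worth the effort.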
Was there anything that surprised you along the way? Were certain tasks easier or harder than you'd expected?
For this project in particular, everything was fairly simple because we had all of the tools on hand ahead of time. Had we not already been toying with this configuration, the most difficult parts would have been:
- Automating deployment of API Gateway with CloudFormation
- Implementing the elastic scaling of DynamoDB with Lambdas
- Writing test logic for the deployments
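Since the team's DynamoDB scaling toolkit is not public, the following is only a rough sketch of the kind of decision such a Lambda might make on each run: compare consumed capacity with provisioned capacity and pick a new target. The thresholds, the scaling factor, and the bounds are all hypothetical. A real implementation would read consumption from CloudWatch and apply the result through DynamoDB's UpdateTable operation.

```python
import math


def next_capacity(provisioned, consumed, high=0.8, low=0.3,
                  factor=1.5, min_units=1, max_units=1000):
    """Return the capacity units to provision next (all thresholds hypothetical).

    Scale up when utilization reaches 'high', scale down when it falls to
    'low', otherwise leave the table alone. The result is clamped to
    [min_units, max_units].
    """
    if provisioned <= 0:
        raise ValueError("provisioned capacity must be positive")
    utilization = consumed / provisioned
    if utilization >= high:
        target = math.ceil(provisioned * factor)
    elif utilization <= low:
        target = math.ceil(provisioned / factor)
    else:
        target = provisioned
    return max(min_units, min(max_units, target))


print(next_capacity(100, 90))  # 150: utilization 0.9 triggers a scale-up
print(next_capacity(100, 50))  # 100: utilization 0.5 is inside the band
```

The gap between the high and low thresholds keeps the table from flapping between sizes, which matters if the service limits how often provisioned capacity can be lowered.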
Are there any tools or libraries you found particularly helpful that you'd like more people to know about?
- You can use my custom resources I open sourced, available on npm
- You can either roll your own equivalent scaling logic Lambda package, write an adapter for DynamicDynamoDB to use in Lambda now that Python is supported, or use the package as-is and launch a t2.nano instance with CloudFormation and monitor your tables that way.
- I would suggest using a hosted continuous integration service like CircleCI or CloudBees. Make sure to set the timeout for your builds to at least 7 minutes, to allow for the roughly 4 minutes it takes to launch your CloudFormation stacks.
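As an illustration of the build-timeout advice, here is a minimal sketch of a wait loop that a CI job could run while a stack launches. The status strings are real CloudFormation stack statuses. The `get_status` callable is a hypothetical stand-in for whatever call the build uses to read the stack's status. The 7-minute default mirrors the suggestion above, and the 15-second poll interval is an assumption.

```python
import time

IN_PROGRESS = {"CREATE_IN_PROGRESS", "UPDATE_IN_PROGRESS",
               "UPDATE_COMPLETE_CLEANUP_IN_PROGRESS"}
SUCCESS = {"CREATE_COMPLETE", "UPDATE_COMPLETE"}


def wait_for_stack(get_status, timeout_seconds=7 * 60, poll_seconds=15,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the stack succeeds, fails, or the timeout hits."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in SUCCESS:
            return status
        if status not in IN_PROGRESS:
            # Rollbacks and failures end the wait immediately.
            raise RuntimeError("stack ended in state " + status)
        if clock() >= deadline:
            raise TimeoutError("stack still " + status + " after timeout")
        sleep(poll_seconds)
```

Passing `sleep` and `clock` as parameters lets the loop be exercised in tests without real waiting.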
Thanks again to Andrew for agreeing to be interviewed and for publishing his tools where possible.
Disclosure: I have no relationship to Tuple Labs, but they build cool projects and this interview covers just one of them.