In my two-part tutorial on using Django-Zappa and deploying Django-Zappa in a VPC, I covered Rich’s new library for running any Python WSGI application on the Lambda/API Gateway stack. Every major Python web framework uses WSGI: Django, Flask, Pyramid, CherryPy, Bottle, and on and on. Opening the Lambda ecosystem to the whole gaggle of them is huge.
Thanks for taking the time, Rich. Tell us about how you came to write Zappa.
Hello! My name is Rich Jones, I’m the CTO of Gun.io and the principal author of Zappa.
I believe that serverless architectures - that is to say, systems without any permanent infrastructure - are the future of network applications.
For Python web applications, Zappa is the next evolution of application deployment. What previously cost hundreds or even thousands of dollars on VPS services like Linode or PaaS services like Heroku is now possible for mere pennies using Zappa. It’s faster, cheaper, more scalable and far easier to deploy and maintain than anything else out there - so give it a try!
How long have you been working on Zappa (and Django-Zappa)?
I’ve been working on Zappa since.. hm, let’s look at the git logs.. January 20th! So, the project is still quite young! But, all things considered, it actually matured quite a bit faster than I expected thanks to the awesome community support and contributions.
What gave you the idea to write a WSGI wrapper inside API Gateway?
I was using AWS Lambda quite extensively for a Gun.io Consulting project, where we used AWS Lambda as part of a state machine for a high-performance computing system with a microservice architecture.
I really liked the Lambda technology and saw it as the next successor in the “Metal -> VPS -> PaaS” evolution, so I started to build a whole new Python-based web framework around it, which I dubbed Zappa (after my favorite guitarist, and a nod to Django, my favorite web framework).
I was in the middle of writing a new URL router when I thought - wait, why am I reinventing the wheel? Can’t I just get Django to work on Lambda? Pretty soon I had that working, so now here we are.
Can you tell me about some projects where you've used Zappa? Is it being used extensively inside gun.io?
So far, I haven’t used Zappa in commercial production anywhere, but I am using it in my small personal projects. I think it’ll be another two weeks (okay, maybe a month) until we’re ready for commercial usage. However, we’re currently doing a major product overhaul at Gun.io, and Zappa is a big part of that. The operations cost savings alone have been huge.
I have also begun porting all of my small personal projects to Zappa, and have actually managed to save myself over a thousand dollars a year in hosting costs! I’ve also had a number of people tell me that they’re using Zappa for internal applications at their companies, which is very encouraging.
Have you written a product like this before, or is Zappa your first crack at WSGI-layer Python? If it isn't, tell me what you learned from past iterations.
AWS Lambda/API Gateway are really new technologies, and we’re using them in ways they weren’t really intended for, so I guess I haven’t really written anything else quite like this before, although I have used Lambda in a few projects (such as a replacement for the now-defunct Yahoo! Pipes).
Looking at the code, it seems like there are a few spots where you have to work around limitations in API Gateway or Lambda. For example, the way your middleware packs multiple values into one cookie. What limitations caused you the most trouble writing or testing Zappa?
Oh man. Yeah. It took a lot of hacking to get everything to work properly through API Gateway. API Gateway is still a half-baked product, quite frankly. Hopefully they’ll see how we’re using it and make changes accordingly.
For instance, if I remember correctly, non-200 status codes are passed by raising an exception in the Python layer, but there’s no way to pass custom values through it that way. So, we have to pack the entire response as base64, prefix a known value to it, raise the whole thing as an exception, then map it to the right error code using a regex of the known value, then base64 decode the body in VTL before passing it back to the client. It’s kind of crazy. I hope they fix that in the near future.
(Jeff, if you’re reading this, I am available for product consulting. Email me.)
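The status-code workaround Rich describes can be sketched roughly as follows. This is only an illustration of the idea, not Zappa's actual code: the `PREFIX` marker, the `PackedResponse` exception, and the helper names are all hypothetical, and `unpack_response` stands in for work that API Gateway does with a regex mapping and a VTL template in the real setup.

```python
import base64
import re

# Hypothetical marker; the interview does not give the real prefix.
PREFIX = "ZAPPA_RESPONSE"


class PackedResponse(Exception):
    """Exception whose message carries the whole HTTP response."""


def pack_response(status, body):
    """Base64-encode the body and put the known prefix and status in front."""
    encoded = base64.b64encode(body.encode("utf-8")).decode("ascii")
    return "{}:{}:{}".format(PREFIX, status, encoded)


def handler(status, body):
    """Return normally for 200; raise the packed response for anything else."""
    if status == 200:
        return body
    raise PackedResponse(pack_response(status, body))


# Stand-in for the gateway side: a regex on the known prefix picks out the
# status code, then the body is base64-decoded before going to the client.
PATTERN = re.compile(r"^{}:(\d{{3}}):(.*)$".format(re.escape(PREFIX)))


def unpack_response(message):
    """Recover (status, body) from a packed exception message."""
    match = PATTERN.match(message)
    if match is None:
        raise ValueError("not a packed response")
    status = int(match.group(1))
    body = base64.b64decode(match.group(2)).decode("utf-8")
    return status, body
```

Calling `handler(404, "Not Found")` raises `PackedResponse`, and passing `str(exc)` to `unpack_response` gives back the original status and body. The base64 step matters because the exception message has to survive a regex match and a template without any of the body's own characters getting in the way.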
Can you walk readers through the lifecycle of a request in an app that uses Zappa versus the same app running in a normal (say, nginx+uwsgi) environment?
With a traditional webserver, much of the server time can be spent idling, waiting for a new request to come in. When a new HTTP request comes in through the network socket:
- Nginx picks it up
- It is proxied to Gunicorn which mangles it into WSGI
- Gunicorn feeds it to your app
- After your code finishes working, Gunicorn gets the response back and passes it back to Nginx
- Nginx serves it back to the requesting client
At the end, they all go back to idling. However, if too many requests come in simultaneously, the server is blocked until it has enough resources to process the next request! So spikes of traffic can mean some clients time out and never receive the requested content.
With Zappa, the server is only created after the HTTP request comes in through the API Gateway, and it dies immediately after. With Zappa running in Lambda, the client sends the request:
- The request is routed through the API Gateway
- API Gateway maps that request into a nice dictionary using Velocity Template Language (VTL)
- The Lambda function is spawned from a cache, and the dictionary is fed in as an event
- Zappa turns this into a valid WSGI environ
- Your WSGI application code runs and returns the response
- Zappa turns that back into something which can pass successfully through the API Gateway
- The server dies at the end of the Lambda function
- API Gateway sends the response back to the client
And all of that happens in under 40ms, cool!
Plus, since each request has its own ‘server’, there is no limit on horizontal scalability. In fact, it’s all handled completely transparently and automatically by AWS Lambda - awesome.
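The Lambda-side steps above (event dictionary in, WSGI environ built, application run, response handed back) can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than Zappa's real handler: the event keys (`method`, `path`, `query`, `headers`, `body`), the shape of the returned dictionary, and `demo_app` are all made up for the example.

```python
import io
import sys


def event_to_environ(event):
    """Build a minimal WSGI environ from a hypothetical event dictionary."""
    body = event.get("body", "").encode("utf-8")
    environ = {
        "REQUEST_METHOD": event.get("method", "GET"),
        "SCRIPT_NAME": "",
        "PATH_INFO": event.get("path", "/"),
        "QUERY_STRING": event.get("query", ""),
        "SERVER_NAME": "lambda",
        "SERVER_PORT": "443",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "https",
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": True,
    }
    # Request headers become HTTP_* keys, as the WSGI convention expects.
    for name, value in event.get("headers", {}).items():
        environ["HTTP_" + name.upper().replace("-", "_")] = value
    return environ


def demo_app(environ, start_response):
    """A tiny WSGI application standing in for Django, Flask, and so on."""
    text = "Hello from {}".format(environ["PATH_INFO"])
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [text.encode("utf-8")]


def lambda_handler(event, context=None):
    """Run the WSGI app once for this event and return a plain dictionary."""
    captured = {}
    written = []

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)
        return written.append  # legacy write() callable that WSGI requires

    chunks = demo_app(event_to_environ(event), start_response)
    body = b"".join(written + list(chunks)).decode("utf-8")
    return {
        "status": int(captured["status"].split(" ", 1)[0]),
        "headers": captured["headers"],
        "body": body,
    }
```

For example, `lambda_handler({"method": "GET", "path": "/hi"})` runs the app once and returns a dictionary with the status, headers, and body. Because the application only ever sees a standard environ, the same code could sit behind Gunicorn or any other WSGI server unchanged.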
The Django-Zappa deployment commands work through `manage.py`. Do you see a path to providing a consistent deploy system, like a tool that would work across projects using Zappa with either Flask or Django?
Django-Zappa uses the standard Django manage.py method of interaction, and the Flask version uses flask-cli, which is quite similar and will be merged into the Flask core in the next release. We will also likely build out a Zappa CLI entry point which will work with arbitrary Python programs as well. This will probably happen after the Zappa core library stabilizes, which should happen in the next week or two. I might actually stub this out today now that I’m thinking about it.
How has it been engaging with contributors and users on Github? Any tips you'd like to share with people thinking about opening their own code?
I’ve been absolutely delighted with the response that people have had to Zappa so far. We’ve had many welcome contributions from many contributors – specifically, @mathom, @doerge, @vascop, @collingreen, @jdmac and @benbangert. (And probably other people I forgot as well, sorry about that!) Slack has been a really useful platform for discussing the development and use of Zappa - please feel free to join us here!
I’m a huge, huge, huge supporter, advocate, and contributor to Free and Open Source software. My biggest qualm about Zappa is actually that it is a step backwards for software freedom, since Lambda and API Gateway aren’t Free Software projects. I know that people have started on Free software implementations of Lambda-like technology, but I don’t know if there are any Free offerings like API Gateway yet.
If you’re interested in getting started with Free Software, I wrote an article a few years ago called How to Github that still has some relevance today.
You mentioned Zappa being a problem for software freedom, but doesn't Zappa improve the situation by acting as a shim between the proprietary API Gateway/Lambda?
Unfortunately, I don’t really think so. Software freedom, rather than just “open source,” is about control. Using the Amazon stack requires relinquishing some of that control to a vendor - it’s just the cost of the technical improvements that Zappa can provide. So, that makes Zappa a good choice for commercial or public microservices, webapps or content management systems, but not a great choice for any services that depend on protecting user privacy, for instance.
Fortunately, because Zappa uses existing Python technologies, vendor lock-in isn’t a real problem as a normal user, as you can come and go as you please if you want to return to a VPS/PaaS arrangement.
Where do you see open versions of API Gateway and Lambda succeeding? Much of the selling points seem related to outsourcing all the ops/management, plus not paying for idle time. It seems like a software project wouldn't solve that part of the problem, what do you think?
Well, right off the bat, there would be an immediate benefit just from having a multitude of vendors. Right now, Amazon is really the only player in town, with Google’s Cloud Functions as a distant second. If we developed a free standard for serverless deployments, we could have a plethora of providers.
For instance, one could imagine privacy-respecting providers in IMMI-protected Iceland, or a provider in Africa to serve African developers and businesses, or a free provider for hosted Free Software projects, or even just the ability for large organizations to run their own stack and easily let their developers deploy serverless microservices without the need to block on the IT guys. The possibilities are unlimited with free software, and oftentimes the greatest use cases are ones the original developers never even conceived of. That’s the beauty of it!
What made you pick the MIT license for the project?
Since this is a ‘plumbing’-type project, I didn’t think that the GPL would be very appropriate, so MIT is kind of my go-to permissive software license.
What do you see as the biggest problems with the Zappa library as it is? Is there a roadmap or wishlist you'd like to share?
We are still waiting on a few features to be available in the AWS API before I’ll be able to consider Zappa feature-complete. Specifically, the ability to schedule Lambda events, as this would allow us to replace Celery entirely, which would be an absolutely massive win for Zappa. This is currently only available through the AWS Console, but hopefully official support will come soon, or somebody will take the time to reverse-engineer the interface.
So, on the roadmap:
- Celery-like functionality
- Let’s Encrypt SSL out of the box
- More documentation!
- Testing, testing, testing!
- ..maybe write a book? :)
Honestly, the thing we need most of all is just more people using it, finding the edge cases, filing bugs, and telling us how we can make the whole experience better. Feature and pull requests are gladly accepted!
Pyramid was the first web framework I used, so I have a soft spot for it: do you plan on a pyramid-zappa anytime soon?
Yes! In fact, I hope he doesn’t mind me saying this, but one of the Pyramid authors has been a regular in our Slack channel, and has been influential in some of our design decisions. It’s very possible that there won’t be a specific Pyramid-Zappa library, but that Pyramid will be the first use case of the standalone Zappa tool.
Are there any tools or libraries you found particularly helpful while building Zappa that you'd like more people to know about?
One of the very first issues people filed with django-zappa was to request support for PostgreSQL. Since Postgres requires C-extensions, this posed a bit of a problem.
We have also started the lambda-packages project which contains popular Python libraries, pre-compiled for AWS Lambda. It currently houses MySQL and Postgres, which are used by Django-Zappa, and Pillow, which is used by pretty much every Python program that touches images. More contributions are very welcome!
Tools like Zappa are just the beginning of what’s possible with on-demand compute like Lambda and wrappers like API Gateway. Being able to run apps that weren’t designed for “serverless” hosting is huge for teams that have a product that benefits from the flexibility or pricing model. If you want to follow Zappa development, or use it yourself, it’s on Github Miserlou/Zappa and the Django version is Miserlou/Django-Zappa.