Interview: Rich Jones on Zappa

Hear From Author of Zappa and CTO at Gun.io About Where His Project Originated, and Its Future

Posted by Ryan S. Brown on Wed, Mar 23, 2016
In Interview
Tags: lambda, django, api gateway, case-study

In my two-part tutorial on using Django-Zappa and deploying Django-Zappa in a VPC, I covered Rich’s new library for running any Python WSGI application on the Lambda/API Gateway stack. Every major Python web framework uses WSGI: Django, Flask, Pyramid, CherryPy, Bottle, and on and on. Opening the Lambda ecosystem to the whole gaggle of them is huge.

Hello! My name is Rich Jones, I’m the CTO of Gun.io and the principal author of Zappa.

I believe that serverless architectures - that is to say, systems without any permanent infrastructure - are the future of network applications.

For Python web applications, Zappa is the next evolution of application deployment. What previously cost hundreds or even thousands of dollars on VPS services like Linode or PaaS services like Heroku is now possible for mere pennies using Zappa. It’s faster, cheaper, more scalable and far easier to deploy and maintain than anything else out there - so give it a try!

I’ve been working on Zappa since.. hm, let’s look at the git logs.. January 20th! So, the project is still quite young! But, all things considered, it actually matured quite a bit faster than I expected thanks to the awesome community support and contributions.

Commit logs of Zappa's birth

I was using AWS Lambda quite extensively for a Gun.io Consulting project, where we used AWS Lambda as part of a state machine for a high-performance computing system with a microservice architecture.

I really liked the Lambda technology and saw it as the next successor in the “Metal -> VPS -> PaaS” evolution, so I started to build a whole new Python-based web framework around it, which I dubbed Zappa (after my favorite guitarist, and a nod to Django, my favorite web framework).

I was in the middle of writing a new URL router when I thought - wait, why am I reinventing the wheel? Can’t I just get Django to work on Lambda? Pretty soon I had that working, so now here we are.

So far, I haven’t used Zappa in commercial production anywhere, but I am using it in my small personal projects. I think it’ll be another two weeks (okay, maybe a month) until we’re ready for commercial usage. However, we’re currently doing a major product overhaul at Gun.io, and Zappa is a big part of that. The operations cost savings alone have been huge.

I have also begun porting all of my small personal projects to Zappa, and have actually managed to save myself over a thousand dollars a year in hosting costs! I’ve also had a number of people tell me that they’re using Zappa for internal applications at their companies, which is very encouraging.

AWS Lambda/API Gateway are really new technologies, and we’re using them in ways they weren’t really intended for, so I guess I haven’t really written anything else quite like this before, although I have used Lambda in a few projects (such as a replacement for the now-defunct Yahoo! Pipes).

The closest thing I’ve worked on in terms of weird WSGI stuff is a project called OnionChat - which is a real-time chat server that works without JavaScript by using HTTP Long Polling, designed for use on the Tor network. That’s kind of a funky one. (As far as I know, however, it was the first ever cross-platform, mobile-ready, real-time anonymous chat system, so that’s pretty cool.)

Oh man. Yeah. It took a lot of hacking to get everything to work properly through API Gateway. API Gateway is still a half-baked product, quite frankly. Hopefully they’ll see how we’re using it and make changes accordingly.

For instance, if I remember correctly, non-200 status codes are passed by raising an exception in the Python layer, but there’s no way to pass custom values through it that way. So, we have to pack the entire response as base64, prefix a known value to it, raise the whole thing as an exception, then map it to the right error code using a regex of the known value, then base64 decode the body in VTL before passing it back to the client. It’s kind of crazy. I hope they fix that in the near future.
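
The round trip described above can be sketched in plain Python. This is an illustration, not Zappa's actual code: the `ZAPPA_RESPONSE` marker, the `LambdaHTTPError` class, the helper names, and the choice to embed the status code next to the marker are all assumptions. The regex match and base64 decode, which API Gateway would perform in its own mapping configuration and VTL, are imitated here by `unpack_response`.

```python
import base64
import re

# Hypothetical marker; the known value Zappa really uses may differ.
PREFIX = "ZAPPA_RESPONSE"


class LambdaHTTPError(Exception):
    """Hypothetical exception that carries a packed non-200 response."""


def pack_response(status, body):
    """Base64-encode the body and prefix it with the marker and status code."""
    encoded = base64.b64encode(body.encode("utf-8")).decode("ascii")
    return "{}:{}:{}".format(PREFIX, status, encoded)


def raise_response(status, body):
    """Raise the packed response as an exception, as the Lambda side would."""
    raise LambdaHTTPError(pack_response(status, body))


# Stand-in for the gateway side: match the known prefix, then decode the body.
PATTERN = re.compile(r"^" + re.escape(PREFIX) + r":(\d{3}):(.*)$", re.DOTALL)


def unpack_response(message):
    """Recover (status, body) from a packed exception message."""
    match = PATTERN.match(message)
    if match is None:
        raise ValueError("not a packed response")
    status = int(match.group(1))
    body = base64.b64decode(match.group(2)).decode("utf-8")
    return status, body


def demo():
    """Pack a 404 into an exception and recover it on the other side."""
    try:
        raise_response(404, "<h1>Not Found</h1>")
    except LambdaHTTPError as exc:
        return unpack_response(str(exc))
```

Calling `demo()` returns the status and body that went in, which is the whole point of the trick: the exception message is the only channel available, so everything has to survive being flattened into one string.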

(Jeff, if you’re reading this, I am available for product consulting. Email me.)

Sure!

With a traditional webserver, much of the server time can be spent idling waiting for a new request to come in. When a new HTTP request comes in through the network socket:

  1. Nginx picks it up
  2. It is proxied to Gunicorn which mangles it into WSGI
  3. Gunicorn feeds it to your app
  4. After your code finishes working, Gunicorn gets the response back and passes it back to Nginx
  5. Nginx serves it back to the requesting client

At the end, they all go back to idling. However, if too many requests come in simultaneously, the server is blocked until it has enough resources to process the next request! So spikes of traffic can mean some clients time out and never receive the requested content.

With Zappa, the server is only created after the HTTP request comes in through the API Gateway, and it dies immediately after. With Zappa running in Lambda, the client sends the request:

  1. The request is routed through the API Gateway
  2. API Gateway maps that request into a nice dictionary using Velocity Template Language (VTL)
  3. The Lambda function is spawned from a cache, and the dictionary is fed in as an event
  4. Zappa turns this into a valid WSGI environ
  5. Your WSGI application code runs and returns the response
  6. Zappa turns that back into something which can pass successfully through the API Gateway
  7. The server dies at the end of the Lambda function
  8. API Gateway sends the response back to the client

And all of that happens in under 40ms, cool!
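
Steps 3 through 6 can be sketched as a small adapter between a dictionary-shaped event and a WSGI application. This is a minimal sketch under stated assumptions, not Zappa's implementation: the event keys (`method`, `path`, `query`, `headers`, `body`) and the function names `event_to_environ`, `handle`, and `hello_app` are hypothetical, and the real event layout depends on the VTL mapping configured in API Gateway.

```python
import io
import sys


def event_to_environ(event):
    """Build a minimal WSGI environ from a dict-shaped event (hypothetical keys)."""
    body = event.get("body", "").encode("utf-8")
    environ = {
        "REQUEST_METHOD": event.get("method", "GET"),
        "SCRIPT_NAME": "",
        "PATH_INFO": event.get("path", "/"),
        "QUERY_STRING": event.get("query", ""),
        "SERVER_NAME": "lambda",
        "SERVER_PORT": "443",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "https",
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": True,
    }
    for name, value in event.get("headers", {}).items():
        key = name.upper().replace("-", "_")
        # WSGI stores these two headers without the HTTP_ prefix.
        if key not in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            key = "HTTP_" + key
        environ[key] = value
    return environ


def handle(app, event):
    """Run a WSGI app against one event and return a plain dict response."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    chunks = app(event_to_environ(event), start_response)
    try:
        body = b"".join(chunks)
    finally:
        # WSGI iterables may expose close(); call it if present.
        if hasattr(chunks, "close"):
            chunks.close()
    return {
        "status": int(captured["status"].split(" ", 1)[0]),
        "headers": captured["headers"],
        "body": body.decode("utf-8"),
    }


def hello_app(environ, start_response):
    """Tiny WSGI app standing in for Django, Flask, and friends."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("Hello from " + environ["PATH_INFO"]).encode("utf-8")]
```

For example, `handle(hello_app, {"method": "GET", "path": "/hi"})` returns a dictionary with status 200 and the body `Hello from /hi`. Because the adapter only relies on the WSGI calling convention, the same shim would work for any framework that speaks WSGI.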

Plus, since each request has its own ‘server’, there is no limit on horizontal scalability. In fact, it’s all handled completely transparently and automatically by AWS Lambda - awesome.

Django-Zappa uses the standard Django manage.py method of interaction, and Flask-Zappa uses flask-cli, which is quite similar and will be merged into the Flask core in the next release. We will also likely build out a Zappa CLI entry point which will work with arbitrary Python programs as well. This will probably happen after the Zappa core library stabilizes, which should happen in the next week or two. I might actually stub this out today now that I’m thinking about it.

I’ve been absolutely delighted with the response that people have had to Zappa so far. We’ve had many welcome contributions from many contributors – specifically, @mathom, @doerge, @vascop, @collingreen, @jdmac and @benbangert. (And probably other people I forgot as well, sorry about that!) Slack has been a really useful platform for discussing developing and using Zappa - please feel free to join us here!

I’m a huge, huge, huge supporter, advocate, and contributor to Free and Open Source software. My biggest qualm about Zappa is actually that it is a step backwards for software freedom, since Lambda and API Gateway aren’t Free Software projects. I know that people have started on Free software implementations of Lambda-like technology, but I don’t know if there are any Free offerings like API Gateway yet.

If you’re interested in getting started with Free Software, I wrote an article a few years ago called How to Github that still has some relevance today.

Unfortunately, I don’t really think so. Software freedom, rather than just “open source,” is about control. Using the Amazon stack requires relinquishing some of that control to a vendor - it’s just the cost of the technical improvements that Zappa can provide. So, that makes Zappa a good choice for commercial or public microservices, webapps or content management systems, but not a great choice for any services that depend on protecting user privacy, for instance.

Fortunately, because Zappa uses existing Python technologies, vendor lock-in isn’t a real problem as a normal user, as you can come and go as you please if you want to return to a VPS/PaaS arrangement.

Well, right off the bat, there would be an immediate benefit just from having a multitude of vendors. Right now, Amazon is really the only player in town, with Google’s Cloud Functions as a distant second. If we developed a free standard for serverless deployments, we could have a plethora of providers.

For instance, one could imagine privacy-respecting providers in IMMI-protected Iceland, or a provider in Africa to serve African developers and businesses, or a free provider for hosted Free Software projects, or even just the ability for large organizations to run their own stack and easily let their developers deploy serverless microservices without the need to block on the IT guys. The possibilities are unlimited with free software, and oftentimes the greatest use cases are ones the original developers never even conceived of. That’s the beauty of it!

¯\_(ツ)_/¯

Since this is a ‘plumbing’-type project, I didn’t think that the GPL would be very appropriate, so MIT is kind of my go-to permissive software license.

We are still waiting on a few features to be available in the AWS API before I’ll be able to consider Zappa feature-complete. Specifically, we need the ability to schedule Lambda events, as this would allow us to replace Celery entirely, which would be an absolutely massive win for Zappa. This is currently only available through the AWS Console, but hopefully official support will come soon, or somebody will take the time to reverse-engineer the interface.

So, on the roadmap:

  • Celery-like functionality
  • Let’s Encrypt SSL out of the box
  • More documentation!
  • Testing, testing, testing!
  • ..maybe write a book? :)

Honestly, the thing we need most of all is just more people using it, finding the edge cases, filing bugs, and telling us how we can make the whole experience better. Feature and pull requests are gladly accepted!

Yes! In fact, I hope he doesn’t mind me saying this, but one of the Pyramid authors has been a regular in our Slack channel, and has been influential in some of our design decisions. It’s very possible that there won’t be a specific Pyramid-Zappa library, but that Pyramid will be the first use case of the standalone Zappa tool.

One of the very first issues people filed with django-zappa was to request support for PostgreSQL. Since Postgres requires C-extensions, this posed a bit of a problem.

We have also started the lambda-packages project which contains popular Python libraries, pre-compiled for AWS Lambda. It currently houses MySQL and Postgres, which are used by Django-Zappa, and Pillow, which is used by pretty much every Python program that touches images. More contributions are very welcome!

Wrapping up

Tools like Zappa are just the beginning of what’s possible with on-demand compute like Lambda and wrappers like API Gateway. Being able to run apps that weren’t designed for “serverless” hosting is huge for teams that have a product that benefits from the flexibility or pricing model. If you want to follow Zappa development, or use it yourself, it’s on Github Miserlou/Zappa and the Django version is Miserlou/Django-Zappa.

Keep up with posts like this on RSS, and tweet @ryan_sb or email me at ryan@serverlesscode.com.

