For an app platform like Convox, good log management is a requirement.
convox logs tail was one of the most important features to build after
we got deployments working. A developer is dead in the water without
visibility into their application logs.
Picking Kinesis for logs was an easy decision. It’s designed for real-time data streaming, which is exactly how all of us writing 12 Factor apps think of logs. We wrote a little Go agent that uses the Docker Events API to monitor for container start events, then uses the Docker Logs API to follow logs and write them to a Kinesis stream.
When more people started using the platform, log searching and alerting were top requested features. Again, visibility is critical, and searching through old logs and automating alerts from new logs are DevOps best practices.
Partnering with Papertrail was an easy choice. Most of our users are already using (or at least familiar with) Papertrail, and their product hits a sweet spot of log management features with little to no setup and maintenance.
Picking Lambda to process Kinesis events into Papertrail was a bit more risky, since Lambda is new in general and something I’ve only used in another context (CloudFormation custom resources). But the promise of it, invoking a little bit of code on demand in response to Kinesis events, is perfect for log forwarding. It was definitely worth an experiment.
Also maybe a bit of history is fun… I’m no stranger to this problem.
One of my first duties at Heroku, 6 years ago now, was to improve our logging infrastructure. That meant getting syslog-ng configured correctly to forward logs off a couple hundred instances, plus some syslog-ng servers to ingest all the data. On the log servers we could run some analytics and forward data on to Splunk for more analysis. Before this, shelling into individual boxes was the main investigative tool.
At this time, heroku logs --tail didn’t exist. heroku logs told the API to query every runtime instance, see if it was running the app in question, read log files and return the data. We needed to get every app onto the syslog system too.
We spent years building and scaling the infrastructure to manage application log streams and route logs to third-party systems.
So I still can’t believe how easy it has been to get it all working for Convox.
For starters, an open source contributor, Dave Newman, offered the first sketch of a Lambda function to send Kinesis events to Papertrail. It came in at a whopping 22 lines of code, using the Winston Papertrail library!
I try to take an iterative approach to engineering, so the next step was to manually create some Lambda functions for some apps.
We have our own Convox apps, and manage Convox for one of our earliest serious adopters, Opendoor. By setting it up for these critical apps we could “let it bake” and see if it felt right. After using it for a couple of weeks it was definitely working and providing real value to our operations. Most importantly it was working as advertised… No maintenance overhead!
So the next step was to build it as a first-class part of the platform.
The UI was obvious. These commands should connect apps to a Papertrail system:
$ convox services create papertrail --name pt --url logs1.papertrailapp.com:11235
$ convox services link pt --app myapp
$ convox services link pt --app myapp2
How to save the settings like the Papertrail URL and its linked apps was a bit less obvious. But I look at Ockham’s Razor as a design principle for Convox: “Entities must not be multiplied beyond necessity”. We have been using CloudFormation to manage almost everything in Convox, so that would be the solution here too.
The URL would be a CloudFormation Parameter and every app link would be
an AWS::Lambda::EventSourceMapping Resource. And of course
CloudFormation would also create the Lambda function and its IAM
execution role as Resources.
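As a rough illustration of that layout, the Go sketch below builds such a template as plain maps and prints it as JSON. The logical IDs (Url, Role, Function, MyappLink), the buildTemplate helper, the stream ARN and the LATEST starting position are all assumptions, and the function's code location and the role's policies are deliberately left out, so this is the shape of the template and not something to deploy as-is.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// obj is shorthand for a JSON object.
type obj = map[string]interface{}

// buildTemplate returns a CloudFormation template skeleton. links maps
// a logical ID (alphanumeric, one per linked app) to the ARN of that
// app's Kinesis stream.
func buildTemplate(links map[string]string) obj {
	resources := obj{
		// Execution role; its trust and access policies are omitted here.
		"Role": obj{"Type": "AWS::IAM::Role"},
		// The forwarding function; code location, handler and runtime
		// are omitted here.
		"Function": obj{
			"Type": "AWS::Lambda::Function",
			"Properties": obj{
				"Role": obj{"Fn::GetAtt": []string{"Role", "Arn"}},
			},
		},
	}
	// One event source mapping per linked app.
	for id, streamARN := range links {
		resources[id] = obj{
			"Type": "AWS::Lambda::EventSourceMapping",
			"Properties": obj{
				"EventSourceArn":   streamARN,
				"FunctionName":     obj{"Ref": "Function"},
				"StartingPosition": "LATEST",
			},
		}
	}
	return obj{
		"Parameters": obj{"Url": obj{"Type": "String"}},
		"Resources":  resources,
	}
}

func main() {
	tmpl := buildTemplate(map[string]string{
		// Hypothetical ARN for the myapp log stream.
		"MyappLink": "arn:aws:kinesis:us-east-1:123456789012:stream/myapp",
	})
	out, err := json.MarshalIndent(tmpl, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

With this shape, linking or unlinking an app is just a stack update that adds or removes one resource.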
There’s an architecture diagram of this here:
There’s one gotcha I discovered with Lambda. I want to publish a single zip file with the event forwarding code, and have this function dynamically figure out the URL it should be forwarding to. Unfortunately Lambda doesn’t offer any sort of environment or configuration management solution. To work around this, I gave the Lambda function the ability to describe its own CloudFormation stack, so it could introspect the URL parameter there. This feels a bit hacky but it’s been working perfectly.
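The lookup itself amounts to scanning the stack's parameters for the URL. Here is a minimal Go sketch of that logic, assuming the parameter is named Url and that the function already knows its stack name (how it learns that is not covered here). StackDescriber is a hypothetical seam over CloudFormation's DescribeStacks call, whose response lists parameters as ParameterKey/ParameterValue pairs, and fakeStacks is an in-memory stand-in so the example runs without AWS.

```go
package main

import (
	"errors"
	"fmt"
)

// Parameter mirrors the ParameterKey/ParameterValue pairs that
// CloudFormation returns when describing a stack.
type Parameter struct {
	ParameterKey   string
	ParameterValue string
}

// StackDescriber is a hypothetical wrapper around DescribeStacks.
type StackDescriber interface {
	Parameters(stackName string) ([]Parameter, error)
}

var errNoURL = errors.New("stack has no Url parameter")

// forwardingURL looks up the Url parameter of the named stack.
func forwardingURL(d StackDescriber, stackName string) (string, error) {
	params, err := d.Parameters(stackName)
	if err != nil {
		return "", err
	}
	for _, p := range params {
		if p.ParameterKey == "Url" {
			return p.ParameterValue, nil
		}
	}
	return "", errNoURL
}

// fakeStacks is an in-memory StackDescriber for the example.
type fakeStacks map[string][]Parameter

func (f fakeStacks) Parameters(name string) ([]Parameter, error) {
	params, ok := f[name]
	if !ok {
		return nil, fmt.Errorf("stack %q not found", name)
	}
	return params, nil
}

func main() {
	stacks := fakeStacks{
		"pt": {{ParameterKey: "Url", ParameterValue: "logs1.papertrailapp.com:11235"}},
	}
	url, err := forwardingURL(stacks, "pt")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("forwarding to", url)
}
```

Since the URL only changes on a stack update, a function like this could cache the result between invocations instead of describing the stack for every batch.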
And countless app logs flow into Papertrail.
No syslog-ng client/server configuration. No servers at all. And tons of confidence that we can pump a lot of data into a Kinesis stream and have Lambda process and forward it to Papertrail.
Future work in this vein is to:
- Generalize this to work with other syslog services, like Splunk Cloud or Loggly
- Build a Lambda function that processes logs, extracts custom metrics and writes them to CloudWatch