There are a few movements that are flying under the “serverless” flag these days. One of them is the drive to outsource as much functionality as possible to SaaS tools so you can avoid maintenance and unexpected costs. Algorithmia is a platform where any developer can provide their code as-a-Service without building a whole infrastructure to deliver it.
It’s been around since 2013 and provides a marketplace to sell individual algorithms to other users, and for developers to get access to loads of algorithms via a uniform API. Function-as-a-Service tools like Azure Functions and AWS Lambda stop at providing a runtime and invocation method for your functions, so the marketplace that Algorithmia provides is unique in the space.
How long have you been working on Algorithmia? What gave you the idea for the product?
Kenny (my cofounder) and I started working on the initial version of the platform in late 2013, so a little over two years at this point. Inspiration for Algorithmia came mostly out of Kenny’s frustration as an academic algorithm developer. He was completing his PhD in Artificial Intelligence and was constantly publishing new algorithms that he believed could be used in everyday software - he did the usual: published to GitHub, open-sourced it, etc., but rarely would anyone actually use what he was working on. On the opposite end, I was building data tools at Microsoft, and we were constantly looking for new algorithms that would help us interpret data in an easier/better way. The fact that we saw both a supply of and a demand for algorithms led us to build an algorithms-as-a-service marketplace.
On your platform, any developer can design and sell an algorithm, and Algorithmia will deploy it and provide an API in front. As a developer, how do I get my code ready to ship to Algorithmia?
The first place to start would be to check out our developer center. In terms of getting ready - Algorithmia was really built to make this step as easy as possible. We have what’s called an apply() function in every algorithm - this is where you define all the inputs and outputs of the algorithm, and it’s what allows us to standardize across programming languages. Other than that, we work with the main package manager for each language (i.e. PyPI, RubyGems, Maven, etc.), so declaring those packages in our dependency manager allows Algorithmia to pull whatever it needs from the internet.
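As a rough sketch of that entry point, a Python algorithm on the platform is essentially a module exposing apply(); the word-count behavior and input shape below are made-up illustrations, not a real marketplace algorithm:

```python
# Minimal sketch of an Algorithmia-style algorithm. The platform
# invokes apply() with the deserialized request input and serializes
# the return value as the API response. The word-count behavior here
# is a hypothetical example.

def apply(input):
    # Assume the caller sends a plain string; return a word-frequency map.
    counts = {}
    for word in input.split():
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Dependencies would then go in the language’s usual manifest (e.g. a requirements.txt for Python) so the platform can pull them from PyPI at build time.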
What limitations are there on an algorithm? Is there a limit on the language or number of libraries it can use?
Currently we support Java, Scala, Python 2.x and 3.x, Ruby, Rust, and JavaScript as algorithm development languages. We are adding R and C++ next. Future languages are purely based on what our users are asking us for. We have developed some interesting technology which allows us to add a language pretty quickly to the platform.
You use Docker to run the actual containers, but can you tell us about the rest of your infrastructure? How are containers scheduled and monitored?
Algorithmia was designed to move compute to the data. What this means is that our worker nodes (servers that can load algorithms in containers) can coexist in multiple clouds, regions, or in some cases even on-premise servers. Our API servers handle routing, scheduling, and monitoring of containers using our own scheduling algorithms, which allow us to optimize when to load and unload algorithms and ensure locality of data.
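The details of Algorithmia’s scheduler aren’t public, but the idea of combining locality with load can be sketched as a toy scoring rule (the worker records, field names, and weighting below are all illustrative):

```python
# Illustrative worker selection: prefer workers that already have the
# algorithm's container warm (locality, no cold start), and break ties
# by current load. A toy model, not Algorithmia's actual scheduler.

def pick_worker(workers, algorithm):
    """workers: list of dicts like
    {"id": "w1", "load": 0.3, "warm": {"alg-a", "alg-b"}}"""
    def score(worker):
        # A warm container avoids loading the algorithm from scratch,
        # so weight that first; among equals, choose the least loaded.
        cold_penalty = 0 if algorithm in worker["warm"] else 1
        return (cold_penalty, worker["load"])
    return min(workers, key=score)
```

A real scheduler would also account for resource constraints and data placement, per the answer above; this only shows the shape of the decision.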
To handle incoming requests, passing the data into and back from the container where the code executes is important. In Algorithmia, how can developers respond to lots of (potentially large) requests quickly?
One of my favorite parts about Algorithmia is that developers never have to worry about this. The whole purpose of the platform is to take care of the usual devops and data engineering that is generally required when running these types of algorithms. As your application makes more requests to our platform, we will scale out the workers and keep serving requests smoothly. We also have strong caching mechanisms on data in the platform, so that subsequent requests to an algorithm can run much faster. By adding structure to our platform beyond a simple IaaS platform, we can track data and algorithms more precisely and be more efficient in running them.
For large numbers of requests, your platform scales right up to match load. What happens if a single algorithm instance has to do a lot of work? Is there a way for developers to checkpoint their work, or send back partial results?
Our platform’s strength is in hosting and scaling algorithms that tend to be shorter-lived: things like image recognition and language processing. Runs longer than 50 minutes are not currently supported by our platform. There is currently no way to checkpoint your algorithm results. That said, the algorithm developer could save intermediate state using our data API.
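That workaround might look like the sketch below: persist partial results keyed by job, and resume from them on the next invocation. The save/load helpers here mimic the put/get shape of a data API but use an in-memory dict so the example is self-contained; real code would read and write data:// URIs through the platform client instead.

```python
import json

# Stand-in for a remote data store (hypothetical paths, local dict).
_store = {}

def save_checkpoint(job_id, state):
    _store[f"checkpoints/{job_id}.json"] = json.dumps(state)

def load_checkpoint(job_id):
    raw = _store.get(f"checkpoints/{job_id}.json")
    return json.loads(raw) if raw is not None else None

def process(job_id, items):
    # Resume from the last saved position if a checkpoint exists.
    state = load_checkpoint(job_id) or {"done": 0, "total": 0}
    for item in items[state["done"]:]:
        state["total"] += item
        state["done"] += 1
        save_checkpoint(job_id, state)  # partial result survives a timeout
    return state["total"]
```

If a run is cut off by the time limit, the next invocation with the same job id picks up where the last checkpoint left off.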
Can you walk readers through the lifecycle of a request to any given algorithm? Is there a time limit for how long a function can run?
A request comes into our system through our API server fleet. From there we use our scheduling algorithm to look at available system resources and constraints and find an available worker machine to handle the request. The request is sent out to a worker, which will have an instance of the particular algorithm version running in its own isolated Docker container. Once work is completed, the result gets routed back to the API servers and returned to the caller.
We have a default timeout of 5 minutes per request, which is plenty of time for most algorithms; however, this timeout can be extended up to 50 minutes for longer-running jobs.
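That per-request deadline can be modeled as a hard cutoff on each call; a minimal sketch of such enforcement (not the platform’s implementation) using a worker thread:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

DEFAULT_TIMEOUT_SECONDS = 5 * 60  # the 5-minute default from the interview

def call_with_timeout(fn, arg, timeout=DEFAULT_TIMEOUT_SECONDS):
    # Run the algorithm call in a worker thread and give up once the
    # deadline passes, returning an error payload instead of a result.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, arg)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            return {"error": "request timed out"}

def slow_algorithm(x):
    time.sleep(0.2)  # stand-in for a long-running algorithm
    return x
```

Extending the timeout for a longer job then just means passing a larger deadline, up to the platform’s 50-minute ceiling.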
What happens if that algorithm calls others in a chain? Does the calling function get billed for the time spent waiting for the other call?
Yes, we allow and encourage algorithms being called in a chain. This allows algorithms to be used as building blocks to create more intelligent, higher-level algorithms. Since we charge for seconds of compute, the cost of an algorithm will be the total running time of all algorithms in the chain, including any “parent” algorithms. This can result in billing for time spent waiting, but the combined algorithm provides enough value to make that worthwhile, and the compute costs are quite minimal.
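The billing rule can be illustrated with a toy model: each algorithm is billed for its own work plus the full duration of every child it calls, so each level of the chain includes the time spent waiting below it (algorithm names and times here are made up):

```python
# Toy model of chained billing. A call is a dict with its own compute
# seconds and the sub-calls it makes; billed time is the whole subtree.

def billed_seconds(call):
    """call: {"own": seconds of its own work, "children": [sub-calls]}"""
    return call["own"] + sum(billed_seconds(c) for c in call["children"])

pipeline = {
    "own": 1.0,  # e.g. a hypothetical "summarize" parent
    "children": [
        {"own": 2.0, "children": []},  # e.g. a "tokenize" child
        {"own": 3.0, "children": []},  # e.g. a "classify" child
    ],
}
```

Here the parent is billed 6.0 seconds even though only 1.0 second is its own work: the other 5.0 is time spent waiting on its children.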
Your service runs on multiple clouds; how do you handle deploying to different infrastructure and routing requests across clouds?
Being able to run across multiple cloud providers and data centers was a core requirement of the platform from day 1. In a sense we want to be a CDN for algorithmic computing. Since this was a requirement from the start, we avoided taking any dependencies on vendor-specific products or services. Beyond that we have spent a lot of time engineering our data management and scheduling algorithms to work across providers.
There are some architects who say designing to run on multiple clouds locks you into a lowest-common-denominator platform. Can you tell us about tradeoffs you've had to make, or services you had to implement yourself that you might have purchased if Algorithmia were single-platform?
Algorithmia had to be designed for portability, from cloud to on-premise as well as across multiple cloud providers. Given that there is no standard for most higher-level services, picking the lowest common denominator was actually by design. The tradeoffs are mostly on convenience and speed. We love managed services, and AWS/Azure/GCP all have great ones that we just could not take advantage of – you are not going to beat the speed of getting up and running with a managed service. We had to implement our own orchestration, distributed git storage, routing, container management, and scheduling layers – all of these have managed alternatives if you are committed to a single IaaS provider.
With serverless/Functions-as-a-Service becoming more popular, what do you think the future looks like for Algorithmia? Are there any new features for folks to look forward to?
We are pretty excited moving forward. This is a new paradigm in designing architectures - I think we are finally getting to a world where code reuse is practical and not just a myth that software developers aspire to. The power in scaling these architectures is also really exciting. For us the mission is simple: “make state-of-the-art algorithms discoverable and accessible to everyone”. From a practical standpoint, that means putting a system in front of developers where they can build things they could never build before, bringing machine learning and other elements of AI into everyday applications.
We are constantly adding new features. Some of the more exciting ones are new languages like R and C++, support for using Algorithmia with your Spark cluster, and support for GPUs and deep-learning workloads. You can stay tuned on these at blog.algorithmia.com.
If you’d like to hear more from Diego about how the Algorithmia marketplace works, listen to his Software Engineering Daily interview from a few weeks ago. If you’re ready to build your first public algorithm, check out the Algorithmia developer guide.
Special thanks to ServerlessCode sponsor Trek10, experts in supporting high-scale serverless and event-driven applications.