Last year, I posted instructions for building scikit-learn for AWS Lambda, and since then the way scikit-learn has to be built has changed. The project has also started being shipped as a different kind of wheel file: bdist_wheel. According to a GitHub issue on the subject, that breaks build processes that use strip to reduce the code size of numpy and scipy.
Amazon has also released a container edition of Amazon Linux: a full container version of the same Amazon Linux that runs in the AWS Lambda environment. In this post, we'll use the new container image to build the same scikit-learn artifact as the last post, which used an EC2 instance.
The New Script
The script itself is relatively unchanged, but to use it you'll need Docker installed on your computer. Installing Docker is beyond the scope of this post; if you don't have it, check out the Docker documentation.
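Building packages from source instead of installing the prebuilt binary wheels means changing the pip commands to use the --no-binary option (new in pip version 8), which tells pip not to use binary packages for the listed projects. The new command is pip install --use-wheel --no-binary numpy numpy. The first numpy is the value passed to --no-binary (the packages to build from source); the second is the package to install.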
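As a rough illustration of how that flag behaves, here is a small shell function that passes a list of packages to --no-binary (pip accepts a comma-separated list there) and then installs the same packages. The function name pip_install_from_source and the PIP override are hypothetical, not part of build.sh; setting PIP=echo turns it into a dry run that prints the arguments instead of calling pip. The sketch leaves out --use-wheel, which only restated pip's default behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install packages while forcing pip to build them
# from source instead of using prebuilt binary wheels.
# Set PIP=echo for a dry run that only prints the arguments.
pip_install_from_source() {
  local joined
  # --no-binary takes a comma-separated list of package names.
  joined=$(IFS=,; printf '%s' "$*")
  # The same names, as separate arguments, are the packages to install.
  # PIP is left unquoted so a value like "python -m pip" splits into words.
  ${PIP:-pip} install --no-binary "$joined" "$@"
}

# Dry run: prints "install --no-binary numpy,scipy numpy scipy"
PIP=echo pip_install_from_source numpy scipy
```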
Special thanks to ServerlessCode sponsor Trek10, experts in supporting high-scale serverless and event-driven applications.
To run it, instead of using an Ansible playbook, you'll need to pull down the Amazon Linux image with docker pull amazonlinux:2016.09. As of January 2017, the 2016.09 image matches the Lambda execution environment.
Running in Docker
Once the image is downloaded, we can run the build script in the container.
Clone my ryansb/sklearn-build-lambda repository and change into its directory. Once that's done, we can use docker run to build the artifacts and dump them in the working directory. The $(pwd) part of the volume argument mounts the current directory to a /outputs folder inside the container, which is why the script can be run as /outputs/build.sh.
$ docker run -v "$(pwd)":/outputs -it amazonlinux:2016.09 \
    /bin/bash /outputs/build.sh
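Because the volume mount exposes the current directory as /outputs, the command only works when you run it from the directory that contains build.sh. A small wrapper can check that before starting the container. This is a sketch: run_build, IMAGE, and DRY_RUN are hypothetical names, not part of the sklearn-build-lambda repository, and DRY_RUN=1 only prints the docker command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the docker run command from this post.
# IMAGE can be overridden; DRY_RUN=1 prints the command instead of running it.
IMAGE=${IMAGE:-amazonlinux:2016.09}

run_build() {
  local here
  here=$(pwd)
  # The current directory is mounted at /outputs, so the container looks
  # for the script at /outputs/build.sh, i.e. ./build.sh on the host.
  if [ ! -f "$here/build.sh" ]; then
    echo "build.sh not found in $here; run this from the repo checkout" >&2
    return 1
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    echo docker run -v "$here:/outputs" -it "$IMAGE" /bin/bash /outputs/build.sh
  else
    docker run -v "$here:/outputs" -it "$IMAGE" /bin/bash /outputs/build.sh
  fi
}
```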
After a few minutes (depending on your hardware) you'll have a venv.zip file containing scikit-learn and all its dependencies. The artifact is still hovering around 40MB, which is large but not unmanageable. For models, I've had success storing them separately in S3 and downloading them when the function starts.
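To keep an eye on that size, a check like the one below compares venv.zip against a size limit. It assumes a default limit of 50 MB, Lambda's documented cap for zipped deployment packages uploaded directly; confirm the current figure in the AWS Lambda limits documentation. The function name check_artifact_size and the LIMIT_MB variable are hypothetical, and the sketch treats 1 MB as 1024 × 1024 bytes.

```shell
#!/usr/bin/env bash
# Hypothetical size check for the build artifact. LIMIT_MB defaults to 50,
# assumed here to match Lambda's direct-upload limit for zipped packages.
check_artifact_size() {
  local file=${1:-venv.zip}
  local limit_mb=${LIMIT_MB:-50}
  local bytes
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  # wc -c counts bytes; tr strips the padding some wc versions add.
  bytes=$(wc -c < "$file" | tr -d ' ')
  echo "$file: $(( bytes / 1024 / 1024 )) MB (limit ${limit_mb} MB)"
  # Succeed only if the file fits under the limit.
  [ "$bytes" -le $(( limit_mb * 1024 * 1024 )) ]
}
```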
Wrapping Up
With the Docker container, you don't need to worry about whether your base OS is the right flavor of Linux, or to run an instance in AWS. This build script is easy to expand with more dependencies: just add libraries to the do_pip function of build.sh.
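The repository's actual do_pip may be organized differently, but one way to structure such a function so that adding a dependency is a one-line change is to keep the package names in variables. In this hypothetical sketch, the packages in SOURCE_PKGS are built from source with --no-binary (one pip call each), EXTRA_PKGS holds anything else you add, and setting PIP=echo prints the pip arguments instead of running pip. The package lists themselves are assumptions, not copied from build.sh.

```shell
#!/usr/bin/env bash
# Hypothetical do_pip-style function; not the code from build.sh.
# Space-separated package lists keep the sketch POSIX-sh friendly.
SOURCE_PKGS="numpy scipy scikit-learn"
EXTRA_PKGS=""   # add further dependencies here, e.g. "requests"

do_pip() {
  local pip_cmd="${PIP:-pip}"
  local pkg
  # Build each compiled package from source instead of a binary wheel.
  for pkg in $SOURCE_PKGS; do
    $pip_cmd install --no-binary "$pkg" "$pkg" || return 1
  done
  # Install any extra packages normally, in a single pip call.
  # EXTRA_PKGS is unquoted on purpose so each name becomes one argument.
  if [ -n "$EXTRA_PKGS" ]; then
    $pip_cmd install $EXTRA_PKGS || return 1
  fi
}
```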
Keep up with future posts via RSS. If you have suggestions, questions, or comments, email me at ryan@serverlesscode.com.