lxml is a nice Python library for parsing XML files. It is very efficient and powerful, but it has C binary dependencies that are sometimes a bit complicated to install.
The problem with lxml and Docker is that we need the compiled binary dependencies in the final image. That makes it hard to keep the Docker image small, even more so with multi-stage Docker images.
But don't worry: you only need to follow these steps to create a multi-stage Docker image that contains the lxml library.
We can consider this Dockerfile:
FROM python:3.8-alpine as base
RUN apk update && \
    apk upgrade
FROM base as build_lxml
RUN apk add --no-cache build-base gcc musl-dev python3-dev libffi-dev libxml2-dev libxslt-dev
RUN python -OO -m pip install --no-cache-dir -U pip && \
    python -OO -m pip wheel --no-cache-dir --wheel-dir=/root/lxml_wheel lxml
FROM base
COPY --from=build_lxml /root/lxml_wheel /root/lxml_wheel
# lxml binary dependencies
COPY --from=build_lxml /usr/lib/libxslt.so.1 /usr/lib/libxslt.so.1
COPY --from=build_lxml /usr/lib/libexslt.so.0 /usr/lib/libexslt.so.0
COPY --from=build_lxml /usr/lib/libxml2.so.2 /usr/lib/libxml2.so.2
COPY --from=build_lxml /usr/lib/libgcrypt.so.20 /usr/lib/libgcrypt.so.20
COPY --from=build_lxml /usr/lib/libgpg-error.so.0 /usr/lib/libgpg-error.so.0
RUN python -OO -m pip install --no-cache-dir --no-index --find-links=/root/lxml_wheel lxml
OK, let's explain a bit what we did:
Create the build stage
We created a build_lxml stage. In this stage we installed the development libraries needed to compile lxml:
RUN apk add --no-cache build-base gcc musl-dev python3-dev libffi-dev libxml2-dev libxslt-dev
Compile lxml and optimize wheel
As we saw in the previous post, we build a Python wheel while running Python with its optimization flags:
RUN python -OO -m pip install --no-cache-dir -U pip && \
    python -OO -m pip wheel --no-cache-dir --wheel-dir=/root/lxml_wheel lxml
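For reference, -OO sets the interpreter's optimization level to 2, which drops assert statements and docstrings in the process it launches (here, pip). A minimal sketch to see the level each flag produces; the helper name optimize_level is just an illustration, not part of pip or Docker:

```python
import subprocess
import sys


def optimize_level(*flags):
    """Run a child interpreter with `flags` and return its sys.flags.optimize."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", "import sys; print(sys.flags.optimize)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# -OO is the flag used in the Dockerfile: optimization level 2.
print(optimize_level("-OO"))
```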
Copy binary dependencies
This is the most important step. The compiled lxml links against shared libraries at runtime, so in the final stage we must include the .so binary files from the build_lxml stage.
# lxml binary dependencies
COPY --from=build_lxml /usr/lib/libxslt.so.1 /usr/lib/libxslt.so.1
COPY --from=build_lxml /usr/lib/libexslt.so.0 /usr/lib/libexslt.so.0
COPY --from=build_lxml /usr/lib/libxml2.so.2 /usr/lib/libxml2.so.2
COPY --from=build_lxml /usr/lib/libgcrypt.so.20 /usr/lib/libgcrypt.so.20
COPY --from=build_lxml /usr/lib/libgpg-error.so.0 /usr/lib/libgpg-error.so.0
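If one of these files is missing in the final image, importing lxml fails at runtime. A small sketch to check that the copied files are really there, assuming the same paths as in the Dockerfile; the function name missing_shared_libs and its root parameter are only illustrative:

```python
import os

# Paths copied from the build_lxml stage in the Dockerfile.
LXML_SHARED_LIBS = [
    "/usr/lib/libxslt.so.1",
    "/usr/lib/libexslt.so.0",
    "/usr/lib/libxml2.so.2",
    "/usr/lib/libgcrypt.so.20",
    "/usr/lib/libgpg-error.so.0",
]


def missing_shared_libs(paths=LXML_SHARED_LIBS, root="/"):
    """Return the entries of `paths` that do not exist under `root`."""
    missing = []
    for path in paths:
        # Re-anchor the absolute path under `root` (useful for testing).
        candidate = os.path.join(root, path.lstrip("/"))
        if not os.path.exists(candidate):
            missing.append(path)
    return missing


for path in missing_shared_libs():
    print("missing:", path)
```

Run inside the final image, it prints nothing when all five libraries are in place.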
Install wheels
As in the previous post, we install lxml from the wheel directory, without reaching the package index:
RUN python -OO -m pip install --no-cache-dir --no-index --find-links=/root/lxml_wheel lxml
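To confirm that the final image works, a quick smoke test can import lxml and parse a tiny document. This is only a sketch: the function name lxml_smoke_test is made up for the example, and it uses an explicit comparison because assert statements are stripped when Python runs with -O or -OO:

```python
def lxml_smoke_test():
    """Return True if lxml imports and parses a tiny document, else False."""
    try:
        from lxml import etree
    except ImportError as exc:
        # Raised when lxml itself or one of its shared libraries is missing.
        print("lxml import failed:", exc)
        return False
    root = etree.fromstring(b"<root><item>ok</item></root>")
    # Explicit comparison instead of `assert`, which -OO would remove.
    return root.findtext("item") == "ok"


print(lxml_smoke_test())
```

If it prints False inside the final image, the usual suspect is a shared library that was not copied from the build stage.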