Docker lxml

lxml in multi-stage Docker images

cr0hn
May 27, 2020

lxml is a nice Python library for parsing XML files. It is very efficient and powerful, but it depends on compiled C libraries that are sometimes a bit complicated to install.

The problem with lxml and Docker is that the compiled binary dependencies must be present in the final image. That makes it hard to keep the image small, even more so with multi-stage Docker builds.

But don’t worry: you only need to follow these steps to create a multi-stage Docker image that contains the lxml library.

We can consider this Dockerfile:

FROM python:3.8-alpine as base

RUN apk update && \
    apk upgrade

FROM base as build_lxml

RUN apk add --no-cache build-base gcc musl-dev python3-dev libffi-dev libxml2-dev libxslt-dev
RUN python -OO -m pip install --no-cache-dir -U pip && \
    python -OO -m pip wheel --no-cache-dir --wheel-dir=/root/lxml_wheel lxml

FROM base
COPY --from=build_lxml /root/lxml_wheel /root/lxml_wheel

# lxml binary dependencies
COPY --from=build_lxml /usr/lib/libxslt.so.1 /usr/lib/libxslt.so.1
COPY --from=build_lxml /usr/lib/libexslt.so.0 /usr/lib/libexslt.so.0
COPY --from=build_lxml /usr/lib/libxml2.so.2 /usr/lib/libxml2.so.2
COPY --from=build_lxml /usr/lib/libgcrypt.so.20 /usr/lib/libgcrypt.so.20
COPY --from=build_lxml /usr/lib/libgpg-error.so.0 /usr/lib/libgpg-error.so.0

# Install lxml from the local wheel directory only (no index access)
RUN python -OO -m pip install --no-cache-dir --no-index --find-links=/root/lxml_wheel lxml

OK, let's explain what we did:

Create the build stage

We created a 'build_lxml' stage. In this stage we installed the development libraries needed to compile lxml:

RUN apk add --no-cache build-base gcc musl-dev python3-dev libffi-dev libxml2-dev libxslt-dev

Compile lxml and optimize the wheel

As in the previous post, we build a Python wheel while running Python with the -OO optimization flag:

RUN python -OO -m pip install --no-cache-dir -U pip && \
    python -OO -m pip wheel --no-cache-dir --wheel-dir=/root/lxml_wheel lxml

Copy binary dependencies

This is the most important step. The final stage must include the shared libraries (.so files) that lxml links against, copied from the build_lxml stage.

# lxml binary dependencies
COPY --from=build_lxml /usr/lib/libxslt.so.1 /usr/lib/libxslt.so.1
COPY --from=build_lxml /usr/lib/libexslt.so.0 /usr/lib/libexslt.so.0
COPY --from=build_lxml /usr/lib/libxml2.so.2 /usr/lib/libxml2.so.2
COPY --from=build_lxml /usr/lib/libgcrypt.so.20 /usr/lib/libgcrypt.so.20
COPY --from=build_lxml /usr/lib/libgpg-error.so.0 /usr/lib/libgpg-error.so.0
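
The exact file names in the list above depend on the Alpine release, so it can help to check what the compiled extension really links against. The following is a minimal sketch of a debugging step, not part of the original Dockerfile: it assumes it is appended to the build_lxml stage, and that ldd is available in the image (on Alpine it normally ships with musl-utils).

```dockerfile
# Hypothetical debugging step for the end of the build_lxml stage.
# Install the freshly built wheel, then list the shared libraries
# that lxml's compiled etree module is linked against.
RUN python -m pip install --no-cache-dir --no-index --find-links=/root/lxml_wheel lxml && \
    ldd "$(python -c 'import lxml.etree as e; print(e.__file__)')"
```

Every library printed under /usr/lib that is not already present in the base stage is a candidate for a COPY --from=build_lxml line.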

Install wheels

As in the previous post, we install the wheel from the local directory, without contacting the package index:

RUN python -OO -m pip install --no-cache-dir --no-index --find-links=/root/lxml_wheel lxml
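
A missing shared library only shows up when lxml is imported, so a quick import check at build time can catch it early. This is a sketch of an optional extra step, not part of the original Dockerfile, assumed to be placed after the install step in the final stage.

```dockerfile
# Hypothetical sanity check for the final stage.
# The import fails (and so does the build) if any shared library
# that lxml needs was not copied into this image.
RUN python -c "from lxml import etree; print(etree.LIBXML_VERSION, etree.LIBXSLT_VERSION)"
```

If the build stops here with an error about a missing library, add the corresponding COPY --from=build_lxml line and rebuild.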
