Manylinux and Manyarm Docker Base Images
Distributing compiled geospatial Python extensions requires strict ABI compliance across heterogeneous Linux environments. The manylinux container ecosystem (often colloquially referred to as “manyarm” for ARM64 targets) provides the standardized, glibc-anchored build environments necessary to produce portable wheels for GDAL, PROJ, PyProj, and their native C/C++ dependencies. As the foundational layer of Modern Python Build Tooling & Wheel Configuration, these base images eliminate host-toolchain drift, enforce reproducible compilation boundaries, and guarantee that spatial packages install predictably across cloud VMs, edge devices, and enterprise Linux distributions.
ABI Compliance and Image Selection
Geospatial libraries rely heavily on system-level math, compression, and projection libraries. Selecting the correct base image dictates the minimum glibc version, available package managers, and cross-compilation capabilities. While the community sometimes uses “manyarm” to describe ARM64 builds, PEP 600 defines no separate ARM naming: ARM64 wheels use the same manylinux_X_Y scheme with an aarch64 suffix, for example manylinux_2_28_aarch64.
| Architecture | Base Image Tag | glibc Baseline | Primary Use Case |
|---|---|---|---|
| x86_64 | quay.io/pypa/manylinux_2_28_x86_64 | 2.28 (RHEL 8) | Modern CI runners, AVX2 optimizations |
| x86_64 | quay.io/pypa/manylinux2014_x86_64 | 2.17 (CentOS 7) | Legacy enterprise deployments |
| aarch64 | quay.io/pypa/manylinux_2_28_aarch64 | 2.28 | ARM64 cloud instances, Graviton/Apple Silicon cross-builds |
| aarch64 | quay.io/pypa/manylinux2014_aarch64 | 2.17 | Raspberry Pi/legacy IoT stacks |
The manylinux_2_28 series is the recommended baseline for contemporary spatial wheels. It is built on AlmaLinux 8, so it uses dnf instead of yum and ships a newer GCC through gcc-toolset (the exact version depends on the image build date; Clang typically has to be installed separately if needed). Its newer system libraries also avoid many of the version conflicts that arise when linking PROJ against recent SQLite3 or libcurl releases. When evaluating Alpine-based alternatives for containerized deployments, maintainers should carefully weigh the trade-offs documented in Manylinux2014 vs musllinux for spatial libs before committing to a musl-based toolchain.
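The relationship between a platform tag and its glibc baseline can be sketched in a few lines of Python. This is a minimal illustration, not an official API: the function names are hypothetical, and the legacy alias table reflects the mapping given in PEP 600 (manylinux1, manylinux2010, and manylinux2014 correspond to glibc 2.5, 2.12, and 2.17).

```python
import re

# Legacy aliases and their glibc versions, as listed in PEP 600.
LEGACY_ALIASES = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}

_PEP600 = re.compile(r"^manylinux_(\d+)_(\d+)_(\w+)$")
_LEGACY = re.compile(r"^(manylinux1|manylinux2010|manylinux2014)_(\w+)$")


def parse_manylinux_tag(tag):
    """Return ((glibc_major, glibc_minor), arch) for a manylinux platform tag."""
    match = _PEP600.match(tag)
    if match:
        return (int(match.group(1)), int(match.group(2))), match.group(3)
    match = _LEGACY.match(tag)
    if match:
        return LEGACY_ALIASES[match.group(1)], match.group(2)
    raise ValueError(f"not a manylinux tag: {tag!r}")


def installable_on(tag, host_glibc, host_arch):
    """True if a wheel with this tag fits a host with the given glibc and arch."""
    glibc, arch = parse_manylinux_tag(tag)
    return arch == host_arch and glibc <= host_glibc
```

For example, `installable_on("manylinux_2_28_x86_64", (2, 17), "x86_64")` is false, which is why a wheel built in the 2_28 image will not install on a CentOS 7 host.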
Deterministic Container Initialization
Reproducible geospatial builds require strict image pinning and explicit compiler flag propagation. Relying on a floating reference (the latest tag, or the bare manylinux_2_28 image name with no tag or digest) introduces silent ABI drift when upstream images update system libraries.
# Pin to a specific digest for CI reproducibility. Replace <digest> with the
# full 64-character digest reported by:
#   docker buildx imagetools inspect quay.io/pypa/manylinux_2_28_x86_64
FROM quay.io/pypa/manylinux_2_28_x86_64@sha256:<digest>

# Install geospatial build dependencies. proj-devel and geos-devel are not in
# the base AlmaLinux 8 repositories, so EPEL is enabled first (assumption: the
# EPEL versions are recent enough; otherwise build PROJ and GEOS from source).
RUN dnf install -y epel-release \
    && dnf install -y \
        gcc gcc-c++ make cmake pkgconfig \
        sqlite-devel libcurl-devel zlib-devel \
        libtiff-devel libjpeg-turbo-devel \
        proj-devel geos-devel \
    && dnf clean all

# Architecture-aware compiler flags.
# x86_64: x86-64-v2 baseline (up to SSE4.2, no AVX/AVX2); requires GCC 11+.
# aarch64: in the aarch64 image, replace -march=x86-64-v2 with -march=armv8-a.
ENV CFLAGS="-O2 -fPIC -march=x86-64-v2 -mtune=generic"
ENV CXXFLAGS="-O2 -fPIC -march=x86-64-v2 -mtune=generic"

# Do not bake an absolute RPATH here (and /opt/manylinux/lib does not exist in
# these images); auditwheel sets an $ORIGIN-relative RPATH pointing at the
# bundled library directory during repair.
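A CI gate can reject Dockerfiles whose base image is not pinned by digest. The following is a minimal sketch under that assumption; `unpinned_base_images` is a hypothetical helper, and it only inspects simple single-line FROM instructions.

```python
import re

# A pinned reference ends in "@sha256:" followed by exactly 64 hex characters.
_PINNED = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile_text):
    """Return image references from FROM lines that lack a sha256 digest."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip optional flags such as --platform=linux/arm64.
        refs = [part for part in parts[1:] if not part.startswith("--")]
        if refs and not _PINNED.search(refs[0]):
            unpinned.append(refs[0])
    return unpinned
```

An empty result means every FROM line carries a full digest. Note that this sketch also flags `FROM scratch` and references to earlier build stages, so a real check would need to allow those.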
Note that manylinux images already contain multiple CPython installations under /opt/python. For automated wheel generation, orchestrators like cibuildwheel handle Python isolation, but custom Dockerfiles must respect the /opt/python/*/bin/python layout if invoking build scripts directly.
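A build script that runs inside the container can discover the bundled interpreters by globbing that layout. This is a small sketch; `find_interpreters` is a hypothetical helper, and the `root` parameter exists only so the function can be exercised outside a manylinux image.

```python
from pathlib import Path


def find_interpreters(root="/opt/python"):
    """Return {tag: path} for each <root>/<tag>/bin/python that exists."""
    found = {}
    for python in sorted(Path(root).glob("*/bin/python")):
        # The directory name is the interpreter tag, e.g. cp312-cp312.
        found[python.parent.parent.name] = python
    return found
```

A caller could then iterate over the result and invoke each interpreter with `-m build --wheel` (assuming the `build` package is installed for it), filtering on tags that start with `cp` to skip PyPy.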
Dependency Resolution for C-Extensions
GDAL, PROJ, and rasterio introduce complex transitive dependencies. Building them inside manylinux containers requires explicit sysroot awareness and careful library path management. The containerized environment isolates header resolution from the host OS, but runtime linking must still comply with the manylinux ABI specification.
Static vs Dynamic Linking Strategy
Geospatial maintainers typically adopt a hybrid approach:
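- Bundle spatial core libs: PROJ, GEOS, and SQLite3 are statically linked or bundled as .so files inside the wheel to avoid version skew on target systems.
- Exclude system-critical libs: libcurl, libz, and libpthread are kept out of the wheel. libpthread is part of glibc and libz is on the auditwheel allowlist, so neither is bundled. libcurl is not on the allowlist, so excluding it via auditwheel is a deliberate choice: the wheel then depends on the target system's libcurl, which avoids symbol collisions with other copies loaded in the same process.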
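The hybrid strategy above can be sketched as a function that sorts an extension's DT_NEEDED entries into libraries to bundle and libraries to leave external. It parses the text output of `readelf -d`. The allowlist here is an illustrative subset chosen for the example, not the authoritative auditwheel policy, and `plan_bundling` is a hypothetical name.

```python
import re

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libproj.so.25]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")

# Illustrative subset only (assumption); check the auditwheel policy for the
# target manylinux tag before relying on any entry.
ASSUMED_ALLOWLIST = {
    "libc.so.6",
    "libm.so.6",
    "libdl.so.2",
    "libpthread.so.0",
    "libz.so.1",
}


def plan_bundling(readelf_output, extra_excludes=()):
    """Split DT_NEEDED libraries into (bundle, external) lists."""
    needed = _NEEDED.findall(readelf_output)
    skip = ASSUMED_ALLOWLIST | set(extra_excludes)
    bundle = [lib for lib in needed if lib not in skip]
    external = [lib for lib in needed if lib in skip]
    return bundle, external
```

Passing `extra_excludes=["libcurl.so.4"]` mirrors the effect of an `--exclude` flag: the library stays in the external list and must be present on the target system.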
Build Backend Integration
Modern spatial packages should leverage scikit-build-core to bridge CMake and Python packaging. The backend drives CMake from pyproject.toml, respects build isolation, and packages the resulting extension modules into a wheel. The DT_NEEDED entries themselves come from the linker, and manylinux compliance is still enforced afterwards by auditwheel. For detailed backend configuration, see Integrating CMake with scikit-build-core.
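# pyproject.toml snippet
[build-system]
requires = ["scikit-build-core>=0.9", "setuptools_scm"]
build-backend = "scikit_build_core.build"

[tool.scikit-build]
cmake.version = ">=3.26"
# cp39 targets the CPython 3.9+ stable ABI (abi3); the extension must
# restrict itself to the Limited API for this to be valid.
wheel.py-api = "cp39"
# Keep any CMake-installed copies of these libraries out of the wheel.
wheel.exclude = ["libcurl.so*", "libz.so*"]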
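Proper wheel metadata generation is critical. Setting wheel.py-api to cp39 targets the stable ABI, so the wheel is tagged cp39-abi3 and installs on CPython 3.9 and later, provided the extension only uses the Limited API. auditwheel later validates the binary and rewrites the platform tag. For comprehensive metadata structuring, refer to Mastering pyproject.toml for Spatial Wheels.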
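The tags end up in the wheel filename, so a quick sanity check is to parse them back out after the build. The sketch below uses only the standard library and follows the `name-version[-build]-python-abi-platform.whl` layout; `wheel_tags` and `is_stable_abi` are hypothetical helpers, and `spatialpkg` in the usage note is a placeholder package name.

```python
def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tags) parsed from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    # Layout: name-version[-build]-python-abi-platform.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    python_tag, abi_tag, platform = parts[-3:]
    # A repaired wheel may carry several dot-separated platform tags.
    return python_tag, abi_tag, platform.split(".")


def is_stable_abi(filename):
    """True if the wheel declares the abi3 (stable ABI) tag."""
    return wheel_tags(filename)[1] == "abi3"
```

Before repair, a Linux build usually carries a generic platform tag such as `linux_x86_64`; checking that `wheel_tags("spatialpkg-1.0.0-cp39-abi3-manylinux_2_28_x86_64.whl")` reports a manylinux platform is a cheap guard against publishing an unrepaired wheel.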
CI Matrix Optimization and Validation
Geospatial wheel matrices multiply quickly across Python versions, architectures, and OS baselines. Optimizing CI execution requires strategic caching, parallelization, and strict validation gates.
GitHub Actions Matrix Example
strategy:
  fail-fast: false
  matrix:
    # Native runners build their own architecture (no QEMU needed):
    # ubuntu-latest -> x86_64, ubuntu-24.04-arm -> aarch64.
    os: [ubuntu-latest, ubuntu-24.04-arm]
    python-version: ["3.9", "3.10", "3.11", "3.12"]
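To see how quickly the matrix grows, the expansion can be reproduced in a few lines of Python. This sketch assumes the runner-to-architecture mapping from the comment in the workflow, and it formats each job as a cibuildwheel-style build identifier (for example cp39-manylinux_x86_64); `expand_matrix` is a hypothetical helper.

```python
from itertools import product

# Mapping taken from the workflow comment: each native runner builds its own arch.
RUNNER_ARCH = {
    "ubuntu-latest": "x86_64",
    "ubuntu-24.04-arm": "aarch64",
}


def expand_matrix(runners, python_versions):
    """Return (runner, build_identifier) pairs for every matrix combination."""
    jobs = []
    for runner, version in product(runners, python_versions):
        major, minor = version.split(".")
        identifier = f"cp{major}{minor}-manylinux_{RUNNER_ARCH[runner]}"
        jobs.append((runner, identifier))
    return jobs
```

Two runners and four Python versions already produce eight jobs; adding a second glibc baseline would double that again.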
Validation Workflow
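After wheel generation, every artifact must pass auditwheel validation before registry publication. The tool inspects the ELF files in the wheel, checks their DT_NEEDED libraries and versioned symbol references against the policy for the target manylinux tag, copies libraries that are not on the allowlist into the wheel, and rewrites RPATHs to point to the bundled .libs directory.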
# Validate wheel ABI compliance (auditwheel show takes one wheel at a time)
for whl in dist/*.whl; do
    auditwheel show "$whl"
done
# Repair: bundle dependencies that are not on the allowlist
auditwheel repair dist/*.whl \
    --plat manylinux_2_28_x86_64 \
    --lib-sdir .libs \
    --exclude libcurl.so.4 \
    --exclude libz.so.1 \
    -w wheelhouse/
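The auditwheel policy definitions provide the authoritative allowlist for each manylinux tag and should be consulted when introducing new C-extensions. After repair, unpack the wheel and run ldd on the extension modules to confirm that no dependency is reported as not found. ldd -r additionally reports unresolved symbols, although symbols provided by the Python interpreter itself will show as undefined, which is expected.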
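The ldd check is easy to automate by scanning its output for entries marked as not found. The following is a minimal sketch that works on captured ldd text, so it can run in CI after unpacking the repaired wheel; `missing_libraries` is a hypothetical helper.

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Typical lines: "libfoo.so.1 => /path/libfoo.so.1 (0x...)"
        #            or: "libfoo.so.1 => not found"
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.startswith("not found"):
            missing.append(name)
    return missing
```

A CI step could capture the output with `subprocess.run(["ldd", path], capture_output=True, text=True)` and fail the job whenever the returned list is non-empty, except for libraries that were deliberately excluded from bundling.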
Cache and Artifact Structuring
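cibuildwheel does not cache compiled native dependencies on its own: CIBW_CACHE_PATH only controls where it stores the tools it downloads. To avoid recompiling PROJ/GDAL on every matrix run, use the CI provider's cache or bake the compiled libraries into a custom image derived from the pinned manylinux base. Store intermediate .tar.gz source archives and compiled .a/.so objects in CI artifact storage. This aligns with enterprise registry workflows where wheels are staged, scanned for CVEs, and promoted to production indices.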
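A cache of compiled native libraries is only safe to reuse when everything that influences the binaries is unchanged. One way to express that is a cache key derived from the image digest, the architecture, and the pinned dependency versions. This is a sketch under those assumptions; `native_cache_key` and the `native-deps-` prefix are hypothetical names, and the version numbers in the usage note are placeholders.

```python
import hashlib
import json


def native_cache_key(image_digest, arch, versions):
    """Derive a stable cache key from the inputs that affect compiled output."""
    payload = json.dumps(
        {"image": image_digest, "arch": arch, "versions": versions},
        sort_keys=True,  # makes the key independent of dict ordering
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return "native-deps-" + digest[:16]
```

Calling `native_cache_key("sha256:<digest>", "x86_64", {"proj": "9.4.0", "gdal": "3.9.0"})` yields the same key on every run until the image digest or one of the versions changes, at which point the cache is rebuilt rather than silently reused.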
By anchoring builds to pinned manylinux digests, enforcing strict ABI validation, and structuring wheels through modern build backends, geospatial maintainers can guarantee that spatial Python packages deploy reliably across the entire Linux ecosystem.