Vendoring PROJ and GDAL vs System Libraries

The decision to vendor PROJ and GDAL into Python geospatial wheels, rather than link against system-installed libraries, determines the reliability, portability, and maintenance overhead of your distribution pipeline. Within the broader scope of Geospatial C-Extension Fundamentals & ABI Architecture, this choice directly affects binary reproducibility, CI matrix complexity, and deployment security posture. For package maintainers and DevOps engineers the trade-off is rarely all-or-nothing: it depends on how the build toolchain is configured, how native dependencies are resolved, and how the CI matrix is structured.

Linkage Models and ABI Surface

System library linking delegates binary resolution to the host's package manager (apt, dnf, brew, or conda). This approach minimizes wheel size but exposes the extension to ABI drift across environments, particularly when downstream users upgrade their base container images without rebuilding extensions. Vendoring, conversely, bundles compiled PROJ and GDAL artifacts directly into the wheel, typically by copying shared libraries in with auditwheel repair or by linking them statically during the cibuildwheel run. Vendoring guarantees that the extension always runs against the libraries it was built with, but it carries costs discussed in Memory Management in Geospatial Extensions: each wheel that bundles PROJ or GDAL loads its own copy of the code, its own symbol tables, and its own library-global state, none of which is shared with other vendored copies or with a system installation.
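
To see which linkage model a running process actually ended up with, one option on Linux is to read the process memory map after importing the extension. The sketch below is illustrative only: the function name native_lib_paths is hypothetical, and the rule that "vendored" means "lives under site-packages" is an assumption that will not hold for every layout (conda environments, for example).

```python
import re
from pathlib import Path

# Matches the path column of a /proc/<pid>/maps line for libproj or libgdal,
# including auditwheel-style hashed names such as libgdal-1a2b3c4d.so.34.
_LIB_RE = re.compile(r"(/\S*lib(?:proj|gdal)[^/\s]*\.so[^/\s]*)$")


def native_lib_paths(maps_text: str) -> dict[str, str]:
    """Map each loaded libproj/libgdal path to 'vendored' or 'system'."""
    found: dict[str, str] = {}
    for line in maps_text.splitlines():
        match = _LIB_RE.search(line.strip())
        if match:
            path = match.group(1)
            # Assumption: vendored copies live somewhere under site-packages.
            found[path] = "vendored" if "site-packages" in path else "system"
    return found


if __name__ == "__main__":
    # Linux only: import your extension first, then inspect the memory map.
    maps = Path("/proc/self/maps")
    if maps.exists():
        for path, origin in sorted(native_lib_paths(maps.read_text()).items()):
            print(f"{origin:9s} {path}")
```

Seeing both a vendored and a system path for the same library in one process is a sign that two packages were built against different linkage models.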

The linkage model also interacts with C-API vs CPython ABI Compatibility across minor Python releases. PROJ and GDAL themselves do not depend on CPython, so neither a system nor a vendored libgdal needs rebuilding for a new Python version; it is the extension module that must be recompiled for each cp3X tag unless it targets the stable ABI. Geospatial bindings rarely use Py_LIMITED_API because they rely heavily on the full C-API for buffer protocols, NumPy array views, and GIL-managed thread safety. Consequently, vendored wheels must still be built per minor Python version, but the native dependency tree is frozen at build time, which eliminates the runtime loader failures that occur when a host OS upgrade replaces or removes a library the extension was linked against.
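
The per-version requirement is visible in the wheel filename itself, whose trailing components follow the PEP 427 layout name-version[-build]-python-abi-platform. The following minimal sketch splits a filename into those parts; parse_wheel_tags is a hypothetical helper, and real tooling would normally use the packaging library instead.

```python
def parse_wheel_tags(filename: str) -> dict[str, object]:
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
        # An abi3 tag marks a stable-ABI wheel that is reusable across
        # minor Python versions; a cp3X ABI tag is tied to one version.
        "stable_abi": abi_tag == "abi3",
    }


if __name__ == "__main__":
    info = parse_wheel_tags(
        "your_geospatial_pkg-1.0.0-cp312-cp312-manylinux_2_28_x86_64.whl"
    )
    print(info["python"], info["abi"], info["platform"], info["stable_abi"])
```

A wheel built as described in this section carries a cp3X ABI tag, so one file is needed per minor Python version and architecture.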

Production Build Configuration

To implement a robust vendoring pipeline, configure cibuildwheel with explicit environment variables, an isolated install prefix, and a pinned auditwheel platform policy. The following pyproject.toml configuration sketches such a setup for vendoring PROJ 9.x and GDAL 3.8.x on manylinux_2_28 for x86_64 and aarch64. It follows the manylinux specification and builds inside the manylinux container so that nothing leaks in from the CI host.

[build-system]
requires = ["setuptools>=68.0", "wheel", "Cython>=3.0"]
build-backend = "setuptools.build_meta"

[tool.cibuildwheel]
build = "cp39-* cp310-* cp311-* cp312-*"
skip = "*-musllinux_*"
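test-command = "python -c \"import your_geospatial_pkg; print(your_geospatial_pkg.__version__)\""

[tool.cibuildwheel.linux]
# archs values are platform-specific, so they are set in the Linux table.
archs = ["x86_64", "aarch64"]
# Pin the build image so its glibc baseline matches the --plat passed to auditwheel.
manylinux-x86_64-image = "manylinux_2_28"
manylinux-aarch64-image = "manylinux_2_28"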
before-all = """
  set -e
  # PROJ's build uses the sqlite3 command-line tool (package: sqlite) to generate proj.db
  yum install -y epel-release gcc-c++ cmake make sqlite sqlite-devel libtiff-devel libcurl-devel zlib-devel
  export VENDOR_PREFIX=/opt/vendor

  # Build PROJ
  curl -sL https://download.osgeo.org/proj/proj-9.4.0.tar.gz | tar xz
  cd proj-9.4.0 && mkdir build && cd build
  # Install into lib/ (RHEL-family images default to lib64/) so the paths used below resolve
  cmake -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=$VENDOR_PREFIX \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DBUILD_SHARED_LIBS=ON \
        -DBUILD_TESTING=OFF \
        -DENABLE_TIFF=ON \
        -DENABLE_CURL=ON ..
  make -j$(nproc) && make install
  cd ../..

  # Build GDAL
  curl -sL https://download.osgeo.org/gdal/3.8.4/gdal-3.8.4.tar.gz | tar xz
  cd gdal-3.8.4 && mkdir build && cd build
  cmake -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=$VENDOR_PREFIX \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DBUILD_SHARED_LIBS=ON \
        -DGDAL_USE_EXTERNAL_LIBS=ON \
        -DPROJ_INCLUDE_DIR=$VENDOR_PREFIX/include \
        -DPROJ_LIBRARY=$VENDOR_PREFIX/lib/libproj.so ..
  make -j$(nproc) && make install
  cd ../..

  # Update runtime linker cache for the build phase
  ldconfig $VENDOR_PREFIX/lib
"""

# PROJ 9.1 and later read PROJ_DATA; PROJ_LIB is the deprecated older name.
environment = { PKG_CONFIG_PATH="/opt/vendor/lib/pkgconfig", GDAL_CONFIG="/opt/vendor/bin/gdal-config", PROJ_DATA="/opt/vendor/share/proj", LD_LIBRARY_PATH="/opt/vendor/lib" }
repair-wheel-command = "auditwheel repair --plat manylinux_2_28_$AUDITWHEEL_ARCH -w {dest_dir} {wheel}"

Key configuration notes:

  • Isolated Prefix: Installing both libraries to /opt/vendor prevents collision with system packages and gives auditwheel a clean directory tree to scan.
  • Shared vs Static: -DBUILD_SHARED_LIBS=ON is preferred over static linking for geospatial wheels. Statically linking PROJ and GDAL into the extension's .so inflates the binary without improving runtime performance, and it can cause symbol clashes when another extension loaded in the same process exports the same PROJ or GDAL symbols.
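  • Repair Command: auditwheel repair copies the shared libraries the extension links against into a .libs directory inside the wheel (named your_geospatial_pkg.libs by recent auditwheel releases), renames each copy with a content hash to avoid clashes with other wheels, and patches the extension's RPATH to an $ORIGIN-relative path so that dependencies resolve inside the wheel at runtime.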
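
A quick way to confirm that the repair step bundled what you expect is to list the shared objects inside the repaired wheel. The sketch below assumes the libraries sit in a top-level directory whose name ends in .libs, which is how recent auditwheel releases lay out the wheel; bundled_libraries is a hypothetical helper, not part of auditwheel.

```python
import sys
import zipfile
from pathlib import PurePosixPath


def bundled_libraries(wheel_path: str) -> list[str]:
    """Return shared objects stored under a top-level '*.libs/' directory."""
    libs = []
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            parts = PurePosixPath(name).parts
            # Keep files such as 'pkg.libs/libgdal-1a2b3c4d.so.34'.
            if len(parts) > 1 and parts[0].endswith(".libs") and ".so" in parts[-1]:
                libs.append(name)
    return sorted(libs)


if __name__ == "__main__":
    # Usage: python list_bundled.py wheelhouse/<your wheel>.whl
    for lib in bundled_libraries(sys.argv[1]):
        print(lib)
```

If libproj or libgdal is missing from the output, the extension was probably not linked against the copies in /opt/vendor, and the wheel will fall back to whatever the target host provides.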

Runtime Overhead and Memory Topology

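Vendoring introduces measurable runtime trade-offs. A bundled PROJ or GDAL is a separate shared object from any system copy and from the copies bundled by other wheels, so a process that imports several geospatial packages can map multiple versions of the same library at once. The hashed file names produced by auditwheel keep the copies from colliding, but each copy maintains its own global state, such as driver registrations, caches, and configuration options, which multiplies the resident memory footprint. In addition, Python's small-object allocator (pymalloc) does not manage allocations made inside C libraries unless the binding explicitly routes them through it, so raster and vector buffers allocated by GDAL are invisible to tools such as tracemalloc and can fragment the process heap under heavy processing loads. For a detailed breakdown of allocation strategies and GIL interaction patterns, consult Why vendoring PROJ causes wheel bloat.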
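
The gap between Python-managed and native allocations can be demonstrated without GDAL at all. The sketch below uses ctypes to call the C library's malloc directly, as a stand-in for an allocation made inside a native library, and reports how much of it tracemalloc observed. It assumes a POSIX platform where ctypes.CDLL(None) exposes malloc and free; the function name is hypothetical.

```python
import ctypes
import tracemalloc


def traced_growth_during_native_alloc(size: int) -> int:
    """Allocate `size` bytes with libc malloc; return the bytes tracemalloc saw."""
    libc = ctypes.CDLL(None)  # POSIX only: symbols already loaded in this process
    libc.malloc.restype = ctypes.c_void_p
    libc.malloc.argtypes = [ctypes.c_size_t]
    libc.free.restype = None
    libc.free.argtypes = [ctypes.c_void_p]

    ptr = None
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        ptr = libc.malloc(size)  # native allocation, bypasses Python's allocators
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
        if ptr:
            libc.free(ptr)
    return after - before


if __name__ == "__main__":
    size = 64 * 1024 * 1024
    seen = traced_growth_during_native_alloc(size)
    print(f"requested {size} bytes natively, tracemalloc saw {seen} bytes")
```

The traced figure covers only the few Python objects created along the way, which is why profiling a geospatial workload usually needs process-level measurements such as resident set size in addition to tracemalloc.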

Additionally, vendored wheels bypass OS-level security patching. If a critical vulnerability is discovered in libtiff or sqlite (both transitively linked by GDAL), you must rebuild and republish the wheel rather than relying on the host OS package manager. This shifts the security posture from reactive OS updates to proactive CI pipeline maintenance.

CI Matrix and Dependency Resolution

Optimizing the build matrix requires strategic caching and cross-architecture awareness. cibuildwheel runs before-all in a fresh container on every invocation, so building PROJ and GDAL from source there repeats the download and the full compilation for each architecture on every run. To accelerate pipelines:

  1. Pre-compile Vendor Artifacts: Build PROJ/GDAL in a separate CI job, upload the compiled /opt/vendor directory to a private artifact registry, and mount it during the wheel build phase.
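  2. Cross-Compilation: For aarch64 builds on x86_64 runners, configure QEMU emulation or use native ARM runners. Under QEMU the build appears native to CMake, so CMAKE_CROSSCOMPILING stays off; do not set CMAKE_SYSTEM_NAME or pass a toolchain file unless you are using a dedicated cross-toolchain, because the configure-time checks in PROJ and GDAL that compile and run test programs often fail when CMake believes it is cross-compiling. Emulated builds are much slower than native ones, which makes the pre-compiled artifacts from step 1 more valuable.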
  3. Auditwheel Policy Enforcement: Always pin the manylinux tag to match your target deployment environment. Using manylinux_2_28 (AlmaLinux 8 baseline) provides a modern glibc floor while maintaining broad compatibility. Refer to the official cibuildwheel documentation for platform-specific environment variable mappings and troubleshooting guides.
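
For step 1, the pre-compiled /opt/vendor tree is only safe to reuse when everything that influenced the build is unchanged. One approach is to derive the artifact name from a hash of those inputs. The sketch below is an assumption about how such a key might be built, not a feature of cibuildwheel; vendor_cache_key and its parameters are hypothetical names.

```python
import hashlib
import json


def vendor_cache_key(
    versions: dict[str, str],
    image: str,
    arch: str,
    cmake_flags: list[str],
) -> str:
    """Derive a deterministic artifact key from the inputs of the vendor build."""
    payload = json.dumps(
        {
            "versions": versions,
            "image": image,
            "arch": arch,
            # Sorted so that reordering flags does not invalidate the cache.
            "flags": sorted(cmake_flags),
        },
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"vendor-{image}-{arch}-{digest}"


if __name__ == "__main__":
    key = vendor_cache_key(
        versions={"proj": "9.4.0", "gdal": "3.8.4"},
        image="manylinux_2_28",
        arch="x86_64",
        cmake_flags=["-DBUILD_SHARED_LIBS=ON", "-DENABLE_TIFF=ON", "-DENABLE_CURL=ON"],
    )
    print(key)
```

Bumping a library version, changing a CMake flag, or switching the manylinux image produces a new key, so the wheel job fetches a stale artifact only if the key inputs are incomplete. A pinned image digest is a stricter input than the image name when base image updates matter.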

By treating vendoring as a deterministic build artifact rather than a convenience, teams achieve reproducible geospatial deployments that scale across heterogeneous cloud environments without sacrificing the performance guarantees of native C-extensions.