Stripping Unnecessary Python Packages from AWS Lambda Layers

Stripping unnecessary Python packages from AWS Lambda layers requires auditing your runtime dependency tree, isolating geospatial binaries, removing non-runtime artifacts, and rebuilding inside a Docker container that exactly matches your target Lambda runtime. For GIS workloads, explicitly exclude heavy sub-packages like matplotlib backends, scipy optional modules, and fiona test suites unless your handler explicitly imports them.

Why Geospatial Layers Exceed Limits

Serverless GIS stacks inherently pull in C-compiled libraries. A standard geopandas or rasterio installation drags in shapely, pyproj, fiona, numpy, and often scipy. Each brings precompiled wheels, bundled shared libraries, locale data, and PROJ/GDAL data files. AWS enforces a 250 MB unzipped limit on a function's deployment package including all of its layers (so no single layer can exceed it either), plus a 50 MB limit on zipped archives uploaded directly. Exceeding either limit causes the deployment to fail with a size-limit error, and large layers also inflate cold-start latency.
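
To see where the megabytes actually go, it helps to rank the top-level entries of site-packages by size before pruning anything. The following is a minimal sketch using only the standard library; the function names dir_size and largest_entries are illustrative, not part of any AWS or packaging tool.

```python
from pathlib import Path


def dir_size(path: Path) -> int:
    """Return the total size in bytes of all regular files under path."""
    return sum(
        p.stat().st_size
        for p in path.rglob("*")
        if p.is_file() and not p.is_symlink()  # skip symlinks to avoid double counting
    )


def largest_entries(site_packages: Path, top: int = 10) -> list[tuple[str, int]]:
    """Rank top-level entries of site-packages by on-disk size, largest first."""
    sizes = []
    for entry in site_packages.iterdir():
        size = dir_size(entry) if entry.is_dir() else entry.stat().st_size
        sizes.append((entry.name, size))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:top]


def format_report(entries: list[tuple[str, int]]) -> str:
    """Render the ranking as aligned 'size MB  name' lines."""
    return "\n".join(f"{size / 1_048_576:8.1f} MB  {name}" for name, size in entries)
```

Calling print(format_report(largest_entries(Path("python/lib/python3.11/site-packages")))) lists the heaviest packages first, which shows which ones are worth auditing.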

Effective Python Layer Management and Size Reduction requires surgical pruning, not just pip install --no-deps. Blind installations leave behind debug symbols, documentation, test fixtures, and .dist-info metadata that consume megabytes without contributing to runtime execution.

Automated Pruning Workflow

  1. Map the Dependency Tree: Run pip install pipdeptree && pipdeptree --json to separate direct imports from transitive bloat. The official pipdeptree documentation explains how to filter by package name to identify orphaned dependencies.
  2. Verify Runtime Usage: Grep your handler code for actual import statements. GIS pipelines frequently install folium, geoplot, or contextily for local debugging but never invoke them in production. If a package isn’t imported during invocation, it doesn’t belong in the layer.
  3. Prune Non-Essential Artifacts: Delete tests/, docs/, examples/, *.pyc, and __pycache__/. Remove .dist-info/ directories unless the package relies on pkg_resources or importlib.metadata at runtime (common in pyproj and rasterio).
  4. Strip Shared Libraries: Run find . -name "*.so" -exec strip --strip-unneeded {} + to remove debug symbols from compiled extensions. This typically shaves 15–30% off geospatial wheels without altering runtime behavior.
  5. Repackage in Target Environment: Always install and build inside public.ecr.aws/sam/build-python3.11 (or the image for your exact runtime) to avoid glibc mismatches. Installing on macOS or Windows pulls wheels built for those platforms, and their compiled extensions fail to import at invocation.
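
Steps 1 and 2 can be partly automated. The sketch below parses handler source with the standard ast module and reports which of your direct dependencies are never imported. The function names and the direct_import_names parameter are hypothetical. It compares import names, not distribution names (GDAL, for example, is imported as osgeo), and it should only be fed direct dependencies, since transitive ones such as numpy are needed even when the handler never imports them.

```python
import ast
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported anywhere in the given source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 excludes relative imports such as "from . import x"
            names.add(node.module.split(".")[0])
    return names


def unused_packages(handler_dir: Path, direct_import_names: set[str]) -> set[str]:
    """Return direct dependencies that no .py file under handler_dir imports."""
    used = set()
    for path in handler_dir.rglob("*.py"):
        used |= imported_modules(path.read_text(encoding="utf-8"))
    return direct_import_names - used
```

Static parsing misses dynamic imports (importlib.import_module, plugin entry points), so treat the result as a list of candidates to review, not as a deletion list.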

Production-Ready Pruning Script

The following script safely prunes a pre-installed layer directory. It preserves required .dist-info for GIS libraries that rely on runtime metadata inspection, removes test/docs directories, strips compiled bytecode, and safely invokes the Linux strip utility on shared objects.

python
#!/usr/bin/env python3
"""prune_lambda_layer.py - Safely strip non-runtime artifacts from a Lambda layer."""
import os
import sys
import shutil
import subprocess
from pathlib import Path

LAYER_DIR = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("python/lib/python3.11/site-packages")

# Packages that require .dist-info at runtime (GIS-heavy)
KEEP_DIST_INFO = {"pyproj", "rasterio", "fiona", "shapely", "gdal", "numpy"}

REMOVE_DIRS = {"tests", "test", "docs", "examples", "benchmarks"}
REMOVE_SUFFIXES = {".pyc", ".pyo"}
REMOVE_NAMES = {"__pycache__"}

def prune_directory(root: Path) -> None:
    """Walk the layer directory and remove non-runtime artifacts."""
    # os.walk(topdown=True) lets us drop deleted directories from `dirnames`,
    # so the walk never descends into a tree that no longer exists.
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        for name in list(dirnames):
            # Remove test/doc/example trees and __pycache__
            remove = name.lower() in REMOVE_DIRS or name in REMOVE_NAMES
            # Conditionally remove .dist-info
            if name.endswith(".dist-info"):
                pkg_name = name.split("-")[0].lower()
                remove = pkg_name not in KEEP_DIST_INFO
            if remove:
                shutil.rmtree(Path(dirpath) / name, ignore_errors=True)
                dirnames.remove(name)
        for name in filenames:
            item = Path(dirpath) / name
            # Remove compiled bytecode
            if item.suffix in REMOVE_SUFFIXES:
                item.unlink(missing_ok=True)
            # Strip debug symbols from shared libraries
            elif item.suffix == ".so":
                try:
                    subprocess.run(
                        ["strip", "--strip-unneeded", str(item)],
                        check=True, capture_output=True
                    )
                except (subprocess.CalledProcessError, FileNotFoundError):
                    # FileNotFoundError means no `strip` binary is on PATH
                    print(f"[WARN] Failed to strip {item.name}")

if __name__ == "__main__":
    if not LAYER_DIR.exists():
        print(f"Error: Directory {LAYER_DIR} does not exist.")
        sys.exit(1)
    
    print(f"Pruning {LAYER_DIR}...")
    prune_directory(LAYER_DIR)
    print("Pruning complete. Verify layer size before packaging.")

Run the script against your extracted layer: python prune_lambda_layer.py ./layer/python/lib/python3.11/site-packages. Always verify the final unzipped size before zipping and uploading.
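
The KEEP_DIST_INFO set in the script is hard-coded. One rough way to sanity-check it for your own dependency set is to scan each installed package for references to the metadata APIs. The sketch below is a text-matching heuristic with hypothetical names (METADATA_MARKERS, packages_using_metadata). A match does not prove the lookup runs at import time, and a package may look up another distribution's metadata and not its own.

```python
from pathlib import Path

# Substrings that suggest a package inspects installed-distribution metadata.
METADATA_MARKERS = ("importlib.metadata", "importlib_metadata", "pkg_resources")


def packages_using_metadata(site_packages: Path) -> set[str]:
    """Return top-level package directories whose source mentions a metadata API."""
    hits = set()
    for pkg in site_packages.iterdir():
        if not pkg.is_dir() or pkg.name.endswith(".dist-info"):
            continue
        for source in pkg.rglob("*.py"):
            text = source.read_text(encoding="utf-8", errors="ignore")
            if any(marker in text for marker in METADATA_MARKERS):
                hits.add(pkg.name)
                break  # one hit is enough to flag the package
    return hits
```

Packages flagged here are candidates for KEEP_DIST_INFO. Confirm each one with the import smoke test before removing its metadata.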

Validation & Deployment Checklist

After pruning, validate your layer against AWS constraints and runtime expectations:

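  • Size Verification: du -sh ./layer should report well under 250 MB, because that limit covers the function code plus all attached layers combined. Create the archive with cd layer && zip -r ../layer.zip python so that python/ sits at the archive root, and check that it stays within the 50 MB limit for direct uploads (or upload through S3 for larger archives).
  • Import Smoke Test: Run the import in a container that matches your runtime and put the layer on the module path explicitly (docker run --rm -v "$(pwd)/layer:/opt" -e PYTHONPATH=/opt/python/lib/python3.11/site-packages public.ecr.aws/sam/build-python3.11 python -c "import geopandas; print('OK')"). Missing .dist-info metadata or damaged shared libraries usually surface at import time.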
  • Cold Start Baseline: Measure initialization time before and after pruning. A smaller layer takes less time to fetch and extract, and fewer imported modules mean less work during the first invocation.
  • CI/CD Integration: Embed the pruning script in your build pipeline. Reference the broader Packaging & Dependency Management for Serverless GIS guidelines to standardize artifact generation across teams.
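
For a CI pipeline, the size checks above can be scripted alongside packaging. The following sketch assumes the layer directory contains python/ at its top level. The names build_layer_zip and check_limits are illustrative, and the two constants mirror the 250 MB unzipped and 50 MB direct-upload figures discussed in this article.

```python
import zipfile
from pathlib import Path

UNZIPPED_LIMIT = 250 * 1024 * 1024      # bytes; shared by function code and all layers
DIRECT_UPLOAD_LIMIT = 50 * 1024 * 1024  # bytes; zipped archive uploaded directly


def build_layer_zip(layer_root: Path, zip_path: Path) -> tuple[int, int]:
    """Zip the contents of layer_root so python/ sits at the archive root.

    zip_path must live outside layer_root. Returns (unzipped_bytes, zipped_bytes).
    """
    unzipped = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(layer_root.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(layer_root).as_posix())
                unzipped += path.stat().st_size
    return unzipped, zip_path.stat().st_size


def check_limits(unzipped: int, zipped: int) -> list[str]:
    """Return a human-readable problem for each limit that is exceeded."""
    problems = []
    if unzipped > UNZIPPED_LIMIT:
        problems.append(f"unzipped size {unzipped} exceeds {UNZIPPED_LIMIT} bytes")
    if zipped > DIRECT_UPLOAD_LIMIT:
        problems.append(f"zipped size {zipped} exceeds {DIRECT_UPLOAD_LIMIT} bytes; upload via S3")
    return problems
```

A build step can call build_layer_zip(Path("layer"), Path("layer.zip")) and fail the job when check_limits returns anything. Note that zipfile follows symlinks and stores their targets as regular files, so a layer that relies on symlinked shared libraries may come out larger than the same layer packed with the zip command.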

For authoritative size constraints and deployment limits, consult the official AWS Lambda quotas documentation. Automating this workflow ensures your geospatial functions remain lean, compliant, and optimized for production traffic.