Managing /tmp Storage Limits for GeoTIFF Extraction
Managing /tmp storage limits for GeoTIFF extraction in serverless environments means replacing disk-heavy raster workflows with streaming I/O, memory-mapped access, and explicit ephemeral storage configuration. GeoTIFFs routinely exceed default serverless /tmp capacity: AWS Lambda provides 512 MB by default, Google Cloud Functions backs /tmp with an in-memory filesystem that counts against the function's memory allocation, and Azure Functions limits vary by hosting plan. The most reliable mitigation is to avoid materializing full rasters on disk. Instead, use windowed reads via rasterio, stream tiles directly to object storage using GDAL virtual filesystems, and check /tmp usage against a strict threshold at the start of each invocation. When disk caching is unavoidable, configure platform-specific ephemeral storage quotas and tune GDAL's internal cache so that memory use and temporary files stay predictable.
Why /tmp Fails During Raster Processing
Serverless /tmp is an ephemeral, instance-local filesystem backed by RAM or local disk such as NVMe. It is not shared across instances, offers no durable persistence (a warm container may still hold files left by an earlier invocation), and shares I/O bandwidth with your execution environment. GeoTIFF extraction typically triggers three /tmp pressure points:
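- Full-file downloads: A naive requests.get() followed by a write to /tmp, or a boto3 download_file() call, places the entire raster on disk before processing.
- GDAL cache and temporary files: GDAL_CACHEMAX bounds only the in-memory block cache. An oversized value competes with your function's RAM, and some drivers (for example the COG driver) still write temporary files that can land in /tmp without warning.
- Intermediate format conversion: Reprojecting, clipping, or compressing without in-memory buffers creates intermediate and sidecar files such as .tmp or .aux.xml.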
Understanding Ephemeral Storage Limits in AWS Lambda is critical because AWS lets you size /tmp independently of memory (512 MB to 10,240 MB), while GCP and Azure use memory-backed, fixed, or tiered models. Whatever the provider, exceeding /tmp surfaces as OSError: [Errno 28] No space left on device during rasterio/GDAL operations; where /tmp is disk-backed, this happens even when plenty of RAM remains.
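A minimal sketch of how that failure could be made easier to diagnose, assuming it reaches your code as an OSError with errno 28 as described above. The names TmpExhaustedError and translate_enospc are hypothetical helpers for illustration, not part of rasterio or GDAL.

```python
import errno
import shutil
from contextlib import contextmanager


class TmpExhaustedError(RuntimeError):
    """Hypothetical error type: a write failed because the scratch partition is full."""


@contextmanager
def translate_enospc(tmp_dir: str = "/tmp"):
    """Re-raise ENOSPC failures with the scratch partition's usage attached."""
    try:
        yield
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise  # unrelated I/O error: leave it untouched
        usage = shutil.disk_usage(tmp_dir)
        raise TmpExhaustedError(
            f"{tmp_dir} is full: {usage.used / 1024**2:.0f} MB used "
            f"of {usage.total / 1024**2:.0f} MB"
        ) from exc
```

Wrapping a raster operation in `with translate_enospc():` keeps the original exception as the cause while adding the usage figures you would otherwise have to reconstruct from logs.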
Platform Quotas & Provisioning
Before processing, explicitly configure your function's ephemeral storage quota. On AWS Lambda you can provision between 512 MB and 10,240 MB of /tmp space; the setting is independent of memory allocation, and storage above the default 512 MB adds to cost. Review the broader Serverless Geospatial Architecture & Platform Limits to align storage provisioning with your expected raster footprint. Set GDAL_CACHEMAX to a conservative value (e.g., 256, which GDAL reads as megabytes) so the block cache leaves headroom in function memory. You can confirm the effective value by inspecting the function's environment variables or querying GDAL at runtime.
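As a rough sketch of the provisioning step on AWS, the helper below sizes /tmp from the largest raster you expect to stage on disk and clamps the result to Lambda's allowed range. The function names and the 1.5x headroom factor are assumptions for illustration; the boto3 call uses the EphemeralStorage parameter of update_function_configuration.

```python
import math

LAMBDA_MIN_TMP_MB = 512      # AWS Lambda default ephemeral storage
LAMBDA_MAX_TMP_MB = 10_240   # AWS Lambda maximum ephemeral storage


def ephemeral_storage_mb(raster_mb: float, headroom: float = 1.5) -> int:
    """Size /tmp for the largest on-disk raster, clamped to Lambda's range.

    `headroom` is an assumed safety factor, not an AWS recommendation.
    """
    wanted = math.ceil(raster_mb * headroom)
    return max(LAMBDA_MIN_TMP_MB, min(LAMBDA_MAX_TMP_MB, wanted))


def apply_ephemeral_storage(function_name: str, size_mb: int) -> None:
    """Apply the size to an existing function (needs boto3 and IAM permissions)."""
    import boto3  # imported lazily so the sizing helper has no dependencies

    boto3.client("lambda").update_function_configuration(
        FunctionName=function_name,
        EphemeralStorage={"Size": size_mb},
    )
```

For example, `ephemeral_storage_mb(2000)` asks for 3000 MB, while anything that would fit in the default allocation stays at 512 MB.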
Production Extraction Pattern
The following Python implementation extracts a spatial bounding box from a remote GeoTIFF and refuses to start if /tmp usage is already above a configurable threshold. It uses windowed reads, in-memory MemoryFile buffers, and direct cloud uploads, so the extraction itself stages nothing in /tmp.
import shutil
from urllib.parse import urlparse

import boto3
import rasterio
from rasterio.enums import Resampling
from rasterio.io import MemoryFile
from rasterio.windows import Window, from_bounds


def _get_tmp_usage_mb() -> float:
    """Return current /tmp usage in MB."""
    return shutil.disk_usage("/tmp").used / (1024 ** 2)


def _build_vsi_uri(uri: str) -> str:
    """Convert S3/HTTP(S) URIs to GDAL Virtual File System paths."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        return f"/vsis3/{parsed.netloc}{parsed.path}"
    if parsed.scheme in ("http", "https"):
        return f"/vsicurl/{uri}"
    return uri


def extract_geotiff_window(
    src_uri: str,
    dst_uri: str,
    bounds: tuple[float, float, float, float],
    max_tmp_mb: int = 50,
    resampling: Resampling = Resampling.bilinear,
    profile_overrides: dict | None = None,
) -> str:
    """
    Extract a spatial window from a cloud-hosted GeoTIFF without staging it in /tmp.
    Streams output directly to S3 via MemoryFile.
    """
    # Fail fast if /tmp is already above the threshold (e.g. leftovers in a warm container)
    tmp_used_mb = _get_tmp_usage_mb()
    if tmp_used_mb > max_tmp_mb:
        raise RuntimeError(
            f"/tmp usage ({tmp_used_mb:.1f}MB) exceeds threshold ({max_tmp_mb}MB)"
        )

    vsi_uri = _build_vsi_uri(src_uri)
    with rasterio.open(vsi_uri) as src:
        # bounds must already be in the source CRS: (left, bottom, right, top).
        # Snap to whole pixels and clip to the dataset extent.
        window = from_bounds(*bounds, transform=src.transform)
        window = window.round_offsets().round_lengths()
        window = window.intersection(Window(0, 0, src.width, src.height))

        # Read only the requested window into memory.
        # Note: resampling takes effect only when an out_shape is also passed.
        data = src.read(window=window, resampling=resampling)

        meta = src.meta.copy()
        meta.update({
            "width": int(window.width),
            "height": int(window.height),
            "transform": src.window_transform(window),
            "compress": "deflate",
            "tiled": True,
            "blockxsize": 256,
            "blockysize": 256,
            **(profile_overrides or {}),
        })

    # Write to an in-memory buffer to avoid disk I/O
    with MemoryFile() as memfile:
        with memfile.open(**meta) as dst:
            dst.write(data)

        # Upload directly to S3 once the dataset is closed and fully flushed
        memfile.seek(0)
        parsed_dst = urlparse(dst_uri)
        boto3.client("s3").put_object(
            Bucket=parsed_dst.netloc,
            Key=parsed_dst.path.lstrip("/"),
            Body=memfile.read(),
        )

    return dst_uri
Why This Pattern Works
- Zero-disk materialization: rasterio.io.MemoryFile keeps the extracted tile in RAM until upload, bypassing /tmp entirely.
- GDAL VSI routing: /vsis3/ and /vsicurl/ stream data directly from cloud storage without intermediate downloads.
- Safety guardrails: _get_tmp_usage_mb() stops the invocation early when /tmp is already close to full, for example because a warm container still holds files from earlier work on the same instance.
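Because the extracted window lives in RAM rather than on disk, it helps to estimate its footprint before reading. The sketch below is one way to do that. The helper names, the assumption of two copies in memory (the decoded array plus the encoded in-memory GeoTIFF), and the 50% reserve are illustrative choices, not measured values.

```python
# Bytes per sample for common GeoTIFF data types (dtype names as rasterio reports them).
DTYPE_BYTES = {
    "uint8": 1, "int16": 2, "uint16": 2,
    "int32": 4, "uint32": 4, "float32": 4, "float64": 8,
}


def window_footprint_mb(width: int, height: int, bands: int,
                        dtype: str, copies: int = 2) -> float:
    """Approximate peak RAM for a window, in MB.

    `copies` is an assumption: one decoded array plus one encoded buffer.
    Compression usually makes the second copy smaller, so this is an upper bound.
    """
    raw_bytes = width * height * bands * DTYPE_BYTES[dtype]
    return raw_bytes * copies / 1024**2


def fits_in_memory(footprint_mb: float, function_memory_mb: int,
                   reserve_fraction: float = 0.5) -> bool:
    """Leave a share of function memory for the runtime, GDAL cache, and boto3."""
    return footprint_mb <= function_memory_mb * (1 - reserve_fraction)
```

A 4096 x 4096 window with three float32 bands comes to 384 MB under these assumptions, which would fit a 1024 MB function but not a 512 MB one.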
GDAL Runtime Tuning & Cache Control
GDAL caches raster blocks in memory (by default up to 5% of RAM), and its HTTP layer keeps a separate cache of downloaded ranges. In constrained environments, leaving these at their defaults makes resource consumption hard to predict. Apply these environment variables at function initialization:
export GDAL_CACHEMAX=256
export GDAL_DISABLE_READDIR_ON_OPEN=TRUE
export VSI_CURL_CACHE_SIZE=104857600
GDAL_CACHEMAX caps in-memory block storage (a plain number below 100000 is read as megabytes), while VSI_CURL_CACHE_SIZE, given in bytes, limits HTTP range-request buffering. For deeper configuration guidance, consult the official GDAL Virtual File Systems documentation. Pair these settings with rasterio's blockxsize/blockysize overrides so output tiles align with the read patterns of cloud object storage.
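If you prefer to set these options from Python rather than the deployment configuration, a sketch along the following lines could run at module import time, before the first dataset is opened. The configure_gdal helper and GDAL_DEFAULTS mapping are hypothetical. The sizes are written as plain numbers, on the understanding that GDAL reads small GDAL_CACHEMAX values as megabytes and VSI_CURL_CACHE_SIZE as bytes.

```python
import os
from typing import Dict, Optional

# Hypothetical defaults for a small function; adjust to your memory allocation.
GDAL_DEFAULTS = {
    "GDAL_CACHEMAX": "256",                          # megabytes
    "GDAL_DISABLE_READDIR_ON_OPEN": "TRUE",          # skip directory listings on open
    "VSI_CURL_CACHE_SIZE": str(100 * 1024 * 1024),   # bytes
}


def configure_gdal(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Set GDAL options in the process environment.

    Values already present in the environment (for example from the deployment
    configuration) win over both the defaults and the overrides.
    """
    settings = {**GDAL_DEFAULTS, **(overrides or {})}
    for key, value in settings.items():
        os.environ.setdefault(key, value)
    return {key: os.environ[key] for key in settings}


# Run once at import time, before rasterio opens any dataset.
EFFECTIVE_GDAL_CONFIG = configure_gdal()
```

Returning the effective values makes it straightforward to log them once per cold start and confirm which configuration actually applied.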
Monitoring & Threshold Enforcement
Serverless runtimes recycle containers unpredictably. Relying on static /tmp assumptions leads to cascading failures during peak load. Implement these safeguards:
- Pre-invocation checks: Validate available ephemeral space before opening any raster. Fail fast with RuntimeError rather than allowing partial writes.
- Structured logging: Emit /tmp usage metrics alongside function duration and memory consumption. Track OSError: [Errno 28] occurrences as high-severity alerts.
- Graceful degradation: If a bounding box exceeds memory limits, split the request into smaller tiles using a quadtree or fixed-grid strategy. Queue oversized requests for batch processing rather than blocking synchronous invocations.
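The quadtree strategy mentioned under graceful degradation can be sketched as follows. This is an illustrative helper, not part of rasterio: it assumes square pixels of a known size in the same units as the bounds, and uses a pixel budget as a stand-in for the memory limit.

```python
import math
from typing import List, Tuple

Bounds = Tuple[float, float, float, float]  # (left, bottom, right, top)


def pixel_count(bounds: Bounds, pixel_size: float) -> int:
    """Number of pixels needed to cover `bounds` at the given resolution."""
    left, bottom, right, top = bounds
    return math.ceil((right - left) / pixel_size) * math.ceil((top - bottom) / pixel_size)


def quadtree_split(bounds: Bounds, pixel_size: float, max_pixels: int) -> List[Bounds]:
    """Recursively quarter `bounds` until every tile holds at most `max_pixels`."""
    if max_pixels < 1:
        raise ValueError("max_pixels must be at least 1")
    if pixel_count(bounds, pixel_size) <= max_pixels:
        return [bounds]

    left, bottom, right, top = bounds
    mid_x, mid_y = (left + right) / 2, (bottom + top) / 2
    quadrants = [
        (left, bottom, mid_x, mid_y),   # lower-left
        (mid_x, bottom, right, mid_y),  # lower-right
        (left, mid_y, mid_x, top),      # upper-left
        (mid_x, mid_y, right, top),     # upper-right
    ]
    tiles: List[Bounds] = []
    for quad in quadrants:
        tiles.extend(quadtree_split(quad, pixel_size, max_pixels))
    return tiles
```

Each returned tile can then be processed in its own invocation or placed on a queue, so no single request has to hold the full bounding box in memory.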
Proactive threshold management turns out-of-space conditions into early, explicit failures instead of partial writes, and keeps function behavior predictable under load. When designing geospatial pipelines, always treat /tmp as a volatile scratch space, not a staging area.
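To make the structured-logging safeguard concrete, the sketch below wraps a unit of work and emits a single JSON log line with /tmp usage before and after, the duration, and peak memory. The track_tmp helper and the logger name are hypothetical, and the peak-memory conversion assumes Linux, where ru_maxrss is reported in kilobytes.

```python
import json
import logging
import resource
import shutil
import time
from contextlib import contextmanager

logger = logging.getLogger("tmp_metrics")  # hypothetical logger name


@contextmanager
def track_tmp(label: str, path: str = "/tmp"):
    """Log scratch-space usage, duration, and peak RSS for the wrapped block."""
    before = shutil.disk_usage(path)
    started = time.monotonic()
    record = {
        "label": label,
        "tmp_total_mb": round(before.total / 1024**2, 1),
        "tmp_used_mb_before": round(before.used / 1024**2, 1),
    }
    try:
        yield record
    finally:
        # Runs on success and on failure, so failed invocations are logged too.
        after = shutil.disk_usage(path)
        record["tmp_used_mb_after"] = round(after.used / 1024**2, 1)
        record["duration_s"] = round(time.monotonic() - started, 3)
        # Assumes Linux: ru_maxrss is in kilobytes there.
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        record["peak_rss_mb"] = round(peak_kb / 1024, 1)
        logger.info(json.dumps(record))
```

Wrapping the extraction call in `with track_tmp("extract"):` yields one machine-readable line per invocation, which a log-based metric filter can turn into alerts on rising /tmp usage.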