🚀 Overview
Spatial Mask Merging (SMM) is a production-ready Python library for merging fragmented instance segmentation predictions using correlation clustering with R-tree spatial indexing. Designed for high-resolution imagery (aerial, satellite, medical), SMM delivers enterprise-grade performance with rigorous optimization and comprehensive testing.
📊 Research Impact: Our paper has attracted 655 views and 80 downloads since publication, demonstrating growing academic and industry interest in spatial mask merging techniques. Read the paper →
🎯 Production Status: Version 1.0.1 delivers enhanced maintainability through code refactoring while maintaining full backward compatibility. All tools now share GPU evaluation logic, eliminating ~500 lines of duplicate code and ensuring consistent behavior across the codebase.
🆕 v1.0.1 Improvements: Refactored tools module with shared gpu_evaluation.py module (403 lines). Reduced evaluation.py by 55% and optimize_smm.py by 41% through DRY architecture. Zero breaking changes, all tests passing. View release notes →
📺 Video Tutorial & Podcast
Video Explanation
Gigapixel Vision's Fatal Flaw: Learn how Spatial Mask Merging with R-tree indexing solves the fundamental problem of instance segmentation in high-resolution images. Covers the algorithm, performance optimizations, and real-world applications.
🎙️ Podcast Episode
Deep dive into the research: Detailed explanation of the algorithm, optimization strategies, GPU acceleration, and applications in aerial, satellite, and medical imaging.
Listen on: Spotify Podcasts
📑 Contents
✨ Key Features
- Optimized Core Algorithm: 50-200% speedup through graph optimization, vectorized IoU computation, and early termination strategies
- R-tree Spatial Indexing: Sub-linear query performance with automatic optimization detection and pure-Python fallback
- Dual Solving Backends: Exact ILP solver (PuLP) or fast greedy approximation for scalability trade-offs
- Anti-Chaining Constraint: Prevents inconsistent merges via pairwise compatibility enforcement
- GPU-Accelerated Evaluation: CUDA-enabled batch evaluation with automatic CPU fallback
- Shared Evaluation Module (v1.0.1): DRY architecture with centralized GPU evaluation logic (~500 lines eliminated)
- Hyperparameter Optimization: Bayesian tuning via Optuna with importance analysis and trial logging
- Comprehensive Validation: Input validation, type hints, error handling, and batch verification methods
- Rich Visualization: PDF rendering with polygon overlays, class labels, and confidence scores
- Clean Public API: Zero namespace pollution, semantic versioning, fully documented interfaces
- Enterprise Testing: 35+ unit tests covering edge cases, performance, and integration scenarios
⚡ Performance Characteristics
Core Algorithm Optimizations
| Component | Optimization | Speedup |
|---|---|---|
| Graph Construction | Triangle inequality pre-filtering (O(N³) → O(E·N)) | 50-100% |
| IoU Computation | Vectorized intersection/union with early exit | 2-3× |
| Edge Weight Calculation | Batch distance computation with NumPy broadcasting | 100-200% |
| R-tree Queries | Bulk insertion, optimized bounding box checks | 5-10% |
| Mask Operations | Optimized bbox-to-mask rasterization | 10-15% |
Memory & Scalability
- Sparse Graph Representation: Only stores edges above similarity threshold
- Spatial Pruning: R-tree limits candidate pairs to local neighborhoods
- Downscaling Support: Configurable resolution reduction for large-scale evaluation
- GPU Memory Management: Automatic batch sizing and device allocation
🏆 Quality Assurance
Code Quality Standards: Production-grade codebase with comprehensive testing, validation, and documentation.
Testing Coverage
- ✅ 35+ Unit Tests: Core algorithm, data structures, R-tree utilities, validation logic
- ✅ Edge Case Handling: Empty inputs, invalid dimensions, malformed data, extreme values
- ✅ Integration Tests: Cross-module compatibility, end-to-end workflows
- ✅ Performance Benchmarks: Regression detection, optimization validation
Code Standards
- ✅ 100% Type Hints: Complete type annotations for IDE support and static analysis
- ✅ Input Validation: Comprehensive checks with actionable error messages
- ✅ API Stability: Semantic versioning, backward compatibility guarantees
- ✅ Clean Namespace: No internal module pollution in public API
- ✅ Documentation: Docstrings, algorithm overviews, usage examples
Verification
All production claims verified via automated test suite. Run verification script:
python tests/verify_readme_claims.py
📦 Repository Structure
spatial-mask-merging/
├── smm/ # Core library (production-ready)
│ ├── __init__.py # Clean public API (v1.0.0)
│ ├── smm.py # Optimized SMM algorithm (ILP/Greedy)
│ ├── predictions.py # SMMPrediction/SMMAnnotation containers
│ └── rtree_utils.py # R-tree spatial indexing with fallback
│
├── tools/ # Production utilities
│ ├── gpu_evaluation.py # Shared GPU evaluation module (v1.0.1)
│ ├── optimize_smm.py # Optuna hyperparameter tuning
│ ├── evaluation.py # GPU-accelerated batch evaluation
│ └── visualization.py # PDF rendering with overlays
│
├── tests/ # Comprehensive test suite (35+ tests)
│ ├── test_smm_*.py # Core algorithm tests
│ ├── test_predictions_*.py # Data structure tests
│ ├── test_rtree_*.py # Spatial indexing tests
│ ├── test_init_*.py # API tests
│ ├── test_refactored_tools.py # Refactoring validation (v1.0.1)
│ └── verify_readme_claims.py # Production readiness validation
│
├── docs/ # Documentation
│ ├── algorithm_overview.md # Mathematical foundation
│ ├── changelog.md # Version history
│ └── citation.bib # Academic reference
│
├── examples/ # Usage examples and notebooks
├── requirements.txt # Python dependencies
├── setup.py # Package installation
├── README.md # Comprehensive documentation
└── index.html # Project website (this file)
🧮 Algorithm Overview
Spatial Mask Merging (SMM) formulates instance merging as a correlation clustering problem on a weighted spatial graph. Unlike greedy heuristics, SMM globally optimizes partition quality by balancing merge vs. separation decisions.
Core Pipeline
- Graph Construction: Each prediction is a vertex. Edges connect spatial neighbors (R-tree search radius
ρ) with weights encoding:- Spatial proximity: normalized by distance threshold
τd - Mask overlap (IoU): normalized by
τi - Confidence consistency: uses detection scores
si, sj
Edge weight formula:
w_ij = β₁ · max(0, 1 - D_ij/τ_d) + β₂ · I_ij + β₃ · min(s_i, s_j) - Spatial proximity: normalized by distance threshold
- Spatial Pruning: R-tree retrieves candidate neighbors within radius
ρfor scalability (O(log N) queries) - Triangle Inequality Filtering: Pre-compute transitive bounds to skip redundant edge evaluations (O(N³) → O(E·N))
- Correlation Clustering: Partition graph to maximize within-cluster edge weight minus separation penalty
λ:
Solved exactly via ILP (PuLP) or approximately via greedy pivotingmax Σ w_ij · [i,j in same cluster] - λ · Σ [i,j in different clusters] - Anti-Chaining Constraint: Threshold
γenforces pairwise compatibility within clusters (prevents indirect merges) - Mask Fusion: Merge function
Φ(A)combines cluster masks via pixel-wise OR, bbox union, and score aggregation
Mathematical Properties
- NP-Hardness: General correlation clustering is NP-complete; ILP provides optimal solution
- Approximation Guarantee: Greedy backend achieves O(log N) approximation ratio
- Idempotence: Single-object clusters remain unchanged:
Φ({o}) = o - Spatial Consistency: Only spatially adjacent objects with high similarity merge
⚙️ Algorithm Parameters
SMM behavior is controlled by spatial thresholds (candidate generation) and clustering parameters (merge resolution). Optimal values are domain-specific—use tools/optimize_smm.py for automated tuning.
Hyperparameter Reference
| Parameter | Description | Typical Range | Impact |
|---|---|---|---|
| τd | Distance scale (pixels) for spatial proximity normalization | 5–30 | Higher → merge distant objects |
| τi | IoU normalization threshold for overlap contribution | 0.1–0.9 | Lower → require less overlap |
| ρ | R-tree search radius (pixels) for candidate edge generation | 10–50 | Larger → more edges, slower |
| β₁ | Weight of distance term in edge score | 0.2–0.4 | Emphasizes spatial proximity |
| β₂ | Weight of IoU term in edge score | 0.4–0.6 | Emphasizes mask overlap |
| β₃ | Weight of confidence term in edge score | 0.1–0.3 | Emphasizes detection confidence |
| λ | Correlation clustering penalty (merge vs. separate trade-off) | 0.1–2.0 | Higher → fewer, larger clusters |
| γ | Pairwise compatibility threshold for anti-chaining | 0.3–0.7 | Higher → stricter merging |
Note: Weights β₁, β₂, β₃ should sum to ~1.0 for interpretability. Use tools/optimize_smm.py with Optuna for Bayesian hyperparameter optimization (20-50 trials recommended).
🔧 Installation
Standard Installation
# Clone repository
git clone https://github.com/mihajlov39547/spatial-mask-merging.git
cd spatial-mask-merging
# (Recommended) Create virtual environment
python3 -m venv .venv
source .venv/bin/activate # Linux/Mac
# or
.venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Install package in editable mode
pip install -e .
Optional Dependencies
# GPU-accelerated evaluation (CUDA required)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# Exact ILP solver (requires system libspatialindex)
pip install rtree pulp
# Hyperparameter optimization
pip install optuna
# Visualization tools
pip install matplotlib opencv-python-headless
Minimal Requirements
- Python 3.8+
- NumPy, SciPy, NetworkX (graph algorithms)
- Pillow (image I/O)
- Pandas (data handling)
🚀 Usage Examples
1. Core SMM API
Merge predictions programmatically:
from smm import SpatialMaskMerging, SMMPrediction
import numpy as np
# Load your model predictions (boolean masks)
masks = [...] # List of 2D boolean arrays
scores = [0.91, 0.88, 0.85]
labels = ["car", "car", "car"]
# Create prediction containers
preds = [
SMMPrediction(mask=mask, score=score, label=label)
for mask, score, label in zip(masks, scores, labels)
]
# Configure SMM (ILP for exactness, or "greedy" for speed)
smm = SpatialMaskMerging(
mode="ilp", # Exact solver
iou_weight=1.0, # β₂
dist_weight=0.5, # β₁
conf_weight=0.2, # β₃
similarity_threshold=0.4, # λ
distance_threshold=15, # τ_d
iou_threshold=0.3, # τ_i
search_radius=25, # ρ
gamma=0.5 # anti-chaining
)
# Execute merging
merged = smm.merge(preds)
# Access results
for obj in merged:
print(f"Label: {obj.label}, Score: {obj.score:.2f}, Mask shape: {obj.mask.shape}")
2. Hyperparameter Optimization
Tune SMM parameters using Optuna (Bayesian optimization):
# Optimize on validation dataset (predictions + ground truth JSONs)
python tools/optimize_smm.py \
--pred_dir /data/predictions \
--gt_dir /data/ground_truth \
--img_dir /data/images \
--out_dir ./optimization_results \
--mode ilp \
--trials 50 \
--timeout 3600
Outputs:
best_params_ilp.json— Optimal hyperparameters (includes mode in filename)smm_ilp_hparam_importance.json/.pdf— Parameter sensitivity analysissmm_ilp_optuna_trials.csv— Trial history with metrics and timing
Tip: Start with 20-30 trials for quick exploration, then refine with 50+ trials. Use --timeout to limit per-trial execution time.
3. GPU-Accelerated Batch Evaluation
Evaluate merged predictions against ground truth with CUDA acceleration:
# Automatic GPU detection (falls back to CPU if unavailable)
python tools/evaluation.py \
--pred_dir /data/merged_predictions \
--gt_dir /data/ground_truth \
--img_dir /data/images \
--out_csv ./results/metrics_ilp.csv \
--iou_thr 0.5 \
--downscale 4
Metrics Computed:
- Precision, Recall, F1 Score (detection quality)
- Dice Coefficient (mask overlap)
- Panoptic Quality (PQ) — unified segmentation metric
- Average Fragments per GT Object (fragmentation assessment)
- Count Error (prediction count vs. ground truth)
- Mean Absolute Error (GPU-only, per-pixel accuracy)
Performance: GPU evaluation is 10-50× faster than CPU for large datasets. Use --downscale (2-8) to reduce memory usage for very high-resolution images.
4. Visualization
Render masks as PDF overlays on source images:
# Visualize predictions
python tools/visualization.py \
--pred_dir /data/predictions \
--image_dir /data/images \
--output_dir ./visualizations
# Visualize ground truth
python tools/visualization.py \
--gt_dir /data/ground_truth \
--image_dir /data/images \
--output_dir ./gt_visualizations
Generates <image>_pred_visualization.pdf or _gt_visualization.pdf with polygon overlays, class labels, and confidence scores. Supports both JSON predictions and TXT label formats.
🛠 Tools & Utilities
gpu_evaluation.py — Shared GPU Module (v1.0.1)
- Purpose: Centralized GPU evaluation utilities for all tools (DRY architecture)
- Functions: compute_metrics_gpu(), compute_metrics_cpu(), load_mask_from_segmentation()
- Features: Adaptive VRAM chunking, vectorized mask IoU, automatic GPU/CPU fallback
- Constants: CHUNK_P=1024, CHUNK_G=256, IOU_THRESHOLD=0.5, DOWNSCALE_FACTOR=4
- Benefits: Single source of truth, consistent metrics, easier maintenance
optimize_smm.py — Hyperparameter Tuning
- Backend: Optuna (Tree-structured Parzen Estimator)
- Search Space: All 8 SMM parameters (τd, τi, ρ, β₁, β₂, β₃, λ, γ)
- Objective: Maximize F1 score on validation set
- Outputs: Best params JSON, importance analysis (JSON/PDF), trial history CSV
- Features: Parallel trials, timeout handling, mode-specific naming
- Architecture (v1.0.1): Uses shared gpu_evaluation module (361 lines, -41% code)
evaluation.py — Batch Evaluation
- GPU Acceleration: CUDA kernel for mask IoU and metrics (via PyTorch)
- CPU Fallback: Automatic NumPy implementation if CUDA unavailable
- Metrics: Precision, Recall, F1, Dice, PQ, fragment count, count error, MAE
- Scalability: Downscaling support for large images, batch processing
- Output: Per-image CSV with all metrics and timing statistics
- Architecture (v1.0.1): Uses shared gpu_evaluation module (207 lines, -55% code)
visualization.py — Rendering
- Format Support: JSON predictions, TXT ground truth labels
- Rendering: Polygon overlays with matplotlib, PDF export
- Annotation: Class labels, confidence scores, instance IDs
- Consistency: Shared colormap across datasets, high-DPI output
📘 Documentation
- README.md — Comprehensive project documentation with quick start, API reference, and examples
- algorithm_overview.md — Mathematical foundation, graph construction, correlation clustering theory
- changelog.md — Version history, feature additions, optimization milestones
- citation.bib — BibTeX reference for academic citation
Additional Resources
- GitHub Wiki: Extended tutorials and FAQ
- Issue Tracker: Bug reports and feature requests
- Releases: Version downloads and release notes
📄 Citation
📈 Impact Metrics: 655 views · 80 downloads · Growing academic and industry adoption
If you use SMM in your research, please cite our paper:
@article{mihajlovic2025enhancing,
title={Enhancing Instance Segmentation in High-Resolution Images Using
Slicing-Aided Hyper Inference and Spatial Mask Merging
Optimized via R-Tree Indexing},
author={Mihajlovic, Marko and Marjanovic, Marina},
journal={Mathematics},
volume={13},
number={19},
pages={3079},
year={2025},
publisher={MDPI},
doi={10.3390/math13193079},
note={Special Issue: Mathematics Applications of Artificial Intelligence
and Computer Vision}
}
Paper: https://doi.org/10.3390/math13193079
Published: Mathematics, Vol. 13, Issue 19, 2025 (MDPI)
📜 License
MIT License — Copyright © 2025 Marko Mihajlović, with contributions from Marina Marjanović
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
🙏 Acknowledgments
This research was conducted at the Faculty of Informatics and Computing, Singidunum University, Belgrade, Serbia, as part of developing advanced post-processing techniques for high-resolution aerial and satellite imagery segmentation.
Special thanks to the open-source community for foundational libraries: NumPy, SciPy, NetworkX, PuLP, Rtree, Optuna, and PyTorch.