imgdd: Image DeDuplication
imgdd is a performance-first perceptual hashing library that combines Rust's speed with Python's accessibility, making it well suited to large datasets. It is designed to quickly process nested folder structures, which are common in image datasets.
Features
- Multiple Hashing Algorithms: Supports aHash, dHash, mHash, pHash, wHash.
- Multiple Filter Types: Supports Nearest, Triangle, CatmullRom, Gaussian, Lanczos3.
- Identify Duplicates: Quickly identify duplicate hash pairs.
- Simplicity: Simple interface, robust performance.
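To make the idea of "duplicate hash pairs" concrete, here is a minimal pure-Python sketch of a difference hash (dHash) and of grouping images by identical hash. It does not use imgdd's API: the function names `dhash` and `duplicate_pairs` and the tiny pixel grids are hypothetical, and the resize step that a real implementation performs first (where the filter type comes in) is omitted.

```python
from collections import defaultdict
from itertools import combinations


def dhash(pixels):
    """Difference hash of a 2D grayscale grid (rows of equal length).

    Each bit records whether a pixel is brighter than its right-hand neighbour.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def duplicate_pairs(images):
    """Return sorted pairs of image names whose hashes are identical."""
    by_hash = defaultdict(list)
    for name, pixels in images.items():
        by_hash[dhash(pixels)].append(name)
    pairs = []
    for names in by_hash.values():
        pairs.extend(combinations(sorted(names), 2))
    return sorted(pairs)


# Hypothetical 3x4 grayscale "images"; b.png is a uniformly brightened copy of a.png.
images = {
    "a.png": [[10, 20, 15, 5], [0, 0, 9, 3], [7, 6, 5, 4]],
    "b.png": [[30, 40, 35, 25], [20, 20, 29, 23], [27, 26, 25, 24]],
    "c.png": [[5, 4, 3, 2], [1, 2, 3, 4], [9, 9, 9, 9]],
}

print(duplicate_pairs(images))  # [('a.png', 'b.png')]
```

Because dHash only compares neighbouring pixels, a uniform brightness change leaves the hash unchanged, which is why `a.png` and `b.png` are reported as a pair while `c.png` is not.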
Why imgdd?
imgdd is inspired by imagehash and aims to be a much faster replacement with additional features. To verify its performance, imgdd has been benchmarked against imagehash. In Python, imgdd consistently outperforms imagehash by ~60%–95%, a significant reduction in hashing time per image.
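As a rough illustration of how a figure like "a 60%–95% reduction in hashing time per image" can be read, the sketch below converts two per-image timings into a percentage reduction and a speedup factor. The helper names and the timings are hypothetical and are not taken from the imgdd benchmarks.

```python
def percent_reduction(baseline, candidate):
    """Percentage by which `candidate` time is lower than `baseline` time."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - candidate) / baseline


def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline."""
    if candidate <= 0:
        raise ValueError("candidate must be positive")
    return baseline / candidate


# Hypothetical per-image timings in microseconds (not measured values).
baseline_us = 10_000
candidate_us = 2_000

print(percent_reduction(baseline_us, candidate_us))  # 80.0
print(speedup(baseline_us, candidate_us))            # 5.0
```

Under this reading, an 80% reduction means each image takes one fifth of the baseline time.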