At the moment it has to detect and track stream hash collisions itself. It would be possible to track these more easily in the database itself, without taking additional space, which would remove the significant bulk of scavenge.db. This may also make the accumulation phase of the scavenge faster.
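
To illustrate why tracking collisions separately can be bulky, here is a minimal sketch of one way a hash collision detector could work. It is not the actual scavenge implementation: `CollisionDetector`, `tiny_hash`, and the one-name-per-hash layout are all hypothetical, chosen only to show that detecting a collision requires remembering a stream name for every hash seen so far.

```python
from typing import Callable, Dict, Set


class CollisionDetector:
    """Hypothetical sketch: remembers the first stream name seen per hash."""

    def __init__(self, hash_fn: Callable[[str], int]) -> None:
        self._hash_fn = hash_fn
        # One entry per distinct hash. In this sketch, this map is the part
        # that grows with the number of streams.
        self._first_seen: Dict[int, str] = {}
        self._collisions: Set[str] = set()

    def add(self, stream: str) -> bool:
        """Record a stream; return True if it collides with a different stream."""
        h = self._hash_fn(stream)
        existing = self._first_seen.setdefault(h, stream)
        if existing == stream:
            return False
        # Two different names share a hash: both must be tracked by name.
        self._collisions.add(existing)
        self._collisions.add(stream)
        return True

    @property
    def collisions(self) -> Set[str]:
        return set(self._collisions)

    @property
    def tracked_entries(self) -> int:
        return len(self._first_seen)


def tiny_hash(name: str) -> int:
    # Deliberately weak 4-bit hash (assumption for the demo) so that
    # collisions are easy to produce.
    return sum(name.encode("utf-8")) % 16


detector = CollisionDetector(tiny_hash)
for name in ["ab", "ac", "ba"]:
    detector.add(name)
print(sorted(detector.collisions))
```

In this sketch the map holds an entry for every distinct hash even though only a few streams actually collide, which is the kind of overhead that could be avoided if the database already held the information needed to detect collisions.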