Dropbox · DSA · SDE-2 · Technical Phone Screen
Find Duplicate Files in a System
Problem
Given a list of directory paths, together with the files in each directory and their contents, group the paths of all files that have identical content.
Example
Group files by content hash, returning one list of paths for each set of duplicate files.
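A minimal sketch of the input and output shape. It assumes the input format commonly used for this problem (for example, LeetCode 609): each string is a directory path followed by space-separated entries of the form `name.txt(content)`. That format and the function name `find_duplicates` are assumptions; the original question does not spell them out. Because the contents are small strings in this format, the sketch groups directly by content in a dictionary and keeps only groups with two or more paths.

```python
from collections import defaultdict


def find_duplicates(paths: list[str]) -> list[list[str]]:
    """Group full file paths by identical content.

    Assumed (hypothetical) input format: "dir name1.txt(content1) name2.txt(content2)".
    """
    by_content: dict[str, list[str]] = defaultdict(list)
    for entry in paths:
        # First token is the directory; the remaining tokens are "name(content)" items.
        directory, *files = entry.split()
        for item in files:
            # Split at the first "(" so the file name is everything before it.
            name, _, rest = item.partition("(")
            content = rest[:-1]  # drop the trailing ")"
            by_content[content].append(f"{directory}/{name}")
    # Only contents shared by two or more files count as duplicates.
    return [group for group in by_content.values() if len(group) > 1]
```

For example, `find_duplicates(["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"])` returns `[["root/a/1.txt", "root/c/3.txt"], ["root/a/2.txt", "root/c/d/4.txt", "root/4.txt"]]`.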
Constraints
- Many files; content can be large
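Since content can be large, a file should not be loaded into memory whole before hashing. A minimal sketch, assuming SHA-256 as the hash and a 1 MiB chunk size (both illustrative choices, and `file_digest` is a hypothetical helper name), streams the file through the hash one chunk at a time so memory use stays bounded regardless of file size.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until EOF; only one chunk is held in memory at a time.
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

Equal digests make identical content overwhelmingly likely but do not strictly prove it; if hash collisions must be ruled out, a byte-by-byte comparison of the grouped files can confirm each match.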
Approach
Hash each file's content and group the file paths by hash. This is highly relevant to Dropbox's deduplication and is a reported interview question.
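A sketch of the approach applied to files on disk, under stated assumptions: files live under a local directory tree, SHA-256 is the hash, and `find_duplicate_files` is a hypothetical name. It adds a common refinement that the original approach does not mention: bucket files by size first (a cheap metadata lookup), since a file with a unique size cannot have a duplicate, and then hash only files that share a size. Groups are keyed by `(size, digest)`.

```python
import hashlib
import os
from collections import defaultdict


def _sha256(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file so large contents never sit fully in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root: str) -> list[list[str]]:
    """Group regular files under `root` whose contents hash identically."""
    # Pass 1: bucket by size; files with a unique size cannot be duplicates.
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if not os.path.islink(path) and os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_key: dict[tuple[int, str], list[str]] = defaultdict(list)
    for size, candidates in by_size.items():
        if len(candidates) < 2:
            continue
        for path in candidates:
            by_key[(size, _sha256(path))].append(path)

    # Keep only groups with at least two files; sort paths for stable output.
    return [sorted(group) for group in by_key.values() if len(group) > 1]
```

With many files, the size pass usually eliminates most candidates before any content is read, which matters more than the choice of hash function.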
added 6 days ago