Dropbox · DSA · SDE-2 · Technical Phone Screen

Find Duplicate Files in a System

Problem

Given a list of directory paths with file contents, group all files that have identical content.

Example

Group files by a hash of their content; every group that contains more than one path is a set of duplicates.
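To make the example concrete, here is a minimal sketch of that grouping. The input shape (a list of `(path, content)` pairs), the sample paths, the function name `group_duplicates`, and the choice of SHA-256 are all assumptions made for illustration; the original question does not specify them.

```python
import hashlib
from collections import defaultdict


def group_duplicates(files):
    """Group paths whose content is byte-for-byte identical.

    `files` is an iterable of (path, content_bytes) pairs. This input
    shape is an assumption, not part of the original problem statement.
    """
    by_digest = defaultdict(list)
    for path, content in files:
        # Use the content hash as the grouping key.
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(path)
    # Keep only groups with at least two paths, i.e. real duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Hypothetical sample data.
files = [
    ("root/a/1.txt", b"abcd"),
    ("root/a/2.txt", b"efgh"),
    ("root/c/3.txt", b"abcd"),
    ("root/c/d/4.txt", b"efgh"),
    ("root/5.txt", b"zzzz"),
]
groups = group_duplicates(files)
print(groups)
# [['root/a/1.txt', 'root/c/3.txt'], ['root/a/2.txt', 'root/c/d/4.txt']]
```

`root/5.txt` has unique content, so it appears in no group.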

Constraints

  • Many files; content can be large

Approach

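Hash each file's content and group the paths by hash; paths that share a hash are duplicates. The problem is closely related to Dropbox's deduplication and is a reported interview question.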
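Because there may be many files and content can be large, one possible refinement is to bucket files by size first and then hash only the candidates, reading each file in chunks. The sketch below assumes the files live on a local filesystem under a single root directory; `find_duplicates`, `file_digest`, the 1 MiB chunk size, and SHA-256 are illustrative choices, not something the original question prescribes.

```python
import hashlib
import os
import tempfile
from collections import defaultdict

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary illustrative value


def file_digest(path):
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    # Pass 1: bucket by size; files of different sizes cannot be equal.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with another file.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Small demo on a temporary directory with hypothetical files.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a"))
    samples = [("a", "1.txt", b"abcd"), ("a", "2.txt", b"efgh"), ("", "3.txt", b"abcd")]
    for sub, name, data in samples:
        with open(os.path.join(root, sub, name), "wb") as f:
            f.write(data)
    groups = [[os.path.relpath(p, root) for p in g] for g in find_duplicates(root)]
print(groups)
```

The size pass is cheap because it reads only metadata, so unique-sized files are never opened. A hash match is treated as equality here; if that is not acceptable, a final byte-by-byte comparison within each group would remove the dependence on the hash.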

added 6 days ago