ddupes
Differences
These are the differences between the selected revision and the current version of the page.
ddupes [2011/10/11 22:46] – 2.3 pietro | ddupes [2017/01/20 19:10] (current version) – [What is this?] true edit pietro | ||
---|---|---|---|
Line 5: | Line 5: | ||
===== What is this? ===== | ===== What is this? ===== | ||
+ | |||
**ddupes** is a Python program which extends fdupes' action to directories. | **ddupes** is a Python program which extends fdupes' action to directories. | ||
**ffdupes** (//" | **ffdupes** (//" | ||
+ | |||
+ | **Update:** when I wrote this page, I was unaware of the //many// other command line tools for finding duplicate files: for instance, Debian provides not only fdupes, but also rdfind, hardlink, finddup, and duff. I //have no idea// how they compare to ffdupes: they may well outperform it. I did not, however, find any replacement for ddupes. Note that the different tools have incompatible interfaces (arguments and output), so ddupes cannot use their output. | ||
fdupes/ | fdupes/ | ||
Line 28: | Line 31: | ||
necessarily read //all// files it must compare: instead, it first tries to | necessarily read //all// files it must compare: instead, it first tries to | ||
compare the heads, and reads the rest only if they match. | compare the heads, and reads the rest only if they match. | ||
+ | |||
+ | A larger test (thanks, Florian Bruhin!), run on 2.5 TB of data in | ||
+ | ~727 000 files, gave the following results: | ||
+ | * fdupes: | ||
+ | * ffdupes: 4 Hours 19 Minutes | ||
+ | * ddupes: | ||
That said, in the worst case in which there are many files which are almost | That said, in the worst case in which there are many files which are almost | ||
Line 35: | Line 44: | ||
If ffdupes is used with the " | If ffdupes is used with the " | ||
- | run statistically | + | run slower |
run faster than fdupes in //all// cases). | run faster than fdupes in //all// cases). | ||
Line 68: | Line 77: | ||
of members) groups of duplicates, which reside in directories which are very | of members) groups of duplicates, which reside in directories which are very | ||
similar but not identical. This should be a quite remote eventuality, | similar but not identical. This should be a quite remote eventuality, | ||
- | do find some patologic | + | do find some pathological |
===== Who should I blame if this sucks? ===== | ===== Who should I blame if this sucks? ===== | ||
Line 74: | Line 83: | ||
Pietro Battiston - < | Pietro Battiston - < | ||
- | Last version of ddupes can always be found at | + | Last version of ddupes can always be found at http:// |
- | http:// | + | The source repo can be obtained with |
+ | git clone git:// | ||
+ | and browsed at http:// | ||
===== Requirements ===== | ===== Requirements ===== | ||
ddupes and ffdupes are written in Python, so you need python to run them. | ddupes and ffdupes are written in Python, so you need python to run them. |
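
The "compare the heads first" idea described in the performance notes above can be sketched roughly as follows. This is only an illustration, not the actual ffdupes code: the function name `same_content`, the head size of 4096 bytes, the chunk size and the preliminary file-size check are all assumptions made for this example.

```python
import os

# Hypothetical head size: the value actually used by ffdupes is not stated on this page.
HEAD_SIZE = 4096


def same_content(path_a, path_b, head_size=HEAD_SIZE, chunk_size=1 << 16):
    """Return True if the two files have identical content (illustrative helper)."""
    # Assumption: files of different sizes are rejected without reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        # Compare the heads first: many non-duplicates already differ here.
        if file_a.read(head_size) != file_b.read(head_size):
            return False
        # The heads match: only now read the rest, one chunk at a time.
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:
                # Both files ended at the same point with no mismatch.
                return True
```

With a check of this kind, two files that differ early cost only one short read each, while files that are almost identical still have to be read nearly in full, which is the worst case mentioned above.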
ddupes.1318366012.txt.gz · Last modified: 2011/10/11 22:46 by pietro