rmlint-2.0.0 - a lint/duplicate finder [rewrite of old rmlint, testers wanted]
rmlint finds wasted space and other broken things on your file system and offers to remove them.
It is especially good at finding duplicate files and getting rid of
them. You might argue that most of this can be done with a few lines of bash, but
can you do it fast too? And how do you get rid of the results?
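To make the "few lines of bash" point concrete, here is a minimal Python sketch of the classic approach a duplicate finder can take: group files by size first, then hash only the files whose size collides with another file's. This is an illustration under our own assumptions (BLAKE2b as the hash, empty files and symlinks skipped), not rmlint's actual implementation.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """Hash a file in chunks so large files never sit in memory at once."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of groups (sorted lists of paths) with identical content."""
    # Pass 1: group by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            size = os.path.getsize(path)
            if size > 0:  # skip empty files
                by_size[size].append(path)

    # Pass 2: hash only files that share their size with at least one other file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Note that two hardlinks to the same inode would show up as a "duplicate" group in this sketch; a real tool has to detect that case separately.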
rmlint has been completely rewritten in the meantime.
Roughly 800 commits are included in this release, which resulted in:
Much cleaner and more extensible code with fewer bugs.
More tests and fewer nonsense features (--junkchars, --oldtmp, POSIX regex!).
Typical speedups of 2x to 8x over the original version, sometimes even more.
A saner, more Unix-ish command-line interface.
Okay, that was vague - Sorry. Concrete features are:
Exchangeable hash algorithms. (cryptographic and non-cryptographic)
Ability to find duplicate directories. (experimental)
Filter duplicates by basename, file extension or only files newer than a certain mtime.
Localization support (help needed!)
More output formats. (shell/python script, json/csv dump, a progress bar...)
Support for reading files from stdin. (using "-" as file)
More options to guess the original in a set of duplicates. (--sortcriteria option)
…
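The --sortcriteria idea (deciding which file in a set of duplicates is the original) can be illustrated with a small Python sketch. The rule used here (oldest mtime first, then shallowest path, then alphabetical) is a hypothetical example; rmlint's actual criteria and defaults are documented in its manpage and are not reproduced here. The helper name pick_original is also hypothetical.

```python
import os


def pick_original(paths):
    """Hypothetical rule: the file with the oldest mtime is the original.

    Ties are broken by path depth, then alphabetically, so the result is
    deterministic. Everything else in the group is treated as a duplicate.
    """
    def key(path):
        return (os.stat(path).st_mtime, path.count(os.sep), path)

    ordered = sorted(paths, key=key)
    return ordered[0], ordered[1:]
```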
With this said: the new version is not compatible with the old one. Do not assume it works with the same options!
Note, however, that the new version never deletes files itself; it gives you the tools to do so.
ANY HELP NEEDED?
It's still fresh software that needs packagers, translators, bugfixers and mostly testers.
People who want to port rmlint to other platforms (OSX, BSD*) are welcome too, of course.
In any case, GitHub is where the action should happen.
If you know a little Python, adding a test case to our test suite along with your bug report would be great.
At this point: a big thanks to my co-author SeeSpotRun, who made this happen.
Thanks for creating this tool. I just installed it on Linux Mint. Is there a discussion forum for rmlint? I have some usage questions.
If not, maybe you can just answer my question:
What do I have to do to get roughly the same functionality as "fdupes -rL" (replace identical files with hardlinks to the first occurrence)?
Current situation: I have quite a lot of photos. When I edit some, I back them up first. But this takes up a lot of space, so now I want to hardlink all unchanged photos.
Code:
$ rmlint . -c sh:use_ln
Recursion is done by default, so no -r switch is needed. The -L behaviour is emulated by `-c sh:use_ln`.
With this switch we configure the `sh` formatter to change the default removal command from rm to rm + ln.
By the way, other config values can be looked up in the manpage or in the documentation.
Afterwards you will notice that rmlint wrote its script a bit differently:
if you execute ./rmlint.sh now, it will ask you for confirmation, then remove each duplicate first (rm -f ...) and replace it with a hardlink afterwards (ln ...).
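For readers who want to see what the rm + ln pair does, here is a minimal Python equivalent of the two steps for a single duplicate. The function name replace_with_hardlink is hypothetical, and this is not the script rmlint generates; it only mirrors the rm -f, then ln order of the generated script.

```python
import os


def replace_with_hardlink(original, duplicate):
    """Remove `duplicate` (like `rm -f`), then hardlink it to `original` (like `ln`)."""
    # Nothing to do if both paths already point to the same inode.
    if os.path.samefile(original, duplicate):
        return
    os.remove(duplicate)          # step 1: rm -f duplicate
    os.link(original, duplicate)  # step 2: ln original duplicate
```

Hardlinks only work within one filesystem, and between the two calls the duplicate path briefly does not exist. A more cautious variant would create the link under a temporary name and then rename it over the duplicate with os.replace().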
P.S.: We also have an IRC channel (#rmlint) on irc.freenode.net.
Anyway. How can I prevent rmlint from checking for bad files?
And I think I found a bug, or at least some not-so-nice behaviour (using the stable version):
When nothing at all was found, a functionally empty rhlink.sh is still created. IMHO, if nothing was found, no rmlint.sh should be created.
Thanks for your fast reply. I already found it. For me
There was a little bug that reported bad links even though you told it not to. Fixed in master.
Is the size parameter supposed to find all non-empty files? It does no harm, but with --types df it should only report
duplicates with a size >= 1 anyway.
Quote:
Originally Posted by Enkidu70
And I think I found a bug, or at least some not-so-nice behaviour (using the stable version):
When nothing at all was found, a functionally empty rhlink.sh is still created. IMHO, if nothing was found, no rmlint.sh should be created.
Okay, that's somewhat true. Also implemented in master.
Thanks, you are right. Perfect.
Quote:
Originally Posted by sahib_bommelig
Also implemented in master.
Fine. When will this be available in the stable version?
master is the stable version. "master" is just the default branch in the git repository.
Stable versions are released at https://github.com/sahib/rmlint/releases.
Those are just snapshots of master, though, and may take some time to reach your distribution.
Compiling from source is a bit more work but guarantees the latest version.
Well... because there is no package available for Linux Mint (Ubuntu).
None of the developers use Ubuntu or any .deb-based distribution.
Just ask the next friendly maintainer you find for a Debian package.
Quote:
Originally Posted by Enkidu70
But the version does not change (it stays at 2.0.0, and an empty rmlint.sh is still created):
The version did not change because it is still alpha software. Until the official release, only the git revision changes.
Speaking of which, your revision is quite recent, so that's okay. Still, I wonder what's going wrong. How do you run rmlint now, and what do you expect from it?
I tested the following:
Code:
$ mkdir empty
$ rmlint -T df empty
==> 1 file(s) after investigation, nothing to search through.
$ stat rmlint.sh
stat: "rmlint.sh": No such file or directory
If it finds empty files or directories, it will of course write a script too.
Also check whether the script was already there from a previous run, and whether you accidentally installed
an older version (unlikely, since you checked that with --version) in /usr/local or somewhere similar.
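The behaviour requested earlier in the thread (write no script at all when nothing was found) boils down to an early return. A minimal Python sketch, using a hypothetical helper write_removal_script that is far simpler than rmlint's real sh formatter (no confirmation prompt, no sanity checks):

```python
import os
import shlex


def write_removal_script(duplicates, path="rmlint.sh"):
    """Write a shell script removing `duplicates`, but only if there are any.

    Returns True if a script was written, False otherwise.
    """
    if not duplicates:
        return False  # nothing found: leave no (empty) script behind
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
        for dup in duplicates:
            # shlex.quote protects paths containing spaces or quotes.
            f.write(f"rm -f {shlex.quote(dup)}\n")
    os.chmod(path, 0o755)
    return True
```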
Seems to work, but...
Code:
==> In total 23867 files, of which 0 are duplicates in 0 groups.
==> This equals 0 B of duplicates that can be removed.
A sh file was written to /home/enkidu/Bilder/tmp/rmlint.sh.
We're proud to release the new rmlint version 2.2.0 "Dreary Dropbear"!
rmlint is a fast, feature-rich, but still easy-to-use lint and duplicate file finder.
This new release includes over 400 commits and some noticeable improvements:
- Improved speed, particularly for byte-by-byte comparison option "-pp".
- Reduced memory footprint. This is particularly important for very large data sets (>5 million files) which rmlint now handles with ease.
- Fixed some annoying bugs and crashes (especially on 32-bit).
- Improved testsuite to ensure internal program integrity during development.
Reminder: we still feature a nice progress bar (-g), finding of duplicate
directories (-D) and fast byte-by-byte comparison (-pp).
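The -pp option compares file contents directly instead of relying on a checksum. The core idea can be sketched in Python as a chunked comparison that stops at the first differing chunk. The function name same_content and the 64 KiB chunk size are assumptions; this is not rmlint's own -pp code.

```python
def same_content(path_a, path_b, chunk_size=1 << 16):
    """Compare two files byte by byte, stopping at the first difference."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # differing bytes, or one file ended early
            if not chunk_a:
                return True   # both files ended at the same point
```

Unlike a hash, this cannot produce a false positive, but it has to read both files in full when they really are identical.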
We're looking for:
- Testers and morale boosters. Give us some feedback via the issue tracker.
- Packagers for other distributions. You can also vote for the AUR package to get it included in the official repos.
- Translators (only French and German are available at present).
- Beer money is appreciated too, of course.
Developers:
Here's what we're currently working on:
- An easy GUI for those in need (Prototype)
- Extend testsuite (current coverage as per lcov output)
- Automated speed regression tests (early benchmark)
- Faster re-running of rmlint (improved --cache)
- Sort output files by certain criteria (e.g. to find the biggest size suckers)
- Make the shell script perform sanity checks.
We're happy to release the new rmlint version 2.4.0 "Myopic Micrathene".
If you wonder what a Micrathene is, look here.
Here's the newsticker:
- A new optional GUI frontend based on Python/GTK3.
- A benchmark suite to protect against performance regression.
- Support for btrfs and reflink-capable filesystems: files can now be deduplicated by the filesystem using the BTRFS_IOC_FILE_EXTENT_SAME ioctl if the user specifies -c sh:clone.
- New --replay option that reprocesses the json file(s) of a previous run.
- New --sort-by option that sorts rmlint's output. Sort for example by size (--sort-by s) to print the biggest size suckers first.
- The shell script now performs sanity checks before removing files and can be told to double-check the files before removing them.
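As a rough illustration of --sort-by s, here is a Python sketch that orders duplicate groups by file size, largest first, and computes how many bytes each group would free. DupGroup, sort_by_size and wasted_bytes are hypothetical names, and rmlint's exact ordering and tie-breaking may differ.

```python
from typing import List, NamedTuple


class DupGroup(NamedTuple):
    size: int         # size of each file in the group, in bytes
    paths: List[str]  # all paths with identical content


def wasted_bytes(group: DupGroup) -> int:
    # Every copy beyond the first one could be reclaimed.
    return group.size * (len(group.paths) - 1)


def sort_by_size(groups: List[DupGroup]) -> List[DupGroup]:
    """Largest files first, so the biggest size suckers are printed first."""
    return sorted(groups, key=lambda g: g.size, reverse=True)
```

Usage: `for g in sort_by_size(groups): print(g.size, wasted_bytes(g), g.paths)`. Sorting by wasted_bytes instead would rank many small copies above a single large duplicate pair, which may or may not be what you want.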
That's of course a short list for about 700 commits.
While we're a reasonably healthy open source project, we can't do everything alone.
This is not only due to time constraints, but also due to our inability to test or package
rmlint on other systems, or to translate it into languages we don't speak.
In particular we want help on these topics:
- Packagers, particularly for Debian/Ubuntu. See here for more info.
There is already a package for Arch, thanks to Massimiliano Torromeo!
- Translators: See here for more information.
- Testers and Patchers. Especially for the new GUI, since it is a separate codebase.
- Beer money is always welcome.
Plans for upcoming releases:
Not many. We'd like to stabilise rmlint now and move forward in smaller version steps.