LinuxQuestions.org
Old 01-01-2015, 08:39 PM   #1
sahib_bommelig
LQ Newbie
 
Registered: Jan 2015
Posts: 8

Rep: Reputation: Disabled
rmlint-2.0.0 - a lint/duplicate finder [rewrite of old rmlint, testers wanted]


HELLO PEOPLE.
  1. https://github.com/sahib/rmlint/tree/develop (GitHub)
  2. https://github.com/sahib/rmlint/issues (Feature/Bug/Issue tracker)
  3. https://travis-ci.org/sahib/rmlint (TravisCI teststatus)
  4. https://rmlint.rtfd.org (Documentation)

rmlint finds wasted space and other broken things on your file system and offers to remove them.
It is especially good at finding duplicates among your files and getting rid of
them. You might argue that most of this can be done with a few lines of bash, but
can you do it fast too? And how do you get rid of the results?
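
For comparison, the "few lines of bash" approach might look like the minimal sketch below. It is not how rmlint works; it assumes GNU coreutils (`sha256sum`), and `find_dupes_naive` is a hypothetical helper name. It hashes every file in full, even when the file size alone already rules out a duplicate, which is one reason such a script gets slow on large trees.

```shell
# Naive duplicate finder: hash every regular file, then group equal hashes.
# Illustration only; find_dupes_naive is a made-up name, not an rmlint command.
find_dupes_naive() {
    # $1: directory to scan
    find "$1" -type f -exec sha256sum {} + |
        sort |
        awk '{
            hash = $1
            path = substr($0, 67)   # 64 hex digits plus two separator characters
            if (hash == prev) {
                # Print the first member of a group once, then every further member.
                if (!shown) { print prev_path; shown = 1 }
                print path
            } else {
                shown = 0
            }
            prev = hash
            prev_path = path
        }'
}
```

Calling `find_dupes_naive ~/Pictures` prints every path whose content is shared with at least one other file; nothing is deleted. File names containing newlines or backslashes are not handled, because `sha256sum` escapes them in its output.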

Some of you might remember this tool, since it had a thread on the Arch forums.

SO WHAT'S NEW?

rmlint has been completely rewritten in the meantime.
More than 800 commits are included in this release, which resulted in:
  1. Much cleaner and more extensible code with fewer bugs.
  2. More tests and fewer nonsense features (--junkchars, --oldtmp, posix regex!).
  3. Speedups that are regularly between 2x and 8x over the original, sometimes more.
  4. A saner, more unix-ish command-line interface.
Okay, that was vague - Sorry. Concrete features are:
  1. Exchangeable hashsum-algorithms. (cryptographic and non-cryptographic)
  2. Ability to find duplicate directories. (experimental)
  3. Filter duplicates by basename, file extension or only files newer than a certain mtime.
  4. Localization support (help needed!)
  5. More output formats. (shell/python script, json/csv dump, a progressbar...)
  6. Support for reading files from stdin. (using "-" as file)
  7. More options to guess the original in a set of duplicates. (--sortcriteria option)

With this said:
The new version is not compatible with the old one. Do not assume it works with the same options!
Note also that the new version never deletes files itself; it only gives you the weapons to do so.

ANY HELP NEEDED?

It's still fresh software that needs packagers, translators, bugfixers and, most of all, testers.
People that want to port rmlint to other platforms (OSX, BSD*) are welcome too of course.
In any case, GitHub is where the action should happen.
If you know a little Python, adding a testcase to our testsuite along with your bugreport would be great.


At this point: a big thanks to my co-author SeeSpotRun, who made this happen.

I WANT IT!

There are ready-made packages for some distributions already.
See here for more details: http://rmlint.readthedocs.org/en/latest/install.html

General feedback is welcome.

Enjoy!
 
Old 01-02-2015, 02:40 PM   #2
Enkidu70
LQ Newbie
 
Registered: Jan 2015
Distribution: Linuxmint
Posts: 6

Rep: Reputation: Disabled

Hi,

tnx for creating this tool. I just installed it on Linuxmint. Is there a discussion forum for rmlint? I have some usage questions.

If not, maybe you can just answer my question:

What do I have to do to get about the same functionality as "fdupes -rL" (replace identical files with hardlinks to the first occurrence)?

Current situation: I have quite a lot of photos. When I edit some, I back them up first. But this consumes a lot of space. Now I want to hardlink all unchanged photos.


Tnx in advance,
Enkidu
 
Old 01-02-2015, 03:40 PM   #3
sahib_bommelig
Quote:
Originally Posted by Enkidu70
What do I have to do to get about the same functionality as "fdupes -rL" (replace identical files with hardlinks to the first occurrence)?
Code:
$ rmlint . -c sh:use_ln
Recursion is done by default, so no -r switch is needed. The -L is emulated by `-c sh:use_ln`:
With this switch we configure the `sh` formatter to change the default removal command from rm to rm + ln.
Other config values can be looked up in the manpage or in the documentation linked above, btw.

Afterwards you will notice that rmlint wrote its script a bit differently:

Code:
$ grep 'duplicate' ./rmlint.sh -B1 
echo  '/home/sahib/rmlint/test_dupes/1' # original
rm -f '/home/sahib/rmlint/test_dupes/2' && ln  '/home/sahib/rmlint/test_dupes/1' '/home/sahib/rmlint/test_dupes/2' # duplicate
If you execute the ./rmlint.sh now, it will ask you for confirmation and remove the duplicate first (rm -f...) and replace it with a hardlink afterwards (ln ...).
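
The effect of that `rm -f ... && ln ...` line can be tried out safely on throwaway files. What follows is a minimal sketch, not part of rmlint: `replace_with_hardlink` is a hypothetical helper, and the `cmp` check in front is an addition for illustration (the generated script quoted above does not show one).

```shell
# Replace a duplicate with a hardlink to the original, mirroring the
# "rm -f ... && ln ..." line from the generated script. Hypothetical helper.
replace_with_hardlink() {
    # $1: original, $2: duplicate
    cmp -s "$1" "$2" || return 1    # refuse unless the contents are identical
    rm -f "$2" && ln "$1" "$2"
}

# Demo on temporary files only.
demo=$(mktemp -d)
printf 'same\n' > "$demo/1"
printf 'same\n' > "$demo/2"
replace_with_hardlink "$demo/1" "$demo/2"
# Both names now refer to one inode, so the data is stored only once.
[ "$demo/1" -ef "$demo/2" ] && echo "hardlinked"
rm -rf "$demo"
```

Two properties of hardlinks matter for the photo use case: they only work within one filesystem, and a program that edits a file in place changes the content seen under both names.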


P.S: We also have an IRC channel (#rmlint) on irc.freenode.net.
 
Old 01-02-2015, 04:03 PM   #4
Enkidu70

Quote:
Originally Posted by sahib_bommelig View Post
Code:
$ rmlint . -c sh:use_ln
Tnx for your fast reply. I already found it. For me
Code:
rmlint --types=df --size="1-9999GB" --config sh:use_ln=true
is what I basically need.

Anyway. How can I prevent rmlint from checking for bad files?

And I guess I found a bug, or at least a not-so-nice behaviour (using the stable version):
When really nothing was found, a functionally empty rmlint.sh is created. If nothing was found, IMHO no rmlint.sh should be created.


Cheers,
Enkidu
 
Old 01-02-2015, 04:49 PM   #5
sahib_bommelig
Quote:
Originally Posted by Enkidu70 View Post
Tnx for your fast reply. I already found it. For me
Anyway. How can I prevent rmlint from checking for bad files?
There was a little bug that reported bad links even though you told it not to. Fixed in master.

Is the size parameter supposed to find all non-empty files? It does no harm, but with --types df it should only report
duplicates with a size >= 1 anyway.

Quote:
Originally Posted by Enkidu70 View Post
And I guess I found a bug, or at least a not-so-nice behaviour (using the stable version):
When really nothing was found, a functionally empty rmlint.sh is created. If nothing was found, IMHO no rmlint.sh should be created.
Okay, that's somewhat true. Also implemented in master.

Regards,
Chris
 
Old 01-02-2015, 05:25 PM   #6
Enkidu70

Quote:
Originally Posted by sahib_bommelig View Post
Is the size parameter supposed to find all non-empty files? It does no harm, but with --types df it should only report
duplicates with a size >= 1 anyway.
Tnx, you are right. Perfect.

Quote:
Originally Posted by sahib_bommelig View Post
Also implemented in master.
Fine. When will this be available in the stable version?


Cheers,
Enkidu
 
Old 01-02-2015, 05:36 PM   #7
sahib_bommelig
Quote:
Originally Posted by Enkidu70 View Post
Fine. When will this be available in the stable version?
master is the stable version. "master" is just the default branch in the git repository.
Stable versions are released at https://github.com/sahib/rmlint/releases.
Those are just snapshots of master though, and may take some time to get into your distribution.
Compiling from source is a bit more work but guarantees the latest version.
 
Old 01-02-2015, 06:42 PM   #8
Enkidu70
Quote:
Originally Posted by sahib_bommelig View Post
master is the stable version.
Well... Because there is no package available for Linuxmint (Ubuntu), I installed with
Code:
echo
rmlint --version
echo

rm -rf rmlint
git clone -b master https://github.com/sahib/rmlint.git
cd rmlint/
scons -j4
sudo scons --prefix=/usr install

echo
rmlint --version
echo
But the version does not change (it stays at 2.0.0, and an empty rmlint.sh is still created):
Code:
version 2.0.0 compiled: Jan  2 2015 at [17:34:47] "Personable Pidgeon" (rev a9fa1b6)
compiled with: +mounts +nonstripped +fiemap +sha512 +bigfiles +intl 

Cloning into 'rmlint'...
remote: Counting objects: 6302, done.
remote: Compressing objects: 100% (56/56), done.
remote: Total 6302 (delta 32), reused 0 (delta 0)
Receiving objects: 100% (6302/6302), 4.05 MiB | 865.00 KiB/s, done.
Resolving deltas: 100% (4590/4590), done.
Checking connectivity... done
scons: Reading SConscript files ...
Checking whether the C compiler worksyes
Checking for git revision... (cached) a9fa1b6
Checking for pkg-config... yes
Checking for glib-2.0 >= 2.32... yes
Checking for blkid... yes
Checking whether __SSE4_2__ is declared... yes
Checking whether blkid_devno_to_wholedisk is declared... yes
Checking for existence of /sys/block... (cached) yes
Checking for C function sysctlbyname()... no
Checking for C header file libelf.h... yes
Checking for C library libelf... yes
Checking for C type struct fiemap... yes
Checking size of off_t ... yes
Checking for C function stat64()... yes
Checking whether G_CHECKSUM_SHA512 is declared... yes
Checking for C header file mntent.h... yes
Checking for C function getmntinfo()... no
Checking for C header file locale.h... yes
scons: done reading SConscript files.
scons: Building targets ...
Building manpage from rst...
msgfmt po/de.po -o po/de.mo
msgfmt po/fr.po -o po/fr.mo
build_config_template(["src/config.h"], ["src/config.h.in"])
Using sphinx-build binary: /usr/bin/sphinx-build
Compiling ==> src/checksums/murmur3.c
Compiling ==> src/utilities.c
Compiling ==> src/session.c
Compiling ==> src/shredder.c
Compiling ==> src/checksums/city.c
Compiling ==> src/preprocess.c
Compiling ==> src/main.c
Compiling ==> src/treemerge.c
Compiling ==> src/traverse.c
Compiling ==> src/settings.c
Compiling ==> src/cmdline.c
Compiling ==> src/file.c
Compiling ==> src/checksum.c
Compiling ==> src/formats.c
Compiling ==> src/checksums/spooky-c.c
Compiling ==> src/formats/csv.c
Compiling ==> src/formats/fdupes.c
Compiling ==> src/formats/json.c
Compiling ==> src/formats/pretty.c
Compiling ==> src/formats/progressbar.c
build_python_formatter(["src/formats/py.c"], ["src/formats/py.c.in"])
Compiling ==> src/formats/py.c
Compiling ==> src/formats/sh.c
Compiling ==> src/formats/summary.c
Compiling ==> src/formats/timestamp.c
Compiling ==> src/libart/art.c
Linking Program ==> rmlint
gzip_file(["docs/rmlint.1.gz"], ["docs/_build/man/rmlint.1"])
scons: done building targets.
scons: Reading SConscript files ...
Checking whether the C compiler works(cached) yes
Checking for git revision... (cached) a9fa1b6
Checking for pkg-config... (cached) yes
Checking for glib-2.0 >= 2.32... (cached) yes
Checking for blkid... (cached) yes
Checking whether __SSE4_2__ is declared... (cached) yes
Checking whether blkid_devno_to_wholedisk is declared... (cached) yes
Checking for existence of /sys/block... (cached) yes
Checking for C function sysctlbyname()... (cached) no
Checking for C header file libelf.h... (cached) yes
Checking for C library libelf... (cached) yes
Checking for C type struct fiemap... (cached) yes
Checking size of off_t ... (cached) yes
Checking for C function stat64()... (cached) yes
Checking whether G_CHECKSUM_SHA512 is declared... (cached) yes
Checking for C header file mntent.h... (cached) yes
Checking for C function getmntinfo()... (cached) no
Checking for C header file locale.h... (cached) yes
scons: done reading SConscript files.
scons: Building targets ...
build_config_template(["src/config.h"], ["src/config.h.in"])
Compiling ==> src/checksum.c
Compiling ==> src/cmdline.c
Compiling ==> src/file.c
Compiling ==> src/formats.c
Compiling ==> src/main.c
Compiling ==> src/preprocess.c
Compiling ==> src/session.c
Compiling ==> src/settings.c
Compiling ==> src/shredder.c
Compiling ==> src/traverse.c
Compiling ==> src/treemerge.c
Compiling ==> src/utilities.c
Compiling ==> src/checksums/city.c
Compiling ==> src/formats/csv.c
Compiling ==> src/formats/fdupes.c
Compiling ==> src/formats/json.c
Compiling ==> src/formats/pretty.c
Compiling ==> src/formats/progressbar.c
build_python_formatter(["src/formats/py.c"], ["src/formats/py.c.in"])
Compiling ==> src/formats/py.c
Compiling ==> src/formats/sh.c
Compiling ==> src/formats/summary.c
Compiling ==> src/formats/timestamp.c
Linking Program ==> rmlint
Install file: "rmlint" as "/usr/bin/rmlint"
msgfmt po/de.po -o po/de.mo
Install file: "po/de.mo" as "/usr/share/locale/de/LC_MESSAGES/rmlint.mo"
msgfmt po/fr.po -o po/fr.mo
Install file: "po/fr.mo" as "/usr/share/locale/fr/LC_MESSAGES/rmlint.mo"
Building manpage from rst...
Using sphinx-build binary: /usr/bin/sphinx-build
gzip_file(["docs/rmlint.1.gz"], ["docs/_build/man/rmlint.1"])
Install file: "docs/rmlint.1.gz" as "/usr/share/man/man1/rmlint.1.gz"
scons: done building targets.

version 2.0.0 compiled: Jan  3 2015 at [01:24:20] "Personable Pidgeon" (rev a9fa1b6)
compiled with: +mounts +nonstripped +fiemap +sha512 +bigfiles +intl
Enkidu
 
Old 01-02-2015, 07:12 PM   #9
sahib_bommelig
Quote:
Originally Posted by Enkidu70 View Post
Well... Because there is no package available for Linuxmint (Ubuntu)
None of the developers use Ubuntu or any other .deb-based distribution.
Just ask the next friendly maintainer you find for a Debian package.

Quote:
Originally Posted by Enkidu70 View Post
But the version does not change (it stays at 2.0.0, and an empty rmlint.sh is still created):
The version did not change because it is still alpha software. Before the official release, only the git revision changes.
Speaking of that, yours is quite recent, so that's okay. Still, I wonder what's going wrong. How do you run rmlint now, and what do you expect from it?
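
Since only the git revision changes between builds, that is the value worth comparing. A minimal sketch, assuming the version line keeps the `(rev ...)` format shown in the build log earlier in this thread; `extract_rev` is a hypothetical helper:

```shell
# Pull the git revision out of a version line like the one rmlint prints.
extract_rev() {
    # Reads version text on stdin and prints the hex token after "rev ".
    sed -n 's/.*(rev \([0-9a-f]*\)).*/\1/p'
}

# Sample line copied from the build log in this thread:
line='version 2.0.0 compiled: Jan  3 2015 at [01:24:20] "Personable Pidgeon" (rev a9fa1b6)'
printf '%s\n' "$line" | extract_rev    # prints: a9fa1b6
```

Inside the checkout, `git rev-parse --short HEAD` prints the revision of the source tree. If it matches the value extracted from `rmlint --version`, the installed binary was built from that commit.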

I tested the following:

Code:
$ mkdir empty
$ rmlint -T df empty
==> 1 file(s) after investigation, nothing to search through.
$ stat rmlint.sh
stat: "rmlint.sh": No such file or directory
If it finds empty files or directories, it will of course write a script too.
Also check whether the script was already there from a previous run, and whether you accidentally installed
an older version (unlikely, since you checked that with --version) in /usr/local or similar.
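
To rule out a second installation, something like the following sketch lists every rmlint found on the PATH, in lookup order. `list_installs` is a hypothetical helper; in bash, `type -a rmlint` gives much the same information.

```shell
# Print every executable file named "$1" in the directories of $PATH.
# The first line printed is the one the shell would actually run.
list_installs() (
    # The function body is a subshell, so IFS and "set -f" do not leak out.
    IFS=:
    set -f    # keep PATH entries from being glob-expanded
    for dir in $PATH; do
        candidate="${dir:-.}/$1"    # an empty PATH entry means the current directory
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
)

list_installs rmlint
```

If this prints both /usr/local/bin/rmlint and /usr/bin/rmlint, for example, the first one shadows the second. A leftover script from an earlier run can be spotted with `ls -l rmlint.sh` before starting rmlint again.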
 
Old 01-03-2015, 01:00 PM   #10
Enkidu70

Quote:
Originally Posted by sahib_bommelig View Post
How do you run rmlint now, and what do you expect from it?
Seems to work, but...
Code:
==> 23867 files in total, of which 0 are duplicates in 0 groups.
==> This corresponds to 0 B of duplicates that can be removed.

An sh file was written to /home/enkidu/Bilder/tmp/rmlint.sh.
Suppress the msg also, please!

Enkidu
 
Old 01-03-2015, 02:18 PM   #11
sahib_bommelig
Quote:
Originally Posted by Enkidu
Suppress the msg also, please!
Pardon, forgot that. Fixed in master.
 
Old 01-04-2015, 06:04 AM   #12
Enkidu70
Quote:
Originally Posted by sahib_bommelig View Post
Pardon, forgot that. Fixed in master.
Tnx! It is doing a very good job!

Anyway, I would say it is beta now. So maybe you can also "implement" proper versioning!?

Cheers,
Enkidu
 
Old 05-09-2015, 07:08 PM   #13
sahib_bommelig
We're proud to release the new rmlint version 2.2.0 "Dreary Dropbear"!

Rmlint is a fast, feature-rich but still easy-to-use lint and duplicate file finder.
This new release includes over 400 commits and some noticeable improvements:

- Improved speed, particularly for byte-by-byte comparison option "-pp".
- Reduced memory footprint. This is particularly important for very large data sets (>5 million files) which rmlint now handles with ease.
- Fixed some annoying bugs and crashes (especially on 32bit).
- Improved testsuite to ensure internal program integrity during development.

Reminder: We still feature a nice progressbar (-g), finding duplicate
directories (-D) and fast byte-by-byte comparison (-pp).

Links:

- GitHub
- Documentation
- Full Changelog

Support wanted:

Non-developers:

- Testers and morale boosters. Give us some feedback via Issue Tracker.
- Packagers for other distributions. You can also vote for the AUR package to get included in the official repos.
- Translators (only French and German available at present)
- Beer money is appreciated too of course.

Developers:

Here's what we're currently working on:

- An easy GUI for those in need (Prototype)
- Extend testsuite (current coverage as per lcov output)
- Automated speed regression tests (early benchmark)
- Faster re-running of rmlint (improved --cache)
- Sort output files by certain criteria (e.g. to find the biggest size suckers)
- Make shell script perform sanity checks.

Have fun!
 
Old 10-25-2015, 09:55 AM   #14
sahib_bommelig
Hello,

we're happy to release the new rmlint version 2.4.0 Myopic Micrathene.
If you wonder what a Micrathene is, look here.

Here's the newsticker:

- A new optional GUI frontend based on Python/GTK3.
- A benchmark suite to protect against performance regression.
- Support for btrfs and reflink-capable filesystems: Files can now be deduplicated by the filesystem using the BTRFS_IOC_FILE_EXTENT_SAME ioctl if the user specifies -c sh:clone.
- New --replay option that reprocesses the json file(s) of a previous run.
- New --sort-by option that sorts rmlint's output. Sort for example by size (--sort-by s) to print the biggest size suckers first.
- The shellscript now does sanity checks before removing files and can be told to double check the files before removing them.

That's of course a short list for about 700 commits.

Links:

- GitHub
- Documentation
- Changelog
- IssueTracker

Support wanted:

While we're a somewhat healthy Open Source project, we can't do everything alone.
This is not only due to time constraints, but also because we are unable to test and package
rmlint on other systems or to translate it into languages we don't speak.

In particular we want help on these topics:

- Packagers, particularly for Debian/Ubuntu. See here for more info.
There is already a package for Arch, thanks to Massimiliano Torromeo!
- Translators: See here for more information.
- Testers and Patchers. Especially for the new GUI, since it is a separate codebase.
- Beer money is always welcome.

Plans for upcoming releases:
Not many. We'd like to stabilise rmlint now and go up in smaller version jumps.

Have fun while killing some files.
 
  

