[SOLVED] sequential : how to find the missing numbers within a sequence of files that have sequential numbers attached to them?
$file =~ s/-.*$//; #remove from second hyphen to end
to
Code:
$file =~ s/\..*$//; #remove from '.' to end
And "it will cut" (sorry..been watching Forged In Fire )
Oh yeah, OK: two periods with an escape. I was just escaping one period.
Code:
foreach $file (@files) {
print "first $file";
$file =~ s/FileName-//;
#$file =~ s/^.*?-//; #remove from beginning to first hyphen
print "second $file";
#$file =~ s/-.*$//; #remove from second hyphen to end
$file =~ s/.mp4//;
$existnums[$file]=$file; #save in array
print " third $file";
}
Then I just decided to use the actual name,
and got this.
Code:
first /run/media/userx/3TB-External/Files-Resampled/FileName-168.mp4
second /run/media/userx/3TB-External/Files-Resampled/168.mp4
third /run/media/userx/3TB-External/Files-Resampled/168
first /run/media/userx/3TB-External/Files-Resampled/FileName-176.mp4
second /run/media/userx/3TB-External/Files-Resampled/176.mp4
third /run/media/userx/3TB-External/Files-Resampled/176
first /run/media/userx/3TB-External/Files-Resampled/FileName-179.mp4
second /run/media/userx/3TB-External/Files-Resampled/179.mp4
third /run/media/userx/3TB-External/Files-Resampled/179
still prints out every number
let me change that to what you got and see what happens
Note:
That is a strange way to populate the array, without an incrementing index? Or is it because the array element is not getting a number, given how the name is being chopped down?
Code:
$existnums[$file]=$file; #save in array
because $file should equal a number
your new way got me this.
Code:
first /run/media/userx/3TB-External/Files-Resampled/FileName-176.mp4
second /run/media/userx/3TB
third /run/media/userx/3TB
first /run/media/userx/3TB-External/Files-Resampled/FileName-179.mp4
second /run/media/userx/3TB
third /run/media/userx/3TB
1
2
3
4
5
6
7
8
9
10
11
12
13
code here
Code:
## remove leading and trailing parts
foreach $file (@files) {
print "first $file";
#$file =~ s/FileName-//;
#$file =~ s/^.*?-//; #remove from beginning to first hyphen
$file =~ s/-.*$//;
print "second $file";
#$file =~ s/-.*$//; #remove from second hyphen to end
#$file =~ s/.mp4//;
$file =~ s/\..*$//; #remove from '.' to end
$existnums[$file]=$file; #save in array
print " third $file";
}
## remove leading and trailing parts
foreach $file (@files) {
print "first $file";
$file =~ s/FileName-//;
#$file =~ s/^.*?-//; #remove from beginning to first hyphen
#$file =~ s/-.*?-//; #remove from second hyphen to end
print "second $file";
#$file =~ s/-.*$//; #remove from second hyphen to end
#$file =~ s/.mp4//;
$file =~ s/\..*$//; #remove from '.' to end
$existnums[$file]=$file; #save in array
print " third $file";
}
get this
Code:
first /run/media/userx/3TB-External/Files-Resampled/FileName-179.mp4
second /run/media/userx/3TB-External/Files-Resampled/179.mp4
third /run/media/userx/3TB-External/Files-Resampled/179
But I do not understand why the path is still there.
In bash it gets chopped off, leaving me with just the number inside the variable.
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/path/to/files';
opendir(my $DIRECTORY, $dir) || die "Can't open $dir: $!\n";
my %numhash = ();
for (my $min = 0; $min <= 999; $min++) {
my @files = readdir $DIRECTORY;
for my $this_file (@files) {
if ($this_file =~ /^(.*?)-(\d+?)-(.*?)\.ext$/) {
$numhash{$2} = 1;
}
}
}
my @sortedarray = (sort {$a <=> $b} keys %numhash);
for (my $i = $sortedarray[0]; $i < $sortedarray[-1]; $i++) {
if (exists $numhash{sprintf "%03d", $i}) {
next;
} else {
print STDERR "$i is missing!\n";
}
}
nope
Code:
userx%slackwhere ⚡ scripts ⚡> ./perl-number-list
Use of uninitialized value in numeric lt (<) at ./perl-number-list line 23.
Use of uninitialized value $i in numeric lt (<) at ./perl-number-list line 23.
line 23
Code:
for (my $i = $sortedarray[0]; $i < $sortedarray[-1]; $i++) {
userx%slackwhere ⚡ scripts ⚡> ./perl-number-list
7813 is missing!
7814 is missing!
7815 is missing!
7816 is missing!
7817 is missing!
7818 is missing!
7819 is missing!
7820 is missing!
7821 is missing!
7822 is missing!
7823 is missing!
7824 is missing!
7825 is missing!
7826 is missing!
7827 is missing!
7828 is missing!
7829 is missing!
7830 is missing!
7831 is missing!
changed it to
Code:
for (my $min = 0; $min <= 270; $min++) {
my @files = readdir $DIRECTORY;
270, but that didn't stop it from going nuts with the numbers.
AGAIN - fixed it. I removed some files with malformed names.
Code:
userx%slackwhere ⚡ scripts ⚡> ./perl-number-list
162 is missing!
169 is missing!
170 is missing!
172 is missing!
173 is missing!
174 is missing!
175 is missing!
181 is missing!
186 is missing!
195 is missing!
196 is missing!
197 is missing!
198 is missing!
245 is missing!
I just have not visually checked it against the list.
It's getting really late here, so I will trust your work, mark this solved, and check the numbers in the morning.
I added a print statement there to see if it's picking up any files. Can you run it like that with your changes? (It would probably be easier for you to just add the print $this_file, "\n"; line to whatever you have saved.)
Edit:
OK, I guess you got it working.
Yes, thanks to both of you Perl jockeys, @scasey as well ...
Actually, I don't think you need that loop with $min at all. It is left over from a previous thought on how to do it... there's a slang word for useless code that builds up in a program, but I forget what it is at the moment.
Here's my final version according to your first criteria:
Code:
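#!/usr/bin/perl
use strict;
use warnings;

my $dir = '/path/to/files';
opendir(my $DIRECTORY, $dir) || die "Can't open $dir: $!\n";

# Record every number found in a name of the form prefix-NNN-suffix.ext
my %numhash = ();
my @files = readdir $DIRECTORY;
closedir $DIRECTORY;
for my $this_file (@files) {
    if ($this_file =~ /^(.*?)-(\d+?)-(.*?)\.ext$/) {
        $numhash{$2} = 1;
    }
}

# Stop early if nothing matched; otherwise the loop below
# warns about uninitialized values
my @sortedarray = (sort {$a <=> $b} keys %numhash);
die "No matching files in $dir\n" unless @sortedarray;

# Walk from the lowest to the highest number and report the gaps
# (assumes the numbers in the file names are zero-padded to three digits)
for (my $i = $sortedarray[0]; $i < $sortedarray[-1]; $i++) {
    if (! exists $numhash{sprintf "%03d", $i}) {
        printf STDERR "%03d is missing!\n", $i;
    }
}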
Last edited by Laserbeak; 07-07-2017 at 09:45 PM.
Reason: Oops, one final change... make sure missing numbers are front padded with zeros to three digits
Ya gotta understand that ext is just a generic abbreviation for extension, ya just gotta, I tell ya!
Posted from phone
Yeah, I just posted according to the original theoretical specs since you seem to know more how to customize it to your system and situation than I do....
Quote:
but I do not understand why that path to is still there?
in bash it gets chopped off and leaving me with just the number inside of the variable.
Ahh. I was using a working_dir of . so there was no path on the result. Still:
Code:
$file =~ s/^.*?-//; #remove from beginning to first hyphen
removes everything up to and including the first hyphen in the string, and that string includes the path.
Oh. You changed that to
Code:
$file =~ s/FileName-//;
which doesn't account for the path in front of the file name, so it's still there.
My regex says: match from the beginning of the string (^) any character (.) any number of times (*), as few as possible (?), up to a hyphen (-), and replace with nothing.
Now that there are not two hyphens in the file name, the ? should go. Your path has hyphens of its own (3TB-External, Files-Resampled), so with the ? the match stops at the first of those and leaves External/Files-Resampled/FileName-179.mp4 behind. Without it, .* is greedy and runs to the last hyphen.
Your regex says replace 'FileName-' with nothing, so you aren't removing the path/to/the/file from the result of the find command.
Rerun with your print statements, but use s/^.*-// (my regex without the ?) instead. You should see:
Code:
first /run/media/userx/3TB-External/Files-Resampled/FileName-179.mp4
second 179.mp4
third 179
It works to use
Code:
$file =~ s/.mp4//;
instead of
Code:
s/\..*$//
, but only if the extension is always mp4
[and you should escape the '.'
Code:
s/\.mp4//
because you want to match a literal '.', not 'any one character'. The unescaped version only works because 'any one character' happens to include the dot.]
...but I digress. Your regex works only if the extension is always .mp4 .. my regex will work no matter what the extension is, matching
a literal dot (\.) any character (.) any number of times (*) to the end of the string ($).
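A quick way to see what the escaped dot buys you is to run both patterns over a name that contains "mp4" in more than one place. This is only a sketch: it uses sed instead of Perl so it can be pasted into a shell, the file name is made up, and the two function names are hypothetical. The trailing $ in the second pattern is an extra that is not in the posts above.

```bash
# Unescaped dot: '.' matches ANY one character, so the first "<char>mp4" is removed.
strip_unescaped() { printf '%s\n' "$1" | sed 's/.mp4//'; }

# Escaped dot, anchored to the end: only a literal ".mp4" suffix is removed.
strip_escaped() { printf '%s\n' "$1" | sed 's/\.mp4$//'; }

name='clip-mp4-179.mp4'      # hypothetical file name
strip_unescaped "$name"      # prints clip-179.mp4 (ate "-mp4", kept the extension)
strip_escaped   "$name"      # prints clip-mp4-179
```

With names like FileName-179.mp4 both patterns give the same result, which is why the unescaped version appeared to work.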
Quote:
Note:
That is a strange way to populate the array, without an incrementing index? Or is it because the array element is not getting a number, given how the name is being chopped down?
Code:
$existnums[$file]=$file; #save in array
because $file should equal a number
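$file does equal a number as I coded it, and an array index is always treated as a number. If files xxx-001.mp4, xxx-004.mp4, xxx-006.mp4 exist, the array would contain
Code:
$existnums[1]='001'
$existnums[4]='004'
$existnums[6]='006'
The leading zeros go away in the index because perl converts the string to a number there, much as Excel does when you type 001 into a cell. The stored value is still the string you captured, so it prints as 001 unless you force it to a number first, for example with $file + 0.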
I'm a little lost, and it's late here now. I see you used:
Code:
$file =~ s/-.*$//;
for the first substitution. That would remove everything from the hyphen to the end of the string, which would include the number we're trying to capture.
Please run with these regexes:
Code:
foreach $file (@files) {
$file =~ s/^.*-//; #remove from beginning to last hyphen (greedy, so hyphens in the path go too)
$file =~ s/\..*$//; #remove from . to end
$existnums[$file]=$file; #save what's left in array
}
(I've updated my last post of the full script [#29]...)
PS
I want to look at what Laserbeak contributed, but it really is late...
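To see why the earlier run printed only /run/media/userx/3TB, the substitutions can be tried on one of the paths from this thread. This is a rough sketch using sed rather than Perl, and the function names are hypothetical. sed has no non-greedy quantifier, so its .* always runs to the last hyphen; for this path that happens to be the behaviour needed, because the directory names contain hyphens too.

```bash
# The first attempt: delete from the FIRST hyphen to the end of the string.
cut_from_first_hyphen() { printf '%s\n' "$1" | sed 's/-.*$//'; }

# Delete through the LAST hyphen, then from the first remaining dot to the end.
number_only() { printf '%s\n' "$1" | sed -e 's/^.*-//' -e 's/\..*$//'; }

path='/run/media/userx/3TB-External/Files-Resampled/FileName-176.mp4'
cut_from_first_hyphen "$path"   # prints /run/media/userx/3TB
number_only "$path"             # prints 176
```

The first hyphen in the string belongs to 3TB-External, so the first function throws away the number along with everything else after it.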
Yeah, I had only just looked into regex briefly before getting back to this post, and with it being late here I was just hacking away at it without really thinking about what I was doing.
whereas bash is just:
Code:
file=/path/to/File-Name-123.mp4
var=${file##*-}
# strips the longest match of *- from the left:
# the path and the file name up to and including the last hyphen
# result: 123.mp4
var=${var%.*}
# then strips the shortest match of .* from the right: the extension
# result: 123
So this
Code:
var=${file##*-}
var=${var%.*}
is all I needed to get the numbers.
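For anyone who wants to stay in the shell, the two expansions above are enough to build the whole gap check. What follows is a minimal sketch, not a tested replacement for the Perl script: the function names and the /tmp/demo file names are made up for illustration, and it assumes every name ends in -NUMBER.extension. It sticks to plain POSIX constructs, so it should behave the same in bash.

```bash
# Print the number embedded in a name such as /path/to/FileName-123.mp4
extract_number() {
    n=${1##*-}     # drop everything up to and including the last hyphen
    n=${n%.*}      # drop the extension (last dot to the end)
    printf '%s\n' "$n"
}

# Print every number between the lowest and highest seen that has no file.
# The body is a ( subshell ) so its variables stay out of the caller's way.
find_missing() (
    seen=' ' min='' max=''
    for f in "$@"; do
        n=$(extract_number "$f")
        case $n in
            ''|*[!0-9]*) continue ;;    # skip names with no clean number
        esac
        # Strip leading zeros so 008 is never read as an octal number
        while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do n=${n#0}; done
        seen="$seen$n "
        if [ -z "$min" ] || [ "$n" -lt "$min" ]; then min=$n; fi
        if [ -z "$max" ] || [ "$n" -gt "$max" ]; then max=$n; fi
    done
    [ -n "$min" ] || exit 0             # nothing matched; leaves the subshell only
    i=$min
    while [ "$i" -le "$max" ]; do
        case $seen in
            *" $i "*) ;;                            # a file with this number exists
            *) printf '%s is missing\n' "$i" ;;
        esac
        i=$((i + 1))
    done
)

# Hypothetical file names; prints 169, 172 and 173 as missing
find_missing /tmp/demo/FileName-168.mp4 /tmp/demo/FileName-170.mp4 \
             /tmp/demo/FileName-171.mp4 /tmp/demo/FileName-174.mp4
```

On a real directory the call would be something like find_missing /path/to/files/*.mp4, with the path adjusted to your own layout.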
That is easier for me to read, whereas with regex I am still learning what the symbols mean: . (dot) is anything, ^ is the start, $ is the end, and then there are all the \ and / characters. Choppy-looking lines like this
Code:
$this_file =~ /^(.*?)-(\d+?)-(.*?)\.ext$/)
look kind of like hieroglyphs to me.
I did figure that one out, though, since the file name was not actually what I stated in the first post. That gave me this instead:
Code:
$this_file =~ /^(.*?)-(\d+?)\.mp4$/)
Because .mp4 is stated explicitly, the tail-end part
Code:
(.*?)
is not needed, which leaves it searching for just one hyphen and keeping what comes between the hyphen and the extension: (\d+?)
Once you get used to them, regular expressions are like second nature, you hardly have to think about them. And, although the slash is traditional, you can use other delimiters, such as a comma, as long as you write the m in front. So $this_file =~ m,^(.*?)-(\d+?)\.mp4$, would work too.
The question mark is there because by nature, regular expressions are greedy. If you did actually have two hyphens in your filenames like in your original post, a search like /(.*)-/ would suck up everything to the last hyphen. The question mark makes it stop at the first hyphen.
\d is just the set of digits, the + sign means 1 or more while the * sign means 0 or more.
And you know the ^ and $ mean the beginning and end of the string ($ also matches just before a trailing newline). With the /m modifier they match at the start and end of each line inside the string instead. The variable $/ is a different thing: it is the input record separator that readline and chomp use, and it defaults to "\n". Adding g and s after an s/// statement changes two other things: /g replaces every match instead of only the first, and /s lets . match a newline as well, so the whole string is treated as a single line.
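The same shortest-versus-longest idea exists in the bash expansions used earlier in the thread, which may make the regex ? easier to picture. A small sketch, using the File-Name-123.mp4 example name from above (the variable names are only for illustration): a single # removes the shortest matching prefix, much like a non-greedy .*?, while ## removes the longest, like a greedy .*.

```bash
file='/path/to/File-Name-123.mp4'   # example name used earlier in the thread
base=${file##*/}                    # strip the directory part: File-Name-123.mp4

shortest=${base#*-}     # shortest prefix ending in a hyphen removed: Name-123.mp4
longest=${base##*-}     # longest prefix ending in a hyphen removed:  123.mp4

printf 'shortest: %s\nlongest:  %s\n' "$shortest" "$longest"
```

This is an analogy, not an exact equivalence: the shell patterns are globs, where * already means "any run of characters", not regular expressions.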