LinuxQuestions.org
Old 05-27-2022, 08:16 PM   #1
TheLexx
Member
 
Registered: Apr 2013
Distribution: Gentoo
Posts: 79

Rep: Reputation: Disabled
Subject: find . (need escaped sequence)


I'm trying to retrieve filenames in an escaped format. I would like to use the find command to retrieve all the filenames under a single directory, in the form you would get from "find /home/username". If the filenames contain spaces, single quotes, double quotes or other characters that are problematic on Unix, I would like to retrieve those names with either hex or octal escape sequences.

I am writing a short script in Python (that calls standard Unix commands). The purpose of the script is to create a text file that contains the md5sum of every file under a directory, so that I can verify that a directory structure was copied without corruption.

I tried experimenting with ls and the --quote-name and --quoting-style options. Any idea where I should start?
 
Old 05-28-2022, 01:33 AM   #2
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
Quote:
Originally Posted by TheLexx
I'm trying to retrieve filenames in an escaped format. I would like to use the find command to retrieve all the filenames under a single directory, in the form you would get from "find /home/username". If the filenames contain spaces, single quotes, double quotes or other characters that are problematic on Unix, I would like to retrieve those names with either hex or octal escape sequences.

I am writing a short script in Python (that calls standard Unix commands). The purpose of the script is to create a text file that contains the md5sum of every file under a directory, so that I can verify that a directory structure was copied without corruption.

I tried experimenting with ls and the --quote-name and --quoting-style options. Any idea where I should start?
I don't think find alone can do this.

I suspect an XY-problem here.
First of all, I think you should write either a Python script or a shell script.
There might be rare occasions where you need to call shell commands from a Python script, but I don't think this is such an occasion.
What exactly are you trying to achieve and where exactly does it fail?
Show us.
Please use CODE tags for full command output and code (see my signature).
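
If the escaping is done in Python rather than by find, one possible sketch looks like the following. The function name escape_name and the set of characters left unescaped are assumptions made for illustration, not something taken from the thread; the idea is simply to work on the raw bytes of each name and hex-escape everything outside a conservative safe set.

```python
def escape_name(raw: bytes) -> str:
    """Render a file name as printable ASCII, hex-escaping everything else."""
    out = []
    for byte in raw:
        ch = chr(byte)
        # Keep ASCII letters, digits and a few harmless punctuation marks;
        # escape the rest, including spaces, quotes, newlines and backslashes.
        if byte < 128 and (ch.isalnum() or ch in "._-/"):
            out.append(ch)
        else:
            out.append(f"\\x{byte:02x}")
    return "".join(out)


# Hypothetical names, as bytes, the way os.walk(os.fsencode(root)) would yield them.
for name in [b"plain.txt", b"with space.txt", b"it's.txt", b"two\nlines.txt"]:
    print(escape_name(name))
```

Working on bytes avoids any question about the encoding of the names. The escaped form is only meant for display or for a text listing; to open a file, the script would keep using the original, unescaped name.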
 
Old 05-28-2022, 05:02 AM   #3
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,874
Blog Entries: 1

Rep: Reputation: 1871
Code:
find . -type f -print0 | xargs -0
could be a start

Last edited by NevemTeve; 05-28-2022 at 06:59 AM.
 
Old 05-28-2022, 10:07 AM   #4
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,976

Rep: Reputation: 7337
If it is a single directory, do not use find; a simple * would do the same.
Code:
for f in *; do printf "%q\n" "$f"; done
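
For the Python script the original poster mentions, a rough counterpart to the shell's printf %q is shlex.quote from the standard library. The sketch below uses made-up sample names; the quoting style differs from what %q prints, but the result is likewise meant to be read back by a POSIX shell as a single word.

```python
import shlex

# Hypothetical file names containing characters the shell treats specially.
names = ["plain.txt", "with space.txt", "it's.txt", 'say "hi".txt']


def quote_names(names):
    """Return each name quoted so that a POSIX shell reads it as one word."""
    return [shlex.quote(n) for n in names]


for original, quoted in zip(names, quote_names(names)):
    print(quoted)
    # Round trip: splitting the quoted form gives back the original name.
    assert shlex.split(quoted) == [original]
```

Quoting like this is only needed when a name really has to be pasted into a shell command line; passing names as list arguments to subprocess avoids the shell entirely.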
 
Old 05-28-2022, 11:30 AM   #5
EdGr
Senior Member
 
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 1,000

Rep: Reputation: 472
You want to avoid passing filenames through the shell. Have the find command execute md5sum directly:

Code:
find "$dir" -type f -exec md5sum {} \; > checksums
Check the files with:

Code:
md5sum -c checksums
Ed
 
Old 06-25-2022, 02:05 PM   #6
TheLexx
Member
 
Registered: Apr 2013
Distribution: Gentoo
Posts: 79

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by ondoho
I suspect an XY-problem here.
Well, maybe you're right. I came into this with a mindset of "I gotta get this done", so I grabbed the first peg I found and decided to ram it into the hole no matter what shape it was. After more examination, I might be trying to ram a square peg into a round hole.

Let's start with what I am trying to achieve. I've found that when I copy lots of data (greater than 10GB), it is not unheard of for single-byte errors to creep in, so I would like to verify that the files are identical. The kind of copying I am doing is copying whole disk partitions, or possibly just large sections of a disk partition; either way, the copying is many directories deep.

To make sure no errors creep in, I would like to create two master files of md5sums, one for the source and one for the destination. I can then compare the two master files with "diff". If there is a discrepancy, I can zero in on the particular file(s) with issues and re-copy those files. For added assurance, I would unmount and remount the destination partition before creating its master file, so that I am not reading from the buffer.

One issue in creating a "diff-able" file is that find may not return the files in the same order on both sides. Using the "sort" command should take care of this. I am still worried about a second issue: the filenames themselves could be problematic. This is what I was obsessing over when I first posted. My original idea was to brute-force replace all problematic characters with escape sequences, since I thought that "ls -Q" would not be sufficient. Upon further reflection, I am thinking that "ls -Q" might be sufficient.

This is what I will try for now

Code:
find sourcedirectory -type f -exec ls -Q {} \; | sort > tempfile
Then I will read the file one line at a time, running md5sum on each line. Do any of you see a potential stumbling block? I think most programs filter filenames so that carriage returns are not part of them. My guess is that with "ls -Q" such a filename would get escaped. I'll keep everyone updated on my progress.
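
The comparison step of this plan (two master lists, then a diff) can also be sketched in Python. The function compare_manifests and the sample data below are assumptions for illustration; the checksum strings are placeholders, not real MD5 values. Keeping each list as a mapping from relative path to checksum means neither sort order nor filename escaping affects the comparison.

```python
def compare_manifests(source, dest):
    """Compare two {relative_path: checksum} mappings.

    Returns (missing, extra, changed) as sorted lists of paths.
    """
    missing = sorted(set(source) - set(dest))
    extra = sorted(set(dest) - set(source))
    changed = sorted(p for p in set(source) & set(dest) if source[p] != dest[p])
    return missing, extra, changed


# Hypothetical manifests; the checksum strings are placeholders.
source = {"a.txt": "aaa", "dir/b c.txt": "bbb", "dir/old.txt": "ccc"}
dest = {"dir/b c.txt": "bbX", "a.txt": "aaa", "new.txt": "ddd"}

missing, extra, changed = compare_manifests(source, dest)
print("missing from destination:", missing)
print("only in destination:", extra)
print("checksum mismatch:", changed)
```

The paths in the third list are the ones that would need to be re-copied.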
 
Old 06-25-2022, 02:52 PM   #7
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,616

Rep: Reputation: 2555

What's wrong with the method specified in post #5, i.e. generate the list of checksums on one side, then transfer that file and use --check to verify against it on the other side?

 
Old 06-26-2022, 01:36 AM   #8
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,816

Rep: Reputation: 1211
I fear that ls -Q does not solve a problem - but might create one.
Go for
Code:
find "sourcedirectory" -type f -exec md5sum {} \; | sort -k2 > checksums
 
Old 06-26-2022, 07:42 AM   #9
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,345

Rep: Reputation: Disabled
Quote:
Originally Posted by TheLexx
I've found out that when I copy lots of data (greater than 10GB) it is not unheard of for single byte errors to creep in.
That absolutely should not happen. There's a hardware issue or a serious software bug somewhere in your setup.

Hard drives, SATA protocols, PCIe transfers, Ethernet connections, IP transport protocols ... they all have robust error detection and correction mechanisms. It's almost impossible for data corruption to happen undetected with any of these mechanisms.

If you find that you cannot reliably transfer a few gigabytes of data from A to B, you definitely need to figure out what exactly is corrupting your data. If a simple transfer like you're describing is failing, there's no way to tell what else is getting silently corrupted without you noticing.

If this is indeed a hardware issue, the most likely culprit is memory. If it's software related, it could be pretty much any process involved in the transfer.
 
Old 06-27-2022, 12:39 AM   #10
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Posts: 2,808

Rep: Reputation: 550
Quote:
Originally Posted by TheLexx
The purpose of the script is to create a text file that contains the md5sum for all files under a directory. The purpose of the file is to verify that a directory structure was copied without corruption.
Would:
Code:
$ find "${SRCDIR}" -type f -exec cat {} \; | md5sum

$ <some command to copy the directory $SRCDIR to $TGTDIR>

$ find "${TGTDIR}" -type f -exec cat {} \; | md5sum
give you what you want? 'find' is simply walking the tree at the specified directory and invoking 'cat' to copy each file to stdout and piping the contents of all the files into md5sum.

It's a little different than EdGr's solution. That one will find the individual file (or files) with copy errors (which shouldn't occur, BTW) while the above lets you know that the copy process, as a whole, worked. Or didn't work.

There are multiple ways to copy directory structures: tar, cpio, "cp -R" (of course), rsync, and I'm probably forgetting a few others. In my experience, all of them copy files/trees without fail if the system isn't broken in some way (filesystem corruption, hardware failure, etc.). So I'm not certain that obtaining the checksum is strictly necessary but, I suppose, it can give one confidence in the process.

As for making the copy: I lean heavily toward 'cpio' when copying directories (or trees) and the find "-print0" switch used in conjunction with cpio's "--null" switch handles filespecs with funky characters just fine.

HTH...

Last edited by rnturn; 06-27-2022 at 01:11 AM.
 
Old 06-27-2022, 12:52 AM   #11
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,816

Rep: Reputation: 1211
No, find reads the files in directory order.
You must sort the files on both sides before you checksum+compare.
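
A minimal Python sketch of a whole-tree checksum that takes this ordering point into account is shown below. The function name tree_digest is hypothetical. Files are hashed in sorted order of their relative paths, so the result does not depend on directory order; like the cat pipeline quoted above, it covers file contents only, not the names.

```python
import hashlib
import os


def tree_digest(root):
    """Return one MD5 hex digest over the contents of every regular file under root.

    Files are visited in sorted order of their relative path, so two trees with
    identical names and contents give the same digest regardless of directory order.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                paths.append(os.path.relpath(full, root))
    digest = hashlib.md5()
    for rel in sorted(paths):
        with open(os.path.join(root, rel), "rb") as handle:
            # Read in 1 MiB chunks so large files do not have to fit in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

A single digest only says whether the copy as a whole matches; finding which file differs still needs per-file checksums.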
 
Old 06-27-2022, 01:07 AM   #12
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Posts: 2,808

Rep: Reputation: 550
Quote:
Originally Posted by TheLexx
Upon further reflection I am thinking that "ls -Q" might be sufficient.
Not sure how the OP is/was copying directories, but GNU find's "-print0" switch is handy for getting around landmines like special characters and spaces in filenames when copying directory structures with 'cpio' and its "--null" switch. (Hmm... I seem to have said that above.) Sadly, not many utilities are able to process ASCIZ (null-terminated) filespecs. (For example, tar may blow up spectacularly when confronted with them.)

Last edited by rnturn; 06-27-2022 at 01:12 AM.
 
Old 06-27-2022, 01:40 AM   #13
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,816

Rep: Reputation: 1211
Not quite correct.
-print0 (and cpio --null) can handle a newline character in file names.
-print handles all other characters correctly.
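
The difference can be illustrated with a small Python sketch that parses the two output formats. The byte strings below are built by hand to mimic what find would emit for three hypothetical names, one of which contains a newline; the helper names parse_print and parse_print0 are made up for this example.

```python
# Hypothetical names; the second one contains a newline character.
names = [b"./plain.txt", b"./two\nlines.txt", b"./with space.txt"]


def parse_print(output: bytes):
    """Split output in the newline-terminated format of find -print."""
    return output.split(b"\n")[:-1]


def parse_print0(output: bytes):
    """Split output in the NUL-terminated format of find -print0."""
    return output.split(b"\0")[:-1]


as_print = b"".join(n + b"\n" for n in names)
as_print0 = b"".join(n + b"\0" for n in names)

# The name with the embedded newline is wrongly split into two entries.
print(parse_print(as_print))
# NUL cannot occur in a file name, so every name comes back intact.
print(parse_print0(as_print0))
```

The name with a space survives both formats; only the newline causes trouble. In a real script, the bytes could come from something like subprocess.run(["find", root, "-type", "f", "-print0"], capture_output=True).stdout, where root is whatever directory is being scanned.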
 
Old 06-29-2022, 01:23 AM   #14
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,364

Rep: Reputation: 2752
Here's a guy who had the same type of issue and did a nice simple write up of using rsync-with-checksums https://blog.wirelessmoves.com/2017/...and-rsync.html
 
  

