Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
I'm trying to retrieve filenames in an escaped format. I would like to use the find command to retrieve all the filenames under a single directory, in a form similar to what you get from "find /home/username". If a filename contains spaces, single quotes, double quotes or other characters that are problematic on Unix, I would like to retrieve it with those characters written as either hex or octal escape sequences.
I am writing a short script in Python (that calls standard Unix commands). The purpose of the script is to create a text file that contains the md5sum for all files under a directory. The purpose of the file is to verify that a directory structure was copied without corruption.
I tried experimenting with ls and the --quote-name and --quoting-style options. Any idea where I should start?
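For the escaping itself, here is a minimal sketch in Python, offered as one possible approach rather than the method the thread settles on. The helper names escape_name and unescape_name, and the choice of which characters count as risky, are assumptions made for illustration. Every byte outside a conservative printable-ASCII set is written as a \xNN hex escape, and the backslash itself is escaped so the result can be decoded unambiguously.

```python
import os

# Printable ASCII characters that are still escaped because they tend to
# cause trouble in shells or in line-oriented text files (an assumed list).
_UNSAFE = set(b" \t\n\r'\"\\$`*?[]{}()<>|&;!#~")


def escape_name(path):
    """Return path as printable ASCII, with risky bytes written as \\xNN."""
    raw = os.fsencode(path)  # str -> bytes, keeping undecodable bytes intact
    out = []
    for byte in raw:
        if 0x21 <= byte <= 0x7E and byte not in _UNSAFE:
            out.append(chr(byte))
        else:
            out.append("\\x%02x" % byte)
    return "".join(out)


def unescape_name(text):
    """Invert escape_name and return the original path as a str."""
    raw = bytearray()
    i = 0
    while i < len(text):
        if text.startswith("\\x", i):
            # A backslash only ever appears as the start of an escape.
            raw.append(int(text[i + 2:i + 4], 16))
            i += 4
        else:
            raw.append(ord(text[i]))
            i += 1
    return os.fsdecode(bytes(raw))
```

With this sketch, a name such as "my file's.txt" comes out as my\x20file\x27s.txt, and unescape_name turns it back into the original name.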
I don't think find alone can do this.
I suspect an XY-problem here.
First of all, I think you should write either a Python script or a shell script.
There might be rare occasions where you need to include shell commands in a Python script, but I don't think this is such an occasion.
What exactly are you trying to achieve and where exactly does it fail?
Show us.
Please use CODE tags for full command output and code (see my signature).
Well, maybe you're right. I came into this with a mindset of "I gotta get this done". So I grabbed the first peg I found and decided to ram it into the hole no matter what shape it was. After more examination, I might be trying to ram a square peg into a round hole.
Let's start with what I am trying to achieve. I've found out that when I copy lots of data (greater than 10GB) it is not unheard of for single byte errors to creep in. So, I would like to verify that the files are identical. The kind of copying I am doing is copying whole disk partitions or possibly just large sections of a disk partition; either way, the copying is many directories deep.
As a way to ensure no errors creep in, I would like to create two master files of md5sums, one for the source and one for the destination. I can then compare the two master files with "diff". If there is a discrepancy, I can zero in on the particular file(s) with issues and re-copy those files. To add to the assurance, I would unmount and remount the destination partition before creating its master file, so that I am not reading from a buffer.
One issue in creating a "diff-able" file is that find may not return the files in the same order on both sides. Using the "sort" command should take care of this. I am still worried about a second issue: the filenames themselves could be problematic. This is what I was obsessing over when I first posted. My original idea was a brute-force replacement of all problematic characters with escape sequences, since I thought that "ls -Q" would not be sufficient. Upon further reflection, I am thinking that "ls -Q" might be sufficient.
This is what I will try for now
Code:
find sourcedirectory -type f -exec ls -Q {} \; | sort > tempfile
Then I will read the file one line at a time, running md5sum on each line. Do any of you see a potential stumbling block? I think most programs filter filenames so that carriage returns are not part of them. My guess is that with "ls -Q" such a filename would get escaped. I'll keep everyone updated on my progress.
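The plan above (list the files, sort, checksum each one) can also be sketched directly in Python with only the standard library, which avoids having to parse quoted ls output. This is an illustration under assumptions, not the poster's script: build_manifest and file_md5 are hypothetical names, and the output loosely imitates md5sum's "digest, two spaces, path" lines, with backslash, newline and carriage return escaped so that every entry stays on one line.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Return 'digest  relative/path' lines for regular files, sorted by path."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and special files, like find -type f
            rel = os.path.relpath(full, root)
            # Keep each entry on a single line whatever the name contains.
            shown = (rel.replace("\\", "\\\\")
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
            lines.append("%s  %s" % (file_md5(full), shown))
    # An MD5 hex digest is 32 characters, plus two spaces: sort on the path.
    lines.sort(key=lambda line: line[34:])
    return lines
```

Because the paths are relative to the root and sorted, writing build_manifest(source) and build_manifest(destination) to two files should give output that diff can compare line by line.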
What's wrong with the method specified in post #5, i.e. generate the list of checksums on one side, then transfer that file and use --check to verify against it on the other side?
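The generate-then-check idea in this post can be sketched in Python as well. This is a rough stand-in for what md5sum --check does, not a reimplementation of it: check_manifest is a hypothetical helper, and it assumes plain "digest, two spaces, relative path" lines with no escaped names and no names containing newlines.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest_lines, root):
    """Return (path, reason) pairs for every manifest entry that fails under root."""
    failures = []
    for line in manifest_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split at the first two-space separator: digest on the left, path on the right.
        expected, _sep, rel = line.partition("  ")
        target = os.path.join(root, rel)
        if not os.path.isfile(target):
            failures.append((rel, "missing"))
        elif file_md5(target) != expected.lower():
            failures.append((rel, "checksum mismatch"))
    return failures
```

An empty result means every listed file exists on the destination with the recorded checksum; anything else is the short list of files to re-copy.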
Quote:
I've found out that when I copy lots of data (greater than 10GB) it is not unheard of for single byte errors to creep in.
That absolutely should not happen. There's a hardware issue or a serious software bug somewhere in your setup.
Hard drives, SATA protocols, PCIe transfers, Ethernet connections, IP transport protocols ... they all have robust error detection and correction mechanisms. It's almost impossible for data corruption to happen undetected with any of these mechanisms.
If you find that you cannot reliably transfer a few gigabytes of data from A to B, you definitely need to figure out what exactly is corrupting your data. If a simple transfer like you're describing is failing, there's no way to tell what else is getting silently corrupted without you noticing.
If this is indeed a hardware issue, the most likely culprit is memory. If it's software related, it could be pretty much any process involved in the transfer.
Quote:
Originally Posted by TheLexx
The purpose of the script is to create a text file that contains the md5sum for all files under a directory. The purpose of the file is to verify that a directory structure was copied without corruption.
Would:
Code:
$ find "${SRCDIR}" -type f -exec cat {} \; | md5sum
$ <some command to copy the directory $SRCDIR to $TGTDIR>
$ find "${TGTDIR}" -type f -exec cat {} \; | md5sum
give you what you want? 'find' is simply walking the tree at the specified directory and invoking 'cat' to copy each file to stdout and piping the contents of all the files into md5sum.
It's a little different than EdGr's solution. That one will find the individual file (or files) with copy errors (which shouldn't occur, BTW) while the above lets you know that the copy process, as a whole, worked. Or didn't work.
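One caveat with a single checksum over concatenated file contents is that the result depends on the order in which the files are read, and, as noted earlier in the thread, find may not list files in the same order on both sides. A minimal sketch of the same whole-tree idea in Python, reading the files in sorted relative-path order, might look like this. tree_md5 is a hypothetical name, and this is an illustration, not the poster's command.

```python
import hashlib
import os


def tree_md5(root, chunk_size=1 << 20):
    """Return one hex MD5 over the contents of all regular files under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                paths.append(os.path.relpath(full, root))
    digest = hashlib.md5()
    # Sorted relative paths give the same read order on source and target.
    for rel in sorted(paths):
        with open(os.path.join(root, rel), "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Like the pipeline it mirrors, this hashes contents only, so it reports whether the copy as a whole matches but not which file differs, and it does not notice renamed files.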
There are multiple ways to copy directory structures: tar, cpio, "cp -R" (of course), rsync, and I'm probably forgetting a few others. In my experience, all of them copy files/trees without fail if the system isn't broken in some way (filesystem corruption, hardware failure, etc.). So I'm not certain obtaining the checksum is strictly necessary, but I suppose it can give one confidence in the process.
As for making the copy: I lean heavily toward 'cpio' when copying directories (or trees) and the find "-print0" switch used in conjunction with cpio's "--null" switch handles filespecs with funky characters just fine.
Quote:
Originally Posted by TheLexx
Upon further reflection I am thinking that "ls -Q" might be sufficient.
Not sure how the OP is/was copying directories, but GNU find's "-print0" switch is handy for getting around landmines like special characters and spaces in filenames when copying directory structures with 'cpio' and its "--null" switch. (Hmm... I seem to have said that above.) Sadly, not many utilities are able to process ASCIZ (null-terminated) filespecs. (For example, tar may blow up spectacularly when confronted with them.)
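For a script that does call find, the NUL-terminated output of "-print0" is straightforward to consume from Python, since a NUL byte cannot appear inside a filename. The following is a sketch under assumptions: split_nul and find_files are hypothetical helper names, and it presumes a find on the PATH that supports "-print0".

```python
import os
import subprocess


def split_nul(data):
    """Split NUL-terminated bytes (as from find -print0) into a list of str paths."""
    return [os.fsdecode(item) for item in data.split(b"\0") if item]


def find_files(root):
    """Run find with -print0 and return the regular files under root, sorted."""
    result = subprocess.run(
        ["find", root, "-type", "f", "-print0"],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if find exits with an error
    )
    return sorted(split_nul(result.stdout))
```

Names containing spaces, quotes or even newlines come back as ordinary list items, so no quoting or escaping step is needed before each path is opened.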