Subject: find . (need escaped sequence)
I'm trying to retrieve filenames in an escaped format. I would like to use the find command to retrieve all the filenames under a single directory, in the form you would get from "find /home/username". If the filenames contain spaces, single quotes, double quotes or other characters that are problematic on Unix, I would like to retrieve those names as either a hex or an octal escape sequence.
I am writing a short script in Python (that calls standard Unix commands). The purpose of the script is to create a text file that contains the md5sum of every file under a directory, so that I can verify that a directory structure was copied without corruption. I tried experimenting with ls and the --quote-name and --quoting-style options. Any idea where I should start? |
Quote:
I suspect an XY-problem here. First of all I think you should either write a python script, or a shell script. There might be rare occasions where you might need to include shell commands in a python script, but I don't think this is such an occasion. What exactly are you trying to achieve and where exactly does it fail? Show us. Please use CODE tags for full command output and code (see my signature). |
Code:
find . -type f -print0 | xargs -0 |
If it is a single directory, do not use find; a simple * would do the same.
Code:
for f in *; do printf "%q\n" "$f"; done |
You want to avoid passing filenames through the shell. Have the find command execute md5sum directly:
Code:
find "$dir" -type f -exec md5sum {} \; > checksums
Code:
md5sum -c checksums |
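For what it's worth, here is a minimal sketch of how those two commands could be wrapped so that the same checksum file works on both the source and the destination. It assumes GNU md5sum and records paths relative to the top of the tree; the function names make_manifest and verify_manifest are made up for illustration and do not come from any tool.

```shell
#!/bin/sh
# Hypothetical helpers; names and layout are illustrative only.

# make_manifest DIR FILE: write "checksum  ./relative/path" lines for every
# regular file under DIR into FILE.
make_manifest() {
    # The subshell keeps the caller's working directory unchanged.
    # "-exec md5sum {} +" hands the names straight to md5sum, so they are
    # never re-parsed by the shell.
    ( cd "$1" && find . -type f -exec md5sum {} + ) > "$2"
}

# verify_manifest DIR FILE: re-check the files under DIR against FILE.
# The exit status is non-zero if any file is missing or differs.
verify_manifest() {
    # The manifest is read on stdin, so a relative FILE path still works
    # after the cd. --quiet (GNU) suppresses the per-file "OK" lines.
    ( cd "$1" && md5sum --quiet -c - ) < "$2"
}

# Small demonstration on throwaway directories.
src=$(mktemp -d); dst=$(mktemp -d); manifest=$(mktemp)
printf 'hello\n' > "$src/it's a \"test\" file.txt"
cp -R "$src/." "$dst/"
make_manifest "$src" "$manifest"
verify_manifest "$dst" "$manifest" && echo "copy verified"
rm -rf "$src" "$dst" "$manifest"
```

Because the recorded paths start with ./ the manifest does not depend on where the tree is mounted, and names with spaces or quotes need no escaping.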
Quote:
Let's start with what I am trying to achieve. I've found that when I copy lots of data (more than 10GB), it is not unheard of for single-byte errors to creep in, so I would like to verify that the files are identical. The copying I am doing involves whole disk partitions, or possibly just large sections of a partition; either way the copy is many directories deep.
To make sure no errors creep in, I would like to create two master files of md5sums, one for the source and one for the destination. I can then compare the two master files with "diff". If there is a discrepancy, I can zero in on the particular file(s) with issues and re-copy them. For extra assurance, I would unmount and remount the destination partition before creating its master file, so that I am not reading from the buffer cache.
One issue in creating a "diff-able" file is that find may not return the files in the same order. Using "sort" should take care of this. I am still worried about a second issue: the filenames themselves could be problematic. This is what I was obsessing over when I first posted. My original idea was to brute-force replace all problematic characters with escape sequences, because I thought "ls -Q" would not be sufficient. Upon further reflection I think "ls -Q" might be sufficient. This is what I will try for now:
Code:
find sourcedirectory -type f -exec ls -Q {} \; | sort > tempfile |
What's wrong with the method specified in post #5, i.e. generate the list of checksums on one side, then transfer that file and use --check to verify against it on the other side?
|
I fear that ls -Q does not solve a problem - but might create one.
Go for
Code:
find "sourcedirectory" -type f -exec md5sum {} \; | sort -k2 > checksums |
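As a rough sketch of the two-master-file comparison discussed above: if both sides are listed with relative paths and sorted on the filename field, diff shows only the files that differ. This assumes GNU md5sum; the function name sorted_manifest is hypothetical, and LC_ALL=C is an added assumption to keep the sort order identical on both machines.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.

# sorted_manifest DIR: print "checksum  ./relative/path" lines for DIR,
# sorted on the path field so the order no longer depends on find.
sorted_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + ) | LC_ALL=C sort -k2
}

# Small demonstration: copy a tree, damage one file, compare.
src=$(mktemp -d); dst=$(mktemp -d); work=$(mktemp -d)
printf 'alpha\n' > "$src/file one.txt"
printf 'beta\n'  > "$src/file two.txt"
cp -R "$src/." "$dst/"
printf 'bXta\n'  > "$dst/file two.txt"   # simulate a corrupted copy

sorted_manifest "$src" > "$work/source.md5"
sorted_manifest "$dst" > "$work/destination.md5"

# diff exits non-zero when the manifests differ and prints the changed lines.
diff "$work/source.md5" "$work/destination.md5" || echo "differences found"
rm -rf "$src" "$dst" "$work"
```

Only the line for the damaged file should appear in the diff output, which is the "zero in on the particular file" step.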
Quote:
Hard drives, SATA protocols, PCIe transfers, Ethernet connections, IP transport protocols ... they all have robust error detection and correction mechanisms. It's almost impossible for data corruption to happen undetected with any of these mechanisms. If you find that you cannot reliably transfer a few gigabytes of data from A to B, you definitely need to figure out what exactly is corrupting your data. If a simple transfer like you're describing is failing, there's no way to tell what else is getting silently corrupted without you noticing. If this is indeed a hardware issue, the most likely culprit is memory. If it's software related, it could be pretty much any process involved in the transfer. |
Quote:
Code:
$ find "${SRCDIR}" -type f -exec cat {} \; | md5sum
It's a little different from EdGr's solution. That one will find the individual file (or files) with copy errors (which shouldn't occur, BTW), while the above lets you know whether the copy process, as a whole, worked or didn't.
There are multiple ways to copy directory structures: tar, cpio, "cp -R" (of course), rsync, and I'm probably forgetting a few others. In my experience, all of them copy files/trees without fail if the system isn't broken in some way (filesystem corruption, hardware failure, etc.). So I'm not certain obtaining the checksum is strictly necessary, but I suppose it can give one confidence in the process.
As for making the copy: I lean heavily toward 'cpio' when copying directories (or trees), and the find "-print0" switch used in conjunction with cpio's "--null" switch handles filespecs with funky characters just fine. HTH... |
No, find reads the files in directory order.
You must sort the files on both sides before you checksum+compare. |
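To illustrate the sorting point for the single whole-tree checksum, here is one possible sketch that fixes the file order before concatenating. It assumes GNU find, sort and xargs (for -print0, -z and -0); the function name tree_md5 is invented for the example. Note that it covers file contents only, not the names.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.

# tree_md5 DIR: print one md5 over the contents of all regular files in DIR,
# concatenated in a fixed (byte-wise sorted) path order.
tree_md5() {
    ( cd "$1" &&
      find . -type f -print0 |   # NUL-terminated names survive any character
      LC_ALL=C sort -z |         # same order on both sides, whatever find returns
      xargs -0 cat               # concatenate the files in that order
    ) | md5sum | cut -d' ' -f1
}

# Small demonstration: same files created in a different order.
a=$(mktemp -d); b=$(mktemp -d)
printf 'one\n' > "$a/first";  printf 'two\n' > "$a/second"
printf 'two\n' > "$b/second"; printf 'one\n' > "$b/first"
if [ "$(tree_md5 "$a")" = "$(tree_md5 "$b")" ]; then
    echo "trees match"
fi
rm -rf "$a" "$b"
```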
Quote:
|
Not quite correct.
-print0 (and cpio --null) can handle a newline character in file names. -print handles all other characters correctly. |
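A quick way to see the newline case for yourself, as a sketch assuming a find that supports -print0 and a tr that understands '\0' (GNU does): create one file with a newline in its name and count what each switch produces. The variable names are just for the demonstration.

```shell
#!/bin/sh
demo=$(mktemp -d)

# One single file whose name contains a newline.
touch "$demo/two
lines.txt"

# -print writes the name as-is, so a line-based reader sees two records.
print_count=$(find "$demo" -type f -print | wc -l)

# -print0 ends each name with a NUL byte; counting NULs gives the real
# number of files.
print0_count=$(find "$demo" -type f -print0 | tr -cd '\0' | wc -c)

echo "lines from -print: $print_count, names from -print0: $print0_count"
rm -rf "$demo"
```

With one such file, the -print count should come out as two lines while the -print0 count stays at one name.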
Here's a guy who had the same type of issue and did a nice simple write up of using rsync-with-checksums https://blog.wirelessmoves.com/2017/...and-rsync.html
|