Loop command on folders in a directory

azurite · 06-14-2016, 10:44 PM

Quote:

Originally Posted by JJJCR

check out link below, it will explain some of the looping process.

http://wiki.bash-hackers.org/syntax/pe

http://www.linuxjournal.com/content/...lobbing-option

Good luck! Glad to know you finally run the script successfully.

Quote:

Originally Posted by Turbocapitalist

Well, most of it is buried in the "bash" manual page, as shopt, for, and cd are built-in commands. I'd recommend following everything up with a browse through it to learn it as a reference source. No one can remember it all, or even most of it, but you can remember where you last saw something and be able to look it up as needed: "just in time" rather than "just in case". If you use features often enough, they will sink in.

"shopt -s globstar" is where the important change is. That allows ** to mean a recursive descent into directories and subdirectories. It is about the equivalent of "find" offered in #5 above.

The ${f%/*} trims the variable $f using substring removal matching from the end.

The "stat -c %n" prints just the file name, and with $( ... ) command substitution the output of the program "stat" is treated as if it were a string, allowing it to be assigned to a variable.

The ${f##/} is more substring removal but matching from the beginning. (Note that the slash is being matched; it is not an operator.)

The second output=" line is just a search and removal of '_eddy_corrected_brain.nii.gz/' and appending '_dtifit'

So, yes, "bash" is a scripting language.

Edit: JJJCR was more concise and posted the relevant links while I was slowly drafting the above.

Thank you for taking the time to reply. I will try to read the info you have provided and write a few more scripts that I need. I'm hoping it will be okay for me to post it here (once it's ready) so that it may be checked for correctness?

Also, are there any textbooks (or other helpful resources) that would be good for learning, for someone who doesn't have much of a computer science background?

Turbocapitalist · 06-15-2016, 12:59 AM

Sure. Though subsequent scripts would probably benefit from being posted in the programming subforum.

The Bash Hackers' wiki is quite thorough and has links at the start to a lot of other resources to fill in background here and there. The Wooledge site has three useful series, maybe start with them, since they are shorter and more focused:

http://mywiki.wooledge.org/BashGuide
http://mywiki.wooledge.org/BashPitfalls
http://mywiki.wooledge.org/BashSheet

O'Reilly has three printed books which might be relevant, Classic Shell Scripting, Learning the bash Shell, and a bash Cookbook. You'll have to skim them at your library or book store to see if they would help. Most things stay the same so the print versions don't really go out of date, but as you see with 'globstar' some things do get added occasionally.

JJJCR · 06-15-2016, 02:04 AM

The best way to learn Bash is to just do it. Regex, AWK and sed are also good topics to learn; they make life easier.

There are quite a lot of sites for learning Bash; a good search keyword will bring up what you need.

Here's a link: http://wiki.bash-hackers.org/scripting/tutoriallist

allend · 06-15-2016, 09:17 AM

Ouch!
I left an asterisk out.

Code:

  # Build the output file name
  # output="${f##/}"     <- wrong, the asterisk is missing
  output="${f##*/}"

Thanks to JJJCR and Turbocapitalist for following up.

As a demonstration exercise you could try this.
Make a temporary directory somewhere.
Open a terminal in that directory.
Execute these commands.

Code:

mkdir {a,b,c,a/a1,b/b1,c/c1}
tree
touch {a/a1/,b/b1/,c/c1/}{a,b,c}{1,2,3}.xyz
tree
for f in **/*.xyz; do echo "$f"; done
shopt -s globstar
for f in **/*.xyz; do echo "$f"; done
for f in **/a*.xyz; do echo "$f"; done
for f in **/*.xyz; do echo "${f##*/}"; done
for f in **/*.xyz; do echo "${f%/*}"; done
shopt -u globstar

To clean up, run 'rm -rf ./*' from the same directory, being careful to note that period character!

I also note that the use of 'stat' in the script is a cheap hack that relies on your particular data structure having unique file suffixes.

azurite · 06-16-2016, 12:50 AM

I haven't had a chance yet to test this out but will do it soon.
What does the * stand for when writing scripts? What is the difference between using one * and using two **? Also, in reference to the script you wrote earlier, what do the % and # stand for?

Turbocapitalist · 06-16-2016, 01:32 AM

Quote:

Originally Posted by azurite

What does the * stand for when writing scripts? What is the difference between using one * and using two **? Also, in reference to the script you wrote earlier, what do the % and # stand for?

See posts #29 and #30 above. Be sure to check the Bash Hackers' link in either. It goes into detail about the answers to those questions.
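As a rough summary of what those links cover: in a file name pattern, * matches any run of characters within one directory level, while ** (with globstar switched on) also descends into subdirectories, which is what allend's demonstration above shows. Inside ${...}, # removes a matching pattern from the front of the value and % removes it from the end; doubling the symbol removes the longest match instead of the shortest. The sketch below works on a made-up path shaped like the ones in this thread and reads nothing from disk.

```bash
#!/bin/bash
# Hypothetical path, shaped like the ones in this thread.
f="111180-100/3t_2016-01-07_21-42/003_DTI_siemens_TClessdistort/dti_FA.nii.gz"

dir="${f%/*}"       # % : remove the shortest match of "/*" from the end   (directory part)
name="${f##*/}"     # ##: remove the longest match of "*/" from the start  (file name)
subject="${f%%/*}"  # %%: remove the longest match of "/*" from the end    (first directory)
rest="${f#*/}"      # # : remove the shortest match of "*/" from the start (all but the first directory)

printf '%s\n' "$dir" "$name" "$subject" "$rest"
```

Note that the * inside these expansions is a pattern character matched against the text of the variable, not against files.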

azurite · 07-04-2016, 11:18 PM

Hello again everyone,

I have another question.
I am working with the same files and directory/subdirectory structure as mentioned earlier in this thread. What has changed is that, after running the working script, I now have a file named 'dti_FA.nii.gz' in each of the subdirectories. How do I go into all the subdirectories again, make a copy of that file and move it to another directory (let's call it TBSS)? In addition, while I am copying the files over, I also need to rename them to include the subject number that's in the subfolder path, so a file would be renamed to something like '111180_dti_FA.nii.gz'.

Now I've tried modifying the previous script to do the above but I'm not sure I have it down correctly.

Code:

#!/bin/bash

# Top of directory structure containing files to be processed.
topdir="/home/natasha/Documents/DWI"

cd "$topdir"

# Set shopt to also look in subdirectories
shopt -s globstar

for f in **/*dti_FA.nii.gz; do
  # Change to subdirectory
  cd "${f%/*}"


  # Look for files with this name
  file="$(stat -c %n *dti_FA.nii.gz)"


  #copy the files into the tbss directory
  cp $file /$topdir /home/natasha/Documents/TBSS


  # Rename the files
  # Build the output file name
  output="${f/*}"
  output="${output}"


  # Change back to top directory
  cd "$topdir"
done

# Undo change to shopt
shopt -u globstar

Turbocapitalist · 07-05-2016, 01:57 AM

Quote:

Originally Posted by azurite

Code:

...
  # Look for files with this name
  file="$(stat -c %n *dti_FA.nii.gz)"

  #copy the files into the tbss directory
  cp $file /$topdir /home/natasha/Documents/TBSS
...

The "cp" will work with globs so you could look for *dti_FA.nii.gz directly without needing to store the name in $file. Then there are several options for doing the actual copying. The main way would be to use the common "cp source destination" syntax:

Code:

cp *dti_FA.nii.gz /home/natasha/Documents/TBSS/

That would put any files ending in dti_FA.nii.gz into the directory /home/natasha/Documents/TBSS/. It's helpful to always put a trailing slash on directory names for two reasons. One is that it reminds you later of where the file is supposed to go. The other is so that the file(s) will really go into a directory rather than repeatedly overwriting a single file of that name.

Another option for doing the copying is to use -t to specify the destination (target), but that's not so useful in this case. Yet another option would be to make a hardlink with -l instead of copying, which saves space (both places must be on the same filesystem). It basically copies the name to a new place, leaving you with another name to access the file though you still have only one file.

See the manual page for "cp" for the details.

If you wanted to move the file from one directory to another or rename it, then the "mv" program will do that for you. In its basic use it would also be "mv source destination":

Code:

mv -i *dti_FA.nii.gz /home/natasha/Documents/TBSS/

The -i is optional but useful: it asks for confirmation before overwriting an existing file. See the manual page for "mv" for details.
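The point about the trailing slash can be tried out safely with a small sketch like the one below. It works in a scratch directory from mktemp, and the file and directory names are only stand-ins for the real ones.

```bash
#!/bin/bash
# Scratch directory so nothing real is touched; all names here are made up.
work="$(mktemp -d)"
cd "$work" || exit 1
touch dti_FA.nii.gz

# No trailing slash and no such directory: cp quietly creates a FILE named TBSS.
cp dti_FA.nii.gz TBSS
[ -f TBSS ] && echo "TBSS is a regular file, not a directory"
rm TBSS

# Trailing slash and no such directory: cp refuses instead of guessing.
if ! cp dti_FA.nii.gz TBSS/ 2>/dev/null; then
  echo "cp failed because TBSS/ does not exist"
fi

# With the directory in place the copy lands inside it.
mkdir TBSS
cp dti_FA.nii.gz TBSS/
ls TBSS
```

The scratch directory is left behind in "$work" so the result can be inspected; it can be deleted afterwards.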

azurite · 07-05-2016, 02:15 AM

Ah thank you for the prompt response.
So I should write something along the lines of the following?

Code:

# Top of directory structure containing files to be processed.
topdir="/home/natasha/Documents/DWI"

cd "$topdir"

# Set shopt to also look in subdirectories
shopt -s globstar

for f in **/*dti_FA.nii.gz; do
  # Change to subdirectory
  cd "${f%/*}"


  # Look for files with this name and copy into new directory
 cp *dti_FA.nii.gz /home/natasha/Documents/TBSS/


  # Change back to top directory
  cd "$topdir"
done

# Undo change to shopt
shopt -u globstar

Though, I'm still not sure how to go about prepending the subdirectory name to the filename before moving it to the new tbss directory? Any ideas?

Turbocapitalist · 07-05-2016, 02:50 AM

If the files are in the current working directory then you do not need to prepend any path to the file name.

The following would copy from a single directory ( /home/natasha/Documents/DWI/ ) any files with the right ending to /home/natasha/Documents/TBSS/

Code:

cd /home/natasha/Documents/DWI/

cp *dti_FA.nii.gz /home/natasha/Documents/TBSS/

It's only if you are in another directory that you'd have to have a path.

Code:

cd /home/natasha/Documents/

cp /home/natasha/Documents/DWI/*dti_FA.nii.gz /home/natasha/Documents/TBSS/

azurite · 07-05-2016, 10:43 AM

I think my wording was not correct. I meant: how do I rename the files so that the file name includes the subject-number part of the pathname (for example 111180-100 in the first path below)?

Actually, I don't really need to make a copy (I just wanted to be cautious), so the primary objective is renaming the files and moving them to another directory.

Code:

/home/natasha/Documents/DWI/111180-100/3t_2016-01-07_21-42/003_DTI_siemens_TClessdistort/dtifit_FA.nii.gz
/home/natasha/Documents/DWI/111445-100/3t_2016-01-07_21-42/003_DTI_siemens_TClessdistort/dtifit_FA.nii.gz
etc and loop through all subfolders

so the file name would change from

Code:

dtifit_FA.nii.gz

to

Code:

111180-100_dtifit_FA.nii.gz

?

I found some simpler code, but I'm not sure if it will work recursively?

Code:

shopt -s globstar
for f in **/*dti_*; 
do mv "$f" "$f{/dti_/ /##_dti}";  done

Turbocapitalist · 07-06-2016, 12:36 AM

Quote:

Originally Posted by azurite

I found a simpler code, not sure if this will work recursively though?

Code:

shopt -s globstar
for f in **/*dti_*; 
do mv "$f" "$f{/dti_/ /##_dti}";  done

I'm less familiar with the new globstar feature and not able to get that one to work. I tend to be reliant on "find" for recursive traversal of directories. It has a lot of options. Here I use formatted printing (printf) to get the path and file names separately and pipe that into a while loop as the variables $p and $f

Code:

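find /home/natasha/Documents/DWI/ -type f -name 'dti*.gz' -printf '%h %f\n' | while read -r p f;
do
  # Reduce the directory in $p to the subject number
  p=$(echo "$p" | sed -e 's#/[^/]*/[^/]*$##; s#^.*/##;');
  # "echo" only prints the mv command (a dry run); $f is the bare file name
  echo mv -i "$f" "$p""_$f";
done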

The tool "sed" then does the string manipulation. It trims off the deepest two directory names and then trims off all but the deepest remaining one. Often you see substitutions as s///, but that makes paths hard to work with because they contain slashes themselves. So the delimiter used above is # making s/// into s### instead.

That loop does not check for any errors and expects the dti*.gz files to always be two directories down from the number used in renaming.
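A variation on the same idea, offered only as a sketch: it keeps the full path in one variable so that mv has a real source to act on, passes names NUL-terminated (-print0 with read -d '') so odd file names survive, and uses parameter expansion in place of sed. The scratch tree under mktemp, the variable names, and the TBSS destination are assumptions made for illustration, and the layout is assumed to be subject/session/series/file as described in this thread.

```bash
#!/bin/bash
# Scratch tree standing in for /home/natasha/Documents/DWI (made-up data).
top="$(mktemp -d)"
mkdir -p "$top/DWI/111180-100/3t_2016-01-07_21-42/003_DTI_siemens_TClessdistort" "$top/TBSS"
touch "$top/DWI/111180-100/3t_2016-01-07_21-42/003_DTI_siemens_TClessdistort/dti_FA.nii.gz"

# -print0 / read -d '' pass each path NUL-terminated, so spaces are safe.
find "$top/DWI" -type f -name 'dti_FA.nii.gz' -print0 |
while IFS= read -r -d '' path; do
  rel="${path#"$top/DWI/"}"   # path relative to the top directory
  subject="${rel%%/*}"        # first directory level, e.g. 111180-100
  name="${path##*/}"          # bare file name
  dest="$top/TBSS/${subject}_${name}"
  # Refuse to overwrite instead of prompting, since stdin is the pipe from find.
  if [ -e "$dest" ]; then
    echo "skipping $path: $dest already exists" >&2
    continue
  fi
  mv -- "$path" "$dest"
done

ls "$top/TBSS"
```

Taking the subject from the first directory level means the files do not have to sit exactly two directories below it.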

allend · 07-06-2016, 07:57 AM

An alternative bash solution

Code:

#!/bin/bash

# Top of directory structure containing files to be processed.
topdir="/home/natasha/Documents/DWI"

# Destination directory for files
destdir="/home/natasha/Documents/TBSS/"

cd "$topdir"

shopt -s globstar

for f in **/dtifit_FA.nii.gz; do
  [[ "$f" =~ ([[:digit:]-]*)/.*/(dtifit_FA.nii.gz) ]] && \
  cp "$f" "$destdir""${BASH_REMATCH[1]}"_"${BASH_REMATCH[2]}"
done

shopt -u globstar

PS - Thanks to grail for the introduction to the use of the =~ operator with the BASH_REMATCH array

azurite · 07-06-2016, 11:43 AM

I just want to say thank you to everyone. I can't even begin to comprehend some of the code you all come up with, but I am trying to learn, so thank you for being patient with me. Is there also a method of doing this with a for loop and parameter expansion? Or allend, if you could explain what each line of your code does; all the brackets etc. seem scary to me.

I have the directory structure below (only included 3 subfolders for the purposes of keeping this uncluttered). How do I modify your code so that it renames all the files that begin with "dti"? Do I just replace "dtifit_FA.nii.gz" with "dti_*nii.gz"?

Code:

├── 111180-100
│   └── 3t_2016-01-07_21-42
│       └── 003_DTI_siemens_TClessdistort
│           ├── 0001.dcm
│           ├── 111180-100_eddy_corrected_brain_mask.nii.gz
│           ├── 111180-100_eddy_corrected_brain.nii.gz
│           ├── 111180-100_eddy_corrected.ecclog
│           ├── 111180-100_eddy_corrected.nii.gz
│           ├── 20160107_214213DTIsiemensTClessdistorts003a001.bval
│           ├── 20160107_214213DTIsiemensTClessdistorts003a001.bvec
│           ├── 20160107_214213DTIsiemensTClessdistorts003a001.nii.gz
│           ├── dti_FA.nii.gz
│           ├── dti_L1.nii.gz
│           ├── dti_L2.nii.gz
│           ├── dti_L3.nii.gz
│           ├── dti_MD.nii.gz
│           ├── dti_MO.nii.gz
│           ├── dti_S0.nii.gz
│           ├── dti_V1.nii.gz
│           ├── dti_V2.nii.gz
│           └── dti_V3.nii.gz
├── 111405-100
│   └── 3t_2015-12-08_21-54
│       └── 003_DTI_siemens_TClessdistort
│           ├── 0001.dcm
│           ├── 111405-100_eddy_corrected_brain_mask.nii.gz
│           ├── 111405-100_eddy_corrected_brain.nii.gz
│           ├── 111405-100_eddy_corrected.ecclog
│           ├── 111405-100_eddy_corrected.nii.gz
│           ├── 20151208_215447DTIsiemensTClessdistorts003a001.bval
│           ├── 20151208_215447DTIsiemensTClessdistorts003a001.bvec
│           ├── 20151208_215447DTIsiemensTClessdistorts003a001.nii.gz
│           ├── dti_FA.nii.gz
│           ├── dti_L1.nii.gz
│           ├── dti_L2.nii.gz
│           ├── dti_L3.nii.gz
│           ├── dti_MD.nii.gz
│           ├── dti_MO.nii.gz
│           ├── dti_S0.nii.gz
│           ├── dti_V1.nii.gz
│           ├── dti_V2.nii.gz
│           └── dti_V3.nii.gz
├── 111440-100
│   └── 3t_2016-01-27_20-58
│       └── 003_DTI_siemens_TClessdistort
│           ├── 0001.dcm
│           ├── 111440-100_eddy_corrected_brain_mask.nii.gz
│           ├── 111440-100_eddy_corrected_brain.nii.gz
│           ├── 111440-100_eddy_corrected.ecclog
│           ├── 111440-100_eddy_corrected.nii.gz
│           ├── 20160127_205836DTIsiemensTClessdistorts003a001.bval
│           ├── 20160127_205836DTIsiemensTClessdistorts003a001.bvec
│           ├── 20160127_205836DTIsiemensTClessdistorts003a001.nii.gz
│           ├── dti_FA.nii.gz
│           ├── dti_L1.nii.gz
│           ├── dti_L2.nii.gz
│           ├── dti_L3.nii.gz
│           ├── dti_MD.nii.gz
│           ├── dti_MO.nii.gz
│           ├── dti_S0.nii.gz
│           ├── dti_V1.nii.gz
│           ├── dti_V2.nii.gz
│           └── dti_V3.nii.gz
└── example

allend · 07-06-2016, 03:32 PM

Quote:

Or allend if you could explain what each line of your code does

Actually there is only one line of code that is new in my last post.

Code:

  [[ "$f" =~ ([[:digit:]-]*)/.*/(dtifit_FA.nii.gz) ]] && \
  cp "$f" "$destdir""${BASH_REMATCH[1]}"_"${BASH_REMATCH[2]}"

It has been written as two lines using the \ as a line continuation.
Breaking it down a little:
[[ "$f" =~ ([[:digit:]-]*)/.*/(dtifit_FA.nii.gz) ]] is a conditional expression that evaluates as true or false. The path in the variable named f is matched against the regular expression ([[:digit:]-]*)/.*/(dtifit_FA.nii.gz). The regular expression says to:
- look for a string containing only characters that are digits or a hyphen (the subject numbers in your data),
- then the longest string that starts with a / and ends with a /, with zero or more characters in between (denoted by .*),
- and finally the string "dtifit_FA.nii.gz".
If a match is found, then the parenthesised parts of the regular expression are assigned in turn to elements 1 and 2 of the array BASH_REMATCH (element 0 holds the whole match).
If the conditional expression is true, then the next part of the line is executed (denoted by &&).
The final part of the line, cp "$f" "$destdir""${BASH_REMATCH[1]}"_"${BASH_REMATCH[2]}", is the copy command (you could change 'cp' to 'mv' or 'ln -s' to move or create symbolic links instead; with 'ln -s' give the source as an absolute path such as "$topdir/$f", because a relative link target is resolved from the link's own directory).
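To see what ends up in BASH_REMATCH without touching any files, the match can be run against a single string. This is only a sketch: the path is made up in the shape the loop would see, and the variable names re and newname are introduced here for illustration.

```bash
#!/bin/bash
# Made-up relative path in the shape the loop would see.
f="111180-100/3t_2016-01-07_21-42/003_DTI_siemens_TClessdistort/dtifit_FA.nii.gz"

# Keeping the pattern in a variable avoids quoting surprises inside [[ ]].
re='([[:digit:]-]*)/.*/(dtifit_FA.nii.gz)'

if [[ "$f" =~ $re ]]; then
  echo "whole match: ${BASH_REMATCH[0]}"
  echo "group 1:     ${BASH_REMATCH[1]}"
  echo "group 2:     ${BASH_REMATCH[2]}"
  newname="${BASH_REMATCH[1]}_${BASH_REMATCH[2]}"
  echo "new name:    $newname"
fi
```

The right-hand side of =~ is left unquoted on purpose; quoting it would make bash compare it as a literal string rather than as a regular expression.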

Quote:

Do I just replace "dtifit_FA.nii.gz" with "dti_*nii.gz"

Close - "dti_.*nii.gz"
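Note that the glob on the for line would need the matching change as well (for example **/dti_*.nii.gz), otherwise the loop still only visits the one file name. As for doing it with a for loop and parameter expansion, a minimal sketch might look like the following. It assumes the subject number is always the first directory below the top directory, and it builds a scratch tree with mktemp so it can be run without touching the real data; every path in it is made up.

```bash
#!/bin/bash
# Scratch tree standing in for the real data; every path here is made up.
work="$(mktemp -d)"
topdir="$work/DWI"
destdir="$work/TBSS/"
series="111180-100/3t_2016-01-07_21-42/003_DTI_siemens_TClessdistort"
mkdir -p "$topdir/$series" "$destdir"
touch "$topdir/$series"/dti_{FA,MD}.nii.gz

cd "$topdir" || exit 1
shopt -s globstar nullglob   # nullglob: run the loop zero times if nothing matches

for f in **/dti_*.nii.gz; do
  subject="${f%%/*}"   # text before the first slash, e.g. 111180-100
  name="${f##*/}"      # text after the last slash, e.g. dti_FA.nii.gz
  cp -- "$f" "$destdir${subject}_${name}"
done

shopt -u globstar nullglob
ls "$destdir"
```

For the real data, topdir and destdir would be set as in the script above and the mkdir and touch lines dropped. A matching file sitting directly in the top directory has no slash in its path, so subject would come out as the whole file name; the regular expression version simply skips such files.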