I was wondering if anyone can help me figure out what is going on with a CRON job. The scenario:
I have a script that transfers files between two machines using sftp. The script sftps to the remote machine, does an ls on the directory, and pipes the output to a file, "remote_files.txt". It then transfers this file to the local machine. On the local machine there is a file named "old_files.txt" that lists all the files that have already been transferred. When "remote_files.txt" arrives on the local machine, the script runs a grep against the two lists and writes the deltas to "new_files.txt". It then sftps the files listed in "new_files.txt" and appends their names to the "old_files.txt" list.
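A minimal sketch of that delta step (the actual grep command isn't shown in the post, so the flags here are my guess; the file names are the ones from the script):

```shell
# Simulated inputs: the listing fetched from the remote machine and the
# list of files already transferred (names taken from the description above).
printf 'a.csv\nb.csv\nc.csv\n' > remote_files.txt
printf 'a.csv\n' > old_files.txt

# The delta: names in the remote listing that are not in the old list.
# -F = fixed strings, -x = whole-line match, -v = invert, -f = patterns from file.
grep -vFxf old_files.txt remote_files.txt > new_files.txt

# After transferring the files in new_files.txt, record their names:
cat new_files.txt >> old_files.txt
```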
The CRON job was set to run every 10 minutes, and has had to be increased to 30 minutes. When I run the script manually, it takes about 48 seconds, but when it runs through CRON, it takes over 30 minutes. We don't want to keep increasing the interval, and are trying to figure out why it's taking so long. When I run top while it's running, it's the grep command that takes about 28 minutes. This is the command that we have:
If what you're doing is essentially backing up one machine to the other, then using rsync/rsyncd would probably give you a lot more options, flexibility, and maybe even speed (since it's very good with transferring just the parts of things that have changed.)
For this to work, though, you'd have to be able to configure both machines to open the proper port. For the greatest flexibility, you might want to run the whole thing using rsyncd on the remote machine, and you'd have to make sure that rsync/rsyncd are properly installed and accessible to you on both ends. I use it all the time on my local machine, but haven't worked with remote ones yet.
For more info, there's tons on the web or see the list at "rsync" <rsync@lists.samba.org> where the experts hang out.
It sounds like you're not sure which line in your cron script is actually taking a long time. You can find out with the time command. Just put time in front of each command and it will tell you how much time each one takes.
Of course, it writes to stderr, so you'll have to be sure to capture it, like chrism01 did.
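For instance (the grep flags and log name here are placeholders, not the OP's actual command), a line like this captures the timing for a single step into a log:

```shell
# Sample inputs so the grep has something to chew on.
printf 'a\nb\n' > remote_files.txt
printf 'a\n'    > old_files.txt

# time writes to stderr; the braces let us redirect just this step's timing.
{ time grep -vFxf old_files.txt remote_files.txt > new_files.txt ; } 2> step_timing.log
```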
As for scp, I have code like this in a couple of my cron scripts:
Code:
test -r ~/.keychain/$(hostname)-sh && . ~/.keychain/$(hostname)-sh
if [ -z "$SSH_AGENT_PID" ] || [ ! -d "/proc/$SSH_AGENT_PID" ]; then
    echo "ERROR: No ssh agent, SSH_AGENT_PID=\"$SSH_AGENT_PID\""
    exit 1
fi
@chrism01 - Yes, we have it set up to capture to a log like that. I will add that to the code and see if it shows anything else.
@josephj - Are you asking what kind of access I personally have? I have root access to both machines. We have it set up so there is a user with the same username on both machines, and that is what we use to initiate the transfer. The way our network is set up: we have one web machine that is exposed to the internet, and one application machine that is not. The two machines can talk to each other, but only when the application machine initiates an SFTP connection. Our clients upload a file through our UI on the web machine, and it gets transferred to the application machine to be processed. We have explored using rsync, but it requires both machines' directories to match, and when the file on the application machine is processed, it's moved to an archive directory, so that won't work.
@KenJackson - I will look into adding the time command to the script. Would I add it on the line above, or in line with the current commands?
Quote:
We have explored using rsync, but it requires both machines' directories to match, and when the file
Are you able to set up links during the archiving process on the remote machine, such as 'ln -s DIR/archive/FILE DIR/FILE', so that rsync can find them? That would make this easy if you can.
Re the grep: was the cron job disabled when you tested this? I'm wondering if perhaps cron is tripping over itself by touching one of the txt files while the previous cron job is still running.
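One common guard against that (a sketch; the script name and lock path are made up) is to run the job under flock(1), which makes a new run skip out immediately if the previous one still holds the lock:

```shell
# crontab entry: -n makes flock give up at once instead of queueing behind
# the still-running previous job.
*/10 * * * * flock -n /var/lock/transfer.lock /usr/local/bin/transfer.sh
```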
For me personally, 9 out of 10 problems with cron come down to the fact that my script runs as a different user. And I am betting it's your problem this time too.
You use $HOME but never mention which user the cron task runs as.
Test your script by
Code:
su -c '/usr/local/bin/script par1 par2' useritshouldrunas
And if you are using $HOME that might not even be sufficient. Echo it in your script so you know its value!
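A quick way to do that (the log file name here is made up) is to record the effective user and environment at the top of the script:

```shell
# Near the top of the cron script: log who we are and what HOME/PATH hold,
# so a manual run and a cron run can be compared side by side.
echo "Running as $(id -un), HOME=$HOME, PATH=$PATH" >> cron_env.log
```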
Separately from the CRON vs manually-run timing issue, you might also think about this design issue.
If I understand your process correctly, it looks like your old_files.txt file keeps getting appended to. (If I'm mistaken, you can ignore all this.) I'm not sure how many lines yours currently contains, but as it keeps getting larger, your grep may take longer and longer to run, depending on how beefy your machine is.
I ran a little test on an older (1.6 GHz) PC I have with Slackware 13.37. I created a file of file names with ls to simulate your remote_files.txt, and then another set of files simulating your old_files.txt. I timed several runs of a grep matching yours, using versions of old_files.txt with increasing numbers of lines. The run time of the grep increased as the number of lines in old_files.txt increased:
number of lines in old_files.txt -- grep run time
   15 -- near-instantaneous
 2000 -- 9 seconds
 8000 -- 88 seconds
32000 -- 13 minutes 48 seconds
Again, if I understand your process correctly, you may find it keeps taking longer and longer as your old_files.txt grows. Just a thought.
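If that growth turns out to be the problem, one alternative worth considering (a sketch, not the OP's actual script; it assumes one filename per line) is comm(1) on sorted lists, which compares the two files in a single linear pass instead of matching every name against every pattern:

```shell
# Simulated inputs, one filename per line.
printf 'c.csv\na.csv\nb.csv\n' > remote_files.txt
printf 'a.csv\n'               > old_files.txt

# comm requires sorted input; -13 keeps lines unique to the second file,
# i.e. remote files not yet recorded in old_files.txt.
sort old_files.txt    > old_sorted.txt
sort remote_files.txt > remote_sorted.txt
comm -13 old_sorted.txt remote_sorted.txt > new_files.txt
```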