I was wondering if anyone can help me figure out what is going on with a CRON job. The scenario:
I have a script that transfers files between two machines using sftp. The script sftps to the remote machine, does an ls on the directory, and pipes the output to a file, "remote_files.txt". It then transfers this file to the local machine. On the local machine there is a file named "old_files.txt" that lists all the files that have already been transferred. When "remote_files.txt" arrives on the local machine, the script runs a grep against the two lists and writes the deltas to "new_files.txt". It then sftps the files listed in "new_files.txt" and appends their names to the "old_files.txt" list.
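A minimal sketch of that delta step (the actual grep command isn't shown in the post, so the flags here are my guess; the file names are the ones from the script):

```shell
# Simulated inputs: the listing fetched from the remote machine and the
# list of files already transferred (names taken from the description above).
printf 'a.csv\nb.csv\nc.csv\n' > remote_files.txt
printf 'a.csv\n' > old_files.txt

# The delta: names in the remote listing that are not in the old list.
# -F = fixed strings, -x = whole-line match, -v = invert, -f = patterns from file.
grep -vFxf old_files.txt remote_files.txt > new_files.txt

# After transferring the files in new_files.txt, record their names:
cat new_files.txt >> old_files.txt
```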
The CRON job was set to run every 10 minutes, and has had to be increased to 30 minutes. When I run the script manually, it takes about 48 seconds, but when it runs through CRON, it takes over 30 minutes. We don't want to keep increasing the interval, and are trying to figure out why it's taking so long. When I run top while it's running, it's the grep command that takes about 28 minutes. This is the command that we have:
If what you're doing is essentially backing up one machine to the other, then using rsync/rsyncd would probably give you a lot more options, flexibility, and maybe even speed (since it's very good with transferring just the parts of things that have changed.)
For this to work, though, you'd have to be able to configure both machines to open the proper port. For the greatest flexibility, you might want to run the whole thing using rsyncd on the remote machine, and you'd have to make sure that rsync/rsyncd are properly installed and accessible to you on both ends. I use it all the time on my local machine, but haven't worked with remote ones yet.
For more info, there's tons on the web or see the list at "rsync" <rsync@lists.samba.org> where the experts hang out.
It sounds like you're not sure which line in your cron script is actually taking a long time. You can find out with the time command. Just put time in front of each command and it will tell you how much time each one takes.
Of course, it writes to stderr, so you'll have to be sure to capture it, like chrism01 did.
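For instance (the grep flags and log name here are placeholders, not the OP's actual command), a line like this captures the timing for a single step into a log:

```shell
# Sample inputs so the grep has something to chew on.
printf 'a\nb\n' > remote_files.txt
printf 'a\n'    > old_files.txt

# time writes to stderr; the braces let us redirect just this step's timing.
{ time grep -vFxf old_files.txt remote_files.txt > new_files.txt ; } 2> step_timing.log
```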
As for scp, I have code like this in a couple of my cron scripts:
Code:
test -r ~/.keychain/$(hostname)-sh && . ~/.keychain/$(hostname)-sh
if [ -z "$SSH_AGENT_PID" ] || [ ! -d "/proc/$SSH_AGENT_PID" ]; then
    echo "ERROR: No ssh agent, SSH_AGENT_PID=\"$SSH_AGENT_PID\""
    exit 1
fi
@chrism01 - Yes, we have it set up to capture to a log like that. I will add that to the code and see if it shows anything else.
@josephj - Are you asking what kind of access I personally have? I have root access to both machines. We have it set up so there is a user with the same username on both machines, and that is what we use to initiate the transfer. The way our network is set up: we have one web machine that is exposed to the internet, and one application machine that is not. The two machines can talk to each other, but only when the application machine initiates an SFTP connection. Our clients upload a file through our UI on the web machine, and it gets transferred to the application machine to be processed. We have explored using rsync, but it requires both machines' directories to match, and when the file on the application machine is processed, it's moved to an archive directory, so that won't work.
@KenJackson - I will look into adding the time command to the script. Would I add it on the line above, or in line with the current commands?
Quote:
We have explored using rsync, but it requires both machines' directories to match, and when the file
Are you able to set up links during the archiving process on the remote machine, such as 'ln -s DIR/archive/FILE DIR/FILE', so that rsync can find them? That would make this easy if you can.
Re the grep: was the cron job disabled when you tested this? I'm wondering if perhaps cron is tripping over itself by touching one of the txt files while the previous cron job is still running.
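One common guard against that (a sketch; the script name and lock path are made up) is to run the job under flock(1), which makes a new run skip out immediately if the previous one still holds the lock:

```shell
# crontab entry: -n makes flock give up at once instead of queueing behind
# the still-running previous job.
*/10 * * * * flock -n /var/lock/transfer.lock /usr/local/bin/transfer.sh
```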
For me personally, 9 out of 10 problems with cron come down to the fact that my script runs as a different user. And I am betting it's your problem this time too.
You use $HOME but never mention which user the cron task runs as.
Test your script by
Code:
su -c '/usr/local/bin/script par1 par2' useritshouldrunas
And if you are using $HOME that might not even be sufficient. Echo it in your script so you know its value!
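A quick way to do that (the log file name here is made up) is to record the effective user and environment at the top of the script:

```shell
# Near the top of the cron script: log who we are and what HOME/PATH hold,
# so a manual run and a cron run can be compared side by side.
echo "Running as $(id -un), HOME=$HOME, PATH=$PATH" >> cron_env.log
```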
Separately from the CRON vs manually-run timing issue, you might also think about this design issue.
If I understand your process correctly, it looks like your old_files.txt file keeps getting appended to. (If I'm mistaken, you can ignore all this.) I'm not sure how many lines yours currently contains, but as it keeps getting larger, your grep may take longer and longer to run, depending on how beefy your machine is.
I ran a little test on an older (1.6 GHz) PC I have with Slackware 13.37. I created a file of file names with ls to simulate your remote_files.txt, and then another set of files simulating your old_files.txt. I timed several runs of a grep matching yours, using versions of old_files.txt with increasing numbers of lines. The run time of the grep increased as the number of lines in old_files.txt increased:
number of lines in old_files.txt -- grep run time
   15 -- near-instantaneous
 2000 -- 9 seconds
 8000 -- 88 seconds
32000 -- 13 minutes 48 seconds
Again, if I understand your process correctly, you may find it keeps taking longer and longer as your old_files.txt grows. Just a thought.
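If that growth turns out to be the problem, one alternative worth considering (a sketch, not the OP's actual script; it assumes one filename per line) is comm(1) on sorted lists, which compares the two files in a single linear pass instead of matching every name against every pattern:

```shell
# Simulated inputs, one filename per line.
printf 'c.csv\na.csv\nb.csv\n' > remote_files.txt
printf 'a.csv\n'               > old_files.txt

# comm requires sorted input; -13 keeps lines unique to the second file,
# i.e. remote files not yet recorded in old_files.txt.
sort old_files.txt    > old_sorted.txt
sort remote_files.txt > remote_sorted.txt
comm -13 old_sorted.txt remote_sorted.txt > new_files.txt
```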