Programming
This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
I am generating an output file from a simulation that is several gigabytes in size. The output file is ASCII text and I can generate it from a command line option.
{command line option} > output_file.txt
How do I start writing to a new output file once the current output file reaches a reasonable size?
I tried to check the size of the output file and, once it approached a certain size, redirect the output to a new file ... only it did not quite work out, since the shell sets up the redirection once, when the command starts, and a later change never reaches the running process.
{command line option} > output_file-$i.txt
Here, I incremented the variable i once the file size was above a certain threshold.
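(As a sketch of why that approach fails, assuming the attempt looked roughly like the lines below: the shell opens the redirection target once, when the command starts, so incrementing $i afterwards has no effect on the already-running process, which keeps writing to the file it originally opened.)
Code:
i=0
{command line option} > output_file-$i.txt &
# watching the file size and then bumping i does nothing here:
# the running command still holds the original output file open
i=$((i+1))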
Since I only really know Perl, you could maybe use something like this:
Code:
#!/usr/bin/perl -w
use strict;

my $base     = "output_file";   # base name for the rotated files
my $max_size = 1000000;         # start a new file after roughly this many bytes
my $count    = 0;
my $bytes    = 0;

open OUT, ">", "$base-$count.txt" or die "Cannot open: $!";
while (my $line = <STDIN>) {
    print OUT $line;
    $bytes += length($line);
    if ($bytes > $max_size) {   # current chunk is full: rotate
        close OUT;
        $count++;
        $bytes = 0;
        open OUT, ">", "$base-$count.txt" or die "Cannot open: $!";
    }
}
close OUT;
The -s operator in Perl will return you the size of a file in bytes. (The version above counts the bytes it has written instead, since -s can lag behind while output is still sitting in the stdio buffer.)
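(For reference, a sketch of how a script like that would sit in the pipeline, assuming it is saved as rotate.pl and reads the simulation's output on standard input, as the version above does:)
Code:
{command line option} | perl rotate.pl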
The problem here is that you're wanting the shell's > redirection to let you monitor the output file's size, which I don't think is possible. I would suggest that you consider writing a wrapper in Perl or some such for this. You can fork off the program and, instead of dumping the output to a file, have your script handle the output. In Perl, it would be something like this (for 5.6.1 or better):
Code:
open FORKIN, "-|", "/path/to/your/binary", "cmd", "line", "opts"
    or die "Cannot fork: $!\n";
Then, you'd simply read each line from the program, as though it were a file:
Code:
my $line;
while ($line = <FORKIN>) {
    # handle each line of the program's output here
}
In the loop, you can count the number of chars per line, or simply count each line, and use that to determine when to open a new output file.
Code:
my $line;
my $char_count     = 0;
my $max_char_count = 20000;   # bytes per output file
my $file_num       = 0;

# "out.N" is just a placeholder name; use whatever suits you
open OUT, ">out.$file_num" or die "Cannot open: $!";
while ($line = <FORKIN>) {
    $char_count += length($line);
    if ($char_count > $max_char_count) {
        # close old output file and open new one
        close OUT;
        $file_num++;
        open OUT, ">out.$file_num" or die "Cannot open: $!";
        $char_count = 0;
    }
    print OUT $line;
}
close OUT;
close FORKIN;
Something like that, anyway... I haven't tested this, but with some modification, it should work.
If I'm not mistaken, piping the output to split would still cause it to dump all its data to one file first, and then split it. I could be wrong on that though, since I don't know the specifics of how split works. (=
As far as rewriting the program to do the splitting goes... that may not be an option. If it can be done, that would eliminate the need for a wrapper.
Quote:
piping the output to split would still cause it to dump all its data to one file first
Which file would it dump it to, and why?
I don't know the innards of split myself, but I'm pretty sure it just waits until a buffer's full, then writes out a file.
I think simulation_command | split -l (num lines per file) should work fine for psingh.
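For example (a sketch; simulation_command stands in for the real program, and 100000 is an arbitrary line count):
Code:
simulation_command | split -l 100000 - output_file-
The trailing - tells split to read from standard input, and output_file- is the prefix for the chunks it writes (output_file-aa, output_file-ab, and so on).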
kev, you're probably right. There always seems to be a command-line way to do just about everything one could need to do. I just don't know about a lot of them, so I tend to write my own Perl versions. (=
Thanks folks. By the time I had gotten a reply, I found that I could redirect the output via the split command and control the size of the files. Each file is about 200 megabytes.
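(For anyone searching later: the size-based form would be something like the line below, assuming GNU split, whose -b option takes a byte count and accepts m for megabytes.)
Code:
simulation_command | split -b 200m - output_file-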
I am running this on Red Hat 8. I now find that the processing is "extremely slow". For the first few files, the system was processing about 1 MB per minute (until about 8 files). Since then, it has been processing about 1 MB every 10 minutes. I am not sure what is slowing the system down.
I made sure I gzipped the output files as they were created. There is sufficient space on the system (less than 63% utilized). Any thoughts will be greatly appreciated.
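(One rough way to compress finished chunks while split is still running, a sketch only and not necessarily what was done here: periodically gzip every chunk except the newest one, which split may still be writing. The final chunk needs one last gzip by hand after the job exits.)
Code:
# every minute, compress all chunks except the newest uncompressed one;
# already-compressed .gz files are filtered out; stop with Ctrl-C
while sleep 60; do
    ls -t output_file-* 2>/dev/null | grep -v '\.gz$' | tail -n +2 | xargs -r gzip
done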
Quote:
I now find that the processing is "extremely slow"
Compared to what? Did it run much faster before you piped it to split? What's your load average while the program's running?
Quote:
I opted to use the split command since I am a newbie to perl. However, if perl is my only option, I can redo the work.
I don't get what you're asking here. If the simulation's finished, why run it again? Why would Perl be your only option?
I don't know Perl, but from what I've heard it's pretty good for knocking up a quick solution and therefore would be fine to use here. My personal choice, however, would be to rewrite the output code of the simulation so it creates the separate files itself.
The simulation starts off extremely fast (essentially running tcpdump on a large data file).
Memory: 433M used, 438M available
Swap: 500M used, 1G available
CPU usage is rather erratic (1% to 98%). I have nothing else running.
After creating 8 files, it has slowed down considerably (compared to the speed when it started off).
It hasn't finished yet. It is about half done and processing at 1/10th the initial rate. Could it be due to the fact that I am writing to just one output buffer and all split is doing is sending it to a file?
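(A standard diagnostic, not something from this thread: given the 500M of swap in use reported above, it is worth watching vmstat while the job runs to see whether the box is swapping.)
Code:
# print memory and I/O activity every 5 seconds; sustained non-zero
# si/so columns mean the machine is swapping, which would explain the slowdown
vmstat 5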