LinuxQuestions.org
Old 11-18-2011, 12:25 PM   #1
cristalp
Member
 
Registered: Aug 2011
Distribution: Linux Mint
Posts: 103

Rep: Reputation: Disabled
AWK: split a file into multiple files, and a request for an explanation of some known code


Dear Experts,

I have a file that looks like this:
Code:
input_wez

.....
.....
.....
end

useless content


input_rty

....
....
....
end

useless content

input_utl

....
....
....
end

useless content

...
...
I want to split the file on each input_***/end pair and discard the useless content.
Each output file should be named with the three-letter code after "input_" plus a sequence number, like:
Code:
wez_1.txt
rty_2.txt
utl_3.txt
...

***_9999.txt
...
The content of each file should be:
Code:
....
....
....
end
Please note that neither the "input_***" line nor the empty line after it is saved in the output file. Each output file starts with the content, which begins two lines after the "input_***" title in the big file.

I modified someone else's code and can now get close to this result with:
Code:
awk -F_ '/input/{ f=$2; n++; next} f{print > f "_" n ".pdb"} /END/{close(f);f=x}' INPUTFILE
But the output file from this code looks like:
Code:
#EMPTY LINE APPEARED HERE
....
....
....
end
Please note that this code does not remove the empty line between the "input_***" title and the content.


My questions are:

1. How can I eliminate the empty line with the simplest modification to the awk code above?

2. In the awk code above, what is the meaning of the f at the start of
Code:
f{print > f "_" n ".pdb"}
Why, when I replace it with
Code:
{print > f "_" n ".pdb"}
does it give me file names like _n.pdb instead of ***_n.pdb?
Is this a general method for writing to files?
What is the general usage and purpose of
Code:
f{....}
?


3. At the end of my awk code, when I close the file with
Code:
{close(f);f=x}
Why do I need to reset f to x? If I do not, why do I get the "useless content" at the end of each output file? What is the logic behind this?

If you understand the code better than I do, could you please explain these two parts of it in a bit more detail?

I know these questions may be annoying, but I am trying very hard to understand AWK and really hope to use it more freely. To do that I need a better and deeper understanding. I hope these questions do not bother you too much; if they do, please just ignore them. I would thank you all the same!
 
Old 11-18-2011, 12:47 PM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Quote:
1. How to eliminate the empty line by the simplest modification in above awk code
Change the order
Quote:
2. In the above awk code, what is the meaning of the f before
I'll answer with a question: what is the point of the following in your code? (The answer is the same.)
Code:
/input/
Quote:
Why do I need to reset f to x?
What does 'x' equal?
 
Old 11-18-2011, 12:48 PM   #3
cristalp
Member
 
Registered: Aug 2011
Distribution: Linux Mint
Posts: 103

Original Poster
Rep: Reputation: Disabled
Hi all,

I found an answer to the first question, though maybe not the simplest method:

Code:
awk 'BEGIN {FS = "_"} /input/{ f=$2; n++;next} f{if (NF > 0) print > f "_" n ".txt"} /END/{close(f);f=x}' INPUTFILE
Any better ideas?

Thanks!
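
One thing to keep in mind about the NF > 0 test: it drops every empty line, not only the one right after the title, so a blank line inside the content would disappear as well. A minimal sketch of that behaviour, printing to standard output instead of a file (the sample data and the `kept` shell variable are made up for illustration):

```shell
# Hypothetical record with a blank line INSIDE the content (between A1 and A2).
kept=$(printf 'input_wez\n\nA1\n\nA2\nend\n' | awk 'BEGIN { FS = "_" }
    /input/ { f = $2; next }        # title line: remember the code, print nothing
    f       { if (NF > 0) print }   # same NF test as above, but writing to stdout
    /end/   { f = "" }              # end marker: stop printing
')
printf '%s\n' "$kept"
```

Both blank lines are filtered out here, so this approach only fits data whose content never contains empty lines.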
 
Old 11-18-2011, 01:45 PM   #4
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi, cristalp.

Try this:
Code:
awk -F_ '/input/{ f=$2; n++; m=0; next } {m++} m>1&&f{print > (f "_" n ".pdb")} /end/{close(f "_" n ".pdb"); f=0}' test.txt
On your questions:
1. To eliminate the empty line (the one right after input_*), you can use an additional counter `m' that counts the lines after input_*, and print only lines with m > 1 (the empty line is the one with m = 1). See the command above for an example.
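
To see what the counter is doing, here is a small sketch that prints the value of `m' next to each line of a one-record sample. The sample data and the shell variables `sample` and `trace` are made up for illustration; only the counter logic comes from the command above.

```shell
# Hypothetical one-record sample: title, blank line, two content lines, end, junk.
sample=$(printf 'input_wez\n\nline A\nline B\nend\n\nuseless content\n')

# Show the counter value for every line after the title line.
# m restarts at 0 on the title line, so the blank line right after it gets m=1.
trace=$(printf '%s\n' "$sample" | awk -F_ '
    /input/ { m = 0; next }          # title line: restart the counter, print nothing
    { m++; print m ": [" $0 "]" }    # every other line: count it and show the count
')
printf '%s\n' "$trace"
```

The blank line shows up as `1: []`, which is why the test m>1 skips exactly that line and nothing else inside the record.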

2,3.
In the code
Code:
f{print > f "_" n ".pdb"}
the `f' before `{' is treated as a pattern, here a logical expression. The action in braces is executed only if the variable `f' has a non-empty, non-zero value. As you can see, I use a more complex logical expression, `m>1&&f', to decide whether to print something or not.

Resetting f to x means resetting f to the empty string (because the variable `x' is never set), so that f is logically false. Note that I reset `f' to zero, with the same effect.
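
This can be checked directly. The sketch below only assumes standard awk behaviour: a variable that was never assigned compares equal to both the empty string and zero, and both count as false. The names `x' and `f' follow the thread; the shell variable `report` is only there for illustration.

```shell
report=$(awk 'BEGIN {
    # x was never assigned, so it is uninitialized: equal to "" and to 0.
    if (x == "") print "x equals empty string"
    if (x == 0)  print "x equals zero"

    f = "wez"; if (f)  print "f=\"wez\" is true"    # non-empty string: true
    f = x;     if (!f) print "f=x is false"         # empty string: false
    f = 0;     if (!f) print "f=0 is false"         # zero: false
}')
printf '%s\n' "$report"
```

So `f=x', `f=""' and `f=0' all switch the print rule off; writing `f=""' explicitly is arguably the easiest to read.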

If you remove `f' and use just {print > f "_" n ".pdb"}, then you get not only wez_1.pdb etc., but also _1.pdb etc. The _n.pdb files contain what you called 'useless content', which follows the n-th input_***...end record. This happens because you print every line regardless of the value of `f', and f=="" for the useless content.
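
A sketch that reproduces this on a made-up two-record sample. It runs in a scratch directory; the file name sample.txt and the data are assumptions, and compared with the code in the question the end marker is matched in lower case and close() is given the full output file name.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical two-record sample in the layout from the first post.
printf 'input_wez\n\nA1\nend\n\nuseless content\n\ninput_rty\n\nB1\nend\n\nuseless content\n' > sample.txt

# The print rule has NO f guard, so every non-title line is written somewhere.
awk -F_ '
    /input/ { f = $2; n++; next }                 # title line: set the code, bump the counter
            { print > (f "_" n ".pdb") }          # unguarded: runs even while f is empty
    /end/   { close(f "_" n ".pdb"); f = x }      # end marker: close the file, clear f
' sample.txt

ls *.pdb
```

On this sample the listing contains _1.pdb and _2.pdb next to wez_1.pdb and rty_2.pdb, and the two underscore files hold the lines between one end marker and the next title.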

Note that /END/ in your code should read /end/ (if your input file uses `end'), because awk regular expressions are case sensitive by default.
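
Putting the pieces from this thread together, a sketch with the end marker in lower case might look like the following. It is not the only way to do it: the anchored patterns (^input_ and ^end$), the `out' variable, the sample data and the file name sample.txt are my own assumptions, and the .txt names follow the first post.

```shell
# Work in a scratch directory with a hypothetical two-record sample.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'input_wez\n\nA1\nA2\nend\n\nuseless content\n\ninput_rty\n\nB1\nend\n\nuseless content\n' > sample.txt

awk -F_ '
    /^input_/  { f = $2; n++; m = 0; next }              # title: remember code, restart line counter
               { m++ }                                   # count lines since the last title
    m > 1 && f { out = f "_" n ".txt"; print > out }     # skip the blank line (m == 1) and anything outside a record
    /^end$/    { if (f) close(out); f = "" }             # end marker: close this file, stop printing
' sample.txt

cat wez_1.txt
```

With this sample, wez_1.txt holds A1, A2 and end, rty_2.txt holds B1 and end, and no _1.txt or _2.txt is created.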

Hope this helps. I apologize for my poor English.

Last edited by firstfire; 11-18-2011 at 02:05 PM.
 
Old 11-23-2011, 07:29 AM   #5
cristalp
Member
 
Registered: Aug 2011
Distribution: Linux Mint
Posts: 103

Original Poster
Rep: Reputation: Disabled
Thanks a lot, firstfire. Your explanation is very clear and very helpful, and your English is in fact good. Thanks again!
 
  

