LinuxQuestions.org
Old 09-09-2008, 12:08 PM   #1
tr1px
LQ Newbie
 
Registered: Jul 2003
Location: Hollywood Boring Florida
Distribution: Fedora Core 4 - 2.6.12-1.1398_FC4
Posts: 4

Rep: Reputation: 0
Question text match pipe to file then delete from original text file create new dir automatic


I have a huge file of about 3 million records with email data. So far I use:

cat split_?.txt |grep -i '\<domain.com' >> ./domain/domain.txt

This takes emails matching the domain and puts them in a file for that domain.

Now I need to delete those records (the ones written to domain.txt) from the original files. My ultimate goal is to automate the whole process with a shell script, which I am learning to write right now.

I want to take file1.txt, which holds the email records, and have a script go through it and look at all the domains in there. The script is then supposed to create a folder named after each domain. After that I want to delete the row or record from file1.txt.

file1 -> Look at domain -> create Folder for domain -> put record in new file in domain folder -> delete record from file1

I hope this is not too complicated...

For right now help with

cat split_?.txt |grep -i '\<domain.com' >> ./domain/domain.txt ----> and then delete the record from split_?.txt would be ok.

Thank you in advance.
 
Old 09-09-2008, 04:20 PM   #2
arckane
Member
 
Registered: Sep 2005
Location: UK
Distribution: Gentoo/Debian/Ubuntu
Posts: 308

Rep: Reputation: 39
Looks like a sed or awk approach I think... hmmm, let me think.

Just to make sure I get this right:

You'll end up with multiple files called /domain/domain_name.txt, so in theory you could go through the directory of files, pull out any of the domain_name sections, search for matching ones and remove?

If that's the case then something like:

Code:
for I in ./domain/*.txt; do T=${I##./domain/}; I=${T%.txt}; sed -n "/$I"'/!p' ./file.txt ; done
Once you're happy that you get the correct output, change the sed line to be "sed -i -n ..." and the -i will edit the file.txt rather than keep parsing and printing the whole file with missing lines.

TEST THIS FIRST, don't take it as set in stone. I've played with it quickly and it works for me...
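A slightly more defensive variation on the loop above, as a rough sketch only. It assumes the same ./domain/*.txt layout and the same ./file.txt; the helper name strip_domain is made up for illustration. Two differences: the dots in the domain are escaped so "domain.com" cannot match "domainxcom", and the result goes to a temporary file that replaces the original only if grep did not fail.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): drop every line of FILE that
# contains an address at DOMAIN, e.g.  strip_domain domain.com ./file.txt
strip_domain() {
    local domain=$1 file=$2 esc tmp status
    # Make the dots literal for grep -E.
    esc=$(printf '%s' "$domain" | sed 's/\./\\./g')
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # -v keeps the NON-matching lines; the trailing group stops
    # "@domain.com" from also matching "@domain.com.au".
    grep -v -i -E -- "@${esc}([^[:alnum:].-]|\$)" "$file" > "$tmp"
    status=$?
    # grep exits 1 when it printed nothing (every line matched);
    # only a status above 1 is a real error.
    if [ "$status" -gt 1 ]; then
        rm -f "$tmp"
        return 1
    fi
    mv -- "$tmp" "$file"
}

# Same idea as the loop above, but each pass rewrites file.txt via a temp file.
for f in ./domain/*.txt; do
    [ -e "$f" ] || continue              # no per-domain files yet
    name=${f##*/}                        # strip the directory part
    strip_domain "${name%.txt}" ./file.txt
done
```

Note that each pass still rereads the whole file, so with many domains this gets slow. Keep a backup of file.txt before trying it.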
 
Old 09-09-2008, 06:33 PM   #3
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,369

Rep: Reputation: 2753
There are enough ops involved that I'd recommend actually writing a proper shell script instead of a one-liner. It's much easier to debug, and e.g. you need to check whether each directory exists before creating it.

http://tldp.org/LDP/Bash-Beginners-G...tml/index.html
http://www.tldp.org/LDP/abs/html/

Last edited by chrism01; 09-09-2008 at 06:53 PM.
 
Old 09-10-2008, 02:03 PM   #4
tr1px
LQ Newbie
 
Registered: Jul 2003
Location: Hollywood Boring Florida
Distribution: Fedora Core 4 - 2.6.12-1.1398_FC4
Posts: 4

Original Poster
Rep: Reputation: 0
Let me lay this out a little more clearly.

I have a directory that contains around 37 million email records (fname, lname, addr, email ...) split into 13 files: split_1.txt, split_2.txt ... through split_13.txt. I now need a script that reads through them file by file and looks for domains (e.g. mike@domain.com splits into [mike]@[domain.com]). The files it goes through look similar to this:

"mike","dawson","23 kimber lane","hollywood","FL","33020","5553211234","mike@domain.com","blah","blah-blah"

I want a script that automatically looks at each record, pulls out [domain.com], and checks whether the folder /somedir/domain.com exists. If it does not, the script creates it. It then creates the file /somedir/domain.com/domain.com.txt if needed and adds the record to that file. Once that is finished I would like the record to be deleted from whichever split_?.txt file it came from. The deleting part is not that important; it is only to save disk space. If someone can help me out with this I will love you forever. I have been doing this more or less manually, and it takes hours to pull out the known domains one at a time and wait.

I know how to create loops and I am close to having the answer but seem a bit clueless.
 
Old 09-10-2008, 06:58 PM   #5
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,369

Rep: Reputation: 2753
Ok, that's much clearer.

1. there are many ways to do it (I'd use Perl), but if you've nearly got the answer, how about posting it along with what's wrong and we can help you fix it.
2. When it's done I'd gzip the orig files and back them up for ref
3. IIUC, you want a dir for every domain and a file for every user. If you have 37M users you may run out of inodes before disk space (use df -i to check)

HTH
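Points 2 and 3 could be sketched roughly like this. It assumes the quoted, comma-separated layout from post #4 (email in the 8th field) and one directory plus one file per domain, so about two inodes per distinct domain. The helper names count_domains and archive_original are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the number of distinct e-mail domains in FILE...
# Compare twice this number with the free inodes reported by "df -i".
count_domains() {
    awk -F'","' '
        { n = split($8, p, "@"); if (n == 2) seen[tolower(p[2])] = 1 }
        END { c = 0; for (d in seen) c++; print c }
    ' "$@"
}

# Hypothetical helper: write FILE.gz next to FILE and verify the archive.
# The original is left in place; delete it only after checking the copy.
archive_original() {
    gzip -c -- "$1" > "$1.gz" && gzip -t -- "$1.gz"
}

# Example calls (paths are placeholders):
#   count_domains split_*.txt
#   df -i /somedir
#   for f in split_*.txt; do archive_original "$f"; done
```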

Last edited by chrism01; 09-10-2008 at 06:59 PM.
 
Old 09-10-2008, 08:18 PM   #6
tr1px
LQ Newbie
 
Registered: Jul 2003
Location: Hollywood Boring Florida
Distribution: Fedora Core 4 - 2.6.12-1.1398_FC4
Posts: 4

Original Poster
Rep: Reputation: 0
No, I want a file per domain with all the email records from that domain in one file. ex:

"john","dude","2334 kimber lane","hollywood","FL","33020","5553211234","john@domain.com","blah","blah-blah"
"mike","dawson","23121 kimber lane","hollywood","FL","33020","5553211234","mike@domain.com","blah","blah-blah"
"paul","walker","2346 kimber lane","hollywood","FL","33020","5553211234","paul@domain.com","blah","blah-blah"
"jody","jane","2334 kimber lane","hollywood","FL","33020","5553211234","jody@domain.com","blah","blah-blah"
"tim","stuart","2334 kimber lane","hollywood","FL","33020","5553211234","tim@domain.com","blah","blah-blah"
"mike","jones","23566 kimber lane","hollywood","FL","33020","5553211234","mike@domain.com","blah","blah-blah"

Let's say the above records are all different people whose email addresses are from the same domain. They all belong in
/domain.com/domain.com.txt

If they all used hotmail instead, they would go in
/hotmail.com/hotmail.com.txt

Now when I said I was almost there, I meant I can do all this manually with
cat split_?.txt |grep -i '\<domain.com' >> ./domain.com/domain.com.txt
 
Old 09-10-2008, 09:40 PM   #7
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,369

Rep: Reputation: 2753
Copied your data into a file t.t and ran this

Code:
#Set IFS to hardcoded newline only; default is space,tab,newline
IFS="
"

for rec in `cat t.t`
do
    #Field 8 is the email address, still wrapped in double quotes
    user_dom=`echo "$rec"|cut -d',' -f8`
    echo "$user_dom"   #debug = "user@domain.com"
    domain=`echo "$user_dom"|cut -d'@' -f2|cut -d'"' -f1`
    echo "$domain" #debug = domain.com

    #Create the domain dir if it does not exist yet, then add to file
    mkdir -p "tmp/${domain}"
    echo "$rec" >>"tmp/${domain}/${domain}.txt"
done
HTH
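For the full 37 million records, starting several processes per record as above will probably be slow. Here is a rough single-pass awk sketch of the same idea, not a tested production script. It assumes the layout shown in post #4, so that splitting on "," leaves the bare address in field 8, which also keeps working when an address field contains a plain comma. The function name split_by_domain, the unmatched.txt file and the output directory argument are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (name and arguments are made up for this sketch):
#   split_by_domain OUTDIR FILE...
# Appends each record to OUTDIR/<domain>/<domain>.txt in a single awk pass.
split_by_domain() {
    local outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    awk -F'","' -v outdir="$outdir" '
    {
        # Splitting on "," (quote comma quote) leaves field 8 as the bare address.
        n = split($8, part, "@")
        domain = tolower(part[2])
        # Anything without a plain-looking domain is set aside, not dropped.
        if (n != 2 || domain !~ /^[a-z0-9][a-z0-9.-]*$/) {
            print >> (outdir "/unmatched.txt")
            next
        }
        dir = outdir "/" domain
        if (!(dir in made)) {            # create each directory only once
            system("mkdir -p \"" dir "\"")
            made[dir] = 1
        }
        out = dir "/" domain ".txt"
        print >> out
        close(out)                       # avoid running out of open files
    }' "$@"
}

# Example call (paths are placeholders):
#   split_by_domain /somedir split_*.txt
```

Closing the output file after every record is the simple way to stay under the open-file limit when there are many domains, at the cost of some speed. Domains are lowercased so Hotmail.com and hotmail.com end up in the same file. The originals are not touched, so once the output looks right they can be compressed or removed separately.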
 
  

