text match pipe to file then delete from original text file create new dir automatic
Linux - Newbie: This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-to's, this is the place!
This takes the emails matching the domain and puts them in a file for that domain.
Now I need to delete the lines that went into domain.txt from the original file. My ultimate goal is to automate the whole process with a shell script, which I am learning right now.
I want to take file1.txt, which has email records, and have a script go through and look at all the domains in there. The script is supposed to create a folder matching the domain text. After this I want to delete the row or record from file1.txt.
file1 -> Look at domain -> create Folder for domain -> put record in new file in domain folder -> delete record from file1
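The steps above can be sketched as a small shell loop. This is only a sketch: it assumes the e-mail address is the last whitespace-separated field of each record (the sample records and domains here are made up; adjust the field extraction to your real layout).

```shell
#!/bin/sh
# Sketch of the pipeline: file1 -> domain -> folder -> per-domain file.
# Assumption: the e-mail address is the LAST whitespace-separated field.
# Tiny sample input so the sketch runs on its own:
printf '%s\n' 'Mike M 123 mike@example.com' 'Sue S 456 sue@test.org' > file1.txt

while read -r line; do
    email=${line##* }              # last field: the e-mail address
    dom=${email#*@}                # everything after the "@"
    mkdir -p "./$dom"              # create the domain folder if missing
    printf '%s\n' "$line" >> "./$dom/$dom.txt"
done < file1.txt
```

Deleting the copied records from file1.txt is then a separate grep -v or sed step, best done only after verifying the per-domain files look right.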
I hope this is not too complicated...
For right now, help with this would be OK:
Code:
grep -i '\<domain\.com' split_?.txt >> ./domain/domain.txt
...and then deleting the matched records from split_?.txt.
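One hedged way to do the "append, then delete from the source" step is to run the same match twice, once normally and once inverted with grep -v, writing the survivors to a temp file so nothing is lost if a command fails. The file names and records below are placeholders.

```shell
#!/bin/sh
# Sketch: save matching records, then rewrite the source file with
# only the NON-matching ones. "domain.com" and the data are made up.
printf '%s\n' 'a a a joe@domain.com' 'b b b amy@other.net' > split_1.txt

grep -i '@domain\.com' split_1.txt >> domain.txt        # save the matches
grep -iv '@domain\.com' split_1.txt > split_1.txt.tmp   # keep the rest
mv split_1.txt.tmp split_1.txt
```

Test it on a copy of one split file before pointing it at the real data.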
Looks like a sed or awk approach I think... hmmm, let me think.
Just to make sure I get this right:
You'll end up with multiple files called /domain/domain_name.txt, so in theory you could go through the directory of files, pull out any of the domain_name sections, search for matching ones and remove?
If that's the case then something like:
Code:
for I in ./domain/*.txt; do T=${I##*/}; D=${T%.txt}; sed -n "/$D/!p" ./file.txt; done
Once you're happy that you get the correct output, change the sed invocation to "sed -i -n ..." and the -i will edit file.txt in place rather than just printing the filtered result to stdout.
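To make the -n '/pattern/!p' idiom concrete, here is a minimal run on made-up data (note that -i is a GNU sed option):

```shell
#!/bin/sh
# Demonstrates sed -n '/pattern/!p': print only lines that do NOT match.
printf '%s\n' 'keep me' 'mail for domain.com here' 'keep me too' > file.txt

sed -n '/domain\.com/!p' file.txt      # preview: survivors go to stdout
sed -i -n '/domain\.com/!p' file.txt   # happy? rewrite file.txt in place
```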
TEST THIS FIRST; don't take it as set in stone. I've had a quick play and it works for me...
There are enough operations involved that I'd recommend writing a proper shell script instead of a one-liner. It's much easier to debug, and e.g. you need to check whether each directory exists before creating it.
I have a directory that contains around 37 million email records (fname lname addr email ...) split into 13 files: split_1.txt, split_2.txt, ... through split_13.txt. I now need a script that can read through them file by file and look for domains (e.g. mike@domain.com -> [mike]@[domain.com]). The files look similar to this:
I want a script to automatically look at each record, pull out [domain.com], and check whether the folder /somedir/domain.com exists; if it does not, create it. Then create (or append to) a file called /somedir/domain.com/domain.com.txt and enter the record into that file. Once that is finished I would like the record to be deleted from whichever split_?.txt file it came from. The deleting part is not that important; it is only to save disk space. If someone can help me out with this I will love you forever. I have been doing this sort of manually, and it takes hours pulling out known domains and waiting.
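With 37 million records, rescanning every split file once per domain gets expensive; a single awk pass per file can route every record to its domain file in one go. A sketch, again assuming the e-mail address is the last whitespace-separated field (sample data is made up):

```shell
#!/bin/sh
# One pass per split file: route each record to ./<domain>/<domain>.txt.
# Assumption: the e-mail address is the last whitespace-separated field.
printf '%s\n' 'Mike M 1 mike@aol.com' 'Sue S 2 sue@aol.com' 'Al A 3 al@gmx.net' > split_1.txt

awk '{
    n = split($NF, p, "@")               # split the last field on "@"
    if (n == 2) {
        dom = tolower(p[2])
        system("mkdir -p ./" dom)        # create the folder if missing
        print $0 >> ("./" dom "/" dom ".txt")
    }
}' split_1.txt
```

For the real data you would want to close() each output file as you go (awk keeps them open otherwise, and millions of distinct domains would hit the open-file limit), and to mkdir once per new domain rather than once per line.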
I know how to create loops and I am close to having the answer but seem a bit clueless.
1. there are many ways to do it (I'd use Perl), but if you've nearly got the answer, how about posting it along with what's wrong and we can help you fix it.
2. When it's done I'd gzip the orig files and back them up for ref
3. IIUC, you want a dir for every domain and a file for every user. If you have 37M users you may run out of inodes before disk space (use df -i to check)
Let's say the above records are all different people, but their email addresses are from the same domain; then they all belong in
/domain.com/domain.com.txt
Let's say they all use hotmail; then they would go in
/hotmail.com/hotmail.com.txt
Now, when I said I was almost there, I meant I can do all this manually with
Code:
grep -i '\<domain\.com' split_?.txt >> ./domain.com/domain.com.txt
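That manual command can at least be looped over a list of known domains instead of being run by hand each time. A sketch, where domains.txt is a hypothetical one-domain-per-line list (dots in the domain are left unescaped here, so "." matches any character):

```shell
#!/bin/sh
# Loop the manual grep over a list of known domains.
# domains.txt and the split data below are made-up samples.
printf '%s\n' 'aol.com' 'gmx.net' > domains.txt
printf '%s\n' 'Mike M 1 mike@aol.com' 'Al A 3 al@gmx.net' > split_1.txt

while read -r d; do
    mkdir -p "./$d"
    grep -ih "@$d" split_?.txt >> "./$d/$d.txt"   # -h: no filename prefix
done < domains.txt
```

The -h flag matters once split_?.txt expands to more than one file, since grep would otherwise prefix each record with the file name it came from.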