LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 09-24-2012, 09:38 PM   #16
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235

Copy. I'll just place them within tags to make things more clear. I'll think about it.
Code:
1, 4242, 3.42323e+23, 0.1, 0, 0,5,294875, 8438393, 394,,,,,,,,
,0, ,0,,,, 0.487564, , ,0, 0,0, 87563,,,,,,,,, , 0 ,
,1, ,,,,,,,,,,,,,,,, 0, , , , 5,
,1, ,,,,,,5241,,,,, , , 0.4543e-3 , 0 , 111111111,
1, 1000,,,, 9576336e+10, 0.1, 0, 0, ,,, , 8438393, 001,
Code:
rc11,rc12,rc13,rc14,rc15, etc.
,rc21,rc22,rc23,rc24,rc25, etc.
,rc31,rc32,rc33,rc34,rc35, etc.
,rc41,rc42,rc43,rc44,rc45, etc.
rc51,rc52,rc53,rc54,rc55, etc.
etc.
etc.
etc.
 
Old 09-24-2012, 09:43 PM   #17
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Trimming commas and removing leading spaces would be easy, but what actually confuses me is the consistency of the columns: data from column Z could end up in column T on one row, while on another row data from the same column could land in column G. That is what can happen if you trim commas that way. The script I made was written to conform to the original format, in which every field is separated by a comma with an extra space after it.

---- Add ----

So do you really mean to trim out commas?

Last edited by konsolebox; 09-24-2012 at 10:11 PM.
 
Old 09-25-2012, 07:46 AM   #18
atjurhs
Member
 
Registered: Aug 2012
Posts: 316

Original Poster
Rep: Reputation: Disabled
Konsolebox, it's on a private LAN, but I'll see if I can post some of the actual data inputs and output later today.

As to removing commas, only those that are "multiples". In other words, if I use a text editor to replace all occurrences of ,, with just one , and repeat this replacement over and over again, eventually the output file will end up with something like:

Code:
rc11,rc12,rc13,rc14,rc15, etc.
,rc21,rc22,rc23,rc24,rc25, etc.
,rc31,rc32,rc33,rc34,rc35, etc.
,rc41,rc42,rc43,rc44,rc45, etc.
rc51,rc52,rc53,rc54,rc55, etc.
etc.
etc.
etc.
and that's useable
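The repeated find-and-replace described above can be done in a single pass. This is a minimal sketch, assuming the fields contain no quoted commas and that the repeated commas are directly adjacent (no spaces between them). The function name `squeeze_commas` is made up for illustration.

```shell
#!/bin/bash
# Collapse every run of two or more commas into one comma.
# Assumption: commas are adjacent (",,,"), not separated by spaces (", ,").
squeeze_commas() {
	sed 's/,\{2,\}/,/g'
}

# Example: feed two sample rows through the filter.
printf '%s\n' 'rc11,,,rc12,,rc13' ',rc21,,,,rc22' | squeeze_commas
```

`tr -s ','` should give the same result. Either way, a value that sat in one column may end up in a different one, so the rows no longer line up.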
 
Old 09-25-2012, 07:50 AM   #19
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Yes, and like I said, are you not concerned that data from one column would be repositioned to another column just because some of the columns to its left were deleted?

For example.
Code:
1,2,3,4,5
1,2,3,,5
1,,3,4,5
Suppose I remove the first column and the empty fields are dropped as well; the output would be like this:
Code:
2,3,4,5
2,3,5
3,4,5
Notice how the columns are no longer consistent with each other.
Code:
2	3	4	5
2	3	5
3	4	5
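
One way to see the problem is to count the fields on each row once the repeated commas are squeezed out. This is only a sketch using the three sample rows from the example, and `field_counts` is a hypothetical helper name.

```shell
#!/bin/bash
# Print the number of comma-separated fields on each input line.
field_counts() {
	awk -F',' '{ print NF }'
}

# Squeeze repeated commas in the three sample rows, then count fields.
printf '%s\n' '1,2,3,4,5' '1,2,3,,5' '1,,3,4,5' | tr -s ',' | field_counts
```

The rows come out with 5, 4 and 4 fields, so they can no longer be matched up column by column.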

Last edited by konsolebox; 09-25-2012 at 07:51 AM.
 
Old 09-25-2012, 11:54 AM   #20
atjurhs
Member
 
Registered: Aug 2012
Posts: 316

Original Poster
Rep: Reputation: Disabled
ok, here's the best I can type of an input file:

Note that there are actually 54 columns of data; I'll only type 15 because there is nothing unique about the rest of the columns.
Also, the end of each line does not have any spaces, just a carriage return.
I'll also represent the number of digits in a value with counting numbers. For example, 12345 represents a number that has 5 digits (this may help you in counting).
Code:
header_1,header_2,header_3,header_4,header_5,header_6,header_7,header_8,header9_,header10_  ,header11_0_0_    ,header12,header13     ,header14_,header15_
12345678 ,1234 ,123 ,1 ,123456789 ,123456789 ,1 ,1234567 ,1 ,1234567 ,1 ,12345678 ,1 ,1 ,123456789
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,1234567, ,        ,          ,1234567,12345,123456789, , ,        ,  ,  ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        , 1,1 ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,1234567, ,        ,          ,12     ,     ,123456789, , ,        ,  ,  ,
         ,     ,1234567, ,        ,          ,1234567,12345,123456789, , ,        ,  ,  ,12345
12345678 ,1234 ,123 ,1 ,123456789 ,123456789 ,1 ,1234567 ,1 ,1234567 ,1 ,12345678 ,1 ,1 ,123456789
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,1234567, ,        ,          ,1234567,12345,123456789, , ,        ,  ,  ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
etc.
etc.
etc.

It's a mess!!! But it's what I've got, and I need to turn it into a "standard" csv file so that I can do other stuff with it.
 
Old 09-25-2012, 03:13 PM   #21
atjurhs
Member
 
Registered: Aug 2012
Posts: 316

Original Poster
Rep: Reputation: Disabled
So if I run konsolebox's code with the intent to delete columns 1,2,4,6,7,9 I get:

Code:
header_1,header_4,header_6,header9_,header11_0_0_ ,header12,header13,header14_,header15_
12345678,1,123456789,1,1,12345678,1,1,123456789
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,1,1,
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,,,12345
12345678,1,123456789,1,1,12345678,1,1,123456789
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,,,
So it didn't delete the correct columns, but in the columns it deleted it did the deletion correctly!

WOOOHOOOOO, I FIGURED IT OUT: the script is giving output as though the columns are zero-based.

So with the list of columns to delete set to REMOVE=( 1 2 4 6 7 9),
lining them up shows which columns were deleted (2,3,5,7,8,10) and which columns were kept (1,4,6,9,11,12,13,14,15).

solving this problem was a simple one line change in konsolebox's script
Code:
 unset "FIELDS[$I]"  goes to unset "FIELDS[$I-1]"
and I think this solves the puzzle
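
The off-by-one comes from bash arrays being indexed from 0, so column N lives at index N - 1. Below is a small sketch of that mapping. `delete_columns` is an illustrative name, not something from the original script, and the sample line is invented.

```shell
#!/bin/bash
# delete_columns LINE COL...
# Remove the given 1-based columns from a comma-separated line.
delete_columns() {
	local line=$1
	shift
	local -a fields
	local col
	# Split on commas; fields[0] holds column 1, fields[1] column 2, ...
	IFS=',' read -r -a fields <<< "$line"
	for col in "$@"; do
		# Column N is stored at index N - 1.
		unset "fields[$col - 1]"
	done
	# Re-join the surviving fields with commas.
	local IFS=','
	echo "${fields[*]}"
}

delete_columns 'c1,c2,c3,c4,c5' 1 3
```

With the `- 1` in place, asking for columns 1 and 3 leaves `c2,c4,c5`. Without it, columns 2 and 4 would be removed instead.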

Last edited by atjurhs; 09-25-2012 at 04:12 PM.
 
Old 09-25-2012, 04:21 PM   #22
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Sorry, array indices start at 0, so with the original script you had to number your columns from 0, not 1. That was my mistake; I didn't notice it. But if you're going to start from 1, here's a modification:
Code:
#!/bin/bash

# Change columns here. This is repetitive within the loop, but just for convenience.
REMOVE=(1 2 4 6 7 9)

# Extension of output file's name.
OUTPUTEXT='out'

IFS=','

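# Process every file named on the command line.
for FILE; do
	while read -r LINE; do
		FIELDS=() # just to be safe
		# Drop the space after each comma, then split the line on commas.
		read -r -a FIELDS <<< "${LINE//, /,}"
		for I in "${REMOVE[@]}"; do
			# Columns are numbered from 1, array indices from 0.
			unset "FIELDS[$I - 1]"
		done
		# Re-join with commas (first character of IFS), then restore the spaces.
		LINE="${FIELDS[*]}"
		echo "${LINE//,/, }"
	done < "$FILE" > "$FILE.$OUTPUTEXT"
done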
And this is the output to it:
Code:
header_3, header_5, header_8, header10_  , header11_0_0_    , header12, header13     , header14_, header15_
123 , 123456789 , 1234567 , 1234567 , 1 , 12345678 , 1 , 1 , 123456789
   ,          ,     , , ,        ,  ,  
1234567,        , 12345, , ,        ,  ,  
   ,          ,     , , ,        ,  ,  
   ,          ,     , , ,        ,  ,  
   ,          ,     , , ,        ,  ,  
   ,          ,     , , ,        , 1, 1 
   ,          ,     , , ,        ,  ,  
1234567,        ,     , , ,        ,  ,  
1234567,        , 12345, , ,        ,  ,  , 12345
123 , 123456789 , 1234567 , 1234567 , 1 , 12345678 , 1 , 1 , 123456789
   ,          ,     , , ,        ,  ,  
1234567,        , 12345, , ,        ,  ,
If you intend to trim the spaces out:
Code:
#!/bin/bash

# Change columns here. This is repetitive within the loop, but just for convenience.
REMOVE=(1 2 4 6 7 9)

# Extension of output file's name.
OUTPUTEXT='out'

IFS=','

shopt -s extglob

for FILE; do
	while read LINE; do
		FIELDS=() # just to be safe
		read -a FIELDS <<< "${LINE//*([[:blank:]]),*([[:blank:]])/,}"
		for I in "${REMOVE[@]}"; do
			unset "FIELDS[$I - 1]"
		done
		echo "${FIELDS[*]}"
	done < "$FILE" > "$FILE.$OUTPUTEXT"
done
Output:
Code:
header_3,header_5,header_8,header10_,header11_0_0_,header12,header13,header14_,header15_
123,123456789,1234567,1234567,1,12345678,1,1,123456789
,,,,,,,
1234567,,12345,,,,,
,,,,,,,
,,,,,,,
,,,,,,,
,,,,,,1,1
,,,,,,,
1234567,,,,,,,
1234567,,12345,,,,,,12345
123,123456789,1234567,1234567,1,12345678,1,1,123456789
,,,,,,,
1234567,,12345,,,,,
If you intend to clean up multiple instances of commas (,,) as well, use the script below.
This will certainly break consistency within the columns, so use it at your own risk.
Code:
#!/bin/bash

# Change columns here. This is repetitive within the loop, but just for convenience.
REMOVE=(1 2 4 6 7 9)

# Extension of output file's name.
OUTPUTEXT='out'

IFS=','

shopt -s extglob

for FILE; do
	while read LINE; do
		FIELDS=() # just to be safe
		read -a FIELDS <<< "${LINE//*([[:blank:]]),*([[:blank:]])/,}"
		for I in "${REMOVE[@]}"; do
			unset "FIELDS[$I - 1]"
		done
		for I in "${!FIELDS[@]}"; do
			[[ ${FIELDS[I]} == *([[:blank:]]) ]] && unset "FIELDS[$I]"
		done
		echo "${FIELDS[*]}"
	done < "$FILE" > "$FILE.$OUTPUTEXT"
done
Output
Code:
header_3,header_5,header_8,header10_,header11_0_0_,header12,header13,header14_,header15_
123,123456789,1234567,1234567,1,12345678,1,1,123456789

1234567,12345



1,1

1234567
1234567,12345,12345
123,123456789,1234567,1234567,1,12345678,1,1,123456789

1234567,12345
And this is the output if you don't remove any column (REMOVE=()). Spaces and empty instances are just trimmed out. Notice how data are no longer in their proper columns.
Code:
header_1,header_2,header_3,header_4,header_5,header_6,header_7,header_8,header9_,header10_,header11_0_0_,header12,header13,header14_,header15_
12345678,1234,123,1,123456789,123456789,1,1234567,1,1234567,1,12345678,1,1,123456789
123456789
1234567,1234567,12345,123456789
123456789
123456789
123456789
123456789,1,1
123456789
1234567,12,123456789
1234567,1234567,12345,123456789,12345
12345678,1234,123,1,123456789,123456789,1,1234567,1,1234567,1,12345678,1,1,123456789
123456789
1234567,1234567,12345,123456789
Furthermore, if you only intend to trim out spaces (ASCII 32) and not other whitespace like tabs, you can make the script run a lot faster by using:
Code:
read -a FIELDS <<< "${LINE//*( ),*( )/,}"
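
To make the difference between the two patterns concrete, here is a sketch that applies each one to a line containing both spaces and a tab. The function names and the sample line are invented for this example.

```shell
#!/bin/bash
shopt -s extglob   # needed for the *( ... ) pattern syntax

# Strip only space characters around each comma.
trim_spaces() {
	local line=$1
	echo "${line//*( ),*( )/,}"
}

# Strip spaces and tabs around each comma.
trim_blanks() {
	local line=$1
	echo "${line//*([[:blank:]]),*([[:blank:]])/,}"
}

sample=$'12345678 ,1234 ,\t123'
trim_spaces "$sample"   # the tab before 123 survives
trim_blanks "$sample"   # the tab is removed as well
```

If the input only ever uses plain spaces for padding, both functions give the same result.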
 
Old 09-25-2012, 04:23 PM   #23
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Looks like you made an edit when I was making the post. Good thing everything's now clear for you.
 
Old 09-25-2012, 10:19 PM   #24
atjurhs
Member
 
Registered: Aug 2012
Posts: 316

Original Poster
Rep: Reputation: Disabled
Thanks soooo much for all your help!

Tabitha
 
Old 09-26-2012, 07:46 AM   #25
atjurhs
Member
 
Registered: Aug 2012
Posts: 316

Original Poster
Rep: Reputation: Disabled
Is there a way to have it delete all columns except specified ones?

I can do it the current way, except that I have 54 columns and of those I only need 7.
 
Old 09-26-2012, 08:30 AM   #26
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
This way:
Code:
#!/bin/bash

[[ BASH_VERSINFO -ge 3 ]] || {
	echo "You need bash version 3.0 or higher to run this script."
	exit 1
}

# Set columns to keep (columns start from 1)
KEEP_CONFIG=(1 2 4 6 7 9)

# Extension of output file's name.
OUTPUTEXT='out'

IFS=','

KEEP=()
for I in "${KEEP_CONFIG[@]}"; do
	(( J = I - 1 ))
	KEEP[J]=$J
done

for FILE; do
	while read LINE; do
		FIELDS=() # just to be safe
		read -a FIELDS <<< "${LINE//, /,}"
		for I in "${!FIELDS[@]}"; do
			[[ -z ${KEEP[I]} ]] && unset "FIELDS[$I]"
		done
		LINE="${FIELDS[*]}"
		echo "${LINE//,/, }"
	done < "$FILE" > "$FILE.$OUTPUTEXT"
done
Or
Code:
#!/bin/bash

[[ BASH_VERSINFO -ge 3 ]] || {
	echo "You need bash version 3.0 or higher to run this script."
	exit 1
}

# Set columns to keep (columns start from 1)
KEEP_CONFIG=(1 2 4 6 7 9)

# Extension of output file's name.
OUTPUTEXT='out'

IFS=','

KEEP=()
for I in "${KEEP_CONFIG[@]}"; do
	(( J = I - 1 ))
	KEEP[J]=$J
done

shopt -s extglob

for FILE; do
	while read LINE; do
		FIELDS=() # just to be safe
		LINE=${LINE//*( ),*( )/,}
		read -a FIELDS <<< "${LINE%%*( )}"
		for I in "${!FIELDS[@]}"; do
			[[ -z ${KEEP[I]} ]] && unset "FIELDS[$I]"
		done
		echo "${FIELDS[*]}"
	done < "$FILE" > "$FILE.$OUTPUTEXT"
done
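
For comparison, the same keep-list idea can be sketched with the standard `cut` utility instead of a bash loop. This is not the method used in the thread, just an alternative that assumes fields never contain quoted commas. `keep_columns` is a made-up name and the sample row is invented.

```shell
#!/bin/bash
# keep_columns LIST
# Keep only the 1-based columns named in LIST (e.g. 1,2,4,6,7,9).
keep_columns() {
	cut -d',' -f"$1"
}

printf '%s\n' 'a,b,c,d,e,f,g,h,i,j' | keep_columns 1,2,4,6,7,9
```

Note that `cut` leaves any spaces around the values untouched, so trimming would still be a separate step.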
 
Old 09-26-2012, 10:39 AM   #27
atjurhs
Member
 
Registered: Aug 2012
Posts: 316

Original Poster
Rep: Reputation: Disabled
The first version runs really quickly and gives the correct results.

The second version runs REALLY slowly, but if I wait long enough it gives the correct answer.

konsolebox, thanks so much, you've been a great, great help!!!
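
If the extglob substitution turns out to be the slow part, one thing to try is doing the trimming with `sed` before the data reaches the loop. This is an untested guess about where the time goes, not a measured result, and `trim_around_commas` is a hypothetical helper.

```shell
#!/bin/bash
# Remove spaces and tabs on either side of every comma, plus any trailing
# blanks at the end of the line.
trim_around_commas() {
	sed -e 's/[[:blank:]]*,[[:blank:]]*/,/g' -e 's/[[:blank:]]*$//'
}

printf '%s\n' '12345678 ,1234 ,123 ,1  ' | trim_around_commas
```

The trimmed stream could then be fed to the first (fast) version of the script, which only has to handle the column selection.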
 
Old 09-26-2012, 10:48 AM   #28
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Thumbs up

welcome.
 
Old 09-26-2012, 11:16 AM   #29
atjurhs
Member
 
Registered: Aug 2012
Posts: 316

Original Poster
Rep: Reputation: Disabled
There's another thing I would like to do with the same file. It has to do with removing rows, but it's a little tricky. Would you be willing to help me with that? If so, should I just start a new thread, or what?
 
Old 09-26-2012, 11:19 AM   #30
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
I suggest starting a new thread. I'll still help. I don't mind, especially since it seems interesting.
Please share the link to the thread here.
 
  


All times are GMT -5. The time now is 08:52 AM.
