[SOLVED] deleting columns over multiple files

konsolebox · 09-24-2012, 09:38 PM

Copy. I'll just place them within tags to make things more clear. I'll think about it.

Code:

1, 4242, 3.42323e+23, 0.1, 0, 0,5,294875, 8438393, 394,,,,,,,,
,0, ,0,,,, 0.487564, , ,0, 0,0, 87563,,,,,,,,, , 0 ,
,1, ,,,,,,,,,,,,,,,, 0, , , , 5,
,1, ,,,,,,5241,,,,, , , 0.4543e-3 , 0 , 111111111,
1, 1000,,,, 9576336e+10, 0.1, 0, 0, ,,, , 8438393, 001,

Code:

rc11,rc12,rc13,rc14,rc15, etc.
,rc21,rc22,rc23,rc24,rc25, etc.
,rc31,rc32,rc33,rc34,rc35, etc.
,rc41,rc42,rc43,rc44,rc45, etc.
rc51,rc52,rc53,rc54,rc55, etc.
etc.
etc.
etc.

konsolebox · 09-24-2012, 09:43 PM

Trimming commas and removing leading spaces would be easy, but what confuses me is the consistency of the columns: data from column Z could end up in column T on one row, while on another row data from the same column could end up in column G. That is what can happen if you trim commas that way. The script I made was written to conform to the original format, in which every field is separated by a comma followed by a space.

---- Add ----

So do you really mean to trim out commas?

atjurhs · 09-25-2012, 07:46 AM

Konsolebox, it's on a private LAN, but I'll see if I can post some of the actual data inputs and output later today.

As for removing commas, only those that are "multiples". In other words, if I use a text editor to replace every occurrence of ,, with a single , and repeat this replacement over and over, the output file will eventually end up looking something like:

Code:

rc11,rc12,rc13,rc14,rc15, etc.
,rc21,rc22,rc23,rc24,rc25, etc.
,rc31,rc32,rc33,rc34,rc35, etc.
,rc41,rc42,rc43,rc44,rc45, etc.
rc51,rc52,rc53,rc54,rc55, etc.
etc.
etc.
etc.

and that's usable
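
The repeated search-and-replace described above can be done in a single pass. This is only a sketch: it assumes `sed` is available, the function name `squeeze_commas` is made up for illustration, and it strips blanks around commas first so that runs like `, ,` collapse as well. Empty fields disappear in the process, so values may no longer sit in their original columns.

```shell
# Hypothetical helper: collapse runs of commas (with optional blanks between
# them) into a single comma. Reads stdin, writes stdout.
squeeze_commas() {
	sed -e 's/[[:blank:]]*,[[:blank:]]*/,/g' -e 's/,,*/,/g'
}

# One of the sample rows from the top of the thread.
printf '%s\n' ',1, ,,,,,,5241,,,,, , , 0.4543e-3 , 0 , 111111111,' | squeeze_commas
```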

konsolebox · 09-25-2012, 07:50 AM

Yes, and like I said: are you not concerned that data from one column could be repositioned to another column just because some of the columns to its left were deleted?

For example.

Code:

1,2,3,4,5
1,2,3,,5
1,,3,4,5

Suppose I remove the first column and also collapse the repeated commas; the output would look like this:

Code:

2,3,4,5
2,3,5
3,4,5

Notice how the columns are no longer consistent with each other.

Code:

2	3	4	5
2	3	5
3	4	5
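
The shift can be reproduced with standard tools. The sketch below is an illustration only (the function names are made up, and `cut`, `tr` and `sed` stand in for the script): removing a column keeps the rows aligned, while also squeezing the commas does not.

```shell
# Drop the first column only: empty fields survive, so columns stay aligned.
drop_first() {
	cut -d, -f2-
}

# Drop the first column, then squeeze repeated commas and strip a leading
# comma: empty fields vanish and later values slide to the left.
drop_first_and_squeeze() {
	cut -d, -f2- | tr -s ',' | sed 's/^,//'
}

printf '1,2,3,4,5\n1,2,3,,5\n1,,3,4,5\n' | drop_first
printf '1,2,3,4,5\n1,2,3,,5\n1,,3,4,5\n' | drop_first_and_squeeze
```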

atjurhs · 09-25-2012, 11:54 AM

OK, here's my best attempt at typing out an input file:

Note that there are actually 54 columns of data; I'll only type 15 because there is nothing unique about the rest of the columns.
Also, the end of each line does not have any spaces, just a carriage return.
I'll also represent the number of digits in a value with counting numbers; for example, 12345 represents a number that has 5 digits (this may help you in counting).

Code:

header_1,header_2,header_3,header_4,header_5,header_6,header_7,header_8,header9_,header10_  ,header11_0_0_    ,header12,header13     ,header14_,header15_
12345678 ,1234 ,123 ,1 ,123456789 ,123456789 ,1 ,1234567 ,1 ,1234567 ,1 ,12345678 ,1 ,1 ,123456789
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,1234567, ,        ,          ,1234567,12345,123456789, , ,        ,  ,  ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        , 1,1 ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,1234567, ,        ,          ,12     ,     ,123456789, , ,        ,  ,  ,
         ,     ,1234567, ,        ,          ,1234567,12345,123456789, , ,        ,  ,  ,12345
12345678 ,1234 ,123 ,1 ,123456789 ,123456789 ,1 ,1234567 ,1 ,1234567 ,1 ,12345678 ,1 ,1 ,123456789
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
         ,     ,1234567, ,        ,          ,1234567,12345,123456789, , ,        ,  ,  ,
         ,     ,    ,  ,          ,          ,       ,     ,123456789, , ,        ,  ,  ,
etc.
etc.
etc.

It's a mess!!! But it's what I've got, and I need to turn it into a "standard" CSV file so that I can do other stuff with it.

atjurhs · 09-25-2012, 03:13 PM

so if I run konsolebox's code with the intent to delete columns 1,2,4,6,7,9 I get:

Code:

header_1,header_4,header_6,header9_,header11_0_0_ ,header12,header13,header14_,header15_
12345678,1,123456789,1,1,12345678,1,1,123456789
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,1,1,
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,,,12345
12345678,1,123456789,1,1,12345678,1,1,123456789
,,,123456789,,,,,
,,,123456789,,,,,
,,,123456789,,,,,

so it didn't delete the correct columns, but in the columns it deleted it did the deletion correctly!

WOOOHOOOOO,I FIGURED IT OUT

the script is treating the column numbers as zero-based

so with the list of columns to delete set to REMOVE=(1 2 4 6 7 9),
lining them up shows that columns 2,3,5,7,8,10 were deleted and columns 1,4,6,9,11,12,13,14,15 were kept

solving this problem was a simple one-line change in konsolebox's script

Code:

 unset "FIELDS[$I]"  goes to unset "FIELDS[$I-1]"

and I think this solves the puzzle
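
The off-by-one comes from bash arrays being indexed from 0. A minimal sketch of the effect follows; the function names and the sample values c1..c5 are made up for illustration and are not taken from the script.

```shell
#!/bin/bash
# Both helpers are hypothetical; they only illustrate the index shift.

# Treat N as a raw array index (0-based), as the unmodified script did.
drop_index() {
	local -a fields=(c1 c2 c3 c4 c5)
	unset "fields[$1]"
	local IFS=','
	echo "${fields[*]}"
}

# Treat N as a column number (1-based) by subtracting one first.
drop_column() {
	local -a fields=(c1 c2 c3 c4 c5)
	unset "fields[$(( $1 - 1 ))]"
	local IFS=','
	echo "${fields[*]}"
}

drop_index 1    # removes c2, the second column
drop_column 1   # removes c1, the first column
```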

konsolebox · 09-25-2012, 04:21 PM

Sorry, array indices start at 0, so you would have had to number your columns from 0, not 1. That was my mistake; I didn't notice it. But if you're going to start from 1, here's a modification:

Code:

#!/bin/bash

# Change columns here. This is repetitive within the loop, but just for convenience.
REMOVE=(1 2 4 6 7 9)

# Extension of output file's name.
OUTPUTEXT='out'

IFS=','

for FILE; do
	while read -r LINE; do # -r keeps backslashes literal
		FIELDS=() # just to be safe
		read -r -a FIELDS <<< "${LINE//, /,}"
		for I in "${REMOVE[@]}"; do
			unset "FIELDS[$I - 1]"
		done
		LINE="${FIELDS[*]}"
		echo "${LINE//,/, }"
	done < "$FILE" > "$FILE.$OUTPUTEXT"
done

And this is its output:

Code:

header_3, header_5, header_8, header10_  , header11_0_0_    , header12, header13     , header14_, header15_
123 , 123456789 , 1234567 , 1234567 , 1 , 12345678 , 1 , 1 , 123456789
   ,          ,     , , ,        ,  ,  
1234567,        , 12345, , ,        ,  ,  
   ,          ,     , , ,        ,  ,  
   ,          ,     , , ,        ,  ,  
   ,          ,     , , ,        ,  ,  
   ,          ,     , , ,        , 1, 1 
   ,          ,     , , ,        ,  ,  
1234567,        ,     , , ,        ,  ,  
1234567,        , 12345, , ,        ,  ,  , 12345
123 , 123456789 , 1234567 , 1234567 , 1 , 12345678 , 1 , 1 , 123456789
   ,          ,     , , ,        ,  ,  
1234567,        , 12345, , ,        ,  ,
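
Because the script loops over its arguments (`for FILE; do`), several files can be passed in one call and each gets its own `FILE.out`. The sketch below wraps the same idea in a function so it can be tried on throwaway files. The name `remove_columns_in_files`, the temporary files and their contents are assumptions for illustration; it does no space handling.

```shell
#!/bin/bash
# Hypothetical wrapper: remove 1-based columns from each file and write the
# result next to it as FILE.out.
remove_columns_in_files() {
	local remove=$1; shift          # space-separated list, e.g. "1 3"
	local file line i
	local -a fields
	for file; do
		while IFS= read -r line; do
			IFS=',' read -r -a fields <<< "$line"
			for i in $remove; do
				unset "fields[$(( i - 1 ))]"
			done
			# Join the remaining fields with commas in a subshell so the
			# caller's IFS is left alone.
			(IFS=','; echo "${fields[*]}")
		done < "$file" > "$file.out"
	done
}

# Demo on two throwaway files.
tmp=$(mktemp -d)
printf 'a,b,c,d\n1,2,3,4\n' > "$tmp/one.csv"
printf 'w,x,y,z\n5,6,7,8\n' > "$tmp/two.csv"
remove_columns_in_files "1 3" "$tmp"/*.csv
cat "$tmp/one.csv.out" "$tmp/two.csv.out"
rm -rf "$tmp"
```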

If you intend to trim the spaces out:

Code:

#!/bin/bash

# Change columns here. This is repetitive within the loop, but just for convenience.
REMOVE=(1 2 4 6 7 9)

# Extension of output file's name.
OUTPUTEXT='out'

IFS=','

shopt -s extglob

for FILE; do
	while read -r LINE; do # -r keeps backslashes literal
		FIELDS=() # just to be safe
		read -r -a FIELDS <<< "${LINE//*([[:blank:]]),*([[:blank:]])/,}"
		for I in "${REMOVE[@]}"; do
			unset "FIELDS[$I - 1]"
		done
		echo "${FIELDS[*]}"
	done < "$FILE" > "$FILE.$OUTPUTEXT"
done

Output:

Code:

header_3,header_5,header_8,header10_,header11_0_0_,header12,header13,header14_,header15_
123,123456789,1234567,1234567,1,12345678,1,1,123456789
,,,,,,,
1234567,,12345,,,,,
,,,,,,,
,,,,,,,
,,,,,,,
,,,,,,1,1
,,,,,,,
1234567,,,,,,,
1234567,,12345,,,,,,12345
123,123456789,1234567,1234567,1,12345678,1,1,123456789
,,,,,,,
1234567,,12345,,,,,

If you intend to clean up multiple instances of commas (,,) as well, use the script below.
This will certainly break consistency within the columns, so use it at your own risk.

Code:

#!/bin/bash

# Change columns here. This is repetitive within the loop, but just for convenience.
REMOVE=(1 2 4 6 7 9)

# Extension of output file's name.
OUTPUTEXT='out'

IFS=','

shopt -s extglob

for FILE; do
	while read -r LINE; do # -r keeps backslashes literal
		FIELDS=() # just to be safe
		read -r -a FIELDS <<< "${LINE//*([[:blank:]]),*([[:blank:]])/,}"
		for I in "${REMOVE[@]}"; do
			unset "FIELDS[$I - 1]"
		done
		for I in "${!FIELDS[@]}"; do
			[[ ${FIELDS[I]} == *([[:blank:]]) ]] && unset "FIELDS[$I]"
		done
		echo "${FIELDS[*]}"
	done < "$FILE" > "$FILE.$OUTPUTEXT"
done

Output:

Code:

header_3,header_5,header_8,header10_,header11_0_0_,header12,header13,header14_,header15_
123,123456789,1234567,1234567,1,12345678,1,1,123456789

1234567,12345



1,1

1234567
1234567,12345,12345
123,123456789,1234567,1234567,1,12345678,1,1,123456789

1234567,12345

And this is the output if you don't remove any columns (REMOVE=()): spaces and empty fields are simply trimmed out. Notice how the data are no longer in their proper columns.

Code:

header_1,header_2,header_3,header_4,header_5,header_6,header_7,header_8,header9_,header10_,header11_0_0_,header12,header13,header14_,header15_
12345678,1234,123,1,123456789,123456789,1,1234567,1,1234567,1,12345678,1,1,123456789
123456789
1234567,1234567,12345,123456789
123456789
123456789
123456789
123456789,1,1
123456789
1234567,12,123456789
1234567,1234567,12345,123456789,12345
12345678,1234,123,1,123456789,123456789,1,1234567,1,1234567,1,12345678,1,1,123456789
123456789
1234567,1234567,12345,123456789

Furthermore, if you only intend to trim out spaces (ASCII 32) and not other whitespace such as tabs, you can make the script run a lot faster by using:

Code:

read -a FIELDS <<< "${LINE//*( ),*( )/,}"
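
To see what the narrower pattern leaves behind, here is a small comparison. It is only a sketch: the helper names and the sample string are made up, and it shows the difference in matching, not in speed.

```shell
#!/bin/bash
shopt -s extglob   # needed for the *( ... ) patterns

# Hypothetical helpers for comparison only.
# Strips plain spaces around commas; tabs are left in place.
trim_spaces() {
	local line=$1
	echo "${line//*( ),*( )/,}"
}

# Strips spaces and tabs around commas.
trim_blanks() {
	local line=$1
	echo "${line//*([[:blank:]]),*([[:blank:]])/,}"
}

sample=$'a \t, b ,\tc'
trim_spaces "$sample" | od -c   # od -c makes the surviving tabs visible
trim_blanks "$sample" | od -c
```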

konsolebox · 09-25-2012, 04:23 PM

Looks like you made an edit while I was writing my post. Good thing everything's clear to you now.

atjurhs · 09-25-2012, 10:19 PM

Thanks soooo much for all your help!

Tabitha

atjurhs · 09-26-2012, 07:46 AM

Is there a way to have it delete all columns except specified ones?

I can do it the current way, but I have 54 columns and of those I only need 7.

konsolebox · 09-26-2012, 08:30 AM

This way:

Code:

#!/bin/bash

[[ BASH_VERSINFO -ge 3 ]] || {
	echo "You need bash version 3.0 or higher to run this script."
	exit 1
}

# Set columns to keep (column numbers start from 1)
KEEP_CONFIG=(1 2 4 6 7 9)

# Extension of output file's name.
OUTPUTEXT='out'

IFS=','

KEEP=()
for I in "${KEEP_CONFIG[@]}"; do
	(( J = I - 1 ))
	KEEP[J]=$J
done

for FILE; do
	while read -r LINE; do # -r keeps backslashes literal
		FIELDS=() # just to be safe
		read -r -a FIELDS <<< "${LINE//, /,}"
		for I in "${!FIELDS[@]}"; do
			[[ -z ${KEEP[I]} ]] && unset "FIELDS[$I]"
		done
		LINE="${FIELDS[*]}"
		echo "${LINE//,/, }"
	done < "$FILE" > "$FILE.$OUTPUTEXT"
done

Or

Code:

#!/bin/bash

[[ BASH_VERSINFO -ge 3 ]] || {
	echo "You need bash version 3.0 or higher to run this script."
	exit 1
}

# Set columns to keep (column numbers start from 1)
KEEP_CONFIG=(1 2 4 6 7 9)

# Extension of output file's name.
OUTPUTEXT='out'

IFS=','

KEEP=()
for I in "${KEEP_CONFIG[@]}"; do
	(( J = I - 1 ))
	KEEP[J]=$J
done

shopt -s extglob

for FILE; do
	while read -r LINE; do # -r keeps backslashes literal
		FIELDS=() # just to be safe
		LINE=${LINE//*( ),*( )/,}
		read -r -a FIELDS <<< "${LINE%%*( )}"
		for I in "${!FIELDS[@]}"; do
			[[ -z ${KEEP[I]} ]] && unset "FIELDS[$I]"
		done
		echo "${FIELDS[*]}"
	done < "$FILE" > "$FILE.$OUTPUTEXT"
done
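
For comparison, the same "keep only these columns" job can be sketched with `sed` and `cut` instead of a bash loop. This is not one of the scripts above, just an alternative under a few assumptions: the helper names are made up, only spaces around commas are stripped (spaces at the very start or end of a line stay), and `cut` always emits the fields in their original order.

```shell
# Hypothetical helper: keep only the listed 1-based columns (e.g. "1,2,4,6,7,9").
# Strips spaces around commas first, then lets cut select the fields.
keep_columns() {
	sed 's/ *, */,/g' | cut -d, -f"$1"
}

# Hypothetical wrapper: apply keep_columns to each file, writing FILE.out.
keep_columns_in_files() {
	cols=$1; shift
	for file in "$@"; do
		keep_columns "$cols" < "$file" > "$file.out"
	done
}

printf 'h1, h2, h3, h4\n1, , 3, 4\n' | keep_columns 1,3
```

With real data this would be called as `keep_columns_in_files 1,2,4,6,7,9 file1 file2`, where the file names are placeholders.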

atjurhs · 09-26-2012, 10:39 AM

The first version runs really quick and gives the correct results.

The second version runs REALLY slow, but if I wait long enough it will have the correct answer.

konsolebox, thanks so much, you've been a great great help!!!

konsolebox · 09-26-2012, 10:48 AM

welcome.

atjurhs · 09-26-2012, 11:16 AM

There's another thing I would like to do with the same file. It has to do with removing rows, but it's a little tricky. Would you be willing to help me with that? If so, should I just start a new thread, or what?

konsolebox · 09-26-2012, 11:19 AM

I suggest starting a new thread. I'll still help. I don't mind, especially since it seems interesting.
Please share the link to the thread here.