Extract first word after a match

sysmicuser · 10-23-2022, 09:18 PM

Hey Guys,

I have a file like this

cat extract_data.text
Delete Unattached Managed Standard SSD Volume pvc-1566b063-fcc6-45a0-a95d-d16e01408807 from PZI-AU-SANDBOX-SUB001
Delete Unattached Managed Standard SSD Volume pvc-edfd546d-646a-4fb8-99a4-d400cbeb608c from PZI-AU-SANDBOX-SUB001
Delete Unattached Managed Standard SSD Volume kubernetes-dynamic-pvc-404fd0bd-10ac-4ca8-bc14-2d292632f59a from PZI-AU-SANDBOX-SUB001
Delete Unattached Managed Standard SSD Volume kubernetes-dynamic-pvc-8893f6fe-d06b-4e59-a22e-9271785f7b94 from PZI-AU-SANDBOX-SUB001
Delete Unattached Managed Standard HDD Volume kubernetes-dynamic-pvc-68a671fb-b250-42df-b335-2a4c0a375262 from PZI-AU-SANDBOX-SUB001

All I want is the resource name which comes as First word AFTER Volume. So the sample output I want is:

Code:

pvc-1566b063-fcc6-45a0-a95d-d16e01408807
pvc-edfd546d-646a-4fb8-99a4-d400cbeb608c
kubernetes-dynamic-pvc-404fd0bd-10ac-4ca8-bc14-2d292632f59a
kubernetes-dynamic-pvc-68a671fb-b250-42df-b335-2a4c0a375262

And I did try something like this:

sed -nr "s/.*Volume (\w+).*/\1/p" extract_data.text

Code:

pvc
pvc
kubernetes
kubernetes
pvc
pvc
pvc
pvc
pvc
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes
pvc
pvc
pvc
pvc
pvc
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes
kubernetes

Second try:

grep "Volume" extract_data.text |cut -f 3

Code:

Delete Unattached Managed Standard SSD Volume pvc-1566b063-fcc6-45a0-a95d-d16e01408807 from PZI-AU-SANDBOX-SUB001
Delete Unattached Managed Standard SSD Volume pvc-edfd546d-646a-4fb8-99a4-d400cbeb608c from PZI-AU-SANDBOX-SUB001
Delete Unattached Managed Standard SSD Volume kubernetes-dynamic-pvc-404fd0bd-10ac-4ca8-bc14-2d292632f59a from PZI-AU-SANDBOX-SUB001
Delete Unattached Managed Standard SSD Volume kubernetes-dynamic-pvc-8893f6fe-d06b-4e59-a22e-9271785f7b94 from PZI-AU-SANDBOX-SUB001
Delete Unattached Managed Standard HDD Volume kubernetes-dynamic-pvc-68a671fb-b250-42df-b335-2a4c0a375262 from PZI-AU-SANDBOX-SUB001

The 3rd one is EPIC, but I am not an awk expert. It is something like the one from this [post|https://stackoverflow.com/questions/...d-after-match]

awk 'BEGIN{FS="Volume"} {printf ("%s:%d:%s\n", extract_data.txt, NR, $2)}'

Code:

awk: cmd. line:1: BEGIN{FS="Volume"} {printf ("%s:%d:%s\n", extract_data.txt, NR, $2)}
awk: cmd. line:1:                                                       ^ syntax error

syg00 · 10-23-2022, 10:31 PM

Use sed. The word class (\w) only matches letters, digits and underscore, so it stops at the first dash. You could create a char group that includes the dash, but I'd likely use "not space" like this (untested):

Code:

sed -nr "s/.*Volume ([^[:space:]]+).*/\1/p" extract_data.text
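
To see the difference on a single row, here is a minimal sketch. It assumes GNU sed (the \w shorthand is an extension), and the "line" variable is just a stand-in for one row of the file:

```shell
# One sample row from the question, held in a variable for illustration.
line='Delete Unattached Managed Standard SSD Volume pvc-1566b063-fcc6-45a0-a95d-d16e01408807 from PZI-AU-SANDBOX-SUB001'

# \w+ stops at the first dash, so only the leading "pvc" is captured.
word_only=$(printf '%s\n' "$line" | sed -nE 's/.*Volume (\w+).*/\1/p')

# [^[:space:]]+ runs up to the next whitespace, so the whole name is captured.
full_name=$(printf '%s\n' "$line" | sed -nE 's/.*Volume ([^[:space:]]+).*/\1/p')

printf '%s\n%s\n' "$word_only" "$full_name"
```

The first result matches the truncated output in the original attempt; the second is the full resource name.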

awk can be made much simpler if the data are all that well structured.
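
As a rough sketch of what that could look like while keeping the "split on Volume" idea from the failed attempt: the syntax error there most likely comes from the bare file name written inside the awk program, so here the file is passed as an argument instead. The temp file and its two rows are only for illustration, and the sketch assumes the name always follows "Volume " directly.

```shell
# Build a two-line sample file (temporary path) from the question's data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Delete Unattached Managed Standard SSD Volume pvc-1566b063-fcc6-45a0-a95d-d16e01408807 from PZI-AU-SANDBOX-SUB001
Delete Unattached Managed Standard HDD Volume kubernetes-dynamic-pvc-68a671fb-b250-42df-b335-2a4c0a375262 from PZI-AU-SANDBOX-SUB001
EOF

# Split each line on "Volume "; $2 is then everything after it.
# split() breaks $2 on whitespace so that w[1] is the first word.
# NR is the current line number; lines without "Volume " are skipped.
names=$(awk -F'Volume ' 'NF > 1 { split($2, w, " "); printf "%d:%s\n", NR, w[1] }' "$sample")

printf '%s\n' "$names"
rm -f "$sample"
```

Dropping the line number from the printf leaves just the names.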

pan64 · 10-23-2022, 11:27 PM

Code:

awk -F' ' '{ print $7 }' file

or something similar (not tested)

Turbocapitalist · 10-23-2022, 11:37 PM

sed and AWK will do the job as will Perl:

Code:

perl -n -e 'm/(?<=Volume )(\S+)/ && print $1,"\n"' extract_data.text

The advantage that approach has is the pattern matching can be quite powerful. I used Perl for everything for a very long time before learning sed and AWK.

grail · 10-24-2022, 03:15 AM

awk just likes encouragement

Code:

awk 'n{print;n=0}/Volume/{n++}' RS=' ' file

syg00 · 10-24-2022, 03:29 AM

What's wrong with something much simpler (presuming well-formed data)?

Code:

awk '/Volume/ {print $7}' file

boughtonp · 10-24-2022, 06:46 AM

Quote:

Originally Posted by sysmicuser

All I want is the resource name which comes as First word AFTER Volume.

If the data is guaranteed well-formed, there's no need for a test - either of these would do:

Code:

awk '{print $7}' extract_data.txt

Code:

cut -d' ' -f7 extract_data.txt
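
For reference, a small sketch of why the earlier "cut -f 3" attempt echoed whole lines: cut splits on tabs unless told otherwise, and a line with no delimiter is passed through unchanged. The "line" variable is just one row from the question.

```shell
line='Delete Unattached Managed Standard SSD Volume pvc-edfd546d-646a-4fb8-99a4-d400cbeb608c from PZI-AU-SANDBOX-SUB001'

# Default delimiter is a tab; the line has none, so cut prints it unchanged.
tab_split=$(printf '%s\n' "$line" | cut -f 3)

# With a space delimiter, field 7 is the word after "Volume".
space_split=$(printf '%s\n' "$line" | cut -d' ' -f7)

printf '%s\n%s\n' "$tab_split" "$space_split"
```

Note that cut treats every single space as a delimiter, so runs of spaces would shift the field numbers, whereas awk collapses them.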

Otherwise, I might use slight variations on the various examples already provided...

If you need to exclude certain rows, you can say when column 6 is "Volume", print column 7 - this is more restrictive than just checking Volume exists within the line:

Code:

awk '$6=="Volume" {print $7}' extract_data.txt
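
A quick sketch of the difference, using one real row plus a made-up row (the "Skipped check" line is purely hypothetical) where "Volume" appears but not as the sixth field:

```shell
# Sample data: one row from the question and one invented row.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Delete Unattached Managed Standard SSD Volume pvc-edfd546d-646a-4fb8-99a4-d400cbeb608c from PZI-AU-SANDBOX-SUB001
Skipped check of Volume pvc-example because it is attached
EOF

# Loose test: any line containing "Volume" prints its seventh field.
loose=$(awk '/Volume/ { print $7 }' "$sample")

# Strict test: only lines whose sixth field is exactly "Volume".
strict=$(awk '$6 == "Volume" { print $7 }' "$sample")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$sample"
```

The loose version prints an unrelated word from the second row, while the strict version skips that row entirely.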

If "Volume" is at an unknown/changing position, a tweaked version of the answer Grail provided:

Code:

awk -vRS='\\s' 'found {print;found=0}  /\<Volume\>/ {found=1}' extract_data.txt

The main changes are the "\<" and "\>", which ensure "Volume" is matched as a distinct word, and renaming the variable so it is obvious what "found" is doing.
Using "\s" for the record separator provides more predictable behaviour at end of lines (though not necessarily correct).
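
The "\<", "\>" and regex record separator are GNU awk features, as far as I know. If that is a concern, a possible alternative is to leave the records as lines and walk the fields instead. This is only a sketch, and the second sample row is a made-up arrangement to show "Volume" at a different position:

```shell
# Sample data: one row from the question and one invented row.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Delete Unattached Managed Standard SSD Volume pvc-1566b063-fcc6-45a0-a95d-d16e01408807 from PZI-AU-SANDBOX-SUB001
Volume kubernetes-dynamic-pvc-404fd0bd-10ac-4ca8-bc14-2d292632f59a was kept
EOF

# Walk the fields of each line; when a field is exactly "Volume",
# print the following field and move on to the next line.
after_volume=$(awk '{ for (i = 1; i < NF; i++) if ($i == "Volume") { print $(i + 1); next } }' "$sample")

printf '%s\n' "$after_volume"
rm -f "$sample"
```

Comparing whole fields with == gives the same "distinct word" guarantee as the word boundaries, and keeping line-based records avoids the end-of-line question.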

Turbocapitalist's Perl also may need a word boundary at the start, and can use "$&" instead of the capturing group.

Code:

perl -n -e 'm/(?<=\bVolume )\S+/ && print $&,"\n"' extract_data.txt

Also, because it's Perl, there's a slightly simpler version available, using "\K" (which resets the start of the matched text, but not the position) instead of the lookbehind.

Code:

perl -n -e 'm/\bVolume \K\S+/ && print $&,"\n"' extract_data.txt

And whilst Perl is very powerful, we don't need all that power here, and can just use grep with -P flag:

Code:

grep -oP '\bVolume \K\S+' extract_data.txt
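
To illustrate what the "\b" buys, here is a small sketch on a made-up line (not from the original data) where "Volume" also appears as the tail of another word. It assumes a grep built with PCRE support, such as GNU grep:

```shell
# A hypothetical line where "Volume" is also the end of "SubVolume".
line='Resize SubVolume alpha then delete Volume pvc-example'

# Without \b, the "Volume " inside "SubVolume " also matches, so two words print.
no_boundary=$(printf '%s\n' "$line" | grep -oP 'Volume \K\S+')

# With \b, only the standalone word "Volume" matches.
with_boundary=$(printf '%s\n' "$line" | grep -oP '\bVolume \K\S+')

printf 'without:\n%s\nwith:\n%s\n' "$no_boundary" "$with_boundary"
```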