LinuxQuestions.org
Old 04-03-2024, 08:17 PM   #1
sharky
Member
 
Registered: Oct 2002
Posts: 569

Rep: Reputation: 84
Remove trailing characters while adding leading characters


A text file contains numerous strings with a trailing sub-string.

Example, where _xx is the trailing sub-string:

Quote:
"m1_xx" some other text "m2_xx"
"p2_xx" yet more text "p2_xx" extra text
change is good "hello_xx"
desired output:

Quote:
"yy_m1" some other text "yy_m2"
"yy_p2" yet more text "yy_p2" extra text
change is good "yy_hello"
I found ways to make the substitution. However, with my method I lose the existing spacing - all the strings in the output are separated by a single space.
 
Old 04-03-2024, 08:56 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,138

Rep: Reputation: 4122
sed is your friend - use regex and capture groups. Do-able in a single invocation.
 
Old 04-03-2024, 10:56 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Please show us what you have tried so that we can assist.
 
Old 04-04-2024, 01:05 PM   #4
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by syg00
sed is your friend - use regex and capture groups. Do-able in a single invocation.
What is a 'capture group'?
 
Old 04-04-2024, 01:32 PM   #5
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by grail
Please provide what you have tried so we may assist?
Code:
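#!/usr/bin/env python3

def changeXXToYY():

  # read lines into list
  with open("testText") as fp:
    mapList = fp.readlines()

  # remove leading/trailing whitespace, including the line feeds
  mapList = [x.strip() for x in mapList]

  toRemove = '_xx"'
  toAdd = '"yy_'

  for elem in mapList:
    # split() with no argument collapses runs of whitespace,
    # so the original spacing is lost here
    elem = elem.split()
    for item in elem:
      if toRemove in item:
        # rebinding item builds a new string; neither elem nor mapList changes
        item = toAdd + item.split(toRemove)[0].split('"')[-1] + '"'
        print(item)

changeXXToYY()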
This prints out the desired new string but the original line remains unchanged.
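One way to keep the original spacing in Python is to skip split() and substitute inside each whole line. The following is only a minimal sketch, assuming the lowercase _xx / yy_ names from post #1. The helper name swap_affix is hypothetical, and it takes a list of lines instead of reading testText so that it runs on its own.

```python
import re

def swap_affix(lines, suffix="_xx", prefix="yy_"):
    """Return new lines with "name<suffix>" rewritten as "<prefix>name".

    Substituting inside the whole line leaves every other character,
    including runs of spaces, exactly as it was.
    """
    # re.escape keeps any special characters in the suffix literal.
    pattern = re.compile(r'"([^"]*)' + re.escape(suffix) + r'"')
    # A function replacement rebuilds the quoted string from the captured name.
    return [pattern.sub(lambda m: '"' + prefix + m.group(1) + '"', line)
            for line in lines]

sample = ['"p2_xx"   yet more text "p2_xx"  extra text']
result = swap_affix(sample)
print(result[0])
```

The returned list can then be written back to a file, which also addresses the original line staying unchanged.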
 
Old 04-04-2024, 02:37 PM   #6
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,806

Rep: Reputation: 1207
With sed:
Code:
sed 's/"\([^"]*\)_xx"/"yy_\1"/g' testText
The \( \) group match is referred to by the \1 in the substitution string.
[^"] is a character that is not a "
[^"]* is such a character repeated any number of times (including zero).
The /g modifier looks for further matches/substitutions in the line; each further match is searched for to the right of the current match.
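For readers more at home in Python, the same substitution can be sketched with the standard re module. This is an illustration only, not part of the sed answer; the sample lines are adapted from post #1 with extra spaces added, and the variable names are made up.

```python
import re

# Same idea as the sed command: capture the text between the quotes,
# drop the trailing _xx, and put yy_ in front of the captured text.
pattern = re.compile(r'"([^"]*)_xx"')

lines = [
    '"m1_xx" some other text   "m2_xx"',
    'change is good    "hello_xx"',
]

# \1 in the replacement is whatever the (...) group matched.
# sub() replaces every non-overlapping match, like sed's /g modifier.
converted = [pattern.sub(r'"yy_\1"', line) for line in lines]

for line in converted:
    print(line)
```

Because only the matched text is rewritten, the spacing between the strings is left as it was.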
 
Old 04-04-2024, 03:12 PM   #7
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by MadeInGermany
With sed:
Code:
sed 's/"\([^"]*\)_xx"/"yy_\1"/g' testText
The \( \) group match is referred to by the \1 in the substitution string.
[^"] is a character that is not a "
[^"]* is such a character repeated any number of times (including zero).
The /g modifier looks for further matches/substitutions in the line; each further match is searched for to the right of the current match.
It works. Thanks for the explanation also.
 
Old 04-04-2024, 03:25 PM   #8
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by MadeInGermany
With sed:
Code:
sed 's/"\([^"]*\)_xx"/"yy_\1"/g' testText
The \( \) group match is referred to by the \1 in the substitution string.
[^"] is a character that is not a "
[^"]* is such a character repeated any number of times (including zero).
The /g modifier looks for further matches/substitutions in the line; each further match is searched for to the right of the current match.
My apologies, but I noticed that my input file will also have cases where the original string is not within double quotes.

How should this sed command be modified to work in such cases? I've tried a few things but nothing changed.
 
Old 04-04-2024, 08:50 PM   #9
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,806

Rep: Reputation: 1207
If " anchors cannot be used, you can try \b anchors ("word boundaries"):
Code:
sed 's/\b\([^" ]*\)_xx\b/yy_\1/g' testText
[^" ]* is a string of characters that are not " or space.
The predefined "word boundary" \b is a zero-width marker, not a character, so it does not need to be re-inserted in the replacement. It is less precise, though: for example, it also occurs next to a - character. Note that \b is not part of POSIX sed; GNU sed supports it.
The following uses Extended Regular Expressions (-E) and three ( ) groups:
Code:
sed -E 's/(^|[" ])([^" ]*)_xx([" ]|$)/\1yy_\2\3/g' testText
The 1st group is the beginning marker or a " or space character.
The 2nd group is a string of characters that are not " or space.
The 3rd group is a " or space character or the end marker.
\1 \2 \3 is what the respective group has matched.
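As a side note, the three-group pattern can be tried out in Python's re module, which accepts the same expression. The sketch below is an illustration under that assumption, not a statement about any particular sed build. It also shows a lookaround variant (my own addition, not from the post above) that checks the delimiters without consuming them, which matters when two names are separated by a single space.

```python
import re

# Direct translation of the three-group sed -E expression.
THREE_GROUPS = r'(^|[" ])([^" ]*)_xx([" ]|$)'
# Variant with lookarounds: the delimiters are checked but not consumed,
# so they do not need to be re-inserted in the replacement.
LOOKAROUND = r'(?<![^" ])([^" ]*)_xx(?![^" ])'

text = 'm1_xx some text "m2_xx"  p2_xx'
three_result = re.sub(THREE_GROUPS, r'\1yy_\2\3', text)
look_result = re.sub(LOOKAROUND, r'yy_\1', text)
print(three_result)
print(look_result)

# With a single space between two names, the first match of the
# three-group pattern consumes that space, so the second name no longer
# has a leading delimiter to match and is skipped.
adjacent = 'a_xx b_xx'
three_adjacent = re.sub(THREE_GROUPS, r'\1yy_\2\3', adjacent)
look_adjacent = re.sub(LOOKAROUND, r'yy_\1', adjacent)
print(three_adjacent)
print(look_adjacent)
```

If the input can contain names separated by exactly one space, this is worth testing against the real file before relying on the three-group form.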
 
Old 04-04-2024, 09:15 PM   #10
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,138

Rep: Reputation: 4122
An alternative approach is to specify what you are looking for, rather than what you are not looking for. It also protects against overlooked corner cases (for example, what if one of those blanks is a tab?).
Code:
 sed -r 's/([[:alnum:]]+)_xx/yy_\1/g' input.file
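A rough Python equivalent, for anyone who wants to experiment with the "say what you want" approach. Python's re module has no POSIX bracket classes, so [A-Za-z0-9]+ stands in for [[:alnum:]]+ here; that substitution and the sample strings are my assumptions.

```python
import re

# [A-Za-z0-9]+ stands in for [[:alnum:]]+ (ASCII letters and digits only).
pattern = re.compile(r'([A-Za-z0-9]+)_xx')

# Tabs, quotes and spaces around the names make no difference,
# because the pattern never mentions the delimiters.
tabbed = 'm1_xx\tsome text\t"m2_xx"'
tabbed_result = pattern.sub(r'yy_\1', tabbed)
print(tabbed_result)

# Caveat: a name that itself contains an underscore is only partly
# captured, because "_" is not an alphanumeric character.
underscore_result = pattern.sub(r'yy_\1', 'my_var_xx')
print(underscore_result)
```

If names may contain underscores or other characters, the character class would need to be widened accordingly.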
 
Old 04-05-2024, 09:01 AM   #11
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,671
Blog Entries: 4

Rep: Reputation: 3945
Just to clarify: a “regular expression” (regex) can not only test whether a string matches a pattern (“yes or no, does it match?”) but also return specified pieces of the matching string, such as “some other text” and “more text.” These pieces are numbered left to right. In sed they are available in the replacement as “\1” and “\2”; some other tools, such as Perl, write them as “$1” and “$2.” You can use them immediately to produce output.

Also: These days, “regex support” is nearly universal, and the core syntax is largely shared. Implementations vary mostly in the details. Almost every language has it. Therefore, understanding this very important power-tool is definitely “an essential life skill.” (Like knowing how to use a life jacket …) If you need to “tear apart a text string” (and who doesn’t?), regex has your back.

There are “esoteric fee-churs” in regexes that you can learn about if and when you actually need them, and others that you might use every day.
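To make "return pieces of the matching string" concrete, here is a small sketch in Python, where the pieces are read with match.group(n) instead of \1 or $1. The pattern and sample line are borrowed from earlier in the thread; the variable names are only for illustration.

```python
import re

line = 'change is good "hello_xx"'

# Two capture groups: the name, and the suffix that follows it.
match = re.search(r'"([^"]*)(_xx)"', line)
if match:
    whole = match.group(0)   # entire matched text, quotes included
    name = match.group(1)    # first (...) group, counted left to right
    suffix = match.group(2)  # second (...) group
    print(whole, name, suffix)
```

Group 0 is always the whole match; the numbered groups follow the order of their opening parentheses.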

Last edited by sundialsvcs; 04-05-2024 at 09:10 AM.
 
Old 04-05-2024, 05:59 PM   #12
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by sundialsvcs
Just to clarify: a “regular expression” (regex) can not only test whether a string matches a pattern (“yes or no, does it match?”) but also return specified pieces of the matching string, such as “some other text” and “more text.” These pieces are numbered left to right. In sed they are available in the replacement as “\1” and “\2”; some other tools, such as Perl, write them as “$1” and “$2.” You can use them immediately to produce output.

Also: These days, “regex support” is nearly universal, and the core syntax is largely shared. Implementations vary mostly in the details. Almost every language has it. Therefore, understanding this very important power-tool is definitely “an essential life skill.” (Like knowing how to use a life jacket …) If you need to “tear apart a text string” (and who doesn’t?), regex has your back.

There are “esoteric fee-churs” in regexes that you can learn about if and when you actually need them, and others that you might use every day.
I do coding in Cadence SKILL language for design automation in a Linux environment (analog IC design). However, to my complete and utter shame, I have never gotten past a few rudimentary regular expression usages. The fact is, despite working in a Linux environment, I don't often have much need for regular expressions and have never taken that deep dive. I blame it on linuxquestions - you guys spoil me with amazing solutions.
 
Old 04-05-2024, 05:59 PM   #13
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by syg00
An alternative approach is to specify what you are looking for, rather than what you are not looking for. It also protects against overlooked corner cases (for example, what if one of those blanks is a tab?).
Code:
 sed -r 's/([[:alnum:]]+)_xx/yy_\1/g' input.file
This worked perfectly.

Thanks!
 
Old 04-06-2024, 05:40 AM   #14
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,138

Rep: Reputation: 4122
You need to take that "deep dive" - regex is a powerful and useful tool. MadeInGermany has given you good pointers to get you started.
 
Old 04-06-2024, 11:53 AM   #15
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Please forgive if this is obvious to LQ regulars.

The excellent solution posted by syg00 may be generalized.
xx and yy could be variable names instead of character strings.

With this InFile ...
Code:
m1_SALT some other text m2_SALT
p2_SALT yet more text p2_SALT extra text
change is good hello_SALT
m1_HAM some other text m2_HAM
p2_HAM yet more text p2_HAM extra text
change is good hello_HAM
... this code ...
Code:
xx='SALT'
yy='SUGAR'
sed -r "s/([[:alnum:]]+)_$xx/${yy}_\1/g" <"$InFile" >"$OutFile"
... produces this OutFile ...
Code:
SUGAR_m1 some other text SUGAR_m2
SUGAR_p2 yet more text SUGAR_p2 extra text
change is good SUGAR_hello
m1_HAM some other text m2_HAM
p2_HAM yet more text p2_HAM extra text
change is good hello_HAM
Which shows how we may change SALT into SUGAR
but not HAM into CHEESE.
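The same generalization can be sketched in Python, where the suffix and prefix become function parameters. The function name move_suffix_to_prefix and the shortened sample text are hypothetical, and [A-Za-z0-9]+ again stands in for [[:alnum:]]+.

```python
import re

def move_suffix_to_prefix(text, old_suffix, new_prefix):
    """Rewrite every name_<old_suffix> in text as <new_prefix>_name."""
    # re.escape keeps characters such as "." or "+" in old_suffix literal.
    pattern = re.compile(r'([A-Za-z0-9]+)_' + re.escape(old_suffix))
    # A function replacement avoids backslashes in new_prefix being
    # interpreted as group references.
    return pattern.sub(lambda m: new_prefix + '_' + m.group(1), text)

in_text = 'm1_SALT some other text m2_SALT\nm1_HAM some other text m2_HAM'
out_text = move_suffix_to_prefix(in_text, 'SALT', 'SUGAR')
print(out_text)
```

Escaping the suffix is the main difference from splicing shell variables into a sed expression, where characters such as / or . in the variable would be interpreted by sed.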

Daniel B. Martin

.

Last edited by danielbmartin; 04-06-2024 at 11:57 AM.
 
  

