[SOLVED] Remove trailing characters while adding leading characters
My text file contains numerous strings with a trailing substring.
Example, where _xx is the trailing substring:
Quote:
"m1_xx" some other text "m2_xx"
"p2_xx" yet more text "p2_xx" extra text
change is good "hello_xx"
desired output:
Quote:
"yy_m1" some other text "yy_m2"
"yy_p2" yet more text "yy_p2" extra text
change is good "yy_hello"
I found ways to make the substitution. However, with my method I lose the existing spacing - all the strings in the output are separated by a single space.
Please post what you have tried so that we can assist.
Code:
#!/usr/bin/env python3
def changeXXToYY():
    # read lines into list
    with open("testText") as fp:
        mapList = fp.readlines()
    # remove leading/trailing whitespace, including line feeds
    mapList = [x.strip() for x in mapList]
    toRemove = '_xx"'
    toAdd = '"yy_'
    for elem in mapList:
        # split() breaks the line on whitespace, so the original spacing is lost
        elem = elem.split()
        for item in elem:
            if toRemove in item:
                # this rebinds item only; neither elem nor mapList is updated
                item = toAdd + item.split(toRemove)[0].split('"')[-1] + '"'
                print(item)
changeXXToYY()
This prints out the desired new string but the original line remains unchanged.
The text matched by the \( \) group is referenced as \1 in the substitution string.
[^"] matches one character that is not a ".
[^"]* matches any number of such characters (including none).
The /g modifier looks for further matches/substitutions in the line; each further search starts to the right of the current match.
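For anyone who wants to stay in Python, here is a minimal sketch of the same group-and-backreference idea using the re module. The exact pattern is my assumption, built from the notes above, and move_quoted_suffix is just an illustrative name. Because re.sub works on the whole line instead of on split() pieces, the spacing between strings is left alone.

```python
import re

def move_quoted_suffix(line: str) -> str:
    # "([^"]*)_xx" : an opening quote, any run of non-quote characters
    # (captured as group 1), the literal _xx, and the closing quote.
    # The replacement re-inserts both quotes and puts yy_ before group 1.
    return re.sub(r'"([^"]*)_xx"', r'"yy_\1"', line)

sample = '"m1_xx"   some other text     "m2_xx"'
print(move_quoted_suffix(sample))
```

re.sub replaces every non-overlapping match in the line, which corresponds to the /g modifier.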
My apologies, but I noticed that my input file will also have cases where the original string is not within double quotes.
How should this sed command be modified to work in such cases? I've tried a few things, but none of them changed the output.
If " anchors cannot be used, you can try \b anchors ("word boundaries"):
Code:
sed 's/\b\([^" ]*\)_xx\b/yy_\1/g' testText
[^" ]* is a string of characters that are neither " nor space.
The predefined "word boundary" \b is a zero-width marker, not a character, so it does not need to be re-inserted. It is less precise, though: for example, it also occurs next to a - character.
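A rough Python equivalent, in case it helps to experiment. Python's re module also supports \b; swap_word_boundary is a made-up name. The last call shows the imprecision mentioned above: a - right after _xx counts as a word boundary, so the string is still rewritten.

```python
import re

def swap_word_boundary(line: str) -> str:
    # \b is zero-width, so only the captured stem (group 1) is re-inserted.
    return re.sub(r'\b([^" ]*)_xx\b', r'yy_\1', line)

print(swap_word_boundary('m1_xx some  "m2_xx"'))
# A word boundary also exists between "x" and "-":
print(swap_word_boundary('b_xx-c'))
```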
The following uses Extended Regular Expressions (ERE) and three ( ) groups:
Code:
sed -E 's/(^|[" ])([^" ]*)_xx([" ]|$)/\1yy_\2\3/g' testText
The 1st group is the beginning-of-line marker or a " or space character.
The 2nd group is a string of characters that are neither " nor space.
The 3rd group is a " or space character or the end-of-line marker.
\1 \2 \3 are what the respective groups have matched.
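Here is a sketch of the three-group pattern in Python, plus one thing to watch for. Because the delimiter characters are part of the match, two targets separated by a single space share that space, and the second one is skipped, since the next search starts to the right of the previous match. I would expect sed's /g to behave the same way for the same reason. The lookaround variant is my own suggestion, not something from the posts above; it checks the delimiters without consuming them. Both function names are hypothetical.

```python
import re

def swap_groups(line: str) -> str:
    # Three groups: left delimiter (or start), stem, right delimiter (or end).
    return re.sub(r'(^|[" ])([^" ]*)_xx([" ]|$)', r'\1yy_\2\3', line)

def swap_lookaround(line: str) -> str:
    # (?<![^" ]) : not preceded by a non-delimiter (start, quote, or space)
    # (?![^" ])  : not followed by a non-delimiter (end, quote, or space)
    # Nothing but the stem and _xx is consumed, so neighbours can share a space.
    return re.sub(r'(?<![^" ])([^" ]*)_xx(?![^" ])', r'yy_\1', line)

print(swap_groups('m1_xx some other text  "m2_xx"'))
print(swap_groups('a_xx b_xx'))      # second target is missed
print(swap_lookaround('a_xx b_xx'))  # both targets are rewritten
```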
An alternative approach is to specify what you are looking for, rather than what you are not looking for. It also protects against overlooking corner cases (for example, what if one of those blanks is a tab?).
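A small Python sketch of that "say what you want" idea. The character class [A-Za-z0-9]+ is my assumption about what counts as a valid stem, and swap_positive is a hypothetical name. Since the pattern never mentions the separators, it does not matter whether they are spaces, tabs, or quotes.

```python
import re

def swap_positive(line: str) -> str:
    # Match one or more letters/digits followed by _xx; separators are
    # never part of the pattern, so tabs and quotes are left untouched.
    return re.sub(r'([A-Za-z0-9]+)_xx', r'yy_\1', line)

print(swap_positive('m1_xx\tsome other text  "m2_xx"'))
```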
Just to clarify: a “regular expression (regex)” can not only tell you whether a string matches a pattern (“yes or no, does it match?”) but also return specified pieces of the matching string, such as “some other text” and “more text.” These pieces are numbered left to right and are immediately available for producing output: as “\1” and “\2” in sed, or as “$1” and “$2” in Perl.
Also: these days, regex support is nearly universal and the core syntax is largely standardized; implementations differ mainly in the details. Therefore, understanding this very important power tool is definitely “an essential life skill” (like knowing how to use a life jacket). If you need to “tear apart a text string” (and who doesn’t?), regex has your back.
There are “esoteric fee-churs” in regexes that you can learn about if and when you actually need them, and others that you might use every day.
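To illustrate the "return pieces of the match" point, here is a minimal Python sketch. The pattern and the name pieces are illustrative assumptions, fitted to the first sample line of this thread.

```python
import re

def pieces(line: str):
    # Three capture groups: first quoted string, the text in between,
    # and the second quoted string. Returns None if the line does not match.
    m = re.match(r'"([^"]*)" (.*) "([^"]*)"', line)
    return m.groups() if m else None

print(pieces('"m1_xx" some other text "m2_xx"'))
```

In Python the groups are read with m.group(1), m.group(2), and so on, or all at once with m.groups().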
Last edited by sundialsvcs; 04-05-2024 at 09:10 AM.
I do coding in Cadence SKILL language for design automation in a Linux environment (analog IC design). However, to my complete and utter shame, I have never gotten past a few rudimentary regular expression usages. The fact is, despite working in a Linux environment, I don't often have much need for regular expressions and have never taken that deep dive. I blame it on linuxquestions - you guys spoil me with amazing solutions.
The excellent solution posted by syg00 may be generalized.
xx and yy can be shell variables instead of literal character strings.
With this InFile ...
Code:
m1_SALT some other text m2_SALT
p2_SALT yet more text p2_SALT extra text
change is good hello_SALT
m1_HAM some other text m2_HAM
p2_HAM yet more text p2_HAM extra text
change is good hello_HAM
... this code ...
Code:
xx='SALT'
yy='SUGAR'
# double quotes let the shell expand $xx and $yy; \1 is passed to sed unchanged
sed -r "s/([[:alnum:]]+)_${xx}/${yy}_\1/g" <"$InFile" >"$OutFile"
... produces this OutFile ...
Code:
SUGAR_m1 some other text SUGAR_m2
SUGAR_p2 yet more text SUGAR_p2 extra text
change is good SUGAR_hello
m1_HAM some other text m2_HAM
p2_HAM yet more text p2_HAM extra text
change is good hello_HAM
Which shows how we may change SALT into SUGAR
but not HAM into CHEESE.
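The same generalization can be sketched in Python. swap_affix and its parameter names are hypothetical. re.escape is an addition of mine so that a suffix containing regex metacharacters is treated literally, and the replacement is built in a function so that a backslash in the new prefix is not interpreted as a group reference.

```python
import re

def swap_affix(text: str, old_suffix: str, new_prefix: str) -> str:
    # Stem of letters/digits, an underscore, then the literal old suffix.
    pattern = r'([A-Za-z0-9]+)_' + re.escape(old_suffix)
    # A replacement function avoids any escaping issues in new_prefix.
    return re.sub(pattern, lambda m: f"{new_prefix}_{m.group(1)}", text)

in_text = "m1_SALT some other text m2_SALT\nm1_HAM some other text m2_HAM"
print(swap_affix(in_text, "SALT", "SUGAR"))
```

Only the SALT strings are rewritten; the HAM lines pass through unchanged.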
Daniel B. Martin
Last edited by danielbmartin; 04-06-2024 at 11:57 AM.