LinuxQuestions.org
Old 05-27-2016, 09:48 AM   #1
rblampain
Senior Member
 
Registered: Aug 2004
Location: Western Australia
Distribution: Debian 11
Posts: 1,291

Rep: Reputation: 52
Learning Python, how should I implement the following problem in Python 3 code?


Two UTF-8 strings are compared word for word in nested "for" loops to find whether any word in one string exists in the other. If a match is found, two arrays are searched: array one is a list of words that need to be accepted; array two is a two-dimensional array holding in column 1 the word that needs to be changed and in column 2 of the same row the word to be used as a replacement. So far it's simple. But if there is no match in either array, an option is presented to manually accept the word or to change it. If the word is accepted, it is added to array one. If it is replaced, it is added to array two column 1 and the entered replacement word is added to array two column 2, while one string is modified accordingly. The loop for the initial comparisons then continues, because a second or third occurrence of the match may happen in the same strings or in subsequent string comparisons, and the object is to have these repeated occurrences resolved programmatically rather than manually. In other words, the arrays are dynamic within the loop itself.

I have found that Python does not seem to accept a change of variable value (the arrays) in a loop (despite a lot of confusing claims of "dynamism" in tutorials) and the apparent solution (as far as I can tell) is to terminate the loop after each change and restart it with the changed string and the possibly changed arrays. The result I get seems to be that Python executes on memorised old arrays and modifying them within the loop has no effect in the loop.
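
For reference, a minimal sketch (not the poster's code; the names `words`, `accepted`, `replacements` and the sample data are made up for illustration) suggesting that a list or dict changed inside a `for` loop is seen by later iterations of that loop, as long as the loop is iterating over something else:

```python
# Hypothetical data: the loop iterates over `words`, not over the containers it modifies.
words = ["alpha", "beta", "alpha", "gamma", "beta"]

accepted = []        # words to keep as they are ("array one")
replacements = {}    # word -> replacement ("array two", held as a dict here by assumption)
asked = []           # records which words would have needed a manual decision

for word in words:
    if word in accepted or word in replacements:
        continue                 # already decided earlier in this same loop
    asked.append(word)           # stand-in for the manual prompt
    if word == "beta":
        replacements[word] = "BETA"
    else:
        accepted.append(word)

print(asked)         # each distinct word appears here only once
```

If a loop appears to run on "old" arrays, it may be worth checking whether the loop is working on a copy made earlier, or whether the name was rebound to a new object inside a function.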

Can anyone shed light on this? Give a solution?

Thank you for your help.

Last edited by rblampain; 05-27-2016 at 09:52 AM.
 
Old 05-27-2016, 10:33 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
What you describe sounds fairly common, I think. You are looping over an array and want to change its contents while still looping. Think about this: if you were to ask for the size of the array within the loop, what value would you expect? The size was already an available value before the loop started, so for it to have changed would throw out a number of things known about the array at the start of the loop.

This is of course different to performing say a for loop which is filling an array, you can then check values on the array which will change possibly on each iteration.
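
To make the distinction concrete, here is a small sketch of what standard Python does in each case (variable names are illustrative only). It covers changing the very container being looped over, as opposed to changing some other container from inside the loop:

```python
# Appending to a list while iterating over it: the loop also visits the new items.
items = [1, 2, 3]
visited = []
for x in items:
    visited.append(x)
    if x == 1:
        items.append(99)   # grows the list being looped over

# Adding a key to a dict while iterating over it raises RuntimeError.
table = {"a": 1}
error = None
try:
    for key in table:
        table[key + "x"] = 0
except RuntimeError as exc:
    error = exc

# Iterating over a snapshot (a copy) leaves the loop unaffected by later changes.
snapshot_seen = []
for x in list(items):
    snapshot_seen.append(x)
    if x == 2:
        items.append(100)
```

So a list does accept changes of size mid-loop, but the result is easy to get wrong, which is why looping over a copy or over different data is usually suggested.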

Hope some of that makes sense.

Now my question about your task, why would you loop over the array and need to add to it the same time to then check the new value in the array?

Also, what is the final objective? ie. why are we filling these arrays at all?

Remember, if you are telling us what you want to do we will look at trying to help with that, but if you have a problem and this is one possible solution, it may be better to pose the problem
and see if anyone might have a better / simpler direction to use? (just a thought)

Lastly, maybe you need to go back to the drawing board and look at solving the simple cases first and build from there.
 
Old 05-28-2016, 01:37 AM   #3
rblampain
Senior Member
 
Registered: Aug 2004
Location: Western Australia
Distribution: Debian 11
Posts: 1,291

Original Poster
Rep: Reputation: 52
Thank you for your answer.
This is "throw-away" code that needs to be done only once for each file but there are hundreds of files of the same type although no new file will be added. I used Python 3 because programming languages I am more familiar with and tried to use before to solve this problem were not (or not easily) UTF-8 compatible. So I have very little need (and no intention) to learn Python in depth.
The contents of individual files are mostly unique, so it is not possible to build lists of "words" to be accepted or changed (and with what) and apply them to all files. That makes it necessary to start with empty arrays for every file and build them up as we go. Every "acceptance" or "change" of a word may involve a variable amount of separate manual research, so avoiding doing the same thing more than once is vital, not only to avoid wasted research time but also for the quality and consistency of the result: it becomes acceptable to spend whatever time is necessary on that research if it is known that it needs to be done only once.
From your answer, I understand (perhaps incorrectly) that to solve the problem I need to build arrays of a hypothetical size that is large enough to accommodate the maximum number of additions, with fictitious values and that Python will accept a changed value in array elements but not a change of size of the array, within the same loop. Although I understand I am supposed to "learn" by myself, I would prefer not to spend a considerable amount of time to find the answer to this particular point because any search on the Internet that includes the expressions "dynamic" and "array" invariably refers to the futility of trying to dynamically build array names, not values. Also and due to my inexperience, it could take me an eternity to determine if something is not working because of a mistake in my code or because what I am trying to do is not possible but will not raise an error message and there is no help available on the Internet.

So, my question is now: Is my assumption correct? If it is correct, the simplest solution for me is to terminate the nested loops and restart them on a partly corrected string "on change" in any array using a flag, assuming any addition to the arrays has been successful (which I think I can check). This would become a function (no "goto" or "gosub" in Python) calling itself if the flag is set.

Programming in the Western Australian desert.

Last edited by rblampain; 05-28-2016 at 01:43 AM. Reason: humour
 
Old 05-28-2016, 03:00 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
You can dynamically fill the array and then use it, just not if you are already looping on the same array. So you do not need to fill to any predetermined size.

I have re-read your initial description and am still a little confused as to why we need to add to the array whilst still searching through them?
I am also a little lost at the specifics behind the term 'UTF-8 strings' and what 'any word in one string exists in the other' exactly means. Are you referring to a string as a single word, or possibly a sentence or paragraph that is being searched? Because if it is only a word, most languages have some form of an instring function.

So, back to the problem, we are reading a file which has 'strings' in them (see above) and based on the strings read we want to check if they need to be added to one of two arrays.
If this is the case, once you have found the new string is missing from the respective array, you can simply add after the loop has been closed and move to the next string in the file.

Does the above seem to be the task?

One last thing I would mention: saying something like 'So I have very little need (and no intention) to learn Python in depth' will not make many people want to help you. No one expects you to become
a guru, but this sort of wording makes people wonder why they should help you if you don't want to bother yourself ... just a thought
 
1 members found this post helpful.
Old 05-28-2016, 06:47 AM   #5
rblampain
Senior Member
 
Registered: Aug 2004
Location: Western Australia
Distribution: Debian 11
Posts: 1,291

Original Poster
Rep: Reputation: 52
Thank you for your second answer.
What the code does is a bit complex in theory but easier to resolve in code. It reads two files that contain exactly the same number of lines (strings terminated by line feed), splits the files into lines on line feed, then splits each line into words on space (substrings separated by char 32). Each line (or sentence) in one file is then compared to the corresponding line in the other file (but not to any other line) to see if any word exists in both. The need to add to the arrays at that stage is due to the fact that, after the comparison has found a match, there could be a repetition, or even multiple repetitions, of the same word in the same line (or sentence) of the target file. While any "word" could be replaced by the same new word in the whole file, this is not true for other files containing the same word, hence the need to construct specific arrays for each file. It is a little as if words that need attention have a specific meaning in the context of one file but could have a different meaning in the context of another file, similar to the various definitions of a word in a dictionary being used in different contexts in conversation. Hence the need, at the first occurrence, either to "accept" a word manually if it is used in a correct context and add it to an array of accepted words (existing in both files in the comparison), or to replace it manually with a better word and add it to the other array (wrong word=correct word) in the same way and for the same reason.
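
The splitting and line-by-line comparison described here could be sketched roughly as follows. This is only an illustration: the two `io.StringIO` objects and their text are hypothetical stand-ins for the real files, and `shared_words` is a made-up helper name.

```python
import io

# In-memory stand-ins for the two UTF-8 files (hypothetical content).
base_file = io.StringIO("le chat noir\nun café serré\n")
target_file = io.StringIO("the chat is black\na strong café\n")

def shared_words(base_text, target_text):
    """Yield (line_number, words of the target line that also occur in the base line)."""
    base_lines = base_text.split("\n")
    target_lines = target_text.split("\n")
    pairs = zip(base_lines, target_lines)          # line x is only compared with line x
    for number, (base_line, target_line) in enumerate(pairs, start=1):
        base_words = set(base_line.split(" "))
        # Keep target order and repetitions; the set is only used for membership tests.
        common = [w for w in target_line.split(" ") if w and w in base_words]
        if common:
            yield number, common

matches = list(shared_words(base_file.read(), target_file.read()))
print(matches)
```

With real files, `open(path, encoding="utf-8")` would take the place of the `io.StringIO` objects.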

Switching to a new language because the files are UTF-8 encoded may not have been necessary in this case after all, I only persisted in case I find in the near future that Python is more useful to me than I first anticipated.

It is a fact that many do not want to help when they know my interest is limited to solving one particular problem, and I have accepted that in many other posts, but I think it is necessary to state the fact so that the advice given is not unnecessarily shaped by my personal and unusual way of approaching the problem I am trying to solve. I am sure people automatically assume they are answering a question from an evolving colleague when, at 75, I am looking more for quick solutions and retirement.

Your last post gives the answer to my fundamental question of "dynamism" of arrays within a loop, telling me that I would be wasting my time trying to get that working as it is not possible. The only remaining question could be what the conventional solution to this sort of problem would be, although in my case I am quite prepared to use a short-cut (terminate the comparisons and so on, as explained in my previous post), and it may be the only solution.

I will wait a few days in case anybody has questions or comments to add and I will then mark this thread as solved.

Last edited by rblampain; 05-28-2016 at 07:48 AM.
 
Old 05-28-2016, 07:38 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Firstly, hats off, I hope at 75 I am still learning new things

Your problem sounds like a diff between the files that then needs to have options to make alterations. Being that you seem to imply each group of files is a one off I am not really sure there
would be a programmatic solution as each file would need the manual input you speak of. Another thing I find unclear is that if this is a one off process, why the need for arrays at all? It sounds
like you are trying to find a complicated way to edit a group of files. If the words in the arrays were to be used on subsequent files to make the same changes then I would see more purpose.

Sorry if I have missed the point

Hopefully one of the coding gurus will jump on and give you some help / direction
 
1 members found this post helpful.
Old 05-29-2016, 05:02 AM   #7
rblampain
Senior Member
 
Registered: Aug 2004
Location: Western Australia
Distribution: Debian 11
Posts: 1,291

Original Poster
Rep: Reputation: 52
Thank you again, it looks like I have persistently failed to provide an understandable explanation of my problem.
Corresponding lines in two files (let's call them 'base file' and 'target file') are compared (line 1 to line 1, line x to line x, etc.). If any word/expression existing in line x of 'base file' is found in the corresponding line x of 'target file', this word must be either accepted or changed in 'target file'. The arrays are needed because there may be numerous occurrences of the same word or expression within the same line/sentence/string that is being looped through by the inner loop (word for word) and within subsequent lines of the same file that will be looped through next by the outer loop (line for line). Here are examples of a line/sentence/string, starting with an uppercase character and ending with a punctuation mark to make it easier to understand.

Suppose at the start of the execution of the program (arrays are empty), line/sentence/string number one (split on line feed) of 'target file' (to be edited) is as follows (subsequently split on space):
Code:
Expression1 expression2 expression3 expression4 expression2 expression6.
Suppose "expression2" (the second "word" in the string) is found to exist in both arguments of the comparison (in line number one of 'base file' and in line number one of 'target file') and needs to be either accepted or changed manually in 'target file'. It also exists further along in the same line/sentence/string of 'target file' as "word" number 5, and we must avoid having to change that manually as well. Hence what has been done with the first occurrence of "expression2" is recorded in one array as an accepted expression or in the other array as an expression that has been changed, and when the second occurrence happens at "word" number 5 in line one of 'target file', the same solution as that applied to "word" number 2 in the same line of 'target file' is automatically applied.

Suppose line/sentence/string number 637 of the same 'target file' is as follows:
Code:
Expression517 expression923 expression2 expression347 expression421 expression169 expression1255.
and the comparison shows that the same expression ("expression2") exists in line 637 of 'base file' and in line 637 of 'target file'. If the operator has forgotten that "expression2" has already been dealt with manually, the same research mentioned in the previous posts will be done again (a waste of time), and the resulting decision may not be exactly the same (consistency is not guaranteed, if only because of possible typos). Hence, "expression2" occurring in line number 637 of 'base file' and in line number 637 of 'target file' needs to be automatically resolved in precisely the same way as its first and second matching occurrences in line number one.

I hope it makes sense now.

It seems that, when the first occurrence of a match in the inner loop is found, the only solution (as I understand it) is to close the loops and loop through the whole program again with the same 'base file' and 'target file' and the arrays correctly updated, so that the loops now execute with the updated arrays, which will allow the automatic "accept" or "change" of these expressions rather than asking for it to be done manually.

The solution I am implementing now is to:
define the upgrading of these arrays as a function
set a flag immediately when any array is upgraded
close both loops after the comparison if flag set
test if the flag is set, if set it is then reset and the function is called

The problem is that, as it is, there is a lot of unnecessary repeated execution of code, but if it works I will try to limit this function to the inner loop (the word-for-word loop within the line-for-line loop). However, terminating any loop automatically positions the program at the next iteration, while in this case the match that led to the closure of the loop needs the loop to restart at the same iteration, in case a matching comparison occurs more than once within the same iteration. This is where a more experienced Python user could advise if there is a better solution.
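
One possible way to stay on the same iteration without closing the loop is a `while` loop with an explicit index that only advances once the current word has been resolved. The sketch below is an illustration under assumptions: `resolve_line`, `decisions`, `ask` and `fake_ask` are hypothetical names, a single dict stands in for both arrays (an accepted word maps to itself), and `fake_ask` replaces the manual prompt.

```python
def resolve_line(target_words, base_words, decisions, ask):
    """Return a copy of target_words with every word shared with base_words resolved.

    decisions: dict mapping a word to the word to use (itself if accepted).
    ask: callable standing in for the manual prompt; returns the word to use.
    """
    result = list(target_words)
    i = 0
    while i < len(result):
        word = result[i]
        if word in base_words:
            if word not in decisions:
                decisions[word] = ask(word)   # table updated inside the loop
                continue                      # same position is examined again; i is unchanged
            result[i] = decisions[word]
        i += 1                                # advance only when this position is settled
    return result

prompts = []

def fake_ask(word):
    prompts.append(word)      # record how often a manual decision was requested
    return word.upper()

decisions = {}
line = "expression1 expression2 expression3 expression4 expression2 expression6".split(" ")
fixed = resolve_line(line, {"expression2", "other"}, decisions, fake_ask)
print(fixed, prompts)
```

Because `decisions` is passed in and kept between calls, a later line containing the same word would be resolved without a new prompt.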

Of course it is possible to test a whole line for matches, make lists of what needs to be accepted or changed and do the changes at the completion of the loop but this implies breaking a loop and reprocessing the same iteration which would be a similar situation.

Another solution is to write a 2-pass program that first creates a list of the necessary results and apply these results in the second pass.
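
A rough sketch of that two-pass idea, under assumptions: the function names `collect_shared` and `apply_decisions`, the sample lines, and the "manual decision" are all invented for illustration, and lines are held as plain lists of strings rather than read from files.

```python
def collect_shared(base_lines, target_lines):
    """Pass 1: distinct words found in both corresponding lines, in first-seen order."""
    found = {}
    for base_line, target_line in zip(base_lines, target_lines):
        base_words = set(base_line.split(" "))
        for word in target_line.split(" "):
            if word in base_words:
                found.setdefault(word, None)   # dict keeps insertion order (Python 3.7+)
    return list(found)

def apply_decisions(base_lines, target_lines, decisions):
    """Pass 2: rewrite the target lines using the decisions made between the passes."""
    output = []
    for base_line, target_line in zip(base_lines, target_lines):
        base_words = set(base_line.split(" "))
        words = [decisions.get(w, w) if w in base_words else w
                 for w in target_line.split(" ")]
        output.append(" ".join(words))
    return output

base = ["x expression2 y", "expression2 z"]
target = ["expression1 expression2 expression3 expression2", "expression517 expression2 z"]

todo = collect_shared(base, target)
decisions = {word: word for word in todo}     # default: accept every shared word
decisions["expression2"] = "replacement2"     # hypothetical manual decision
new_target = apply_decisions(base, target, decisions)
print(todo, new_target)
```

The manual research then happens once per distinct word, between the two passes, rather than in the middle of a loop.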

Last edited by rblampain; 05-29-2016 at 06:18 AM.
 
Old 05-29-2016, 06:24 AM   #8
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Posts: 952

Rep: Reputation: 217
Assumptions:
  1. Since File2 is going to change, we can write the new status into a new file (say File2A) and on EOJ, swap them around.
  2. An empty hash (associative array) is defined at the start. It will hold each original word and its replacement (the same as the original if the word is retained).

Process.
Read first record in files 1 and 2 until no more records (first line available initially)
Step 1. For each word in the line
Step 2. If it exists in the hash created in step 3.3 (true for line 637 in the example) then apply it
ELSE (not in the hash list,)
Step 3.1 If it exists in any word in file2 (same line number) then prompt and accept correction or to retain.
Step 3.2 Make the change and write the updated line into File2A
Step 3.3 Append an entry into the hash table
Step 4. Read next record in files 1 and 2 respectively.

OK
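
One possible reading of the steps above, sketched in Python. Everything named here is an assumption for illustration: the function `process`, the `.new` suffix used for File2A, the `ask` callable standing in for the prompt, and the temporary demo files. The inner loop walks the words of the File2 line and tests them against the File1 line, which finds the same matches.

```python
import os
import tempfile

def process(base_path, target_path, ask):
    """Rewrite target_path via a temporary 'File2A', remembering every decision."""
    decisions = {}                       # the hash: original word -> word to use
    new_path = target_path + ".new"      # hypothetical name for File2A
    with open(base_path, encoding="utf-8") as base, \
         open(target_path, encoding="utf-8") as target, \
         open(new_path, "w", encoding="utf-8") as out:
        for base_line, target_line in zip(base, target):      # step 4: next record
            base_words = set(base_line.rstrip("\n").split(" "))
            words = target_line.rstrip("\n").split(" ")
            for i, word in enumerate(words):                  # step 1
                if word not in base_words:
                    continue
                if word not in decisions:                     # steps 3.1 and 3.3
                    decisions[word] = ask(word)
                words[i] = decisions[word]                    # steps 2 and 3.2
            out.write(" ".join(words) + "\n")
    os.replace(new_path, target_path)                         # swap at end of job
    return decisions

# Demo on throw-away files with made-up content.
with tempfile.TemporaryDirectory() as folder:
    base_path = os.path.join(folder, "base.txt")
    target_path = os.path.join(folder, "target.txt")
    with open(base_path, "w", encoding="utf-8") as f:
        f.write("a expression2 b\nc expression2 d\n")
    with open(target_path, "w", encoding="utf-8") as f:
        f.write("expression1 expression2 expression3 expression2\n"
                "expression517 expression2 expression347\n")
    asked = []

    def fake_ask(word):          # stand-in for a real prompt; records each request
        asked.append(word)
        return "replacement2"

    decisions = process(base_path, target_path, fake_ask)
    with open(target_path, encoding="utf-8") as f:
        result = f.read()
```

In real use, `ask` could wrap `input()` and return the original word when the operator chooses to retain it. The dict is updated inside the `for` loops without trouble because the loops iterate over the file lines and words, not over the dict.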

Last edited by AnanthaP; 05-29-2016 at 06:34 AM.
 
1 members found this post helpful.
Old 05-29-2016, 10:53 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
New explanation is much clearer and as you can see AnanthaP has a possible solution.

Interestingly, one of the parts confusing me has now been omitted and made things easier, ie. the need for a second array (actually first in initial description) 'array one is a list of words that need to be accepted'. So as pointed out above, no need to loop over the array at all as a simple hash will give you an instant answer with easy additions and changes
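
As a small illustration of why one hash can stand in for both arrays (the sample words and the names `accepted`, `changes`, `decisions` and `resolve` are hypothetical), an accepted word can simply map to itself:

```python
# Hypothetical contents of the two original "arrays".
accepted = ["expression7", "expression9"]
changes = [["expression2", "replacement2"],
           ["expression4", "replacement4"]]

# One dict covers both: an accepted word maps to itself.
decisions = {word: word for word in accepted}
decisions.update(dict(changes))      # each [old, new] row becomes old -> new

def resolve(word):
    """Return the recorded outcome, or None if the word still needs a manual decision."""
    return decisions.get(word)

print(resolve("expression7"), resolve("expression2"), resolve("unknown"))
```

A dict lookup takes roughly the same time however many entries it holds, so there is no array to loop over at all.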

Have at it
 
1 members found this post helpful.
  

