I want to get both the revised and original text from a document. I do it this way:
Set wrdDoc = wrdApp.Documents.Open(fileName)
For Each sent In wrdDoc.Sentences
    ' Only handle sentences that contain tracked changes
    If sent.Revisions.Count > 0 Then
        after = sent.Text
        sent.Revisions.RejectAll
        before = sent.Text
        SaveRev before, after
    End If
Next
Now that would be fine, except that malformed sentences like
This is one sentence.This is another.
will get parsed in a weird way. First there is this one: "This is one sentence.", then one that contains both: "This is one sentence.This is another."
What happens when there are revisions there? The first iteration rejects the revisions in the first sentence, so the second iteration does not 'see' that revised portion.
The bottom line is that the first iteration gets both versions of the first sentence, while the second iteration gets only the original version of the first sentence (along with both versions of the second sentence).
Let me clarify:
Let's say I had the original
We started with this sentence.And this sentence.
And it was revised to
We ended with this sentence.And this other sentence.
First iteration will result in
Before: We started with this sentence.
After: We ended with this sentence.
But second iteration will have
Before: We started with this sentence.And this sentence.
After: We started with this sentence.And this other sentence.
So I altered the logic to undo the rejection made in the previous iteration:
Set wrdDoc = wrdApp.Documents.Open(fileName)
For Each sent In wrdDoc.Sentences
    If sent.Revisions.Count > 0 Then
        ' Restore the revisions rejected in the previous iteration
        wrdDoc.Undo
        after = sent.Text
        sent.Revisions.RejectAll
        before = sent.Text
        SaveRev before, after
    End If
Next
I like this because I end up with an unaltered document (except for the last sentence).
The thing is, doing this puts the macro in an infinite loop at one specific sentence.
I don't know the mechanics of For Each, so I have no clue what is causing it to hang. Obviously altering the collection is messing up the loop, but I don't understand why.
I could loop with For i = 1 To wrdDoc.Sentences.Count, but I think that would make me skip sentences for the same reason I'm repeating one now, and I cannot risk that (even if it tests OK, I have to be sure it will never happen).
So the questions are:
Can anyone help me figure out why it's locking on a sentence?
Is there a better way of doing this?
How can I solve it while making sure not to skip sentences?
Thank you very much!
PS: I can provide sample documents, let me know if it's needed (maybe what I'm doing wrong is already clear to someone, and I'd have to make the samples as I cannot share the documents I'm working on).
--EDIT--
Ok so this is where it's hanging, only on the 32nd file.
It doesn't hang on a sentence, it actually does a few at the start of the document, then goes back to the beginning.
I previously encountered the same error, but it looped in a single sentence, and didn't go back to the beginning. I think it's the same issue. I'll try to reproduce original and revised versions here.
Original version
MAIN TITLE
Measurement of some variable
1 REQUIRED TOOLS
1.1 Special tools
NOTe:
Some note about the procedure (unaltered by revision)
Equipment name (carrier returned line)
(english) assemply with Equipment PN
Kit
Equipment name (carrier returned line)
(english) assemply with (Another) Equipment PN
Kit
Document continues...
There are 2 equipment entries before it restarts the loop.
The revision consisted of inserting the document number, capitalizing the first letter of some words, and swapping the order of Equipment PN and "Kit".
Revised version
ducument number
MAIN TITLE
Measurement of Some Variable
1 REQUIRED TOOLS
1.1 Special Tools
NOTe:
Some note about the procedure (unaltered by revision)
Equipment name (carrier returned line)
(english) assemply with kit
Equipment PN
Equipment name (carrier returned line)
(english) assemply with kit
(Another) Equipment PN
Document continues...
Recorded original/revision pairs were:
Original..................................Revised
{Empty}...................................Document number
Measurement of some variable..............Measurement of Some Variable
Special tools............................Special Tools
(english) assemply with..................(english) assemply with kit
(english) assemply with..................(english) assemply with kit
Then it starts again, recording the same entries until I break.
I don't see the sentences overlapping I talked about, but there was a line break insertion on the revision.
Thanks!
Enumerable objects should not be altered during the enumeration, or bad things can happen (what exactly depends on the type of collection).
My guess is that the revision/undo process, combined with the wonky sentence, is causing the Sentences enumerable to change.
You should build your own collection first and see whether that makes a difference:
Set sents = New Collection
For Each sent In wrdDoc.Sentences
    sents.Add sent
Next
Then use sents for your main For Each loop.
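The effect is not specific to Word. As a rough illustration only (written in Python rather than VBA, with a plain list standing in for the Sentences collection, which is an assumption made for this sketch), enumerating a live collection while changing it can skip items, whereas enumerating a snapshot taken up front visits every item:

```python
def visit_live(items):
    """Enumerate the list itself while removing an item from it."""
    seen = []
    for item in items:
        seen.append(item)
        if item == "b":
            items.remove(item)  # the collection changes under the loop
    return seen


def visit_snapshot(items):
    """Enumerate a copy taken before the loop starts; mutate the original."""
    seen = []
    for item in list(items):  # list(...) is the snapshot
        seen.append(item)
        if item == "b":
            items.remove(item)
    return seen


print(visit_live(["a", "b", "c", "d"]))      # "c" is never visited
print(visit_snapshot(["a", "b", "c", "d"]))  # every item is visited once
```

How Word's own enumerator reacts to RejectAll and Undo may differ from this; the sketch only shows why taking the snapshot first is a sensible thing to try.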
My question is somewhat specific. I'm not using any kind of code compiler to achieve the result in the title; I am using an IRC client that allows the use of "Quirks", so that users can have specific mannerisms when chatting, like starting every word with an uppercase letter or changing every "s" into a "2".
The problem is that I can't see the whole code, and since I'm not familiar with REGEXP_REPLACE, that makes it harder to learn.
The client simplifies the whole coding process; here's a screenshot of the interface.
Filling the text boxes with "^(\w)" and "upper(\1)" respectively capitalizes the first character; "(\w)$" and "upper(\1)" does the same with the last character.
I've discovered that "\b(\w)" will uppercase the first character of every word. I've tried "\b(\w)%" for the last character, but it didn't work, probably because of some syntax error.
So, how do I get every last character capitalized?
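For comparison, here is how the same idea looks in Python's re module. This is only a sketch: the client's regex dialect and its "upper(\1)" replacement syntax are not known here, so treating them as equivalent to Python's behaviour is an assumption. In common regex dialects "%" has no special meaning, while "\b" after the group marks the end of a word, so the pattern worth trying is "(\w)\b":

```python
import re


def upper_first_letters(text):
    # \b(\w): a word character that directly follows a word boundary
    return re.sub(r"\b(\w)", lambda m: m.group(1).upper(), text)


def upper_last_letters(text):
    # (\w)\b: a word character that is directly followed by a word boundary
    return re.sub(r"(\w)\b", lambda m: m.group(1).upper(), text)


print(upper_first_letters("hello big world"))
print(upper_last_letters("hello big world"))
```

If the client follows the usual conventions, entering "(\w)\b" and "upper(\1)" in the two text boxes would be the corresponding setup.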
Let's say you have a module that's several hundred lines long. At the very top of your code file, you start a string, so you type a quote. Total wreckage ensues: while the string remains unterminated, everything in the code file is subject to erratic encapsulation by your string (see image for an actual example of all the errors generated). No big deal, right? You just finish your string and all the errors go away. While true, you may find the IDE has had its way with other strings in your document. For example, these lines...
oLog.writeLogFile("Starting System Update and Version Update ")
oLog.writeLogFile("Starting Script for Fetching Data from Source to Dest")
...get changed to this:
oLog.writeLogFile("Starting System Update And Version Update ")
oLog.writeLogFile("Starting Script For Fetching Data from Source To Dest")
Notice how and changes to And, for to For, and to to To. What's happening is that, as other strings in the document become... eh... "destrung", some of the words that were once part of a string are interpreted as keywords by the IDE. Because it's VB, the IDE corrects their capitalization automatically. When you finally terminate your string, all the other strings further down in the document become properly terminated again, but the capitalization changes remain.
Is there a way to prevent this from occurring?
Why not type both quotes ("") first, then move back between them and start typing your string? I do it all the time to prevent this. I find that the short delay between typing the first " and the moment the IDE starts capitalizing keywords is long enough for me to (remember to) type the second ".
I tried out "article buddy" (http://article-buddy.com/sales-page/).
They claim to change any content into original content for the search engines.
Every time I tested the output of the tool, it passed Copyscape and similar checks.
After a while I tried to find out how the tool was fooling the tests, because the text did not seem to change in any way. When I pasted the so-called original content into TextPad (a text editor), I saw a lot of weird characters, especially question marks that replaced certain characters in the text.
Example (original text):
WestBow Press titles are regularly reviewed by Thomas Nelson & Zondervan for new, talented authors. While there is no guarantee of the number of titles to be signed each year, this is an opportunity to get your foot in the door. - See more at:
Example (output text of article buddy):
W??tB?w Pr??? t?tl?? are regularly r?v??w?d b? Th?m?? N?l??n & Z?nd?rv?n for n?w, t?l?nt?d authors. Wh?l? th?r? ?? n? gu?r?nt?? ?f th? numb?r ?f t?tl?? t? b? ??gn?d each ???r, th?? ?? an opportunity t? g?t ??ur f??t ?n th? door. - S?? m?r? ?t:
Does anyone know what is going on here?
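One way to investigate is to list exactly which characters in the output are not plain ASCII. The sketch below assumes (this is not confirmed anywhere in the thread) that the tool swaps some Latin letters for look-alike letters from another script, which an editor using a single-byte encoding would show as "?". The sample string is hypothetical and built by hand, not taken from the tool:

```python
import unicodedata


def non_ascii_report(text):
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Hypothetical sample: Latin "t", "l", "s" mixed with two Cyrillic look-alikes
sample = "t\u0456tl\u0435s"
for index, ch, name in non_ascii_report(sample):
    print(index, repr(ch), name)
```

Running this on the real output, copied into a Unicode-aware editor or read from a UTF-8 file, would show whether the "?" positions hold letters from another alphabet.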
I have seen multiple tools like this one in the past. Typically, such a tool spins the content, replacing words or whole sentences.
This used to work back in the day. My suggestion is not to waste your time and money on those. The only acceptable thing these days is original content.
I have over a hundred text files and I need to change the construction of several sentences using a specific format. I am not very familiar or experienced with Word VBA but I hope I could get some ideas to help me get started. I have below the original paragraph and its desired output. Basically I need to place the values (e.g. 40-120 parts) after each item (e.g. isoleucine) and enclose those with "(" and ")".
Original: An acid combination for increasing immunity, comprising the following raw materials by weight: 40-120 parts of isoleucine, 45-135 parts of leucine, 76.5-229.5 parts of lysine hydrochloride, 21.5-64.5 parts of methionine, 35-105 parts of phenylalanine, 40-120 parts of valine, 30-90 parts of threonine, 39-117 parts of arginine, 23-69 parts of histidine, 37.5-112.5 parts of glycine, 50-150 parts of aspartate, 900-2700 parts of dried mushroom, 750-2250 parts of medlar and 250-750 parts of licorice.
Desired Output: An acid combination for increasing immunity comprises (pts.wt.): isoleucine (40-120), leucine (45-135), lysine hydrochloride (76.5-229.5), methionine (21.5-64.5), phenylalanine (35-105), valine (40-120), threonine (30-90), arginine (39-117), histidine (23-69), glycine (37.5-112.5), aspartate (50-150), dried mushroom (900-2700), medlar (750-2250) and licorice (250-750).
Maybe you could try the following sequence:
Find the part you want to change (numbers separated by "-" and followed by "parts") with the Find function (another link) and a well-formed pattern (Word wildcards rather than true regular expressions)
Set the brackets at the beginning and at the end of the matched element (use the Range object)
Delete the last word ("parts"), or whatever else you need to do
Loop through every result to do the same (see an example of looping through find function here)
Don't forget that you can record a macro if you are looking for hints about specific objects (even though the recorded code is less complete than what Excel VBA produces).
Please don't hesitate to post some code if you want some more help,
Regards,
Max
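Since the inputs are plain text files, the same find, bracket and reorder sequence can also be sketched outside Word. The following Python is a minimal illustration under stated assumptions, not the Word approach described above: it assumes every item has the form "<range> parts of <name>", that names contain only letters and spaces, and that the introductory wording is exactly as in the sample. The function name restructure is made up for this sketch:

```python
import re

# A numeric range such as 40-120 or 76.5-229.5
RANGE = r"\d+(?:\.\d+)?-\d+(?:\.\d+)?"

# "<range> parts of <name>", where the name ends before a comma,
# before " and <digit>", or before the final full stop
ITEM = re.compile(rf"({RANGE}) parts of ([A-Za-z ]+?)(?=,| and \d|\.)")


def restructure(sentence):
    # Rewrite the fixed introductory phrase (wording assumed from the sample)
    sentence = sentence.replace(
        ", comprising the following raw materials by weight:",
        " comprises (pts.wt.):",
    )
    # Move each range behind its item name and wrap it in parentheses
    return ITEM.sub(r"\2 (\1)", sentence)


text = (
    "An acid combination for increasing immunity, comprising the following "
    "raw materials by weight: 40-120 parts of isoleucine, 76.5-229.5 parts of "
    "lysine hydrochloride, 750-2250 parts of medlar and 250-750 parts of licorice."
)
print(restructure(text))
```

Sentences that deviate from the sample pattern (other units, item names with digits or hyphens) would need a looser pattern, so checking a few files by hand first is advisable.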
I'm on OS X, and in Objective-C I'm trying to convert,
for example,
"Bobateagreenapple"
into
"Bob ate a green apple"
Is there any way to do this efficiently? Would something involving a spell checker work?
EDIT: Just some extra information:
I'm attempting to build something that takes some misformatted text (for example, text copy-pasted from old PDFs that ends up without spaces, especially from internet archives like JSTOR). Since the misformatted text is probably going to be long... well, I'm just trying to figure out whether this is feasible before I attempt to write a system, only to find out it takes 2 hours to fix a paragraph of text.
One possibility, which I will describe in a non-OS-specific manner, is to search through all the possible words that can be formed from the sequence of letters.
Basically, you chop off the first letter of your letter sequence and add it to the word you are currently forming. If that makes a word (e.g. by dictionary lookup), you add it to the current sentence. If you manage to use up all the letters and form words out of all of them, you have a full sentence. But you don't have to stop there. Instead, you keep running, and eventually you will produce all possible sentences.
Pseudo-code would look something like this:
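FindWords(vector<Sentence> sentences, Sentence s, Word w, Letters l)
{
    // All letters used and no partial word pending: s is a complete sentence
    if (l.empty() and w.empty())
    {
        add s to sentences;
        return;
    }
    // Letters used up but the current word is unfinished: dead end
    if (l.empty())
        return;
    move first letter from l onto the end of w;
    if w in dictionary
    {
        add w to s;
        FindWords(sentences, s, empty word, l)
        remove w from s
    }
    // Also try extending w into a longer word
    FindWords(sentences, s, w, l)
    put last letter from w back onto the front of l
}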
There are, of course, a number of optimizations you could perform to make it faster, for instance checking whether the current word is a prefix of any word in the dictionary and abandoning the branch if it is not. But this is the basic approach that will give you all possible sentences.
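The search described above, including the prefix check, can be sketched in Python as follows. This is an illustrative version, not the answerer's code: the function name segment and the tiny word set are assumptions, and a real dictionary would be far larger. The worst case is still exponential in the length of the input.

```python
def segment(letters, dictionary):
    """Return every way to split `letters` into words from `dictionary`."""
    # All prefixes of all dictionary words, used to abandon hopeless branches
    prefixes = {word[:i] for word in dictionary for i in range(1, len(word) + 1)}
    results = []

    def search(start, words):
        if start == len(letters):
            results.append(list(words))  # every letter was used
            return
        for end in range(start + 1, len(letters) + 1):
            candidate = letters[start:end]
            if candidate not in prefixes:
                break  # no dictionary word starts with these letters
            if candidate in dictionary:
                words.append(candidate)
                search(end, words)
                words.pop()  # backtrack, then try a longer word

    search(0, [])
    return results


words = {"bob", "ate", "a", "tea", "green", "apple"}
for sentence in segment("bobateagreenapple", words):
    print(" ".join(sentence))
```

With this small word set the search finds both "bob ate a green apple" and "bob a tea green apple", which shows why a ranking step is needed on top of the enumeration.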
Solving this problem is much harder than anything you'll find in a framework. Notice that even in your example, there are other "solutions": "Bob a tea green apple," for one.
A very naive (and not very functional) approach might be to use a spell-checker to try to isolate one "real word" at a time in the string; of course, in this example, that would only work because "Bob" happens to be an English word.
This is not to say that there is no way to accomplish what you want, but the way you phrase this question indicates to me that it might be a lot more complicated than what you're expecting. Maybe someone can give you an acceptable solution, but I bet they'll need to know a lot more about what exactly you're trying to do.
Edit: in response to your edit, it would probably take less effort to run some kind of OCR tool on the PDF and correct its output than it would take just to correct what this system might give you, let alone to program it.
I implemented a solution; the code is available on CodeProject:
http://www.codeproject.com/Tips/704003/How-to-add-spaces-between-spaceless-strings
My idea was to prioritize results that use up most of the characters (preferably all of them), then favor the ones with the longest words, because words that are 2, 3 or 4 characters long can often come up by chance from leftover characters. Most of the time this provides the correct solution.
To find all possible permutations I used recursion. The code is quite fast even with big dictionaries (tested with 50 000 words).
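The ranking idea can be sketched as a sort key. The code below is not taken from the linked article; the scoring formula (characters covered first, then average word length) is one possible reading of the description, and the candidate lists are made up for illustration:

```python
def rank(candidates):
    """Order candidate splits: most characters covered, then longest words."""

    def score(words):
        covered = sum(len(word) for word in words)
        average_length = covered / max(len(words), 1)
        return (covered, average_length)

    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidates for the input "information"
candidates = [
    ["in", "form", "at", "ion"],
    ["format", "ion"],  # leaves two characters unused
    ["information"],
    ["inform", "at", "ion"],
]
for words in rank(candidates):
    print(" ".join(words))
```

Sorting on a tuple keeps the two criteria separate: a split that covers more characters always wins, and word length only breaks ties between splits with equal coverage.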