replace in word (wildcards) for multiple rows - vba

I want to make a search/replace macro in word which is on 2 or 3 rows, like this
"art. 2
2
pct. 22 din"
and convert it to this
"art. 2<sup>2</sup>
pct. 22 din"
instead of art. i can have other words too like lit., pct., alin. and the numbers are always different
i tried to use the next wildcard replace but it doesn't work:
search: "(art. )([0-9]{1;})(^13)([0-9]{1;})(^13)"
replace: "\1\2<sup>\3</sup>^p"
if i type only (art. )([0-9]{1;})(^13) at the search field it works, but if i type the rest it doesn't find anything

Probably a bit late to be helpful, but I think this will do the trick, if I odd the horrible Word syntax correctly (I used this guide to check).
Search: ([a-z]#. )([0-9]#)^13([0-9]#)^13
With your replace as before.
First line match:
([a-z]#. ) - Any lower case letter occurring 1 or more times, followed by a dot.
([0-9]#) - Any digit occurring one or more times.
^13 - Paragraph mark.
Second line match:
([0-9]#) - Any digit occurring one or more times.
^13 - Paragraph mark.
If you find yourself doing stuff like this a lot, it might be worth using something that supports more common regular expressions (e.g. Notepad++ or similar). You might find the syntax a little bit more readable, and it would have the added bonus of teaching you something you can apply across many other environments.

Related

How to replace some characters after a specific character to another specific character in one big sql line in notepad++

I have a big sql file with thousand user something like this:
('someone1#mydomain.com','{SSHA512}JWHCqHzazH2vGneLPfhMKkoAamzvxdNCWYOlhZ+uDx36jHdoMXwQmbEemvUMn7ZG6c9+22noXjjb2hAb99/5A/slscDJPKav','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone1/',0,'mydomain.com','','','normal','',0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2020-03-19 13:15:58','2015-08-03 06:11:53','2020-03-19 13:15:58','9999-12-31 00:00:00',1'someone1'),
('someone2#mydomain.com','{SSHA512}UoMeyocmdC2DxM0S7B4WFdjnCNuvkngzzLus33h9nugKVlvdhlcboKmMDDuAkCHEyLBUgf8DicKWFPJVS7EOF/ytv27MQ3Ch','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone2/',0,'mydomain.com','','','normal','',0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2015-12-17 12:27:35','2015-08-03 06:44:10','2021-06-08 06:55:33','9999-12-31 00:00:00',1'someone2'),
('someone3#mydomain.com','{SSHA512}A6ToCf4OfP3XNEU9ngEmGN/LDquH9+s9Qxme3SoJaDyVvxiWpnwwTiAALSdnmhIxDB2VQK0zhdF+jP8ARvh0N3IDL0Xv/KmL','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone3/',0,'mydomain.com','','','normal','',0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2018-04-03 12:31:09','2015-08-03 06:50:01','2018-04-03 12:31:18','9999-12-31 00:00:00',1'someone3'),
('someone4#mydomain.com','{SSHA512}t7/JbUPQ+rtKeRTgWRH6KlETr2JsqYORBOZouzOzs4Wo6YfHYLoy0m+U4kZXk+AeNgMep2hGZSodPZdK2l2bn9MhOKHOuF/L','','en_US','maildir','Maildir','/home/vmail','vmail1','mydomain.com/someone4/',0,'mydomain.com','','','normal',''0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,NULL,'1970-01-01 01:01:01',0,'',NULL,NULL,'2020-03-18 07:48:26','2016-11-14 06:59:04','2021-06-08 05:54:28',9999-12-31 00:00:00',1'someone4')
And now I need to delete the last word ('someone1' , 'someone2' , 'someone3' , 'someone4') for every user which adjoining to 1. It will be looks like
....9999-12-31 00:00:00',1)
not like in original
....9999-12-31 00:00:00',1'someone1')
....9999-12-31 00:00:00',1'someone2')
etc
But don't forget they are not in different lines. All this is in one big line and this makes me to ask you help. Thanks a lot.
It seems that (from your examples) the rows do not contain any parentheses except their start and end characters. So you can search for one quotation mark ', and a number of letters and/or digits, and one quotation mark ', and than ).
To do this;
Open Replace window in Notepad++ by using ctrl+h shortcut
From Search Mode section select Reqular expression
Write '[a-zA-Z0-9]*?[-,_,.]*?[a-zA-Z0-9]*?[-,_,.]*?[a-zA-Z0-9]*?[-,_,.]*?[a-zA-Z0-9]*?'\) to Find what box
Write '\) to Replace with box
Click Replace All button.
This works if user names consist of letters or digits and _, -, . at most 3 times.
Be Sure that you have a copy of original file as a backup. And also be aware of that the regular expression that we use may find unrelated parts if any row contains closing parentheses except end of it.

VBA replace certain carriage

All.
I am used to programming VBA in Excel, but am new to the structures in Word.
I am working through a library of text files to update them. Many of them are either OCR documents, or were manually entered.
Each has a recurring pattern, the most common of which is unnecessary carriage returns.
For example, I am looking at several text files where there is a double return after each line. A search and replace of all double carriage returns removes all paragraph distinctions.
However, each line is approximately 30 characters long, and if I manually perform the following logic, it gives me a functional document.
If there is a double carriage return after 30+ characters, I replace them with a space.
If there were less than 30 characters prior to the double return, I replace them with a single return.
Can anyone help me with some rudimentary code that would help me get started on that? I could then modify it for each "pattern" of text documents I have.
e.g.
In this case, there are more than
thirty characters per line. And I
will keep going to illustrate this
example.
This would be a new paragraph, and
would be separated by another of
the single returns.
I want code that would return:
In this case, there are more than thirty character returns. And I will keep going to illustrate this example.
This would be a new paragraph, and would be separated by another of the single returns.
Let me know if anyone can throw something out that I can play with!
You can do this without code (which RegEx requires), simply using Word's own wildcard Find/Replace tools, where:
Find = ([!^13]{30,})[^13]{1,}
Replace = \1^32
and, to clean up the residual multi-paragraph breaks:
Find = [^13]{2,}
Replace = ^p
You could, of course, record the above as a macro...
Here is a RegEx that might work for you:
(\n\n)(?<!\.(\n\n))
The substitution is just a plain space, you can try it out (and modify / tweak it) here: https://regex101.com/r/zG9GPw/4
This 'pattern' tells the RegEx engine to look for the newline character \n which occurs x2 like this \n\n (worth noting this is from your question and might be different in your files, e.g. could be \r\n) and it assumes that a valid line break will be proceeded by a full stop: \..
In RegEx the full stop symbol is a single character wild card so it needs to be escaped with the '\' (n and r are normal characters, escaping them tells the RegEx engine they represent newline and return characters).
So... the expression is looking for a group of x2 newline characters but then uses a negative look-behind to exclude any matches where the previous character was a full stop.
Anyway, it's all explained on the site:
Here is how you could do a RegEx find and replace using NotePad++ (I'm not sure if it comes with RegEx or if a plugin is needed, either way it is easy). But you can set a location, filters (to target specific file types), and other options (such as search in sub-directories).
Other than that, as #MacroPod pointed out you could also do this with MS Word, document by document, not using any code :)

Parser not recognizing a dash

My program makes calculations on physics vectors and it allows copy/pasting from websites and then tries to parse them into the x, y, and z components automatically. I've come across one website (http://mathinsight.org/cross_product_examples) that has (3,−3,1). While that looks normal, that minus is actually not recognized by VB. Visually, it is longer than the normal minus (− and -), but return the same Unicode of 45. This picture shows the Unicode for every character (I added a minus in front of the first 3 for comparison) in the Textbox. Also, from this website, I had to use Ctrl+c because right clicking shows that this is not simple HTML.
One is valid (the first), but the second gives VB fits as shown below. Either it won't compile (shown by the blue line below) or a simple assignment (the second one) wrecks havok on my form.
I have tried using
vectorString.Replace("–", "-")
and pasting in the longer dash for the target string and a normal keystroke dash as the replacement, but nothing happens. I'm guessing that since they both have the same Unicode.
Is there some way to convert the longer, invalid dash into the one recognized by VB? I tried using dash symbol that Word likes to replace the minus sign with and it comes up as Unicode 150. So, apparently there are at least three different kinds of dashes. Any thoughts?
The character from Math Insight is U+2212, minus sign. The character you tried using in your Replace call is U+2013, en dash. That's why your replace didn't work.
Beyond the standard ASCII hyphen (-, U+0045), there are two common dashes: the en dash (–, U+2013) and the em dash (—, U+2014). There is also a figure dash (‒, U+2012), but it is not as common.

Collect a word between two spaces in objective c

I'm trying to implement stuff similar to spell check, but I need to get the word that is limited by a space. EX: "HI HOW R U", I need to collect HI, HOW and so on as they type. i.e. After user hits HI and space I need to collect HI and do a spell check.
Check the documentation for NSString Here. You want the message componentsSepeparatedByString:.
I don't know objective-C, but I'm fairly sure it'll have a Regexp library - although it'd be straightforward to code it without one.
Regexp: \b([^\s])*\b
\b = word boundary (whitespace, comma, dot, exclamation-mark, etc.)
\s = whitespace character
[...] = character set
[^...] = negated character set (any character(s) EXCEPT ...)
() = grouping construct
* = zero or more times
So the suggested expression would start matching at any word boundary, then match every subsequent character that is not a whitespace character, then match a word boundary.
Your stated case is so simple you may just want to look for spaces (one char at a time) and get the substring, but RegExp is very widely used across a range of languages and platforms, and so it's fairly easy to find an expression when you need to - and one often does for common stuff like checking if zip codes, phone numbers, email addresses and so on are syntactically correct. So it's worth learning in any case. :)

Regex issue using ICU regex/regexkitlite

Starting a new question as my other question solved a different issue with the regex.
Here's my regex:
(?i)\\d{1,4}(?<!v(?:ol)?\\.?\\s?)(?![^\\(]*\\))
Regex split up for clarity:
(?i) - case insensitive
\\d{1,4} - a number with 1-4 digits
(?<!v(?:ol)?\\.?\\s?) the number cannot be preceded by 'v', 'v.', 'vol', 'vol.', with or without a space on the end.
(?![^\\(]*\\)) - Number cannot be inside parentheses.
It all works except for the 'vol.' bit.:
#"Words words 342 words (2342) (words 2 words) (words).ext" result 342 - correct.
#"Words - words words (2010) (words 2 words) (words).ext" result nil - correct.
#"words words v34 35.ext" result 34 - incorrect.
#"Words vol.342 343 (1234) (3 words) (desc).ext" result 342 - incorrect.
What am I doing wrong with my 'vol.' section?
You need to put the lookbehind before the number. Also, you need to add digits as illegal characters inside the lookbehind, or the 4 in v.34 will match. Try
(?i)(?<!v(?:ol)?\\.?\\s*\\d*)\\d{1,4}(?![^(]*\\))
This is expecting (edit: wrongly, as it turns out) that regexkitlite supports infinite repetition inside lookbehind which not many regex flavors do.
A look into the docs shows that it does support finite (but variable) repetition inside lookbehind, and if you are aware that the following will only work if there is at most one space between vol. and the number, then you could try
(?i)(?<!v(?:ol)?\\.?\\s?)(?<!\\d)\\d{1,4}(?![^(]*\\))