How to remove unknown line break (special character) in text file? - vb.net

I have a text file that shows an unknown line break in UltraEdit. If I replace this special character in the text file manually, everything works fine, but that means I have to change each file by hand before processing it.
Please let me know how to remove all occurrences of this character with VB.Net code.
If I replace ♀ in UltraEdit, it replaces the line break with my desired string. But in my VB string I cannot use this character or a line break.

The character you have in your file is the form-feed character usually used as control character for a page break.
In UltraEdit in Page Setup configuration dialog (a printing related dialog) there is the option Page break code which has by default the decimal value 12 (hexadecimal 0C) which is the form-feed character.
A page break can be displayed in UltraEdit with a horizontal line across the document window on enabling Show Page Breaks as Lines in menu/ribbon View.
The form-feed character can be removed in UltraEdit with searching for ^b on using a normal, non regular expression or an UltraEdit regular expression replace, or with searching for \f on using a Unix or Perl regular expression replace.
In VB.Net code ChrW(12) can be used to reference the form-feed control character as suggested already by Hans Passant.
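As a language-neutral illustration of the same idea, here is a minimal Python sketch that removes or replaces the form-feed character (code point 12). The function name remove_form_feeds and the sample string are made up for this example; in VB.Net the equivalent would pass ChrW(12) to String.Replace.

```python
# Form feed is code point 12 (hex 0C), the same value ChrW(12) yields in VB.NET.
FORM_FEED = chr(12)  # can also be written "\f"


def remove_form_feeds(text, replacement=""):
    """Return text with every form-feed character replaced (removed by default)."""
    return text.replace(FORM_FEED, replacement)


# Hypothetical sample: two "pages" separated by a form feed.
sample = "page one" + FORM_FEED + "page two"
print(repr(remove_form_feeds(sample, "\n")))
```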

Postgresql "+" symbol for carriage return / new line for zsh script

I have a query that pulls text from a column called "description."
Each description contains a list like so:
1) Some text here
2) Some text here
3) Some text here
The problem is when I run the query in my zsh script the new lines return a "+" symbol instead of a carriage return.
1) Some text here + 2) Some text here + 3) Some text here...
The code is
tempdescriptionday4=$(/Applications/Postgres.app/Contents/Versions/12/bin/psql -h 1.1.1.1 -p5555 -U myuser mytable -t -c "SELECT description FROM cycle_10 WHERE air_date = '2020-11-10'")
I'm trying to inject this list into an XML file for an RSS feed but I'm stuck on how to format this properly. I tried replacing the + symbol with a newline, but that didn't work: I still am not getting a new line. Any ideas?
Found this in the PostgreSQL 12 manual; the option is called linestyle:
"Sets the border line drawing style to one of ascii, old-ascii, or unicode. Unique abbreviations are allowed. (That would mean one letter is enough.) The default setting is ascii. This option only affects the aligned and wrapped output formats."
So for me, since I have multiple lines in a cell, it "wraps" the text at the end, making it a wrapped output format.
"ascii style uses plain ASCII characters. Newlines in data are shown using a + symbol in the right-hand margin. When the wrapped format wraps data from one line to the next without a newline character, a dot (.) is shown in the right-hand margin of the first line, and again in the left-hand margin of the following line."
There's also an "old-ascii" style that uses ":" in place of the left-hand column separator for newlines (and ";" for wrapped lines), and a "unicode" style that uses an ellipsis symbol in the right-hand margin of the first line when data is wrapped.
So the problem I have is that the output format is using ASCII by default when there's no available option for XML output style for new lines. Bummer.
Switching to unaligned output removes the + symbol, but I want the line breaks in the output to be expressed as XML, so if it strips any new line indication there's no way for me to format it. "-A --no-align Switches to unaligned output mode. (The default output mode is aligned.) This is equivalent to \pset format unaligned."
So I used awk to replace the + symbol with a newline, and also tried an alternative form, but neither creates new lines for me, even though the result validates as proper XML.
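If the aligned output has to be post-processed anyway, the trailing + markers can be stripped line by line. The following is only a rough sketch in Python under stated assumptions: the function name strip_newline_markers and the sample text are hypothetical, and the approach would also remove a genuine "+" that happens to end a line, so running psql with -A and -t (as quoted from the manual above) is likely the cleaner route.

```python
def strip_newline_markers(aligned_output):
    """Remove the '+' that psql's ascii linestyle appends to continued lines.

    Assumption: each continued line ends with '+' in the right-hand margin.
    A real '+' at the end of a data line would be removed as well.
    """
    cleaned = []
    for line in aligned_output.splitlines():
        line = line.rstrip()
        if line.endswith("+"):
            line = line[:-1]
        cleaned.append(line.strip())
    return "\n".join(cleaned)


# Hypothetical sample resembling `psql -t` aligned output for one cell.
sample = " 1) Some text here+\n 2) Some text here+\n 3) Some text here\n"
print(strip_newline_markers(sample))
```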

VBA replace certain carriage

All.
I am used to programming VBA in Excel, but am new to the structures in Word.
I am working through a library of text files to update them. Many of them are either OCR documents, or were manually entered.
Each has a recurring pattern, the most common of which is unnecessary carriage returns.
For example, I am looking at several text files where there is a double return after each line. A search and replace of all double carriage returns removes all paragraph distinctions.
However, each line is approximately 30 characters long, and if I manually perform the following logic, it gives me a functional document.
If there is a double carriage return after 30+ characters, I replace them with a space.
If there were less than 30 characters prior to the double return, I replace them with a single return.
Can anyone help me with some rudimentary code that would help me get started on that? I could then modify it for each "pattern" of text documents I have.
e.g.
In this case, there are more than
thirty characters per line. And I
will keep going to illustrate this
example.
This would be a new paragraph, and
would be separated by another of
the single returns.
I want code that would return:
In this case, there are more than thirty characters per line. And I will keep going to illustrate this example.
This would be a new paragraph, and would be separated by another of the single returns.
Let me know if anyone can throw something out that I can play with!
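The rule described above (a double return after 30 or more characters becomes a space, a double return after a shorter line becomes a single return) can be sketched outside of Word as follows. This is an illustrative Python sketch, not VBA; the function name unwrap_double_returns and the threshold parameter are assumptions made for the example, and it assumes lines are separated by exactly "\n\n".

```python
def unwrap_double_returns(text, threshold=30):
    """Join lines separated by double returns according to the 30-character rule."""
    lines = text.split("\n\n")
    pieces = []
    for index, line in enumerate(lines):
        pieces.append(line)
        if index < len(lines) - 1:
            # Long line: the break was only a wrap, so join with a space.
            # Short line: it ended a paragraph, so keep a single return.
            pieces.append(" " if len(line) >= threshold else "\n")
    return "".join(pieces)


sample = "\n\n".join([
    "In this case, there are more than",
    "thirty characters per line. And I",
    "will keep going to illustrate this",
    "example.",
    "This would be a new paragraph, and",
    "would be separated by another of",
    "the single returns.",
])
print(unwrap_double_returns(sample))
```

The threshold can be adjusted per document "pattern", since OCR output with a different line width would need a different cut-off.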
You can do this without code (which a RegEx approach would require), simply by using Word's own wildcard Find/Replace tools, where:
Find = ([!^13]{30,})[^13]{1,}
Replace = \1^32
and, to clean up the residual multi-paragraph breaks:
Find = [^13]{2,}
Replace = ^p
You could, of course, record the above as a macro...
Here is a RegEx that might work for you:
(\n\n)(?<!\.(\n\n))
The substitution is just a plain space, you can try it out (and modify / tweak it) here: https://regex101.com/r/zG9GPw/4
This 'pattern' tells the RegEx engine to look for the newline character \n occurring twice in a row, like this: \n\n (worth noting this is taken from your question and might be different in your files, e.g. it could be \r\n), and it assumes that a valid line break will be preceded by a full stop: \.
In RegEx the full stop symbol is a single character wild card so it needs to be escaped with the '\' (n and r are normal characters, escaping them tells the RegEx engine they represent newline and return characters).
So... the expression is looking for a group of x2 newline characters but then uses a negative look-behind to exclude any matches where the previous character was a full stop.
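To see the negative look-behind in action outside regex101, here is a small Python sketch that applies the same pattern with re.sub. The sample text is made up for the example, and it assumes the files use \n\n rather than \r\n\r\n.

```python
import re

# Same pattern as above: a double newline that is NOT preceded by a full stop.
PATTERN = re.compile(r"(\n\n)(?<!\.(\n\n))")

# Hypothetical sample: two wrapped lines, then a real paragraph break after ".".
sample = (
    "In this case, there are more than\n\n"
    "thirty characters per line. And I\n\n"
    "will keep going.\n\n"
    "New paragraph"
)

# Replace every matching double newline with a plain space.
result = PATTERN.sub(" ", sample)
print(result)
```

Note that this variant relies on punctuation rather than line length, so a wrapped line that happens to end in a full stop would be treated as a paragraph end.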
Anyway, it's all explained on the site.
You could also do a RegEx find and replace using Notepad++ (I'm not sure if it comes with RegEx or if a plugin is needed; either way it is easy). There you can set a location, filters (to target specific file types), and other options (such as searching in sub-directories).
Other than that, as @MacroPod pointed out, you could also do this with MS Word, document by document, without using any code :)

Parser not recognizing a dash

My program makes calculations on physics vectors. It allows copy/pasting from websites and then tries to parse the pasted text into the x, y, and z components automatically. I've come across one website (http://mathinsight.org/cross_product_examples) that has (3,−3,1). While that looks normal, that minus is actually not recognized by VB. Visually, it is longer than the normal minus (− and -), but both return the same Unicode value of 45. This picture shows the Unicode value for every character in the Textbox (I added a minus in front of the first 3 for comparison). Also, from this website, I had to use Ctrl+C because right-clicking shows that this is not simple HTML.
One is valid (the first), but the second gives VB fits as shown below. Either it won't compile (shown by the blue line below) or a simple assignment (the second one) wreaks havoc on my form.
I have tried using
vectorString.Replace("–", "-")
and pasting in the longer dash for the target string and a normal keystroke dash as the replacement, but nothing happens. I'm guessing that is because they both have the same Unicode value.
Is there some way to convert the longer, invalid dash into the one recognized by VB? I tried using the dash symbol that Word likes to replace the minus sign with, and it comes up as Unicode 150. So, apparently there are at least three different kinds of dashes. Any thoughts?
The character from Math Insight is U+2212, minus sign. The character you tried using in your Replace call is U+2013, en dash. That's why your replace didn't work.
Beyond the standard ASCII hyphen-minus (-, U+002D, decimal 45), there are two common dashes: the en dash (–, U+2013) and the em dash (—, U+2014). There is also a figure dash (‒, U+2012), but it is not as common.
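One way to handle all of these at once is to map every dash-like character to the ASCII hyphen-minus before parsing. The sketch below shows the idea in Python rather than VB; the names DASH_TABLE and normalize_dashes are made up for this example, and the list of code points covers only the characters mentioned above.

```python
# Map the dash-like characters discussed above to ASCII hyphen-minus (U+002D).
DASH_TABLE = str.maketrans({
    "\u2212": "-",  # minus sign (the character used on Math Insight)
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2012": "-",  # figure dash
})


def normalize_dashes(text):
    """Return text with the dash variants above replaced by '-'."""
    return text.translate(DASH_TABLE)


pasted = "(3,\u22123,1)"  # vector as copied from the web page
print(normalize_dashes(pasted))
print([hex(ord(ch)) for ch in pasted])  # shows 0x2212, not 0x2d, for the minus
```

Printing the code points of the pasted text, as in the last line, is also a quick way to find out which character a page actually uses.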

busybox httpd cgi doesn't print "return"

Please help, I can't find the solution
Situation: I have a busybox httpd server. In the cgi-bin folder is a CGI executable, which sends formatted text to the client using printf.
The problem is that the text should be laid out as a column, but the client receives only a single line, despite the fact that I use "\n" and "(char) 13" in printf.
In other words, the executable doesn't return the "return" symbol.
I wrote following
for (i = 0; i < 4; i++)
    printf("%9.8g%c\n", lTemp[i] * dTemp[i], (char) 13);
The text that is sent from your CGI program to the web client is treated as HTML text, not plain text.
When HTML is processed for display in the browser, newline and carriage return (what you simply call "return") characters are ignored.
To cause the displayed text to perform a line break, the HTML break tag, "<br />", should be inserted into the output string:
printf("%9.8g <br />\r\n", lTemp[i] * dTemp[i]);
The use of newlines and whitespace in the text that your CGI programs generates will have little bearing on the actual HTML page that gets displayed. Use newlines and whitespace to format the HTML so that the source is readable, and use HTML tags to control the displayed text in the client's browser.
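The same point can be illustrated in a few lines of Python: the "\r\n" only makes the HTML source readable, while the "<br />" tag is what produces the visible line break. The function name format_rows and the sample values are hypothetical; the format string mirrors the one in the C code above.

```python
def format_rows(values):
    """Format each value like the C printf above, ending every row with <br />."""
    # "<br />" breaks the line in the browser; "\r\n" only tidies the HTML source.
    return "".join("%9.8g <br />\r\n" % value for value in values)


# Hypothetical values standing in for lTemp[i] * dTemp[i].
html_fragment = format_rows([1.5, 2.0, 3.25, 4.0])
print(html_fragment)
```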
BTW
Using a numeric constant and a character conversion in a printf is not the preferred method of outputting a carriage-return character.
Use the defined escape sequence \r in the format.

Searching for backslash character in vim

How do I search for a word that starts with a backslash, such as \word, in vim? I can do it using the find menu. Is there a shortcut for this?
Try:
/\\word
in command mode.
You can search for most anything in your document using regular expressions. From normal mode, type '/' and then start typing your regular expression, and then press enter. '\<' would match the beginning of a word, so
/\<foo
would match the string 'foo' but only where it is at the beginning of a word (preceded by whitespace in most cases).
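You can search for the backslash character by escaping it with a backslash, so:
/\\foo
would find the pattern '\foo'. Note that \< only matches directly before a keyword character, and the backslash is not one under the default 'iskeyword' setting, so /\<\\foo would not match; use /\\\<foo to require that 'foo' starts a word right after the backslash.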
Not directly relevant (/\\word is the correct solution, and nothing here changes that), but for your information:
:h magic
If you are searching for a pattern with many characters that have special meaning in regexes, you may find "nomagic" and "very nomagic" mode useful.
/\V^.$
will search for the literal string ^.$, instead of "lines of exactly one character" (\v "very magic" and the default \m "magic" modes) or "lines of exactly one period" (\M "nomagic" mode).
The reason searching for something that includes "\" is different is that "\" is a special character and needs to be escaped (prepended with a backslash).
Similarly, to search for "$100", which includes the special character "$":
Press /
Type \$100
Press return
To search for "abc", which doesn't include a special character:
Press /
Type abc
Press return
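The escaping principle is not specific to vim; most regular-expression engines treat "\" and "$" as special. As a rough illustration in Python (not vim script), re.escape adds the needed backslashes automatically. The sample strings are made up for this example.

```python
import re

# re.escape prepends a backslash to characters that are special in a regex.
backslash_pattern = re.escape("\\word")  # the literal text \word
dollar_pattern = re.escape("$100")

print(backslash_pattern)
print(dollar_pattern)

# Escaped patterns match the literal text.
found_backslash = re.search(backslash_pattern, "see \\word here") is not None
found_dollar = re.search(dollar_pattern, "pay $100 now") is not None

# Unescaped, "$" means end-of-line, so "$100" never matches the literal text.
found_unescaped = re.search("$100", "pay $100 now") is not None
print(found_backslash, found_dollar, found_unescaped)
```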