Deleting blank lines ONLY if they are followed by a non-blank line... multiple blank lines in a row should not be deleted

I have a txt file that is basically in address form, like so:
John Smith
123 Address Way
Blah Blah Blah
Each block of text is followed by 3 blank lines (which I want). However, some of the addresses in the file are missing data, thus they are blank like so:
John Smith
123 Address Way
Blah Blah Blah
I want to keep the three blank lines after each block of data and delete only the single blank lines.
Does anybody have any ideas? Everything I've found on Google relates to deleting multiple blank lines, or all blank lines... the opposite of what I need.

When you have one of these problems, and the file is not gigantic, one of the best tools for the job is perl in undef $/ mode, which makes it read the entire file as one big string; this allows you to match \n just like any other character.
At the character level, assuming there is no trailing horizontal whitespace on any line, a blank line is two newline characters in a row; two blank lines is three newline characters, and so on. To delete a blank line, you delete one of the two newline characters. Now, if you just write s/\n\n/\n/g, that will do more than you want, because \n\n will match pairs of newlines within longer runs of newlines. So you need a construct that will match two newlines in a row but only if they are not preceded or followed by more newlines. This is what look-around assertions are for.
perl -pe 'BEGIN { undef $/ } s/[ \t]+$//mg; s/(?<!\n)\n\n(?!\n)/\n/sg'
should do the job. The first substitution has the side effect of deleting trailing spaces and tabs, if any, from every line of the file, so that whitespace-only lines count as blank. It deliberately uses [ \t] rather than \s, because \s also matches \n and would swallow the runs of newlines you want to keep. If you want to delete double blank lines as well as single blank lines (but still not triple blank lines), you just have to adjust the middle of the second RE:
perl -pe 'BEGIN { undef $/ } s/[ \t]+$//mg; s/(?<!\n)\n{2,3}(?!\n)/\n/sg'
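For comparison, here is a minimal Python sketch of the same look-around idea. The function name and the sample text are made up for illustration, and it assumes the whole file fits in memory, just like the perl one-liner.

```python
import re

def drop_single_blank_lines(text: str) -> str:
    """Remove isolated blank lines; keep runs of two or more blank lines."""
    # Strip trailing spaces/tabs so whitespace-only lines become truly empty.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Two newlines with no newline on either side mean exactly one blank line.
    return re.sub(r"(?<!\n)\n\n(?!\n)", "\n", text)

# Hypothetical sample: one stray blank line inside a block, three after it.
sample = "John Smith\n\n123 Address Way\nBlah Blah Blah\n\n\n\nJane Doe\n"
print(drop_single_blank_lines(sample))
```

The look-behind and look-ahead do not consume characters, so the neighbouring text is left untouched and only one of the two newlines is removed.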

Related

Getting specific rows in a Powershell variable/array

I hope I'm able to ask my question as simply as possible. I am very new to working with PowerShell.
Now to my question:
I use Invoke-Sqlcmd to run a query, which puts Data in a variable, let's say $Data.
In this case I query for triggers in an SQL Database.
Then I kind of split the array to get more specific information:
$Data2 = $Data | Where {$_.table -like 'dbo.sportswear'}
$Data3 = $Data2 | Where {$_.event -match "Delete"}
So in the end I have a variable with these Indexes(?), I'm not sure if they are called indexes.
table
trigger_name
activation
event
type
status
definition
Now all I want is to check something in the definition.
So I create a $Data4 = $Data3.definition, so far so good.
But now I have a big text and I want only the content of 2-3 specific rows.
When I used something like $Data4[1] or $Data4[1..100], I realized that PowerShell treats every character as a line/row.
But when I just write $Data4, it shows me the content nicely formatted with paragraphs, new lines and so on.
Does anyone have an idea how I can get specific rows or lines of my variable?
Thank you all :)
It appears $Data4 is a formatted string. Since it is a single string, any indexed element lookups return single characters (of type System.Char). If you want indexes to return longer substrings, you will need to split your string into multiple strings somehow or come up with a more sophisticated search mechanism.
If we assume the rows you are after are actual lines separated by line feed and/or carriage return, you can just split on those newline characters and use indexes to access your lines:
# Array indexing starts at 0 for line 1. So [1] is line 2.
# Outputs lines 2,3,4
($Data4 -split '\r?\n')[1..3]
# Outputs lines 2,7,20
($Data4 -split '\r?\n')[1,6,19]
-split uses regex to match characters and perform a string split on all matches. It results in an array of substrings. \r matches a carriage return. \n matches a line feed. ? makes the preceding element (here, \r) optional, i.e. it matches zero or one occurrence, which is needed in case there are no carriage returns preceding your line feeds.
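The same split-then-index idea, sketched in Python for comparison. The trigger definition text and the helper name are invented for the example; only the \r?\n pattern comes from the answer above.

```python
import re
from typing import List, Sequence

def pick_lines(text: str, indexes: Sequence[int]) -> List[str]:
    """Split text on LF or CRLF and return the lines at the given 0-based indexes."""
    lines = re.split(r"\r?\n", text)
    return [lines[i] for i in indexes]

# Hypothetical trigger definition with mixed line endings.
definition = "CREATE TRIGGER trg_delete\r\nON dbo.sportswear\nAFTER DELETE\r\nAS BEGIN"

print(definition[1])                  # indexing the raw string gives one character
print(pick_lines(definition, [1, 2]))  # indexing after the split gives whole lines
```

As in the PowerShell version, index 0 is line 1, so [1, 2] selects lines 2 and 3.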

CMake multiline message with FATAL_ERROR

CMake documentation (for example current version 3.11.2) states
CMake Warning and Error message text displays using a simple markup language. Non-indented text is formatted in line-wrapped paragraphs delimited by newlines. Indented text is considered pre-formatted.
However, it doesn't mention any markup format. Unless the "non-indented" vs. "indented" is all there is about the "simple markup".
Anyway, I failed to make it work with FATAL_ERROR mode.
Furthermore, I noticed that with STATUS mode the message is printed with a leading "-- " (two dashes and a space), while with FATAL_ERROR every line break in the message is turned into two lines, which (IMHO) looks awful.
Now I have a multiline message which lists what is wrong in CMAKE_BUILD_TYPE and what values are accepted. Because of above-mentioned issues, I ended up printing the message as STATUS and indenting subsequent lines with three spaces (so they align well with the --). Then I do a simple FATAL_ERROR repeating only the "title line" (stating that CMAKE_BUILD_TYPE is wrong). This looks acceptable on both console output and cmake-gui. (Although the 3 spaces indentation is needless on cmake-gui...)
However, I'm surprised how poorly this topic is documented. It seems to have been this way for a long time: see, for example, the question [CMake] Extra blank lines with SEND_ERROR and FATAL_ERROR ?!, which has remained unanswered for almost 9 years now...
Are there any good practices, advice or tips for handling such messages? Or should they be avoided in the first place?
You're right. The "simple markup" is either non-indented (unformatted) or indented (formatted). Also, the non-indented text is in paragraphs delimited by newlines. That's why you end up with blank lines in between paragraphs.
Here's a running explanation of the various kinds of messages. Warning types and error types behave the same as far as formatted vs. unformatted text goes. The difference, of course, is what happens to the processing and generation phases of CMake. For readability, you can split strings into multiple double-quoted pieces that will be concatenated.
# STATUS
message(STATUS
"This is a status message. It is prefixed with \"-- \". It goes to stdout. The "
"lines will wrap according to the width of your terminal.\n"
"New lines will begin a new line at column 1, but without the \"-- \" prefix, "
"unless you provide it; they will not create a blank line (i.e., new "
"paragraph). Spacing between sentences is unaltered by CMake.\n"
"-- Here's a new paragraph with an explicit \"-- \" prefix added.")
# no mode given (informational)
message(
"This is an informational message. It goes to stderr. Each line begins at column "
"1. The lines will wrap according to the width of your terminal.\n"
"New lines will begin a new line at column 1; they will not create a blank line "
"(i.e., new paragraph). Spacing between sentences is unaltered by CMake (3 spaces "
"preceded this sentence.).")
# WARNING--unformatted
message(WARNING
"This is an unformatted warning message. It goes to stderr. Each line begins "
"at column 3. The lines will wrap at a particular column (it appears to be "
"column 77, set within CMake) and wrap back to column 3.\n"
"New lines will begin a new paragraph, so they will create a blank line. A final "
"thing about unformatted messages: They will separate sentences with 2 spaces, "
"even if your string had something different.")
# WARNING--formatted and unformatted
message(WARNING
" This is a formatted warning message. It goes to stderr. Formatted lines will"
" be indented an additional 2 spaces beyond what was provided in the output"
" string. The lines will wrap according to the width of your terminal.\n"
" Indented new lines will begin a new line. They will not create a blank line."
" If you separate sentences with 1 space, that's what you'll get. If you"
" separate them with 2 spaces, that's also what you'll get.\n"
" If you want to control the width of the formatted paragraphs\n"
" (a good practice), just keep track of the width of each line and place\n"
" a \"\\n\" at the end of each line.\n \n"
" And, if you want a blank line between paragraphs, just place \"\\n \\n\"\n"
" (i.e., 2 newlines separated by a space) at the end of the first paragraph.\n"
"Non-indented new lines, however, will be treated like unformatted warning "
"messages, described above. They will begin at and wrap to column 3. They begin "
"a new paragraph, so they will create a blank line. There will be 2 spaces "
"between sentences, regardless of how many you placed after the period (In the "
"script, there were 4 spaces before this sentence).\n"
"And, as you'd expect, a second unindented paragraph will be preceded by a "
"blank line. But why would you mix formatted and unformatted text?")
I saved this into MessageScript.cmake and invoked it with cmake -P MessageScript.cmake 2> output.txt. It results in the following stdout:
-- This is a status message. It is prefixed with "-- ". It goes to stdout. The lines will wrap according to the width of your terminal.
New lines will begin a new line at column 1, but without the "-- " prefix, unless you provide it; they will not create a blank line (i.e., new paragraph). Spacing between sentences is unaltered by CMake.
-- Here's a new paragraph with an explicit "-- " prefix added.
The file, output.txt, contains:
This is an informational message. It goes to stderr. Each line begins at column 1. The lines will wrap according to the width of your terminal.
New lines will begin a new line at column 1; they will not create a blank line (i.e., new paragraph). Spacing between sentences is unaltered by CMake (3 spaces preceded this sentence.).
CMake Warning at MessageScript.cmake:19 (message):
This is an unformatted warning message. It goes to stderr. Each line
begins at column 3. The lines will wrap at a particular column (it appears
to be column 77, set within CMake) and wrap back to column 3.
New lines will begin a new paragraph, so they will create a blank line. A
final thing about unformatted messages: They will separate sentences with 2
spaces, even if your string had something different.
CMake Warning at MessageScript.cmake:28 (message):
This is a formatted warning message. It goes to stderr. Formatted lines will be indented an additional 2 spaces beyond what was provided in the output string. The lines will wrap according to the width of your terminal.
Indented new lines will begin a new line. They will not create a blank line. If you separate sentences with 1 space, that's what you'll get. If you separate them with 2 spaces, that's also what you'll get.
If you want to control the width of the formatted paragraphs
(a good practice), just keep track of the width of each line and place
a "\n" at the end of each line.
And, if you want a blank line between paragraphs, just place "\n \n"
(i.e., 2 newlines separated by a space) at the end of the first paragraph.
Non-indented new lines, however, will be treated like unformatted warning
messages, described above. They will begin at and wrap to column 3. They
begin a new paragraph, so they will create a blank line. There will be 2
spaces between sentences, regardless of how many you placed after the
period (In the script, there were 4 spaces before this sentence).
And, as you'd expect, a second unindented paragraph will be preceded by a
blank line. But why would you mix formatted and unformatted text?
SUMMARY
INFORMATIONAL MESSAGES (no mode given)
start at column 1
wrap in terminal window until newline
go to stderr
new paragraphs begin without preceding blank line
sentence and word spacing preserved
STATUS MESSAGES
start at column 1, with "-- " prefix on first paragraph
wrap in terminal window until newline
go to stdout
new paragraphs begin without preceding blank line
sentence and word spacing preserved
UNFORMATTED WARNING AND ERROR MESSAGES (unindented strings)
start at column 3
wrap at column 77
go to stderr
new paragraphs are preceded by a blank line
sentences separated by 2 spaces; words by 1 space
FORMATTED WARNING AND ERROR MESSAGES (indented strings)
start at column 3, plus whatever indentation the string had
wrap in terminal window until newline
go to stderr
new paragraphs begin without preceding blank line
sentence and word spacing preserved
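To make the "unformatted" rules concrete, here is a rough Python model of the behavior summarized above. It is not CMake's actual algorithm, only a sketch built on the standard textwrap module that reproduces the observed effects: a 2-space indent, wrapping at column 77, a blank line between paragraphs, and 2 spaces between sentences. The function name and defaults are assumptions made for the example.

```python
import textwrap

def render_unformatted(message: str, width: int = 77, indent: str = "  ") -> str:
    """Approximate how an unindented WARNING/ERROR message appears to be laid out."""
    paragraphs = []
    for paragraph in message.split("\n"):
        # Collapse runs of whitespace so sentence spacing can be re-applied uniformly.
        normalized = " ".join(paragraph.split())
        paragraphs.append(textwrap.fill(
            normalized,
            width=width,
            initial_indent=indent,
            subsequent_indent=indent,
            fix_sentence_endings=True,  # two spaces after sentence-ending punctuation
        ))
    # Every newline in the input starts a new paragraph, separated by a blank line.
    return "\n\n".join(paragraphs)

print(render_unformatted("First sentence. Second    sentence here.\nNew paragraph."))
```

Seen this way, the "extra blank lines" in FATAL_ERROR output are just paragraph breaks; indenting the text (the formatted case) is what switches this treatment off.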

Need solution for break line issue in string

I have the string below, in which newline characters appear at random positions. Fields are separated by ~$~ and each record ends with ##&.
Please help me merge the broken lines into one.
In the string below, the newline occurs in the address field (4/79A).
-------String----------
23510053~$~ABC~$~4313708~$~19072017~$~XYZ~$~CHINNUSAMY~$~~$~R~$~~$~~$~~$~42~$~~$~~$~~$~~$~28022017~$~
4/79A PQR Marg, Mumbai 4000001~$~TN~$~637301~$~Owns~$~RAT~$~31102015~$~12345~$~##&
Thanks in advance.
Rupesh
Seems to be a (more or less) duplicate of https://stackoverflow.com/a/802439/3595749
Note that you should ask your client to remove the CRLF characters at the source (rather than applying the code below).
Nevertheless, try this:
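tr -d '\n' < inputfile | sed 's/##&/##\&\n/g' > outputfile
Explanation:
tr removes every newline (line feed), joining the whole file into one line,
sed adds a newline back, but only where ##& is encountered. s/##&/##\&\n/g substitutes "##&" with "##&\n" (the "&" must be escaped in the replacement, where it would otherwise stand for the whole match). This applies globally (the "g" flag at the end). Note that \n in the replacement is understood by GNU sed; other sed implementations may need a literal newline instead.
Note: if the file comes from Windows, the line breaks are "\r\n" rather than "\n", so use tr -d '\r\n' to remove the carriage returns as well.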
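The same two-step approach (drop every line break, then re-insert one after each record terminator) can be sketched in Python. The function name is hypothetical, and the sketch assumes ##& never occurs inside a field.

```python
def merge_broken_records(raw: str, terminator: str = "##&") -> str:
    """Join lines that were split mid-record; end each record with one newline."""
    # Step 1: remove every CR and LF, whatever the platform's line ending was.
    flat = raw.replace("\r", "").replace("\n", "")
    # Step 2: put a newline back after each record terminator only.
    return flat.replace(terminator, terminator + "\n")

# Shortened, made-up records in the same ~$~ / ##& layout as the question.
broken = "23510053~$~ABC~$~28022017~$~\n4/79A PQR Marg~$~TN~$~##&\n99~$~XYZ~$~##&\n"
print(merge_broken_records(broken), end="")
```

Because the terminator is a plain string here, nothing needs escaping, unlike the & in the sed replacement.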

Write multiple lines to text file with '\n'

I have a program that iterates over all lines of a text file, adds spaces between the characters, and writes the output to the same file. However, if there are multiple lines in the input, I want the output to have separate lines as well. I tried:
let text = format!(r"{}\n", line); // Add newline character to each line (while iterating)
file.write_all(text.as_bytes()); // Write each line + newline
Here is an example input text file:
foo
bar
baz
And its output:
f o o\n b a r\n b a z
It seems that Rust treats "\n" as an escaped n character, but using r"\n" treats it as a string. How can I have Rust treat \n as a newline character to write multiple lines to a text file?
Note: I can include the rest of my code if you need it, let me know.
Edit: I am on Windows 7 64 bit
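The problem is the 'r' in front of your string: r"..." is a raw string literal, in which \n is kept as a backslash followed by the letter n rather than being turned into a newline. Remove the r and your program will write newlines instead of '\n'.
Also note that '\n' is the line ending on Unix-like systems only; Windows uses "\r\n".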
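Python happens to use the same r"..." notation for raw strings, so the difference is easy to see in a few lines there. This is only an illustration of the raw-versus-escaped distinction; the fix for the Rust program is still just to drop the r.

```python
escaped = "f o o\n"   # \n is processed: the string ends with one newline character
raw = r"f o o\n"      # raw literal: the string ends with a backslash and the letter n

print(len(escaped))   # 5 visible characters plus 1 newline
print(len(raw))       # 5 visible characters plus 2 literal characters
print(repr(raw[-2:]))  # shows the trailing backslash and n
```

Writing the raw version to a file produces the literal text \n on one line, which is exactly the output shown in the question.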

Fortran read statement reading beyond an end of line

Do you know if the following statement is guaranteed to be true by one of the Fortran 90/95/2003 standards?
"Suppose a read statement for a character variable is given a blank line (i.e., containing only white spaces and new line characters). If the format specifier is an asterisk (*), it continues to read the subsequent lines until a non-blank line is found. If the format specifier is '(A)', a blank string is substituted to the character variable."
For example, please look at the following minimal program and input file.
program code:
PROGRAM chk_read
INTEGER, PARAMETER :: MAXLEN=30
CHARACTER(len=MAXLEN) :: str1, str2
str1='minomonta'
read(*,*) str1
write(*,'(3A)') 'str1_start|', str1, '|str1_end'
str2='minomonta'
read(*,'(A)') str2
write(*,'(3A)') 'str2_start|', str2, '|str2_end'
END PROGRAM chk_read
input file:
----'input.dat' content is below this line----
yamanakako
kawaguchiko
----'input.dat' content is above this line----
Please note that there are four lines in 'input.dat' and the first and third lines are blank (contain only white spaces and new line characters). If I run the program as
$ ../chk_read < input.dat > output.dat
I get the following output
----'output.dat' content is below this line----
str1_start|yamanakako |str1_end
str2_start| |str2_end
----'output.dat' content is above this line----
The first read statement for the variable 'str1' seems to look at the first line of 'input.dat', find a blank line, move on to the second line, find the character value 'yamanakako', and store it in 'str1'.
In contrast, the second read statement for the variable 'str2' seems to be given the third line, which is blank, and store the blank line in 'str2', without moving on to the fourth line.
I tried compiling the program by Intel Fortran (ifort 12.0.4) and GNU Fortran (gfortran 4.5.0) and got the same result.
A little bit about a background of asking this question: I am writing a subroutine to read a data file that uses a blank line as a separator of data blocks. I want to make sure that the blank line, and only the blank line, is thrown away while reading the data. I also need to make it standard conforming and portable.
Thanks for your help.
From Fortran 2008 standard draft:
List-directed input/output allows data editing according to the type
of the list item instead of by a format specification. It also allows
data to be free-field, that is, separated by commas (or semicolons) or
blanks.
Then:
The characters in one or more list-directed records constitute a
sequence of values and value separators. The end of a record has the
same effect as a blank character, unless it is within a character
constant. Any sequence of two or more consecutive blanks is treated as
a single blank, unless it is within a character constant.
This implicitly states that in list-directed input, blank lines are treated as blanks until the next non-blank value.
With the fmt='(A)' format, a blank line is read into str as a blank string. With fmt=*, which selects list-directed I/O, blank lines are skipped until a non-blank character value is found. To test this, do something like:
PROGRAM chk_read
INTEGER :: cnt
INTEGER, PARAMETER :: MAXLEN=30
CHARACTER(len=MAXLEN) :: str
cnt=1
do
read(*,fmt='(A)',end=100)str
write(*,'(I1,3A)')cnt,' str_start|', str, '|str_end'
cnt=cnt+1
enddo
100 continue
END PROGRAM chk_read
$ cat input.dat
yamanakako
kawaguchiko
EOF
Running the program gives this output:
$ a.out < input.dat
1 str_start| |str_end
2 str_start| |str_end
3 str_start| |str_end
4 str_start|yamanakako |str_end
5 str_start| |str_end
6 str_start|kawaguchiko |str_end
On the other hand, if you use list-directed input:
read(*,fmt=*,end=100)str
You end up with this output:
$ a.out < input.dat
1 str_start|yamanakako |str_end
2 str_start|kawaguchiko |str_end
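The difference between the two read modes can be modelled with a small Python sketch. This is a simplification for illustration, not Fortran semantics in full: it treats only whitespace as a value separator (list-directed input also recognizes commas, slashes and quoted strings), it does not pad values to the declared length, and the function names are invented.

```python
from typing import Iterator, Optional

def read_a_format(records: Iterator[str]) -> Optional[str]:
    """Model of read(*,'(A)'): consume exactly one record, blank or not."""
    return next(records, None)

def read_list_directed(records: Iterator[str]) -> Optional[str]:
    """Model of read(*,*) for one character item: skip records with no value."""
    for record in records:
        fields = record.split()
        if fields:
            # The rest of the record is discarded, as after a Fortran read.
            return fields[0]
    return None

# Same layout as input.dat in the question: blank, value, blank, value.
records = iter(["   ", "yamanakako", "", "kawaguchiko"])
str1 = read_list_directed(records)  # skips the first (blank) record
str2 = read_a_format(records)       # takes the third record as it is
print(repr(str1), repr(str2))
```

For the data-block use case in the question, this suggests reading with '(A)' and testing for a blank string explicitly, since list-directed input gives no way to see the separator line.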
This part of the F2008 standard draft probably covers your question:
10.10.3 List-directed input
7 When the next effective item is of type character, the input form
consists of a possibly delimited sequence of zero or more
rep-chars whose kind type parameter is implied by the kind of the
effective item. Character sequences may be continued from the end of
one record to the beginning of the next record, but the end of record
shall not occur between a doubled apostrophe in an
apostrophe-delimited character sequence, nor between a doubled quote
in a quote-delimited character sequence. The end of the record does
not cause a blank or any other character to become part of the
character sequence. The character sequence may be continued on as many
records as needed. The characters blank, comma, semicolon, and slash
may appear in default, ASCII, or ISO 10646 character sequences.