Removing blank lines - awk

I have a csv file in which every other line is blank. I have tried everything, but nothing removes the lines. What should make it easier is that the digits 44 appear in each valid line. Things I have tried:
grep -ir 44 file.csv
sed '/^$/d' <file.csv
cat -A file.csv
sed 's/^ *//; s/ *$//; /^$/d' <file.csv
egrep -v "^$" file.csv
awk 'NF' file.csv
grep '\S' file.csv
sed 's/^ *//; s/ *$//; /^$/d; /^\s*$/d' <file.csv
cat file.csv | tr -s \n
Decided I was imagining the blank lines, but import into Google Sheets and there they are still! Starting to question my sanity! Can anyone help?

sed -n -i '/44/p' file
-n suppresses automatic printing
-i edits in place (overwrites the same file)
/44/p prints lines that contain '44'
If '44' were not present in every valid line, delete the blank lines instead:
sed -i '/^\s*$/d' file
\s is matching whitespace, ^startofline, $endofline, d delete line

Use the -i option to replace the original file with the edited one.
sed -i '/^[ \t]*$/d' file.csv
Alternatively, output to another file and rename it, which is exactly what -i does.
sed '/^[[:blank:]]*$/d' file.csv > file.csv.out && mv file.csv.out file.csv

Given:
$ cat bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
You can remove blank lines with Perl:
$ perl -lne 'print unless /^\s*$/' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
awk:
$ awk 'NF>0' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
sed + tr:
$ cat bl.txt | tr '\t' ' ' | sed '/^ *$/d'
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
Just sed:
$ sed '/^[[:space:]]*$/d' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3

Aside from the fact that your commands do not show that you capture their output in a new file to be used in place of the original, there's nothing wrong with them, EXCEPT that:
cat file.csv | tr -s \n
should be:
cat file.csv | tr -s '\n' # more efficient alternative: tr -s '\n' < file.csv
Otherwise, the shell eats the \ and all that tr sees is n.
Note, however, that the above eliminates only truly empty lines, whereas some of your other commands also eliminate blank lines (empty or all-whitespace).
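The difference between empty and blank lines can be seen in a small sketch. The sample input is made up for illustration (one empty line and one line holding a single space), and the function name sample is hypothetical:

```shell
# Hypothetical sample: "a", an empty line, a line with one space, "b".
sample() { printf 'a\n\n \nb\n'; }

# tr -s squeezes repeated newlines, so only the truly empty line goes away;
# the line containing a single space survives.
sample | tr -s '\n'

# Keeping only lines with at least one non-whitespace character
# removes both the empty line and the all-whitespace line.
sample | grep '[^[:space:]]'
```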
Also, the -i (for case-insensitive matching) in grep -ir 44 file.csv is pointless, and while using -r (for recursive searches) will not change the fact that only file.csv is searched, it will prepend the filename followed by : to each matching line.
If you have indeed captured the output in a new file and that file truly still has blank lines, the cat -A (cat -et on BSD-like platforms) you already mention in your question should show you if any unusual characters are present in the file, in the form of ^<char> sequences, such as ^M for \r chars.
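If cat -A does show ^M on the seemingly blank lines, they are not empty: each holds a carriage return, which a pattern such as ^$ does not match. A minimal sketch, assuming Windows-style CRLF line endings are the cause (the question does not confirm this); the sample data and the function name strip_cr_blank are made up:

```shell
# Hypothetical helper: delete carriage returns, then drop lines that
# have no fields left (empty or all-whitespace).
strip_cr_blank() { tr -d '\r' | awk 'NF'; }

# Made-up sample: two data lines separated by a line holding only \r.
printf '1,44,a\r\n\r\n2,44,b\r\n' | strip_cr_blank
```

To apply it to a real file, redirect as usual: strip_cr_blank < file.csv > file.csv.out && mv file.csv.out file.csv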

If you like awk, this should do:
awk '/44/' file
It will only print lines that contain 44.

Related

Convert multiple lines to a line separated by brackets and "|"

I have the following data in multiple lines:
1
2
3
4
5
6
7
8
9
10
I want to convert them to lines separated by "|" and "()":
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|10
I made a mistake, sorry. I want to convert them to lines separated by "|" and "()":
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
What I have tried is:
seq 10 | sed -r 's/(.*)/(\1)/'|paste -sd"|"
What's the best unix one-liner to do that?
This might work for you (GNU sed):
sed 's/.*/(&)/;H;1h;$!d;x;s/\n/|/g' file
Surround each line by parens.
Append all lines to the hold space except for the first line which replaces the hold space.
Delete all lines except the last.
On the last line, swap to the hold space and replace all newlines by |'s.
N.B. When a line is deleted no further commands are invoked and the command cycle begins again. That is why the last two commands are only executed on the last line of the file.
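The steps above can be traced on a three-line input. This is only an illustrative run of the same command wrapped in a function; the name join_parens and the input are made up:

```shell
# Wrap each line in parentheses, collect the lines in the hold space,
# and on the last line join them with "|".
join_parens() { sed 's/.*/(&)/;H;1h;$!d;x;s/\n/|/g'; }

printf '%s\n' 1 2 3 | join_parens   # with GNU sed: (1)|(2)|(3)
```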
Alternative:
sed -z 's/\n$//;s/.*/(&)/mg;y/\n/|/' file
With your shown samples please try following awk code. This should work in any version of awk.
awk -v OFS="|" '{val=(val?val OFS:"") "("$0")"} END{print val}' Input_file
Using GNU sed
$ sed -Ez ':a;s/([0-9]+)\n/(\1)|/;ta;s/\|$/\n/' input_file
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
Here is another simple awk command:
awk 'NR>1 {printf "%s|", p} {p="(" $0 ")"} END {print p}' file
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
Here it is:
sed -z 's/^/(/;s/\n/)|(/g;s/|($//' your_input
where -z allows you to treat the whole file as a single string with embedded \ns.
In detail, the sed script above consists of 3 commands separated by ;s:
s/^/(/ inserts a ( at the beginning of the whole file,
s/\n/)|(/g changes every \n to )|(;
s/|($// removes the trailing |( resulting from the \n at EOF, that is likely in your file since you are on linux.
With perl:
$ seq 10 | perl -pe 's/.*/($&)/; s/\n/|/ if !eof'
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
s/.*/($&)/ to surround input lines with ()
s/\n/|/ if !eof will change newline to | except for the last input line.
Here's a solution with paste (just for fun):
$ seq 10 | paste -d'()' /dev/null - /dev/null | paste -sd'|'
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
Using any awk:
$ seq 10 | awk '{printf "%s(%s)", sep, $0; sep="|"} END{print ""}'
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)

How to add N blank lines between all rows of a text file?

I have a file that looks
a
b
c
d
Suppose I want to add N lines (in the example 3, but I actually need 20 or 100 depending on the file)
a
b
c
d
I can add one blank line between all of them with sed
sed -i '0~1 a\\' file
But sed -i '0~3 a\\' file inserts one line every 3 rows.
You may use with GNU sed:
sed -i 'G;G;G' file
G;G;G appends three empty lines after every line, including the last one. To leave the final line alone, use '$!{G;G;G}' instead.
Or, awk (ORS must hold the line's own newline plus three more to get three blank lines):
awk 'BEGIN{ORS="\n\n\n\n"};1' file
See an online sed and awk demo.
If you need to set the number of newlines dynamically use
nl="
"
awk -v nl="$nl" 'BEGIN{v=nl; for(c=0;c<3;c++) v=v nl; ORS=v};1' file > newfile
With GNU awk:
awk -i inplace -v lines=3 '{print; for(i=0;i<lines;i++) print ""}' file
Update with Ed's hints (see comments):
awk -i inplace -v lines=3 '{print; for(i=1;i<=lines;i++) print ""}' file
Update (without trailing empty lines):
awk -i inplace -v lines=3 'NR==1; NR>1{for(i=1;i<=lines;i++) print ""; print}' file
Output to file:
a
b
c
d
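For an awk without the -i inplace extension, the same idea can be sketched with a temporary file. The function name add_blank_lines and its argument order are made up for this example, and mktemp is assumed to be available:

```shell
# Hypothetical helper: add_blank_lines N FILE
# Puts N empty lines between consecutive lines of FILE (none after the last).
add_blank_lines() {
    tmp=$(mktemp) || return 1
    awk -v lines="$1" '
        NR > 1 { for (i = 1; i <= lines; i++) print "" }  # gap before every line but the first
        { print }
    ' "$2" > "$tmp" && cat "$tmp" > "$2"   # cat keeps the permissions of FILE
    rm -f "$tmp"
}

# Example run on a throwaway file.
f=$(mktemp)
printf 'a\nb\nc\nd\n' > "$f"
add_blank_lines 3 "$f"
cat "$f"
```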
With sed and coreutils:
N=4
sed "\$b;$(yes G\; | head -n$N)" infile
Similar trick with awk (ORS includes the line's own newline, so N newlines give N-1 blank lines):
N=4
awk 1 ORS="$(yes \\n | head -n$N | tr -d '\n')" infile
This might work for you (GNU sed):
sed ':a;G;s/\n/&/2;Ta' file
This will add 2 blank lines following each line.
Change 2 to whatever number you desire between each line.
An alternative (more efficient?):
sed '1{x;:a;/^.\{2\}/!s/^/\n/;ta;s/.//;x};G' file

How can I print only lines that are immediately preceded by an empty line in a file using sed?

I have a text file with the following structure:
bla1
bla2
bla3
bla4
bla5
So you can see that some lines of text are preceded by an empty line.
I understand that sed has the concept of two buffers, a pattern space buffer and a hold space buffer, so I'm guessing these need to come in to play here, but I'm unclear how to specify them to accomplish what I need.
In my contrived example above, I'd expect to see the following lines outputted:
bla3
bla5
sed is for doing s/old/new/ on individual lines, that is all. Any time you start talking about buffers or doing anything related to multi-line comparisons you're using the wrong tool.
You could do this with awk:
$ awk -v RS= -F'\n' 'NR>1{print $1}' file
bla3
bla5
but that would fail to print the first non-empty line if the first line(s) in the file were empty. So this may be what you want if lines consisting only of space characters should count as empty:
$ awk 'NF && !p{print} {p=NF}' file
bla3
bla5
and this otherwise:
$ awk '($0!="") && (p==""){print} {p=$0}' file
bla3
bla5
All of the above will work even if there are multiple empty lines preceding any given non-empty line.
To see the difference between the 3 approaches (which you won't see given the sample input in the question):
PS1> printf '\nfoo\n \nbar\n\netc\n' | cat -E
$
foo$
$
bar$
$
etc$
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk -v RS= -F'\n' 'NR>1{print $1}'
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk 'NF && !p{print} {p=NF}'
foo
bar
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk '($0!="") && (p==""){print} {p=$0}'
foo
etc
You can use the hold buffer easily to print the line before the blank like this:
sed -n -e '/^$/{x; p;}' -e h input
But I don't see an easy way to use it for your use case. For your case, instead of using the hold buffer, you could do:
sed -n -e '/^$/ba' -e d -e :a -e n -e p input
But I would do this with awk.
awk 'NR!=1{print $1}' RS= FS=\\n input-file
awk 'p;{p=/^$/}' file
above command does these for each line:
if p is 1, print line;
if line is empty, set p to 1.
if lines consisting of one or more spaces are also considered empty:
awk 'p;{p=!NF}' file
to print non-empty lines each coming right after an empty line, you can use this:
awk 'p*!(p=/^$/)' file
if p is 1 and this line is not empty (1*!(0) = 1*1 = 1), print this line;
otherwise (1*!(1) = 1*0 = 0, 0*anything = 0), don't print anything.
note that this one may not work with all awks, a portable version of this would look like:
awk 'p*(/./);{p=/^$/}' file
if lines consisting of one or more spaces are also considered empty:
awk 'p*NF;{p=!NF}' file
see them online here, and here.
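Spelled out with a named flag, the last variant reads as follows. This is a sketch of the same logic, not a different method; the function name print_after_blank and the variable name prev_blank are made up:

```shell
# Print every non-blank line that directly follows a blank
# (empty or all-whitespace) line.
print_after_blank() {
    awk '
        prev_blank && NF { print }   # non-blank line right after a blank one
        { prev_blank = !NF }         # remember whether this line was blank
    ' "$@"
}

# Sample from the question: prints bla3 and bla5.
printf 'bla1\nbla2\n\nbla3\nbla4\n\nbla5\n' | print_after_blank
```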
If sed/awk is not mandatory, you can do it with grep:
grep -A 1 '^$' input.txt | grep -v -E '^$|--'
You can use sed to match a range of lines and do sub-matches inside the matches, like so:
# - use the "-n" option to omit printing of lines
# - match lines between a blank line (/^$/) and a non-blank one (/^./),
# then print only the line that contains at least a character,
# i.e, the non-blank line.
sed -ne '
/^$/,/^./ {
/^./{ p; }
}' input.txt
tested by gnu sed, your data in 'a':
$ sed -nE '/^$/{N;s/\n(.+)/\1/p}' a
bla3
bla5
To edit the file in place, add the -i option before -n.

Replace a string in each occurrence of string in a file, add additional line at first line in that file

I did search and found how to replace each occurrence of a string in files. Besides that I want to add one line to a file only at the first occurrence of the string.
I know this
grep -rl 'windows' ./ | xargs sed -i 's/windows/linux/g'
will replace each occurrence of the string. So how do I add a line to that file at the first match of the string? Does anyone have an idea how to do that? Appreciate your time.
Edited :
Example: replace xxx with TTT in file, add a line at the start of the file for the first match.
Input : file1, file2.
file1
abc xxx pp
xxxy rrr
aaaaaaaaaaaddddd
file2
aaaaaaaaaaaddddd
Output
file1
#ADD LINE HERE FOR FIRST MATCH DONT ADD FOR REST OF MATCHES
abc TTT pp
TTTy rrr
aaaaaaaaaaaddddd
file2
aaaaaaaaaaaddddd
Cribbing from the answers to this question.
Something like this would seem to work:
sed -e '0,/windows/{s/windows/linux/; p; T e; a \new line
;:e;d}; s/windows/linux/g'
From start of the file to the first match of /windows/ do:
replace windows with linux
print the line
if s/windows/linux/ did not replace anything jump to label e
add the line new line
create label e
delete the current pattern space, read the next line and start processing again
Alternatively:
awk '{s=$0; gsub(/windows/, "linux")} 7; (s ~ /windows/) && !w {w=1; print "new line"}' file
save the line in s
replace windows with linux
print the line (7 is true and any true pattern runs the default action of {print})
if the original line contained windows and w is false (variables are empty strings by default and empty strings are false-y in awk)
set w to 1 (truth-y value)
add the new line
If I understand you correctly, all you need is:
find . -type f -print |
while IFS= read -r file; do
awk 'gsub(/windows/,"unix"){if (!f) $0 = $0 ORS "an added line"; f=1} 1' "$file" > tmp &&
mv tmp "$file"
done
Note that the above, like sed and grep would, is working with REs, not strings. To use strings would require the use of index() and substr() in awk, is not possible with sed, and with grep requires an extra flag.
To add a leading line to the file if a change is made, using GNU awk for multi-char RS (and we may as well do sed-like inplace editing since we're using gawk):
find . -type f -print |
while IFS= read -r file; do
gawk -i inplace -v RS='^$' -v ORS= 'gsub(/windows/,"unix"){print "an added line"} 1' "$file"
done
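Without gawk, a line at the top of each changed file can be sketched with grep and sed. Assumptions: the pattern xxx and the replacement TTT come from the example in the question, the header text and the function name replace_with_header are made up, and like the commands above this treats the pattern as a regular expression:

```shell
# Hypothetical helper: replace_with_header FILE
# If FILE contains xxx, write a header line followed by FILE with every
# xxx replaced by TTT; files without a match are left untouched.
replace_with_header() {
    grep -q 'xxx' "$1" || return 0
    tmp=$(mktemp) || return 1
    { printf '%s\n' '#ADD LINE HERE'; sed 's/xxx/TTT/g' "$1"; } > "$tmp" &&
        cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Throwaway copies of file1 and file2 from the question.
dir=$(mktemp -d)
printf 'abc xxx pp\nxxxy rrr\naaaaaaaaaaaddddd\n' > "$dir/file1"
printf 'aaaaaaaaaaaddddd\n' > "$dir/file2"
for f in "$dir/file1" "$dir/file2"; do replace_with_header "$f"; done
cat "$dir/file1"
```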

Trying to use variable in sed or awk

I have 2 separate text files, each in the same exact format. I can grep FILE1.txt for a specific search term and output the line numbers of every match. The line numbers are outputted in numeric order to a file or a variable.
I want to use each line number and print that line from FILE2.txt in numeric order to a single OUTPUT.txt. Does anyone know a way to do this using awk or sed?
I have a string variable $linenumbers with values of 25 26 27 28.
I use the following command:
for i in $linenumbers; do sed -n "/$I/p" $i test_read2.fastq >> test.fastq; done.
I get errors of
sed: can't read 25: No such file or directory
sed: can't read 26: No such file or directory
sed: can't read 27: No such file or directory
sed: can't read 28: No such file or directory
If I do this sed command one by one, I can pull line number 25, 26, 27 and 28 from the file and print it to file using the following command;
sed -n "25p" test_read2.fastq >> test.fastq
I want to replace "25p" with variable so it pulls out multiple lines (25,26,27,28) from the file without doing this one by one...
Try this:
grep -n interesting FILE1.txt | cut -d: -f1 | while read l
do
sed -n "$l p" FILE2.txt
done
Example:
$ cat FILE1.txt
foo
bar
baz
$ cat FILE2.txt
qux
quux
quuux
$ grep -n bar FILE1.txt | cut -d: -f1 | while read l; do sed -n "$l p" FILE2.txt; done
quux
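The loop from the question can also be repaired directly: the line number has to be part of the sed script ("${i}p") instead of being passed as a separate argument, which sed takes for a file name. A sketch with a generated stand-in for test_read2.fastq; the variable names src and out are made up:

```shell
# Stand-in input: 30 numbered lines (the real file would be test_read2.fastq).
src=$(mktemp)
out=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 30; i++) print "read " i }' > "$src"

linenumbers="25 26 27 28"

# One sed call per line number; ${i}p prints just that line.
# $linenumbers is left unquoted on purpose so it splits into words.
for i in $linenumbers; do
    sed -n "${i}p" "$src"
done > "$out"

cat "$out"
```

Redirecting once after done, rather than appending with >> inside the loop, also avoids leftovers from an earlier run ending up in the output.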
Not sure what exactly you want to do. If you want to print the lines of file which are defined in lines you could do awk 'NR==FNR{a[$0];next}FNR in a' lines file
test:
$ cat lines
1
3
7
$ cat file
a
b
c
d
e
f
g
$ awk 'NR==FNR{a[$0];next}FNR in a' lines file
a
c
g
sed -n "` grep -n 'Yourpattern' File1.txt | sed 's/:.*/p/'`" File2.txt
Be careful with substitutions and (double) quotes in Yourpattern.