How to delete the first and last non-empty lines of a file - awk

I want to delete the first and the last non-empty line of a file.
Example:
cat test.txt
//blank_line
abc
def
xyz
//blank_line
qwe
mnp
//blank_line
Then the output should be:
def
xyz
//blank_line
qwe
I have tried the following command
sed "$(awk '/./{line=NR} END{print line}' test.txt)d" test.txt
to remove the last non-empty line. But that uses two commands, (1) sed and (2) awk, and I want to do it with a single command.

Reading the whole file in memory at once with GNU sed for -E and -z:
$ sed -Ez 's/^\s*\S+\n//; s/\n\s*\S+\s*$/\n/' test.txt
def
xyz
//blank_line
qwe
or with GNU awk for multi-char RS:
$ awk -v RS='^$' '{gsub(/^\s*\S+\n|\n\S+\s*$/,"")} 1' test.txt
def
xyz
//blank_line
qwe
Both GNU tools accept \s and \S as shorthand for [[:space:]] and [^[:space:]] respectively and GNU sed accepts the non-POSIX-sed-standard \n as meaning newline.

This is a double pass method:
awk '(NR==FNR) { if(NF) {t=FNR;if(!h) h=FNR}; next}
(h<FNR && FNR<t)' file file
The integers h and t keep track of the head and the tail. With if(NF), lines containing only blanks also count as empty. To be stricter, you could replace if(NF) by if(length($0)) so that only truly empty lines are treated as empty.
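For reference, here is the same two-pass idea spelled out with comments, using the stricter length($0) test so that a line holding only blanks still counts as content. The function name trim_outer_lines and the sample data are illustrative assumptions; treat it as a sketch rather than a drop-in replacement.

```shell
#!/bin/sh
# Sketch: print only the lines strictly between the first and the last
# non-empty line, reading the file twice. "Non-empty" here means
# length($0) > 0, so a line made only of blanks counts as content.
# trim_outer_lines is a hypothetical helper name.
trim_outer_lines() {
    awk '
        NR == FNR {                 # pass 1: only record positions
            if (length($0) > 0) {
                if (!h) h = FNR     # head: first non-empty line
                t = FNR             # tail: last non-empty line seen so far
            }
            next
        }
        h < FNR && FNR < t          # pass 2: print what lies in between
    ' "$1" "$1"
}

# Usage: the last "non-empty" line below is a single space.
sample=$(mktemp)
printf '\nabc\ndef\nqwe\n \n' > "$sample"
trim_outer_lines "$sample"
rm -f "$sample"
```

With the original if(NF) test the same input prints only def, because the blank-only last line is then ignored and qwe becomes the tail.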
This one reads everything into memory and does a simple replace at the end:
$ awk '{b=b RS $0}
       END{ sub(/^[[:blank:]\n]*[^\n]+\n/,"",b);
            sub(/\n[^\n]+[[:blank:]\n]*$/,"",b);
            print b }' file

A single-pass, fast and relatively memory-efficient approach utilising a buffer:
awk 'f {
    if(NF) {
        printf "%s",buf
        buf=""
    }
    buf=(buf $0 ORS)
    next
}
NF {
    f=1
}' file

Here is a golfed version of kvantour's solution (file{,} is bash brace expansion for file file):
$ awk 'NR==(n=FNR){e=!NF?e:n;b=!b?e:b}b<n&&n<e' file{,}

This might work for you (GNU sed):
sed -E '0,/\S/d;H;$!d;x;s/.(.*)\n.*\S.*/\1/' file
Use a range to delete up to and including the first line containing a non-space character. Then append the rest of the file to the hold space and, at the end of the file, use a substitution to remove the last line containing a non-space character and any empty lines that follow it.
Alternative:
sed '0,/\S/d' file | tac | sed '0,/\S/d'| tac

Related

Convert multiple lines to a line separated by brackets and "|"

I have the following data in multiple lines:
1
2
3
4
5
6
7
8
9
10
I want to convert them into a single line, with each number wrapped in "()" and separated by "|":
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
What I have tried is:
seq 10 | sed -r 's/(.*)/(\1)/'|paste -sd"|"
What's the best unix one-liner to do that?
This might work for you (GNU sed):
sed 's/.*/(&)/;H;1h;$!d;x;s/\n/|/g' file
Surround each line by parens.
Append all lines to the hold space except for the first line which replaces the hold space.
Delete all lines except the last.
On the last line, swap to the hold space and replace all newlines by |'s.
N.B. When a line is deleted, no further commands are invoked and the next cycle begins. That is why the last two commands are only executed on the last line of the file.
Alternative:
sed -z 's/\n$//;s/.*/(&)/mg;y/\n/|/' file
With your shown samples, please try the following awk code. It should work in any version of awk.
awk -v OFS="|" '{val=(val?val OFS:"") "("$0")"} END{print val}' Input_file
Using GNU sed
$ sed -Ez ':a;s/([0-9]+)\n/(\1)|/;ta;s/\|$/\n/' input_file
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
Here is another simple awk command:
awk 'NR>1 {printf "%s|", p} {p="(" $0 ")"} END {print p}' file
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
Here it is:
sed -z 's/^/(/;s/\n/)|(/g;s/|($//' your_input
where -z allows you to treat the whole file as a single string with embedded \ns.
In detail, the sed script above consists of 3 commands separated by ;s:
s/^/(/ inserts a ( at the beginning of the whole file,
s/\n/)|(/g changes every \n to )|(;
s/|($// removes the trailing |( produced by the final \n at EOF, which your file most likely has since you are on Linux.
With perl:
$ seq 10 | perl -pe 's/.*/($&)/; s/\n/|/ if !eof'
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
s/.*/($&)/ to surround input lines with ()
s/\n/|/ if !eof will change newline to | except for the last input line.
Here's a solution with paste (just for fun):
$ seq 10 | paste -d'()' /dev/null - /dev/null | paste -sd'|'
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
Using any awk:
$ seq 10 | awk '{printf "%s(%s)", sep, $0; sep="|"} END{print ""}'
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
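The separator-variable trick in the last command generalizes to any wrapper. A minimal sketch for any POSIX awk, assuming a hypothetical helper named wrap_join that takes the opening string, closing string and separator as arguments:

```shell
#!/bin/sh
# wrap_join OPEN CLOSE SEP [FILE...]
# Wraps every input line in OPEN...CLOSE and joins the results with SEP.
# Hypothetical helper; note that awk -v interprets backslash escapes in
# the values, so pass a literal backslash as '\\'.
wrap_join() {
    open=$1 close=$2 sep=$3
    shift 3
    awk -v o="$open" -v c="$close" -v s="$sep" '
        { printf "%s%s%s%s", (NR > 1 ? s : ""), o, $0, c }   # no separator before the first item
        END { if (NR) print "" }                              # finish with a single newline
    ' "$@"
}

printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | wrap_join '(' ')' '|'
```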

How to add N blank lines between all rows of a text file?

I have a file that looks like this:
a
b
c
d
Suppose I want to add N blank lines between them (3 in this example, but I actually need 20 or 100 depending on the file):
a



b



c



d
I can add one blank line between all of them with sed
sed -i '0~1 a\\' file
But sed -i '0~3 a\\' file inserts one blank line after every third row, not three blank lines after each row.
With GNU sed, you may use:
sed -i 'G;G;G' file
The G;G;G appends three empty lines below every line, including the last one.
Or, awk:
awk 'BEGIN{ORS="\n\n\n\n"};1' file
If you need to set the number of newlines dynamically, use:
nl="
"
awk -v nl="$nl" 'BEGIN{for(c=0;c<=3;c++) v=v""nl;ORS=v};1' file > newfile
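If the only goal is a variable count, the number can also be passed straight into awk with -v and the ORS string built in BEGIN, with no shell-side newline variable. A sketch assuming a hypothetical function add_blank_lines; note that ORS needs N+1 newlines to produce N blank lines:

```shell
#!/bin/sh
# add_blank_lines N [FILE...]
# Prints every input line followed by N blank lines (also after the last line).
# Hypothetical helper name; portable awk only.
add_blank_lines() {
    count=$1
    shift
    awk -v n="$count" '
        BEGIN {
            ORS = "\n"                       # the line terminator itself
            for (i = 0; i < n; i++)
                ORS = ORS "\n"               # plus one newline per blank line
        }
        1                                    # print every record with that ORS
    ' "$@"
}

printf 'a\nb\nc\nd\n' | add_blank_lines 3
```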
With GNU awk:
awk -i inplace -v lines=3 '{print; for(i=0;i<lines;i++) print ""}' file
Update with Ed's hints:
awk -i inplace -v lines=3 '{print; for(i=1;i<=lines;i++) print ""}' file
Update (without trailing empty lines):
awk -i inplace -v lines=3 'NR==1; NR>1{for(i=1;i<=lines;i++) print ""; print}' file
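-i inplace is a gawk extension. With other awks, a common pattern is to write to a temporary file and move it over the original. A sketch assuming a hypothetical helper space_out_file, using the same "blank lines before every line except the first" idea so that no trailing empty lines are produced. Because mv replaces the file, its permissions and ownership may end up those of the temporary file.

```shell
#!/bin/sh
# space_out_file N FILE
# Inserts N blank lines between consecutive lines of FILE, in place,
# without adding blank lines after the last line.
# Hypothetical helper; works with any POSIX awk (no gawk -i inplace).
space_out_file() {
    n=$1
    file=$2
    tmp=$(mktemp) || return 1
    if awk -v n="$n" '
        NR > 1 { for (i = 1; i <= n; i++) print "" }   # blanks go before every line but the first
        { print }
    ' "$file" > "$tmp"; then
        mv "$tmp" "$file"        # the replacement keeps mktemp's permissions
    else
        rm -f "$tmp"
        return 1
    fi
}

demo=$(mktemp)
printf 'a\nb\nc\nd\n' > "$demo"
space_out_file 3 "$demo"
cat "$demo"
rm -f "$demo"
```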
Output to file:
a



b



c



d
With sed and coreutils:
N=4
sed "\$b;$(yes G\; | head -n$N)" infile
Similar trick with awk:
N=4
awk 1 ORS="$(yes \\n | head -n$((N+1)) | tr -d '\n')" infile
This might work for you (GNU sed):
sed ':a;G;s/\n/&/2;Ta' file
This will add 2 blank lines after each line.
Change 2 to whatever number you need.
An alternative (more efficient?):
sed '1{x;:a;/^.\{2\}/!s/^/\n/;ta;s/.//;x};G' file

How can I print only lines that are immediately preceded by an empty line in a file using sed?

I have a text file with the following structure:
bla1
bla2

bla3
bla4

bla5
So you can see that some lines of text are preceded by an empty line.
I understand that sed has the concept of two buffers, a pattern space buffer and a hold space buffer, so I'm guessing these need to come in to play here, but I'm unclear how to specify them to accomplish what I need.
In my contrived example above, I'd expect to see the following lines outputted:
bla3
bla5
sed is for doing s/old/new on individual lines, that is all. Any time you start talking about buffers or doing anything related to multi-line comparisons, you're using the wrong tool.
You could do this with awk:
$ awk -v RS= -F'\n' 'NR>1{print $1}' file
bla3
bla5
but it would fail to print the first non-empty line if the file started with one or more empty lines. So this may be what you want if lines made up entirely of space characters should be considered empty:
$ awk 'NF && !p{print} {p=NF}' file
bla3
bla5
and this otherwise:
$ awk '($0!="") && (p==""){print} {p=$0}' file
bla3
bla5
All of the above will work even if there are multiple empty lines preceding any given non-empty line.
To see the difference between the 3 approaches (which you won't see given the sample input in the question):
PS1> printf '\nfoo\n \nbar\n\netc\n' | cat -E
$
foo$
 $
bar$
$
etc$
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk -v RS= -F'\n' 'NR>1{print $1}'
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk 'NF && !p{print} {p=NF}'
foo
bar
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk '($0!="") && (p==""){print} {p=$0}'
foo
etc
You can easily use the hold buffer to print the line before each empty line like this:
sed -n -e '/^$/{x; p;}' -e h input
But I don't see an easy way to use it for your use case. For your case, instead of using the hold buffer, you could do:
sed -n -e '/^$/ba' -e d -e :a -e n -e p input
But I would do this with awk.
awk 'NR!=1{print $1}' RS= FS=\\n input-file
awk 'p;{p=/^$/}' file
The above command does this for each line:
if p is 1, print the line;
set p to 1 if the line is empty, otherwise to 0.
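The same flag logic can be written out with descriptive names, using NF so that whitespace-only lines count as empty and only non-empty lines are printed. Because the flag starts out unset, a non-empty first line is never printed. A sketch; the function name lines_after_blank is hypothetical:

```shell
#!/bin/sh
# lines_after_blank [FILE...]
# Prints each non-empty line whose previous line was empty
# (a line with no fields, i.e. empty or only whitespace, counts as empty).
lines_after_blank() {
    awk '
        prev_empty && NF { print }     # non-empty line right after an empty one
        { prev_empty = (NF == 0) }     # remember whether this line was empty
    ' "$@"
}

printf 'bla1\nbla2\n\nbla3\nbla4\n\nbla5\n' | lines_after_blank
```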
if lines consisting of one or more spaces are also considered empty:
awk 'p;{p=!NF}' file
to print non-empty lines each coming right after an empty line, you can use this:
awk 'p*!(p=/^$/)' file
if p is 1 and this line is not empty (1*!(0) = 1*1 = 1), print this line;
otherwise (1*!(1) = 1*0 = 0, 0*anything = 0), don't print anything.
Note that this one may not work with all awks; a portable version would look like:
awk 'p*(/./);{p=/^$/}' file
if lines consisting of one or more spaces are also considered empty:
awk 'p*NF;{p=!NF}' file
If sed/awk is not mandatory, you can do it with grep:
grep -A 1 '^$' input.txt | grep -v -E '^$|--'
You can use sed to match a range of lines and do sub-matches inside the matches, like so:
# - use the "-n" option to omit printing of lines
# - match lines between a blank line (/^$/) and a non-blank one (/^./),
# then print only the line that contains at least a character,
# i.e, the non-blank line.
sed -ne '
/^$/,/^./ {
/^./{ p; }
}' input.txt
Tested with GNU sed, with your data in file 'a':
$ sed -nE '/^$/{N;s/\n(.+)/\1/p}' a
bla3
bla5
Add the -i option before -n to edit the file in place.

Why does awk not filter the first column in the first line of my files?

I've got a file with the following records:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt;2;CLI001
depots/import/HDN1YYAA_20102018.txt;32;CLI001
depots/import/HDN1YYAA_25102018.txt;1;CAB001
depots/import/HDN1YYAA_50102018.txt;1;CAB001
depots/import/HDN1YYAA_65102018.txt;1;CAB001
depots/import/HDN1YYAA_80102018.txt;2;CLI001
depots/import/HDN1YYAA_93102018.txt;2;CLI001
When I execute the following awk one-liner:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR==1){print $1}}END {}'
the output is not what I expected:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
whereas I expected to get only the first column.
If I run it through all the records:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR>0){print $1}}END {}'
then it only starts filtering from the second line, and I get the following output:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_25102018.txt
depots/import/HDN1YYAA_50102018.txt
depots/import/HDN1YYAA_65102018.txt
depots/import/HDN1YYAA_80102018.txt
depots/import/HDN1YYAA_93102018.txt
Does anybody know why awk skips only the first line?
I tried deleting first record but the behaviour is the same, it will skip the first line.
First, it should be
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}END {}' filename
You can omit the END block if it is empty:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}' filename
You can use the -F command line argument to set the field delimiter:
awk -F';' '{if(NR==1){print $1}}' filename
Furthermore, since awk programs consist of a sequence of CONDITION [{ACTIONS}] elements, you can omit the if:
awk -F';' 'NR==1 {print $1}' filename
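You need to specify the delimiter either in a BEGIN block or as a command-line option. In your command, FS=";" sits in the pattern position, so it is an assignment that runs for each record only after that record has already been split with the default FS; that is why the new separator takes effect only from the second line onward: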
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}'
awk -F ';' '{ if(NR==1){print $1}}'
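The following sketch reproduces the symptom next to the two fixes, using two lines shaped like the question's data (the sample variable is only for illustration). The first command shows the behavior reported in the question; the other two should print just the path column on every line.

```shell
#!/bin/sh
sample='depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt;2;CLI001'

# Symptom: FS=";" is a pattern here, i.e. an assignment run for every record
# after that record was already split with the default FS (whitespace), so
# the new FS only affects the records that follow.
printf '%s\n' "$sample" | awk 'FS=";" { print $1 }'

# Fix 1: set FS in BEGIN, before the first record is read.
printf '%s\n' "$sample" | awk 'BEGIN { FS = ";" } { print $1 }'

# Fix 2: set FS on the command line.
printf '%s\n' "$sample" | awk -F';' '{ print $1 }'
```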
cut might be better suited here. For all lines:
$ cut -d';' -f1 file
To skip the first line:
$ sed 1d file | cut -d';' -f1
To get the first line only:
$ sed 1q file | cut -d';' -f1
However, at this point it's better to switch to awk. If you have a large file and are only interested in the first line, it's better to exit early:
$ awk -F';' '{print $1; exit}' file

awk to transpose lines of a text file

A .csv file that has lines like this:
20111205 010016287,1.236220,1.236440
It needs to read like this:
20111205 01:00:16.287,1.236220,1.236440
How do I do this in awk? Experimenting, I got this far. I need to do it in two passes I think. One sub to read the date&time field, and the next to change it.
awk -F, '{print;x=$1;sub(/.*=/,"",$1);}' data.csv
Use this awk command:
echo "20111205 010016287,1.236220,1.236440" | \
awk -F[\ \,] '{printf "%s %s:%s:%s.%s,%s,%s\n", \
$1,substr($2,1,2),substr($2,3,2),substr($2,5,2),substr($2,7,3),$3,$4}'
Explanation:
-F[\ \,]: sets the field separator to either a space or a comma
printf "%s %s:%s:%s.%s,%s,%s\n": format the output
substr($2,1,2), substr($2,3,2), ...: cut the second field ($2) into the desired pieces
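Since the asker's own attempt edits $1, here is a sketch in that direction for any POSIX awk: split on commas only, rebuild the first field from its date and time parts, and let awk reassemble the line with OFS. It assumes the time is always nine digits (hhmmssmmm) separated from the date by a single space, as in the sample; fix_time is a hypothetical name.

```shell
#!/bin/sh
# fix_time [FILE...]
# Rewrites "YYYYMMDD hhmmssmmm,..." as "YYYYMMDD hh:mm:ss.mmm,..." and keeps
# every remaining comma-separated field untouched.
fix_time() {
    awk 'BEGIN { FS = OFS = "," }
    {
        split($1, dt, " ")                  # dt[1] = date, dt[2] = hhmmssmmm
        t = dt[2]
        $1 = dt[1] " " substr(t, 1, 2) ":" substr(t, 3, 2) ":" substr(t, 5, 2) "." substr(t, 7, 3)
        print                               # $0 is rebuilt with OFS = ","
    }' "$@"
}

echo "20111205 010016287,1.236220,1.236440" | fix_time
```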
Or use this sed command:
echo "20111205 010016287,1.236220,1.236440" | \
sed 's/\([0-9]\{8\}\) \([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1 \2:\3:\4.\5/g'
Explanation:
[0-9]\{8\}: first match an 8-digit pattern and save it as \1
[0-9]\{2\}...: after a space, match a 2-digit pattern three times and save the matches as \2, \3 and \4
[0-9]\{3\}: finally, match a 3-digit pattern and save it as \5
\1 \2:\3:\4.\5: format the output
sed is better suited to this job since it's a simple substitution on single lines:
$ sed -r 's/( ..)(..)(..)/\1:\2:\3./' file
20111205 01:00:16.287,1.236220,1.236440
but if you prefer here's GNU awk with gensub():
$ awk '{print gensub(/( ..)(..)(..)/,"\\1:\\2:\\3.","")}' file
20111205 01:00:16.287,1.236220,1.236440