From Linux command line, how can I remove \n from a particular line to merge two lines together?

Using the command line, how can I transform something like:
1 first line
2 second line
3 third line
4 fourth line
extra bit
5 fifth line
6 sixth line
into, say:
1 first line
2 second line
3 third line
4 fourth line; extra bit
5 fifth line
6 sixth line
The condition for merging: remove the newline in front of any line that does not start with a number, so that line is joined to the one before it.
I have seen answers to similar questions using the command-line tools awk, sed, and tr.

awk '/^[0-9]/{ printf "%s%s", (NR == 1 ? "" : "\n"), $0; next}
{printf "; %s", $0} END { printf "\n"}' input
I'm not really sure what you want to do when the first line does not begin with a digit, and I'm making the assumption that starting with a digit is the characteristic you are looking for to combine lines. Modify as needed.
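One possible way to settle the first-line case mentioned above, as a minimal sketch: buffer each record and only add the "; " separator when something is already buffered, so a leading non-digit line is printed on its own. The function name merge_continuations and the variables buf and have are hypothetical, not part of the original answer.

```shell
# merge_continuations: hypothetical helper, reads files given as arguments or stdin.
merge_continuations() {
    awk '
        /^[0-9]/ {                      # a numbered line starts a new record
            if (have) print buf         # flush the previous record first
            buf = $0; have = 1
            next
        }
        {                               # continuation line: append to the buffer
            buf = (have ? buf "; " : "") $0
            have = 1
        }
        END { if (have) print buf }     # flush the final record
    ' "$@"
}
```

Usage: merge_continuations input, or pipe text into it. Empty input produces no output at all, unlike the printf-based version, which always prints a final newline.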

With GNU sed:
sed "4{N;s/\n/; /}" file
With GNU awk:
awk -v line=4 'NR==line{x=$0; getline; $0=x "; " $0}1' file
Output:
1 first line
2 second line
3 third line
4 fourth line; extra bit
5 fifth line
6 sixth line

Could you please try the following? Written and tested at https://ideone.com/xqk4si
awk -v line_num="5" '
FNR==(line_num-1){
val=$0
next
}
val{
$0=val "; " $0
val=""
}
1
' Input_file
Explanation: the awk variable line_num holds the number of the line that the OP wants merged into the line before it. In the main program, the first condition checks whether the current line number is one less than line_num; if so, the line is saved in the variable val and next skips the remaining statements for that line. The second condition checks whether val is set; if so, the current line is replaced with the saved line, a semicolon separator, and the current line, and val is cleared. The final 1 is awk shorthand for printing the current line.

On second thought, it might be better to merge all lines that do not start with a number, rather than specifying by number each line to be merged.
Easy to do with ed:
printf "%s\n" '2,$g/^[^0-9]/-1s/$/; /\' '.,+1j' w | ed -s input.txt
Translated from ed's rather cryptic commands: for each line that does not start with a digit (skipping the first line, because it has no previous line to merge with), add "; " to the end of the previous line and then join those two lines. Finally, save the changed file.
Example:
$ cat input.txt
1 first line
2 second line
extra stuff
3 third line
4 fourth line
extra bit
5 fifth line
6 sixth line
$ printf "%s\n" '2,$g/^[^0-9]/-1s/$/; /\' '.,+1j' w | ed -s input.txt
$ cat input.txt
1 first line
2 second line; extra stuff
3 third line
4 fourth line; extra bit
5 fifth line
6 sixth line

With GNU sed, to join any number of lines not starting with a digit:
sed -E ':a;N;s/\n([^0-9])/; \1/;ta;P;D;' file
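The one-liner above is dense, so here is the same script spread over several lines with comments, as a sketch for reading purposes. It assumes GNU sed, as the answer does; the wrapper name join_nondigit_lines is hypothetical.

```shell
# join_nondigit_lines: hypothetical wrapper around the GNU sed script above.
join_nondigit_lines() {
    sed -E '
        # Loop label.
        :a
        # Append the next input line to the pattern space.
        N
        # If the appended line starts with a non-digit, replace the
        # newline in front of it with "; ".
        s/\n([^0-9])/; \1/
        # If that substitution happened, loop and try the following line.
        ta
        # Otherwise print the first line of the pattern space ...
        P
        # ... delete it, and restart with the remainder.
        D
    ' "$@"
}
```

Usage: join_nondigit_lines file. Because the loop keeps appending lines while the substitution succeeds, several consecutive continuation lines end up on the same output line.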

Related

How to print certain lines from sections of a file separated by a blank line with sed

I have been trying to come up with a sed command that will pull certain lines from blocks of text separated by a blank line in a file. The blocks of text are as below.
# cat test_file.txt
line 1
line 2
line 3
line 4
line 5

line 1
line 2
line 3
line 4
line 5

line 1
line 2
line 3
line 4
line 5
I am trying to pull out lines 2 and 4 from each block so the output will be like below.
line 2
line 4
line 2
line 4
line 2
line 4
I came up with a way to do it for the first block of text using sed:
# sed -n -e 2p -e 4p test_file.txt
line 2
line 4
But haven't been able to find a way to get it to continue for each block of text till the end of the file. Any pointers would be greatly appreciated.
awk's paragraph mode exists specifically to handle blank-line separated records/blocks of text like the ones you're dealing with:
$ awk 'BEGIN{RS=""; ORS="\n\n"; FS=OFS="\n"} {print $2, $4}' file
line 2
line 4

line 2
line 4

line 2
line 4

Reference the POSIX standard:
If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input
If you need to not have a blank line printed after the final record:
$ awk 'BEGIN{RS=""; FS=OFS="\n"} NR>1{print prev ORS} {prev=$2 OFS $4} END{print prev}' file
line 2
line 4

line 2
line 4

line 2
line 4
or if you don't want to use paragraph mode for some reason then:
$ awk 'BEGIN{tgts[2]; tgts[4]} !NF{print ""; lineNr=0; next} ++lineNr in tgts' file
line 2
line 4

line 2
line 4

line 2
line 4
I'd use awk for this, e.g:
awk '(!NF&&m=NR)||NR-m==2||NR-m==4' file
This might work for you (GNU sed):
sed -n '/\S/{n;p;n;n;p;:a;n;//ba;p}' file
Set the -n option for explicit printing. Print the second and fourth lines then throw away any non-blank lines and print the first blank one. Repeat.
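To try the paragraph-mode approach without creating a test file first, a small wrapper can be fed from printf. This is only a sketch: the name pick_fields is hypothetical, and the fields chosen (2 and 4) are hard-coded to match the question.

```shell
# pick_fields: hypothetical helper; prints lines 2 and 4 of every
# blank-line separated block, with one blank line between blocks.
pick_fields() {
    awk 'BEGIN { RS = ""; FS = OFS = "\n" }   # paragraph mode, one field per line
         NR > 1 { print "" }                  # separator before every block but the first
         { print $2, $4 }' "$@"
}

# Example input with two blocks separated by a blank line.
printf 'a1\na2\na3\na4\n\nb1\nb2\nb3\nb4\n' | pick_fields
```

Printing the separator before each block rather than after it avoids a trailing blank line without having to remember the previous record.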

Problems with awk substr

I am trying to split a file column using the substr awk command. So the input is as follows (it consists of 4 lines, one blank line):
#NS500645:122:HYGVMBGX2:4:21402:2606:16446:ACCTAGAAGG:R1
ACCTAGAAGGATATGCGCTTGCGCGTTAGAGATCACTAGAGCTAAGGAATTTGAGATTACAGTAAGCTATGATCC
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
I want to split the second line at each occurrence of the pattern "GATC", keeping the pattern at the start of the right-hand substring, like:
ACCTAGAAGGATATGCGCTTGCGCGTTAGA GATCACTAGAGCTAAGGAATTTGAGATTACAGTAAGCTATGATCC
I want the last line to be cut into pieces of the same lengths as the split sequence, and to regenerate the file like:
ACCTAGAAGGATATGCGCTTGCGCGTTAGA
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEE
GATCACTAGAGCTAAGGAATTTGAGATTACAGTAAGCTAT
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
GATCC
EEEEE
To split the last column I am using this awk script:
cat prove | paste - - - - | awk 'BEGIN
{FS="\t"; OFS="\t"}\ {gsub("GATC","/tGATC", $2); {split ($2, a, "\t")};\ for
(i in a) print substr($4, length(a[i-1])+1,
length(a[i-1])+length(a[i]))}'
But the output is as follows:
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
The second and third lines are longer than expected.
I checked the calculated values that are passed to the substr command, and they are correct:
1 30
31 70
41 45
With these values the output should be:
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEE
But as I showed it is not the case.
Any suggestions?
I guess you're looking for something like this, but your question's formatting is really confusing
$ awk -v OFS='\t' 'NR==1 {next}
NR==2 {n=index($0,"GATC")}
/^[^+]/ {print substr($0,1,n-1),substr($0,n)}' file
ACCTAGAAGGATATGCGCTTGCGCGTTAGA GATCACTAGAGCTAAGGAATTTGAGATTACAGTAAGCTATGATCC
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEE EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
I assumed your file is in this format
dummy header line to be ignored
ACCTAGAAGGATATGCGCTTGCGCGTTAGAGATCACTAGAGCTAAGGAATTTGAGATTACAGTAAGCTATGATCC
+
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
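A note on the numbers in the question: in awk, substr(s, m, n) takes a start position m and a length n, not an end position. So substr($4, 31, 70) asks for 70 characters starting at position 31 and simply returns whatever is left of the string, which would explain the over-long second and third lines. The sketch below cuts a sequence in front of every occurrence of the motif and cuts the quality string at the same positions. The function name split_at_motif, the environment variables SEQ and QUAL, and the short test strings are all hypothetical.

```shell
# split_at_motif SEQUENCE QUALITY: hypothetical helper.
# Strings are passed through the environment so that backslashes in a
# quality string are not interpreted as escapes by awk -v.
split_at_motif() {
    SEQ=$1 QUAL=$2 awk -v motif="GATC" 'BEGIN {
        seq = ENVIRON["SEQ"]; qual = ENVIRON["QUAL"]
        n = length(seq)
        start = 1
        while (start <= n) {
            # Search for the next motif strictly after the chunk start.
            off = index(substr(seq, start + 1), motif)
            # Chunk length: up to the next motif, or to the end of the string.
            len = (off > 0) ? off : n - start + 1
            print substr(seq, start, len)    # third argument is a LENGTH
            print substr(qual, start, len)
            start += len
        }
    }'
}

split_at_motif AAGATCTTGATCC 0123456789abc
```

Each sequence piece is followed by the quality piece of the same length, which is the layout the question asks for.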

awk: Compare two sets of numbers (generated by random and strict rules)

I have many files containing some fixed words and numbers:
The FIRST SET of numbers has a fixed length of 7 digits: the first 3 are a more or less random prefix (100, 200, 300 in the example, but it can be others) that we do not need; we are interested in the remaining 4 digits.
The SECOND SET is a number generated from the last 4 digits of the FIRST SET (xxx7777 = 7777; xxx0066 = 66). Note that the SECOND SET can NOT have leading zeros; they have already been stripped, and this is a rule.
Input
first second third 1007777 fourth 7777
...
first second third 2008341 fourth 8341
...
first second third 3000005 fourth 5
...
...
first second third 2008341 fourth 8
...
first second third 2008341 fourth 341
I found other examples here showing how to find the lines of interest using grep, but I didn't find an AWK example that does what I want; maybe the leading-zero rule is what is giving me problems.
My attempt to find the wrong generations:
grep -Pr 'first second third' docs/test/*.txt | awk '{ if($4=$6) print $4 " " $6}'
7777 7777
8341 8341
5 5
8 8
341 341
The correct Output should look like this:
2008341 8
2008341 341
..only the problem lines (the ones not generated correctly) and the filename.
Thanks ! :)
$ awk '/first second third/ && (substr($4,4)+0 != $NF) {print FILENAME, $4, $NF}' file
file 2008341 8
file 2008341 341
Call it as:
awk '...' docs/test/*.txt
or:
find docs -name '*.txt' -exec awk '...' {} \;
or similar as you see fit.
Use this GNU awk way, intended to be human readable and maintainable:
$ awk '
/first second third/ {
    match($4, /[0-9]{4}$/, a)            #1
    a[0] = gensub(/^0+/, "", "g", a[0])  #2
}
/first second third/ && $NF != a[0]      #3
' file
Output :
first second third 2008341 fourth 8
first second third 2008341 fourth 341
Explanations :
#1 get the last 4 digits of column 4 and store them in the array a with match
#2 remove all leading zeros
#3 if the trimmed value differs from the last column, print the line (default awk behavior on a true condition)

print whole variable contents if the number of lines are greater than N

How to print all lines if certain condition matches.
Example:
echo "$ip"
this is a sample line
another line
one more
last one
If this file has more than 3 lines then print the whole variable.
I tried:
echo $ip| awk 'NR==4'
last one
echo $ip|awk 'NR>3{print}'
last one
echo $ip|awk 'NR==12{} {print}'
this is a sample line
another line
one more
last one
echo $ip| awk 'END{x=NR} x>4{print}'
Need to achieve this:
If this file has more than 3 lines then print the whole file. I can do this using wc and bash but need a one liner.
The right way to do this (no echo, no pipe, no loops, etc.):
$ awk -v ip="$ip" 'BEGIN{if (gsub(RS,"&",ip)>2) print ip}'
this is a sample line
another line
one more
last one
You can use Awk as follows,
echo "$ip" | awk '{a[$0]; next}END{ if (NR>3) { for(i in a) print i }}'
one more
another line
this is a sample line
last one
you can also make the value 3 configurable from an awk variable,
echo "$ip" | awk -v count=3 '{a[$0]; next}END{ if (NR>count) { for(i in a) print i }}'
The idea is to store each line as a key of the array a via {a[$0]; next} as it is read. By the time the END block runs, NR holds the total number of lines in the string or file, so the lines are printed only if that count is greater than 3 (or the configured value). Note that for (i in a) visits the keys in an unspecified order and that duplicate lines are stored only once.
And always remember to double-quote the variables in bash to avoid undergoing word-splitting done by the shell.
Using James Brown's useful comment below to preserve the order of lines, do
echo "$ip" | awk -v count=3 '{a[NR]=$0; next}END{if(NR>count)for(i=1;i<=NR;i++)print a[i]}'
this is a sample line
another line
one more
last one
Another in awk. First test files:
$ cat 3
1
2
3
$ cat 4
1
2
3
4
Code:
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 3 # look ma, no lines
[this line left intentionally blank. no wait!]
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 4
1
2
3
4
Explained:
NR<4 {                      # for the first 3 records
    b=b (NR==1?"":ORS) $0   # buffer them to b with ORS delimiter
    next                    # proceed to next record
}
b {                         # if buffer has records, i.e. NR>=4
    print b                 # output buffer
    b=""                    # and reset it
}1                          # print all records after that
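The same buffering idea can be parameterized on the threshold. The following is a sketch rather than a drop-in replacement: the function name print_if_longer and the variable n are hypothetical, and the buffer here keeps a trailing ORS after every line instead of a leading one.

```shell
# print_if_longer N [file...]: hypothetical helper.
# Prints the input only if it has more than N lines.
print_if_longer() {
    n=$1; shift
    awk -v n="$n" '
        NR <= n { buf = buf $0 ORS; next }          # hold back the first n lines
        buf != "" { printf "%s", buf; buf = "" }    # line n+1 reached: flush them
        { print }                                   # then stream everything else
    ' "$@"
}
```

Usage: echo "$ip" | print_if_longer 3. Only the first N lines are ever held in memory, so this also works on long inputs.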

awk - skip last line for condition

When I wrote an answer for this question I used the following:
something | sed '$d' | awk '$1>3{print $0}'
e.g.
print only lines where the 1st field is bigger than 3 (awk)
but omit the last line sed '$d'.
This seems to me like a bit of duplicated work; surely it is possible to do the above with awk alone, without the sed?
I'm an awkdiot - so, can someone suggest a solution?
Here's one way you could do it:
$ printf "%s\n" {1..10} | awk 'NR>1&&p>3{print p}{p=$1}'
4
5
6
7
8
9
Basically, print the first field of the previous line, rather than the current one.
As Wintermute has rightly pointed out in the comments (thanks), in order to print the whole line, you can modify the code to this:
awk 'p { print p; p="" } $1 > 3 { p = $0 }'
This only assigns the contents of the line to p if the first field is greater than 3.
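A variant of the same delay-by-one-line trick, sketched with an explicit flag instead of testing the saved line itself, and with the threshold as a parameter. The names filter_except_last, min, prev and held are hypothetical.

```shell
# filter_except_last MIN [file...]: hypothetical helper.
# Prints whole lines whose first field is greater than MIN,
# but never prints the last line of the input.
filter_except_last() {
    min=$1; shift
    awk -v min="$min" '
        held { print prev; held = 0 }          # emit the match held from the previous line
        $1 > min { prev = $0; held = 1 }       # hold this match until another line arrives
    ' "$@"
}
```

Usage: something | filter_except_last 3. A match on the final line is held but never printed, because no later line arrives to release it.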