Print line modified and the line after using awk - awk

I want to modify lines in a file using awk and print the new lines with the following line.
My file is like this
Name_Name2_ Name3_Name4
ASHRGSJFSJRGDJRG
Name5_Name6_Name7_Name8
ADGTHEGHGTJKLGRTIWRK
I want
Name-Name2
ASHRGSJFSJRGDJRG
Name5-Name6
ADGTHEGHGTJKLGRTIWRK
I have used awk to modify my file:
awk -F'_' '{print $1 "-" $2}' file > newfile
but I don't know how to also print the line that follows each modified line (the line of letters such as ASHRGSJFSJRGDJRG).
Surely this is possible with awk, maybe with something like x=NR+1 NR<=x?
Thanks

The following awk may help:
awk -F"_" '/_/{print $1"-"$2;next} 1' Input_file
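To see how the command above behaves, here is a minimal sketch run against the sample from the question. The temporary file is only a stand-in for Input_file and is an assumption of this example.

```shell
# Recreate the question's sample input in a temporary file (hypothetical stand-in for Input_file).
tmp=$(mktemp)
printf '%s\n' 'Name_Name2_ Name3_Name4' 'ASHRGSJFSJRGDJRG' \
              'Name5_Name6_Name7_Name8' 'ADGTHEGHGTJKLGRTIWRK' > "$tmp"

# Lines containing "_" are headers: print fields 1 and 2 joined by "-" and
# skip to the next line. Every other line reaches the bare pattern "1",
# which is always true and prints the line unchanged.
out=$(awk -F'_' '/_/ { print $1 "-" $2; next } 1' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Because the sequence lines contain no underscore, they pass through untouched, so no NR bookkeeping is needed.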

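Assuming the structure in your sample (the data lines contain no whitespace), you can keep just the first whitespace-delimited field of each line (this trims everything after the first blank but does not turn _ into -):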
awk '$0=$1' Input_file
# or with sed
sed 's/[[:space:]].*//' Input_file

Related

awk command to read a key value pair from a file

I have a file input.txt which stores information in KEY:VALUE form. I'm trying to read GOOGLE_URL from this input.txt, but my command prints only https because the separator is :. What is the problem with my grep command, and how should I print the entire URL?
SCRIPT
$> cat script.sh
#!/bin/bash
URL=`grep -e '\bGOOGLE_URL\b' input.txt | awk -F: '{print $2}'`
printf " $URL \n"
INPUT_FILE
$> cat input.txt
GOOGLE_URL:https://www.google.com/
OUTPUT
https
DESIRED_OUTPUT
https://www.google.com/
Since there are multiple : in your input, getting $2 will not work in awk because it will just give you 2nd field. You actually need an equivalent of cut -d: -f2- but you also need to check key name that comes before first :.
This awk should work for you:
awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, ""); print}' input.txt
https://www.google.com/
Or this non-regex awk approach that allows you to pass key name from command line:
awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt
Or using gnu-grep:
grep -oP '^GOOGLE_URL:\K.+' input.txt
https://www.google.com/
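The idea of splitting only at the first : can also be sketched with index(), which avoids both the field separator and regexes. The variable name k and the extra input lines are assumptions made for this illustration.

```shell
# Sample input in a temporary file (hypothetical stand-in for input.txt).
tmp=$(mktemp)
printf '%s\n' 'EXAMPLE_URL:http://www.example.com/' \
              'GOOGLE_URL:https://www.google.com/' \
              'KEY:GOOGLE_URL:' > "$tmp"

# index() returns the position of the first ":" (0 if there is none).
# Everything before it is the key, everything after it is the value,
# so later colons inside the URL are left alone.
out=$(awk -v k='GOOGLE_URL' '
  { i = index($0, ":") }
  i > 0 && substr($0, 1, i - 1) == k { print substr($0, i + 1) }
' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

The third input line shows that a key appearing later in a line is not matched, because only the text before the first : is compared.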
Please try the following, written and tested with the shown samples in GNU awk. It looks for lines starting with GOOGLE_URL: and extracts the URL, whether it begins with http or https; if you need only https, change http[s]? to https in the solution below.
awk '/^GOOGLE_URL:/{match($0,/http[s]?:\/\/.*/);print substr($0,RSTART,RLENGTH)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/^GOOGLE_URL:/{ ##Checking condition if line starts from GOOGLE_URL: then do following.
match($0,/http[s]?:\/\/.*/) ##Using match function to match http[s](s optional) : till last of line here.
print substr($0,RSTART,RLENGTH) ##Printing sub string of matched value from above function.
}
' Input_file ##Mentioning Input_file name here.
2nd solution: In case you need anything coming after first : then try following.
awk '/^GOOGLE_URL:/{match($0,/:.*/);print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Take your pick:
$ sed -n 's/^GOOGLE_URL://p' file
https://www.google.com/
$ awk 'sub(/^GOOGLE_URL:/,"")' file
https://www.google.com/
The above will work using any sed or awk in any shell on every UNIX box.
I would use GNU AWK in the following way for this task:
Let file.txt content be:
EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/
KEY:GOOGLE_URL:
Then:
awk 'BEGIN{FS="^GOOGLE_URL:"}{if(NF==2){print $2}}' file.txt
will output:
https://www.google.com/
Explanation: In GNU AWK, FS may be a regular expression, so I set it to GOOGLE_URL: anchored (^) to the beginning of the line, so that GOOGLE_URL: in the middle or at the end of a line is not treated as a separator (consider the 3rd line of input). With this FS each line has either 1 or 2 fields; the latter is the case only if the line starts with GOOGLE_URL:, so I check the number of fields (NF) and, if it is 2, print the 2nd field ($2), as the first field is empty in this case.
(tested in gawk 4.2.1)
Yet another awk alternative:
gawk -F'(^[^:]*:)' '/^GOOGLE_URL:/{ print $2 }' infile

Concatenate the sequence to the ID in fasta file

Here is my input file
>OTU1;size=4;
ATTCCGGGTTTACT
ATTCCTTTTATCGA
ATC
>OTU2;size=10;
CGGATCTAGGCGAT
ACT
>OTU3;size=5;
ATTCCCGGGATCTA
ACTTTTC
The expected output file is:
>OTU1;size=4;ATTCCGGGTTTACTATTCCTTTTATCGAATC
>OTU2;size=10;CGGATCTAGGCGATACT
>OTU3;size=5;ATTCCCGGGATCTAACTTTTC
I've tried the code from Remove line breaks in a FASTA file
but this doesn't work for me, and I am not sure how to modify the code from that post...
Any suggestion? Thanks in advance!
Here is another awk script. Using the awk internal parsing mechanism.
awk 'BEGIN{RS=">";OFS="";}NR>1{$1=$1;print ">"$0}' input.txt
Output is:
>OTU1;size=4;ATTCCGGGTTTACTATTCCTTTTATCGAATC
>OTU2;size=10;CGGATCTAGGCGATACT
>OTU3;size=5;ATTCCCGGGATCTAACTTTTC
Explanation:
awk '
BEGIN { # initialize awk internal variables
RS=">"; # set `RS`=record separator to `>`
OFS=""; # set `OFS`=output field separator to empty string.
}
NR>1 { # handle from 2nd record (1st record is empty).
$1=$1; # regenerate the output line
print ">"$0 # print out ">" with computed output line
}' input.txt
$ awk '{printf "%s%s", (/^>/ ? ors : ""), $0; ors=ORS} END{print ""}' file
>OTU1;size=4;ATTCCGGGTTTACTATTCCTTTTATCGAATC
>OTU2;size=10;CGGATCTAGGCGATACT
>OTU3;size=5;ATTCCCGGGATCTAACTTTTC
Could you please try following too.
awk -v RS=">" 'NR>1{gsub(/\n/,"");print ">"$0}' Input_file
My original attempt was awk -v RS=">" -v FS="\n" -v OFS="" 'NF>1{$1=$1;print ">"$0}' Input_file, but I later saw that this approach had already been posted by dudi boy, so I wrote a different one (the first one shown above).
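For comparison, the same joining can be sketched line by line, without changing RS. This is only an illustration on a shortened copy of the question's input; the temporary file name is an assumption.

```shell
# Shortened sample input (hypothetical stand-in for the FASTA file).
tmp=$(mktemp)
printf '%s\n' '>OTU1;size=4;' 'ATTCCGGGTTTACT' 'ATTCCTTTTATCGA' 'ATC' \
              '>OTU2;size=10;' 'CGGATCTAGGCGAT' 'ACT' > "$tmp"

out=$(awk '
  /^>/ { if (NR > 1) printf "\n" }   # a new header closes the previous record
       { printf "%s", $0 }           # append header or sequence chunk, no newline
  END  { if (NR > 0) printf "\n" }   # terminate the last record
' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Since the input is read one line at a time, this variant never holds a whole record in memory, which may matter for very long sequences.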
Similar to my answer here:
$ awk 'BEGIN{RS=">"; FS="\n"}
(FNR==1){next}
{ name=$1; seq=$0; gsub(/(^[^\n]*|)\n/,"",seq) }
{ print ">" name seq }' file1.fasta file2.fasta file3.fasta ...

awk condition for 30th column of a line is not working

My Input file looks like below,
1,,B4,3000,Rushab,UNI,20130919T22:45:05+0100,20190930T23:59:59+0100,,kapeta,,6741948090816,2285917436,971078887,1283538808965528,20181102_20001,,,,,,,,,,,,,,,C
2,,B4,3000,Rushab,UNI,20130919T22:45:05+0100,20190930T23:59:59+0100,20181006T11:57:13+0100,,vsuser,6741948090816,2285917436,971078887,1283538808965528,20181102_20001,,,,,,,,,,,,,,,H
1,,F1,100000,RAWBANK,UNI,20180416T15:25:00+0100,20190416T23:59:59+0100,,enrruac,,7522609506635,3101315044,998445487,1290161608965816,20181102_20001,,,,,,,,,,,,,,,C
4,,F1,100000,RAWBANK,UNI,20180416T15:25:00+0100,20190416T23:59:59+0100,20181007T22:25:13+0100,,vsuser,7522609506635,3101315044,998445487,1290161608965816,20181102_20001,,,,,,,,,,,,,,,H
I want to print only the lines that start with '1' and end with 'C', so I am trying the command below:
awk -F, '$1=='1' && $31=='C'{print $0}' input_file.txt
but i am not getting any output.
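Use double quotes. The inner single quotes end the shell's single-quoted string, so awk never sees them: it receives $1==1 && $31==C, where C is an unset variable that compares equal only to an empty field: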
awk -F, '$1=="1" && $31=="C"{print $0}' file
or
awk -F, '$1=="1" && $31=="C"' file
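Another way to sidestep the quoting problem is to pass the values in with -v, so the awk program itself contains no string literals. This is a sketch using two rows from the question; the variable names first and last are made up for the example.

```shell
# Two rows copied from the question's sample: one ends in C, the other in H.
row1='1,,B4,3000,Rushab,UNI,20130919T22:45:05+0100,20190930T23:59:59+0100,,kapeta,,6741948090816,2285917436,971078887,1283538808965528,20181102_20001,,,,,,,,,,,,,,,C'
row2='2,,B4,3000,Rushab,UNI,20130919T22:45:05+0100,20190930T23:59:59+0100,20181006T11:57:13+0100,,vsuser,6741948090816,2285917436,971078887,1283538808965528,20181102_20001,,,,,,,,,,,,,,,H'

# first and last are ordinary awk variables set from the shell,
# so no quotes are needed inside the single-quoted program.
out=$(printf '%s\n' "$row1" "$row2" |
      awk -F, -v first='1' -v last='C' '$1 == first && $31 == last')
printf '%s\n' "$out"
```

Only the first row is printed, since it is the only one whose 1st field is 1 and whose 31st field is C.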
As other users suggested, this can be done using a simple regex. So you can use sed as well as awk
sed '/^1,.*,C$/!d' file

How to delete first three columns in a delimited file

For example, I have a csv file as follows:
12345432|1346283301|5676438284971|13564357342151697 ...
87540258|1356433301|1125438284971|135643643462151697 ...
67323266|1356563471|1823543828471|13564386436651697 ...
and hundreds more columns, but I want to remove the first three columns and save the result to a new file (saving to the same file would be even better).
This is the result I want.
13564357342151697 ...
135643643462151697 ...
13564386436651697 ...
I have been looking and trying but I am not able to do it. And below is the code I have.
awk -F'|' '{print $1 > "newfile"; sub(/^[^|]+\|/,"")}1' old.csv > new.csv
Appreciate if someone can help me. Thank you.
You can use cut :
cut -f4- -d'|' old.csv > new.csv
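cut cannot edit a file in place, so if the result should end up in the same file, one common pattern is to write to a temporary file and move it over the original. A minimal sketch, assuming made-up sample data and temporary file names:

```shell
# Illustrative input; "$f" stands in for old.csv (the data is hypothetical).
f=$(mktemp)
printf '%s\n' 'a1|b1|c1|d1|e1' 'a2|b2|c2|d2|e2' > "$f"

# Write the trimmed columns to a temporary file, then replace the original
# only if cut succeeded (&&), so a failure leaves the input untouched.
tmp=$(mktemp)
cut -d'|' -f4- "$f" > "$tmp" && mv "$tmp" "$f"

out=$(cat "$f")
printf '%s\n' "$out"
rm -f "$f"
```

Redirecting straight back into the input (cut ... old.csv > old.csv) would truncate the file before cut reads it, which is why the intermediate file is used.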
@Heng: try:
awk -F"|" '{for(i=4;i<=NF;i++){printf("%s%s",$i,i==NF?"":"|")};print ""}' Input_file
OR
awk -F"|" '{for(i=4;i<=NF;i++){printf("%s%s",$i,i==NF?"\n":"|")};}' Input_file
You can redirect this command's output into a file as needed.
EDIT:
awk -F"|" 'FNR==1{++e;fi="REPORT_A1_"e;} {for(i=4;i<=NF;i++){printf("%s%s",$i,i==NF?"\n":"|") > fi}}' Input_file1 Input_file2 Input_file3
This is what you're looking for:
awk -F '|' '{$1=$2=$3=""; print $0}' oldfile > newfile
But the output will have leading separators (and, with the default OFS, spaces instead of | between the remaining fields), so set OFS to | and strip the leading separators with a substitution:
awk 'BEGIN{FS=OFS="|"} {$1=$2=$3=""; sub(/^\|+/,""); print}' oldFile > newFile
awk -F\| '{print $NF}' file >newfile
13564357342151697 ...
135643643462151697 ...
13564386436651697 ...

find match, print first occurrence and continue until the end of the file - awk

I have a pretty large file from which I'd like to extract only the first line of those containing my match and then continuing doing that until the end of the file. Example of input and desired output below
Input
C,4,2,5,6,8,9,5
C,4,5,4,5,4,43,6
S,4,23,567,2,4,5
S,23,4,7,78,8,9,6
S,3,5,67,8,54,56
S,4,8,9,54,3,4,52
E,2,3,213,5,8,44
E,5,7,9,67,89,33
E,54,526,54,43,53
S,9,8,9,7,9,32,4
S,5,6,4,5,67,87,88
S,4,23,5,8,5,7,3
E,4,6,4,8,9,32,23
E,43,7,1,78,9,8,65
Output
S,4,23,567,2,4,5
S,9,8,9,7,9,32,4
The match in my lines is S, which usually comes after a line that starts with either E or C. What I'm struggling with is to tell awk to print only the first line after those with E or C. Another way would be to print the first of the bunch of lines containing S. Any idea??
does this one-liner help?
awk '/^S/&&!i{print;i=1}!/^S/{i=0}' file
or more "readable":
awk -v p=1 '/^S/&&p{print;p=0}!/^S/{p=1}' file
You can use sed, like this:
sed -rn '/^(E|C)/{:a;n;/^S/!ba;p}' file
Here's a multi-line version to save in a file (e.g. u.awk):
/^[CE]/ {ON=1; next}
/^S/ {if (ON) print}
{ON=0}
Then run: awk -f u.awk inputdatafile
awk to the rescue!
$ awk '/^[CE]/{p=1} /^S/&&p{p=0;print}' file
S,4,23,567,2,4,5
S,9,8,9,7,9,32,4
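The other reading of the question, printing the first line of each run of S lines, can be sketched by remembering the first field of the previous line. The variable name prev is an assumption of this example; note that, unlike the flag-based versions, it would also print an S line at the very top of the file.

```shell
# The question's sample input in a temporary file (hypothetical stand-in for file).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
C,4,2,5,6,8,9,5
C,4,5,4,5,4,43,6
S,4,23,567,2,4,5
S,23,4,7,78,8,9,6
S,3,5,67,8,54,56
S,4,8,9,54,3,4,52
E,2,3,213,5,8,44
E,5,7,9,67,89,33
E,54,526,54,43,53
S,9,8,9,7,9,32,4
S,5,6,4,5,67,87,88
S,4,23,5,8,5,7,3
E,4,6,4,8,9,32,23
E,43,7,1,78,9,8,65
EOF

# Print an S line only when the previous line was not an S line,
# then remember the current line's first field for the next comparison.
out=$(awk -F, '
  $1 == "S" && prev != "S" { print }
  { prev = $1 }
' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Comparing the whole first field rather than matching /^S/ also avoids accidental matches on a first field that merely begins with S.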
$ awk '/^S/{if (!f) print; f=1; next} {print; f=0}' file
C,4,2,5,6,8,9,5
C,4,5,4,5,4,43,6
S,4,23,567,2,4,5
E,2,3,213,5,8,44
E,5,7,9,67,89,33
E,54,526,54,43,53
S,9,8,9,7,9,32,4
E,4,6,4,8,9,32,23
E,43,7,1,78,9,8,65