Please suggest how to do this.
Input file:
G238740
G316342
G748951
G952443
G955221
G952842
G767727
G339717
G712953
Command I tried:
awk -F";" '{ print "displayName == \""$1"\" " "||" }' input
Current output:
displayName == "G238740" ||
displayName == "G316342" ||
displayName == "G748951" ||
displayName == "G952443" ||
displayName == "G955221" ||
displayName == "G952842" ||
displayName == "G767727" ||
displayName == "G339717" ||
displayName == "G712953" ||
Desired output (how can I get the output on one single line?):
displayName == "G238740" || displayName == "G316342" || displayName == "G748951"
Use printf instead of print, so it won't add a newline. And then conditionally print the || separator on all lines but the first:
awk '{printf("%sdisplayName == \"%s\"", (NR == 1 ? "" : " || "), $1)}
END {print ""}' input
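If the whole expression is needed as one string (for example, to put parentheses around it before printing), an alternative to printf is to accumulate it in a variable and print it once in an END rule. A minimal sketch, assuming each ID is in the first whitespace-separated field; `join_ids` is a hypothetical wrapper name:

```shell
# Hypothetical helper: reads IDs (first field of each line) from the files
# given as arguments, or from stdin, and prints one "... || ..." line.
join_ids() {
  awk '
    {
      expr = expr sep "displayName == \"" $1 "\""
      sep = " || "   # sep is still empty when the first ID is appended
    }
    END { print expr }
  ' "$@"
}
```

`join_ids input` prints the single line; changing the END rule to `print "(" expr ")"` would wrap it in parentheses.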
I have one file with multiple lines (reads from a genome), sorted by their location. I want to loop over these lines and, if multiple lines have the same ID (column 4), keep either the first one if column 3 is a plus, or the last one if column 3 is a minus. This is my code, but it seems like my variable (lastID) is not properly updated after each line.
Tips are much appreciated.
awk 'BEGIN {lastline=""; lastID=""}
{if ($lastline != "" && $4 != $lastID)
{print $lastline; lastline=""};
if ($3 == "+" && $4 != $lastID)
{print $0; lastline=""}
else if ($3 == "+" && $4 == $lastID)
{lastli=""}
else if ($3 == "-")
{lastline=$0};
lastID=$4
}' file
To access the value of a variable in awk you just use the name of the variable, just like in C and most other Algol-based languages. You don't stick a $ in front of it like you would with shell. Try changing:
$lastline != "" && $4 != $lastID
to:
lastline != "" && $4 != lastID
etc.
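To see why the $ matters: in awk, $ is the field operator, so `$lastID` means "the field whose number is the current value of lastID", not the value of lastID itself. A small demonstration; the function name `show_field_ref` and the variables `n` and `id` are hypothetical:

```shell
# n holds 2, so $n is field 2 of the record while plain n is just 2.
# id holds a string that does not start with a digit; it converts to 0,
# so $id is really $0 (the whole line) and the comparison yields 1.
show_field_ref() {
  awk '{ n = 2; id = "G238740"; print n, $n, ($id == $0) }'
}
```

Feeding it the line `a b c` prints `2 b 1`. With read IDs like the ones in column 4, `$lastID` therefore ends up referring to the whole line rather than the previous ID.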
This might be what you're trying to do (your BEGIN section was doing nothing useful so I just removed it):
awk '
(lastline != "") && ($4 != lastID) {
print lastline
lastline=""
}
$3 == "+" {
if ($4 == lastID) {
lastli=""
}
else {
print $0
lastline=""
}
}
$3 == "-" {
lastline=$0
}
{ lastID=$4 }
' file
When formatted sensibly like that, you can see that lastli is never used anywhere except where it's set to "", so that's probably a bug. Maybe it's supposed to be lastline, in which case the assignment can be made common rather than being set in both the if and else legs?
You may want to use awk's own condition{statement} structure. Note that this code layout is not universally accepted, but I find it easier to read for short statements.
$ awk 'lastline!="" && $4 != lastID {print lastline; lastline=""}
$3=="+" && $4 != lastID {print; lastline=""}
$3=="+" && $4 == lastID {lastli=""}
$3=="-" {lastline=$0}
{lastID=$4}' file
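The condition{statement} approach can be exercised end to end as below. This is a sketch, assuming column 3 holds the strand (+/-), column 4 holds the read ID, and lines sharing an ID are adjacent; `dedup_reads` is a hypothetical name. Variables are referenced without $, the no-op lastli rule is dropped, and an END rule is added, because otherwise a group of "-" lines at the very end of the file stays buffered in lastline and is never printed.

```shell
# Hypothetical wrapper: keep the first line of a "+" group and the last
# line of a "-" group, for groups of adjacent lines sharing column 4.
dedup_reads() {
  awk '
    lastline != "" && $4 != lastID { print lastline; lastline = "" }  # flush previous "-" group
    $3 == "+" && $4 != lastID      { print; lastline = "" }           # first line of a "+" group
    $3 == "-"                      { lastline = $0 }                  # remember latest "-" line
                                   { lastID = $4 }
    END { if (lastline != "") print lastline }                        # flush a trailing "-" group
  ' "$@"
}
```

For the input lines `1 x + r1`, `2 x + r1`, `3 x - r2`, `4 x - r2`, `5 x + r3`, `6 x - r4`, `7 x - r4`, it keeps lines 1, 4, 5 and 7.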
Grep a pattern and select the portion of a line after the matching patterns 41572:, 90000: and 90002:
Input:
hyt : generation
str : 122344
stks : 9000233
dhy : 9000aaaa
sjyt : hist : hhh9000kkk
Count ch : 41572:47149-47999/2(14485-14910) 41584:47149-47999/2(14911-15449) 90000:47919-47999/2(15447-15477) 90002:47919-47999/2(15478-15418)
drx : 12345
Here is the code I used:
awk '
{
flag=""
for(i=1;i<=NF;i++){
if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){
flag=1
printf("%s%s",$i,i==NF?ORS:OFS)
}
}
}
!flag
' Input_file
With the code above from Mr. RavinderSingh13, I got the following output:
hyt : generation
str : 122344
stks : 9000233
dhy : 9000aaaa
sjyt : hist : hhh9000kkk
41572:47149-47999/2(14485-14910) 90000:47919-47999/2(15447-15477) 90002:47919-47999/2(15478-15418)
drx : 12345
I need the following output:
hyt : generation
str : 122344
stks : 9000233
dhy : 9000aaaa
sjyt : hist : hhh9000kkk
Count ch : 41572:47149-47999/2(14485-14910) 90000:47919-47999/2(15447-15477) 90002:47919-47999/2(15478-15418)
drx : 12345
Thanks in advance
EDIT: Adding solution as per OP's new question.
awk '{flag="";for(i=1;i<=NF;i++){if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){flag=1;printf("%s%s",$i,i==NF?ORS:OFS)}}} !flag'
OR
awk '
{
flag=""
for(i=1;i<=NF;i++){
if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){
flag=1
printf("%s%s",$i,i==NF?ORS:OFS)
}
}
}
!flag
' Input_file
Could you please try the following (the requirement is not fully clear, so I am going by the shown sample output only).
awk 'NF>1{for(i=1;i<=NF;i++){if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){printf("%s%s",$i,i==NF?ORS:OFS)}};next} 1' Input_file
Adding a non-one-liner form of the solution too.
awk '
NF>1{
for(i=1;i<=NF;i++){
if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){
printf("%s%s",$i,i==NF?ORS:OFS)
}
}
next
}
1
' Input_file
Explanation: adding an explanation for the above code too.
awk '
NF>1{ ##Checking if NF (number of fields) is greater than 1.
for(i=1;i<=NF;i++){ ##Looping through the fields from 1 to NF.
if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){ ##Checking if the field contains 41572 OR 90000 OR 90002 (unanchored match), then do the following.
printf("%s%s",$i,i==NF?ORS:OFS) ##Print the field, followed by ORS (newline) if i==NF, else OFS (space).
}
}
next ##next skips all further statements for this line.
}
1 ##1 (always true) prints the current line; only lines with NF<=1 reach here.
' Input_file ##Mentioning Input_file name here.
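A variation that keeps the `Count ch :` label and leaves the other lines untouched is to rebuild each line, dropping only the `ID:value` fields whose ID is not wanted. This is a sketch under one assumption: the unwanted entries all look like `<digits>:<something>`. Anchoring the wanted IDs as `^(41572|90000|90002):` also avoids substring hits, since an unanchored `/90002/` matches inside values such as `9000233`. The name `keep_ids` is hypothetical.

```shell
# Hypothetical helper: keep every field except ID:value entries for IDs
# other than 41572, 90000 and 90002; lines without such entries pass through.
keep_ids() {
  awk '
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        # keep a field unless it is an ID:value entry for an unwanted ID
        if ($i !~ /^[0-9]+:/ || $i ~ /^(41572|90000|90002):/)
          out = (out == "" ? $i : out OFS $i)
      }
      print out
    }
  ' "$@"
}
```

Usage: `keep_ids Input_file`. Because each line is rebuilt with OFS, runs of spaces between fields are collapsed to a single space.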
I am opening up a file and checking if the items in columns 1 & 8 match certain specs. If yes, write output to a file x. If the items in column 1 match specs but column 8 does not match the specs, write output to file y.
I am defining multiple variables (awk -v v=$var,f1=$file,f2=$output), and I believe how I reference f1 & f2 is the problem. If I remove the quotes:
print $0 >> f2
awk: cmd. line:5: (FILENAME=- FNR=2) fatal: expression for `>>' redirection has null string value
If I put in a $:
print $0 >> $f2
I end up with a bunch of files with odd names that I don't want, and the files I do want are empty (except for the echoed line).
If I put it in quotes (""):
print $0 >> "f2"
The files I want are almost empty, and it creates a file called f2.
#!/bin/bash
output="output.txt"
echo -e "C1\tSeqID\tAminoAcid\tCD1\tCD2\tCD3\tGene\tEnvironment\tFilename" > $output
inputFile="input.txt.gz"
for var in A B C D E F G H I J K L
do
file=$var".txt"
echo -e "C1\tSeqID\tAA\tCD1\tCD2\tCD3\tGene\tEnvironment\tFilename" > $file
#---Wrong, forgot to catch $8 != v
#zcat $inputFile | awk -v v=$var '{
# if ($8 == v && ($1 == "V1" || $1 == "V2" || $1 == "V3" || $1 == "V4" || $1 == "V5" || $1 == "V6" || $1 == "V7" || $1 == "V8" || $1 == "V9" || $1 == "V10"))
# print $0
# }' | tee -a $file $output
zcat $inputFile | awk -v v=$var,f1=$file,f2=$output '{
if ($8 == v && ($1 == "V1" || $1 == "V2" || $1 == "V3" || $1 == "V4" || $1 == "V5" || $1 == "V6" || $1 == "V7" || $1 == "V8" || $1 == "V9" || $1 == "V10"))
print $0 >> "file"
else if ($8 != v && ($1 == "V1" || $1 == "V2" || $1 == "V3" || $1 == "V4" || $1 == "V5" || $1 == "V6" || $1 == "V7" || $1 == "V8" || $1 == "V9" || $1 == "V10"))
print $0 >> "f2"
}'
gzip $file
done
gzip $output
I could run through the loop with two separate awk commands that write to different files. However, it is a very large file (4G compressed), so it is more efficient to use my current approach (or something similar to it). Any guidance on how to reference the 2nd and 3rd variables is greatly appreciated.
Use a separate -v option for each variable:
awk -v v="$var" -v f1="$file" -v f2="$output" '...'
% awk -v v=qw,f1=we,f2=as 'BEGIN{print v, "*", f1, "*", f2}'
qw,f1=we,f2=as * *
% awk -v v=qw -v f1=we -v f2=as 'BEGIN{print v, "*", f1, "*", f2}'
qw * we * as
%
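Applied to the loop in the question, the awk variables are then used as plain names in the redirection, with no quotes and no $. A sketch, assuming the same column layout as the question (V1..V10 in column 1, the group letter in column 8); `split_rows` is a hypothetical name, and the regex is just a shorter form of the `$1 == "V1" || ... || $1 == "V10"` chain:

```shell
# Hypothetical helper: $1 = group letter, $2 = file for rows whose column 8
# equals it, $3 = file for the other V1..V10 rows. Reads stdin.
split_rows() {
  awk -v v="$1" -v f1="$2" -v f2="$3" '
    $1 ~ /^V([1-9]|10)$/ {   # same as $1=="V1" || ... || $1=="V10"
      if ($8 == v) {
        print >> f1          # f1 is an awk variable: no quotes, no $
      } else {
        print >> f2
      }
    }
  '
}
```

Inside the question's loop this would be called as `zcat "$inputFile" | split_rows "$var" "$file" "$output"`; `>>` appends after the header lines written by echo.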
I am having an issue assigning the output of grep (called via system() in nawk) to a variable.
nawk '{
CITIZEN_COUNTRY_NAME = "INDIA"
CITIZENSHIP_CODE=system("grep "CITIZEN_COUNTRY_NAME " /tmp/OFAC/country_codes.config | cut -d # -f1")
}'/tmp/*****
The value IND is displayed on the console, but when I print it with printf, the value of CITIZENSHIP_CODE is 0. Can you please help me here?
printf("Country Tags|%s|%s\n", CITIZEN_COUNTRY_NAME ,CITIZENSHIP_CODE)
Contents of the country_codes.config file:
IND#INDIA
IND#INDIB
CAN#CANADA
system() returns the exit status of the called command (0 on success, which is the 0 you see); the command's output goes straight to the console and is not returned to awk (or nawk). To get the output, you want to use getline directly. For example, you might rewrite your script:
awk ' {
file = "/tmp/OFAC/country_codes.config";
CITIZEN_COUNTRY_NAME = "INDIA";
FS = "#";
# getline returns -1 if the file cannot be read, so test for > 0
# to avoid an endless loop
while( (getline < file) > 0 ) {
if( $0 ~ CITIZEN_COUNTRY_NAME ) {
CITIZENSHIP_CODE = $1;
}
}
close( file );
}'
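If you would rather keep the grep | cut pipeline, awk's `command | getline var` form reads the command's output into a variable. A sketch, assuming a config file in the `CODE#NAME` format shown above; `lookup_code` is a hypothetical name and the path is passed in as an argument:

```shell
# Hypothetical helper: $1 = country name, $2 = config file path.
# "cmd" | getline code reads the first output line of cmd into code.
lookup_code() {
  awk -v name="$1" -v conf="$2" 'BEGIN {
    cmd = "grep \"#" name "$\" \"" conf "\" | cut -d \"#\" -f1"
    if ((cmd | getline code) > 0)   # > 0 means a line was read
      print code
    close(cmd)                      # close the pipe so it can be rerun
  }'
}
```

With the config file above, `lookup_code INDIA /tmp/OFAC/country_codes.config` would print `IND`. The trailing `$` in the grep pattern stops `INDIA` from also matching a longer name that starts with it.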
Pre-load the config file with awk:
nawk '
NR == FNR {
split($0, x, "#")
country_code[x[2]] = x[1]
next
}
{
CITIZEN_COUNTRY_NAME = "INDIA"
if (CITIZEN_COUNTRY_NAME in country_code) {
value = country_code[CITIZEN_COUNTRY_NAME]
} else {
value = "null"
}
print "found " value " for country name " CITIZEN_COUNTRY_NAME
}
' country_codes.config filename
I wrote the following awk command (it prints the lines where $1 matches VAL_1 and $2 matches VAL_2):
awk -v VAL_1=$NET -v VAL_2=$NET_SPEED '$1 == VAL_1 && $2 == VAL_2 ' file
How can I add a print command to the awk script so that it prints the word MATCH if $1 == VAL_1 and $2 == VAL_2?
lidia
$1 == VAL_1 && $2 == VAL_2 { print "MATCH" }
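Put together with the -v assignments from the question, the full command looks like the sketch below. `check_match` is a hypothetical wrapper; quoting the shell values keeps ones containing spaces or glob characters intact.

```shell
# Hypothetical wrapper: shell args $1 and $2 are the values to look for,
# $3 is the file. Inside the awk program, $1 and $2 are awk fields.
check_match() {
  awk -v VAL_1="$1" -v VAL_2="$2" '$1 == VAL_1 && $2 == VAL_2 { print "MATCH" }' "$3"
}
```

Usage: `check_match "$NET" "$NET_SPEED" file` prints MATCH once for every line whose first two fields match.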