Use standard output as an input in awk [duplicate] - awk

I have a file that I cannot modify due to permission issues.
So I can only view the file with a tool whose view option prints its content to standard output as 8 tab-delimited columns:
foo-tool view file1.txt
ALICE . CANDY 1990 . 76 76 78
MARK . CARAMEL 1991 . 45 88 88
CLAIRE . SALTY XXX . 77 82 12
I have another file whose 1st, 6th and 7th columns I want to compare with the 1st, 6th and 7th columns of file1.txt; whenever these columns match, I want to add the 3rd and 4th columns of file1.txt to file2.txt.
file2.txt
ALICE . CANDY 1990 . 76 76 97
MARK . CARAMEL 1991 . 45 88 87
BLAINE . SALTY XXX . 77 82 10
If I were able to open file1.txt directly, rather than only see its standard output, I would do:
awk 'NR==FNR { a[$1,$6,$7] = $0; next }($1,$6,$7) in a { print a[$1,$6,$7], $3, $4 }' file1.txt file2.txt
So the output would be :
ALICE . CANDY 1990 . 76 76 78 CANDY 1990
MARK . CARAMEL 1991 . 45 88 88 CARAMEL 1991
But, since I cannot use the standard output of file1.txt as a file, I am stuck on how to proceed.
I tried to open it and direct its standard output but it did not work:
foo-tool view file1.txt | awk 'NR==FNR { a[$1,$6,$7] = $0; next }($1,$6,$7) in a { print a[$1,$6,$7], $3, $4 }' ARG[$1] file2.txt
How can I pass the standard output as a file input in awk as one of the files?

"-" (a dash) is a special filename that means standard input. This convention is used in many Unix tools, and especially awk
Your command line must then be:
$ foo-tool view file1.txt | awk '{ your_program }' - file2.txt
Alternatively, if your system supports it (Linux does), you can use the /dev/stdin file:
$ foo-tool view file1.txt | awk '{ your_program }' /dev/stdin file2.txt
You can also use "process substitution" if your shell supports it (bash, ksh and zsh do):
$ awk '{ your_program }' <(foo-tool view file1.txt) file2.txt
It may be useful if you have to process the output of several distinct commands, like:
$ awk '{ your_program }' <(foo-tool view file1.txt) <(foo-tool view file2.txt)
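As a minimal sketch of the "-" convention applied to the question's data: the shell function view_file1 below is a hypothetical stand-in for `foo-tool view file1.txt` (it only writes to standard output), and file2.txt is recreated in a temporary directory. Both are assumptions made so the example runs on its own.

```shell
# Hypothetical stand-in for `foo-tool view file1.txt`: writes only to stdout.
view_file1() {
  printf '%s\n' \
    'ALICE . CANDY 1990 . 76 76 78' \
    'MARK . CARAMEL 1991 . 45 88 88' \
    'CLAIRE . SALTY XXX . 77 82 12'
}

# Recreate file2.txt from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'ALICE . CANDY 1990 . 76 76 97' \
  'MARK . CARAMEL 1991 . 45 88 87' \
  'BLAINE . SALTY XXX . 77 82 10' > "$tmpdir/file2.txt"

# "-" makes awk read standard input as its first input "file", so
# NR==FNR is true only while the piped data is being read.
joined=$(view_file1 | awk '
  NR == FNR { a[$1,$6,$7] = $0; next }
  ($1,$6,$7) in a { print a[$1,$6,$7], $3, $4 }
' - "$tmpdir/file2.txt")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

The same awk program works unchanged with /dev/stdin in place of "-" on systems that provide it.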

Print the data from second column [duplicate]

I have a file called fixed.txt, as shown below:
Column1 | Column2 | Column3
Total expected ratio | 53 | 68
Total number|count number | 54 | 72
reset|print|total | 64 | 84
I am trying to print column 2, as below:
Fixed.txt:
53
54
64
I tried the below script but I am not getting the desired output.
#!/bin/bash
for d in fixed.txt
do
awk -F" | "
NR>1
awk '{ print $2 }' fixed.txt
done
Can we use a pipe (|) and a space together as a delimiter?
1st solution: Could you please try the following? It is written based on your shown samples, and was written and tested at
https://ideone.com/WoF40j
awk '
BEGIN{
FS="|"
}
{
print $(NF-1)+0
}
' Input_file
2nd solution: To use spaces and | together as the field delimiter, one could run the following.
awk -F'[[:blank:]]+\\|[[:blank:]]+' '{print $(NF-1)}' Input_file
OR
awk -F' +\\| +' '{print $(NF-1)}' Input_file
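The desired output in the question omits the header row, so here is a hedged sketch that combines a "spaces, pipe, spaces" field separator with a header skip. The file is recreated from the question's sample in a temporary location; the `[|]` bracket expression is just another way to write a literal pipe and is my choice, not part of the answers above.

```shell
# Recreate the sample fixed.txt from the question in a scratch file.
fixed=$(mktemp)
cat > "$fixed" <<'EOF'
Column1 | Column2 | Column3
Total expected ratio | 53 | 68
Total number|count number | 54 | 72
reset|print|total | 64 | 84
EOF

# FS is a regex: one or more spaces, a literal pipe, one or more spaces.
# A bare "|" inside the first column has no surrounding spaces, so it
# does not split that column. FNR > 1 skips the header line.
col2=$(awk -F' +[|] +' 'FNR > 1 { print $2 }' "$fixed")

printf '%s\n' "$col2"
rm -f "$fixed"
```

Because the first column is never split on its embedded pipes, $2 can be used directly instead of $(NF-1).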

How to compare 2 files having multiple occurrences of a number and output the additional occurrence?

Currently I am using an awk script to compare 2 files containing random numbers in non-sequential order.
It works perfectly, but there is one further condition I would like to fulfill.
Current awk function
awk '
{
$0=$0+0
}
FNR==NR{
a[$0]
next
}
($0 in a){
b[$0]
next
}
{ print }
END{
for(j in a){
if(!(j in b)){ print j }
}
}
' compare1.txt compare2.txt
What does the function accomplish currently?
It outputs a list of all the numbers that are present in compare1 but not in compare2, and vice versa.
If a number has leading zeros, they are ignored while comparing (basically, the absolute value of the number must be different for it to be treated as a mismatch). Example: 3 should be considered matching with 003, 014 with 14, 008 with 8, etc.
As required, it also considers a number matched even if its occurrences are not on the same line in both files.
Required additional condition
In its current form, this function works in such a way that if one file has multiple occurrences of a number and the other file has even one occurrence of that same number, it considers the number matched for all repetitions.
I need the awk function to be edited to output any additional occurrence of a number.
cat compare1.txt
57
11
13
3
889
014
91
775
cat compare2.txt
003
889
13
14
57
12
90
775
775
Expected output
12
90
11
91
**775**
The number marked at the end is currently not shown in the output of my present awk function (2 occurrences - 1 occurrence).
As mentioned at https://stackoverflow.com/a/62499047/1745001, this is the job that comm exists to do:
$ comm -3 <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort)
11
12
775
90
91
and to get rid of the white space:
$ comm -3 <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort) |
awk '{print $1}'
11
12
775
90
91
You just need to count the occurrences and account for them when matching:
$ awk '{k=$0+0}
NR==FNR {a[k]++; next}
!(k in a && a[k]-->0);
END {for(k in a) while(a[k]-->0) print k}' file1 file2
12
90
775
11
91
Note that, as in your original script, there is no absolute value comparison, which you can easily add by changing how k is computed in the first line.
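One way the change to k might look, assuming "absolute value" means that a leading minus sign should also be ignored (an assumption on my part; the question's own examples only involve leading zeros, which $0+0 already handles). The input files a.txt and b.txt and their contents are made up for illustration, and the output is piped through sort -n only to make the order predictable.

```shell
# Hypothetical sample inputs, not the files from the question.
dir=$(mktemp -d)
printf '%s\n' 57 -3 014 775 > "$dir/a.txt"
printf '%s\n' 003 14 775 775 12 > "$dir/b.txt"

extra=$(awk '
  { k = $0 + 0; if (k < 0) k = -k }      # numeric value with the sign dropped
  NR == FNR { seen[k]++; next }          # first file: count each value
  !(k in seen && seen[k]-- > 0)          # second file: print lines with no partner left
  END { for (k in seen) while (seen[k]-- > 0) print k }  # leftovers from the first file
' "$dir/a.txt" "$dir/b.txt" | sort -n)

printf '%s\n' "$extra"
rm -rf "$dir"
```

Here -3 pairs with 003 and 014 with 14, the second 775 is reported as an extra occurrence, and 12 and 57 are reported as unmatched.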

Awk pass variable containing columns to be printed

I want to pass a variable to awk that specifies which columns to print from a file.
In this trivial case file.txt contains a single line
11 22 33 44 55
This is what I've tried:
awk -v a='2/4' -v b='$2/$4' '{print a"\n"$a"\n"b"\n"$b}' file.txt
output:
2/4
22
$2/$4
11 22 33 44 55
desired output:
0.5
Is there any way to do this type of "eval" of variable as a command?
Here is one method for dividing columns specified in variables:
$ awk -v num=2 -v denom=4 '{print $num/$denom}' file.txt
0.5
If you trust the person who creates the shell variable b, then here is a method that offers flexibility:
$ b='$2/$4'; awk "{print $b}" file.txt
0.5
$ b='$1*$2'; awk "{print $b}" file.txt
242
$ b='$2,$2/$4,$5'; awk "{print $b}" file.txt
22 0.5 55
The flexibility here is due to the fact that b can contain any awk code. This approach requires that you trust the creator of b.
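If only a selection of columns is needed rather than arbitrary expressions, a list of column numbers can be passed with -v and split inside awk, which avoids running untrusted text as awk code. This is a sketch under that assumption; the variable name cols and its comma-separated format are hypothetical choices, not something from the question.

```shell
# cols is a hypothetical comma-separated list of column numbers to print.
picked=$(printf '%s\n' '11 22 33 44 55' |
  awk -v cols='2,4,5' '
    BEGIN { n = split(cols, c, ",") }   # c[1..n] hold the requested column numbers
    {
      line = ""
      for (i = 1; i <= n; i++)
        line = line (i > 1 ? OFS : "") $(c[i])   # $(c[i]) is the field numbered c[i]
      print line
    }')

printf '%s\n' "$picked"
```

Since cols is only ever used as data (field numbers), its content cannot inject awk statements the way the double-quoted program can.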

awk and sprintf to zero fill

Using awk and sprintf, how can I zero fill both before and after a decimal point?
input
11
12.2
9.6
output
110
122
096
I can get either using these, but not both
sprintf("%.1f", $1)
output
110
122
96
sprintf("%03d", $1)
output
011
012
096
x = sprintf("%06.3f", 1.23)
Output:
$ awk 'BEGIN{x = sprintf("%06.3f", 1.23); print x}'
01.230
$
I really can't tell from your question but maybe one of these does whatever it is you want:
$ cat file
11
12.2
9.6
$ awk '{ x=sprintf("%03d",$0*10); print x }' file
110
122
096
$ awk '{ x=sprintf("%04.1f",$0); print x }' file
11.0
12.2
09.6
Obviously you could just use printf with no intermediate variable but you asked for sprintf().
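A small variation on the %03d approach, offered as a sketch: %d truncates, so a product that floating-point arithmetic leaves slightly below a whole number could come out one too low, whereas %.0f rounds to the nearest integer. For the three sample values both formats give the same result.

```shell
# Multiply by 10, round with %.0f, and zero-pad to a width of 3.
padded=$(printf '%s\n' 11 12.2 9.6 |
  awk '{ printf "%03.0f\n", $0 * 10 }')

printf '%s\n' "$padded"
```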

awk + Need to print everything (all rest fields) except $1 and $2

I have the following file and I need to print everything except $1 and $2 by awk
File:
INFORMATION DATA 12 33 55 33 66 43
INFORMATION DATA 45 76 44 66 77 33
INFORMATION DATA 77 83 56 77 88 22
...
the desirable output:
12 33 55 33 66 43
45 76 44 66 77 33
77 83 56 77 88 22
...
Well, given your data, cut should be sufficient:
cut -d' ' -f3- infile
Although it adds an extra space at the beginning of each line compared to yael's expected output, here is a shorter and simpler awk based solution than the previously suggested ones:
awk '{$1=$2=""; print}'
or even:
awk '{$1=$2=""}1'
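The leading blanks left behind by emptying $1 and $2 can be trimmed with sub(). A minimal sketch, using two of the question's lines as input:

```shell
# Blank the first two fields (awk rebuilds $0 with OFS), then strip
# the leading spaces this leaves at the start of the record.
rest=$(printf '%s\n' \
  'INFORMATION DATA 12 33 55 33 66 43' \
  'INFORMATION DATA 45 76 44 66 77 33' |
  awk '{ $1 = $2 = ""; sub(/^ +/, "") } 1')

printf '%s\n' "$rest"
```

Note that assigning to a field makes awk rejoin the record with single spaces, so any runs of whitespace between the remaining fields are collapsed.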
$ cat t
INFORMATION DATA 12 33 55 33 66 43
INFORMATION DATA 45 76 44 66 77 33
INFORMATION DATA 77 83 56 77 88 22
$ awk '{for (i = 3; i <= NF; i++) printf "%s ", $i; print ""}' t
12 33 55 33 66 43
45 76 44 66 77 33
77 83 56 77 88 22
danben's answer leaves a trailing space at the end of the resulting string, so the correct way to do it would be:
awk '{for (i=3; i<NF; i++) printf "%s ", $i; print $NF}' filename
If the first two words don't change, probably the simplest thing would be:
awk -F 'INFORMATION DATA ' '{print $2}' t
Here's another awk solution, that's more flexible than the cut one and is shorter than the other awk ones. Assuming your separators are single spaces (modify the regex as necessary if they are not):
awk --posix '{sub(/([^ ]* ){2}/, ""); print}'
If Perl is an option:
perl -lane 'splice @F,0,2; print join " ",@F' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print it
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-e execute the perl code
splice @F,0,2 cleanly removes columns 0 and 1 from the @F array
join " ",@F joins the elements of the @F array, using a space in-between each element
Variation for csv input files:
perl -F, -lane 'splice @F,0,2; print join " ",@F' file
This uses the -F field separator option with a comma