awk search a line within a file - awk

I am trying to find occurrences of a STRING in another file.
First I extract exactly the string I want to search for:
grep STRING test.txt | cut -d"," -f3 | tr -d ' '
Now I proceed to search for it in the other file, so my command is:
grep STRING test.txt | cut -d"," -f3 | tr -d ' ' | awk '/$0/' temp.txt
I get 0 rows of output, but comparing manually I do find strings common to both files.

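You can't pipe like that: awk is given temp.txt as its input file, so it never reads the pipe, and /$0/ is a fixed regex, not the piped text. You'd need to use command substitution; something like: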
grep $(grep STRING test.txt | cut -d"," -f3 | tr -d ' ') temp.txt
Alternatively, use awk like this:
awk -F, 'FNR==NR && /STRING/ { gsub(/ /,""); a[$3]; next } FNR!=NR { for (i in a) if ($0 ~ i) { print; next } }' test.txt temp.txt
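If the extraction step can print more than one value, the unquoted command substitution passes the extra values to grep as file names. A minimal sketch of an alternative, assuming the values should be matched as fixed strings and that your grep accepts `-f -` to read patterns from standard input; the sample file contents are invented for illustration:

```shell
# Hypothetical sample data; the question does not show the real files.
dir=$(mktemp -d)
printf '%s\n' 'x, STRING, abc 1' 'y, other, zzz' 'x, STRING, def2' > "$dir/test.txt"
printf '%s\n' 'line with abc1 inside' 'unrelated line' 'def2 here' > "$dir/temp.txt"

# Extract the third comma-separated field without spaces, then let grep
# read those values as fixed-string patterns from standard input (-f -).
result=$(grep STRING "$dir/test.txt" | cut -d',' -f3 | tr -d ' ' |
         grep -F -f - "$dir/temp.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

With this sample data, the two lines of temp.txt that contain abc1 or def2 are printed.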

Related

Need to retrieve a value from an HL7 file using awk

In a Linux script program, I've got the following awk command for other purposes and to rename the file.
cat $edifile | awk -F\| '
{ OFS = "|"
print $0
} ' | tr -d "\012" > $newname.hl7
While this is happening, I'd like to grab the 5th field of the MSH segment and save it for later use in the script. Is this possible?
If not, how could I do it earlier or later?
Example of the segment.
MSH|^~\&|business1|business2|/u/tmp/TR0049-GE-1.b64|routing|201811302126||ORU^R01|20181130212105810|D|2.3
What I want to do is retrieve the path and file name in MSH 5 and concatenate it to the end of the new file.
I've used this to capture the data but no luck. If fpth is getting set, there is no evidence of it and I don't have the right syntax for an echo within the awk phrase.
cat $edifile | awk -F\| '
{ OFS = "|"
{fpth=$(5)}
print $0
} ' | tr -d "\012" > $newname.hl7
any suggestions?
Thank you!
Try
filename=$(awk -F'|' '{print $5}' "$edifile" | head -n 1)
You can skip the piping through head if the file is a single line
First of all, note that the awk command in your first piece of code does nothing useful:
$ cat $edifile | awk -F\| ' { OFS = "|"; print $0 }' | tr -d "\012" > $newname.hl7
This is totally equivalent to
$ cat $edifile | tr -d "\012" > $newname.hl7
because OFS is only used to rebuild $0 when you modify a field.
Example:
$ echo "a|b|c" | awk -F\| '{OFS="/"; print $0}'
a|b|c
$ echo "a|b|c" | awk -F\| '{OFS="/"; $1=$1; print $0}'
a/b/c
I understand that you have an HL7 file containing a single line that starts with the string "MSH", and that you want to store the 5th field of that line. This can be done as follows:
fpth=$(awk -v outputfile="${newname}.hl7" '
BEGIN{FS="|"; ORS="" }
($1 == "MSH"){ print $5 }
{ print $0 > outputfile }' $edifile)
I have set ORS to an empty string, which is equivalent to tr -d "\012". The above works well as long as there is only a single MSH line in your file.
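The approach can be tried end to end on the sample segment from the question. This is only a sketch: the temporary directory, the extra PID line, and the values of edifile and newname are made up for the demonstration.

```shell
dir=$(mktemp -d)
edifile="$dir/in.edi"        # hypothetical input path
newname="$dir/out"           # hypothetical output base name
printf '%s\n' \
  'MSH|^~\&|business1|business2|/u/tmp/TR0049-GE-1.b64|routing|201811302126||ORU^R01|20181130212105810|D|2.3' \
  'PID|1||12345' > "$edifile"

# Field 5 of the MSH line goes to stdout (captured in fpth); every line is
# also written, without its newline, to the output file.
fpth=$(awk -v outputfile="${newname}.hl7" '
  BEGIN { FS = "|"; ORS = "" }
  $1 == "MSH" { print $5 }
  { print $0 > outputfile }' "$edifile")

printf 'fpth=%s\n' "$fpth"
joined=$(cat "${newname}.hl7")
rm -rf "$dir"
```

Because stdout carries only the MSH field, the variable receives the path while the joined lines land in the .hl7 file.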

Replace end of line with comma and put parenthesis in sed/awk

I am trying to process the contents of a file from this format:
this1,EUR
that2,USD
other3,GBP
to this format:
this1(EUR),that2(USD),other3(GBP)
The result should be a single line.
As of now I have come up with this circuit of commands that works fine:
cat myfile | sed -e 's/,/\(/g' | sed -e 's/$/\)/g' | tr '\n' , | awk '{print substr($0, 0, length($0)- 1)}'
Is there a simpler way to do the same by just an awk command?
Another awk:
$ awk -F, '{ printf "%s%s(%s)", c, $1, $2; c = ","} END { print ""}' file
this1(EUR),that2(USD),other3(GBP)
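This one-liner relies on a common idiom: the separator variable c is empty for the first record and set to a comma afterwards, so no trailing comma has to be trimmed. A minimal sketch on the question's sample data (the temporary file exists only for the demonstration):

```shell
file=$(mktemp)
printf '%s\n' 'this1,EUR' 'that2,USD' 'other3,GBP' > "$file"

out=$(awk -F, '
  { printf "%s%s(%s)", c, $1, $2   # c is empty on the first line
    c = "," }                      # every later item is preceded by a comma
  END { print "" }                 # terminate the single output line
' "$file")
printf '%s\n' "$out"
rm -f "$file"
```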
The following awk may also help:
awk -F, '{val=val?val OFS $1"("$2")":$1"("$2")"} END{print val}' OFS=, Input_file
Toying around with separators and gsub:
$ awk 'BEGIN{RS="";ORS=")\n"}{gsub(/,/,"(");gsub(/\n/,"),")}1' file
this1(EUR),that2(USD),other3(GBP)
Explained:
$ awk '
BEGIN {
RS="" # record ends in an empty line, not newline
ORS=")\n" # the last )
}
{
gsub(/,/,"(") # replace commas with (
gsub(/\n/,"),") # and newlines with ),
}1' file # output
Using paste+sed
$ # paste -s will combine all input lines to single line
$ seq 3 | paste -sd,
1,2,3
$ paste -sd, ip.txt
this1,EUR,that2,USD,other3,GBP
$ # post processing to get desired format
$ paste -sd, ip.txt | sed -E 's/,([^,]*)(,?)/(\1)\2/g'
this1(EUR),that2(USD),other3(GBP)

How to merge these codes, awk then cut

I am using awk in Debian.
input
11.22.33.44#55878:
11.22.33.43#55879:
...
...
(smtp:55.66.77.88)
(smtp:55.66.77.89)
...
...
cpe-33-22-11-99.buffalo.res.rr.com[99.11.22.33]
cpe-34-22-11-99.buffalo.res.rr.com[99.11.22.34]
...
Part of the sh script (running on Debian):
awk '/#/ {print > "file1";next} \
/smtp/ {print > "file2";next} \
{print > "file7"}' input
#
if [ -s file1 ] ; then
#IP type => 11.22.33.44#55878:
cut -d'#' -f1 file1 >> output
rm -f file1
fi
#
if [ -s file2 ] ; then
#IP type => (smtp:55.66.77.88)
cut -d':' -f2 file2 | cut -d')' -f1 >> output
rm -f file2
fi
#
if [ -s file7 ] ; then
#IP type => cpe-33-22-11-99.buffalo.res.rr.com[99.11.22.33]
cut -d'[' -f2 file7 | cut -d']' -f1 >> output
rm -f file7
fi
then output
11.22.33.44
11.22.33.43
55.66.77.88
55.66.77.89
99.11.22.33
99.11.22.34
Is it possible to merge these steps using only awk, something like:
awk '/#/ {print | cut -d'#' -f1 > "file1";next} \
/smtp/ {print | cut -d':' -f2 | cut -d')' -f1 > "file2";next} \
{print | cut -d'[' -f2 file7 | cut -d']' > "file7"}' input
I am a newbie and have no idea how to do this.
Searching existing questions did not help.
Any hints?
Thanks.
Best regards.
$ awk -F'[][()#]|smtp:' '/#/{print $1;next} /smtp/{print $3;next} /\[/{print $2}' input
11.22.33.44
11.22.33.43
55.66.77.88
55.66.77.89
99.11.22.33
99.11.22.34
To save this in the file output:
awk -F'[][()#]|smtp:' '/#/{print $1;next} /smtp/{print $3;next} /\[/{print $2}' input >output
How it works
-F'[][()#]|smtp:'
This sets the field separator to (a) any of the characters ][()# or (b) the string smtp:.
/#/{print $1;next}
If the line contains #, then print the first field and skip to the next line.
/smtp/{print $3;next}
If the line contains smtp, then print the third field and skip to the next line.
/\[/{print $2}
If the line contains [, then print the second field.
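To check which field holds the address for each kind of line, it can help to print every field with its number. A small sketch using three lines from the question's input; the angle brackets are only there to make empty fields visible:

```shell
fields=$(printf '%s\n' '11.22.33.44#55878:' '(smtp:55.66.77.88)' \
    'cpe-33-22-11-99.buffalo.res.rr.com[99.11.22.33]' |
  awk -F'[][()#]|smtp:' '{
    line = ""
    for (i = 1; i <= NF; i++) line = line i "=<" $i "> "
    print line
  }')
printf '%s\n' "$fields"
```

The address shows up as field 1 for the # lines, field 3 for the smtp lines, and field 2 for the bracketed lines, matching the print statements above.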
Variation
There is more than one way to solve this problem. For example, using a slightly different field separator, we can still get the desired output:
$ awk -F'[][()#:]' '/#/{print $1;next} /smtp/{print $3;next} /\[/{print $2}' input
11.22.33.44
11.22.33.43
55.66.77.88
55.66.77.89
99.11.22.33
99.11.22.34

match duplicate string before a specified delimiter

cat test.txt
serverabc.test.net
serverabc.qa.net
serverabc01.test.net
serverstag.staging.net
serverstag.test.net
Here I need to match the duplicate strings that appear just before the first delimiter '.'.
The expected output is shown below, because the strings "serverabc" and "serverstag" are duplicated. Please help.
serverabc.test.net
serverabc.qa.net
serverstag.staging.net
serverstag.test.net
awk to the rescue!
$ awk -F\. '{c[$1]++; a[$1]=a[$1]?a[$1]RS$0:$0}
END{for(k in c) if(c[k]>1) print a[k]}' file
serverabc.test.net
serverabc.qa.net
serverstag.staging.net
serverstag.test.net
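Note that for (k in c) visits keys in an unspecified order, so the groups may be printed in a different order than shown. A hedged alternative sketch, swapping in the two-pass idiom (the same file is read twice) so that lines come out in input order; the temporary file exists only for the demonstration:

```shell
file=$(mktemp)
printf '%s\n' serverabc.test.net serverabc.qa.net serverabc01.test.net \
  serverstag.staging.net serverstag.test.net > "$file"

# Pass 1 (NR == FNR): count each name before the first dot.
# Pass 2: print the lines whose name was seen more than once.
dups=$(awk -F. 'NR == FNR { c[$1]++; next } c[$1] > 1' "$file" "$file")
printf '%s\n' "$dups"
rm -f "$file"
```

The trade-off is that the input must be a regular file that can be read twice, not a pipe.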
If it is not going to be used a lot, I would probably just do something like this:
cut -f1 -d\. foo.txt | sort |uniq -c | grep -v " 1 " | cut -c 9-|sed 's/\(.*\)/^\1\\./' > dup.host
grep -f dup.host foo.txt
serverabc.test.net
serverabc.qa.net
serverstag.staging.net
serverstag.test.net

Sort and print in line

Input:
54578787 -58 1
6578999 -658- 3
1352413 -541- 11
4564564 -23- 11
654564 -65- 3
6543564 -65- 1
Desired output:
column3 = 1,3,11
Using:
a=$(awk '{print $3}' text | sort -u | paste -s -d,) && paste <(echo "column3 =") <(echo $a)
I only get:
column3 = [large blank] 1,11,3
Other issue: If I remove all hyphens on the second column, I get
column3 = [large blank] ,1,11,3
I think it's a paste command issue.
Last but not least: why do I have 1,11,3 instead of 1,3,11?
I would just use awk:
$ awk '{a[$3]} END {printf "column3 = "; for (i in a) {printf "%d%s", i, (++v==length(a)?"\n":",")}}' file
column3 = 1,3,11
Explanation
a[$3] populates the a[] array with the 3rd column. This way, any new value creates a new index.
END {} performs commands after processing the whole file.
printf "column3 = " prints "column3 = ".
for (i in a) {printf "%d%s", i, (++v==length(a)?"\n":",")} loops through the stored values and prints them comma separated, ending with a newline after the last one. Note that the order of for (i in a) is unspecified, so the values are not guaranteed to come out sorted.
Your current solution would work like this:
$ paste -d" " <(echo "column3 =") <(awk '{print $3}' file | sort -u | paste -s -d,)
column3 = 1,11,3
Note there is no need to store in $a. And to have just one space, use paste -d" ".
And to have it sorted numerically? Just add -n to your sort:
$ paste -d" " <(echo "column3 =") <(awk '{print $3}' file | sort -nu | paste -s -d,)
column3 = 1,3,11
With this command you get the same output, no matter the hyphens.
You can do something like
echo "column3 = $(awk '{print $3}' test.txt |sort -nu | paste -s -d, )"
gives me
column3 = 1,3,11
One key element is to sort with the -n option to do numerical sorting.
It also works with the hyphens deleted:
echo "column3 = $(tr -d - < test.txt| awk '{print $3}' |sort -nu | paste -s -d, )"
also outputs
column3 = 1,3,11
If perl is acceptable:
perl -lanE '
$c3{$F[2]} = 1;
END {say "column3 = ", join(",", sort {$a <=> $b} keys %c3)}
' file
my gawk line looks like:
awk '{a[$3]} END{c=asorti(a,d,"@ind_num_asc"); printf "column3 = ";
for(x=1;x<=c;x++)printf "%d%s", d[x],(c==x?"\n":",")}' file
output:
column3 = 1,3,11
Note
you need gawk to run that (asorti function)
sorting ascending as numbers
output in single line.
Assuming you truly want the numbers sorted and not just reproduced in the order they are first seen:
$ awk '{print $3}' file | sort -nu | awk '{s=(NR>1?s",":"")$0} END{print "column3 =",s}'
column3 = 1,3,11
You were getting 1,11,3 because without the -n arg for sort you are sorting alphabetically instead of numerically and the first char of 11 (i.e. 1) comes before the first char of 3.
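The difference between the two sort modes can be seen in isolation. A minimal sketch using the third-column values from the question, with LC_ALL=C set so the comparison is plain byte order:

```shell
values=$(printf '%s\n' 1 3 11 11 3 1)

# Default sort compares character by character, so "11" sorts before "3".
lexical=$(printf '%s\n' "$values" | LC_ALL=C sort -u | paste -sd, -)
# -n compares the lines as numbers instead.
numeric=$(printf '%s\n' "$values" | LC_ALL=C sort -nu | paste -sd, -)

printf 'lexical: %s\nnumeric: %s\n' "$lexical" "$numeric"
```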