multiple field separator in awk

multiple field separator in awk - awk

I'm trying to process an input which has two field seperators ; and space. I'm able to parse the input with one separator using:
echo "10.23;7.15;6.23" | awk -v OFMF="%0.2f" 'BEGIN{FS=OFS=";"} {print $1,$2,$3}'
10.23;7.15;6.23
For an input with two seperators, I tried this and it doesn't parse both the seperators:
echo "10.23;7.15 6.23" | awk -v OFMF="%0.2f" 'BEGIN{FS=OFS=";" || " "} {print $1,$2,$3}'

You want to set FS to a character list:
awk -F'[; ]' 'script' file
and the other builtin variable you're trying to set is named OFMT, not OFMF:
$ echo "10.23;7.15 6.23" | awk -F'[; ]' -v OFMT="%0.2f" '{print $1,$2,$3}'
10.23 7.15 6.23
$ echo "10.23;7.15 6.23" | awk 'BEGIN{FS="[; ]"; OFS=";"; OFMT="%0.2f"} {print $1,$2,$3}'
10.23;7.15;6.23

Related

Need to retrieve a value from an HL7 file using awk

In a Linux script program, I've got the following awk command for other purposes and to rename the file.
cat $edifile | awk -F\| '
{ OFS = "|"
print $0
} ' | tr -d "\012" > $newname.hl7
While this is happening, I'd like to grab the 5th field of the MSH segment and save it for later use in the script. Is this possible?
If no, how could I do it later or earlier on?
Example of the segment.
MSH|^~\&|business1|business2|/u/tmp/TR0049-GE-1.b64|routing|201811302126||ORU^R01|20181130212105810|D|2.3
What I want to do is retrieve the path and file name in MSH 5 and concatenate it to the end of the new file.
I've used this to capture the data but no luck. If fpth is getting set, there is no evidence of it and I don't have the right syntax for an echo within the awk phrase.
cat $edifile | awk -F\| '
{ OFS = "|"
{fpth=$(5)}
print $0
} ' | tr -d "\012" > $newname.hl7
any suggestions?
Thank you!

Try
filename=`awk -F'|' '{print $5}' $edifile | head -1`
You can skip the piping through head if the file is a single line

First of all, it must be mentioned that the awk line in your first piece of code, has zero use:
$ cat $edifile | awk -F\| ' { OFS = "|"; print $0 }' | tr -d "\012" > $newname.hl7
This is totally equivalent to
$ cat $edifile | tr -d "\012" > $newname.hl7
because OFS is only used to redefine $0 if you redefine a field.
Example:
$ echo "a|b|c" | awk -F\| '{OFS="/"; print $0}'
a|b|c
$ echo "a|b|c" | awk -F\| '{OFS="/"; $1=$1; print $0}'
a/b/c
I understand that you have a hl7 file in which you have a single line starting with the string "MSH". From this line you want to store the 5th field: this is achieved in the following way:
fpth=$(awk -v outputfile="${newname}.hl7" '
BEGIN{FS="|"; ORS="" }
($1 == "MSH"){ print $5 }
{ print $0 > outputfile }' $edifile)
I have replaced ORS to an empty character set, as it is equivalent to tr -d "\012". The above will work very nicely if you only have a single MSH in your file.

Replace end of line with comma and put parenthesis in sed/awk

I am trying to process the contents of a file from this format:
this1,EUR
that2,USD
other3,GBP
to this format:
this1(EUR),that2(USD),other3(GBP)
The result should be a single line.
As of now I have come up with this circuit of commands that works fine:
cat myfile | sed -e 's/,/\(/g' | sed -e 's/$/\)/g' | tr '\n' , | awk '{print substr($0, 0, length($0)- 1)}'
Is there a simpler way to do the same by just an awk command?

Another awk:
$ awk -F, '{ printf "%s%s(%s)", c, $1, $2; c = ","} END { print ""}' file
1(EUR),2(USD),3(GBP)

Following awk may help you on same.
awk -F, '{val=val?val OFS $1"("$2")":$1"("$2")"} END{print val}' OFS=, Input_file

Toying around with separators and gsub:
$ awk 'BEGIN{RS="";ORS=")\n"}{gsub(/,/,"(");gsub(/\n/,"),")}1' file
this1(EUR),that2(USD),other3(GBP)
Explained:
$ awk '
BEGIN {
RS="" # record ends in an empty line, not newline
ORS=")\n" # the last )
}
{
gsub(/,/,"(") # replace commas with (
gsub(/\n/,"),") # and newlines with ),
}1' file # output

Using paste+sed
$ # paste -s will combine all input lines to single line
$ seq 3 | paste -sd,
1,2,3
$ paste -sd, ip.txt
this1,EUR,that2,USD,other3,GBP
$ # post processing to get desired format
$ paste -sd, ip.txt | sed -E 's/,([^,]*)(,?)/(\1)\2/g'
this1(EUR),that2(USD),other3(GBP)

awk split with asterix

I am trying to split a variable as follows. is there any efficient way to do this preferably using awk.
echo 262146*10,69636*32 |awk -F, 'split($1, DCAP,"\\*") {print DCAP[1]}; split($2, DCAP,"\\*"){print DCAP[1]}'

echo '262146*10,69636*32' | awk -F '[,*]' '{print $1; print $3}'
or
echo '262146*10,69636*32' | awk -F '[,*]' '{printf("%d\n%d\n",$1,$3)}'
Output:
262146
69636

If you have a longer sequence you could try:
echo 262146*10,69636*32,10*3 | awk 'BEGIN {FS="*"; RS=","} {print $1}'

awk variable passed to a field in a file

I have a problem using an awk variable to insert into a field 2 of a file. This is my data:
data='AAA||CCC|DDD|EEE|FFF'
ds='BBB'
I want to insert the value of variable ds in field 2 of my file. So I written some test code to view the behavior in awk:
$ echo "$data" | awk -F'|' -v tmp="$ds" '{print $1}'
AAA <== that works ... now I try to print my variable
$ echo "$data" | awk -F'|' -v tmp="$ds" '{print $1 $tmp}'
AAAAAA||CCC|DDD|EEE|FFF <== output but I was expecting AAA|BBB
I also tried this:
$ echo "$data" | awk -F'|' -v tmp="$ds" '{print $tmp}'
AAA||CCC|DDD|EEE|FFF <== output but I was expecting BBB
What am I doing wrong??

In awk, you don't put $ in front of a variable to get its value like you do in bash. Just put the variable name:
$ echo "$data" | awk -F'|' -v tmp="$ds" '{print tmp}'
BBB
The $ is only used to get the value of a field (as in $1, $2, etc.). When putting $var, the field is based on the numeric value of var. If the value of var is 7, e.g., $var is the value of the 7th field. If var is 0 or some non-numeric value, $var is the same as $0, which is the whole line as you saw in your attempts.

Tab separated values in awk

How do I select the first column from the TAB separated string?
# echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -F'\t' '{print $1}'
The above will return the entire line and not just "LOAD_SETTLED" as expected.
Update:
I need to change the third column in the tab separated values.
The following does not work.
echo $line | awk 'BEGIN { -v var="$mycol_new" FS = "[ \t]+" } ; { print $1 $2 var $4 $5 $6 $7 $8 $9 }' >> /pdump/temp.txt
This however works as expected if the separator is comma instead of tab.
echo $line | awk -v var="$mycol_new" -F'\t' '{print $1 "," $2 "," var "," $4 "," $5 "," $6 "," $7 "," $8 "," $9 "}' >> /pdump/temp.txt

You need to set the OFS variable (output field separator) to be a tab:
echo "$line" |
awk -v var="$mycol_new" -F'\t' 'BEGIN {OFS = FS} {$3 = var; print}'
(make sure you quote the $line variable in the echo statement)

Make sure they're really tabs! In bash, you can insert a tab using C-v TAB
$ echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -F$'\t' '{print $1}'
LOAD_SETTLED

Use:
awk -v FS='\t' -v OFS='\t' ...
Example from one of my scripts.
I use the FS and OFS variables to manipulate BIND zone files, which are tab delimited:
awk -v FS='\t' -v OFS='\t' \
-v record_type=$record_type \
-v hostname=$hostname \
-v ip_address=$ip_address '
$1==hostname && $3==record_type {$4=ip_address}
{print}
' $zone_file > $temp
This is a clean and easy to read way to do this.

You can set the Field Separator:
... | awk 'BEGIN {FS="\t"}; {print $1}'
Excellent read:
https://docs.freebsd.org/info/gawk/gawk.info.Field_Separators.html

echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -v var="test" 'BEGIN { FS = "[ \t]+" } ; { print $1 "\t" var "\t" $3 }'

If your fields are separated by tabs - this works for me in Linux.
awk -F'\t' '{print $1}' < tab_delimited_file.txt
I use this to process data generated by mysql, which generates tab-separated output in batch mode.
From awk man page:
-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS prede‐
fined variable).

1st column only
— awk NF=1 FS='\t'
LOAD_SETTLED
First 3 columns
— awk NF=3 FS='\t' OFS='\t'
LOAD_SETTLED LOAD_INIT 2011-01-13
Except first 2 columns
— {g,n}awk NF=NF OFS= FS='^([^\t]+\t){2}'
— {m}awk NF=NF OFS= FS='^[^\t]+\t[^\t]+\t'
2011-01-13 03:50:01
Last column only
— awk '($!NF=$NF)^_' FS='\t', or
— awk NF=NF OFS= FS='^.*\t'
03:50:01

Should this not work?
echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk '{print $1}'

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

multiple field separator in awk - awk

Related

Need to retrieve a value from an HL7 file using awk

Replace end of line with comma and put parenthesis in sed/awk

awk split with asterix

awk variable passed to a field in a file

Tab separated values in awk

Categories

Resources