Add delimiters at end of each line - awk

I've a csv file like below.
id,id1,id2,id3,id4,id5
1,101,102,103,104
2,201,202,203
3,301,302
Now what i want to add comma(,) to each line to make all line with same number of delimiters. So desired output should be.
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,
Using
awk -F "," ' { print NF-1 } ' file.csv | sort -r | head -1
I am able to find the max occurance of delimiter but not sure how to compare each line and append comma if its less than max.

With GNU awk (as I do not know if this works for other implementations)
$ # simply assign value to NF
$ awk -F, -v OFS=',' '{NF=6} 1' ip.txt
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,
If first line determines number of fields required:
$ awk -F, -v OFS=',' 'NR==1{f=NF} {NF=f} 1' ip.txt
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,
If any line determines max field:
$ cat ip.txt
id,id1,id2
1,101,102,103
2,201,202,203,204
3,301,302
$ awk -F, -v OFS=',' 'NR==FNR{f=(!f || NF>f) ? NF : f; next} {NF=f} 1' ip.txt ip.txt
id,id1,id2,,
1,101,102,103,
2,201,202,203,204
3,301,302,,

awk -F"," '{i=NF;c="";while (i++ < 6) {c=c","};print $0""c}' file
Output:
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,

You are already using the variable NF which indicates how many fields there are on a line.
awk -F , 'NF<6 { OFS=FS; for (i=NF+1; i<=6; i++) $i="" }1' filename
We start looping at the first undefined field and set it to an empty string, until we have six fields. Then the 1 at the end takes care of printing the now fully populated line. The OFS=FS is necessary to make the output field separator also be a comma (it is a space by default).

Following awk may also help you on same.
awk -F, '
FNR==1{
val=NF;
print;
next
}
{
count=NF;
while(count<val){
value=value",";
count++};
print $0 value;
value=count=""
}
' Input_file
Output will be as follows:
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,

Unified awk approach (based on number of fields of the 1st header line):
awk -F',' 'NR==1{ max_nf=NF; print }
NR>1{ printf "%s%.*s\n", $0, max_nf-NF, ",,,,,,,,," }' file
The output:
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,
Or via loop:
awk -F',' 'NR==1{ max_nf=NF; print }
NR>1{ n=max_nf-NF; r=""; while (n--) r=r","; print $0 r }' file

Related

Sed/awk for String to integer conversion of a csv column in shell

I need 7th column of a csv file to be converted from float to decimal. It's a huge file and I don't want to use while read for conversion. Any shortcuts with awk?
Input:
"xx","x","xxxxxx","xxx","xx","xx"," 00000001.0000"
"xx","x","xxxxxx","xxx","xx","xx"," 00000002.0000"
"xx","x","xxxxxx","xxx","xx","xx"," 00000005.0000"
"xx","x","xxxxxx","xxx","xx","xx"," 00000011.0000"
Output:
"xx","x","xxxxxx","xxx","xx","xx","1"
"xx","x","xxxxxx","xxx","xx","xx","2"
"xx","x","xxxxxx","xxx","xx","xx","5"
"xx","x","xxxxxx","xxx","xx","xx","11"
Tried these, worked. But anything simpler ?
awk 'BEGIN {FS=OFS="\",\""} {$7 = sprintf("%.0f", $7)} 1' $test > $test1
awk '{printf("%s\"\n", $0)}' $test1
With your shown samples, please try following awk program.
awk -v s1="\"" -v OFS="," '{$NF = s1 ($NF + 0) s1} 1' Input_file
Explanation: Simple explanation would be, setting OFS as , then in main program; in each line's last field keeping only digits and covering last field with ", re-shuffle the fields and printing edited/non-edited all lines.
Another simple awk solution:
awk 'BEGIN {FS=OFS="\",\""} {$NF = $NF+0 "\""} 1' file
"xx","x","xxxxxx","xxx","xx","xx","1"
"xx","x","xxxxxx","xxx","xx","xx","2"
"xx","x","xxxxxx","xxx","xx","xx","5"
"xx","x","xxxxxx","xxx","xx","xx","11"
awk 'BEGIN{FS=OFS=","} {gsub(/"/, "", $7); $7="\"" $7+0 "\""; print}' file
Output:
"xx","x","xxxxxx","xxx","xx","xx","1"
"xx","x","xxxxxx","xxx","xx","xx","2"
"xx","x","xxxxxx","xxx","xx","xx","5"
"xx","x","xxxxxx","xxx","xx","xx","11"
gsub(/"/, "", $7): removes all " from $7
$7+0: Reduces the number in $7 to minimal representation

Understanding how OFS works in AWK

This is a follow-up to my question to understand more about the OFS in AWK.
My understanding is, set it once in the beginning and it will be used in "print" to separate the fields. However, it didn't work as expected, as explained in my original question.
My File: someone.txt
LN_A,FN_A<aa#xyz.com>;
LN_B,FN_B<bb#xyz.com>;
Expected output:
FN_A,LN_A,aa
FN_B,LN_B,bb
I have tried the following:
awk -F'[,<#]' -v OFS=',' '{print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' 'NF=3 {print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' 'NF=3; {print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' '{$1=$1} {print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' '{$1=$1} {print $0}' someone.txt
Finally, I managed to get the required output with the following:
awk -F'[,<#]' '{print $2 "," $1 "," $3}' someone.txt
Consider these cases:
a) $ echo '1 2 3' | awk '{print}'
1 2 3
b) $ echo '1 2 3' | awk '{print $1, $2, $3}'
1 2 3
c) $ echo '1 2 3' | awk -v OFS=',' '{print}'
1 2 3
d) $ echo '1 2 3' | awk -v OFS=',' '{print $1, $2, $3}'
1,2,3
e) $ echo '1 2 3' | awk -v OFS=',' '{$1=$1; print}'
1,2,3
The above show OFS being used in "b" and "d" (when individual fields are being printed in a comma-separated list) and in "e" (when the record $0 is being reconstructed as a result of a value being assigned to a field before the record is printed).
Those are the only 2 times when OFS is used implicitly - when printing a comma-separated list of values and when reconstructing the record.
When you print the record (e.g. by print or print $0) as in "a" and "c" above or print any other string you are not using OFS. OFS may have been used earlier to reconstruct the record as in "e" above but the act of printing anything that's not a comma-separated list is not using OFS, it's just printing any old string which just happens to be $0 in this case.
Note:
Explicitly changing a field reconstructs $0 from the existing fields using OFS between the fields, it does not resplit $0 into fields again so FS is not used in this process. So $1=$1 or sub(/1/,2,$1) uses OFS but not FS.
Explicitly changing $0 (i.e. not implicitly as a result of 1 above) resplits $0 into fields using FS as the separator, it does not use OFS in any way. So $0=$0 or sub(/1/,2) uses FS but not OFS.
Understanding how FS and OFS work together and how they effect assignments to fields and $0 is very important. If you can explain this behavior then you've got it:
f) $ echo 'a b' | awk -v OFS=',' '{print NF, $0, $1, $2}'
2,a b,a,b
g) $ echo 'a b' | awk -v OFS=',' '{$1=$1; print NF, $0, $1, $2}'
2,a,b,a,b
h) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; print NF, $0, $1, $2}'
1,a,b,a,b,
i) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; FS=OFS; print NF, $0, $1, $2}'
1,a,b,a,b,
j) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; FS=OFS; $1=$1; print NF, $0, $1, $2}'
1,a,b,a,b,
k) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; FS=OFS; $1=$1; $0=$0; print NF, $0, $1, $2}'
2,a,b,a,b
If not then feel free to ask questions.
It is simple, you have set the OFS="," in beginning of your awk statement but you are simply printing the fields(NOTE: without editing the line OR without mentioning field separator(using comma etc)) in that case OFS will not come in picture that is why your output is NOT having anything like separator.
awk -F'[,<#]' -v OFS=',' '{print $2,$1,$3}' Input_fie
If you use above command where I have mentioned , between printing fields you will see you are getting OFS now and this is how it works.
Or in case you want to see use of OFS you could use this(though above solution is BEST one but for your understanding I am adding this one too).
awk -F'[,<#]' -v OFS=',' '{$0=$2 OFS $1 OFS $3} 1' Input_file
Example to understand OFS by printing whole line(s): Let us understand it more clearly by printing whole line with OFS and withoutOFS` effect.
Let us run this code:
awk -F'[,<#]' -v OFS=',' 'FNR==1{$1=$1} 1' Input_file
What it does is when line number 1 is there then I am resetting $1's value as mentioned above to let OFS come into picture so that new value of OFS comes(off course wherever field separator was picked it will place OFS value there). So it will only be done for first line and REST of the lines nothing should happen. Let us see what output comes now?
LN_A,FN_A,aa,xyz.com>;
LN_B,FN_B<bb#xyz.com>;
You see the difference? See first line is having , in output and 2nd line is printing as it is, why because in only 1st line we have edited the first field so OFS came into picture.
As I just found an unused copy of Aho, Kernighan, Weinberger: The AWK Programming language from 1988, I(t)'ll take you to the source (pages 35-36):
"Field Variables. The fields of the current input line are called $1, $2,
through $NF; $0 refers to the whole line. Fields share the properties of other
variables — they may be used in arithmetic or string operations, and may be
assigned to. - -
One can assign a new string to a field:
BEGIN { FS = OFS = "\t" }
$4 == "North America" { $4 = "NA" }
$4 == "South America" { $4 = "SA" }
{ print }
In this program, the BEGIN action sets FS, the variable that controls the input
field separator, and OFS, the output field separator, both to a tab. The print
statement in the fourth line prints the value of $0 after it has been modified by
previous assignments. This is important: when $0 is changed by assignment or
substitution, $1, $2, etc., and NF will be recomputed; likewise, when one of $1, $2, etc., is changed, $0 is reconstructed using OFS to separate fields."

Need to retrieve a value from an HL7 file using awk

In a Linux script program, I've got the following awk command for other purposes and to rename the file.
cat $edifile | awk -F\| '
{ OFS = "|"
print $0
} ' | tr -d "\012" > $newname.hl7
While this is happening, I'd like to grab the 5th field of the MSH segment and save it for later use in the script. Is this possible?
If no, how could I do it later or earlier on?
Example of the segment.
MSH|^~\&|business1|business2|/u/tmp/TR0049-GE-1.b64|routing|201811302126||ORU^R01|20181130212105810|D|2.3
What I want to do is retrieve the path and file name in MSH 5 and concatenate it to the end of the new file.
I've used this to capture the data but no luck. If fpth is getting set, there is no evidence of it and I don't have the right syntax for an echo within the awk phrase.
cat $edifile | awk -F\| '
{ OFS = "|"
{fpth=$(5)}
print $0
} ' | tr -d "\012" > $newname.hl7
any suggestions?
Thank you!
Try
filename=`awk -F'|' '{print $5}' $edifile | head -1`
You can skip the piping through head if the file is a single line
First of all, it must be mentioned that the awk line in your first piece of code, has zero use:
$ cat $edifile | awk -F\| ' { OFS = "|"; print $0 }' | tr -d "\012" > $newname.hl7
This is totally equivalent to
$ cat $edifile | tr -d "\012" > $newname.hl7
because OFS is only used to redefine $0 if you redefine a field.
Example:
$ echo "a|b|c" | awk -F\| '{OFS="/"; print $0}'
a|b|c
$ echo "a|b|c" | awk -F\| '{OFS="/"; $1=$1; print $0}'
a/b/c
I understand that you have a hl7 file in which you have a single line starting with the string "MSH". From this line you want to store the 5th field: this is achieved in the following way:
fpth=$(awk -v outputfile="${newname}.hl7" '
BEGIN{FS="|"; ORS="" }
($1 == "MSH"){ print $5 }
{ print $0 > outputfile }' $edifile)
I have replaced ORS to an empty character set, as it is equivalent to tr -d "\012". The above will work very nicely if you only have a single MSH in your file.

Replace end of line with comma and put parenthesis in sed/awk

I am trying to process the contents of a file from this format:
this1,EUR
that2,USD
other3,GBP
to this format:
this1(EUR),that2(USD),other3(GBP)
The result should be a single line.
As of now I have come up with this circuit of commands that works fine:
cat myfile | sed -e 's/,/\(/g' | sed -e 's/$/\)/g' | tr '\n' , | awk '{print substr($0, 0, length($0)- 1)}'
Is there a simpler way to do the same by just an awk command?
Another awk:
$ awk -F, '{ printf "%s%s(%s)", c, $1, $2; c = ","} END { print ""}' file
1(EUR),2(USD),3(GBP)
Following awk may help you on same.
awk -F, '{val=val?val OFS $1"("$2")":$1"("$2")"} END{print val}' OFS=, Input_file
Toying around with separators and gsub:
$ awk 'BEGIN{RS="";ORS=")\n"}{gsub(/,/,"(");gsub(/\n/,"),")}1' file
this1(EUR),that2(USD),other3(GBP)
Explained:
$ awk '
BEGIN {
RS="" # record ends in an empty line, not newline
ORS=")\n" # the last )
}
{
gsub(/,/,"(") # replace commas with (
gsub(/\n/,"),") # and newlines with ),
}1' file # output
Using paste+sed
$ # paste -s will combine all input lines to single line
$ seq 3 | paste -sd,
1,2,3
$ paste -sd, ip.txt
this1,EUR,that2,USD,other3,GBP
$ # post processing to get desired format
$ paste -sd, ip.txt | sed -E 's/,([^,]*)(,?)/(\1)\2/g'
this1(EUR),that2(USD),other3(GBP)

Tab separated values in awk

How do I select the first column from the TAB separated string?
# echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -F'\t' '{print $1}'
The above will return the entire line and not just "LOAD_SETTLED" as expected.
Update:
I need to change the third column in the tab separated values.
The following does not work.
echo $line | awk 'BEGIN { -v var="$mycol_new" FS = "[ \t]+" } ; { print $1 $2 var $4 $5 $6 $7 $8 $9 }' >> /pdump/temp.txt
This however works as expected if the separator is comma instead of tab.
echo $line | awk -v var="$mycol_new" -F'\t' '{print $1 "," $2 "," var "," $4 "," $5 "," $6 "," $7 "," $8 "," $9 "}' >> /pdump/temp.txt
You need to set the OFS variable (output field separator) to be a tab:
echo "$line" |
awk -v var="$mycol_new" -F'\t' 'BEGIN {OFS = FS} {$3 = var; print}'
(make sure you quote the $line variable in the echo statement)
Make sure they're really tabs! In bash, you can insert a tab using C-v TAB
$ echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -F$'\t' '{print $1}'
LOAD_SETTLED
Use:
awk -v FS='\t' -v OFS='\t' ...
Example from one of my scripts.
I use the FS and OFS variables to manipulate BIND zone files, which are tab delimited:
awk -v FS='\t' -v OFS='\t' \
-v record_type=$record_type \
-v hostname=$hostname \
-v ip_address=$ip_address '
$1==hostname && $3==record_type {$4=ip_address}
{print}
' $zone_file > $temp
This is a clean and easy to read way to do this.
You can set the Field Separator:
... | awk 'BEGIN {FS="\t"}; {print $1}'
Excellent read:
https://docs.freebsd.org/info/gawk/gawk.info.Field_Separators.html
echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -v var="test" 'BEGIN { FS = "[ \t]+" } ; { print $1 "\t" var "\t" $3 }'
If your fields are separated by tabs - this works for me in Linux.
awk -F'\t' '{print $1}' < tab_delimited_file.txt
I use this to process data generated by mysql, which generates tab-separated output in batch mode.
From awk man page:
-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS prede‐
fined variable).
1st column only
— awk NF=1 FS='\t'
LOAD_SETTLED
First 3 columns
— awk NF=3 FS='\t' OFS='\t'
LOAD_SETTLED LOAD_INIT 2011-01-13
Except first 2 columns
— {g,n}awk NF=NF OFS= FS='^([^\t]+\t){2}'
— {m}awk NF=NF OFS= FS='^[^\t]+\t[^\t]+\t'
2011-01-13 03:50:01
Last column only
— awk '($!NF=$NF)^_' FS='\t', or
— awk NF=NF OFS= FS='^.*\t'
03:50:01
Should this not work?
echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk '{print $1}'