awk to split variable-length records and add a unique number to each group of records

I have a file in which each record has a variable number of columns:
x|y|XREC|DELIMITER|ab|cd|ef|IREC|DELIMITER|j|a|CREC|
p|q|IREC|DELIMITER|ww|xx|ZREC|
What I would like is:
1|x|y|XREC|
1|ab|cd|ef|IREC|
1|j|a|CREC|
2|p|q|IREC|
2|ww|xx|ZREC|
So far I have only managed to get a sequence number at the beginning of each line:
awk '{printf "%d|%s\n", NR, $0}' oldfile > with_seq.txt
Any help?

You could set the delimiter to DELIMITER:
$ awk -F 'DELIMITER[|]' '{for (i=1;i<=NF;i++)print NR"|"$i}' file
1|x|y|XREC|
1|ab|cd|ef|IREC|
1|j|a|CREC|
2|p|q|IREC|
2|ww|xx|ZREC|
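As a self-contained check of the command above (piping the sample data in instead of reading a file):

```shell
# Split each record on the literal text "DELIMITER|" (the [|] makes the
# pipe literal inside the regex) and prefix each group with NR.
printf '%s\n' \
  'x|y|XREC|DELIMITER|ab|cd|ef|IREC|DELIMITER|j|a|CREC|' \
  'p|q|IREC|DELIMITER|ww|xx|ZREC|' |
awk -F 'DELIMITER[|]' '{for (i = 1; i <= NF; i++) print NR "|" $i}'
```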

Using awk, piping through sed to squeeze the doubled | left behind where DELIMITER was removed:
awk -F "DELIMITER" '{for(i=1;i<=NF;i++)print NR "|" $i}' file|sed 's/||/|/g'
1|x|y|XREC|
1|ab|cd|ef|IREC|
1|j|a|CREC|
2|p|q|IREC|
2|ww|xx|ZREC|

Related

Filtering rows based on column values of csv file

I have a dataset with 1000 rows and 10 columns. Here is the sample dataset
A,B,C,D,E,F,
a,b,c,d,e,f,
g,h,i,j,k,l,
m,n,o,p,q,r,
s,t,u,v,w,x,
From this dataset I want to copy the rows whose column A value is 'a' or 'm' to a new csv file. I also want the header to be copied.
I have tried using awk. It copied all the rows but not the header.
awk '{$1~/a//m/ print}' inputfile.csv > outputfile.csv
How can I copy the header also into the new outputfile.csv?
Thanks in advance.
Assuming that your header is on the 1st row, could you please try the following:
awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^a$|^m$/' Input_file > outputfile.csv
Or, as per Cyrus's comment, the following:
awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^(a|m)$/' Input_file > outputfile.csv
Or, as per Ed's comment, try the following:
awk -F, 'NR==1 || $1~/^[am]$/' Input_file > outputfile.csv
Corrections to the OP's attempt:
Set FS and OFS to , for all lines, since the lines are comma-delimited.
Added the FNR==1 condition, which matches the 1st line and simply prints it, since we want the header in the output file; next then skips all further statements for that line.
Used a stricter regex for the 1st field's test, $1 ~ /^a$|^m$/, which is anchored so it matches the whole field rather than a substring.
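For a quick end-to-end check with the sample data from the question (piped in rather than read from Input_file):

```shell
# Print the header unconditionally, then only rows whose 1st field is exactly a or m.
printf '%s\n' 'A,B,C,D,E,F,' 'a,b,c,d,e,f,' 'g,h,i,j,k,l,' 'm,n,o,p,q,r,' 's,t,u,v,w,x,' |
awk 'BEGIN{FS=OFS=","} FNR==1{print; next} $1 ~ /^(a|m)$/'
```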
This might work for you (GNU sed):
sed '1b;/^[am],/!d' oldFile >newFile
Always print the first line, and delete any other line that does not begin with a, or m,.
Alternative:
awk 'NR==1 || /^[am],/' oldFile >newFile
With awk. Set the field separator (FS) to , and output the current row if it is the first row or if its first column is a or m.
awk 'NR==1 || $1=="a" || $1=="m"' FS=',' in.csv >out.csv
Output to out.csv:
A,B,C,D,E,F,
a,b,c,d,e,f,
m,n,o,p,q,r,
$ awk -F, 'BEGIN{split("a,m",tmp); for (i in tmp) tgts[tmp[i]]} NR==1 || $1 in tgts' file
A,B,C,D,E,F,
a,b,c,d,e,f,
m,n,o,p,q,r,
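The split/tgts idiom above is most useful when the wanted keys come from outside the script. A sketch of that, passing them in with -v (the variable name keys is just for illustration):

```shell
# Pass the wanted first-column values in as a shell-level variable
# instead of hard-coding them inside the awk program.
printf '%s\n' 'A,B,C,D,E,F,' 'a,b,c,d,e,f,' 'g,h,i,j,k,l,' 'm,n,o,p,q,r,' |
awk -F, -v keys='a,m' '
  BEGIN {split(keys, tmp, ","); for (i in tmp) tgts[tmp[i]]}
  NR == 1 || $1 in tgts'
```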
awk's default field delimiter is whitespace.
The delimiter can be changed by setting the FS variable:
awk 'BEGIN { FS = "," } ; { print $2 }'
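A quick demonstration of the default whitespace splitting versus an explicit FS:

```shell
echo 'one two three' | awk '{print $2}'                 # default FS: whitespace
echo 'one,two,three' | awk 'BEGIN{FS=","} {print $2}'   # comma FS
```

Both commands print the second field, two.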

Is it possible to sort by multiple columns in awk?

Sort By Third Column And Fourth Column
Input
B,c,3,
G,h,2,
J,k,4,
M,n,,1
Output
M,n,,1
G,h,2,
B,c,3,
J,k,4,
Please help me
UPDATED
awk -F, 'a[$3]<$4{a[$3]=$4;b[$3]=$0}END{for(l in a){print b[l]","l} }' FILE2
I used this command and obtained this:
M,n,,1,
,2
,3
,4
sort is a better choice than awk here
$ sort -t, -k3,3n -k4,4n ip.txt
M,n,,1
G,h,2,
B,c,3,
J,k,4,
-t, use , as delimiter
-k3,3n sort by 3rd column, numerically
-k4,4n then sort by 4th column, numerically
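Running it on the sample rows (piped in instead of ip.txt) confirms this; the empty third field sorts as numeric 0, which is why M,n,,1 comes first:

```shell
# -t, sets the delimiter; -k3,3n and -k4,4n sort numerically on columns 3 and 4.
printf '%s\n' 'B,c,3,' 'G,h,2,' 'J,k,4,' 'M,n,,1' | sort -t, -k3,3n -k4,4n
```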
awk to the rescue!
$ awk -F, '{a[$3,$4,NR]=$0}
END {n=asorti(a,ix);
for(k=1;k<=n;k++) print a[ix[k]]}' file
M,n,,1
G,h,2,
B,c,3,
J,k,4,
Note that the key is constructed in such a way as to handle duplicate rows.
If you don't have asorti (it is GNU awk-specific), here's a workaround:
$ awk -F, '{a[$0]=$3 FS $4 FS NR RS $0}
END {n=asort(a);
for(k=1;k<=n;k++)
{sub(".*"RS,"",a[k]);
print a[k]}}' file
I used RS as a secondary delimiter to keep the line separate from the sort key. Note that duplicate rows will be counted as one (duplicate keys are fine); if you want to support duplicate rows, change the index to a[$0,NR].
I use this command and it works:
nawk -F, '{a[$3]<$4;a[$3]=$4;b[$3]=$0} END{for(i in a){print b[i]}}' FILE2

awk ternary operator, counting fields with FS=","

How to make this command line:
awk -F "," '{NF>0?$NF:$0}'
to print the last field of a line if NF>0, otherwise print the whole line?
Working data
bogota
dept math, bogota
awk -F, '{ print ( NF ? $NF : $0 ) }' file
Actually, you don't need the ternary operator for this; just use:
awk -F, '{print $NF}' file
This prints the last field: if there is more than one field, it prints the last one; if the line has only one field, $NF is the same as $0.
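To see both cases with the question's working data (note that a field following a comma keeps its leading space):

```shell
printf '%s\n' 'bogota' 'dept math, bogota' | awk -F, '{print $NF}'
# one field  -> $NF is the whole line: "bogota"
# two fields -> $NF is the last field: " bogota" (leading space kept)
```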

How to remove field separators in awk when printing $0?

eg, each row of the file is like :
1, 2, 3, 4,..., 1000
How can print out
1 2 3 4 ... 1000
?
If you just want to delete the commas, you can use tr:
$ tr -d ',' <file
1 2 3 4 1000
If it is something more general, you can set FS and OFS in your BEGIN block:
awk 'BEGIN{FS=","; OFS=""} ...' file
You need to set OFS (the output field separator). Unfortunately, this has no effect unless you also modify the record, leading to the rather cryptic:
awk '{$1=$1}1' FS=, OFS=
Although, if you are happy with some additional space being added, you can leave OFS at its default value (a single space), and do:
awk -F, '{$1=$1}1'
and if you don't mind omitting blank lines in the output, you can simplify further to:
awk -F, '$1=$1'
You could also remove the field separators:
awk -F, '{gsub(FS,"")} 1'
Set FS to the input field separator. Assigning to $1 then rebuilds the record using the output field separator, which defaults to a space:
awk -F',\s*' '{$1 = $1; print}'
See the GNU Awk Manual for an explanation of $1 = $1
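A quick check of the $1 = $1 rebuild (splitting on comma-plus-space here so no stray blanks are left; OFS stays at its default of a single space):

```shell
# FS=', ' consumes the comma and the following space on input;
# assigning $1 = $1 forces awk to rejoin the fields with OFS.
echo '1, 2, 3, 4' | awk -F', ' '{$1 = $1} 1'
```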

Delete lines from file -- awk

I have a file file.dat containing numbers, for example
4
6
7
I would like to use the numbers of this file to delete lines of another file.
Is there any way to pass these numbers as parameters to awk and delete those lines from another file?
I have this awk solution, but do not like it too much...
awk 'BEGIN { while( (getline x < "./file.dat" ) > 0 ) a[x]=0; } NR in a { next; }1' /path/to/another/file
Can you suggest something more elegant?
using NR==FNR to test which file awk is reading:
$ awk '{if(NR==FNR)idx[$0];else if(!(FNR in idx))print}' idx.txt data.txt
Or
$ awk 'NR==FNR{idx[$0]; next}; !(FNR in idx)' idx.txt data.txt
where the line numbers to delete go in idx.txt and the data goes in data.txt.
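An end-to-end sketch of the two-file pass, creating idx.txt and data.txt as small throwaway files (the lineN contents are just placeholders):

```shell
# idx.txt holds the line numbers to drop; data.txt is the file to filter.
printf '%s\n' 4 6 7 > idx.txt
printf 'line%s\n' 1 2 3 4 5 6 7 8 > data.txt

# First pass (NR==FNR, i.e. while reading idx.txt): remember the numbers.
# Second pass: print only lines whose line number (FNR) was not remembered.
awk 'NR==FNR{idx[$0]; next} !(FNR in idx)' idx.txt data.txt
```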
I would use sed instead of awk:
$ sed $(for i in $(<idx.txt);do echo " -e ${i}d";done) file.txt