awk Print Skipping a field - awk

In the case where type is "" print the 3rd field out of sequence and then print the whole line with the exception of the 3rd field.
Given a tab separated line a b c d e the idea is to print ab<tab>c<tab>a<tab>b<tab>d<tab>e
Setting $3="" seems to cause the subsequent print statement to lose the tab field separators and so is no good.
# $1 = year $2 = movie
BEGIN {FS = "\t"}
type=="" {printf "%s\t%s\t", $2 $1,$3; $3=""; print}
type!="" {printf "%s\t<%s>\t", $2 $1,type; print}
END {print ""}
Sticking in a for loop which I like a lot less as a solution results in a blank file.
# $1 = year $2 = movie
BEGIN {FS = "\t"}
type=="" {printf "%s\t%s\t%s\t%s\t", $2 $1,$3,$1,$2; for (i=4; i<=NF;i++) printf "%s\t",$i}
type!="" {printf "%s\t<%s>\t", $2 $1,type; print}
END {print ""}

You need to set the OFS to a tab instead of it's default single blank char and you don't want to just set $3 to a bank char as then you'll get 2 tabs between $2 and $4.
$ cat tst.awk
BEGIN {FS = OFS = "\t"}
{
if (type == "") {
val = $3
for (i=3; i<NF; i++) {
$i = $(i+1)
}
NF--
}
else {
val = "<" type ">"
}
print $2 $1, val, $0
}
$
$ awk -f tst.awk file | tr '\t' '-'
ba-c-a-b-d-e
$
$ awk -v type="foo" -f tst.awk file | tr '\t' '-'
ba-<foo>-a-b-c-d-e
The |tr '\t' '-' is obviously just added to make visible where the tabs are.
If decrementing NF doesn't work in your awk to delete the last field in the record, replace it with sub(/\t[^\t]+$/,"").

One way
awk '{$3=""}1' OFS="\t" infile|column -t
explanation
{$3=""} set column to nil
1 same as print, print the line.
OFS="\t"set Output Field Separator Variable to tab, maybe you needn't it, next commandcolumn -t` make the format again.
column -t columnate lists with tabs.

Related

AWK: How to number auto-increment?

I have a file.file content is:
20210126000880000003|3|33.00|20210126|15:30
1|20210126000000000000000000002207|1220210126080109|1000|100000000000000319|100058110000000325|402041000012|402041000012|PT07|621067000000123645|收款方户名|2021-01-26|2021-01-26|10.00|TN|NCS|12|875466
2|20210126000000000000000000002208|1220210126080110|1000|100000000000000319|100058110000000325|402041000012|402041000012|PT06|621067000000123645|收款方户名|2021-01-26|2021-01-26|20.00|TN|NCS|12|875466
3|20210126000000000000000000002209|1220210126080111|1000|100000000000000319|100058110000000325|402041000012|402041000012|PT08|621067000000123645|收款方户名|2021-01-26|2021-01-26|3.00|TN|NCS|12|875466
I use awk command:
awk -F"|" 'NR==1{print $1};FNR==2{print $2,$3}' testfile
Get the following result:
20210126000880000003
20210126000000000000000000002207 1220210126080109
I want the number to auto-increase:
awk -F"|" 'NR==1{print $1+1};FNR==2{print $2+1,$3+1}' testfile
But get follow result:
20210126000880001024
20210126000000000944237587726336 1220210126080110
have question:
I want to the numer is auto-increase: hope the result is:
20210126000880000003
20210126000000000000000000002207|1220210126080109
-------------------------------------------------
20210126000880000004
20210126000000000000000000002208|1220210126080110
--------------------------------------------------
20210126000880000005
20210126000000000000000000002209|1220210126080111
How to auto_increase?
Thanks!
You may try this gnu awk command:
awk -M 'BEGIN {FS=OFS="|"} NR == 1 {hdr = $1; next} NF>2 {print ++hdr; print $2, $3; print "-------------------"}' file
20210126000880000004
20210126000000000000000000002207|1220210126080109
-------------------
20210126000880000005
20210126000000000000000000002208|1220210126080110
-------------------
20210126000880000006
20210126000000000000000000002209|1220210126080111
-------------------
A more readable version:
awk -M 'BEGIN {
FS=OFS="|"
}
NR == 1 {
hdr = $1
next
}
NF > 2 {
print ++hdr
print $2, $3
print "-------------------"
}' file
Here is a POSIX awk solution that doesn't need -M:
awk 'BEGIN {FS=OFS="|"} NR == 1 {hdr = $1; next} NF>2 {"echo " hdr " + 1 | bc" | getline hdr; print hdr; print $2, $3; print "-------------------"}' file
20210126000880000004
20210126000000000000000000002207|1220210126080109
-------------------
20210126000880000005
20210126000000000000000000002208|1220210126080110
-------------------
20210126000880000006
20210126000000000000000000002209|1220210126080111
-------------------
Anubhava has the best solution but for older versions of GNU awk that don't support -M (big numbers) you can try the following:
awk -F\| 'NR==1 { print $1;hed=$1;hed1=substr($1,(length($1)-1));next; } !/^$/ {print $2" "$3 } /^$/ { print "--------------------------------------------------";printf "%s%s\n",substr(hed,1,((length(hed))-(length(hed1)+1))),++hed1 }' testfile
Explanation:
awk -F\| 'NR==1 { # Set field delimiter to | and process the first line
print $1; # Print the first field
hed=$1; # Set the variable hed to the first field
hed1=substr($1,(length($1)-1)); # Set a counter variable hed1 to the last digit in hed ($1)
next;
}
!/^$/ {
print $2" "$3 # Where there is no blank line, print the second field, a space and the third field
}
/^$/ {
print "--------------------------------------------------"; # Where there is a blank field, process
printf "%s%s\n",substr(hed,1,((length(hed))-(length(hed1)+1))),++hed1 # print the header extract before the counter, followed by the incremented counter
}' testfile

Merge lines based on first column without delimiter

I need to merge all the lines that have the same value on the first column.
The input file is the following:
34600000031|(1|1|0|1|1|20190114180000|20191027185959)
34600000031|(2|2|0|2|2|20190114180000|20191027185959)
34600000031|(3|3|0|3|3|20190114180000|20191027185959)
34600000031|(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)
34600000015|(2|2|100|2|9|20190114180000|20191027185959)
34600000015|(3|3|100|3|10|20190114180000|20191027185959)
34600000015|(4|4|100|4|11|20190114180000|20191027185959)
I was able to partially achieve it using the following:
awk -F'|' '$1!=p{if(p)print s; p=$1; s=$0; next}{sub(p,x); s=s $0} END{print s}' INPUT
The output is the following:
34600000031|(1|1|0|1|1|20190114180000|20191027185959)|(2|2|0|2|2|20190114180000|20191027185959)|(3|3|0|3|3|20190114180000|20191027185959)|(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)|(2|2|100|2|9|20190114180000|20191027185959)|(3|3|100|3|10|20190114180000|20191027185959)|(4|4|100|4|11|20190114180000|20191027185959)
What I need (and i cannot find how) is the following:
34600000031|(1|1|0|1|1|20190114180000|20191027185959)(2|2|0|2|2|20190114180000|20191027185959)(3|3|0|3|3|20190114180000|20191027185959)(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)(2|2|100|2|9|20190114180000|20191027185959)(3|3|100|3|10|20190114180000|20191027185959)(4|4|100|4|11|20190114180000|20191027185959)
I could do a sed after the initial awk but I don't believe that this is the proper way to do it.
You need to substitute the separator in the values too. Your fixes awk would look like this:
awk -F'|' '$1!=p{if(p)print s; p=$1; s=$0; next}{sub(p "\\|",x); s=s $0} END{print s}'
but it's also good to match beginning of the string:
awk -F'|' '$1!=p{if(p)print s; p=$1; s=$0; next}{sub("^" p "\\|",x); s=s $0} END{print s}'
I would do it somewhat simpler, which uses more memory (as it stores everything in an array) but doesn't need the file to be sorted:
awk -F'|' '{ k=$1; sub("^" $1 "\\|", ""); a[k] = a[k] $0 } END{ for (i in a) print i "|" a[i] }'
For each line, remember the first field, substitute the first field with | for nothing, then add it to an array indexed by the first field. On the end, print each element in the array with the key, separator and value.
$ awk -F'|' '
{
curr = $1
sub(/^[^|]+\|/,"")
printf "%s%s", (curr==prev ? "" : ors curr FS), $0
ors = ORS
prev = curr
}
END { print "" }
' file
34600000031|(1|1|0|1|1|20190114180000|20191027185959)(2|2|0|2|2|20190114180000|20191027185959)(3|3|0|3|3|20190114180000|20191027185959)(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)(2|2|100|2|9|20190114180000|20191027185959)(3|3|100|3|10|20190114180000|20191027185959)(4|4|100|4|11|20190114180000|20191027185959)

awk - remove new line after printing all columns

i am running a following awk script
awk 'BEGIN { FS="|" ; OFS="|" }; { printf $0, $1 "_" $2 }' .someFile
unfortunatley the concatention of fields 1 and 2 is printed on new line, looks like the last field contains a new line character
how can i trim it ?
If you want to use printf (which may have been accidental), I think you can use this:
awk 'BEGIN { FS = OFS = "|" } { printf "%s%s%s_%s", $0, OFS, $1, $2 }' .someFile
printf should always be used with a format string. printf doesn't add the Output Record Separator to the end of what it prints, so you have to do that yourself using \n in the format string or by adding %s and passing ORS as the last argument to printf.
In this case, I think you can just use print though:
awk 'BEGIN { FS = OFS = "|" } { print $0, $1 "_" $2 }' .someFile

awk to count of occurrences then split into two file

Would like to count number of occurences based on $2 field then split the input file into two output files ,
if the $2 field occurances more than 3 times then those lines re-dirceted into OpFile11.txt else re-directed into OpFile22.txt
Input.csv
Des1,Location,Decs2
aaa,a123,xxx
bbb,b789,yyy
xxx,a123,aaa
aaa,a123,xxx
bbb,b789,yyy
ccc,c567,zzz
xxx,a123,aaa
ddd,d456,ddd
OpFile11.txt
aaa,a123,xxx
xxx,a123,aaa
aaa,a123,xxx
xxx,a123,aaa
OpFile22.txt
bbb,b789,yyy
bbb,b789,yyy
ccc,c567,zzz
ddd,d456,ddd
Step#1 : Counting number of occurence:
awk -F, '{key=$2;++a[key]} END {for(i in a) print i","a[i]}' Input.csv
d456,1
b789,2
c567,1
a123,4
Step#2 : Spliting the input file into two parts:
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$1]=$0;next} ($2 in a) { print $0 }' OccurGR3.csv Input.csv > OpFile11.txt
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$1]=$0;next} !($2 in a) { print $0 }' OccurGR3.csv Input.csv > OpFile22.txt
where OccurGR3.csv
a123,4
Please suggest to avoid three steps , looking for your suggestions !!!
awk -F, '
NR==FNR { cnt[$2]++; next }
{ print > ( "OpFile" (cnt[$2]<3?22:11) ".txt" ) }
' Input.csv Input.csv

Extra newline coming from somewhere

Can someone explain what I'm doing wrong and how to do it better.
I have a file consisting of records with field separator "-" and record separator "\t" (tab). I want to put each record on a line, followed by the line number, separated by a tab. The input file is called foo.txt.
$ cat foo.txt
a-b-c e-f-g x-y-z
$ < foo.txt tr -cd "\t" | wc -c
2
$ wc foo.txt
1 3 18 foo.txt
My awk script is in the file foo.awk
BEGIN { RS = "\t" ; FS = "-" ; OFS = "\t" }
{
print $1 "-" $2 "-" $3, NR
}
And here is what I get when I run it:
$ gawk -f foo.awk foo.txt
a-b-c 1
e-f-g 2
x-y-z
3
The last record is directly followed by a newline, a tab, and the last number. What is going on?
well I don't know your exact goal, but since you have built the thing with awk, you can just add \n to FS to reach your goal to remove the trailing \n and without starting another process, like tr, sed or awk
BEGIN { RS = "\t" ; FS = "-|\n" ; OFS = "\t" }
There is an newline character at the end of your data that is also output when printing $3.
In particular, it looks like this:
$1 = "x"
$2 = "y"
$3 = "z\n"
You can remove the trailing separator with tr before passing everything to awk:
tr -d '\n' < foo.txt | awk -f foo.awk
or alternatively add \n to the list of field separators (as shown in the answer by Kent), since awk will strip any separators from the fields.
awk 'BEGIN { RS = "\t"; FS = OFS = "-" } { sub(/\n/, ""); print $0 "\t" NR }' file
Output:
a-b-c 1
e-f-g 2
x-y-z 3
ORS = "\n" was not necessary.
And with GNU Awk or Mawk, you can just have RS = "[\t\n]+":
awk 'BEGIN { RS = "[\t\n]+"; FS = OFS = "-" } { print $0 "\t" NR }' file