Print every alternate column as a row in a text file - awk

I have a comma-separated file. I would like to print every alternate column on a new row.
Example input file:
Name : John, Age : 30, DOB : 30-Oct-2018
Example output:
Name,Age,DOB
John,30,30-Oct-2018

A non-awk solution: turn every name and value into its own line, strip the blanks, then let pr paste the lines back together into three comma-separated columns (filling down each column):
$ sed 's/[,:]/\n/g;s/ //g' file | pr -3ts,
Name,Age,DOB
John,30,30-Oct-2018

awk 'BEGIN{FS="[[:blank:]]*[:,][[:blank:]]*"}
{ for(i=1;i<=NF;i+=2) printf "%s%s", (i==1?"":","), $i; print "" }
{ for(i=2;i<=NF;i+=2) printf "%s%s", (i==2?"":","), $i; print "" }' inputfile
(Comparing i against each loop's starting index avoids a stray leading comma on the values line, and passing the fields as printf arguments rather than building the format string keeps a literal % in the data from being misinterpreted.)

Per your example and output:
$ awk -F', ' '/ : / {
    for (i=1;i<=NF;i++) {
        if ( match($i,/ : /) ) {
            linekeys = linekeys substr($i,1,RSTART-1) ","
            linevalues = linevalues substr($i,RSTART+RLENGTH) ","
        }
    }
    print substr(linekeys,1,length(linekeys)-1)
    print substr(linevalues,1,length(linevalues)-1)
    linekeys = ""; linevalues = ""
}' file.txt
Name,Age,DOB
John,30,30-Oct-2018

Here's a general idea you could use to implement a solution, using awk's split function.
Split the entire line into an array pairs with the pair delimiter (", "), and save the number of pairs.
Split each pair into an array with the key/value delimiter (" : "), and store the parts in a table indexed by output row and column, e.g. data[row","col] (row 1 holds the keys, row 2 the values).
Finally, iterate over the rows and then the columns, printing data[row","col], as in the sketch below.
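A minimal sketch of that idea, assuming the exact delimiters ", " and " : " from the example (transpose.awk is a hypothetical file name):
$ cat transpose.awk
{
    n = split($0, pairs, ", ")          # break the line into key/value pairs
    for (col = 1; col <= n; col++) {
        split(pairs[col], kv, " : ")    # kv[1] is the key, kv[2] the value
        data[1 "," col] = kv[1]
        data[2 "," col] = kv[2]
    }
    for (row = 1; row <= 2; row++) {    # keys row first, then values row
        for (col = 1; col <= n; col++)
            printf "%s%s", (col == 1 ? "" : ","), data[row "," col]
        print ""
    }
}
$ awk -f transpose.awk file
Name,Age,DOB
John,30,30-Oct-2018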

Related

Sum of specific columns' data based on date using awk

I have data separated by commas:
LBA0SF004,2018-10-01,4681,4681
LBA0SF004,2018-10-01,919,919
LBA0SF004,2018-10-01,3,3
LBA0SF004,2018-10-01,11453,11453
LBA0SF004,2018-10-02,4681,4681
LBA0SF004,2018-10-02,1052,1052
LBA0SF004,2018-10-02,3,3
LBA0SF004,2018-10-02,8032,8032
I need an awk command to sum all the 3rd and 4th column values, grouped by date. Where the same server appears with several dates, I need data like this:
LBA0SF004 2018-10-01 17056 17056
LBA0SF004 2018-10-02 13768 13768
The GNU awk construct below should be able to do what you are looking for (it uses arrays of arrays, a gawk extension).
awk '
BEGIN {
    FS = ","
    OFS = " "
}
{
    if(NF == 4)
    {
        a[$1][$2]["3rd"] += $3;
        a[$1][$2]["4th"] += $4;
    }
}
END {
    for (i in a)
    {
        for (j in a[i])
        {
            print i, j, a[i][j]["3rd"], a[i][j]["4th"];
        }
    }
}
' Input_File.txt
Explanation:
FS is the input field separator, which in your case is ,
OFS is the output field separator, which here is a single space
Build an array a indexed by the first and second columns, accumulating the sums of the third and fourth columns
In the END block, print the contents of the array
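One caveat: for (i in a) visits array indices in no guaranteed order. If the groups should come out sorted, gawk can impose a traversal order; a minimal sketch of an adjusted END block (gawk-only, using PROCINFO["sorted_in"]):
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"   # visit indices in ascending string order
    for (i in a)
    {
        for (j in a[i])
        {
            print i, j, a[i][j]["3rd"], a[i][j]["4th"];
        }
    }
}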

Using awk to sum the values of a column, based on the values of another column, append the sum and percentage to original data

This question is more or less a variant on
https://unix.stackexchange.com/questions/242946/using-awk-to-sum-the-values-of-a-column-based-on-the-values-of-another-column
Same input:
smiths|Login|2
olivert|Login|10
denniss|Payroll|100
smiths|Time|200
smiths|Logout|10
I would like to have the following result:
smiths|Login|2|212
olivert|Login|10|10
denniss|Payroll|100|100
smiths|Time|200|212
smiths|Logout|10|212
Hence, the sum of column 3 over all entries with the same value in column 1 should be appended.
In addition, append another column with each entry's percentage of that group sum (e.g. the smiths sum is 2+200+10 = 212, and 2/212*100 is roughly 0.94), yielding the following result:
smiths|Login|2|212|0.94
olivert|Login|10|10|100
denniss|Payroll|100|100|100
smiths|Time|200|212|94.34
smiths|Logout|10|212|4.72
Here's one that doesn't round the percentages but handles division by zero errors:
Adding a couple of records to the test data:
$ cat >> file
test|test|
test2|test2|0
Code (the file is passed twice; NR==FNR is only true while the first copy is being read, so the sums are complete before any line is printed):
$ awk '
BEGIN { FS=OFS="|" }                        # read and write |-separated fields
NR==FNR { s[$1]+=$3; next }                 # first pass: sum column 3 per column-1 key
{ print $0,s[$1],$3/(s[$1]?s[$1]:1)*100 }   # second pass: append sum and percentage, guarding /0
' file file
Output:
smiths|Login|2|212|0.943396
olivert|Login|10|10|100
denniss|Payroll|100|100|100
smiths|Time|200|212|94.3396
smiths|Logout|10|212|4.71698
test|test||0|0
test2|test2|0|0|0
gawk approach:
awk -F'|' '{a[$1]+=$3; b[NR]=$0}END{ for(i=1;i<=NR;i++) {split(b[i], data, FS);
print b[i] FS a[data[1]] FS sprintf("%0.2f", data[3]/a[data[1]]*100) }}' file
(Iterating i from 1 to NR keeps the lines in input order; for (i in b) carries no ordering guarantee.)
The output:
smiths|Login|2|212|0.94
olivert|Login|10|10|100.00
denniss|Payroll|100|100|100.00
smiths|Time|200|212|94.34
smiths|Logout|10|212|4.72

Enumerate lines with same ID in awk

I'm using awk to process the following sample of data:
id,desc
168048,Prod_A
217215,Prod_C
217215,Prod_B
168050,Prod_A
168050,Prod_F
168050,Prod_B
What I'm trying to do is to create a column 'item' enumerating the lines within the same 'id':
id,desc,item
168048,Prod_A,#1
217215,Prod_C,#1
217215,Prod_B,#2
168050,Prod_A,#1
168050,Prod_F,#2
168050,Prod_B,#3
Here is what I've tried:
BEGIN {
    FS = ","
    a = 1
}
NR != 1 {
    if (id != $1) {
        id = $1
        printf "%s,%s\n", $0, "#"a
    }
    else {
        printf "%s,%s\n", $0, "#"a++
    }
}
But it messes up the numbering:
168048,Prod_A,#1
217215,Prod_C,#1
217215,Prod_B,#1
168050,Prod_A,#2
168050,Prod_F,#2
168050,Prod_B,#3
Could someone give me some hints?
P.S. The line order doesn't matter
$ awk -F, 'NR>1{print $0,"#"++c[$1]}' OFS=, file
168048,Prod_A,#1
217215,Prod_C,#1
217215,Prod_B,#2
168050,Prod_A,#1
168050,Prod_F,#2
168050,Prod_B,#3
How it works
-F,
This sets the field separator on input to a comma.
NR>1{...}
This limits the commands in braces to lines other than the first, that is, the one with the header.
print $0,"#"++c[$1]
This prints the line followed by # and a count of the number of times that we have seen the first column.
The associative array c keeps a count of the number of times that an id has been seen. For every line, we increment the count for id $1 by 1. Because the ++ precedes c[$1], the increment is done before the value is printed.
OFS=,
This sets the field separator on output to a comma.
Printing a new header as well
$ awk -F, 'NR==1{print $0,"item"} NR>1{print $0,"#"++c[$1]}' OFS=, file
id,desc,item
168048,Prod_A,#1
217215,Prod_C,#1
217215,Prod_B,#2
168050,Prod_A,#1
168050,Prod_F,#2
168050,Prod_B,#3

Awk replace nth field with blank value

I have the below file with 100s of entries, and I want to replace the 46th field (N) with a blank using an awk command on a Unix box. Does anyone know the best way to do this?
TESTENTRY1||||||N|Y|N|OFF||N||||N|L|N|0|N|0|N|N||||A|0||0||N|N|N|Y|N||0|N|N||0|||N|N|N|N|N
TESTENTRY2||||||N|Y|N|OFF||N||||N|L|N|0|N|0|N|N||||A|0||0||N|N|N|Y|N||0|N|N||0|||N|N|N|N|N
So it looks like the below:
TESTENTRY1||||||N|Y|N|OFF||N||||N|L|N|0|N|0|N|N||||A|0||0||N|N|N|Y|N||0|N|N||0|||N||N|N|N
TESTENTRY2||||||N|Y|N|OFF||N||||N|L|N|0|N|0|N|N||||A|0||0||N|N|N|Y|N||0|N|N||0|||N||N|N|N
$ awk 'BEGIN { FS=OFS="|" } { $46 = "" }1' nnn.txt
TESTENTRY1||||||N|Y|N|OFF||N||||N|L|N|0|N|0|N|N||||A|0||0||N|N|N|Y|N||0|N|N||0|||N||N|N|N
TESTENTRY2||||||N|Y|N|OFF||N||||N|L|N|0|N|0|N|N||||A|0||0||N|N|N|Y|N||0|N|N||0|||N||N|N|N
BEGIN { FS=OFS="|" } sets the input and output field separators to the vertical bar before the records are read.
{ $46 = "" } sets the 46th column to be empty in each record.
The trailing 1 prints the resulting record to the output.
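If the file itself needs updating, one common approach is to redirect to a temporary file and move it over the original (a sketch; tmp.txt is a hypothetical scratch name):
awk 'BEGIN { FS=OFS="|" } { $46 = "" }1' nnn.txt > tmp.txt && mv tmp.txt nnn.txt
Recent gawk releases can also edit the file directly with gawk -i inplace.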

awk | Rearrange fields of CSV file on the basis of column value

I need your help in writing awk for the problem below. I have one source file and the required output for it.
Source File
a:5,b:1,c:2,session:4,e:8
b:3,a:11,c:5,e:9,session:3,c:3
Output File
session:4,a=5,b=1,c=2
session:3,a=11,b=3,c=5|3
Notes:
Fields are not organised in the source file
In the output file, fields are organised in a specific order: all a values are in the 2nd column, then b, and then c
A value such as c may occur n times in a line (as in the second line); in the output, its occurrences are merged with a PIPE symbol.
Please help.
This will work in any modern awk:
$ cat file
a:5,b:1,c:2,session:4,e:8
a:5,c:2,session:4,e:8
b:3,a:11,c:5,e:9,session:3,c:3
$ cat tst.awk
BEGIN{ FS="[,:]"; split("session,a,b,c",order) }
{
    split("",val) # or delete(val) in gawk
    for (i=1;i<NF;i+=2) {
        val[$i] = (val[$i]=="" ? "" : val[$i] "|") $(i+1)
    }
    for (i=1;i in order;i++) {
        name = order[i]
        printf "%s%s", (i==1 ? name ":" : "," name "="), val[name]
    }
    print ""
}
$ awk -f tst.awk file
session:4,a=5,b=1,c=2
session:4,a=5,b=,c=2
session:3,a=11,b=3,c=5|3
If you actually want the e values printed, unlike your posted desired output, just add ,e to the string in the split() in the BEGIN section wherever you'd like those values to appear in the ordered output.
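For example, to print the e values after c, only the split() string in the BEGIN section changes:
BEGIN{ FS="[,:]"; split("session,a,b,c,e",order) }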
Note that when b was missing from the input on line 2 above, it output a null value as you said you wanted.
Try with:
awk '
BEGIN {
    FS = "[,:]"
    OFS = ","
}
{
    for ( i = 1; i <= NF; i += 2 ) {
        if ( $i == "session" ) { printf "%s:%s", $i, $(i+1); continue }
        hash[$i] = hash[$i] (hash[$i] ? "|" : "") $(i+1)
    }
    asorti( hash, hash_orig )
    for ( i = 1; i <= length(hash); i++ ) {
        printf ",%s:%s", hash_orig[i], hash[ hash_orig[i] ]
    }
    printf "\n"
    delete hash
    delete hash_orig
}
' infile
This splits each line on any comma or colon and traverses the odd-numbered fields, saving each key and its values in a hash that is printed once the line has been processed (note that asorti is gawk-specific). It yields:
session:4,a:5,b:1,c:2,e:8
session:3,a:11,b:3,c:5|3,e:9