Converting MDF to VTK using AWK - awk

I am a beginner, so sorry if this has been covered before, but I can't seem to find exactly what I need to solve my problem.
I am trying to write an AWK "script" that can convert an MDF (Mesh Definition File) given as input into a (valid) VTK file as output.
I have a sample MDF file that looks like this:
TITLE "1"
NMESHPOINTS 4
NNODES 4
NELEMENTS_TRIANG1 2
TIMESTEP 0.00001
NINTERNAL_TIMESTEPS 1000
NEXTERNAL_TIMESTEPS 100
DAMPING_FACTOR 0.01
MESHPOINT_COORDINATES
1 0.0 0.0 0.0
2 1.0 0.0 0.0
3 1.0 1.0 0.0
4 0.0 1.0 0.0
NODES_TRIANG1
1 1 2 3
2 1 3 4
And I want to make a valid VTK file from this input.
Here is what the output should look like:
# vtk DataFile Version 1.0
2D Unstructured Grid
ASCII
DATASET UNSTRUCTURED_GRID
POINTS 4 float
0.0 0.0 0.0
1.0 0.0 0.0
1.0 1.0 0.0
0.0 1.0 0.0
CELLS 2 8
3 0 1 2
3 0 2 3
CELL_TYPES 2
5
5
I tried to make a picture of how the mapping works; I hope it explains some of it.
To make it a bit easier for this specific example, let's say we only want to work with triangles.
Sadly I don't have matching MDF and VTK versions of the same file, so I tried to write one manually.
Is there any way to do this with AWK?
Any help will be much appreciated!!

Excellent diagram showing the input -> output mapping! Made it extremely easy to write this:
$ cat tst.awk
$1 ~ /^[[:alpha:]]/ { f[$1] = $2 }
!NF { block = "" }
$1 == "MESHPOINT_COORDINATES" {
    block = $1
    print "# vtk DataFile Version 1.0"
    print "2D Unstructured Grid"
    print "ASCII"
    print "DATASET UNSTRUCTURED_GRID"
    printf "POINTS %d float\n", f["NMESHPOINTS"]
    next
}
block == "MESHPOINT_COORDINATES" {
    $1 = ""
    sub(/^[[:space:]]+/,"")
    print
}
$1 == "NODES_TRIANG1" {
    block = $1
    printf "CELLS %d %d\n", f["NELEMENTS_TRIANG1"], f["NELEMENTS_TRIANG1"] * 4
    next
}
block == "NODES_TRIANG1" {
    printf "%s", 3
    for (i=2; i<=NF; i++) {
        printf " %s", $i - 1
    }
    print ""
    nlines++
}
END {
    printf "CELL_TYPES %d\n", nlines
    for (i=1; i<=nlines; i++) {
        print 5
    }
}
$ awk -f tst.awk file.mdf
# vtk DataFile Version 1.0
2D Unstructured Grid
ASCII
DATASET UNSTRUCTURED_GRID
POINTS 4 float
0.0 0.0 0.0
1.0 0.0 0.0
1.0 1.0 0.0
0.0 1.0 0.0
CELLS 2 8
3 0 1 2
3 0 2 3
CELL_TYPES 2
5
5
Normally we only answer questions where the poster has attempted to solve it themselves first, but you put enough effort into creating the example and describing the mapping that IMHO you deserve help with a solution. So: see the above, try to figure out how it's working yourself (add prints, check the man page, etc.), and then post a new question if you have any specific questions about it.
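For anyone studying how it works: the first rule, `f[$1] = $2`, is what captures every `KEY value` header line (NMESHPOINTS, NELEMENTS_TRIANG1, ...) for later use. A minimal sketch isolating just that idiom (hdr.mdf is a made-up fragment, not the full sample):

```shell
cat > hdr.mdf <<'EOF'
NMESHPOINTS 4
NELEMENTS_TRIANG1 2
MESHPOINT_COORDINATES
1 0.0 0.0 0.0
EOF
# Lines starting with a letter are treated as "KEY value" pairs;
# purely numeric data lines are skipped by the regex.
awk '$1 ~ /^[[:alpha:]]/ { f[$1] = $2 }
     END { print f["NMESHPOINTS"], f["NELEMENTS_TRIANG1"] }' hdr.mdf
```

This prints `4 2`. Note the section header MESHPOINT_COORDINATES is also stored (with an empty value), which is harmless here.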

Related

Changing a list of values in Awk

I am trying to change values in the following list:
A 0.702
B 0.868
C 3.467
D 2.152
If the second column is less than 0.5 I would like to change it to -2, between 0.5 and 1 to -1, between 1 and 1.5 to 1, and if > 1.5 then to 2.
When I try the following:
awk '$2<0.9 || $2>2' | awk '{if ($2 < 0.5) print $1,-2;}{if($2>0.5 || $2<1) print $1,-1;}{if($2>1 || $2<1.5) print $1,1;}{if($2>2) print $1,2;}'
I get the following:
A -1
A 1
B -1
B 1
C 1
C 2
D 1
D 2
I know I am missing something but for the life of me I can't figure out what - any help gratefully received.
If you have multiple if statements and the current value can match more than one of them, you will print multiple lines of output.
If you only want to print the output of the first match, you have to prevent the if statements that follow from running.
You can use a single awk and define non-overlapping ranges by combining a greater-than and a less-than test with &&.
Note that using only > and < you will not match boundary values such as exactly 0.5; make one side of each range inclusive (>= or <=) where needed.
awk '{
if($2 < 0.5) print($1, -2)
if($2 > 0.5 && $2<1) print($1,-1)
if($2 > 1 && $2<1.5) print($1, 1)
if($2 > 1.5) print($1 ,2)
}
' file
Output
A -1
B -1
C 2
D 2
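To see the boundary gap concretely: values sitting exactly on 0.5 or 1.5 fall through the strict `>`/`<` ranges and print nothing, while making the upper bounds inclusive catches them. A sketch (E/F are made-up test rows; which side should be inclusive is the OP's call):

```shell
printf 'E 0.5\nF 1.5\n' > edge.txt
# Strict comparisons (as above): 0.5 and 1.5 match no range, nothing prints.
awk '{
    if ($2 < 0.5)           print $1, -2
    if ($2 > 0.5 && $2 < 1) print $1, -1
    if ($2 > 1 && $2 < 1.5) print $1, 1
    if ($2 > 1.5)           print $1, 2
}' edge.txt
# Inclusive upper bounds: every value lands in exactly one bin.
awk '{
    if      ($2 <= 0.5) print $1, -2
    else if ($2 <= 1)   print $1, -1
    else if ($2 <= 1.5) print $1, 1
    else                print $1, 2
}' edge.txt
```

The first awk prints nothing; the second prints `E -2` and `F 1`.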
With your shown samples only: adding one more solution using ternary operators for the condition checking (for fun :) ).
awk '{print (NF?($2>1.5?($1 OFS 2):($2>1?($1 OFS 1):($2>0.5?($1 OFS "-1"):($1 OFS "-2")))):"")}' Input_file
A more readable form of the above awk code; since it is a one-liner, it is broken up into multi-line form here.
awk '
{
print \
(\
NF\
?\
($2>1.5\
?\
($1 OFS 2)\
:\
($2>1\
?\
($1 OFS 1)\
:\
($2>0.5\
?\
($1 OFS "-1")\
:\
($1 OFS "-2")\
)\
)\
)\
:\
""\
)
}
' Input_file
Explanation: ternary operators perform the condition checks and the matching value is printed accordingly (it all happens inside the print call).
Another. Replace <s with <=s where needed:
$ awk '{
if($2<0.5) # from low to higher sets the lower limit
$2=-2
else if($2<1) # so only upper limit needs to be tested
$2=-1
else if($2<1.5)
$2=1
else
$2=2
}1' file
Output:
A -1
B -1
C 2
D 2
Probably overkill for your needs but here's a data-driven approach using GNU awk for arrays of arrays and +/-inf:
$ cat tst.awk
BEGIN {
range["-inf"][0.5] = -2
range[0.5][1] = -1
range[1][1.5] = 1
range[1.5]["+inf"] = 2
}
{
val = ""
for ( beg in range ) {
for ( end in range[beg] ) {
if ( (beg+0 < $2) && ($2 <= end+0) ) {
val = range[beg][end]
}
}
}
print $1, val
}
$ awk -f tst.awk file
A -1
B -1
C 2
D 2
I'm assuming above that "between" excludes the start of the range but includes the end of it. You could make it slightly more efficient with:
for ( beg in range ) {
if ( beg+0 < $2 ) {
for ( end in range[beg] ) {
if ( $2 <= end+0 ) {
val = range[beg][end]
}
}
}
}
but I just like having the range comparison all on 1 line and there's only 1 end for every begin so it doesn't make much difference.
UPDATE 1: the new equation should cover nearly all scenarios:
The 1st half of the equation handles the sign (+/-).
The 2nd half handles the magnitude of the binning.
mawk '$NF = (-++_)^(+(__=$NF)<_) * ++_^(int(__+_--^-_)!=_--)'
X -1.25 -2
X -1.00 -2
X -0.75 -2
X -0.50 -2
X -0.25 -2
X 0.00 -2
X 0.25 -2
X 0.50 -1
X 0.75 -1
X 1.00 1
X 1.25 1
X 1.50 2
X 1.75 2
X 2.00 2
X 2.25 2
X 2.50 2
==============================
This may not cover every possible scenario, but if you want a one-liner to cover the samples shown:
mawk '$NF = 4 < (_=int(2*$NF)-2)^2 ? 1+(-3)^(_<-_) :_'
A -1
B -1
C 2
D 2

awk script to sum numbers in a column over a loop not working for some iterations in the loop

Sample input
12.0000 0.6000000 0.05
13.0000 1.6000000 0.05
14.0000 2.6000000 0.05
15.0000 3.0000000 0.05
15.0000 3.2000000 0.05
15.0000 3.4000000 0.05
15.0000 3.6000000 0.10
15.0000 3.8000000 0.10
15.0000 4.0000000 0.10
15.0000 4.2000000 0.11
15.0000 4.4000000 0.12
15.0000 4.6000000 0.13
15.0000 4.8000000 0.14
15.0000 5.0000000 0.15
15.0000 5.2000000 0.14
15.0000 5.4000000 0.13
15.0000 5.6000000 0.12
15.0000 5.8000000 0.11
15.0000 6.0000000 0.10
15.0000 6.2000000 0.10
15.0000 6.4000000 0.10
15.0000 6.6000000 0.05
15.0000 6.8000000 0.05
15.0000 7.0000000 0.05
Goal
Print line 1 in output as 0 0
For $2 = 5.000000, $3 = 0.15.
Print line 2 in output as 1 0.15
For $2 = 4.800000 through $2 = 5.200000, sum+=$3 for each line (i.e. 0.14 + 0.15 + 0.14 = 0.43).
Print line 3 in output as 2 0.43.
For $2 = 4.600000 through $2 = 5.400000, sum+=$3 for each line (i.e. 0.13 + 0.14 + 0.15 + 0.14 + 0.13 = 0.69).
Print line 4 in output as 3 0.69
Continue this pattern until $2 = 5.000000 +- 1.6 (9 lines total, plus line 1 as 0 0 = 10 total lines in output)
Desired Output
0 0
1 0.15
2 0.43
3 0.69
4 0.93
5 1.15
6 1.35
7 1.55
8 1.75
9 1.85
Attempt
Script 1
#!/bin/bash
for (( i=0; i<=8; i++ )); do
awk '$2 >= 5.0000000-'$i'*0.2 {sum+=$3}
$2 == 5.0000000+'$i'*0.2 {print '$i', sum; exit
}' test.dat
done > test.out
produces
0 0.15
1 0.43
2 0.69
3 0.93
4 1.15
5 1.35
6 1.55
7 1.75
8 1.85
This is very close. However, the output is missing 0 0 for line 1, and because of this, lines 2 through 10 have $1 and $2 mismatched by 1 line.
Script 2
#!/bin/bash
for (( i=0; i<=8; i++ )); do
awk ''$i'==0 {sum=0}
'$i'>0 && $2 > 5.0000000-'$i'*0.2 {sum+=$3}
$2 == 5.0000000+'$i'*0.2 - ('$i' ? 0.2 : 0) {print '$i', sum; exit
}' test.dat
done > test.out
which produces
0 0
1 0.15
2 0.43
4 0.93
5 1.15
6 1.35
7 1.55
$1 and $2 are now correctly matched. However, I am missing the lines with $1=3, $1=8, and $1=9 completely. Adding the ternary operator causes my code to skip these iterations in the loop somehow.
Question
Can anyone explain what's wrong with script 2, or how to achieve the desired output in one line of code? Thank you.
Solution
I used Ed Morton's solution to solve this. Both of them work for different goals. Instead of using the modulus to save array space, I constrained the array to $1 = 15.0000. I did this instead of the modulus in order to include two other "key" variables that I had wanted to also sum over at different parts of the input, into separate output files.
Furthermore, as far as I understood it, the script summed only for lines with $2 >= 5.0000000, and then multiplied the summation by 2, in order to include the lines with $2 <= 5.0000000. This works for the sample input here because I made $3 symmetric around 0.15. I modified it to sum them separately, though.
awk 'BEGIN { key=5; range=9}
$1 == 15.0000 {
a[NR] = $3
}
$2 == key { keyIdx = NR}
END {
print (0, 0) > "test.out"
sum = a[keyIdx]
for (delta=1; delta<=range; delta++) {
print (delta, sum) > "test.out"
plusIdx = (keyIdx + delta)
minusIdx = (keyIdx - delta)
sum += a[plusIdx] + a[minusIdx]
}
exit
}' test.dat
Is this what you're trying to do?
$ cat tst.awk
$2 == 5 { keyNr = NR }
{ nr2val[NR] = $3 }
END {
print 0, 0
sum = nr2val[keyNr]
for (delta=1; delta<=9; delta++) {
print delta, sum
sum += nr2val[keyNr+delta] + nr2val[keyNr-delta]
}
}
$ awk -f tst.awk file
0 0
1 0.15
2 0.43
3 0.69
4 0.93
5 1.15
6 1.35
7 1.55
8 1.75
9 1.85
We could optimize it to only store 2*(range=9) values in nr2val[] (using a modulus operator, NR % (2*range), for the index) and do the calculation when we hit an NR that's range lines past the line where $2 == key, rather than after we've read the whole input. That helps if the above is too slow or your input file is too big to store in memory, e.g.:
$ cat tst.awk
BEGIN { key=5; range=9 }
{
idx = NR % (2*range)
nr2val[idx] = $3
}
$2 == key { keyIdx = idx; endNr = NR+range }
NR == endNr { exit }
END {
print 0, 0
sum = nr2val[keyIdx]
for (delta=1; delta<=range; delta++) {
print delta, sum
idx = (keyIdx + delta) % (2*range)
sum += nr2val[idx] + nr2val[idx]
}
exit
}
$ awk -f tst.awk file
0 0
1 0.15
2 0.43
3 0.69
4 0.93
5 1.15
6 1.35
7 1.55
8 1.75
9 1.85
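The NR % (2*range) trick from the explanation above keeps only 2*range array slots live; the wrap-around can be sanity-checked on its own (a sketch with range=3 to keep the output short):

```shell
# Indices run 1..5, then 0, then repeat: at most 2*range = 6 distinct slots.
seq 1 8 | awk -v range=3 '{ print NR, NR % (2 * range) }'
```

Rows 7 and 8 reuse the slots rows 1 and 2 wrote, which is exactly why values more than 2*range lines behind get overwritten.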
I like your problem. It is an adequate challenge.
My approach is to put all the logic into the awk script and scan the input file only once, because I/O manipulation is slower than computation (these days).
Do as many computations as possible (actually 9) on each relevant input line.
The required inputs are the variable F1 and the text file input.txt.
The execution command is:
awk -v F1=95 -f script.awk input.txt
So the logic is:
1. Initialize: compute the 9 range markers and store their values in an array.
2. Store the 3rd input value in an ordered array `field3`. We use this array to compute the sum.
3. On each line whose 1st field equals 15.0000:
3.1 If a begin marker is found, mark it.
3.2 If an end marker is found, compute the sum and mark it.
4. Finalize: output all the computed results.
script.awk, including a few debug printouts to assist in debugging:
BEGIN {
itrtns = 8; # iterations count consistent all over the program.
for (i = 0; i <= itrtns; i++) { # compute range markers per iteration
F1start[i] = (F1 - 2 - i)/5 - 14; # print "F1start["i"]="F1start[i];
F1stop[i] = (F1 - 2 + i)/5 - 14; # print "F1stop["i"]="F1stop[i];
b[i] = F1start[i] + (i ? 0.2 : 0); # print "b["i"]="b[i];
}
}
{ field3[NR] = $3;} # store 3rd input field in ordered array.
$1==15.0000 { # for each input line that has 1st input field 15.0000
currVal = $2 + 0; # convert 2nd input field to numeric value
for (i = 0; i <= itrtns; i++) { # on each line scan for range markers
# print "i="i, "currVal="currVal, "b["i"]="b[i], "F1stop["i"]="F1stop[i], isZero(currVal-b[i]), isZero(currVal-F1stop[i]);
if (isZero(currVal - b[i])) { # if there is a begin marker
F1idx[i] = NR; # store the marker index position
# print "F1idx["i"] =", F1idx[i];
}
if (isZero(currVal - F1stop[i])) { # if there is an end marker
for (s = F1idx[i]; s <= NR; s++) {sum[i] += field3[s];} # calculate its sum
F2idx[i] = NR; # store its end marker position (for debug report)
# print "field3["NR"]=", field3[NR];
}
}
}
END { # output the computed results
for (i = 0; i <= itrtns; i++) {print i, sum[i], "rows("F1idx[i]"-"F2idx[i]")"}
}
function isZero(floatArg) { # floating point number precision comparison
tolerance = 0.00000000001;
if (floatArg < tolerance && floatArg > -1 * tolerance )
return 1;
return 0;
}
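The tolerance comparison above is needed because the range markers are computed arithmetically, and a value like 3.6 computed as 5 - 7*0.2 is not bit-identical to the parsed literal 3.6. A sketch exercising the idiom on its own (same 1e-11 tolerance; assumes IEEE doubles, which common awks use):

```shell
awk '
function isZero(x) { return (x < 1e-11 && x > -1e-11) }
BEGIN {
    v = 5 - 7 * 0.2        # mathematically 3.6, but carries rounding error
    print (v == 3.6)       # direct equality fails: prints 0
    print isZero(v - 3.6)  # tolerance comparison succeeds: prints 1
}'
```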
Provided input.txt from the question.
12.0000 0.6000000 0.05
13.0000 1.6000000 0.05
14.0000 2.6000000 0.05
15.0000 3.0000000 0.05
15.0000 3.2000000 0.05
15.0000 3.4000000 0.05
15.0000 3.6000000 0.10
15.0000 3.8000000 0.10
15.0000 4.0000000 0.10
15.0000 4.2000000 0.11
15.0000 4.4000000 0.12
15.0000 4.6000000 0.13
15.0000 4.8000000 0.14
15.0000 5.0000000 0.15
15.0000 5.2000000 0.14
15.0000 5.4000000 0.13
15.0000 5.6000000 0.12
15.0000 5.8000000 0.11
15.0000 6.0000000 0.10
15.0000 6.2000000 0.10
15.0000 6.4000000 0.10
15.0000 6.6000000 0.05
15.0000 6.8000000 0.05
15.0000 7.0000000 0.05
The output for: awk -v F1=95 -f script.awk input.txt
0 0.13 rows(12-12)
1 0.27 rows(12-13)
2 0.54 rows(11-14)
3 0.79 rows(10-15)
4 1.02 rows(9-16)
5 1.24 rows(8-17)
6 1.45 rows(7-18)
7 1.6 rows(6-19)
8 1.75 rows(5-20)
The output for: awk -v F1=97 -f script.awk input.txt
0 0.15 rows(14-14)
1 0.29 rows(14-15)
2 0.56 rows(13-16)
3 0.81 rows(12-17)
4 1.04 rows(11-18)
5 1.25 rows(10-19)
6 1.45 rows(9-20)
7 1.65 rows(8-21)
8 1.8 rows(7-22)

Mixture prior not working in JAGS, only when likelihood term included

The code at the bottom will replicate the problem, just copy and paste it into R.
What I want is for the mean and precision to be (-100, 100) 30% of the time, and (200, 1000) for 70% of the time. Think of it as lined up in a, b, and p.
So 'pick' should be 1 30% of the time, and 2 70% of the time.
What actually happens is that on every iteration, pick is 2 (or 1 if the first element of p is the larger one). You can see this in the summary, where the quantiles for 'pick', 'testa', and 'testb' remain unchanged throughout. The strangest thing is that if you remove the likelihood loop, pick then works exactly as intended.
I hope this explains the problem, if not let me know. It's my first time posting so I'm bound to have messed things up.
library(rjags)
n = 10
y <- rnorm(n, 5, 10)
a = c(-100, 200)
b = c(100, 1000)
p = c(0.3, 0.7)
## Model
mod_str = "model{
# Likelihood
for (i in 1:n){
y[i] ~ dnorm(mu, 10)
}
# ISSUE HERE: MIXTURE PRIOR
mu ~ dnorm(a[pick], b[pick])
pick ~ dcat(p[1:2])
testa = a[pick]
testb = b[pick]
}"
model = jags.model(textConnection(mod_str), data = list(y = y, n=n, a=a, b=b, p=p), n.chains=1)
update(model, 10000)
res = coda.samples(model, variable.names = c('pick', 'testa', 'testb', 'mu'), n.iter = 10000)
summary(res)
I think you are having problems for a couple of reasons. First, the data that you have supplied to the model (i.e., y) is not a mixture of normal distributions. As a result, the model itself has no need to mix. I would instead generate data something like this:
set.seed(320)
# number of samples
n <- 10
# Because it is a mixture of 2 we can just use an indicator variable.
# here, pick (in the long run), would be '1' 30% of the time.
pick <- rbinom(n, 1, p[1])
# generate the data. b is in terms of precision so we are converting this
# to standard deviations (which is what R wants).
y_det <- pick * rnorm(n, a[1], sqrt(1/b[1])) + (1 - pick) * rnorm(n, a[2], sqrt(1/b[2]))
# add a small amount of noise, can change to be more as necessary.
y <- rnorm(n, y_det, 1)
These data look more like what you would want to supply to a mixture model.
Following this, I would code the model up in a similar way as I did the data generation process. I want some indicator variable to jump between the two normal distributions. Thus, mu may change for each scalar in y.
mod_str = "model{
# Likelihood
for (i in 1:n){
y[i] ~ dnorm(mu[i], 10)
mu[i] <- mu_ind[i] * a_mu + (1 - mu_ind[i]) * b_mu
mu_ind[i] ~ dbern(p[1])
}
a_mu ~ dnorm(a[1], b[1])
b_mu ~ dnorm(a[2], b[2])
}"
model = jags.model(textConnection(mod_str), data = list(y = y, n=n, a=a, b=b, p=p), n.chains=1)
update(model, 10000)
res = coda.samples(model, variable.names = c('mu_ind', 'a_mu', 'b_mu'), n.iter = 10000)
summary(res)
2.5% 25% 50% 75% 97.5%
a_mu -100.4 -100.3 -100.2 -100.1 -100
b_mu 199.9 200.0 200.0 200.0 200
mu_ind[1] 0.0 0.0 0.0 0.0 0
mu_ind[2] 1.0 1.0 1.0 1.0 1
mu_ind[3] 0.0 0.0 0.0 0.0 0
mu_ind[4] 1.0 1.0 1.0 1.0 1
mu_ind[5] 0.0 0.0 0.0 0.0 0
mu_ind[6] 0.0 0.0 0.0 0.0 0
mu_ind[7] 1.0 1.0 1.0 1.0 1
mu_ind[8] 0.0 0.0 0.0 0.0 0
mu_ind[9] 0.0 0.0 0.0 0.0 0
mu_ind[10] 1.0 1.0 1.0 1.0 1
If you supplied more data, you would (in the long run) have the indicator variable mu_ind take the value of 1 30% of the time. If you had more than 2 distributions you could instead use dcat. Thus, an alternative and more generalized way of doing this would be (and I am borrowing heavily from this post by John Kruschke):
mod_str = "model {
# Likelihood:
for( i in 1 : n ) {
y[i] ~ dnorm( mu[i] , 10 )
mu[i] <- muOfpick[ pick[i] ]
pick[i] ~ dcat( p[1:2] )
}
# Prior:
for ( i in 1:2 ) {
muOfpick[i] ~ dnorm( a[i] , b[i] )
}
}"
model = jags.model(textConnection(mod_str), data = list(y = y, n=n, a=a, b=b, p=p), n.chains=1)
update(model, 10000)
res = coda.samples(model, variable.names = c('pick', 'muOfpick'), n.iter = 10000)
summary(res)
2.5% 25% 50% 75% 97.5%
muOfpick[1] -100.4 -100.3 -100.2 -100.1 -100
muOfpick[2] 199.9 200.0 200.0 200.0 200
pick[1] 2.0 2.0 2.0 2.0 2
pick[2] 1.0 1.0 1.0 1.0 1
pick[3] 2.0 2.0 2.0 2.0 2
pick[4] 1.0 1.0 1.0 1.0 1
pick[5] 2.0 2.0 2.0 2.0 2
pick[6] 2.0 2.0 2.0 2.0 2
pick[7] 1.0 1.0 1.0 1.0 1
pick[8] 2.0 2.0 2.0 2.0 2
pick[9] 2.0 2.0 2.0 2.0 2
pick[10] 1.0 1.0 1.0 1.0 1
The link above includes even more priors (e.g., a Dirichlet prior on the probabilities incorporated into the Categorical distribution).

Format time from alphanumeric to numeric

I have a text file:
ifile.txt
x y z t value
1 1 5 01hr01Jan2018 3
1 1 5 02hr01Jan2018 3.1
1 1 5 03hr01Jan2018 3.2
1 3.4 3 01hr01Jan2018 4.1
1 3.4 3 02hr01Jan2018 6.1
1 3.4 3 03hr01Jan2018 1.1
1 4.2 6 01hr01Jan2018 6.33
1 4.2 6 02hr01Jan2018 8.33
1 4.2 6 03hr01Jan2018 5.33
3.4 1 2 01hr01Jan2018 3.5
3.4 1 2 02hr01Jan2018 5.65
3.4 1 2 03hr01Jan2018 3.66
3.4 3.4 4 01hr01Jan2018 6.32
3.4 3.4 4 02hr01Jan2018 9.32
3.4 3.4 4 03hr01Jan2018 12.32
3.4 4.2 8.1 01hr01Jan2018 7.43
3.4 4.2 8.1 02hr01Jan2018 7.93
3.4 4.2 8.1 03hr01Jan2018 5.43
4.2 1 3.4 01hr01Jan2018 6.12
4.2 1 3.4 02hr01Jan2018 7.15
4.2 1 3.4 03hr01Jan2018 9.12
4.2 3.4 5.5 01hr01Jan2018 2.2
4.2 3.4 5.5 02hr01Jan2018 3.42
4.2 3.4 5.5 03hr01Jan2018 3.21
4.2 4.2 6.2 01hr01Jan2018 1.3
4.2 4.2 6.2 02hr01Jan2018 3.4
4.2 4.2 6.2 03hr01Jan2018 1
Explanation: Each coordinate (x,y) has a z-value and three time values. The separators are not tabs; they are sequences of spaces.
I would like to format the t-column from alphanumeric to numeric and then convert to a CSV file. My expected output is:
ofile.txt
x,y,z,201801010100,201801010200,201801010300
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1
The desired time format is YYYYMMDDHHMM (year, month, day, hour, minute).
I had asked part of this question previously; please see Format and then convert txt to csv using shell script and awk. However, I was not able to change the time format within the following script.
awk -v OFS=, '{k=$1 OFS $2 OFS $3}
!($4 in hdr){hn[++h]=$4; hdr[$4]}
k in row{row[k]=row[k] OFS $5; next}
{rn[++n]=k; row[k]=$5}
END {
printf "%s", rn[1]
for(i=1; i<=h; i++)
printf "%s", OFS hn[i]
print ""
for (i=2; i<=n; i++)
print rn[i], row[rn[i]]
}' ifile.txt
Expanding on my answer from your previous question:
gawk '
BEGIN {
SUBSEP = OFS = ","
month["Jan"] = "01"; month["Feb"] = "02"; month["Mar"] = "03";
month["Apr"] = "04"; month["May"] = "05"; month["Jun"] = "06";
month["Jul"] = "07"; month["Aug"] = "08"; month["Sep"] = "09";
month["Oct"] = "10"; month["Nov"] = "11"; month["Dec"] = "12";
}
function timestamp_to_numeric(s) {
# 03hr31Jan2001 => 200101310300
return substr(s,10,4) month[substr(s,7,3)] substr(s,5,2) substr(s,1,2) "00"
}
NR==1 {next}
{g = timestamp_to_numeric($4); groups[g]; value[$1,$2,$3][g] = $5}
END {
PROCINFO["sorted_in"] = "#ind_str_asc"
printf "x,y,z"; for (g in groups) printf ",%s", g; printf "\n"
for (a in value) {
printf "%s", a
for (g in groups) printf "%s%s", OFS, 0+value[a][g]
printf "\n"
}
}
' ifile.txt
x,y,z,201801010100,201801010200,201801010300
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1
You have to create a mapping between the month name and the month number, then create a function to transform the timestamp to the new format. Beyond that, the code is the same.
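The substr offsets inside timestamp_to_numeric line up with the fixed-width HHhrDDMonYYYY layout, and they can be checked in isolation; a sketch (only the month entries this test needs):

```shell
awk '
# YYYY (chars 10-13), MM (from name at chars 7-9), DD (5-6), HH (1-2), then "00"
function ts(s) {
    return substr(s,10,4) month[substr(s,7,3)] substr(s,5,2) substr(s,1,2) "00"
}
BEGIN {
    month["Jan"] = "01"; month["Dec"] = "12"
    print ts("01hr01Jan2018")   # 201801010100
    print ts("03hr31Dec2001")   # 200112310300
}'
```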

find first 5 maximum values in each line using awk

I am trying to read a text file like the following
word 1 2 3 4 5 6 7 8 9 10
hello 0.2 0.3 0.5 0.1 0.7 0.8 0.6 0.1 0.9
I would like to print the word, "hello", and the maximum 5 values along with the number of the column where they are found, like this, using awk:
hello 10 0.9 7 0.8 6 0.7 8 0.6 3 0.5
I have thought of something like awk '{ for (i=1; i <= 10; i++) a[$i]=$i};END{c=asort(a)?? for(i in a)print i,a[i]??}', but I would like to print for each line read.
With GNU awk 4.* for sorted_in:
$ cat tst.awk
BEGIN { PROCINFO["sorted_in"] = "#val_num_desc" }
NR>1 {
split($0,a)
printf "%s", a[1]
delete a[1]
for (i in a) {
printf " %d %s", i, a[i]
if (++c == 5) {
c=0
break
}
}
print ""
}
$ awk -f tst.awk file
hello 10 0.9 7 0.8 6 0.7 8 0.6 4 0.5
Here is an awk-assisted Unix toolset solution.
$ awk -v RS=" " 'NR==1; NR>1{print NR, $0 | "sort -k2nr"} ' file | head -6 | xargs
hello 10 0.9 7 0.8 6 0.7 8 0.6 4 0.5
I think your expected output has some typos.
You stated [you] would like to print in each line read, so there is no limit on the number of records read:
$ awk '{delete a; for(i=2; i<=NF; i++) {a[$i]=$i; b[$i]=i}; n=asort(a); printf "%s: ",$1; for(i=n; i>n-(n>=5?5:n); i--) printf "%s %s ", b[a[i]], a[i]; printf "\n"}' test.in
word: 11 10 10 9 9 8 8 7 7 6
hello: 10 0.9 7 0.8 6 0.7 8 0.6 4 0.5
Walk-thru version:
{
delete a # delete the array before each record
for(i=2; i<=NF; i++) { # from the second field to the last
a[$i]=$i # set field to array index and value
b[$i]=i # remember the field number
}
n=asort(a) # sort the a array
printf "%s: ",$1 # print the record identifier ie. the first field
for(i=n; i>n-(n>=5?5:n); i--) # for the 5 (or value count) biggest values
printf "%s %s ", b[a[i]], a[i] # print them out
printf "\n" # enter after each record
}
If a value repeats, it's only printed once.
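That caveat follows from using the value itself as the array index: equal fields overwrite the same slot. A sketch demonstrating the collapse with a deliberately repeated 0.5:

```shell
printf 'x 0.5 0.5 0.3\n' |
awk '{
    for (i = 2; i <= NF; i++) a[$i] = $i   # value as index: duplicates overwrite
    n = 0
    for (v in a) n++
    print n, NF - 1                        # 2 distinct values, 3 numeric fields
}'
```

This prints `2 3`: three numeric fields on the line, but only two distinct array entries survive.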
Using Perl
$ cat cloudy.txt
word 1 2 3 4 5 6 7 8 9 10
hello 0.2 0.3 0.5 0.1 0.7 0.8 0.6 0.1 0.9
$ perl -lane '%kv=(); %kv=map{ $_=>$F[$_] } 1..$#F; printf("$F[0] "); $i=0; for $x (reverse sort {$a <=> $b} values %kv) { @y=grep $x eq $kv{$_}, (keys %kv); printf("%d %.1f ",$y[0]+1,$x) if $i++ <5 } print "" ' cloudy.txt
word 11 10.0 10 9.0 9 8.0 8 7.0 7 6.0
hello 10 0.9 7 0.8 6 0.7 8 0.6 4 0.5
$