How can I replace sequences of space-delimited numbers when those sequences span multiple lines and the input is too big to fit in RAM.
A sample input would be:
edit: I re-worked the sample input and input parameters for introducing border cases (excluding ones that have to do with the length of the matched sequence or replacement priorities)
3 12 3 4
0 6 7 10
8 9 12 3
4 6 7 8
10 6 6 7
9 199 10 11
11
note: the number of fields per line is homogeneous but not known in advance; the last line might contain less fields
From that input I would like to:
replace 3 4 with &
replace 6 7 8 with 9 9
replace 6 7 9 with 8 8
replace 7 10 with 11 12
replace 0 with nothing
replace 10 with 13 10
replace 8 9 12 3 5 with #
The expected output would have one number or replacement per line:
3
12
&
6
11 12
8
9
12
&
9 9
13 10
6
8 8
199
13 10
11
11
I'm trying to do the task with awk but I'm having a hard time implementing a dynamic state machine with a pseudo B-Tree:
tr -s '[:space:]' '\n' < input.txt |
awk '
BEGIN {
for (i = 2; i < ARGC; i += 2) {
n = split(ARGV[i], arr)
k = ""
for (j = 1; j <= n; j++) {
k = j SUBSEP k SUBSEP arr[j]
Tree[k]
}
Tree[k] = "$" ARGV[i+1] #=> now can test "if (Tree[k])"
delete ARGV[i]
delete ARGV[i+1]
}
}
{
Key = (int(Key) + 1) SUBSEP Key SUBSEP $1
if ( Key in Tree ) {
if (Tree[Key]) {
print substr(Tree[Key],2)
Buffer = ""
Key = ""
}
else
Buffer = Buffer $1 "\n"
} else {
print Buffer $1
Buffer = ""
Key = ""
}
}
END { if (Buffer != "") printf ("%s", Buffer) }
' - \
'3 4' '&' \
'6 7 8' '9 9' \
'6 7 9' '8 8' \
'7 10' '11 12' \
'0' '' \
'10' '13 10' \
'8 9 12 3 5' '#'
edit: I realised that the code doesn't backtrack after failing to find a complete match in the B-tree, so it's wrong...
How I'm planning to tackle the problem
I'm emulating a B-tree with an array and keys in the following format:
from the middle to the left of the key are the consecutive depths
from the middle to the right of the key are the consecutive values
When a key exists in Tree:
if it doesn't have an associated value then it's a node
if there's a value then it's a leaf
So, for the current input parameters, the content of the Tree array will be:
# from param: "3 4" => "&"
Tree[ 1,"",3 ]
Tree[2,1,"",3,4] = "$&"
# from param: "6 7 8" => "9 9"
Tree[ 1,"",6 ]
Tree[ 2,1,"",6,7 ]
Tree[3,2,1,"",6,7,8] = "$9 9"
# from param: "6 7 9" => "8 8"
Tree[ 1,"",6 ]
Tree[ 2,1,"",6,7 ]
Tree[3,2,1,"",6,7,9] = "$8 8"
# from param: "7 10" => "11 12"
Tree[ 1,"",7 ]
Tree[2,1,"",7,10] = "$11 12"
# from param: "0" => ""
Tree[1,"",0] = "$"
# from param: "10" => "13 10"
Tree[1,"",10] = "$13 10"
# from param: "8 9 12 3 5" => "#"
Tree[ 1,"",8 ]
Tree[ 2,1,"",8,9 ]
Tree[ 3,2,1,"",8,9,12 ]
Tree[ 4,3,2,1,"",8,9,12,3 ]
Tree[5,4,3,2,1,"",8,9,12,3,5] = "$#"
FWIW I'd approach this by figuring out the max number of records that you might need to search in based on the mappings you want, keep a rolling buffer of that number of records, and then do the comparison part on each buffer, e.g.:
$ cat tst.awk
BEGIN {
RS = "[[:space:]]+"
map("3,4" , "&")
map("6,7,8" , "9")
map("9" , "")
map("0" , "\\000")
map("13,10" , "10")
}
{ buf[((NR-1) % maxRecs) + 1] = $0 }
NR >= maxRecs { prt() }
END { prt() }
function prt( nr,sep,str) {
for ( nr=NR-maxRecs+1; nr<=NR; nr++ ) {
str = str sep buf[((nr-1) % maxRecs) + 1]
sep = ORS
}
print ">>>>" ORS str ORS "<<<<"
# Replace the above with something that loops through the
# strings you want replaced, e.g.
#
# for ( mapNr=1; mapNr<=numMaps; mapNr++ ) {
# old = olds[mapNr]
# if ( str ~ old ) { # add something to avoid partial matches
# new = news[mapNr]
# replace old with new in the output
# }
# }
}
function map(old,new, numRecs) {
++numMaps
numRecs = gsub(/,/,ORS,old) + 1
maxRecs = ( numRecs > maxRecs ? numRecs : maxRecs )
olds[numMaps] = old
news[numMaps] = new
}
$ awk -f tst.awk file
>>>>
112
3
4
<<<<
>>>>
3
4
6
<<<<
>>>>
4
6
7
<<<<
>>>>
6
7
8
<<<<
>>>>
7
8
9
<<<<
>>>>
8
9
12
<<<<
>>>>
9
12
0
<<<<
>>>>
12
0
3
<<<<
>>>>
0
3
4
<<<<
>>>>
3
4
15
<<<<
>>>>
4
15
255
<<<<
>>>>
15
255
13
<<<<
>>>>
255
13
10
<<<<
>>>>
13
10
6
<<<<
>>>>
10
6
7
<<<<
>>>>
6
7
8
<<<<
>>>>
7
8
199
<<<<
>>>>
8
199
9
<<<<
>>>>
199
9
0
<<<<
>>>>
9
0
13
<<<<
>>>>
9
0
13
<<<<
The above is just printing the buff-sized strings, the part to be added is replacing the target strings with the new ones in a way that the next target doesn't match the replaced part which is a common problem with, I expect, lots of solutions online so it's left as an exercise.
You'll also need to tweak it to make sure it doesn't revisit lines at the end of the input.
The above uses GNU awk for multi-char RS, if you don't have GNU awk then just pipe the input from tr -s '[:space:]' '\n' as shown in the question.
UPDATE:
previous answer (see edit revisions) was woefully slow (several minutes) when run against a ramped up input (7K mappings in map.txt; 25M tokens
in input.txt1)
new answer (below) is a complete rewrite and processes the 7K-mappings/25M-tokens in ~45 seconds
The main component of this design centers around a tree-like node structure used to manage the series of tokens (lines of input from map.txt):
tree [ParentNodeNbr] [token] [NodeType] = value
Where:
ParentNodeNbr == 0 for the root
token from map.txt
NodeType has one of two values 'node' or 'leaf'
for NodeType = 'node' the value stored in the array is a numeric node number (implemented as an counter that's incremented each time a new node is added to the tree); this node number becomes the ParentNodeNbr for the next token in the series
for NodeType = 'leaf' this designates the 'end' of a series of tokens (line of input from map.txt) and the value stored in the array is the line number (aka FNR) from map.txt; this line number (FNR) is used as an index into a couple other arrays and to determine precendence when an input sequence (from input.txt) has multiple matches from map.txt
when processing a series of tokens from a map.txt line of input we start at ParentNodeNbr == 0 looking for a series of matching nodes, adding new nodes as needed
Setup: storing replacements in a comma-delimited file (map.txt), and adding one additional line to input.txt:
$ head map.txt input.txt
==> map.txt <==
2 3 4,X # "2 3 4" has precendence over ...
2 3,Y # "2 3"
3 4,&
6 7 8,9
9,
0,\000
13 10,10
==> input.txt <==
2 3 4 # keep eye on "2 3" vs "2 3 4" precendence
112 3
4 6 7
8 9 12 0 3
4 15 255 13
10 6
7 8 199 9
0 13
NOTE: here's what tree[][][] looks like when populated from map.txt:
tree [Parent] [Token] [NodeType] = NodeVal
Parent Token NodeType NodeVal MapTo ** MapTo only applies to NodeType = leaf
====== ===== ======== ======= =====
0 0 leaf 6 "\000"
0 2 node 1
0 3 node 3
0 6 node 4
0 9 leaf 5 ""
0 13 node 6
1 3 node 2
1 3 leaf 1 "Y"
2 4 leaf 2 "X"
3 4 leaf 3 "&"
4 7 node 5
5 8 leaf 4 "9"
6 10 leaf 7 "10"
One GNU awk (for multidimensional arrrays):
awk '
function replace(op) {
while ( ((maxToken - minToken + 1) >= maxlen) || op == "flush" ) {
NodeNbr=root
minOrd=maxOrd
for (j=0 ; j<maxlen; j++) { # loop through tokens in buffer[]
token=buffer[ ((minToken + j - 1) % maxlen) + 1 ]
# if we find a matching "leaf" node then keep track of the ordering (ie, FNR from map.txt; lower order == higher precedence)
if ( token in tree [NodeNbr] && "leaf" in tree[NodeNbr][token] )
minOrd= ( tree[NodeNbr][token]["leaf"] < minOrd ) ? tree[NodeNbr][token]["leaf"] : minOrd
# if we find a matching "node" node then grab the next node to compare against the next token from buffer[]
if ( token in tree[NodeNbr] && "node" in tree[NodeNbr][token] ) {
NodeNbr=tree[NodeNbr][token]["node"]
continue
}
break # if we get here we have a token from buffer[] that does not match any of our replacement mappings so abort checking rest of buffer[]
}
if (minOrd < maxOrd) { # if we found at least one complete match (ie, hit a "leaf" node) then ...
print map[minOrd] # use the associated "ord"er to print the associated replacement string and ...
minToken=minToken + len[minOrd] # update the pointer into the buffer[] array
}
else { # otherwise we did not find a match so ...
print buffer[ ((minToken - 1) % maxlen) + 1 ] # print the first token from buffer[] and ...
minToken++ # update the pointer into the buffer[] array
}
if (minToken > maxToken)
break
}
}
BEGIN { root=maxNodeNbr=maxToken=0
minToken=1
maxOrd=9999999999
}
FNR==NR { split($0,a,",")
map[FNR]=a[2] # save replacement string for this input line from map.txt
n=split(a[1],b) # break our matching pattern into tokens
len[FNR]=n # make note of number of tokens in this line of input
maxlen=(n > maxlen) ? n : maxlen # keep track of longest series of tokens
NodeNbr=root # initiate our tree search
for (i=1 ; i<=n ; i++) { # loop through our list of tokens
token=b[i]
if (i==n) # if the last token for this line then create a "leaf" node and store the line number (aka "order")
tree[NodeNbr][token]["leaf"]=FNR
else
if ( tree[NodeNbr][token]["node"] ) # else if we already have a node at this point in the tree then grab its associated node number for the next level in the tree
NodeNbr=tree[NodeNbr][token]["node"]
else { # else create a new "node" node and populate with the next available node number
tree[NodeNbr][token]["node"]=++maxNodeNbr
NodeNbr=maxNodeNbr # use this as the next level in our tree traversal
}
}
maxrec=FNR # keep track of total number of replacement sets from map.txt (only used if we decide to print the contents of map[] to stdout
next
}
FNR==1 {
# Uncomment following to display the contents of the map[] array:
# for (i=1;i<=maxrec;i++)
# print "map:" i ":" map[i] ":"
#
# Uncomment following to display the contents of the tree[][][] array:
# fmt="%6s%8s%10s%10s%10s\n"
# fmt="%6s%8s%10s%10s%10s\n"
# printf "tree [Parent] [Token] [NodeType]\n\n"
# printf fmt, "Parent", "Token", "NodeType", "NodeVal", "MapTo"
# printf fmt, "======", "=====", "========", "=======", "====="
#
# for (NodeNbr=root ; NodeNbr<=maxNodeNbr ; NodeNbr++)
# for (token in tree[NodeNbr])
# for (NodeType in tree[NodeNbr][token]) { # ??
# NodeVal=tree[NodeNbr][token][NodeType]
# printf fmt, NodeNbr, token, NodeType, NodeVal, (NodeType=="leaf") ? "\"" map[NodeVal] "\"" : ""
# }
}
{ for (i=1 ; i<=NF ; i++) { # loop through tokens in current line from input.txt
maxToken++
buffer[ ((maxToken - 1) % maxlen) + 1 ] = $i
if ( (maxToken - minToken + 1) >= maxlen ) # if we have a "full" buffer then ...
replace() # look for replacement match
}
}
END { replace("flush") } # flush the rest of buffer[]
' map.txt input.txt
This generates:
X # "2 3 4" has precendence over "2 3"
112
&
9
12
\000
&
15
255
10
9
199
\000
13
If we switch the first 2 lines of map.txt like such:
==> map.txt <==
2 3,Y # "2 3" has precendence over ...
2 3 4,X # "2 3 4"
We now generate:
Y # "2 3" has precendence over "2 3 4" thus ...
4 # leaving "4" by itself
112
&
9
12
\000
&
15
255
10
9
199
\000
13
I think I have some kind of TSP problem. I have matrix of distances between 15 cities:
A B C D E F G H I J K L M N O
A 0 3 8 7 8 9 4 4 2 9 5 5 7 9 9
B 9 0 6 3 8 9 3 9 5 3 3 4 8 6 8
C 1 7 0 8 3 5 4 3 1 1 7 8 2 4 3
D 1 9 7 0 4 3 5 6 8 4 3 4 2 8 9
E 5 8 3 5 0 9 7 4 9 4 5 7 4 6 2
F 5 7 9 6 2 0 3 5 3 6 6 7 4 9 2
G 3 2 8 1 1 8 0 3 4 5 2 4 7 2 6
H 1 4 7 5 5 3 8 0 1 1 7 6 5 8 1
I 5 5 6 5 5 6 6 4 0 2 1 3 4 9 5
J 4 5 4 1 3 9 2 7 9 0 6 8 1 9 9
K 3 4 6 5 9 4 9 5 2 5 0 5 1 4 2
L 8 9 5 2 6 2 9 9 4 5 5 0 3 1 5
M 5 9 7 1 5 5 5 4 6 2 1 6 0 9 2
N 9 5 7 5 7 8 6 5 2 7 1 2 9 0 1
O 7 6 9 6 9 8 4 5 6 2 9 7 7 7 0
Distance from A to B is not the same as distance from B to A.
letter in row means city from
letter in column means city to
Example:
distance from A to F is 9
distance from F to A is 5
I have to start and end in city A. I have to travel to 9 different cities, I cant visit same city twice. Travelled distance should be minimalised. I am familiar with TSP algorithm but i am not certain how to do it only for 9 cities. It should be possible to solve this by using tsp algorithm only once. Thanks for help.
Eventually i figured it out:
// Dynamic Programming based Java program to find shortest path with
// exactly k edges
import java.util.*;
import java.lang.*;
import java.io.*;
class knapsack{
// Define number of vertices in the graph and inifinite value
static final int V = 15;
static final int INF = Integer.MAX_VALUE;
static int numberofedges=10;
static int[][][] S=new int[15][15][15];
// A Dynamic programming based function to find the shortest path
// from u to v with exactly k edges.
static <bolean> int shortestPath(int graph[][], int u, int v, int k)
{ for(int y=0;y<15;y++){
for(int x=0;x<15;x++){
for(int z=0;z<9;z++){
S[x][y][z]=-1;
}}}
// Table to be filled up using DP. The value sp[i][j][e] will
// store weight of the shortest path from i to j with exactly
// k edges
int sp[][][] = new int[V][V][k+1];
System.out.println(Arrays.toString(S));
// Loop for number of edges from 0 to k
for (int e = 0; e <= k; e++)
{
for (int i = 0; i < V; i++) // for source
{
for (int j = 0; j < V; j++) // for destination
{
sp[i][j][e] = INF;
boolean gofind=true;
for (int x = 1; x <= e; x++) {
if (S[i][j][x] == j) {
System.out.println(S[i][j][x]);
System.out.println("TU");
gofind = false;
break;
}
}
if(gofind){
// initialize value
// from base cases
if (e == 0 && i == j) {
S[i][j][0]=0;
sp[i][j][e] = 0;
}
if (e == 1 && graph[i][j] != INF) {
S[i][j][e]=j;
sp[i][j][e] = graph[i][j];
}
// go to adjacent only when number of edges is
// more than 1
if (e > 1)
{int help=225;
for (int a = 0; a < V; a++)
{
// There should be an edge from i to a and
// a should not be same as either i or j
if (graph[i][a] != INF && i != a &&j!= a && sp[a][j][e-1] != INF)
{if(sp[i][j][e]>graph[i][a] + sp[a][j][e-1]){
help=a;
}
sp[i][j][e] = Math.min(sp[i][j][e],graph[i][a] + sp[a][j][e-1]);
if(help>16)
S[i][j][e]=j;
else
S[i][j][e]=a;
}}
}
}}
}
}
return sp[u][v][k];
}
public static void main (String[] args)
{
try {
Scanner sc = null;
sc = new Scanner(new BufferedReader(new FileReader("src/ADS2021_cvicenie5data.txt")));
/* Let us create the graph shown in above diagram*/
int[][] graph = new int[15][15];
sc.nextLine();
while(sc.hasNextLine()) {
for (int i=0; i<graph.length; i++) {
String[] line = sc.nextLine().trim().split(" ");
for (int j=0; j<line.length; j++) {
graph[i][j] = Integer.parseInt(line[j]);
}
}
}
System.out.println(Arrays.deepToString(graph));
System.out.println("Weight of the shortest path is "+ shortestPath(graph, 0, 0, numberofedges));
System.out.println(Arrays.toString(S));
}
catch (
FileNotFoundException e) {
e.printStackTrace();
}
}
}
How could i pass some variables/ arrays outside of procedure?
Lets say I've my procedure 'myproc' with inputparameters {a b c d e}, e.g.
myproc {a b c d e} {
... do something
(calculate arrays, lists and new variables)
}
Inside this procedure I want to calculate an array phiN(1),phiN(2),...phiN(18) out of the variables a-e which itself is a list, e.g.
set phiN(1) [list 1 2 3 4 5 6 7 8 9];
(lets say the values 1-9 had been calculated out of the input variables a-e). And I want to calculate some other parameter alpha and beta
set alpha [expr a+b];
set beta [expr c+d];
Anyway no I want to pass these new calculated variables outside of my procedure. Compare to matlab I simply would write sg like to get these variables outside of the 'function'.
[phiN,alpha,beta] = myproc{a b c d e}
Has anybody an idea how I can deal in tcl?? Thanks!
There are several options:
Return a list and use lassign outside
Example:
proc myproc {a b c d e} {
set alpha [expr {$a+$b}]
set beta [expr {$c+$d}]
return [list $alpha $beta]
}
lassign [myproc 1 2 3 4 5] alpha beta
This is fine if you return values, but not arrays.
Use upvar and provide the name of the array/variable as argument
Example:
proc myproc {phiNVar a b c d e} {
upvar 1 $phiNVar phiN
# Now use phiN as local variable
set phiN(1) [list 1 2 3 4 5 6 7 8 9]
}
# Usage
myproc foo 1 2 3 4 5
foreach i $foo(1) {
puts $i
}
Use a combination of both
Example:
proc myproc {phiNVar a b c d e} {
uplevel 1 $phiNVar phiN
set alpha [expr {$a+$b}]
set beta [expr {$c+$d}]
set phiN(1) [list 1 2 3 4 5 6 7 8 9]
return [list $alpha $beta]
}
lassign [myproc bar 1 2 3 4 5] alpha beta
foreach i $bar(1) {
puts $i
}
Edit: As Donal suggested, is is also possible to return a dict:
A dict is a Tcl list where the odd elements are the keys and the even elements are the values. You can convert an array to a dict with array get and convert a dict back to an array with array set. You can also use the dict itself.
Example
proc myproc {a b c d e} {
set alpha [expr {$a+$b}]
set beta [expr {$c+$d}]
set phiN(1) [list 1 2 3 4 5 6 7 8 9]
return [list [array get phiN] $alpha $beta]
}
lassign [myproc 1 2 3 4 5] phiNDict alpha beta
array set bar $phiNDict
foreach i $bar(1) {
puts $i
}
# Use the [dict] command to manipulate the dict directly
puts [dict get $phiNDict 1]
For more ideas (this is about arrays, but could apply to values as well) see this wiki entry.
Hey I'm trying to find the distance between records in a text file. I'm trying to do it using awk.
An example input is:
1 2 1 4 yes
2 3 2 2 no
1 1 1 5 yes
4 2 4 0 no
5 1 0 1 no
I want to find the distance between each of the numerical values. I'm doing this by subtracting the values and then squaring the answer. I have tried the following code below but all the distances are simply 0. Any help would be appreciated.
BEGIN {recs = 0; fieldnum = 5;}
{
recs++;
for(i=1;i<=NF;i++) {data[recs,i] = $i;}
}
END {
for(r=1;r<=recs;r++) {
for(f=1;f<fieldnum;f++) {
##find distances
for(t=1;t<=recs;t++) {
distance[r,t]+=((data[r,f] - data[t,f])*(data[r,f] - data[t,f]));
}
}
}
for(r=1;r<=recs;r++) {
for(t=1;t<recs;t++) {
##print distances
printf("distance between %d and %d is %d \n",r,t,distance[r,t]);
}
}
}
No idea what you mean conceptually by the "distance between each of the numerical values" so I can't help you with your algorithm but let's clean up the code to see what that looks like:
$ cat tst.awk
{
for(i=1;i<=NF;i++) {
data[NR,i] = $i
}
}
END {
for(r=1;r<=NR;r++) {
for(f=1;f<NF;f++) {
##find distances
for(t=1;t<=NR;t++) {
delta = data[r,f] - data[t,f]
distance[r,t]+=(delta * delta)
}
}
}
for(r=1;r<=NR;r++) {
for(t=1;t<NR;t++) {
##print distances
printf "distance between %d and %d is %d\n",r,t,distance[r,t]
}
}
}
$
$ awk -f tst.awk file
distance between 1 and 1 is 0
distance between 1 and 2 is 7
distance between 1 and 3 is 2
distance between 1 and 4 is 34
distance between 2 and 1 is 7
distance between 2 and 2 is 0
distance between 2 and 3 is 15
distance between 2 and 4 is 13
distance between 3 and 1 is 2
distance between 3 and 2 is 15
distance between 3 and 3 is 0
distance between 3 and 4 is 44
distance between 4 and 1 is 34
distance between 4 and 2 is 13
distance between 4 and 3 is 44
distance between 4 and 4 is 0
distance between 5 and 1 is 27
distance between 5 and 2 is 18
distance between 5 and 3 is 33
distance between 5 and 4 is 19
Seems to produce some non-zero output....
I have this kind of records (rows):
0 1 4 8 2 3 7 9 3 4 8 9 4 7 9 1 0 0 2 5 8 2 4 5 6 1 0 2 4 8 9 0
Definitions:
group: collection of numbers which are separated by 0-s (zeros)
sub-group: collection of numbers which are separated by local minima in the groups
local minimum: the numbers before and after it are greater
In the above example there are 3 groups and 7 sub-groups, i.e.
groups: 1 4 8 2 3 7 9 3 4 8 9 4 7 9 1 , 2 5 8 2 4 5 6 1 , 2 4 8 9
sub-groups: 1 4 8 , 3 7 9 , 4 8 9 , 7 9 1 , 2 5 8 , 4 5 6 1 , 2 4 8 9 (this last is identical to the group itself)
So, in these kind of records I have to
find the minima (print out: 2, 3, 4, 2)
the size (number of character) of these sub-groups
positions of numbers of the sub-groups in the groups
I have already started to write something, but I am stuck here...
Can anyone help me to solve this?
Here is the code so far:
#!/usr/bin/awk -f
{
db = split($0,a,/( 0)+ */)
for (i=1; i<=db; i++) {
split_at_max(a[i])
for (j=1; j<=ret_count; j++) {
print ""
for (k=1; k<=maximums[j]; k++) {
print ret[j,k]
}
}
}
}
function split_at_max(x) {
m_db = split(x,values," ")
for (mx in ret) {
delete ret[mx]
}
ret_count = 1
ret_curr_db = 0
for (mi=2; mi<m_db; mi++) {
ret_curr_db++
ret[ret_count,ret_curr_db] = values[mi-1]
if ( (values[mi-1] <= values[mi]) &&
(values[mi] >= values[mi+1]) &&
(values[mi+1] <= values[mi+2]) ) {
maximums[ret_count] = ret_curr_db
ret_count++
ret_curr_db = 0
}
}
ret_curr_db++
ret[ret_count,ret_curr_db] = values[mi-1]
ret_curr_db++
ret[ret_count,ret_curr_db] = values[mi]
maximums[ret_count] = ret_curr_db
}
interesting assignment.
wrote a quick and dirty awk script. there should be a lot of room to optimize. I don't know what kind of output are you expecting...
awk -v RS="0" 'NF>1{
delete g;
print "group:";
for(i=1;i<=NF;i++){
printf $i" ";
g[i]=$i
}
print "";
t=1;
delete m;
for(i=2;i<length(g);i++){
if(g[i-1]>g[i] && g[i]<g[i+1]) {
print "found minima:"g[i]
m[t]=i;
t++;
}
}
if(length(m)>0){
s=0;
for(x=1;x<=length(m);x++){
printf "sub-group: "
for(i=s+1;i<m[x];i++){
printf g[i]" "
s=m[x];
}
print "";
if(x+1>length(m)){
printf "sub-group: ";
for(i=s+1;i<=length(g);i++)
printf g[i]" "
print "";
}
}
}else{
print "no minima found. sub-group is the same as group:"
printf "sub-group: "
for(i=1;i<=NF;i++){
printf $i" ";
g[i]=$i
}
}
print "\n-----------------------------"
} yourFile
the output on your example input:
group:
1 4 8 2 3 7 9 3 4 8 9 4 7 9 1
found minima:2
found minima:3
found minima:4
sub-group: 1 4 8
sub-group: 3 7 9
sub-group: 4 8 9
sub-group: 7 9 1
-----------------------------
group:
2 5 8 2 4 5 6 1
found minima:2
sub-group: 2 5 8
sub-group: 4 5 6 1
-----------------------------
group:
2 4 8 9
no minima found. sub-group is the same as group:
sub-group: 2 4 8 9
-----------------------------
update
fixing for those "special" elements like 20,30,40...
still quick and dirty:
change my awk script above to
sed 's/^0$//g' yourFile | awk -v RS="" [following codes are the same as above]......
then the output is:
group:
6 63 81 31 37 44 20
found minima:31
sub-group: 6 63 81
sub-group: 37 44 20
-----------------------------