AWK: Parsing a directed graph / Extracting records from a file based on fields matching parts of an input file

Problem
I have an input file with three fields. It is basically a list describing a type of directed graph. The first field is the starting node, the second is the connection type (in this case there is more than one), and the last is the node projected onto.
The problem is that this is a very large and unruly directed graph, and I am only interested in some paths. So I want to provide an input file that lists the names of the nodes I care about. If a node is mentioned in either the first or third field of the graph file, then I want that entire record (as the path type might vary).
Question
How to extract only certain records of a directed graph?
Bonus, how to extract only those paths which join nodes of interest by at most one neighbor (i.e. nodes of interest can be second nearest neighbors).
Request
I am trying to improve my AWK programming, which is why 1) I want to do this in AWK and 2) I would greatly appreciate a verbose explanation of the code :)
Example of the Problem
Input file:
A
C
D
File to parse:
A -> B
A -> C
A -> D
B -> A
B -> D
C -> E
D -> F
E -> B
E -> F
F -> C
...
Output:
A -> B
A -> C
A -> D
B -> A
B -> D
C -> E
D -> F
F -> C
Bonus Example:
A -> B -> D -> F -> C

If I understand your problem correctly, then this will do:
awk 'NR==FNR { data[$1] = 1; next } $1 in data || $3 in data { print }' graph[12]
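How it works: graph[12] is a shell glob that expands to graph1 graph2, so the node list (graph1) is read first and the graph (graph2) second. While reading the first file (NR==FNR), every interesting node is added as a key of data, and next skips the remaining rule. While reading the second file, only the lines where field one or field three is a key in data, i.e. is an interesting node, are printed.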
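For readers who want to cross-check the AWK one-liner, the same two-pass logic can be sketched in Python. This is only an illustration and not part of the original answer: the function name filter_edges and the in-memory sample lists are assumptions, and fields are split on whitespace as AWK does by default.

```python
def filter_edges(nodes_of_interest, graph_lines):
    """Keep graph lines whose first or third field is a node of interest."""
    # Pass 1: collect the interesting nodes (the role of the data[] array).
    wanted = {line.split()[0] for line in nodes_of_interest if line.strip()}

    # Pass 2: keep a record when field 1 or field 3 is an interesting node.
    kept = []
    for line in graph_lines:
        fields = line.split()
        if len(fields) >= 3 and (fields[0] in wanted or fields[2] in wanted):
            kept.append(line)
    return kept


# Sample data copied from the question.
nodes = ["A", "C", "D"]
graph = [
    "A -> B", "A -> C", "A -> D", "B -> A", "B -> D",
    "C -> E", "D -> F", "E -> B", "E -> F", "F -> C",
]

result = filter_edges(nodes, graph)
for line in result:
    print(line)
```

On the sample data this keeps the same eight records listed under Output in the question; only E -> B and E -> F are dropped.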

Going for the bonus:
function left(str) { # returns the leftmost char of a given edge (A -> B)
    return substr(str,1,1)
}
function right(str) { # returns the rightmost...
    return substr(str,length(str),1)
}
function cmp_str_ind(i1, v1, i2, v2) # array traverse order function
{ # this forces the start from the node in the beginning of input file
    if(left(i1)==left(a)&&left(i2)!=left(a)) # or leftmost value in a
        return -1
    else if(left(i2)==left(a)&&left(i1)!=left(a))
        return 1
    else if(i1 < i2)
        return -1
    return (i1 != i2)
}
function trav(a,b,c,d) { # goes thru edges in AWK order
    # print "A:"a," C:"c," D:"d
    if(index(d,c)||index(d,right(c))) {
        return ""
    }
    d=d", "c # c holds the current edge being examined
    if(index(a,right(c))) { # these edges affect a
        # print "3:"c
        sub(right(c),"",a)
        if(a=="") { # when a is empty, path is found
            print d # d has the traversed path
            exit
        }
        for (i in b) {
            if(left(i)==right(c)) # only try the ones that can be added to the end
                trav(a,b,i,d)
        }
        a=a""right(c)
    } else {
        # print "4:"c
        for (i in b)
            if(left(i)==right(c))
                trav(a,b,i,d)
    }
}
BEGIN { # playing with the traverse order
    PROCINFO["sorted_in"]="cmp_str_ind"
}
NR==FNR {
    a=a""$0 # a has the input (ADC)
    next
}
{
    b[$0]=$0 # b has the edges
}
END { # after reading in the data, recursively test every path
    for(i in b) # candidate pruning the unfit ones first. CLR or Dijkstra
        if(index(a,left(i))) { # were not consulted on that logic.
            # print "3: "i
            sub(left(i),"",a)
            trav(a,b,i,left(i))
            a=a""left(i)
        }
        else {
            # print "2: "i
            trav(a,b,i,left(i))
        }
}
$ awk -f graph.awk input parse
A, A -> D, D -> F, F -> C
If you uncomment the BEGIN part, you get the A, A -> B, B -> D, D -> F, F -> C. I know, I should work on it more and comment it better but it's midnight over here. Maybe tomorrow.
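The bonus can also be read more literally than the path search above. The following Python sketch is not the answer's algorithm; it assumes one particular interpretation: keep every edge that lies on a route u -> v or u -> x -> v, where u and v are distinct nodes of interest and x is any single intermediate node. The function name edges_within_two_hops and the tuple representation of records are hypothetical.

```python
def edges_within_two_hops(nodes, edges):
    """Keep edges on a route u -> v or u -> x -> v, with u != v both in nodes.

    edges is a list of (source, connection_type, target) tuples.
    """
    nodes = set(nodes)

    # Adjacency list: source node -> list of target nodes.
    succ = {}
    for src, _kind, dst in edges:
        succ.setdefault(src, []).append(dst)

    keep = set()
    for u in nodes:
        for x in succ.get(u, []):
            # Direct edge between two distinct nodes of interest.
            if x in nodes and x != u:
                keep.add((u, x))
            # Route through exactly one intermediate node x.
            for v in succ.get(x, []):
                if v in nodes and v != u:
                    keep.add((u, x))
                    keep.add((x, v))

    # Preserve the original record order and connection types.
    return [e for e in edges if (e[0], e[2]) in keep]


# Sample data copied from the question.
lines = [
    "A -> B", "A -> C", "A -> D", "B -> A", "B -> D",
    "C -> E", "D -> F", "E -> B", "E -> F", "F -> C",
]
edges = [tuple(line.split()) for line in lines]

bonus = edges_within_two_hops({"A", "C", "D"}, edges)
for src, kind, dst in bonus:
    print(src, kind, dst)
```

Under this reading the sample graph keeps A -> B, A -> C, A -> D, B -> D, D -> F and F -> C, which covers the edges of the bonus example path A -> B -> D -> F -> C. A different reading of "at most one neighbor" would need a different filter.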

Getting "value without a container" error

Got this:
for $config.IO.slurp.lines <-> $l {
$l .= trim;
...
}
Get this:
t/01-basic.rakutest ..3/5
Parameter '$l' expects a writable container (variable) as an argument,
but got '# karabiner config file' (Str) as a value without a container.
in sub generate_file at...
I've read the docs on containers but it didn't shed any light on what I can do in this situation aside from maybe assigning $l to a scalar variable, which seems hacky. Is there a way I can containerize $l?
The issue is really that .lines does not produce containers. So with <->, you would bind to the value, rather than a container. There are several ways to solve this, by containerizing as you suggested:
for $config.IO.slurp.lines -> $l is copy {
$l .= trim;
...
}
But that only makes sense if you want to do more changes to $l. If this is really just about trimming the line that you receive, you could do this on the fly:
for $config.IO.slurp.lines>>.trim -> $l {
...
}
Or, if you need to do more pre-processing $l, use a .map:
for $config.IO.slurp.lines.map({
.trim.subst("foo","bar",:g)
}) -> $l {
...
}
Maybe below is what you want? Generally, you read a file via slurp if you can comfortably handle its size, or you read a file via lines if you want input taken in lazily, one line at a time:
my $config = 'alphabet_one_letter_per_line.txt';
my $txt1 = $config.IO.slurp;
$txt1.elems.say; #1
$txt1.print; #returns alphabet same as input
my $txt2 = $config.IO.lines;
$txt2.elems.say; #26
$txt2.join("\n").put; #returns alphabet same as input
Above, you get only 1 element when slurping, but 26 elements when reading lines. As you can see from the above code, there's no need to "...(assign) $l to a scalar variable..." because there's no need to create (temporary variable) $l.
You can store text in @txt arrays, and get the same number of elements as above. And you can just call routines on your stored text, as you have been doing (the example below continues the $txt2 example above):
$txt2.=map(*.uc);
say $txt2;
Sample Output:
(A B C D E F G H I J K L M N O P Q R S T U V W X Y Z)
[Note, this question seems to have triggered questions on the use of $txt2.=map(*.uc); versus $txt2.=uc;. My rule-of-thumb is simple: if the data structure I'm working on has more than one element, I map using * 'whatever-star' to address the routine call to each element].
https://docs.raku.org/

Match inside match - ocaml raises syntax error

Does anyone know why this function raises a syntax error? I haven't included my helper functions, since they are probably not relevant here: the problem revolves around syntax.
I tried deleting the brackets that raised the error (which I think should be there?), only to get another syntax error one line lower, at the beginning of the row that starts with "|".
type 'a grid = 'a Array.t Array.t
type problem = { initial_grid : int option grid }
type available = { loc : int * int; possible : int list }
type state = { problem : problem; current_grid : int option grid; available = available list }
let branch_state (state : state) : (state * state) option =
if prazni_kvadratki state.current_grid = [] then
None
else
let lst = prazni_kvadratki state.current_grid in
let loc = List.hd lst in
let st1_grid = copy_grid state.current_grid in
let st2_grid = copy_grid state.current_grid in
match razpolozljive state.current_grid loc with
| x :: xs -> (vstavi_vrednost st1_grid loc (Some x);
let st1 = {problem = state.problem; current_grid = st1_grid} in
match xs with
| [y] -> (vstavi_vrednost st2_grid loc (Some y);
let st2 = {
problem = state.problem;
current_grid = st2_grid
}) (* this is where it shows me a syntax error*)
| y :: ys -> let st2 = {
problem = state.problem;
current_grid = copy_grid state.current_grid;
available = {loc = loc; possible = xs}
})
Some (st1, st2)
On around the 5th last line or so you have let with no matching in. The let expression always must have an in.
The basic rule for nested match is that you should use parentheses or begin/end around the inner one:
match x with
| [] -> 0
| [_] ->
  begin
    match y with
    | [] -> 1
    | _ -> 2
  end
| _ -> 3
Otherwise the final cases of the outer match look like they belong to the inner one. I don't think this is your problem here because you have no outer cases after the inner match.
Syntax issues
You have a few syntax issues.
type state = { problem : problem; current_grid : int option grid; available = available list }
You likely meant to have:
type state = { problem : problem; current_grid : int option grid; available : available list }
However, given how you construct values later in your program where you provide a value for the available field in one case but not in the other, you may want a variant type that allows your state type to be constructed with or without this value, with distinct behavior when not constructed with this value. This might look like:
type state =
| With_available of { problem : problem;
current_grid : int option grid;
available : available list }
| Without_available of { problem : problem;
current_grid : int option grid }
The other syntax issue is missing an in to go with a let which brings us to:
Scoping issues
There are clearly some misunderstandings here about how scope works with let bindings in OCaml.
Aside from a definition at the topmost level of a program, all let bindings are local bindings. That is, they apply to a single expression that trails an in keyword.
Consider this toplevel session.
# let x = 5;;
val x : int = 5
# let y =
let x = 42 in
x + 3;;
val y : int = 45
# x;;
- : int = 5
#
Here the x bound with let x = 42 in x + 3 is only in scope for the duration of the expression x + 3. Once we're done with that expression, that binding for x is gone. In the outer scope, x is still bound to 5.
In both cases in your match you bind names st1 and st2, which would have to be local bindings, but then you try to use them in an outer scope, where they don't exist.
If you want st1 and st2, you'd need to bind them in a similar way to a and b in the below simple example.
# let (a, b) = match [1; 2; 3] with
| [x] -> (x, x)
| x :: y :: _ -> (x, y)
| _ -> (1, 1)
in
a + b;;
- : int = 3
#
Pattern-matching
Please also note that the pattern-matching you've shown is not exhaustive. It does not handle an empty list. If you consider it impossible that an empty list will be a result, you still have to either handle it anyway or use a different data structure than a list, which can by definition be empty.
You've shown pattern-matching of the basic pattern:
match some_list with
| x :: xs ->
  match xs with
  | [y] -> ...
  | y :: ys -> ...
We can actually match against the two possibilities you've shown in one level of match.
match some_list with
| x :: [y] -> ...
| x :: y :: ys -> ...
If you still need to address y :: ys as xs in the second case, we can readily bind that name with the as keyword.
match some_list with
| x :: [y] -> ...
| x :: (y :: ys as xs) -> ...

"Mix" operator does not wait for upstream processes to finish

I have several upstream processes, say A, B and C, doing similar tasks.
Downstream of that, I have one process X that needs to treat all outputs of the A, B and C in the same way.
I tried to use the "mix" operator to create a single channel from the output files of A, B and C like so :
process A {
output:
file outA
}
process B {
output:
file outB
}
process C {
output:
file outC
}
inX = outA.mix(outB,outC)
process X {
input:
file inX
"myscript.sh"
}
Process A often finishes before B and C, and somehow process X does not wait for processes B and C to finish, and only takes the outputs of A as input.
The following snippet works nicely:
process A {
output:
file outA
"""
touch outA
"""
}
process B {
output:
file outB
"""
touch outB
"""
}
process C {
output:
file outC
"""
touch outC
"""
}
inX = outA.mix(outB,outC)
process X {
input:
file inX
"echo myscript.sh"
}
If you continue to experience the same problem feel free to open an issue including a reproducible test case.

How should I model a type-safe index in Purescript?

In my application, I'd like to index sets of objects in a type-safe way using a structure similar to a relational database index. For example, I might want to index a set of User objects based on age and name:
import Data.Map as M
import Data.Set as S
type AgeNameIndex = M.Map Int (M.Map String (S.Set User))
Furthermore, I'd like to do operations like union and difference on indexes efficiently, e.g.:
let a = M.singleton 42 $ M.singleton "Bob" $ S.singleton $ User { ... }
b = M.singleton 42 $ M.singleton "Tim" $ S.singleton $ User { ... }
c = union a b -- contains both Bob and Tim
I've tried to model this as follows:
module Concelo.Index
( index
, union
, subtract
, lastValue
, subIndex ) where
import Prelude (($), (>>>), flip, Unit, unit, class Ord)
import Control.Monad ((>>=))
import Data.Map as M
import Data.Set as S
import Data.Maybe (Maybe(Nothing, Just), fromMaybe)
import Data.Tuple (Tuple(Tuple))
import Data.Foldable (foldl)
import Data.Monoid (mempty)
class Index index key value subindex where
isEmpty :: index -> Boolean
union :: index -> index -> index
subtract :: index -> index -> index
lastValue :: index -> Maybe value
subIndex :: key -> index -> subindex
instance mapIndex :: (Index subindex subkey value subsubindex) =>
Index (M.Map key subindex) key value subindex where
isEmpty = M.isEmpty
union small large =
foldl (\m (Tuple k v) -> M.alter (combine v) k m) large (M.toList small)
where
combine v = case _ of
Just v' -> Just $ union v v'
Nothing -> Just v
subtract small large =
foldl (\m (Tuple k v) -> M.alter (minus v) k m) large (M.toList small)
where
minus v = (_ >>= \v' ->
let subindex = subtract v v' in
if isEmpty subindex then Nothing else Just subindex)
lastValue m = M.findMax m >>= (_.value >>> lastValue)
subIndex k m = fromMaybe mempty $ M.lookup k m
instance setIndex :: (Ord value) => Index (S.Set value) Unit value Unit where
isEmpty = S.isEmpty
union = S.union
subtract = flip S.difference
lastValue s = Nothing -- todo: S.findMax
subIndex _ _ = unit
index f = foldl (\acc v -> union (f v) acc) mempty
However, the Purescript compiler doesn't like that:
Compiling Concelo.Index
Error found:
in module Concelo.Index
at /home/dicej/p/pssync/src/Concelo/Index.purs line 24, column 1 - line 44, column 49
No type class instance was found for
Concelo.Index.Index subindex0
t1
t2
t3
The instance head contains unknown type variables. Consider adding a type annotation.
in value declaration mapIndex
where subindex0 is a rigid type variable
t1 is an unknown type
t2 is an unknown type
t3 is an unknown type
See https://github.com/purescript/purescript/wiki/Error-Code-NoInstanceFound for more information,
or to contribute content related to this error.
My understanding of this message is that I haven't properly stated that map values in the mapIndex instance are themselves Index instances, but I don't know how to fix that. Where might I add a type annotation to make this compile? Or am I even on the right track given what I'm trying to do?
This is almost certainly because PureScript currently lacks functional dependencies (or type families) which makes this kind of information un-inferrable. There's a writeup of the issue here: https://github.com/purescript/purescript/issues/1580 - it is something we want to support.
There was a discussion about a case very similar to this today as it happens: https://github.com/purescript/purescript/issues/2235
Essentially, the problem here is that the functions of the class do not use all of the type variables, which means there's no way to propagate the information to the constraint for looking up a suitable instance.
I don't really have a suggestion for how to do what you're after here with things as they are, aside from avoiding the class and implementing it with specific types in mind.

Erlang Dynamic Record Editing

I'm storing some data in mnesia, and I'd like to be able to change most of the values involved.
The naive
change(RecordId, Slot, NewValue) ->
[Rec] = do(qlc:q([X || X <- mnesia:table(rec), X#rec.id =:= RecordId])),
NewRec = Rec#rec{Slot=NewValue},
F = fun() -> mnesia:write(NewRec) end,
{atomic, Val} = mnesia:transaction(F),
Val.
doesn't do it; the compiler complains that Slot is not an atom or _. Is there a way to express a general slot editing function as above, or am I going to be stuck defining a whole bunch of change_slots?
A marginally better approach is to pull out the insert and find pieces
atomic_insert(Rec) ->
F = fun() -> mnesia:write(Rec) end,
{atomic, Val} = mnesia:transaction(F),
Val.
find(RecordId) ->
[Rec] = do(qlc:q([X || X <- mnesia:table(rec), X#rec.id =:= RecordId])),
Rec.
change(RecordId, name, NewValue) ->
Rec = find(RecordId),
NewRec = Rec#rec{name=NewValue},
atomic_insert(NewRec);
change(RecordId, some_other_property, NewValue) ->
Rec = find(RecordId),
NewRec = Rec#rec{some_other_property=NewValue},
...
but there's still a bit of code duplication there. Is there any way to abstract that pattern out? Is there an established technique to allow records to be edited? Any ideas in general?
Since records are represented by tuples, you could try using tuple operations to set individual values.
-module(rec).
-export([field_num/1, make_rec/0, set_field/3]).
-record(rec, {slot1, slot2, slot3}).
make_rec() ->
#rec{slot1=1, slot2=2, slot3=3}.
field_num(Field) ->
Fields = record_info(fields, rec),
DifField = fun (FieldName) -> Field /= FieldName end,
case length(lists:takewhile(DifField, Fields)) of
Length when Length =:= length(Fields) ->
{error, not_found};
Length ->
Length + 2
end.
set_field(Field, Value, Record) ->
setelement(field_num(Field), Record, Value).
set_field will return an updated record:
Eshell V5.9.1 (abort with ^G)
1> c(rec).
{ok,rec}
2> A = rec:make_rec().
{rec,1,2,3}
3> B = rec:set_field(slot3, other_value, A).
{rec,1,2,other_value}
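The offset of 2 in field_num can look arbitrary, so here is a small Python sketch of the same position arithmetic. It is an analogy only, not Erlang and not part of the original answer: a plain Python tuple stands in for the record, FIELDS stands in for record_info(fields, rec), and the helper names simply mirror the Erlang ones.

```python
# Stand-in for record_info(fields, rec).
FIELDS = ("slot1", "slot2", "slot3")


def field_num(field):
    """Return the 1-based tuple position of a record field, Erlang style.

    Position 1 holds the record tag (the atom rec), so the first field sits
    at position 2. Hence the 0-based index into FIELDS is shifted by 2.
    """
    return FIELDS.index(field) + 2


def set_field(field, value, record):
    """Return a copy of record with one field replaced, like setelement/3."""
    pos = field_num(field)  # 1-based position
    return record[:pos - 1] + (value,) + record[pos:]


rec = ("rec", 1, 2, 3)  # mirrors {rec,1,2,3}
updated = set_field("slot3", "other_value", rec)
print(updated)
```

One difference worth noting: FIELDS.index raises ValueError for an unknown field, whereas the Erlang field_num returns {error, not_found}, which set_field then passes straight to setelement.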
You can also define change as a macro (especially if it is used only inside the module):
-define(change(RecordId, Slot, NewValue),
begin
[Rec] = do(qlc:q([X || X <- mnesia:table(rec), X#rec.id =:= RecordId])),
NewRec = Rec#rec{Slot=NewValue},
F = fun() -> mnesia:write(NewRec) end,
{atomic, Val} = mnesia:transaction(F),
Val
end).
Usage:
test(R, Id) ->
?change(Id, name, 5).
With a macro you can also pass _ as a field (good for pattern matching).
Another way to exploit the fact that a record is really a tuple would be:
change(RecordId, Index, NewValue) ->
[Rec] = do(qlc:q([X || X <- mnesia:table(rec), X#rec.id =:= RecordId])),
NewRec = setelement(Index, Rec, NewValue),
F = fun() -> mnesia:write(NewRec) end,
{atomic, Val} = mnesia:transaction(F),
Val.
which you could use like this:
5> Val = record:change(id58, #rec.name, new_value).
This is also a "clean" use of records as tuples as you are using the #rec.name syntax to find the index of the field in the tuple. It was the reason this syntax was added.