jq reducing stream to an array of all leaf values using input - input

I want to receive streamed json inputs and reduce them to an array containing the leaf values.
Demo: https://jqplay.org/s/cZxLguJFxv
Please consider this filter:
try reduce range(30) as $i ( []; (.+[until(length==2;input)[1]] // error(.)) )
catch empty
input:
[[0,0,"a"],null]
[[0,0,"a"]]
[[0,1,"b"],null]
[[0,1,"b"]]
[[0,1]]
[[1],0]
...
output:
empty
I expect the output [null, null, 0, ...], but I get empty instead.
I told reduce to iterate 30 times, even though there are fewer inputs than that. I expect it to skip the inputs whose length is not 2 and produce an array containing all the leaf values.
I'm not sure how this behaves when no inputs of length 2 remain but reduce still has iterations left.
I want to know why my filter returns empty. What am I doing wrong? Thanks!

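Your filter returns empty because there are fewer inputs than iterations, so input eventually raises an error when the inputs run out. That error escapes the reduce, and try ... catch empty turns the whole expression, including the array built so far, into empty. Note also that without -n, jq binds the first input to . and input starts reading at the second one, so the first leaf would be skipped anyway. These filters should do what you want: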
jq -n 'reduce inputs as $in ([]; if $in | has(1) then . + [$in[1]] else . end)'
jq -n '[inputs | select(has(1))[1]]'
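To make the selection rule concrete, here is a minimal Python sketch of what [inputs | select(has(1))[1]] does to the stream events. leaf_values is a hypothetical name, and the events are the first six from the question with JSON null written as None. A leaf event has two elements ([path, value]) and a closing event has one ([path]), so keeping the second element whenever it exists yields exactly the leaves.

```python
def leaf_values(events):
    """Collect leaf values from jq --stream events.

    A leaf event is [path, value]; a closing event is just [path].
    Keeping element 1 of every event that has one mirrors
    jq's [inputs | select(has(1))[1]].
    """
    return [event[1] for event in events if len(event) > 1]


# The first six stream events from the question (JSON null -> None).
events = [
    [[0, 0, "a"], None],
    [[0, 0, "a"]],
    [[0, 1, "b"], None],
    [[0, 1, "b"]],
    [[0, 1]],
    [[1], 0],
]
print(leaf_values(events))  # [None, None, 0]
```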

Related

Earley algorithm gone wrong

I am trying to implement Earley's algorithm for parsing a grammar, but I must have done something wrong: after the first chart, it doesn't go through the rest of the input string. My test grammar is the following:
S -> aXbX | bXaX
X -> aXbX | bXaX | epsilon
S and X are non-terminals; a and b are terminals.
The string I want to check if it is accepted or not by the grammar is: 'abba'.
Here is my code:
rules = {
    "S": [
        ['aXbX'],
        ['bXaX'],
    ],
    "X" : [
        ['aXbX'],
        ['bXaX'],
        ['']
    ]
}
def predictor(rule, state):
    if rule["right"][rule["dot"]].isupper(): # NON-TERMINAL
        return [{
            "left": rule["right"][rule["dot"]],
            "right": right,
            "dot": 0,
            "op": "PREDICTOR",
            "completor": []
        } for right in rules[rule["right"][rule["dot"]]]]
    else:
        return []
def scanner(rule, next_input):
    # TERMINAL
    if rule["right"][rule["dot"]].islower() and next_input in rules[rule["right"][rule["dot"]]]:
        print('scanner')
        return [{
            "left": rule["right"][rule["dot"]],
            "right": [next_input],
            "dot": 1,
            "op": "SCANNER",
            "completor": []
        }]
    else:
        return []
def completor(rule, charts):
    if rule["dot"] == len(rule["right"]):
        print('completor')
        return list(map(
            lambda filter_rule: {
                "left": filter_rule["left"],
                "right": filter_rule["right"],
                "dot": filter_rule["dot"] + 1,
                "op": "COMPLETOR",
                "completor": [rule] + filter_rule["completor"]
            },
            filter(
                lambda p_rule: p_rule["dot"] < len(p_rule["right"]) and rule["left"] == p_rule["right"][p_rule["dot"]],
                charts[rule["state"]]
            )
        ))
    else:
        return []
input_string = 'abba'
input_arr = [char for char in input_string] + ['']
charts = [[{
    "left": "S'",
    "right": ["S"],
    "dot": 0,
    "op": "-",
    "completor": []
}]]
for curr_state in range(len(input_arr)):
    curr_chart = charts[curr_state]
    next_chart = []
    for curr_rule in curr_chart:
        if curr_rule["dot"] < len(curr_rule["right"]): # not finished
            curr_chart += [i for i in predictor(curr_rule, curr_state) if i not in curr_chart]
            next_chart += [i for i in scanner(curr_rule, input_arr[curr_state]) if i not in next_chart]
        else:
            print('else')
            curr_chart += [i for i in completor(curr_rule, charts) if i not in curr_chart]
    charts.append(next_chart)
def print_charts(charts, inp):
    for chart_no, chart in zip(range(len(charts)), charts):
        print("\t{}".format("S" + str(chart_no)))
        print("\t\n".join(map(
            lambda x: "\t{} --> {}, {} {}".format(
                x["left"],
                "".join(x["right"][:x["dot"]] + ["."] + x["right"][x["dot"]:]),
                str(chart_no) + ',',
                x["op"]
            ),
            chart
        )))
        print()
print_charts(charts[:-1], input_arr)
And this is the output I get (for states 1 to 4 I should get 5 to 9 entries):
S0
S' --> .S, 0, -
S --> .aXbX, 0, PREDICTOR
S --> .bXaX, 0, PREDICTOR
S1
S2
S3
S4

Paste multi line code in elm-repl

I'm just trying to evaluate some expressions in elm-repl, but I don't know how to paste multi-line code into it.
Something like:
List.map
    (\l ->
        li []
            [ span [ class "position filled" ]
                []
            ]
    )
    [ 1, 2, 3 ]
You can span multiple lines in Elm REPL by ending each line with a backslash (\) character:
List.map \
    (\l -> \
        li [] \
            [ span [ class "position filled" ] \
                [] \
            ] \
    ) \
    [ 1, 2, 3 ]

Inverse of `split` function: `join` a string using a delimiter

In Red and Rebol 3, you can use the split function to split a string into a list of items:
>> items: split {1, 2, 3, 4} {,}
== ["1" " 2" " 3" " 4"]
What is the corresponding inverse function to join a list of items into a string? It should work similarly to the following:
>> join items {, }
== "1, 2, 3, 4"
There's no built-in function yet; you have to implement it yourself:
>> join: function [series delimiter][length: either char? delimiter [1][length? delimiter] out: collect/into [foreach value series [keep rejoin [value delimiter]]] copy {} remove/part skip tail out negate length length out]
== func [series delimiter /local length out value][length: either char? delimiter [1] [length? delimiter] out: collect/into [foreach value series [keep rejoin [value delimiter]]] copy "" remove/part skip tail out negate length length out]
>> join [1 2 3] #","
== "1,2,3"
>> join [1 2 3] {, }
== "1, 2, 3"
Per request, here is the function split over several lines:
join: function [
    series
    delimiter
][
    length: either char? delimiter [1][length? delimiter]
    out: collect/into [
        foreach value series [keep rejoin [value delimiter]]
    ] copy {}
    remove/part skip tail out negate length length
    out
]
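The approach in the function above (append each value followed by the delimiter, then remove the trailing delimiter) can be sketched in Python for comparison. join_with is a hypothetical name, series is assumed to be a list, and in real Python code the built-in str.join does the same job directly.

```python
def join_with(series, delimiter):
    """Append each value followed by the delimiter, then drop the trailing delimiter."""
    out = "".join(f"{value}{delimiter}" for value in series)
    if series:
        # Slice by length rather than out[:-len(delimiter)], which breaks for an empty delimiter.
        out = out[: len(out) - len(delimiter)]
    return out


print(join_with([1, 2, 3], ", "))  # 1, 2, 3
```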
There is an old modification of rejoin that does this:
rejoin: func [
    "Reduces and joins a block of values - allows /with refinement."
    block [block!] "Values to reduce and join"
    /with join-thing "Value to place in between each element"
][
    block: reduce block
    if with [
        while [not tail? block: next block][
            insert block join-thing
            block: next block
        ]
        block: head block
    ]
    append either series? first block [
        copy first block
    ] [
        form first block
    ]
    next block
]
Call it like this: rejoin/with [..] delimiter
But I am pretty sure, there are other, even older solutions.
The following function works:
myjoin: function[blk[block!] delim [string!]][
    outstr: copy ""  ; copy, so each call starts from a fresh string instead of the shared literal
    repeat i ((length? blk) - 1)[
        append outstr blk/1
        append outstr delim
        blk: next blk
    ]
    append outstr blk
]
probe myjoin ["A" "B" "C" "D" "E"] ", "
Output:
"A, B, C, D, E"

How do I refer to a function argument when the same word is used in foreach?

How can I refer to the date argument of f inside the foreach loop if date is also used as a block element variable? Do I have to rename my date variable?
f: func[data [block!] date [date!]][
    foreach [date o h l c v] data [
    ]
]
A: simple, compose is your best friend.
f: func[data [block!] date [date!]][
    foreach [date str] data compose [
        print (date)
        print date
    ]
]
>> f [2010-09-01 "first of sept" 2010-10-01 "first of october"] now
7-Sep-2010/21:19:05-4:00
1-Sep-2010
7-Sep-2010/21:19:05-4:00
1-Oct-2010
You need to either change the parameter name from date or assign it to a local variable.
You can access the date argument inside the foreach loop by binding the 'date word from the function specification to the data argument:
>> f: func[data [block!] date [date!]][
    foreach [date o h l c v] data [
        print last reduce bind find first :f 'date 'data
        print date
    ]
]
>> f [1-1-10 1 2 3 4 5 2-1-10 1 2 3 4 5] 8-9-10
8-Sep-2010
1-Jan-2010
8-Sep-2010
2-Jan-2010
It makes the code very difficult to read though. I think it would be better to assign the date argument to a local variable inside the function as Graham suggested.
>> f: func [data [block!] date [date!] /local the-date ][
    the-date: :date
    foreach [date o h l c v] data [
        print the-date
        print date
    ]
]
>> f [1-1-10 1 2 3 4 5 2-1-10 1 2 3 4 5] 8-9-10
8-Sep-2010
1-Jan-2010
8-Sep-2010
2-Jan-2010
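The same clash exists in Python: a for loop target rebinds a parameter of the same name, and copying the argument to a local first, as the last answer does with the-date, keeps it reachable. This is a minimal Python analogue, not Rebol; the function f and the flat six-values-per-row layout are hypothetical mirrors of the example above.

```python
def f(data, date):
    """Pair the date argument with each row's date, even though the loop reuses the name."""
    the_date = date  # keep the argument before the loop rebinds `date`
    pairs = []
    # Walk the flat list six values at a time: date o h l c v.
    for i in range(0, len(data), 6):
        date, o, h, l, c, v = data[i:i + 6]
        pairs.append((the_date, date))
    return pairs
```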

I need to generate a 50-million-row CSV file with random data: how can I optimize this program?

The program below generates random data according to some specs (the example here has 2 columns).
It works for a few hundred thousand lines on my PC (the limit presumably depends on RAM), but I need to scale to tens of millions of rows.
How can I optimize the program to write directly to disk? Secondarily, how can I "cache" the execution of the parse rule, since it is the same pattern repeated 50 million times?
Note: to use the program below, run generate-blocks and then save-blocks; the output is written to db.txt.
Rebol[]
specs: [
    [3 digits 4 digits 4 letters]
    [2 letters 2 digits]
]
;====================================================================================================================
digits: charset "0123456789"
letters: charset "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
separator: charset ";"
block-letters: [A B C D E F G H I J K L M N O P Q R S T U V W X Y Z]
blocks: copy []
generate-row: func[][
    Foreach spec specs [
        rule: [
            any [
                [
                    set times integer! [
                        ['digits (
                            repeat n times [
                                block: rejoin [block random 9]
                            ]
                        )
                        | 'letters (
                            repeat n times [
                                block: rejoin [block to-string pick block-letters random 24]
                            ]
                        )]
                        | [
                            'letters (
                                repeat n times [
                                    block: rejoin [block to-string pick block-letters random 24]
                                ]
                            )
                            | 'digits (
                                repeat n times [block: rejoin [block random 9]]
                            )
                        ]
                    ]
                    | {"} any separator {"}
                ]
            ]
            to end
        ]
        block: copy ""
        parse spec rule
        append blocks block
    ]
]
generate-blocks: func[m][
    repeat num m [
        generate-row
    ]
]
quote: func[string][
    rejoin [{"} string {"}]
]
save-blocks: func[file][
    if exists? to-rebol-file file [
        answer: ask rejoin ["delete " file "? (Y/N): "]
        if (answer = "Y") [
            delete %db.txt
        ]
    ]
    foreach [field1 field2] blocks [
        write/lines/append %db.txt rejoin [quote field1 ";" quote field2]
    ]
]
Use open with the /direct and /lines refinements to write directly to the file without buffering the content:
file: open/direct/lines/write %myfile.txt
loop 1000 [
    t: random "abcdefghi"
    append file t
]
close file
This will write 1000 random lines without buffering.
You can also prepare a block of lines (let's say 10000 rows), then write it directly to the file; this is faster than writing line by line.
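file: open/direct/lines/write %myfile.txt
loop 100 [
    b: copy []
    ; copy: random shuffles its series in place, so without it every entry would be the same string
    loop 1000 [append b random copy "abcdef"]
    append file b
]
close file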
This is much faster: 100000 rows in less than a second.
Hope this helps.
Note that you can change the numbers 100 and 1000 according to your needs and your PC's memory, and use b: make block! 1000 instead of b: copy [], which will be faster.
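The batching idea from the answer can be sketched in Python for comparison (a swapped-in language, not Rebol). The column specs are parsed once into (count, kind) pairs up front, which is one way to avoid re-running a parse rule for every row, and rows are written in batches of batch_size lines, so memory use is bounded by the batch rather than by the total row count. SPECS, random_field and write_rows are hypothetical names; the output layout (quoted fields separated by ;) follows the question's save-blocks.

```python
import random
import string

# Hypothetical pre-parsed version of the question's specs:
# each column is a list of (count, kind) pairs.
SPECS = [
    [(3, "digits"), (4, "digits"), (4, "letters")],
    [(2, "letters"), (2, "digits")],
]
ALPHABETS = {"digits": string.digits, "letters": string.ascii_uppercase}


def random_field(spec, rng):
    # Build one field by drawing `count` characters from each chosen alphabet.
    return "".join(
        "".join(rng.choice(ALPHABETS[kind]) for _ in range(count))
        for count, kind in spec
    )


def write_rows(path, total_rows, batch_size=10_000, seed=None):
    """Stream total_rows rows to path, one batch of lines at a time."""
    rng = random.Random(seed)
    with open(path, "w", encoding="ascii", newline="") as out:
        remaining = total_rows
        while remaining > 0:
            n = min(batch_size, remaining)
            batch = [
                ";".join(f'"{random_field(spec, rng)}"' for spec in SPECS) + "\n"
                for _ in range(n)
            ]
            out.writelines(batch)
            remaining -= n
```

A call such as write_rows("db.txt", 50_000_000) would keep at most batch_size rows in memory at any time; tune batch_size to your machine, as the answer suggests for its block size.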