How to create a variable length RowParser in Scala for Anorm?

A typical parser in Anorm looks a bit like this:
val idSeqParser: RowParser[IDAndSequence] = {
  long("ID") ~
  int("sequence") map {
    case id ~ sequence => IDAndSequence(id, sequence)
  }
}
Assuming of course that you had a case class like so:
case class IDAndSequence(id: Long, sequence: Int = 0)
All handy-dandy when you know the shape up front, but what if you want to run ad-hoc queries (raw SQL) composed at run time? (Hint: an on-the-fly reporting engine.)
How does one tackle that problem?
Can you create a series of generic parsers for various numbers of fields? (I see Scala itself had to resort to this when processing tuples on Forms, meaning you can only go to 22 elements in a form, and I'm unsure what the heck you do after that...)
You can assume that "everything is a string" for the purpose of reporting so Option[String] should cut it.
Can a parser be created on the fly, though? If so, what would that look like?
Is there a more elegant way to address this "problem"?
EDIT (to help clarify what I'm after)
Since I could "ask" using aliases
Select f1 as 'a', f2 as 'b', f3 as 'c' from sometable
Then I could collect that with a pre-written parser like so
val genericParser: RowParser[GenericCase] = {
  get[Option[String]]("a") ~
  get[Option[String]]("b") ~
  get[Option[String]]("c") map {
    case a ~ b ~ c => GenericCase(a, b, c)
  }
}
However, that means I would need to de-alias the columns for the actual report output. The suggestion of SqlParser.flatten already puts me ahead there, as it handles up to 22 columns (there's that "literal" kludge again!).
As I've written reports with more than 22 columns in times past -- mostly as inputs to spreadsheets for further manual data mining -- I would like to escape that limit if possible. It's hard to tell a client that the urgent 27-column report will take 5 days, but the 21-column one can be had in 5 minutes...
Going to try an experiment today to see if I can't find my own workable solution.
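One direction such an experiment could take: RowParser composes, so a parser for an arbitrary column list can be folded together at run time. A minimal, untested sketch, assuming every column can be read as Option[String] (genericParser is an illustrative name of my own, not part of Anorm):

import anorm._
import anorm.SqlParser.get

// Fold the column names into a single RowParser that collects every
// value as an Option[String]; no 22-element tuple limit applies.
def genericParser(columns: Seq[String]): RowParser[List[Option[String]]] =
  columns.foldLeft(RowParser(_ => Success(List.empty[Option[String]]))) {
    (acc, col) =>
      (acc ~ get[Option[String]](col)) map { case xs ~ x => xs :+ x }
  }

// Usage against an ad-hoc query:
// SQL(rawSql).as(genericParser(columnNames).*)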

Related

Referencing nested arrays in awk

I'm creating a bunch of mappings that can be indexed into using 3 keys such as below:
mappings["foo"]["bar"]["blah"][1]=0
split( "10,13,19,49", mappings["foo"]["bar"]["blah"] )
I can then index into the nested array using for example
mappings[product][format][version][i]
But this is a bit long-winded when I need to refer to the same nested array several times, so in other languages I'd create a reference to the inner array:
map=mappings[product][format][version]
map[i]
However, I can't seem to get this to work in awk (gawk 4.1.3).
I can only find one link via Google, which suggests this was impossible in previous versions of awk, and that a loop setting the keys and values one by one was the only solution. Is this still the case, or does anyone have a suggestion for a better solution?
https://developer.apple.com/library/archive/documentation/OpenSource/Conceptual/ShellScripting/Howawk-ward/Howawk-ward.html
EDIT
In response to comments a bit more background on what I'm trying to do. If there is a better approach, I'm all for using it!
I have a set of CSV files that I'm feeding into awk. The idea is to calculate a checksum based on specific columns, after applying filtering to the rows.
The columns to checksum on, and the filtering to apply, are derived from runtime parameters sent into the script.
The runtime parameters are a triple of (product, format, version), hence my use of a 3-level nested associative array.
Another approach would be to use the triple as a single key, rather than nesting, but gawk doesn't seem to natively support this, so I'd end up concatenating the values as a string. This felt a bit less structured to me, but if I'm wrong, I'm happy to change my mind on this approach.
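(For reference, the single-key form does exist in awk: a comma inside the subscript concatenates the keys with SUBSEP, which is exactly the string concatenation just described. A minimal illustration, with cols and parts as made-up names:

# The comma joins the three keys with SUBSEP behind the scenes.
cols["foo", "bar", "blah"] = "10,13,19,49"
if (("foo", "bar", "blah") in cols)
    n = split(cols["foo", "bar", "blah"], parts, ","))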
Anyway, it is these parameters that are used to index into the array structure to retrieve the column numbers, etc.
You can then build up a tree-like structure; for example, the below shows 2 formats for product foo on version blah, and so on:
mappings["product-foo"]["format-bar"]["version-blah"][1]=0
split( "10,13,19,49", mappings["product-foo"]["format-bar"]["version-blah"] )
mappings["product-foo"]["format-moo"]["version-blah"][1]=0
split( "55,23,14,6", mappings["product-foo"]["format-moo"]["version-blah"] )
The magic happens like this; you can see how long-winded the mappings indexing becomes without referencing:
(FNR > 1 && (format != "some-format" ||
    (version == "some-version" && $1 == "some-filter") ||
    (version == "some-other-version" && $8 == "some-other-filter"))) {
    # Loop over each supplied field, summing an absolute tally for each
    for (i = 1; i <= length(mappings[product][format][version]); i++) {
        sumarr[i] += ($mappings[product][format][version][i] < 0 ? -$mappings[product][format][version][i] : $mappings[product][format][version][i])
    }
}
The comment from @ed-morton simplifies this as originally requested, but I'm interested if there is a simpler approach.
The right answer is from @ed-morton above (thanks!).
Ed - if you write it out as an answer I'll accept it; otherwise I'll accept this quote in a few days for good housekeeping.
Right, there is no array-copy functionality in awk and there are no pointers/references, so you can't create a pointer to an array. You can of course create function map(i) { return mappings[product][format][version][i] }
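Fleshed out, that suggestion shortens the summing rule to something like this (a sketch with the filter conditions omitted for brevity; len() is an extra helper of my own, and gawk is still required for the nested arrays):

# Wrap the deep index in helpers so the rule body stays short.
function map(i) { return mappings[product][format][version][i] }
function len()  { return length(mappings[product][format][version]) }

FNR > 1 {
    for (i = 1; i <= len(); i++) {
        v = $(map(i))                  # value of the mapped column
        sumarr[i] += (v < 0 ? -v : v)  # absolute tally
    }
}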

AS400 RPGLE/free dynamic variables in operations

I'm fairly certain after years of searching that this is not possible, but I'll ask anyway.
The question is whether it's possible to use a dynamic variable in an operation when you don't know the field name. For example, I have a data structure that contains a few hundred fields. The operator selects one of those fields, and the program needs to know what data resides in that field of the data structure it was passed. So we'll say that there are 100 fields, and field50 is what the operator chose to operate on. The program would be passed the field name (i.e. field50) in the FLDNAM variable. Written the normal way, the program would read something like this:
/free
if field50 = 'XXX';
  // do something
endif;
/end-free
The problem is that I would have to code this 100 times for every operation. For example:
/free
if fldnam = 'field1';
  // do something
elseif fldnam = 'field2';
  // do something
  ..
elseif fldnam = 'field50';
  // do something
endif;
/end-free
Is there any possible way of performing an operation on a field not yet known? (i.e. IF FLDNAM(pointer data) = 'XXX' then do something)
If the data structure is externally described and you know what file it comes from, you could use the QUSLFLD API to find out the offset, length, and type of the field in the data structure, then use a substring to get the raw data, and then use other calculations to get the value, depending on the data type.
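A rough sketch of that idea in newer free-form syntax (illustrative only and not compile-tested: bigDS, fldOffset, and fldLen stand for the passed data structure and the offset/length retrieved via QUSLFLD; numeric fields would need further conversion):

dcl-s rawValue varchar(200);

// Pull the field's bytes straight out of the data structure at the
// offset/length reported by QUSLFLD (check whether the reported
// position is 0- or 1-based and adjust the start accordingly).
rawValue = %subst(bigDS : fldOffset : fldLen);

if rawValue = 'XXX';
  // do something
endif;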
Simple answer, no.
RPG's simply not designed for that. Few languages are.
You may want to look at scripting languages. Perl, for instance, can evaluate code on the fly. REXX, which comes installed on the IBM i, has an INTERPRET keyword.
REXX Reference manual
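For a flavour of what INTERPRET buys you (a sketch; the variable names are made up):

/* Build and execute a statement from a field name held in FLDNAM */
fldnam = 'FIELD50'
interpret 'value = ' fldnam   /* runs: value = FIELD50 */
if value = 'XXX' then
  say 'do something'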

R - find name in string that matches a lookup field using regex

I have a data frame of ad listings for pets:
ID Ad_title
1 1 year old ball python
2 Young red Blood python. - For Sale
3 1 Year Old Male Bearded Dragon - For Sale
I would like to take the common name in the Ad_title (e.g. ball python) and create a new field with the Latin name for the species. To assist, I have another data frame that has the Latin names and common names:
ID Latin_name Common_name
1 Python regius E: Ball Python, Royal Python G: Königspython
2 Python brongersmai E: Red Blood Python, Malaysian Blood Python
3 Pogona barbata E: Eastern Bearded Dragon, Bearded Dragon
How can I go about doing this? The tricky part is that the common names are hidden in between text, both in the ad listing and in the Common_name. If that were not the case, I could just use %in%. If there is a way/function to use regex, I think that would be helpful.
The other answer does a good job outlining the general logic, so here's a few thoughts on a simple (though not optimized!!) way to do this:
First, you'll want to make a big table: two columns of all "common names" (each name gets its own row) alongside its Latin name. You could also make a dictionary here, but I like tables.
reference_table <- data.frame(common = c("cat", "kitty", "dog"), technical = c("feline", "feline", "canine"))
  common technical
1    cat    feline
2  kitty    feline
3    dog    canine
From here, just loop through every element of "ad_title" (use apply() or a for loop, depending on your preference). Now use something like this:
apply(reference_table, 1, function(X) {
  if (length(grep(X["common"], ad_title)) > 0) {  # the common name was found in the ad_title
    # [code to replace the string]
  }
})
For inserting the new string, play with your regular regex tools. Alternatively, play with strsplit(ad_title, X["common"]). You'll be able to rebuild the ad_title using paste() and the parts that come out of the strsplit.
Again, this is NOT the best way to do this, but hopefully the logic is simple.
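As one hedged possibility for the "[code to replace the string]" placeholder above, a sub()-based sketch (mutating ad_title via <<- is blunt, but it keeps the example short):

apply(reference_table, 1, function(X) {
  # Swap the first occurrence of the common name for its technical name.
  ad_title <<- sub(X["common"], X["technical"], ad_title)
})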
Well, I tried to create a workable solution for your requirement. There could be better ways to execute it, though, probably using packages such as data.table and/or stringr. Anyway, this snippet could be a working starting point. Oh, and I modified the Ad_title data a bit so that the species names are in title case.
# Re-create data
Ad_title <- c("1 year old Ball Python", "Young Red Blood Python. - For Sale",
              "1 Year Old Male Bearded Dragon - For Sale")
df2 <- data.frame(Latin_name = c("Python regius", "Python brongersmai", "Pogona barbata"),
                  Common_name = c("E: Ball Python, Royal Python G: Königspython",
                                  "E: Red Blood Python, Malaysian Blood Python",
                                  "E: Eastern Bearded Dragon, Bearded Dragon"),
                  stringsAsFactors = F)
# Aggregate common names
Common_name <- paste(df2$Common_name, collapse = ", ")
Common_name <- unlist(strsplit(Common_name, "(E: )|( G: )|(, )"))
Common_name <- Common_name[Common_name != ""]
# Data frame: Latin names vs common names
df3 <- data.frame(Common_name, Latin_name = sapply(Common_name, grep, df2$Common_name),
                  row.names = NULL, stringsAsFactors = F)
df3$Latin_name <- df2$Latin_name[df3$Latin_name]
# Data frame: Ad vs common names
Ad_Common_name <- unlist(sapply(Common_name, grep, Ad_title))
df4 <- data.frame(Ad_title, Common_name = sapply(1:3, function(i) names(Ad_Common_name[Ad_Common_name == i])),
                  stringsAsFactors = F)
Obviously you need a loop structure over your common-name lookup table, and another loop that splits the compound field on commas, before doing a simple regex. There's no sane regex that will do it all.
In future, avoid using packed/compound structures that require packing and unpacking. It looks fine for human consumption, but semantically, and for program consumption, you have multiple data values packed into a single field. That is, it's not a "common name", it's "common names" delimited by commas, that you have there.
Sorry if I haven't provided an R-specific answer; I'm a technology veteran and use many languages/technologies depending on the problem and available resources. You will need to iterate over every record of your Latin-names lookup table, and within that, iterate over the comma-delimited packed field of "common names", so that you're working with one common name at a time. With that single common name you search/replace, using regex or whatever means is available to you, over the whole input file. Plainly, you need to start at it from that end, i.e. the lookup table: iterate/loop through that. Iteration/looping should be familiar to you, as it's a basic building block of any program/script; this kind of procedural logic is not part of the capability (or desired functionality) of regex itself.
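Translated into R terms, that procedure might look like this (a sketch reusing df2 and Ad_title from the answer above; untested):

# Outer loop: every record of the Latin-name lookup table.
latin <- rep(NA_character_, length(Ad_title))
for (r in seq_len(nrow(df2))) {
  # Inner loop: unpack the comma-delimited "common names" field.
  nms <- unlist(strsplit(df2$Common_name[r], "(E: )|( G: )|(, )"))
  for (nm in nms[nms != ""]) {
    # One common name at a time, matched over the whole input.
    latin[grep(nm, Ad_title, ignore.case = TRUE)] <- df2$Latin_name[r]
  }
}
data.frame(Ad_title, Latin_name = latin)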

Esper - pattern detection

I have a question for the community regarding pattern detection with Esper.
Suppose you want to detect the following pattern among a collection of data: A B C
However, it is possible that in the actual data you might have: A, B, D, E, C. My goal is to design a rule that could still detect A B C by keeping A and B in memory, and fire the alert as soon as it sees C.
Is it possible to do this? With the standard select * from pattern [a=event -> b=event -> c=event], it only outputs when the three are in sequence in the data, but not when there is other useless data between them.
With the standard "select * from pattern [a=A -> b=B]" there can be any events between A and B, so your statement is wrong. I think you are confused about how to remove useless data. Use a filter such as "a=event(...not useless...) -> b=event(...not useless...)". Within the parens, place the filter expressions that distinguish between useless and not-useless events, e.g. "a=event(amount>10)" or whatever.
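Put together for the A B C case, a sketch in EPL (MyEvent and its kind property are made up for illustration):

// Matches A, then B, then C, in order; non-matching events arriving
// in between (D, E, ...) are simply ignored by the filters.
select * from pattern [
  a=MyEvent(kind='A') -> b=MyEvent(kind='B') -> c=MyEvent(kind='C')
]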

Constructing a recursive compare with SQL

This is an ugly one. I wish I wasn't having to ask this question, but the project is already built such that we are handling heavy loads of validations in the database. Essentially, I'm trying to build a function that will take two stacks of data, weave them together with an unknown batch of operations or comparators, and produce a long string.
Yes, that was phrased very poorly, so I'm going to give an example. I have a form that can have multiple iterations of itself. For some reason, the system wants to know if the entered start date on any of these forms is equal to the entered end date on any of these forms. Unfortunately, due to the way the system is designed, everything is stored as a string, so I have to format it as a date first, before I can compare. Below is pseudo code, so please don't correct me on my syntax
Input data:
logFormValidation("to_date(#) == to_date(^)"
                  , formname.control1name, formname.control2name)
Now, as I mentioned, there are multiple iterations of this form, and I need to loop over them to build a fully recursive comparison. (Note: it may not always be a typical boolean comparison; it could be internally-called functions as well, so .In or anything like that won't work.) In the end, I need to get it into a format like below so the validation parser can read it.
OR(to_date(formname.control1name.1) == to_date(formname.control2name.1)
,to_date(formname.control1name.2) == to_date(formname.control2name.1)
,to_date(formname.control1name.3) == to_date(formname.control2name.1)
,to_date(formname.control1name.1) == to_date(formname.control2name.2)
:
:
,to_date(formname.control1name.n) == to_date(formname.control2name.n))
Yeah, it's ugly... but given the way our validation parser works, I don't have much of a choice. Any input on how this might be accomplished? I'm hoping for something more efficient than a double recursive loop, but I don't have any ideas beyond that.
Okay, seeing as my question is apparently terribly unclear, I'm going to add some more info. I don't know what comparison I will be performing on the items; I'm just trying to reformat the data into something usable for ANY given function. If I were to do this outside the database, it'd look something like this. Note: pseudocode. '#' is the place marker in a function for vals1, '^' is the place marker for vals2.
function dynamicRecursiveValidation(string functionStr, strArray vals1, strArray vals2){
  string finalFunction = "OR(";
  foreach(i in vals1){
    foreach(j in vals2){
      finalFunction += functionStr.replace('#', i).replace('^', j) + ",";
    }
  }
  finalFunction = finalFunction.substring(0, finalFunction.length - 1); // remove the trailing comma
  finalFunction += ")";
  return finalFunction;
}
That is all I'm trying to accomplish: take any given comparator and two arrays, and create a string that contains every possible combination. Given the substitution characters I listed above, below is a list of possible added operations:
# > ^
to_date(#) == to_date(^)
someFunction(#, ^)
# * 2 - 3 <= ^ / 4
All I'm trying to do is produce the string that I will later execute, and I'm trying to do it without having to kill the server in a recursive loop.
I don't have solution code for this, but algorithmically you can do the following:
Create a temp table (start_date, end_date, formid) and populate it with every date from any existing form
Get the start_date from the form and simply:
SELECT end_date, form_id FROM temp_table WHERE end_date = <start date to check>
For the reverse
SELECT start_date, form_id FROM temp_table WHERE start_date = <end date to check>
If the database is available, why not let it do all the heavy lifting?
I ended up performing a cross product of the data and looping through the results. It wasn't the sort of solution I really wanted, but it worked.
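For the record, the cross-product step itself can be pushed into SQL. A hedged sketch (form_values, control_name, and control_ref are illustrative names; || is Oracle-style concatenation, in keeping with to_date):

-- One row per (start, end) pairing, each rendered as a comparison
-- ready to be strung together into the final OR(...) expression.
SELECT 'to_date(' || s.control_ref || ') == to_date(' || e.control_ref || ')' AS comparison
FROM   form_values s
CROSS JOIN form_values e
WHERE  s.control_name = 'formname.control1name'
  AND  e.control_name = 'formname.control2name';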