SQL Array aggregation in Haskell + Squeal

I'm using an SQL library called Squeal in Haskell.
What's the correct way to aggregate multiple text rows into one array with the Squeal library?
Say I have a very simple schema with just one table containing a column 'keyword' (of PG text type), plus the associated types:
import Squeal.PostgreSQL
import Data.Text (Text)
import qualified GHC.Generics as GHC
import qualified Generics.SOP as SOP

type Constraints = '["pk_keywords" ::: 'PrimaryKey '["id"]]
type Columns =
  '[ "id" ::: 'Def :=> 'NotNull 'PGint8
   , "keyword" ::: 'NoDef :=> 'NotNull 'PGtext
   ]
type Table = 'Table (Constraints :=> Columns)
type Schema = '["keywords" ::: Table]
type Schemas = '["public" ::: Schema]

newtype Keywords = Keywords {unKeywords :: [Text]} deriving (GHC.Generic)
instance SOP.Generic Keywords
instance SOP.HasDatatypeInfo Keywords
type instance PG Keywords = 'PGvararray ('NotNull 'PGtext)
This is the part I need help with:
I'm trying an aggregation query like this:
keywords :: Query_ Schemas () Keywords
keywords =
  select_ ((arrayAgg (All #keyword)) `as` #fromOnly) (from (table #keywords))
However, I keep getting an error:
* Couldn't match type 'NotNull (PG [Text])
with 'Null ('PGvararray ty0)
arising from a use of `as'
From what I understand, arrayAgg can produce NULL, so I need to provide a default empty array [] somehow with fromNull from here:
https://hackage.haskell.org/package/squeal-postgresql-0.5.1.0/docs/Squeal-PostgreSQL-Expression-Null.html#v:fromNull
But I don't quite know how to provide that.
What about the value type mismatch (PG [Text] vs 'PGvararray ty0)? How do I solve that?

For the record, the library's author provided a solution as follows:
keywords :: Query_ Schemas () (Only (VarArray [Text]))
keywords = select_
  (fromNull (array [] & inferredtype) (arrayAgg (All #keyword)) `as` #fromOnly)
  (from (table #keywords) & groupBy Nil)
The key factors here are:
Provide a default empty array with fromNull (array [] & inferredtype) ...; this way we can avoid using Maybe in the return type.
Provide grouping with groupBy Nil.
Choose either Distinct or All rows in arrayAgg.
Finally, the return type should be VarArray x (see the sketch below for getting back to the original Keywords wrapper).
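If you would rather keep the original [Text]-based Keywords wrapper in application code, you can unwrap the rows after running the query. A minimal sketch, assuming fromOnly and getVarArray are the record accessors of Squeal's Only and VarArray wrappers (toKeywords is just an illustrative helper):
-- Peel off Only and VarArray per row, then rewrap in the original newtype.
toKeywords :: [Only (VarArray [Text])] -> [Keywords]
toKeywords = map (Keywords . getVarArray . fromOnly)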

Related

How to retrieve the list of dynamic nested keys of BigQuery nested records

My ELT tool imports my data into BigQuery and automatically generates/extends the schema for dynamic nested keys (in the schema below, under properties).
It looks like this
How can I get the list of nested keys of a repeated record? So that, for example, I can group by properties when those items have said property non-null?
I have tried
select column_name
from my_schema.INFORMATION_SCHEMA.COLUMNS
where table_name = 'my_table'
But it will only list first-level keys.
From the picture above, I want, as a first step, a SQL query that returns
message
user_id
seeker
liker_id
rateable_id
rateable_type
from_organization
likeable_type
company
existing_attempt
...
My real goal, though, is to group/count my data based on a non-null value of a 2nd-level nested property, properties.filters.[filter_type].
The schema may evolve when our application adds more filters, so this needs to be generated dynamically; I can't just hard-code the list of nested keys.
Note: this is very similar to this question How to extract all the keys in a JSON object with BigQuery but in my case my data is already in a schema and it's not a JSON object.
EDIT:
Suppose I have a list of such records with nested properties, how do I write a SQL query that adds a field "enabled_filters" which aggregates, for each item, the list of properties for which said property is not null?
Example input (properties.x are dynamic and not known by the programmer):
search_id | properties.filters.school | properties.filters.type
1         | MIT                       | master
2         | Princetown                | null
3         | null                      | master
Example output:
search_id | enabled_filters
1         | ["school", "type"]
2         | ["school"]
3         | ["type"]
Have you looked at COLUMN_FIELD_PATHS? It should give you the paths for all columns.
select field_path from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS where table_name = '<table>'
[https://cloud.google.com/bigquery/docs/information-schema-column-field-paths]
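To narrow that down to the second-level keys the question is after, the same view can be filtered on the path prefix. A hedged sketch reusing the asker's names (my_schema, my_table, properties.filters):
select field_path
from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
where table_name = 'my_table'
  -- keep only the nested keys under properties.filters
  and field_path like 'properties.filters.%'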
The field properties is not nested by arrays, only by structures, so a JavaScript UDF to parse this field should work fast enough.
CREATE TEMP FUNCTION jsonObjectKeys(input STRING, shownull BOOL, fullname BOOL)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  // Recursively collect the keys of the parsed object.
  // shownull: also emit keys whose value is null (marked with '==null')
  // fullname: emit the full dotted path instead of just the leaf key
  function test(input, old) {
    var out = [];
    for (let x in input) {
      let te = input[x];
      out = out.concat(
        te == null ? (shownull ? [x + '==null'] : [])
                   : typeof te == 'object' ? test(te, old + x + '.')
                                           : [fullname ? old + x : x]);
    }
    return out;
  }
  return test(JSON.parse(input), "");
""";
with tbl as (
  select struct(1 as alpha, struct(2 as x, 3 as y, [1,2,3] as z) as B) A
  from unnest(generate_array(1, 10*1))
  union all
  select struct(null, struct(null, 1, [999]))
)
select *,
  TO_JSON_STRING(A) as string_output,
  jsonObjectKeys(TO_JSON_STRING(A), true, false) as output1,
  jsonObjectKeys(TO_JSON_STRING(A), false, true) as output2,
  concat('["', array_to_string(jsonObjectKeys(TO_JSON_STRING(A), false, true), '","'), '"]') as output_string,
  jsonObjectKeys(TO_JSON_STRING(A.B), false, true) as output3
from tbl
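Applied back to the original question, the same UDF can produce the enabled_filters column directly. A hedged sketch, assuming the asker's table my_schema.my_table and the nested record properties.filters (the temp function must be created in the same script; shownull = false skips null keys, fullname = false keeps only the leaf key names):
select
  search_id,
  -- keys of properties.filters whose value is not null, e.g. ["school", "type"]
  jsonObjectKeys(TO_JSON_STRING(properties.filters), false, false) as enabled_filters
from my_schema.my_table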

How to dynamically build a select list from an API payload using PyPika

I have a JSON API payload containing tablename and columnlist. How do I build a SELECT query from it using PyPika?
So far I have been able to use a string columnlist, but I'm not able to do advanced querying using functions, analytics, etc.
from pypika import Table, Query, functions as fn

def generate_sql(tablename, collist):
    table = Table(tablename)
    columns = [str(table) + '.' + each for each in collist]
    q = Query.from_(table).select(*columns)
    return q.get_sql(quote_char=None)

tablename = 'customers'
collist = ['id', 'fname', 'fn.Sum(revenue)']
print(generate_sql(tablename, collist))  #1

table = Table(tablename)
q = Query.from_(table).select(table.id, table.fname, fn.Sum(table.revenue))
print(q.get_sql(quote_char=None))  #2
#1 outputs
SELECT "customers".id,"customers".fname,"customers".fn.Sum(revenue) FROM customers
#2 outputs correctly
SELECT id,fname,SUM(revenue) FROM customers
You should not be trying to assemble the query in a string by yourself; that defeats the whole purpose of pypika.
Since the table name and column names arrive as text in a JSON object, you can use * to unpack the values from collist, and the obj[key] syntax to look up a table attribute by name from a string.
q = Query.from_(table).select(*(table[col] for col in collist))
# SELECT id,fname,fn.Sum(revenue) FROM customers
Hmm... that doesn't quite work for the fn.Sum(revenue). The goal is to get SUM(revenue).
This can get much more complicated from this point. If you are only sending column names that you know to belong to that table, the above solution is enough.
But if you have complex SQL expressions, making reference to SQL functions or even different tables, I suggest you rethink your decision of sending that as JSON. You might end up with something as complex as pypika itself, like a custom parser or whatever. Then your better option would be to change the format of your JSON response object.
If you know you only need to support a very limited set of capabilities, it could be feasible. For example, you can assume the following constraints:
all column names refer to only one table, no joins or alias
all functions will be prefixed by fn.
no fancy stuff like window functions, distinct, count(*)...
Then you can do something like:
from pypika import Table, Query, functions as fn
import re

tablename = 'customers'
collist = ['id', 'fname', 'fn.Sum(revenue / 2)', 'revenue % fn.Count(id)']

def parsed(cols):
    # Prefix bare column names with "table.", leave fn.* calls untouched.
    pattern = r'(?:\bfn\.[a-zA-Z]\w*)|([a-zA-Z]\w*)'
    subst = lambda m: f"{'' if m.group().startswith('fn.') else 'table.'}{m.group()}"
    yield from (re.sub(pattern, subst, col) for col in cols)

table = Table(tablename)
env = dict(table=table, fn=fn)
q = Query.from_(table).select(*(eval(col, env) for col in parsed(collist)))
print(q.get_sql(quote_char=None))
Output:
SELECT id,fname,SUM(revenue/2),MOD(revenue,COUNT(id)) FROM customers

Dynamic ASSIGN of table row expression

In my ABAP report I have some structure:
data:
  begin of struct1,
    field1 type char10,
  end of struct1.
I can access its field field1 directly:
data(val) = struct1-field1.
or dynamically with assign:
assign ('struct1-field1') to field-symbol(<val>).
Also I have some internal table:
data: table1 like standard table of struct1 with default key.
append initial line to table1.
I can access column field1 of the first row directly:
data(val) = table1[ 1 ]-field1.
But I cannot access field1 with a dynamic assign:
assign ('table1[ 1 ]-field1') to field-symbol(<val>).
After assignment sy-subrc equals "4".
Why?
The syntax of ASSIGN (syntax1) ... is not the same as the syntax of the Right-Hand Side (RHS) of assignments ... = syntax2.
The syntax for ASSIGN is explained in the documentation of ASSIGN (variable_containing_name) ... or ASSIGN ('name') ... (chapter 1. (name) of page ASSIGN - dynamic_dobj).
Here is an abstract of what is accepted:
"name can contain a chain of names consisting of component selectors [(-)]"
"the first name [can be] followed by an object component selector (->)"
"the first name [can be] followed by a class component selector (=>)"
No mention of table expressions, so they are forbidden. Same for meshes...
Concerning the RHS of assignments, as described in the documentation, it can be :
Data Objects
They can be attributes or components using selectors -, ->, =>, which can be chained multiple times (see Names for Individual Operands)
Return values or results of functional methods, return values or results of built-in functions and constructor expressions, or return values or results of table expressions
Results of calculation expressions
Sandra is absolutely right: if table expressions are not specified in the help, then they are not allowed.
You can use the ASSIGN COMPONENT statement for your dynamic access:
FIELD-SYMBOLS: <tab> TYPE INDEX TABLE.
ASSIGN ('table1') TO <tab>.
ASSIGN COMPONENT 'field1' OF STRUCTURE <tab>[ 1 ] TO FIELD-SYMBOL(<val>).
However, such dynamic access is only possible with index tables (standard and sorted) due to the nature of this kind of row specification. If you try to pass a hashed table into the field symbol, it will dump.
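For completeness, one possible workaround for non-index tables (a hedged sketch, not from the original answer): pick the row into a generic field symbol first, then use ASSIGN COMPONENT on that row.
FIELD-SYMBOLS: <tab> TYPE ANY TABLE,
               <row> TYPE any.
ASSIGN ('table1') TO <tab>.
LOOP AT <tab> ASSIGNING <row>.
  " No index access is involved, so this also works for hashed tables.
  ASSIGN COMPONENT 'field1' OF STRUCTURE <row> TO FIELD-SYMBOL(<val>).
  EXIT.
ENDLOOP.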

Slick: Pass in column to update

Let's say we have a FoodTable with the following columns: Name, Calories, Carbs, Protein. I have an entry for Name = Chocolate, Calories = 100, Carbs = "10g", and Protein = "2g".
I'm wondering if there's a way to pass in a column name and a new value to update with. For example, I want a method that's like
def updateFood(food, columnName, value):
table.filter(_.name === food).map(x => x.columnName).update(value)
It seems like dynamic columns are not possible with Slick? I want to avoid writing a SQL query because that could lead to security flaws or bugs in the code. Is there really no way to do this?
I also don't want to have to pass in the entire object to update, since ideally, it should be:
I want to update column X to value Y. I should only need to pass in the id of the object, the column, and the value to update to.
I'm wondering if there's a way to pass in a column name and a new value to update with
This depends a little bit on what you want the "column name" to be. To maintain safety, what I'd suggest is having the "column name" be a function that can select a column in your table.
At a high level that would look like this:
// Won't compile, but we'll fix that in a moment
def updateFood[V](food: Food, column: FoodTable => Rep[V], value: V): DBIO[Int] =
  foods.filter(_.name === food.name).map(column).update(value)
...which we'd call like this:
updateFood(choc, _.calories, 99)
Notice how the "column name" is a function from FoodTable to a column of some value V. Then you provide a value for the V and we do a normal update.
The problem is that Slick knows how to map certain types of values (String, Int, etc) into SQL, but not any kind of value. And the code above won't compile because V is unconstrained.
We can sort of fix that by adding a constraint on V, and it will mostly work:
// Will compile, will work for basic types
def updateFood[V : slick.ast.BaseTypedType](food: Food, column: FoodTable => Rep[V], value: V): DBIO[Int] =
  foods.filter(_.name === food.name).map(column).update(value)
However, if you have custom column mappings, they won't match the constraint. We need to go a step further and have an implicit Shape in scope:
def updateFood[V](food: Food, column: FoodTable => Rep[V], value: V)(implicit shape: Shape[_ <: FlatShapeLevel, Rep[V], V, _]): DBIO[Int] =
  foods.filter(_.name === food.name).map(column).update(value)
I think of Shape as an extra level of abstraction in Slick, above Rep[V]. The mechanisms of the "shape levels" and other details are not something I can explain because I don't understand them yet! (There is a talk that goes into the design of Slick called "Polymorphic Record Types in a Lifted Embedding" which you can find at http://slick.lightbend.com/docs/)
A final note: if you really want the column name to be a String or something like that, I'd suggest pattern matching on the string (or validating it in some way) into a FoodTable => Rep function and using that in your SQL. That's going to be tricky because your value V will have to match the type of the column you want to update.
Off the top of my head, that could look something like this:
def tryUpdateFood(food: Food, columnName: String, value: String): DBIO[Int] =
  columnName match {
    case "calories" => updateFood(food, _.calories, value.toInt)
    case "carbs"    => updateFood(food, _.carbs, value)
    // etc...
    case unknown    => DBIO.failed(new Exception(s"Don't know how to update $unknown columns"))
  }
I can imagine better error handling, safer or smarter parsing of the value, but in outline the above could work.
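As a quick usage sketch (not from the original answer): choc is assumed to be an existing Food value and db a Slick Database handle.
val action: DBIO[Int] = tryUpdateFood(choc, "calories", "250")
val rowsUpdated = db.run(action) // Future[Int]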
For hints at other ways to approach dynamic problems, take a look at the talk "Patterns for Slick database applications" (also listed at: http://slick.lightbend.com/docs/), and towards the end of the presentation there's a section on "Dynamic sorting".

Dynamic INTO clause in OpenSQL?

I'm attempting to write a program that will grab the contents of fields from a table, both specified by the user on the selection screen.
For example, the user could specify the fields equnr, b_werk, b_lager from the table eqbs.
I've been able to accomplish this like so:
" Determine list of fields provided by user
DATA(lv_fields) = COND string(
WHEN p_key3 IS NOT INITIAL AND p_string IS NOT INITIAL THEN
|{ p_key1 }, { p_key2 }, { p_key3 }, { p_string }|
WHEN p_key2 IS NOT INITIAL AND p_string IS NOT INITIAL THEN
|{ p_key1 }, { p_key2 }, { p_string }|
WHEN p_key2 IS NOT INITIAL AND p_string IS NOT INITIAL THEN
|{ p_key1 }, { p_string }| ).
DATA: lv_field_tab TYPE TABLE OF line.
APPEND lv_fields TO lv_field_tab.
" Determine table specified by user and prepare for Open SQL query
DATA t_ref TYPE REF TO data.
FIELD-SYMBOLS: <t> TYPE any,
<comp> TYPE any.
CREATE DATA t_ref TYPE (p_table).
ASSIGN t_ref->* TO <t>.
ASSIGN COMPONENT lv_fields OF STRUCTURE <t> TO <comp>.
" Prepare result container
DATA: lt_zca_str_to_char TYPE TABLE OF zca_str_to_char,
ls_zca_str_to_char TYPE zca_str_to_char.
SELECT (lv_field_tab) FROM (p_table) INTO (#ls_zca_str_to_char-key1, #ls_zca_str_to_char-key2, #ls_zca_str_to_char-key3, #ls_zca_str_to_char-string).
APPEND ls_zca_str_to_char TO lt_zca_str_to_char.
ENDSELECT.
This will correctly populate lt_zca_str_to_char with data from the table specified by the user.
However, this implies that the user is always providing p_key1, p_key2, and p_key3. I could perform a different selection statement based on how many key fields the user provides, but what's the fun in that?
I set out to solve this like this:
DATA(lv_results) = COND string(
  WHEN p_key3 IS NOT INITIAL AND p_string IS NOT INITIAL THEN
    |(@ls_zca_str_to_char-key1, @ls_zca_str_to_char-key2, @ls_zca_str_to_char-key3, @ls_zca_str_to_char-string)|
  WHEN p_key2 IS NOT INITIAL AND p_string IS NOT INITIAL THEN
    |(@ls_zca_str_to_char-key1, @ls_zca_str_to_char-key2, @ls_zca_str_to_char-string)|
  WHEN p_key1 IS NOT INITIAL AND p_string IS NOT INITIAL THEN
    |(@ls_zca_str_to_char-key1, @ls_zca_str_to_char-string)| ).

SELECT (lv_field_tab) FROM (p_table) INTO (@lv_results).
  APPEND ls_zca_str_to_char TO lt_zca_str_to_char.
ENDSELECT.
This will activate, and when I get to my Open SQL query (from a Z table, only filling out the first two of three possible key fields), the values are the following:
lv_field_tab = GUID, TEXT_ID, TEXT_DATA (Good)
p_table = ZCR_TRANS_TEXT (Good)
lv_results = (@ls_zca_str_to_char-key1, @ls_zca_str_to_char-key2, @ls_zca_str_to_char-string) (Good, 3 = 3!)
But, since the compiler presumably sees (@lv_results) as one single variable, the program dumps with the following error:
The current ABAP program attempted to execute an Open SQL statement
containing a dynamic entry. The parser returned the following error:
"The field list and the INTO list must have the same number of
elements."
Is it possible for me to use the new Open SQL syntax to accomplish my dynamic INTO clause in harmony with my dynamic field list?
The brackets on the INTO do not do what you expect, from the ABAP help:
... INTO (@dobj1, @dobj2, ... )
Effect
If the results set consists of multiple columns or aggregate expressions specified explicitly in the SELECT list, a list of elementary data objects dobj1, dobj2, ... (in parentheses and separated by commas) can be specified after INTO.
In your case you only have one value in there, so you can only select one column, and the data will be passed into the variable lv_results. Not what you are looking for. Since you want to fill the fields of an existing structure, the INTO CORRESPONDING FIELDS OF construct will work here. And you can use TABLE to make your command more efficient as well. This leads to:
SELECT (lv_field_tab) FROM (p_table)
  INTO CORRESPONDING FIELDS OF TABLE @lt_zca_str_to_char.
As said previously, you may use INTO CORRESPONDING FIELDS OF ..., but it's not mandatory, it's only for simplifying the code.
So, instead of using CORRESPONDING FIELDS, you may create a structure dynamically (RTTC) with its components corresponding to the columns in LV_FIELD_TAB, and you may then use:
SELECT (lv_field_tab) FROM (p_table) INTO @<structure> ... ENDSELECT.
But of course, as explained by Gert Beukema, you should better do only one SELECT, by creating an internal table dynamically with the same logic as for the structure above, and you may then use:
SELECT (lv_field_tab) FROM (p_table) INTO TABLE @<internal table> ...
Refer to the many examples on the web of how to create data objects dynamically with RTTC; a rough sketch follows below.
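A hedged sketch of that RTTC approach (not from the original answers; lv_fields, lv_field_tab and p_table are the question's variables, everything else is illustrative and assumes the selected fields all exist in p_table):
DATA(lo_source) = CAST cl_abap_structdescr(
                    cl_abap_typedescr=>describe_by_name( p_table ) ).
DATA(lt_source_comps) = lo_source->get_components( ).

" Build the component list in the same order as the dynamic field list.
SPLIT lv_fields AT ',' INTO TABLE DATA(lt_names).
DATA lt_components TYPE cl_abap_structdescr=>component_table.
LOOP AT lt_names INTO DATA(lv_name).
  DATA(ls_comp) = lt_source_comps[ name = to_upper( condense( lv_name ) ) ].
  APPEND ls_comp TO lt_components.
ENDLOOP.

" Create the line type, a table type over it, and a data object to select into.
DATA(lo_line) = cl_abap_structdescr=>create( lt_components ).
DATA(lo_tab)  = cl_abap_tabledescr=>create( p_line_type = lo_line ).
DATA lr_data TYPE REF TO data.
CREATE DATA lr_data TYPE HANDLE lo_tab.
FIELD-SYMBOLS <itab> TYPE STANDARD TABLE.
ASSIGN lr_data->* TO <itab>.

SELECT (lv_field_tab) FROM (p_table) INTO TABLE @<itab>.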
Do not use a field list for your INTO clause.
Try with
INTO CORRESPONDING FIELDS OF TABLE @<itab>
where <itab> must be a field symbol of TYPE ANY TABLE, and the rest of the logic is up to you (to put the proper information from your generic and almost-empty table into your specific destination one).