Does jOOQ support parsing of nested rows? - sql

I am evaluating whether we can migrate from plain JDBC to jOOQ for our project. Most of it looks promising, but I am currently wondering about one specific use case: nested rows. Let me explain.
Say you have the following two tables:
class(id, name)
student(id, name, class_id)
(We assume that a student can only be part of one class.)
Let's create a response type for these tables. I will be using these in the queries below.
create type type_student as (id integer, name text);
create type type_class as (id integer, name text, students type_student[]);
Now let's fetch all classes with their students by using nested rows:
select row(class.id, class.name, array
  (
    select row(student.id, student.name)::type_student
    from student
    where student.class_id = class.id
  ))::type_class
from class
A useful variant is to use nested rows only inside arrays:
select class.id, class.name, array
  (
    select row(student.id, student.name)::type_student
    from student
    where student.class_id = class.id
  ) as students
from class
I am wondering: does jOOQ have an elegant approach to parsing such results containing nested rows?

Your usage of the word "parse" could mean several things, and I'll answer them all in case anyone finds this question looking for "jOOQ" / "parse" / "row".
Does the org.jooq.Parser support row value expressions?
Not yet (as of jOOQ 3.10 and 3.11). jOOQ ships with a SQL parser that parses (almost) anything that can be represented using the jOOQ API. This has various benefits, including:
Being able to reverse engineer DDL scripts for the code generator
Translating SQL between dialects (see an online version here: https://www.jooq.org/translate)
Unfortunately, it cannot yet parse row value expressions in the projection, i.e. in the SELECT clause.
Does the jOOQ API support ("parse") row value expressions?
Yes, you can use them via the various DSL.row() constructors, mainly for predicates, but also for projections by wrapping them in a Field using DSL.rowField(). As of jOOQ 3.11, this is still a bit experimental, as there are many edge cases in PostgreSQL itself related to what is allowed and what isn't. But in principle, queries like yours should be possible.
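To make the distinction concrete, here is the SQL shape in question, as a plain SQL sketch against the tables from the question (the literal values are made up for illustration):
-- row value expression in a predicate (the long-supported case):
select *
from student
where (student.id, student.class_id) = (1, 1);

-- row value expression in the projection (the experimental case discussed here):
select row(student.id, student.name)::type_student
from student;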
Does jOOQ support parsing the serialised version of a PostgreSQL record?
PostgreSQL supports these anonymous record types, as well as named "composite" types. And arrays thereof. And nesting of arrays and composite types. jOOQ can serialise and deserialise these types if type information is available to jOOQ, i.e. if you're using the code generator. For instance, if your query is stored as a view:
create view test as
select row(class.id, class.name, array
  (
    select row(student.id, student.name)::type_student
    from student
    where student.class_id = class.id
  ))::type_class
from class
Then, the code generator will produce the appropriate types, including:
TypeStudentRecord
TypeClassRecord
These can be serialised and deserialised as expected. In principle, this would also be possible without the code generator, but you'd have to create the above types yourself, manually, so why not just use the code generator?

Yes it does: https://www.jooq.org/doc/latest/manual/sql-building/table-expressions/nested-selects/
Field<Object> records =
create.select(student.id, student.name)
      .from(student)
      .where(student.class_id.eq(class.id))
      .asField("students");

create.select(class.id, class.name, records)
      .from(class)
      .fetch();
The above example might not work exactly as written, as I have not tried it; I just wanted to give the general idea.
Note that the records object is not executed alone. When fetch is called in the second statement, jOOQ should create one SQL statement internally.

Related

Generate a json object using column names as keys

Is it possible to generate a json object using the column name as keys automatically?
I have a table with many columns and I need to dump it into a json object.
I know I can do this using the JSON_OBJECT function, but I was looking for a more condensed syntax that would allow me to do it without having to specify the names of all the columns:
SELECT JSON_OBJECT("col_a", m.col_a, "col_b", m.col_b, "col_c", m.col_c, ...)
FROM largetable AS m
Something like this?
SELECT JSON_OBJECT(m.*)
FROM largetable AS m
I'm using MariaDB version 10.8.2
JSON objects make sense in other languages like JavaScript or C#, and there are many libraries to convert the result of a MariaDB query into a list of JSON objects in most languages.
Also, a good practice is to make the database engine do as little work as possible when performing queries, and to process the result in the application.
This is of course not possible, since the parser would not accept an odd number of parameters for the JSON_OBJECT function.
You can't do that in pure SQL within a single statement, since you need to retrieve the column names from information_schema first:
select @statement := concat(
  "SELECT JSON_OBJECT(",
  group_concat(concat("\"", column_name, "\"", ",", column_name)),
  ") FROM mytable"
)
from information_schema.columns
where table_name = "mytable" and table_schema = "test";

prepare my_statement from @statement;
execute my_statement;
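For illustration, if mytable had the columns col_a and col_b, the statement assembled above would read:
SELECT JSON_OBJECT("col_a",col_a,"col_b",col_b) FROM mytable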
Much easier and faster is to convert the result in your application, for example in Python:
import mariadb, json

# dictionary=True returns each row as a dict, so column names become JSON keys
conn = mariadb.connect(db="test")
cursor = conn.cursor(dictionary=True)
cursor.execute("select * from mytable")
json_row = json.dumps(cursor.fetchone())

SQL UDF - Struct Diff

We have a table with 2 top-level columns of type 'struct': one is a 'before' image and the other an 'after' image. The struct schemas are non-trivial: nested, with arrays, to a variable depth. They are sent to us from replication, so the schemas are always the same (the schemas can of course be updated at some point, but always together).
The objective is, given the two input structs, to return 2 struct 'diffs' of the before and after with only the fields that have changed: essentially the delta diff of the changes produced by the replication source. We know something has changed, but not 'what', since we get the full before and after image. This raw data lands in BQ and is then processed from there, but we need to determine the more granular change for higher-order BQ processing.
The table schema is very wide (1000s of leaf fields), and the data is populated fairly sparsely (so a lot of nulls will be present on both sides of the snapshot), so this would need to be as performant as possible when executing over 10s of millions of rows.
All things are nullable for maximum flexibility.
So a change could look like:
null -> value
value -> null
valueA -> valueB
Arrays:
recursive use of the above for arrays of structs; ordering could be relaxed if that makes it easier?
It might not be possible.
I've not attempted this yet, as it seems really difficult, so I am looking to the community boffins for some support. I feel the arrays could be the difficult part. There is probably an easy way, perhaps in Python that I don't know of, or even doing some JSON conversion and comparison using JSON tools? It feels like it would be a super cool feature built into BQ as well, so if I can get this to work, I will add a feature request for it.
I'd like to have a SQL UDF for reuse (we have SQL skills, not Python, although if it's easier in Python then that's ok), and now with the new feature of persistent SQL UDFs, this seems the right time to ask and test the feature out!
def struct_diff(before Struct, after Struct) returns (beforeChange, afterChange)
That's the type of signature I have in mind, but I'm open to suggestions.
It appears to be really difficult to get a piece of reusable code for this. Since there is currently no support for recursive SQL UDFs, you cannot use a recursive approach for the nested structs.
However, you might be able to write some specific SQL UDFs depending on your array and struct structures. You can use an approach like this one to compare the structs:
CREATE TEMP FUNCTION final_compare(s1 ANY TYPE, s2 ANY TYPE) AS (
  STRUCT(s1 AS prev, s2 AS cur)
);

CREATE TEMP FUNCTION compare(s1 ANY TYPE, s2 ANY TYPE) AS (
  STRUCT(final_compare(s1.structA, s2.structA))
);
You can use UNNEST to work with arrays, and the final SQL UDF would really depend on your data.
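As a sketch of that UNNEST idea (my assumptions: array elements are compared by position, and TO_JSON_STRING serves as a generic equality check because the element struct type is unknown here; the name array_diff is made up):
CREATE TEMP FUNCTION array_diff(a1 ANY TYPE, a2 ANY TYPE) AS (
  ARRAY(
    -- pair elements up by position and keep only the positions that differ;
    -- TO_JSON_STRING maps a missing element to the string 'null', so a position
    -- present on only one side counts as changed, but trailing elements that
    -- exist only in a2 are never visited
    SELECT STRUCT(e1 AS prev, a2[SAFE_OFFSET(i)] AS cur)
    FROM UNNEST(a1) AS e1 WITH OFFSET i
    WHERE TO_JSON_STRING(e1) != TO_JSON_STRING(a2[SAFE_OFFSET(i)])
  )
);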
As @rtenha suggested, Python could be a lot easier for handling this problem.
Finally, I did some tests using JavaScript UDFs, and it was basically the same result, if not worse than with SQL UDFs.
The console allows a recursive definition of the function; however, it will fail during execution. Also, JavaScript doesn't allow the ANY TYPE data type in the signature, so you would have to define the whole STRUCT type or use a workaround like applying TO_JSON_STRING to your struct in order to pass it as a string.
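For illustration, that TO_JSON_STRING workaround would look roughly like this (js_diff and its trivial body are made up; a real implementation would JSON.parse both arguments and walk the fields):
CREATE TEMP FUNCTION js_diff(s1 STRING, s2 STRING)
RETURNS STRING
LANGUAGE js AS r"""
  // placeholder logic: just flag whether anything changed at all
  return s1 === s2 ? null : s2;
""";

SELECT js_diff(TO_JSON_STRING(before_image), TO_JSON_STRING(after_image))
FROM mytable;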

Using Bookshelf to execute a query on Postgres JSONB array elements

I have a Postgres table with jsonb array elements, and I'm trying to write SQL queries to extract the matching elements. I have the raw SQL query running from the Postgres command line interface:
select * from movies where director #> any (array ['70', '45']::jsonb[])
This returns the results I'm looking for (all records from the movies table where the director jsonb elements contain any of the elements in the input element).
In the code, the value for ['70', '45'] would be a dynamic variable, i.e. fixArr, and the length of the array is unknown.
I'm trying to build this into my Bookshelf code but haven't been able to find any examples that address the complexity of this use case. I've tried the following approaches, but none of them work:
models.Movies.where('director', '#> any', '(array' + JSON.stringify(fixArr) + '::jsonb[])').fetchAll()
ERROR: The operator "#> any" is not permitted
db.knex.raw('select * from movies where director #> any(array'+[JSON.stringify(fixArr)]+'::jsonb[])')
ERROR: column "45" does not exist
models.Movies.query('where', 'director', '#>', 'any (array', JSON.stringify(fixArr) + '::jsonb[])').fetchAll()
ERROR: invalid input syntax for type json
Can anyone help with this?
As you have noticed, neither knex nor bookshelf brings any support for making jsonb queries easier. As far as I know, the only knex-based ORM that supports jsonb queries etc. nicely is Objection.js.
In your case, I suppose a better operator to find whether a jsonb column contains any of the given values would be ?|, so the query would be like:
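// note: '?' is knex's binding placeholder, so the native jsonb '?|' operator
// must be escaped as '\\?|' inside knex.raw(); also, interpolating ids straight
// into the string is an SQL injection risk if they come from user input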
const idsAsString = ids.map(val => `'${val}'`).join(',');
db.knex.raw(`select * from movies where director \\?| array[${idsAsString}]`);
More info on how to deal with jsonb queries and indexing with knex can be found here: https://www.vincit.fi/en/blog/objection-js-postgresql-power-json-queries/
No, you're just running into the limitations of that particular query builder and ORM.
The easiest way is using bookshelf.Model.query and knex.raw (whereRaw, etc.). Alias with AS and subclass your Bookshelf model to add these aliased attributes if you care about such things.
If you want things to look clean and abstracted through Bookshelf, you'll just need to denormalize the JSONB into flat tables. This might be the best approach if your JSONB is mostly flat and simple.
If you end up using lots of JSONB (it can be quite performant with appropriate indexes), then the Bookshelf ORM is wasted effort. The knex query builder earns its keep only insofar as it handles escaping, quoting, etc.

SQL:1999 Array Type Constructor Usage?

Can anyone confirm whether or not the SQL:1999 array type constructor provides any operations for searching the array in a WHERE clause?
As an example, if a table EMPLOYEES had a column
QUALIFICATION VARCHAR(20) ARRAY[10]
containing values such as ARRAY['BSC','MBA']
Does the standard support some way of querying EMPLOYEES to find all Employees with an MBA?
Well, you can always use an element reference (ISO/IEC 9075-2:1999, 6.13):
WHERE QUALIFICATION(1) = 'BSC'
OR QUALIFICATION(2) = 'BSC'
...
Of course, the problem is that you need to write a comparison for each possible position.
I am not aware of any operator that allows you to compare a scalar with an array, although I would suppose a DBMS that has native support for ARRAY types would let you create a function that does the job.
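If your DBMS also supports the standard UNNEST operator, which turns an array into a table (support varies, and this goes beyond the array constructor itself), the search could be sketched as:
SELECT *
FROM EMPLOYEES AS E
WHERE 'MBA' IN (SELECT Q.QUAL
                FROM UNNEST(E.QUALIFICATION) AS Q(QUAL))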
I must say I never had the need for array types - I would typically build a one-to-many detail table or, in rare cases, add multiple columns (yeah, a repeating group - send the relational police to hunt me if you like :)
Would you care to explain why you need to know this, or what problem you are trying to solve with an ARRAY?

nhibernate and DDD suggestion

I am fairly new to nHibernate and DDD, so please bear with me.
I have a requirement to create a new report from my SQL table. The report is read-only and will be bound to a GridView control in an ASP.NET application.
The report contains the following fields: Style, Color, Size, LAQty, MTLQty, Status.
I have entities for Style, Color and Size, which I use on other ASP.NET pages via repositories. I am not sure if I should use the same entities for my report or not. If I use them, where am I supposed to map the Qty and Status fields?
If I should not use the same entities, should I create a new class for the report?
As said I am new to this and just trying to learn and code properly.
Thank you
For reports it's usually easier to use plain values or special DTOs. Of course you can query for the entity that references all the information, but to put it into a list (e.g. using databinding) it's handier to have a single class that contains all the values plainly.
To get more specific solutions than the few below, you need to tell us a little about your domain model. What does the class model look like?
Generally, you have at least three options to get "plain" values from the database using NHibernate.
Write HQL that returns an array of values
For instance:
select e1.Style, e1.Color, e1.Size, e2.LAQty, e2.MTLQty
from entity1 inner join entity2
where (some condition)
The result will be a list of object[]. Every item in the list is a row; every item in the object[] is a column. This is quite like SQL, but on a higher level (you describe the query on the entity level) and is database independent.
Or you create a DTO (data transfer object) just to hold one row of the result:
select new ReportDto(e1.Style, e1.Color, e1.Size, e2.LAQty, e2.MTLQty)
from entity1 inner join entity2
where (some condition)
ReportDto needs to implement a constructor that takes all these arguments. The result is a list of ReportDto.
Or you use the Criteria API (recommended):
session.CreateCriteria(typeof(Entity1), "e1")
    .CreateCriteria(typeof(Entity2), "e2")
    .Add( /* some condition */ )
    .SetProjection(Projections.ProjectionList()
        .Add(Projections.Property("e1.Style"), "Style")
        .Add(Projections.Property("e1.Color"), "Color")
        .Add(Projections.Property("e1.Size"), "Size")
        .Add(Projections.Property("e2.LAQty"), "LAQty")
        .Add(Projections.Property("e2.MTLQty"), "MTLQty"))
    .SetResultTransformer(Transformers.AliasToBean(typeof(ReportDto)))
    .List<ReportDto>();
The ReportDto needs to have a property with the name of each alias: "Style", "Color", etc. The output is a list of ReportDto.
I'm not schooled in DDD exactly, but I've always modeled my nouns as types, and I'm surprised the report itself is an entity. DDD or not, I wouldn't do that; rather, I'd have my reports reflect the results of a query, in which quantity is presumably count(*) or sum(lineItem.quantity) and status is also calculated (perhaps in the page).
You haven't described your domain, but there is a clue in those column headings that you may be doing a pivot over the data to create LAQty and MTLQty, which you'll find hard to do in NHibernate, as it's designed for OLTP and does not even do UNION last I checked. That said, there is nothing wrong with abusing HQL (Hibernate Query Language) for lightweight reporting, as long as you understand you are abusing it.
I see Stefan has done a grand job of describing the syntax for that, so I'll stop there :-)