SQL:1999 Array Type Constructor Usage?

Can anyone confirm whether or not the SQL:1999 ARRAY type constructor provides any operations for searching the array in a WHERE clause?
As an example, if a table EMPLOYEES had a column
QUALIFICATION VARCHAR(20) ARRAY[10]
containing values such as ARRAY['BSC','MBA']
Does the standard support some way of querying EMPLOYEES to find all Employees with an MBA?

Well, you can always use an element reference (ISO/IEC 9075-2:1999, 6.13):
WHERE QUALIFICATION[1] = 'BSC'
OR QUALIFICATION[2] = 'BSC'
...
Of course, the problem is that you need to write a comparison for each possible position.
I am not aware of any operator that allows you to compare a scalar with an array, although I would suppose a DBMS that has native support for ARRAY types would let you create a function that does the job.
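For what it's worth, here is a minimal sketch of how a DBMS with native ARRAY support might let you search the array without enumerating positions. This uses PostgreSQL-style UNNEST, which is not something SQL:1999 itself guarantees; the table and column names come from the question:
SELECT e.*
FROM EMPLOYEES e
WHERE EXISTS (
    SELECT 1
    FROM UNNEST(e.QUALIFICATION) AS q(qualification)
    WHERE q.qualification = 'MBA'
);
-- or, in PostgreSQL specifically:
-- WHERE 'MBA' = ANY (e.QUALIFICATION)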
I must say I have never had the need for array types - I would typically build a one-to-many detail table or, in rare cases, add multiple columns (yeah, a repeating group - send the relational police to hunt me if you like :)
Would you care to explain why you need to know this, or what problem you are trying to solve with an ARRAY?

Related

SQL UDF - Struct Diff

We have a table with 2 top-level columns of type 'struct' - one is a 'before' image, the other an 'after' image. The struct schemas are non-trivial - nested, with arrays to a variable depth. They are sent to us from replication, so the schemas are always the same (the schemas can of course be updated at some point, but always together).
The objective is, for the two input structs, to return 2 struct 'diffs' of the before and after containing only the fields that have changed - essentially the 'delta' diff of the changes produced by the replication source. We know something has changed, but not 'what', since we get the full before and after image. This raw data lands in BQ and is then processed from there, but we need to determine the more granular change for higher-order BQ processing.
The table schema is very wide (1000s of leaf fields), and the data is populated fairly sparsely (so a lot of nulls will be present on both sides of the snapshot), so this would need to be as performant as possible when executing over 10s of millions of rows.
All things are nullable for maximum flexibility.
So change could look like:
null -> value
value -> null
valueA -> valueB
Arrays:
recursive use of the above for arrays of structs; ordering could be relaxed if that makes it easier?
It might not be possible.
I've not attempted this yet as it seems really difficult, so I am looking to the community boffins for some support. I feel the arrays could be the difficult part. There is probably an easier way, perhaps in Python, that I don't know of, or even doing some JSON conversion and comparison using JSON tools? It feels like it would be a super cool feature built into BQ as well, so if I can get this to work I will add a feature request for it.
I'd like to have a SQL UDF for reuse (we have SQL skills, not Python, although if it's easier in Python then that's ok), and now with the new feature of persistent SQL UDFs this seems the right time to ask and test the feature out!
The kind of signature I have in mind (pseudocode, open to suggestions):
def struct_diff(before STRUCT, after STRUCT) -> (beforeChange, afterChange)
It appears to be really difficult to get a piece of reusable code. Since there is currently no support for recursive SQL UDFs, you cannot use a recursive approach for the nested structs.
However, you might be able to write some specific SQL UDFs depending on your array and struct structures. You can use an approach like this one to compare the structs:
-- Wrap the two versions of a field side by side
CREATE TEMP FUNCTION final_compare(s1 ANY TYPE, s2 ANY TYPE) AS (
  STRUCT(s1 AS prev, s2 AS cur)
);
-- Apply the comparison to a known sub-struct
CREATE TEMP FUNCTION compare(s1 ANY TYPE, s2 ANY TYPE) AS (
  STRUCT(final_compare(s1.structA, s2.structA))
);
You can use UNNEST to work with arrays, and the final SQL UDF would really depend on your data.
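As an illustration of the UNNEST idea, a sketch like the following could surface array elements that differ. This is hypothetical: it pairs elements purely by position and falls back to TO_JSON_STRING for the element comparison, which may or may not be acceptable for your data:
-- Hypothetical sketch: walk the offsets of both arrays and keep only positions that differ
CREATE TEMP FUNCTION compare_arrays(a1 ANY TYPE, a2 ANY TYPE) AS (
  ARRAY(
    SELECT STRUCT(a1[SAFE_OFFSET(pos)] AS prev, a2[SAFE_OFFSET(pos)] AS cur)
    FROM UNNEST(GENERATE_ARRAY(0,
           GREATEST(IFNULL(ARRAY_LENGTH(a1), 0), IFNULL(ARRAY_LENGTH(a2), 0)) - 1)) AS pos
    WHERE COALESCE(TO_JSON_STRING(a1[SAFE_OFFSET(pos)]), 'null')
       != COALESCE(TO_JSON_STRING(a2[SAFE_OFFSET(pos)]), 'null')
  )
);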
As @rtenha suggested, Python could make it a lot easier to handle this problem.
Finally, I did some tests using a JavaScript UDF, and it was basically the same result, if not worse than the SQL UDF.
The console allows a recursive definition of the function; however, it will fail during execution. Also, JavaScript doesn't allow the ANY TYPE data type in the signature, so you would have to spell out the whole STRUCT definition or use a workaround like applying TO_JSON_STRING to your struct in order to pass it as a string.
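For reference, the TO_JSON_STRING workaround also gives a cheap, purely SQL way to detect that anything at all changed between the two images. The table and column names below are placeholders, not from the original post:
SELECT *
FROM snapshot
WHERE TO_JSON_STRING(before_image) != TO_JSON_STRING(after_image);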

Does jOOQ support parsing of nested rows?

I am evaluating whether we can migrate from plain JDBC to jOOQ for our project. Most of it looks promising, but I am currently wondering about one specific flow: nested rows. Let me explain.
Say you have the following two tables:
class(id, name)
student(id, name, class_id)
(We assume that a student can only be part of one class.)
Let's create a response type for these tables. I will be using these in the queries below.
create type type_student as(id integer, name text);
create type type_class as(id integer, name text, students type_student[]);
Now let's fetch all classes with its student by using nested rows:
select row(class.id, class.name, array
(
select row(student.id, student.name)::type_student
from student
where student.class_id = class.id
))::type_class
from class
A useful variant is to use only nested rows in arrays:
select class.id, class.name, array
(
select row(student.id, student.name)::type_student
from student
where student.class_id = class.id
) as students
from class
I am wondering if jOOQ has an elegant approach to parse such results containing nested rows?
Your usage of the word "parse" could mean several things, and I'll answer them all in case anyone finds this question looking for "jOOQ" / "parse" / "row".
Does the org.jooq.Parser support row value expressions?
Not yet (as of jOOQ 3.10 and 3.11). jOOQ ships with a SQL parser that parses (almost) anything that can be represented using the jOOQ API. This has various benefits, including:
Being able to reverse engineer DDL scripts for the code generator
Translating SQL between dialects (see an online version here: https://www.jooq.org/translate)
Unfortunately, it cannot parse row value expressions in the projection yet, i.e. in the SELECT clause.
Does the jOOQ API support ("parse") row value expressions?
Yes, you can use them via the various DSL.row() constructors, mainly for predicates, but also for projections by wrapping them in a Field using DSL.rowField(). As of jOOQ 3.11, this is still a bit experimental, as there are many edge cases in PostgreSQL itself related to what is allowed and what isn't. But in principle, queries like yours should be possible.
Does jOOQ support parsing the serialised version of a PostgreSQL record?
PostgreSQL supports these anonymous record types, as well as named "composite" types. And arrays thereof. And nesting of arrays and composite types. jOOQ can serialise and deserialise these types if type information is available to jOOQ, i.e. if you're using the code generator. For instance, if your query is stored as a view:
create view test as
select row(class.id, class.name, array
(
select row(student.id, student.name)::type_student
from student
where student.class_id = class.id
))::type_class
from class
Then, the code generator will produce the appropriate types, including:
TypeStudentRecord
TypeClassRecord
Which can be serialised and deserialised as expected. In principle, this would also be possible without the code generator, but you'd have to create the above types yourself, manually, so why not just use the code generator.
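To give a rough idea of what that serialised form looks like on the wire, querying such a view returns the composite in PostgreSQL's text representation. The values below are made up, and the exact quoting of nested composites and arrays varies, so treat this as illustrative only:
select * from test;
-- might print something roughly like:
-- (1,Maths,"{""(1,Alice)"",""(2,Bob)""}")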
Yes it does: https://www.jooq.org/doc/latest/manual/sql-building/table-expressions/nested-selects/
Field<Object> records =
    create.select(student.id, student.name)
          .from(student)
          .where(student.class_id.eq(class.id))
          .asField("students");

create.select(class.id, class.name, records)
      .from(class)
      .fetch();
The above example might not work directly as I have not tried it, but I just wanted to give a general idea.
Note that the records object is not executed alone. When fetch is called on the second statement, jOOQ should create one SQL statement internally.

How to make criteria with array field in Hibernate

I'm using Hibernate and Postgres and defined a character(1)[] column type.
So I don't know how to write a criteria query that finds a value in the array.
Like this query:
SELECT * FROM cpfbloqueado WHERE bloqueados @> ARRAY['V']::character[]
I am not familiar with Postgres and its types, but you can define your own type using custom basic type mapping. That could simplify the query.
There are many threads here on SO regarding Postgres array types and Hibernate, for instance, this one. Another array mapping example that could be useful is here. Finally, here is an example of using Criteria with a user type.
Code example could be
List result = session.createCriteria(Cpfbloqueado.class)
    .setProjection(Projections.projectionList()
        .add(Projections.property("characterColumn.attribute"), PostgresCharArrayType.class)
    )
    .setResultTransformer(Transformers.aliasToBean(Cpfbloqueado.class))
    .add(...) // add where restrictions here
    .list();
Also, if it is not important for the implementation, you can define a max length in the entity model by annotating your field with @Column(length = 1).
Or, if you need to store an array of characters with a length of 1, it is possible to use a collection type.
I hope I got the point right; however, it would be nice if the problem domain were better described.
So you have an array of single characters... The problem is that in PG that is not a fixed-length type. I had this problem, but around 10 years ago. At that time I had that column mapped as a string, and that way I was able to process the internal data - simply split it by commas and do what is needed.
If you hate that way, as I did... Look for columns of text[] type - that is more common, so it is quite easy to find something about it. Please look at this sample project:
https://github.com/phstudy/jpa-array-converter-sample
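As an aside, if dropping down to a native query is acceptable, the usual PostgreSQL way to test membership in such an array column is ANY. A sketch using the table from the question:
SELECT * FROM cpfbloqueado WHERE 'V' = ANY (bloqueados);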

Using bind variables in large insert statements

I am inheriting an application which has to read data from various types of files and use the OCI interface to move the data into an Oracle database. Most of the tables in question have about 40-50 columns, so the SQL insert statements become pretty lengthy.
When I inherited this code, it basically built up the insert statements via a series of strcats as a C string, then passed it to the appropriate OCI functions to set up and execute the statement. However, since much of the data is read directly from files into the column values, this leaves the application open to easy SQL injection. So I am trying to use bind variables to solve this problem.
In every example OCI application I can find, each variable is statically allocated and bound individually. This would lead to quite a bit of boilerplate, however, and I'd like to reduce it to some sort of looping construct. So my solution is, for each table, to create a static array of strings containing the names of the table columns:
const char *const TABLE_NAME[N_COLS] = {
"COL_1",
"COL_2",
"COL_3",
...
"COL_N"
};
along with a short function that makes a placeholder out of a column name:
void makePlaceholder(char *buf, const char *col);
// "COLUMN_NAME" -> ":column_name"
So I then loop through each array and bind my values to each column, generating the placeholders as I go. One potential problem here is that, because the types of each column vary, I bind everything as SQLT_STR (strings) and thus expect Oracle to convert to the proper datatype on insertion.
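For illustration, the statement text that such a loop builds up ends up looking roughly like this (hypothetical table and column names; the named placeholders are what makePlaceholder produces):
INSERT INTO my_table (COL_1, COL_2, COL_3)
VALUES (:col_1, :col_2, :col_3)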
So, my question(s) are:
What is the proper/idiomatic way (if such a thing exists for SQL/OCI) to use bind variables for SQL insert statements with a large number of columns/params? More generally, what is the best way to use OCI to make this type of large insert statement?
Do large numbers of bind calls have a significant cost in efficiency compared to building and using vanilla C strings?
Is there any risk in binding all variables as strings and allowing Oracle to make the proper type conversion?
Thanks in advance!
Not sure about the C aspects of this. My answer will be from a DBA perspective.
Question 2:
Always use bind variables. They prevent SQL injection and enhance performance.
The performance aspect is often overlooked by programmers. When Oracle receives a SQL statement, it hashes the entire SQL text and looks in its internal repository of execution plans to see if it already has one. If bind variables were used, the SQL text will be the same each time you run the query, no matter what the value of a variable is. However, if you have concatenated the string yourself, Oracle will hash the SQL text including the content of (what you ought to have put in) variables, getting a unique hash every time. So if you run a query one million times, Oracle will create one execution plan if you used bind variables, while if you did not it will create one million execution plans and waste loads of resources doing that.
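To make that concrete, here is a minimal sketch with hypothetical table and values. The literal-based statements hash to different SQL texts and each get their own plan; the bind-variable version is a single shared statement no matter what values are bound:
-- Literals: a distinct SQL text (and hard parse) per execution
INSERT INTO employees (id, name) VALUES (1001, 'Smith');
INSERT INTO employees (id, name) VALUES (1002, 'Jones');
-- Bind variables: the same SQL text every time, so the cached plan is reused
INSERT INTO employees (id, name) VALUES (:id, :name);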

How do I write an SQL function that takes an unknown amount of numbers as parameters?

I am trying to write an Oracle SQL function that takes a list of numbers as arguments and returns a pipelined list of table rows. My main problem is that the quantity of numbers that can be passed is never certain, with no real upper limit. I'll try and demonstrate what I mean:
Say I have a table defined as so:
create table items (
id number primary key,
class number,
data varchar2(4000)
);
I want to return all rows that match one of a list of class numbers that I submit. The function I'm shooting at looks a little like this:
function get_list_items_from_class([unknown number of parameters]
in items.class%type)
return tbl_list_item pipelined; -- I have types defined to handle the return values
I've been looking at ways to define a function that can take an undefined number of integers, and so far the most promising search has taken me to this page, which explains using collections and records. I don't think a VARRAY is what I'm looking for, as the size has to be predefined. An associative array may be what I'm looking for, but before I spend a lot of time trying things out, I want to make sure the tool is fit for the job. I'm pretty inexperienced with Oracle SQL right now and I'm working on a time-sensitive project.
Any help that you could offer would be appreciated. I realise that there are simpler ways to achieve what I'm trying to do in this example (making multiple calls to a function that takes one parameter is one), but this example is simplified. Other parts of the project I'm working on require me to seek a solution using this multiple-parameter method.
EDIT: That being said, I would welcome other design suggestions if I'm way off base with what I'm trying to attempt. It would be a learning experience if nothing else.
Many thanks in advance for your time.
EDIT: I will be accessing the database from proprietary client software written in Java.
You could use a table parameter, as I linked in the comments, or you could pass in a comma-separated list of values, parse it into a table, and join to that.
Something like this (with input_lst as a string):
select *
from tbl_list_item
where tbl_list_item.class in
(
select regexp_substr(input_lst,'[^,]+', 1, level) from dual
connect by regexp_substr(input_lst, '[^,]+', 1, level) is not null
);
adapted from https://blogs.oracle.com/aramamoo/entry/how_to_split_comma_separated_string_and_pass_to_in_clause_of_select_statement
Which choice is better depends on your expected number of entries and what is easier for your client side. I think with a small number (2-20) the comma-separated list is a fine choice. With a large number you probably want to pass a table.
A colleague actually suggested another way to achieve this and I think it is worth sharing. Basically, define a table type that can contain the arguments, then pass an array from the Java program that can be read from this table.
In my example, firstly define a table type of number:
create or replace type tbl_number as table of number;
Then, in the SQL package, define the function as:
function get_list_items_from_class(i_numbers in tbl_number)
return tbl_list_item pipelined;
The function in the package body has one major change (apart from the definition obviously). Use the following select statement:
select *
from tbl_list_item
where tbl_list_item.class in
(
select * from table(i_numbers)
);
This will select all the relevant items that match one of the integers that were passed in the "i_numbers" table. I like this way as it means less string parsing, both in the Java application and the SQL package.
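For completeness, here is a sketch of what the full pipelined body could look like. It is illustrative only - it assumes tbl_list_item is a nested table of an object type t_list_item matching the columns of items; adjust the names to your actual types:
create or replace function get_list_items_from_class(i_numbers in tbl_number)
  return tbl_list_item pipelined
is
begin
  for r in (
    select i.id, i.class, i.data
      from items i
     where i.class in (select column_value from table(i_numbers))
  )
  loop
    -- emit one row of the pipelined result per matching item
    pipe row (t_list_item(r.id, r.class, r.data));
  end loop;
  return;
end;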
Here's how I passed the number arguments from the Java application using an ARRAY object.
ArrayDescriptor arrayDesc = ArrayDescriptor.createDescriptor("TBL_NUMBER", con); // con is the database connection
ARRAY array = new ARRAY(arrayDesc, con, numberList.toArray()); // numberList is an ArrayList of integers holding the arguments
The array is then passed to the SQL function.