How to optimize an XQuery in a SELECT statement?

I am using an Oracle database. I have a table in which one of the columns is of type XMLTYPE. The problem is that I need to count the records whose XML has a particular root element and meets one more condition. Suppose the stored XMLs have the following formats:
<ns1:Warehouse whNo="102" xmlns:ns1="xyz">
<ns1:Building></ns1:Building>
</ns1:Warehouse>
and
<ns1:Warehouse whNo="102" xmlns:ns1="xyz">
<ns1:Building>Owned</ns1:Building>
</ns1:Warehouse>
and there are other XMLs with root elements other than Warehouse.
Now, I need to fetch those records which have:
the root element Warehouse
an empty Building element
I wrote the following SQL query:
select count(XMLQuery('declare namespace ns1="xyz";
for $i in /*
where fn:local-name($i) eq "Warehouse"
and fn:string-length($i/ns1:Building ) = 0
return <Test>{$i/ns1:Building}</Test>'
PASSING xml_response RETURNING CONTENT)) countOfWareHouses
from test
Here, test is the name of the table and xml_response is the name of the XMLTYPE column in the table test.
This query works fine when there are few records. I tested it with around 500 records in the table, and it took about 0.1 s. But as the number of records grows, so does the time: with 5,000 records it took ~11 s, and on the production table, which currently holds 185,000 records, the query never completes.
Please help me optimize this query or the XQuery.
Edit 1:
I tried using this:
select count(XMLQuery(
'declare namespace ns1 = "xyz";
for $i in /
return /ns1:Warehouse[not(ns1:Building/text())]'
PASSING xml_response RETURNING CONTENT))
from test
and
select count(XMLQuery(
'declare namespace ns1 = "xyz";
return /ns1:Warehouse[fn:string-length(ns1:Building)=0]'
PASSING xml_response RETURNING CONTENT))
from test
But these are not working.
When I try to run them, the client prompts me for bind values for Building and Warehouse.

Instead of where you should use predicates, which are likely to be faster:
ns1:Warehouse[string-length(ns1:Building)=0]

Do not use local-name(...) unless necessary. A node test will probably be faster and enables index use. You can also remove the string-length(...) call.
Search for <Warehouse/> elements which have no text node below their <Building/> node. If you also want to scan for arbitrary subnodes (including attributes!), use node() instead of text(). If you just want to make sure there is text somewhere, possibly inside other child nodes as in <ns1:Building><foo>bar</foo></ns1:Building>, use ns1:Building//text() instead.
This simple XPath expression is doing what you need:
/ns1:Warehouse[not(ns1:Building/text())]
If you need to construct those <Test/> elements, use
for $warehouse in /ns1:Warehouse[not(ns1:Building/text())]
return <Test>{$warehouse/ns1:Building}</Test>
which should be a real drop-in replacement to your XQuery.
I just realized that all you want to know is the number; in that case it is better to count inside the XQuery (though I cannot tell you offhand how to read the single result back):
count(/ns1:Warehouse[not(ns1:Building/text())])
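On the reading-the-result point: one hedged option (a sketch only, assuming Oracle 11g or later and that the real namespace URI is xyz) is to push the filtering into an XMLExists predicate and let plain SQL do the counting, which avoids materializing any result nodes at all:

```sql
-- Sketch, untested against this schema: count rows whose root element
-- is ns1:Warehouse and whose ns1:Building has no text content.
SELECT COUNT(*) AS countOfWareHouses
FROM test t
WHERE XMLEXISTS(
  'declare namespace ns1="xyz"; /ns1:Warehouse[not(ns1:Building/text())]'
  PASSING t.xml_response
);
```

With an XMLIndex (or structured storage) on xml_response, this form may also give the optimizer a chance to avoid evaluating the XQuery row by row.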


Pass SQL query values to a Data Factory variable as an array for a ForEach loop

Similar to this question: how to pass variables to Azure Data Factory REST URL's query string.
However, I have a pipeline that queries the Graph API, where I need to pass a user id as part of the URL to get that user's manager and build an Active Directory staff hierarchy. This is fine on an individual basis, or even as a predefined array variable where I insert ["xx","xxx"] into the pipeline variable. My challenge is that the array variable needs to come from the results of a SQL query: instead of defining the list of users, I need to pass the results of a SQL query into the ForEach loop.
I can use a lookup with a set variable, but the URL ends up misconstructed, with extra characters added for some reason,
returning graph.microsoft.com/v1.0/users/%7B%7B%22id%22:%22xx9e7878-bwbbb-bwbwbwr-7897-414a8e60c78c%22%7D%7D/?$expand=xxxxxx, where the "%7B%7B%22id%22:%" and "%22%7D%7D/" parts are all unnecessary and appear to come from the JSON rather than just the value.
The lookup runs the query against SQL.
The set variable assigns the lookup's value (below) to a pipeline variable as an array.
Then the ForEach loop uses the variable value in the source:
#concat('users/{',item(),'}/?$expand=manager($levels=max;$select=id,displayName,userPrincipalName,createdDate)')
If anyone can suggest how to construct the array value dynamically that would be great.
I have used
SELECT '["'+STRING_AGG(CONVERT(NVARCHAR(MAX),t.[id]),'","')+'"]' AS id FROM
stage.extract_msgraphapi_users t LEFT JOIN stage.extract_msgraphapi_users s ON s.id = t.id
and this returns something that looks like an array, ["xx","xxx"], but Data Factory still interprets it as a string and not an array. Any help would be appreciated.
10 minutes later:
#concat('users/{',item().id,'}/?$expand=manager($levels=max;$select=id,displayName,userPrincipalName,createdDate)')
Note the reference to item().id to use the id level of the array. Works like a dream for anyone else facing the same issue.

How do I write an SQL function that takes an unknown quantity of numbers as parameters?

I am trying to write an Oracle SQL function that takes a list of numbers as arguments and returns a pipelined list of table rows. My main problem is that the quantity of numbers that can be passed is never certain, with no real upper limit. I'll try to demonstrate what I mean:
Say I have a table defined as so:
create table items (
id number primary key,
class number,
data varchar2(4000)
);
I want to return all rows that match one of a list of class numbers that I submit. The function I'm shooting for looks a little like this:
function get_list_items_from_class([unknown number of parameters]
in items.class%type)
return tbl_list_item pipelined; -- I have types defined to handle the return values
I've been looking at ways to define a function that can take an undefined quantity of integers, and so far the most promising search has taken me to this page, which explains using collections and records. I don't think a VARRAY is what I'm looking for, as its size has to be predefined. An associative array may be what I'm looking for, but before I spend a lot of time trying things out, I want to make sure the tool is fit for the job. I'm pretty inexperienced with Oracle SQL right now and I'm working on a time-sensitive project.
Any help that you could offer would be appreciated. I realise that there are simpler ways to achieve what I'm trying to do in this example (simply making multiple calls to a function that takes one parameter is one), but this example is simplified. Other parts of the project require me to seek a solution using this multiple-parameter method.
EDIT: That being said, I would welcome other design suggestions if I'm way off base with what I'm trying to attempt. It would be a learning experience if nothing else.
Many thanks in advance for your time.
EDIT: I will be accessing the database from proprietary client software written in Java.
You could use a table parameter, as I linked in the comments, or you could pass in a comma-separated list of values, parse it into a table, and join to that.
Something like this (with input_lst as a string):
select *
from tbl_list_item
where tbl_list_item.class in
(
select regexp_substr(input_lst,'[^,]+', 1, level) from dual
connect by regexp_substr(input_lst, '[^,]+', 1, level) is not null
);
adapted from https://blogs.oracle.com/aramamoo/entry/how_to_split_comma_separated_string_and_pass_to_in_clause_of_select_statement
Which choice is better depends on your expected number of entries and what is easier for your client side. I think with a small number (2 to 20) the comma-separated list is a fine choice; with a large number you probably want to pass a table.
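As a hedged illustration of the same trade-off (sketched in Python with sqlite3 since Oracle isn't available here; the table and names are invented), most client drivers let you build one bind placeholder per value, which is the safe version of the comma-separated approach:

```python
import sqlite3

# Invented sample schema, standing in for the items table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, class INTEGER, data TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                 [(1, 10, "a"), (2, 20, "b"), (3, 30, "c")])

def get_items_from_classes(conn, classes):
    # An empty IN () list is invalid SQL, so short-circuit it.
    if not classes:
        return []
    # One "?" placeholder per value, so the values are bound safely
    # instead of being concatenated into the SQL text.
    placeholders = ",".join("?" for _ in classes)
    sql = (f"SELECT id, class, data FROM items "
           f"WHERE class IN ({placeholders}) ORDER BY id")
    return conn.execute(sql, classes).fetchall()

print(get_items_from_classes(conn, [10, 30]))  # → [(1, 10, 'a'), (3, 30, 'c')]
```

This scales fine for a handful of values; past a driver's bind limit you are back to passing a table type, as the answer below describes.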
A colleague actually suggested another way to achieve this and I think it is worth sharing. Basically, define a table type that can contain the arguments, then pass an array from the Java program that can be read from this table.
In my example, firstly define a table type of number:
create or replace type tbl_number as table of number;
Then, in the SQL package, define the function as:
function get_list_items_from_class(i_numbers in tbl_number)
return tbl_list_item pipelined;
The function in the package body has one major change (apart from the definition obviously). Use the following select statement:
select *
from tbl_list_item
where tbl_list_item.class in
(
select * from table(i_numbers)
);
This will select all the relevant items that match one of the integers passed in the "i_numbers" table. I like this approach as it means less string parsing, both in the Java application and in the SQL package.
Here's how I passed the number arguments from the Java application using an ARRAY object.
ArrayDescriptor arrayDesc = ArrayDescriptor.createDescriptor("NUMBERS", con); //con is the database connection
ARRAY array = new ARRAY(arrayDesc, con, numberList.toArray()); // numberList is an ArrayList of integers which holds the arguments
array is then passed to the SQL function.

Why would you use a WHERE 1=0 statement in SQL?

I saw a query run in a log file on an application, and it contained a query like:
SELECT ID FROM CUST_ATTR49 WHERE 1=0
What is the use of such a query that is bound to return nothing?
A query like this can be used to ping the database. The clause:
WHERE 1=0
ensures that no data is sent back, so there is no CPU cost, no network traffic, and no other resource consumption.
A query like that can test for:
server availability
CUST_ATTR49 table existence
ID column existence
keeping a connection alive
causing a trigger to fire without changing any rows (via the WHERE clause in DML, not in a SELECT query)
managing many OR conditions in dynamic queries (e.g. WHERE 1=0 OR <condition>)
It may also be used to extract the table schema from a table without extracting any of the data inside it. As Andrea Colleoni said, those are the other benefits of using this.
A use case I can think of: you have a filter form where you don't want to show any search results by default. If the user specifies a filter, the conditions get added to the WHERE clause.
It is also used when you have to build an SQL query by hand, e.g. when you don't want to check whether the WHERE clause is empty and just want to append conditions like this:
where := "WHERE 0=1"
if X then where := where + " OR ... "
if Y then where := where + " OR ... "
(If you connect the clauses with OR you need 0=1; if you use AND you need 1=1.)
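The pseudocode above, sketched concretely in Python (names invented) purely to show the string building:

```python
def build_where(conditions):
    # Start from an always-false predicate so every real condition
    # can be appended uniformly with OR, with no special first case.
    clause = "WHERE 0=1"
    for cond in conditions:
        clause += " OR " + cond
    return clause

print(build_where(["x = 1", "y = 2"]))  # → WHERE 0=1 OR x = 1 OR y = 2
```

With no conditions at all, the query simply returns no rows instead of failing to parse.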
As an answer - but also as further clarification to what #AndreaColleoni already mentioned:
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
Purpose as an on/off switch
I am using this as a switch (on/off) statement for portions of my Query.
If I were to use
WHERE 1=1
AND (0=? OR first_name = ?)
AND (0=? OR last_name = ?)
Then I can use the first bind variable (?) to turn the first_name search criterion on or off, and the third bind variable (?) to turn the last_name criterion on or off.
I have also added a literal 1=1 just for aesthetics, so the text of the query aligns nicely.
For just those two criteria it does not look that helpful, as one might think it is easier to build the WHERE condition dynamically: only first_name, only last_name, both, or none. But that means your code has to manage four versions of the same query. Imagine what happens if you have ten different criteria to consider; how many combinations of the same query will you have to manage then?
Compile Time Optimization
I might also add that the 0=? bind-variable switch will not work very well if your criteria are indexed. The run-time optimizer that selects appropriate indexes and execution plans might not see the cost benefit of using the index in those slightly more complex predicates. Hence I usually advise injecting the 0/1 explicitly into your query (string-concatenating it into your SQL, or doing a search/replace). Doing so gives the compiler the chance to optimize out redundant statements and gives the run-time executor a much simpler query to look at.
(0=1 OR cond = ?) --> (cond = ?)
(0=0 OR cond = ?) --> Always True (ignore predicate)
In the second statement above, the compiler knows it never has to consider the second part of the condition (cond = ?) and will simply remove the entire predicate. If it were a bind variable, the compiler could never accomplish this.
Because you are simply, and forcibly, injecting a 0 or a 1, there is zero chance of SQL injection.
In my SQL, as one approach, I typically mark my injection points as ${literal_name}, and then simply search/replace every ${...} occurrence with the appropriate literal using a regex, before I even let the compiler have a stab at it. This basically leads to a query stored as follows:
WHERE 1=1
AND (0=${cond1_enabled} OR cond1 = ?)
AND (0=${cond2_enabled} OR cond2 = ?)
It looks good, is easily understood, the compiler handles it well, and the run-time cost-based optimizer understands it better, with a higher likelihood of selecting the right index.
I take special care in what I inject. The primary way of passing variables is, and remains, bind variables, for all the obvious reasons.
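A minimal sketch of that search/replace step in Python (the template and switch names are invented; only the literals 0 and 1 are ever injected):

```python
import re

# Invented template: ${...} placeholders mark the on/off switches.
SQL_TEMPLATE = """\
WHERE 1=1
AND (0=${cond1_enabled} OR cond1 = ?)
AND (0=${cond2_enabled} OR cond2 = ?)"""

def inject_switches(template, switches):
    # Replace each ${name} with a literal 0 or 1. Since only those
    # two literals are ever injected, there is no SQL injection risk,
    # and the optimizer sees plain 0=1 / 0=0 predicates it can fold.
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: "1" if switches[m.group(1)] else "0",
                  template)

sql = inject_switches(SQL_TEMPLATE, {"cond1_enabled": True,
                                     "cond2_enabled": False})
print(sql)
```

Here cond1 stays active (0=1 OR ...) while cond2 collapses to an always-true predicate (0=0 OR ...) the optimizer can drop.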
This is very useful for metadata fetching, and it keeps things generic.
Many databases have an optimizer, so they will not actually execute the query, but it is still a valid SQL statement and should run on any DB.
It will not fetch any result, but you will know that the column names, data types, etc. are valid. If it does not execute, you know something is wrong with the DB (it is not up, etc.).
So many generic programs execute such a dummy statement for testing and for fetching metadata.
Some systems use scripts that dynamically hide selected records from a full list, so a false condition needs to be passed to the SQL. For example, three records out of 500 may be marked as private for medical reasons and should not be visible to everyone. A dynamic query lets those in HR see all 500 records, while managers see 497. A conditionally set value, i.e. 'WHERE 1=1' or 'WHERE 1=0', is passed to the SQL clause depending on who is logged into the system.
Quoted from Greg:
If the list of conditions is not known at compile time and is instead
built at run time, you don't have to worry about whether you have one
or more than one condition. You can generate them all like:
and <condition>
and concatenate them all together. With the 1=1 at the start, the
initial and has something to associate with.
I've never seen this used for any kind of injection protection; as you
say, it doesn't seem like it would help much. I have seen it used as an
implementation convenience. The SQL query engine will end up ignoring
the 1=1, so it should have no performance impact.
(from: Why would someone use WHERE 1=1 AND <conditions> in a SQL clause?)
If the user intends only to append records, then the fastest method is to open the recordset without returning any existing records.
It can be useful when only table metadata is desired in an application. For example, if you are writing a JDBC application and want to get the column display size of columns in the table.
Pasting a code snippet here
String query = "SELECT * from <Table_name> where 1=0";
PreparedStatement stmt = connection.prepareStatement(query);
ResultSet rs = stmt.executeQuery();
ResultSetMetaData rsMD = rs.getMetaData();
int columnCount = rsMD.getColumnCount();
for(int i=0;i<columnCount;i++) {
System.out.println("Column display size is: " + rsMD.getColumnDisplaySize(i+1));
}
Here, a query like "select * from table" can cause performance issues with huge data, because it tries to fetch all the records from the table. A query like "select * from table where 1=0" instead fetches only the table metadata and none of the records, so it is efficient.
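The same trick works in any client library that exposes result-set metadata; here is a hedged sketch using Python's sqlite3 module rather than JDBC (the table and columns are invented):

```python
import sqlite3

# Invented table, standing in for a large production table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (id INTEGER, name TEXT, balance REAL)")
conn.executemany("INSERT INTO cust VALUES (?, ?, ?)",
                 [(i, "x", 1.0) for i in range(1000)])

# WHERE 1=0 returns no rows, but the cursor still carries the
# column metadata, so we learn the schema without fetching data.
cur = conn.execute("SELECT * FROM cust WHERE 1=0")
columns = [d[0] for d in cur.description]
print(columns)        # → ['id', 'name', 'balance']
print(cur.fetchall()) # → []
```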
Per user milso in another thread, another purpose for "WHERE 1=0":
CREATE TABLE New_table_name AS SELECT * FROM Old_table_name WHERE 1 = 2;
This will create a new table with the same schema as the old table. (Very handy if you want to load some data for comparison.)
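The same idea can be sketched with Python's sqlite3 (SQLite accepts WHERE 1=0 just like Oracle's WHERE 1=2): the new table gets the old table's columns but none of its rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO old_t VALUES (1, 'x')")

# The always-false predicate copies the column layout, not the rows.
conn.execute("CREATE TABLE new_t AS SELECT * FROM old_t WHERE 1=0")

cols = [row[1] for row in conn.execute("PRAGMA table_info(new_t)")]
count = conn.execute("SELECT COUNT(*) FROM new_t").fetchone()[0]
print(cols, count)  # → ['id', 'name'] 0
```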
An example of a 1=0 WHERE condition is found in the Northwind 2007 database. On the main page, the New Customer Order and New Purchase Order command buttons use embedded macros with the Where Condition set to 1=0. This opens the form with a filter that forces the sub-form to display only records related to the parent form. You can verify this by opening either of those forms from the tree without using the macro: opened that way, the sub-form displays all records.
In ActiveRecord ORM, part of RubyOnRails:
Post.where(category_id: []).to_sql
# => SELECT * FROM posts WHERE 1=0
This is presumably because the following is invalid (at least in Postgres):
select id FROM bookings WHERE office_id IN ()
It seems like someone is trying to hack your database; it looks like an attempted MySQL injection. You can read more about it here: MySQL Injection.

SQL to filter by multiple criteria including containment in string list

I have a table, let's call it "tbl.items", and there is a column "title" in "tbl.items" of datatype nvarchar(max) containing a string. I want to loop through each row, and for each "title" do the following:
filter the string to remove stopwords like in, out, where, etc.
compare the rest of the string to a predefined list and, if there is a match, perform some action which involves inserting data into other tables as well.
The problem is that I am ignorant when it comes to writing T-SQL scripts. Please help and guide me: how can I achieve this?
Can it be achieved by writing a SQL script, or do I have to develop a console application in C# or another language?
I am using MS SQL Server 2008.
Thanks in advance.
You want a few things. First, look up SQL Server's syntax for functions, and write something like this:
-- Warning! Code written off the top of my head,
-- don't expect this to work w/copy-n-paste
create function removeStrings(@input nvarchar(4000))
returns nvarchar(4000)
as begin
-- We're being kind of simple-minded and using string replaces
-- instead of regular expressions, so we assume a space before
-- and after each word. Padding the input makes this work better:
set @input = ' ' + @input + ' '
-- Big list of replaces
set @input = replace(@input, ' in ', ' ')
set @input = replace(@input, ' out ', ' ')
-- more replaces...
return ltrim(rtrim(@input))
end
Then you need your list of matches in a table, call this "predefined" with a column "matchString".
Then you can retrieve the matching rows with:
select p.matchString
from items i
join predefined p
on removeStrings(i.title) = p.matchString
Once you have those individual pieces working, I suggest a new question on what particular process you may be doing with them.
Warning: not knowing how many rows you have or how often you have to do this (every time a user saves something? once a day?), this will not exactly be zippy, if you know what I mean. So once you have these building blocks in hand, there may also be a follow-up question about how and when to do it.
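Since the logic is easier to prototype outside T-SQL first, here is a rough Python sketch of the same pipeline (the stopword list and the predefined match list are invented):

```python
# Invented sample data: a stopword list and a predefined match list.
STOPWORDS = {"in", "out", "where", "the", "a"}
PREDEFINED = {"red widget", "blue gadget"}

def normalize(title):
    # Drop stopwords and lowercase: the same job as the
    # removeStrings() SQL function sketched in the answer.
    words = [w for w in title.lower().split() if w not in STOPWORDS]
    return " ".join(words)

def matches(title):
    # A title matches when its filtered form equals a predefined entry.
    return normalize(title) in PREDEFINED

print(normalize("The Red Widget"))  # → red widget
print(matches("The Red Widget"))    # → True
```

Once the rules are pinned down this way, porting them into the SQL function plus the join above is mostly mechanical.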

Oracle extractValue failing when query returns many rows

I have the probably unenviable task of writing a data migration query to populate existing records with a value for a new column being added to a table on a production database. The table has somewhere in the order of 200,000 rows.
The source of the value is in some XML that is stored in the database. I have some XPath that will pull out the value I want, and using extractValue to get the value out seems all good, until the number of records being updated by the query starts getting larger than the few I had in my test DB.
Once the record set grows beyond some apparently random size, somewhere around 500 rows, I start getting the error "ORA-19025: EXTRACTVALUE returns value of only one node". Figuring there was some odd row in the database for which the XPath did not return a unique result, I tried to limit the records in the query to isolate the bad record. But as soon as I shrank the record set, the error disappeared. There is actually no row that returns more than one value for this XPath query; it looks like something fishy is happening with extractValue.
Does anyone know anything about this? I'm not an SQL expert, and even then have spent more time on SQL Server than Oracle. So if I can't get this to work I guess I'm left to do some messing with cursors or something.
Any advice/help? If it helps, we're running 10g on the server. Thanks!
It's almost certain that at least one row from the table with the source XML has an invalid value. Rewrite the query to try and find it.
For example, use an XPath predicate that would give you the node in the second position (I don't currently have access to a DB with XML objects, so I can't guarantee the following is syntactically perfect):
SELECT SourceKey
, EXTRACTVALUE(SourceXML, '/foo/bar/baz[1]')
, EXTRACTVALUE(SourceXML, '/foo/bar/baz[2]')
FROM SourceTable
WHERE EXTRACTVALUE(SourceXML, '/foo/bar/baz[2]') IS NOT NULL;