Oracle XML functions with large datasets - sql

I have problems using the Oracle XML functions xmlelement, xmlagg and xmlattributes.
For instance:
select
  XMLELEMENT(
    "OrdrList",
    XMLAGG(
      XMLELEMENT(
        "IDs",
        XMLATTRIBUTES(
          USERCODE AS "usrCode",
          VALDATE  AS "validityDate"
        )
      )
    )
  )
from TMP
/
The code seems to be correct, as it does work when returning a small number of messages.
And yes, I did try to set "long", "pagesize", "linesize" etc., but I have never been able to retrieve the full set of approx. 500,000 XML messages (i.e. table rows).
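For what it's worth, the SQL*Plus settings I experimented with were along these lines (a sketch only; the values and the spool file name are illustrative, not a recommendation):

-- Illustrative SQL*Plus settings for spooling large CLOB/XMLType output
-- (SET LONG controls how many bytes of a LONG/CLOB value are displayed)
SET LONG 2000000000
SET LONGCHUNKSIZE 32767
SET PAGESIZE 0
SET LINESIZE 32767
SET TRIMSPOOL ON
SPOOL orders.xml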
Reading some background literature (e.g. "Oracle SQL" by Jürgen Sieben), it seems that these functions are not designed for large data sets. Mr. Sieben explains that he uses them only for small queries (max. 1 MB output size); above that he recommends using "object-oriented functions", but does not explain which.
Does anybody have experience with this, have the above XML functions working on large data sets, or know of alternatives?
As per the advice below: converting to CLOB through [...].getclobval(0, 2) from TMP now iterates through the whole table. Slow, but complete.
I have to make a correction: getclobval delivers a longer, but still incomplete, list.
As my confidence in the implementation/documentation quality of the above Oracle XML functions is weak, I will create a standard file output from the database and implement the XML conversion myself.
New update: I found the culprit: XMLAGG! If I take it out, the database is parsed quickly, properly, stepwise and completely. Strange, since XMLAGG does not really have a complicated job: creating an opening and a closing XML tag.
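For reference, the CLOB variant I tried had roughly this shape (a sketch using the parameterless getclobval(); my actual call passed the arguments mentioned above):

-- Sketch: serialise the aggregated XMLType result as a CLOB
select
  XMLELEMENT(
    "OrdrList",
    XMLAGG(
      XMLELEMENT(
        "IDs",
        XMLATTRIBUTES(
          USERCODE AS "usrCode",
          VALDATE  AS "validityDate"
        )
      )
    )
  ).getclobval()
from TMP
/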

I think getting this data displayed completely in sqlplus + spool is going to be a struggle.
I have used these functions for > 100 MB of data without problems, but I have written the returned XMLType out to files after converting to CLOB, using either UTL_FILE on the server side or client apps in Java/C#.
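A rough server-side sketch of that approach (the XML_DIR directory object, the file name and the column names are placeholders here, and it assumes the whole document fits in one CLOB):

declare
  l_xml   clob;
  l_file  utl_file.file_type;
  l_pos   pls_integer := 1;
  l_chunk pls_integer := 32000;  -- stay below the 32767 line-size limit
begin
  -- Build the document as a CLOB (same query shape as in the question).
  select XMLELEMENT("OrdrList",
           XMLAGG(XMLELEMENT("IDs",
             XMLATTRIBUTES(USERCODE AS "usrCode", VALDATE AS "validityDate")))).getclobval()
    into l_xml
    from TMP;

  -- Write the CLOB out in chunks via UTL_FILE.
  l_file := utl_file.fopen('XML_DIR', 'orders.xml', 'w', 32767);
  while l_pos <= dbms_lob.getlength(l_xml) loop
    utl_file.put(l_file, dbms_lob.substr(l_xml, l_chunk, l_pos));
    utl_file.fflush(l_file);
    l_pos := l_pos + l_chunk;
  end loop;
  utl_file.fclose(l_file);
end;
/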
If you are stuck with sqlplus, have you tried it with "SET TERM OFF" and spool? It might give better results, and it would certainly be quicker. Note that to use SET TERM OFF you have to be careful how you invoke sqlplus; sqlplus @script will work, but "cat <

Related

Creating XML Output with Carriage Returns from SQL Server 2014

I've written code in SQL Server to create an XML output. However, this exports with no carriage returns.
I initially built a workaround with a replace statement around the entire XML output code that would embed carriage returns between the nodes, but because that only allows me to export a small amount of data at a time, it's not sufficient long-term. When I try to run this on larger datasets, it truncates the text around 65000 characters.
I've tried to cast the entire statement as nvarchar(max) to increase the output size but that doesn't seem to work either. Does anybody have any recommendations for how to do this that isn't just find+replace once the file has already been output from SQL?
First, I would educate the client. I would imagine the requirement is there to make the output human readable, but it also expands the size of the returned set. They will likely stick to their guns, but education often stops people from spending money on stupid crap.
Second, I would not do this in SQL Server. This is a user-interface type of task (including service endpoints as a "user" interface here) and not a task to be done in the database. Doing it outside of SQL Server gives you better access to the XML DOM, which can help if the line breaks are truly CRLF and not the &#__; numeric equivalents. If the latter, you will have to use a replace function.
If you HAVE to do this in SQL Server, grab the XML result and then replace. I would do this the easy way and replace > with >CRLF and see if that is acceptable, as it is less time consuming. Without the DOM it is difficult to know the difference between open tags and end tags. You can find the right tag using regex, if you want to go that far, but SQL Server's implementation is not as good as many programming languages, so this will be time consuming.
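If you do go the replace route, a minimal sketch would look something like this (table and column names are made up; the cast keeps the value as NVARCHAR(MAX) through the REPLACE):

-- Sketch: build the XML, cast it to NVARCHAR(MAX), then add CRLF after every '>'
DECLARE @xml NVARCHAR(MAX);

SET @xml = CAST(
    (SELECT OrderId, CustomerName
     FROM dbo.Orders
     FOR XML PATH('Order'), ROOT('Orders'))
    AS NVARCHAR(MAX));

SET @xml = REPLACE(@xml, '>', '>' + CHAR(13) + CHAR(10));

SELECT @xml AS XmlWithLineBreaks;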
Ultimately, if they are willing to pay you for something that does not make a difference, then that is their baby, but it is a useless exercise IMO.

cursor_sharing parameter in Oracle

I would like to know the tradeoffs of setting the cursor_sharing parameter in Oracle to "FORCE".
Since this would let almost any SQL statement be soft-parsed, performance would surely seem to improve.
But the default value is "EXACT", so I would like to know if there is any danger in setting it to FORCE or SIMILAR.
Unless you really know what you're doing, I'd recommend not changing this setting.
Usually, if you've got a high number of hard parses, it's an indication of bad application design.
A typical example for selecting all products for a given category (pseudocode):
stmt = 'select * from products where category = ' || my_category
results = stmt.execute
This is flawed for a number of reasons:
it creates a different SQL statement for each category, therefore increasing the number of hard parses dramatically
it is vulnerable to SQL injection attacks
A good application runs perfectly OK with cursor_sharing = exact. A good application can use literals for specific reasons, for example selecting orders with state = new; that use of literals is OK. If the application uses literals to identify an order by ID it would be different, since there will be many different order IDs.
Best is to clean up the app so it uses literals in the correct way, or to start using prepared statements, for the best performance.
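To illustrate the prepared-statement point, the pseudocode above could be rewritten with a bind variable so the same cursor is reused for every category (a PL/SQL sketch; the products table comes from the pseudocode and 'BOOKS' is just a made-up value):

declare
  my_category products.category%type := 'BOOKS';  -- made-up example value
  l_count     pls_integer;
begin
  -- Same statement text for every category, with the value passed as a bind:
  -- Oracle can reuse one shared cursor instead of hard parsing each variant.
  execute immediate
    'select count(*) from products where category = :cat'
    into l_count
    using my_category;
end;
/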
If you happen to have an application that only uses literals, set cursor_sharing to FORCE. In 11g there are mechanisms, like cardinality feedback, that can adjust an execution plan when a query returns unexpected row counts, so that the plan originally chosen for the query is corrected, based on the actual input and output, for the next time it is used.

Why do SQL errors not show you the error source?

Is it possible to find the line or column where an error is occurring when executing SQL code in Oracle SQL developer?
For example, imagine you are running a very simple line of code
SELECT * FROM employeesTbl WHERE active = 1
But for some reason, active is a VARCHAR column and someone has entered ";!/asd02" into this field.
You will only get an ORA- error, but it does not tell you which row caused it.
Does anyone know why this is?
The reason behind this is that, in general, developer support in SQL, PL/SQL and the like is really abysmal. One result is a really broken exception concept in PL/SQL, almost useless exceptions in (Oracle) SQL, and little hope that it is better in any other RDBMS.
I think the reason behind all that is that databases are persistent beasts (pun intended). Many companies and developers change their preferred main development language from time to time (C, C++, VB, Java, C#, Groovy, Scala ...). But they rarely change the database, possibly because you will still have the old databases around with no chance to migrate them.
This in turn means most DB devs know only a single database system reasonably well, so they don't see what is possible in other systems. Therefore there is little to no pressure to make database systems any more usable for developers.
Multiple rows may contain errors. For the system to be consistent (as a "set-based" language), it ought to return you all rows which contain errors - and not all row errors may be caused by the same error.
However, it could be computationally expensive to compute this entire error set - and the system "knows" that any further computation on this query is going to result in failure anyway - so it represents wasted resources when other queries could be running successfully.
I agree that it would be nice to turn on this type of reporting as an option (especially in non-production environments), but no database vendor seems to have done so.
You get an error because the field is a character field and you're assuming it's a number, which you shouldn't be doing. If you want the field to be numeric then you have to have a numeric field! As a general rule, all non-character data should be stored in columns of the correct data type to avoid this kind of problem.
I'm not certain why Oracle doesn't tell you which row caused the error; it may be physically possible using the rowid in a simple select such as the one you have here. If you're joining tables or using conversion functions such as to_number it would become a lot more difficult, if possible at all.
I would imagine that Oracle did not want to implement something only partially, especially when this is not an Oracle error but a coding error.
To sort out the problem create the following function:
create or replace function is_number( Pvalue varchar2
) return number is
  /* Test whether a value is a number. Return a number
     rather than a plain boolean so it can be used in
     SQL statements as well as PL/SQL.
  */
  l_number number;
begin
  -- Explicitly convert.
  l_number := to_number(Pvalue);
  return 1;
exception when others then
  return 0;
end;
/
Run the following to find your problem rows:
SELECT * FROM employeesTbl WHERE is_number(active) = 0
Or this to ignore them:
SELECT *
FROM ( SELECT *
       FROM employeesTbl
       WHERE is_number(active) = 1 )
WHERE active = 1

Performance of SQL functions vs. code functions

We're currently investigating the load against our SQL server and looking at ways to alleviate it. During my post-secondary education, I was always told that, from a performance standpoint, it was cheaper to make SQL Server do the work. But is this true?
Here's an example:
SELECT ord_no FROM oelinhst_sql
This returns 783119 records in 14 seconds. The field is a char(8), but all of our order numbers are six digits long, so each has two leading blank characters. We typically trim this field, so I ran the following test:
SELECT LTRIM(ord_no) FROM oelinhst_sql
This returned the 783119 records in 13 seconds. I also tried one more test:
SELECT LTRIM(RTRIM(ord_no)) FROM oelinhst_sql
There is nothing to trim on the right; I was trying to see if there was any overhead in the mere act of calling the function, but it still returned in 13 seconds.
My manager was talking about moving things like string trimming out of the SQL and into the source code, but the test results suggest otherwise. My manager also says he heard somewhere that using SQL functions meant that indexes would not be used. Is there any truth to this either?
Only optimize code that you have proven to be the slowest part of your system. Your data so far indicates that the SQL string manipulation functions are not affecting performance at all. Take this data to your manager.
If you use a function or type cast in the WHERE clause it can often prevent the SQL server from using indexes. This does not apply to transforming returned columns with functions.
It's typically user defined functions (UDFs) that get a bad rap with regards to SQL performance and might be the source of the advice you're getting.
The reason for this is you can build some pretty hairy functions that cause massive overhead with exponential effect.
As you've found with rtrim and ltrim, this isn't a blanket reason to stop using all functions on the SQL side.
It somewhat depends on what all is encompassed by "things like string trimming", but, for string trimming at least, I'd definitely let the database do it (there will be less network traffic as well). As for the indexes, they will still be used if your WHERE clause uses just the column itself (as opposed to a function of the column). Use of the indexes won't be affected whatsoever by using functions on the columns you're retrieving (only by how you're selecting the rows).
You may want to have a look at this for performance improvement suggestions: http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
As I said in my comment, reduce the data read per query and you will get a speed increase.
You said: "our order numbers are six-digits long so each has two blank characters leading"
That makes me think you are storing numbers in a string; if so, why are you not using a numeric data type? The smallest numeric type which will hold 6 digits is an INT (I'm assuming SQL Server), and that already saves you 4 bytes per order number. Over the number of rows you mention, that's quite a lot less data to read off disk and send over the network.
Fully optimise your database before looking to deal with the data outside of it; it's what a database server is designed to do, serve data.
As you found, it often pays to measure, but what I think your manager may have been referring to is something like this.
This is typically much faster
SELECT SomeFields FROM oelinhst_sql
WHERE
    datetimeField > '1/1/2011'
    and
    datetimeField < '2/1/2011'
than this
SELECT SomeFields FROM oelinhst_sql
WHERE
    Month(datetimeField) = 1
    and
    year(datetimeField) = 2011
even though the rows that are returned are the same

SQL Concatenation filling up tempDB

We are attempting to concatenate possibly thousands of rows of text in SQL with a single query. The query that we currently have looks like this:
DECLARE @concatText NVARCHAR(MAX)
SET @concatText = ''
UPDATE TOP (SELECT MAX(PageNumber) + 1 FROM #OrderedPages) [#OrderedPages]
SET @concatText = @concatText + [ColumnText] + '
'
WHERE (RTRIM(LTRIM([ColumnText])) != '')
This is working perfectly fine from a functional standpoint. The only issue we're having is that sometimes the ColumnText can be a few kilobytes in length. As a result, we're filling up tempDB when we have thousands of these rows.
The best reason that we have come up with is that as we're doing these updates to @concatText, SQL is using implicit transactions so the strings are effectively immutable.
We are trying to figure out a good way of solving this problem and so far we have two possible solutions:
1) Do the concatenation in .NET. This is an OK option, but that's a lot of data that may go back across the wire.
2) Use .WRITE which operates in a similar fashion to .NET's String.Join method. I can't figure out the syntax for this as BoL doesn't cover this level of SQL shenanigans.
This leads me to the question: Will .WRITE work? If so, what's the syntax? If not, are there any other ways to do this without sending data to .NET? We can't use FOR XML because our text may contain illegal XML characters.
Thanks in advance.
I'd look at using CLR integration, as suggested in @Martin's comment. A CLR aggregate function might be just the ticket.
What exactly is filling up tempdb? It cannot be @concatText = @concatText + [ColumnText]; there is no immutability involved, and the @concatText variable will be at worst 2 GB in size (I expect your tempdb is much larger than that; if not, increase it). It seems more likely that your query plan creates a spool for Halloween protection and that spool is the culprit.
As a generic answer, using the UPDATE ... SET @var = @var + ... for concatenation is known to have correctness issues and is not supported. Alternative approaches that work more reliably are discussed in Concatenating Row Values in Transact-SQL.
First, from your post, it isn't clear whether or why you need temp tables. Concatenation can be done inline in a query. If you show us more about the query that is filling up tempdb, we might be able to help you rewrite it. Second, an option that hasn't been mentioned is to do the string manipulation outside of T-SQL entirely. I.e., in your middle-tier query for the raw data, do the manipulation and push it back to the database. Lastly, you can use Xml such that the results handle escapes and entities properly. Again, we'd need to know more about what and how you are trying to accomplish.
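As a sketch of that XML route, using the table and column names from the question: the TYPE directive keeps the intermediate result as XML, and .value() converts it back to text with entities decoded, so characters like & and < in the data survive the round trip.

-- Sketch: concatenate ColumnText across rows without the variable-update trick
DECLARE @concatText NVARCHAR(MAX);

SELECT @concatText =
    (SELECT [ColumnText] + NCHAR(13) + NCHAR(10)
     FROM   #OrderedPages
     WHERE  RTRIM(LTRIM([ColumnText])) != ''
     ORDER  BY PageNumber
     FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)');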
Agreed. A CLR user-defined function would be the best approach for what you guys are doing. You could read the text values into an object, join them all together (inside the CLR), and have the function return an NVARCHAR(MAX) result. If you need details on how to do this, let me know.