Replacing a poorly performing cursor in SQL

I've written a SQL statement using a cursor, but unfortunately the performance is pretty poor.
The data set I'm running it on has about 8 million records.
The cursor iterates over a table of regular expressions (around 100) and tries each one to extract parts of the column data itself.
FOR cur_row AS r_cursor(x)
DO
    SELECT SUBSTR_REGEXPR(cur_row.REGEX_PATTERN IN :im_str GROUP cur_row.REGEX_GROUP) INTO col_part FROM DUMMY;
    IF :col_part IS NOT NULL THEN
        col_part = LTRIM(RTRIM(:col_part)); -- some more assignments go here
        BREAK;
    ELSE
        col_part = :im_str;
    END IF;
END FOR;
Unfortunately, for those 8 million records this takes over 40 minutes.
Does anyone have any idea how I can rewrite it? (I'm using SAP HANA.)

Instead of running the regex for each row of the cursor, start with a smaller set of data if possible. Make your data set as small as possible using an initial filter.
I don't know SAP HANA syntax, but in pseudocode:
select records into temptable where field contains search criteria
for each row in temptable
do something
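
The same idea can be sketched outside the database. This is only an illustration in Python, not SAP HANA code: the two patterns, the substrings used as a pre-filter, and the helper names `extract_part` and `process` are all made up for the example. A cheap substring test decides which rows reach the regexes at all, and the first matching pattern wins, as in the cursor loop from the question.

```python
import re

# Hypothetical (pattern, group) pairs standing in for the regex table.
PATTERNS = [
    (re.compile(r"ID[:=]\s*(\d+)"), 1),
    (re.compile(r"REF-(\w+)"), 1),
]

# Hypothetical cheap substrings: a row without any of them cannot match.
NEEDLES = ("ID", "REF-")


def extract_part(text):
    """Return the first matching group, trimmed, or the input unchanged."""
    for pattern, group in PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(group).strip()
    return text


def process(rows):
    """Run the regexes only on rows that pass the substring pre-filter."""
    return [
        extract_part(row) if any(n in row for n in NEEDLES) else row
        for row in rows
    ]


result = process(["ID: 42 foo", "plain text", "x REF-abc y"])
print(result)
```

Whether a pre-filter pays off depends on how many rows it excludes; if nearly every row can match some pattern, it saves little.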

Related

How to use a table's content for querying other tables in BigQuery

My team and I are using a query on a daily basis to receive specific results from a large dataset. This query is constantly updated with different terms that I would like to receive from the dataset.
To make this job more scalable, I built a table of arrays, each containing the terms and conditions for the query. That way the query can lean on the table, and changes that I make in the table will affect the query without the need to change it.
The thing is, I can't seem to find a way to reference the table in the actual query without selecting it. I want to use the content of the table as a WHERE condition. For example:
table1:
terms
[term1, term2, term3]
query:
select * from dataset
where dataset.column like '%term1'
or dataset.column like '%term2'
or dataset.column like '%term3'
etc.
If you have any ideas please let me know (if the solution involves Python or JS this is also great)
thanks!
You can "build" the syntax you want using Procedural Language in BigQuery and then execute it. Here is a way of doing it without "leaving" BQ (meaning, without using external code):
BEGIN
  DECLARE statement STRING DEFAULT 'SELECT col FROM dataset.table WHERE';
  FOR record IN (SELECT * FROM UNNEST(['term1','term2','term3']) as term)
  DO
    SET statement = CONCAT(statement, ' col LIKE "', '%', record.term, '" OR');
  END FOR;
  SET statement = CONCAT(statement, ' 1=2');
  EXECUTE IMMEDIATE statement;
END;
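
Since the question also allows Python, the same statement can be assembled on the client side. This is a minimal sketch that assumes the terms have already been read from the table into a list; `build_query` is a hypothetical helper, and the table and column names are the placeholders from the answer above. It only builds the SQL text and does not call any BigQuery API.

```python
def build_query(terms, table="dataset.table", column="col"):
    """Return a SELECT whose WHERE clause ORs one LIKE condition per term."""
    for term in terms:
        # Refuse characters that could break out of the string literal.
        if any(ch in term for ch in "'\"\\"):
            raise ValueError(f"unsafe term: {term!r}")
    conditions = [f"{column} LIKE '%{term}'" for term in terms]
    # With no terms, fall back to a condition that matches nothing,
    # like the trailing 1=2 in the scripted version.
    where = " OR ".join(conditions) if conditions else "1=2"
    return f"SELECT {column} FROM {table} WHERE {where}"


print(build_query(["term1", "term2", "term3"]))
```

Concatenating values into SQL text is fragile, so the sketch rejects terms containing quotes or backslashes instead of trying to escape them.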

FOR loop in Oracle SQL or Apply SQL to multiple Oracle tables

My SQL is a bit rusty, so I don't know whether the following is even possible:
I have multiple tables t_a, t_b, t_c with the same column layout and I want to apply the same operation to them, namely output some aggregation into another table. For a table t_x this would look like this:
CREATE TABLE t_x_aggregate (
<here the col definitions which are the same for all new tables t_[abc]_aggregate>
);
INSERT INTO t_x_aggregate(id, ...)
SELECT id, SUM(factor*amount)
FROM t_x
WHERE some fixed condition
GROUP BY id;
I now want to execute something like a FOR loop around this:
for t_x in t_a, t_b, t_c
CREATE TABLE ...
INSERT INTO ...
end for
Is this possible in SQL? Or would I need to build a wrapper in another language for this?
So, the result of that operation would be 3 new tables? T_A_AGGREGATE, T_B_AGGREGATE and T_C_AGGREGATE?
I think that the fastest way is to write 3 separate CREATE TABLE statements, e.g.
create table t_a_aggregate as
select id, sum(factor * amount) suma
from t_a
where some_condition
group by id;
create table t_b_aggregate as
select id, sum(factor * amount) suma
from t_b
where some_condition
group by id;
create table t_c_aggregate as
select id, sum(factor * amount) suma
from t_c
where some_condition
group by id;
OK; I understand that queries aren't that simple, but nothing much changes - only table names in CREATE and FROM (perhaps somewhere else, but that's more or less "it"). Any decent text editor's search/replace capabilities should be able to do it quickly.
If you want to do it dynamically in a loop (read: PL/SQL), you can - but dynamic SQL doesn't scale, is difficult to maintain, is painful to debug. Therefore, if you're doing it only once, consider running 3 separate statements.
How to do it dynamically?
You'd have to create a string (we usually put it into a locally declared variable) which contains the whole DDL statement. Why? Because you can't execute DDL from PL/SQL otherwise.
If there are multiple tables and/or columns involved, you'll have to combine "fixed" parts of the statement (like create table, select, from, order by) concatenated with "dynamic" parts - such as column names. Note that in between you have to concatenate commas as separators. Pay attention to usage of multiple single quotes as you have to escape them (or use the q-quoting mechanism).
Also, for multiple columns you'll probably have to do it in a loop, concatenating each new column to previously composed string.
It (the statement stored in the variable) is then executed by EXECUTE IMMEDIATE. If it is correctly written, it'll succeed. Otherwise, it'll fail, but it won't tell you why it failed (that's why I said "difficult debugging").
So, instead of executing it, we usually display that string first (using dbms_output.put_line) to see what it looks like and, using copy/paste, try to execute it.
Basically, it can be quite complex and - as I said - difficult to maintain and debug.
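
If a wrapper in another language is acceptable, as the question suggests, the statements can be generated as plain text and reviewed before they are run, which is the scripted form of the search/replace approach. A minimal sketch in Python: `make_statements` is a hypothetical helper, and `some_condition` is the placeholder from the statements above, not a real predicate.

```python
TEMPLATE = (
    "create table t_{x}_aggregate as\n"
    "select id, sum(factor * amount) suma\n"
    "from t_{x}\n"
    "where some_condition\n"
    "group by id;"
)


def make_statements(suffixes):
    """Return one CREATE TABLE ... AS SELECT statement per table suffix."""
    for suffix in suffixes:
        # Only plain alphanumeric suffixes may end up in a table name.
        if not suffix.isalnum():
            raise ValueError(f"unexpected table suffix: {suffix!r}")
    return [TEMPLATE.format(x=suffix) for suffix in suffixes]


statements = make_statements(["a", "b", "c"])
print("\n\n".join(statements))
```

The printed script can then be pasted into any SQL client, so the generated text is inspected before anything is executed.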
For the FOR loop you need to use PL/SQL like this:(*)
declare
  type array_t is table of varchar2(10);
  array array_t := array_t('a', 'b', 'c');
  lo_stmt varchar2(2000);
begin
  for i in 1 .. array.count loop
    -- One statement per EXECUTE IMMEDIATE, without a trailing semicolon in the string.
    lo_stmt :=
      'CREATE TABLE t_'||array(i)||'_aggregate ('||
      ' <here the col definitions which are the same for all new tables t_[abc]_aggregate>'||
      ')';
    execute immediate lo_stmt;
    -- Leading spaces keep the concatenated keywords apart.
    lo_stmt :=
      'INSERT INTO t_'||array(i)||'_aggregate(id, ...)'||
      ' SELECT id, SUM(factor*amount)'||
      ' FROM t_'||array(i)||
      ' WHERE some fixed condition'||
      ' GROUP BY id';
    execute immediate lo_stmt;
  end loop;
end;
/
Look also at this SO question: How to use Oracle PL/SQL to create...
(*) @Littlefoot describes valuable background to this program in the second part of his answer.

Optimization when merging from Oracle datalink

I am trying to write an Oracle procedure to merge data from a remote datalink into a local table. Individually the pieces work quickly, but together they time out. Here is a simplified version of what I am trying.
What works:
Select distinct ProjectID from Project where LastUpdated < (sysdate - 6/24);
--Works in split second.
Merge into project
using (select /*+DRIVING_SITE(rd)*/
              rd.projectID,
              rd.otherdata
       FROM Them.Remote_Data@DBLink rd
       WHERE rd.projectID in (1,2,3)) sourceData -- hardcoded IDs
On (sourceData.projectID = project.projectID)
When matched...
-- Merge statement works quickly when the IDs are hard coded
What doesn't work: Combining the two statements above.
Merge into project
using (select /*+DRIVING_SITE(rd)*/ -- driving site helps when this piece is extracted from the larger statement
              rd.projectID,
              rd.otherdata
       FROM Them.Remote_Data@DBLink rd
       WHERE rd.projectID in -- IN subquery that works quickly by itself
             (Select distinct ProjectID from Project where LastUpdated < (sysdate - 6/24))
             -- This subquery returns 10 rows; it's a test database.
      ) sourceData
On (sourceData.projectID = project.projectID)
When matched...
-- When I run this statement in SQL Developer, this is all that I get without the data updating
Connecting to the database local.
Process exited.
Disconnecting from the database local.
I also tried pulling out the in statement into a with statement hoping it would execute differently, but it had no effect.
Any direction for paths to pursue would be appreciated.
Thanks.
The /*+DRIVING_SITE(rd)*/ hint doesn't work with MERGE because the operation must run in the database where the merged table sits, which in this case is the local database. That means the whole result set from the remote table is pulled across the database link and then filtered against the data from the local table.
So, discard the hint. I also suggest you convert the IN clause into a join:
Merge into project p
using (select rd.projectID,
              rd.otherdata
       FROM Project ld
       inner join Them.Remote_Data@DBLink rd
          on rd.projectID = ld.projectID
       where ld.LastUpdated < (sysdate - 6/24)) q
On (q.projectID = p.projectID)
When matched...
Please bear in mind that answers to performance tuning questions without sufficient detail are just guesses.
I found your question while having the same problem. Yes, the hint in the query is ignored when the query is placed in the USING clause of a MERGE command.
In my case I created a work table, say w_remote_data for your example, and split the merge into two commands: (1) fill the work table, (2) invoke the merge using the work table.
The pitfall is that we cannot simply fill the work table with either create w_remote_data as select /*+DRIVING_SITE(rd)*/ ... or insert into w_remote_data select /*+DRIVING_SITE(rd)*/ .... Both commands are valid but slow: the hint does not apply to them either, so the problem remains. The solution is in PL/SQL: collect the result of the query from the USING clause into an intermediate collection. See the example (for simplicity I assume w_remote_data has the same structure as remote_data; otherwise we have to define a custom record instead of %rowtype):
declare
  type ct is table of w_remote_data%rowtype;
  c ct;
  i pls_integer;
begin
  execute immediate 'truncate table w_remote_data';
  select /*+DRIVING_SITE(rd)*/ *
    bulk collect into c
    from Them.Remote_Data@DBLink rd ...;
  if c.count > 0 then
    forall i in c.first..c.last
      insert into w_remote_data values c(i);
  end if;
  merge into project p using (select * from w_remote_data) ...;
  execute immediate 'truncate table w_remote_data';
end;
My case was an ETL script that I could rely on not running in parallel. Otherwise we would have to cope with temporary (session-private) tables; I didn't try whether it works with them.

SQL Server Stored Procedure - Use Row Count in Select query

My stored procedure (SQL Server 2005) returns a dataset where one field depends, among other things, on the number of rows returned by the query. I can make a simplified first query that allows me to get @@ROWCOUNT but, in that case, the procedure returns the two sets, which is not what I want.
I tried putting the first query in a WITH statement but haven't found the syntax to extract the row count and put it in a variable that I could use in the second query. An alternative would be to get @@ROWCOUNT from the first query and tell the procedure to return only the result of the second query.
There are probably better ways to do that but my expertise in SQL is quite limited...
Thanks for any help!
Is this what you're looking for? If not, could you please describe your problem in more detail (perhaps with code snippets)?
alter procedure ComplicatedStoredProcedure as
begin
    declare @lastQueryRowCount int
    -- Storing the number of rows returned by the first query into a variable.
    select @lastQueryRowCount =
        -- First resultset (not seen by caller).
        (select count(*) from A where ID > 100)
    -- Second resultset. This will be the actual result returned from the SP.
    select * from B where SomeDependentField > @lastQueryRowCount
end
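
The pattern in this answer (compute the count as a scalar, then return only the result set that depends on it) can be tried without a SQL Server instance. The sketch below uses SQLite through Python's sqlite3 module, so the syntax differs from T-SQL; the table names A and B follow the answer, and the rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (ID INTEGER);
    CREATE TABLE B (SomeDependentField INTEGER);
    INSERT INTO A VALUES (50), (101), (102), (200);
    INSERT INTO B VALUES (1), (3), (4), (10);
    """
)

# The count is evaluated as a scalar subquery, so the caller only ever
# sees one result set: the rows of B that depend on it.
rows = conn.execute(
    """
    SELECT SomeDependentField
      FROM B
     WHERE SomeDependentField > (SELECT COUNT(*) FROM A WHERE ID > 100)
     ORDER BY SomeDependentField
    """
).fetchall()
print(rows)
conn.close()
```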

Oracle SQL "meta" query for records which have specific column values

I'd like to get all the records from a huge table where any of the number columns contains a value greater than 0. What's the best way to do it?
E.g.:
/* table structure*/
create table sometable (id number,
somestring varchar2(12),
some_amount_1 number(17,3),
some_amount_2 number(17,3),
some_amount_3 number(17,3),
...
some_amount_xxx number(17,3));
/* "xxx" > 100, and yeah, I did not design that table structure... */
And I want any row where any of the some_amount_n > 0 (an even better solution would add a field in the first place to show which field(s) are greater than zero).
I know I can write this with a huge some_amount_1 > 0 OR some_amount_2 > 0 OR ... block (and get the field names with some CASE WHEN), but there should be a more elegant solution, shouldn't there?
Possible solutions:
Normalize the table. You said you are not allowed to. Try to convince those that forbid such a change by explaining the benefits (performance, ease of writing queries, etc).
Write the huge ugly OR query. You could also print it along with the version of the query for the normalized tables. Add performance tests (you are allowed to create another test table or database, I hope.)
Write a program (either in PL/SQL or in another procedural language) that produces the horrible OR query. (Again, print along with the elegant version)
Add a new column, say called Any_x_bigger_than_zero, which is automatically filled with either 0 or 1 via a trigger (that uses a huge ugly OR). Then you just need to check WHERE Any_x_bigger_than_zero = 1 to see whether any of the columns in a row is > 0.
Similar to previous but even better, create a materialized view with such a column.
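
The option of having a program produce the OR query could look roughly like this in Python. It is a sketch under assumptions: `build_any_positive_query` is a hypothetical helper, the column prefix comes from the question, and the column count passed in stands for whatever "xxx" really is.

```python
def build_any_positive_query(table, n_columns, prefix="some_amount_"):
    """Return a SELECT matching rows where any numbered column is > 0."""
    columns = [f"{prefix}{i}" for i in range(1, n_columns + 1)]
    where = "\n   OR ".join(f"{column} > 0" for column in columns)
    return f"SELECT *\n  FROM {table}\n WHERE {where}"


query = build_any_positive_query("sometable", 3)
print(query)
```

The generated text is ordinary SQL, so it can be saved and reviewed like a hand-written query.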
First, create a table to store the data in a form that is easier to read from, something simple like id, column_name, column_value. You'll have to bear with me; it's been a while since I've worked in Oracle, so this is heavy pseudocode at best:
A quick note on dynamic SQL: you can set a variable to a SQL statement and then execute that variable. There are some security risks, and it's possible this feature is disabled in your environment, so confirm you can run it first. Declare a variable, set it to 'select 1 from dual' and then use 'execute immediate' to execute the SQL stored in your variable.
var := 'select id, ''some_amount_' || 1 || ''', some_amount_' || 1 || ' from table where some_amount_' || 1 || ' <> 0';
Assuming I've got my Oracle syntax right (|| is the concatenation operator, and two consecutive single quotes inside a string literal produce one literal quote), the variable should end up containing the following; you may have to use trial and error on this line until it does:
select id, 'some_amount_1',some_amount_1
from table
where some_amount_1 <> 0
This should select the ID and the value in some_amount_1 for each id in your database. You can turn this into an insert statement pretty easily.
I'm assuming some_amount_xxx has an upper limit...next trick is to loop this giant statement. Once again, horrible pseudo code:
declare sql_string
declare i and set to 1
for i = 1 to xxx (whatever your xxx is)
set sql_string to the first set var statement we made, replacing the '1' with the i var here.
execute sql
increment i
loop
Hopefully this makes sense. It's one of the very few scenarios where you would ever want to loop over dynamic SQL. Now you have a relatively straightforward table to read from, and the final query should be relatively easy from here.