Optimization when merging from Oracle datalink - sql

I am trying to write an Oracle procedure to merge data from a remote datalink into a local table. Individually the pieces work quickly, but together they time out. Here is a simplified version of what I am trying.
What works:
Select distinct ProjectID from Project where LastUpdated < (sysdate - 6/24);
--Works in split second.
Merge into project
using (select /*+DRIVING_SITE(remoteCompData)*/
rp.projectID,
rp.otherdata
FROM Them.Remote_Data#DBLink rd
WHERE rd.projectID in (1,2,3)) sourceData -- hardcoded IDs
On (rd.projectID = project.projectID)
When matched...
-- Merge statement works quickly when the IDs are hard coded
What doesn't work: Combining the two statements above.
Merge into project
using (select /*+DRIVING_SITE(rd)*/ -- driving site helps when this piece is extracted from the larger statement
rp.projectID,
rp.otherdata
FROM Them.Remote_Data#DBLink rd
WHERE rd.projectID in --in statement that works quickly by itself.
(Select distinct ProjectID from Project where LastUpdated < (sysdate - 6/24))
-- This select in the in clause one returns 10 rows. Its a test database.
On (rd.projectID = project.projectID)
)
When matched...
-- When I run this statement in SQL Developer, this is all that I get without the data updating
Connecting to the database local.
Process exited.
Disconnecting from the database local.
I also tried pulling out the in statement into a with statement hoping it would execute differently, but it had no effect.
Any direction for paths to pursue would be appreciated.
Thanks.

The /*+DRIVING_SITE(rd)*/ hint doesn't work with MERGE because the operation must run in the database where the merged table sits. Which in this case is the local database. That means the whole result set from the remote table is pulled across the database link and then filtered against the data from the local table.
So, discard the hint. I also suggest you convert the IN clause into a join:
Merge into project p
using (select rp.projectID,
rp.otherdata
FROM Project ld
inner join Them.Remote_Data#DBLink rd
on rd.projectID = ld.projectID
where ld.LastUpdated < (sysdate - 6/24)) q
-- This select in the in clause one returns 10 rows. Its a test database.
On (q.projectID = p.projectID)
)
Please bear in mind that answers to performance tuning questions without sufficient detail are just guesses.

I found your question having same problem. Yes, the hint in query is ignored when the query is included into using clause of merge command.
In my case I created work table, say w_remote_data for your example, and splitted merge command into two commands: (1) fill the work table, (2) invoke merge command using work table.
The pitfall is, we cannot simply use neither of commands create w_remote_data as select /*+DRIVING_SITE(rd)*/ ... or insert into w_remote_data select /*+DRIVING_SITE(rd)*/ ... to fill the work table. Both of these commands are valid but they are slow - the hint does not apply too so we would not get rid of the problem. The solution is in PLSQL: collect result of query in using clause using intermediate collection. See example (for simplicity I assume w_remote_data has same structure as remote_data, otherwise we have to define custom record instead of %rowtype):
declare
type ct is table of w_remote_data%rowtype;
c ct;
i pls_integer;
begin
execute immediate 'truncate table w_remote_data';
select /*+DRIVING_SITE(rd)*/ *
bulk collect into c
from Them.Remote_Data#DBLink rd ...;
if c.count > 0 then
forall i in c.first..c.last
insert into w_remote_data values c(i);
end if;
merge into project p using (select * from w_remote_data) ...;
execute immediate 'truncate table w_remote_data';
end;
My case was ETL script where I could rely it won't run in parallel. Otherwise we would have to cope with temporary (session-private) tables, I didn't try if it works with them.

Related

Drop tables using table names from a SELECT statement, in SQL (Impala)?

How do I drop a few tables (e.g. 1 - 3) using the output of a SELECT statement for the table names? This is probably standard SQL, but specifically I'm using Apache Impala SQL accessed via Apache Zeppelin.
So I have a table called tables_to_drop with a single column called "table_name". This will have one to a few entries in it, each with the name of another temporary table that was generated as the result of other processes. As part of my cleanup I need to drop these temporary tables whose names are listed in the "tables_to_drop" table.
Conceptually I was thinking of an SQL command like:
DROP TABLE (SELECT table_name FROM tables_to_drop);
or:
WITH subquery1 AS (SELECT table_name FROM tables_to_drop) DROP TABLE * FROM subquery1;
Neither of these work (syntax errors). Any ideas please?
even in standard sql this is not possible to do it the way you showed.
in standard sql usually you can use dynamic sql which impala doesn't support.
however you can write an impala script and run it in impala shell but it's going to be complicated for such task, I would prepare the drop statement using select and run it manually if this is one-time thing:
select concat('DROP TABLE IF EXISTS ',table_name) dropstatements
from tables_to_drop

Query just runs, doesn't execute

my query just runs and doesnt execute, what is wrong. work on oracle sql developer, company server
CREATE TABLE voice2020 AS
SELECT
to_char(SDATE , 'YYYYMM') as month,
MSISDN,
SUM(CH_MONEY_SUBS_DED)/100 AS AIRTIME_VOICE,
SUM(CALLDURATION/60) AS MIN_USAGE,
sum(DUR_ONNET_OOB/60) as DUR_ONNET_OOB,
sum(DUR_ONNET_IB/60) as DUR_ONNET_IB,
sum(DUR_ONNET_FREE/60) as DUR_ONNET_FREE,
sum(DUR_OFFNET_OOB/60) as DUR_OFFNET_OOB,
sum(DUR_OFFNET_IB/60) as DUR_OFFNET_IB,
sum(DUR_OFFNET_FREE/60) as DUR_OFFNET_FREE,
SUM(case when sdate < to_date('20190301','YYYYMMDD')
then CH_MONEY_PAID_DED-nvl(CH_MONEY_SUBS_DED,0)-REV_VOICE_INT-REV_VOICE_ROAM_OUTGOING-REV_VOICE_ROAM_Incoming
else (CH_MONEY_OOB-REV_VOICE_INT-REV_VOICE_ROAM_OUTGOING-REV_VOICE_ROAM_Incoming) end)/100 AS VOICE_OOB_SPEND
FROM CCN.CCN_VOICE_MSISDN_MM#xdr1
where MSISDN IN ( SELECT MSISDN FROM saayma_a.BASE30112020) --change date
GROUP BY
MSISDN,
to_char(SDATE , 'YYYYMM')
;
This is a performance issue. Clearly the query driving your CREATE TABLE statement is taking too long to return a result set.
You are querying from a table in a remote database (CCN.CCN_VOICE_MSISDN_MM#xdr1) and then filtering against a local table (saayma_a.BASE30112020) . This means you are going to copy all of that remote table across the network, then discard the records which don't match the WHERE clause.
You know your data (or at least you should know it): does that sound efficient? If you're actually discarding most of the records you should try to filter CCN_VOICE_MSIDN_MM in the remote database.
If you need more advice you need to provide more information. Please read this post about asking Oracle tuning questions on this site, then edit your question to include some details.
You are executing CTAS (CREATE TABLE AS SELECT) and the purpose of this query is to create the table with data which is generated via this query.
If you want to just execute the query and see the data then remove first line of your query.
-- CREATE TABLE voice2020 AS
SELECT
.....
Also, the data of your actual query must be present in the voice2020 table if you have already executed it once.
Select * from voice2020;
Looks like you are trying to copying the data from one table to another table, Can you once create the table if it's not created and then try this statement.
insert into target_table select * from source_table;

INSERT FROM EXISTING SELECT without amending

With GDPR in the UK on the looming horizon and already have a team of 15 users creating spurious SELECT statements (in excess of 2,000) across 15 differing databases I need to be able to create a method to capture an already created SELECT statement and be able to assign surrogate keys/data WITHOUT rewriting every procedure we already have.
There will be a need to run the original team members script as normal and there will be requirements to pseudo the values.
My current thinking is to create a stored procedure along the lines of:
CREATE PROC Pseudo (#query NVARCHAR(MAX))
INSERT INTO #TEMP FROM #query
Do something with the data via a mapping table of real and surrogate/pseudo data.
UPDATE #TEMP
SET FNAME = (SELECT Pseudo_FNAME FROM PseudoTable PT WHERE #TEMP.FNAME = PT.FNAME)
SELECT * FROM #TEMP
So that team members can run their normal SELECT statements and get pseudo data simply by using:
EXEC Pseudo (SELECT FNAME FROM CUSTOMERS)
The problem I'm having is you can't use:
INSERT INTO #TEMP FROM #query
So I tried via CTE:
WITH TEMP AS (#query)
..but I can't use that either.
Surely there's a way of capturing the recordset from an existing select that I can pull into a table to amend it or capture the SELECT statement; without having to amend the original script. Please bear in mind that each SELECT statement will be unique so I can't write COLUMN or VALUES etc.
Does any anyone have any ideas or a working example(s) on how to best tackle this?
There are other lengthy methods I could externally do to carry this out but I'm trying to resolve this within SQL if possible.
So after a bit of deliberation I resolved it.
I passed the Original SELECT SQL to SP that used some SQL Injection, which when executed INSERTed data. I then Updated from that dataset.
The end result was "EXEC Pseudo(' Orginal SQL ;')
I will have to set some basic rules around certain columns for now as a short term fix..but at least users can create NonPseudo and Pseudo data as required without masses of reworking :)

Using table variables in Oracle Stored Procedure

I have lots of experience with T-SQL (MS SQL Server).
There it is quite common to first select some set of records into a
table variable or say temp table t, and then work with this t
throughout the whole SP body using it just like a regular table
(for JOINS, sub-queries, etc.).
Now I am trying the same thing in Oracle but it's a pain.
I get errors all the way and it keeps saying
that it does not recognize my table (i.e. my table variable).
Error(28,7): PL/SQL: SQL Statement ignored
Error(30,28): PL/SQL: ORA-00942: table or view does not exist
I start thinking what at all is possible to do with this
table variable and what not (in the SP body) ?
I have this declaration:
TYPE V_CAMPAIGN_TYPE IS TABLE OF V_CAMPAIGN%ROWTYPE;
tc V_CAMPAIGN_TYPE;
What on Earth can I do with this tc now in my SP?!
This is what I am trying to do in the body of the SP.
UPDATE ( SELECT t1.STATUS_ID, t2.CAMPAIGN_ID
FROM V_CAMPAIGN t1
INNER JOIN tc t2 ON t1.CAMPAIGN_ID = t2.CAMPAIGN_ID
) z
SET z.STATUS_ID = 4;
V_CAMPAIGN is a DB view, tc is my table variable
Presumably you are trying to update a subset of the V_CAMPAIGN records.
While in SQLServer it may be useful to define a 'temporary' table containing the subset and then operate on that it isn't necessary in Oracle.
Simply update the table with the where clause you would have used to define the temp table.
E.g.
UPDATE v_campaign z
SET z.status_id = 4
WHERE z.column_name = 'a value'
AND z.status <> 4
I assume that the technique you are familiar with is to minimise the effect of read locks that are taken while selecting the data.
Oracle uses a different locking strategy so the technique is mostly unnecessary.
Echoing a comment above - tell us what you want to achieve in Oracle and you will get suggestions for the best way forward.

Conditionally using a SQL Temp Table throws errors

I have a (not normalized) legacy SQL database and am working on a task to gather code from one of several similar tables. I want to execute a lot of identical code against the #Temp table, but the tables it draws from are very dissimilar (with some columns being the same).
My code is:
IF #Variable = 'X'
BEGIN
SELECT * INTO #Temp FROM TABLE1 WHERE Condition1 = #Condition
END
IF #Variable = 'Y'
BEGIN
SELECT * INTO #Temp FROM TABLE2 WHERE Condition1 = #Condition
END
At this point, I execute some common code. There is quite a lot and I want to just use #Temp, not have another IF condition with the code copied in multiple times. I cannot really declare the table ahead of time (it is very wide - and they are not the same) and I cannot really normalize the DB (the legacy system is far to 'mature' and my time frame is far to small). Also, at the end of the query, the #Temp table is used for creating new rows back in the original table (so again, I cannot just declare the common parts).
At this point, I cannot make my stored proc because
There is already an object named '#Temp' in the database.
This error highlights the 2nd IF block. Adding a DROP TABLE #Temp in the IF block does not help either. So I'm having to offload the work in additional SPROCs or repeat the code in conditional statements. For readability, I don't like either of these options.
Any way to use #Temp within multiple IF blocks as above ( I really have more IF conditions, only 2 shown to give an idea of the issue).
Example SqlFiddle