SQL Performance on joining 2 scripts ORACLE - sql

I have 2 fairly complicated scripts: one I wrote myself, the other was written 10 years ago. The first script gets the necessary IDs and executes in around 30 sec, for example:
| ID | some other info ...
+----+--------------------
| 1 | ...
| 2 | ...
| 3 | ...
| 4 | ...
The second script gets some more complicated data, which is calculated through many subqueries, and also executes in around 30 sec, for example:
| ID | Computed Info
+----+--------------------
| 1 | 111
| 2 | 222
| 3 | 333
| 4 | 444
Now script1 needs to include some partial results from script2. Because script2 is very complicated, it is hard to extract just the necessary parts, so I tried to left join the results of script2 to script1 on the IDs:
SELECT TABLE1.*, TABLE2.COMPUTED_INFO FROM SCRIPT1 TABLE1 LEFT JOIN SCRIPT2 TABLE2 ON TABLE2.ID = TABLE1.ID
The result I got and also the expected result is:
| ID | some other info ... | Computed Info
+----+---------------------+---------------
| 1 | ... | 111
| 2 | ... | 222
| 3 | ... | 333
| 4 | ... | 444
The problem is that after joining both of them the time of execution is now 20+ min.
I have also tried
with table1 as
(script1),
table2 as
(script2)
select t1.*, t2.computed_info
from table1 t1 left join table2 t2 on t2.id = t1.id
This took 10+ min.
I am wondering why this happens: script1 and script2 each run in around 30 sec on their own, but joined together they take 10+ min.
Is there another way to accomplish this?

You can create temp tables before joining the results.
First create table temp1 as select * from script1
and create table temp2 as select * from script2,
then run your query against them:
SELECT temp1.*, temp2.COMPUTED_INFO FROM temp1 LEFT JOIN temp2 ON temp2.ID = temp1.ID

The last time I had this kind of issue, I resolved it with temporary tables. I created temporary tables from the SCRIPT1 and SCRIPT2 results, then added indexes on the ID columns.
After this, a query similar to yours should execute faster.
This happened on a PostgreSQL server, but the root of the problem is the same. Usually an RDBMS can't optimise a subquery or a result set coming from a PROCEDURE/FUNCTION, and it cannot use indexes on those intermediate rows.
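The temp-table approach from the two answers above can be sketched end to end. This is only an illustration, assuming Python's built-in sqlite3 module as a stand-in for Oracle. SCRIPT1, SCRIPT2 and source_rows are hypothetical placeholders for the real scripts and their data, not anything from the question.

```python
import sqlite3

# Hypothetical stand-ins for the two long scripts (assumption, not the real queries).
SCRIPT1 = "SELECT id, 'info ' || id AS other_info FROM source_rows"
SCRIPT2 = "SELECT id, id * 111 AS computed_info FROM source_rows"


def join_via_materialized_tables(conn):
    """Run each script once into its own table, index the join key, then join."""
    conn.execute(f"CREATE TABLE temp1 AS {SCRIPT1}")
    conn.execute(f"CREATE TABLE temp2 AS {SCRIPT2}")
    # Index the join column of the table that is probed by the LEFT JOIN.
    conn.execute("CREATE INDEX idx_temp2_id ON temp2 (id)")
    return conn.execute(
        "SELECT temp1.id, temp1.other_info, temp2.computed_info "
        "FROM temp1 LEFT JOIN temp2 ON temp2.id = temp1.id "
        "ORDER BY temp1.id"
    ).fetchall()


def temp_table_demo():
    """Build a tiny in-memory database and run the materialize-then-join steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_rows (id INTEGER)")
    conn.executemany("INSERT INTO source_rows VALUES (?)", [(1,), (2,), (3,), (4,)])
    return join_via_materialized_tables(conn)
```

The point of the design is that each script is evaluated exactly once, so the final join only sees two plain tables.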

Related

Print output of count to DBMS

I'm trying to run the same query over and over again so I can get the status of a process.
How do I use PL/SQL and the DBMS window or just the Query Result window to accomplish this?
My query looks like this:
Select count(*) from table1
inner join table2 on table1.fk = table2.fk
inner join table3 on table3.fk = table2.fk
where
table1.col2={a value}
I've looked at several answers which discussed using a loop or dbms_output.put_line() but can't get my count to be what's displayed.
Sample data of info in these tables:
table1.columnName has company IDs and a program/process IDs
Col1 | Col2 | col3
1 | 42 | 2
2 | 42 | 2
3 | 42 | 2
1 | 41 | 2
4 | 41 | 2
1 | 43 | 2
Example output for the query where table1.col2=42 would be 3 because there are 3 rows where Col2 has value 42
Thank you
In the result window, just type / and hit Enter to re-run the previous statement; you get the new count without typing in the entire query again.
For the PL/SQL side, you can wrap the query in a Unix script and use a sendmail command to get the results delivered to your inbox.

Table Left Join

I'm currently trying to get an output from two tables that I want to join and it seems like I have a block in my mind on how to resolve this.
Table 1 has products with unique IDs.
ID | (other info)
-----------------
AA |
BB |
CC |
Table 2 has the unique ID of Table 1 as FK as well as a model number and a part-code that I would like to join onto Table 1. Table 2 has a multitude of other information resulting in the following possible constellation:
ID | FK | model number | part-code
-----------------------------------
01 | AA | model0001 | part923
02 | AA | model0001 |
03 | AA | | part923
04 | BB | model0002 |
05 | BB | | part876
06 | CC | | part551
Information in Table 2 is therefore very scattered and not necessarily complete. I also do not want to assume that for a given FK the model number and the part-code remain the same across all entries (if there are multiple variants for a given FK, I only want one entry, even if it is at random).
The result I am trying to achieve is to get all the information I extract from Table 1, and it is given that there will always be a unique ID (=FK in Table 2), and add the model number and part-code, if existing, to the table without creating any duplicates. The example above should therefore give the following output.
ID | model number | part-code | (other info from table 1)
---------------------------------------------------------
AA | model0001 | part923 |
BB | model0002 | part876 |
CC | | part551 |
I should also mention that Table 2 is extremely large (millions of entries) and I have no way to match the data except with the IDs from Table 1. This table is also quite big - an efficient way of approaching this is therefore necessary.
Thank you for your time in reading this and helping me understand how to approach this.
Best,
Jonas
You are right, you need an OUTER JOIN to get all the records in table1 with whatever matching records are in table2.
Getting only one record per hit from table2 is tricky. This aggregating subquery will produce your desired output. Note that this solution can produce a combination of (model_number, part_code) which does not exist in any single record in table2; I guess it's okay, as that is what your sample result set shows for BB. The performance across "millions of entries" may be slow, but that is a separate tuning issue.
select t1.id
, t2.model_number
, t2.part_code
, t1.whatever
, t1.blah
, t1.etc
from table1 t1
left outer join ( select fk
, max (model_number) as model_number
, max (part_code) as part_code
from table2
group by fk ) t2
on t1.id = t2.fk
order by t1.id
/
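To check the aggregating subquery against the sample data from the question, here is a small sketch. It assumes Python's built-in sqlite3 module rather than Oracle, and it stores the empty cells as NULL (an assumption about the real data). MAX ignores NULLs, which is what lets the BB row pick up values from two different records.

```python
import sqlite3


def one_row_per_product():
    """Load the question's sample rows and run the aggregate-then-join query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id TEXT PRIMARY KEY);
        CREATE TABLE table2 (id TEXT, fk TEXT, model_number TEXT, part_code TEXT);
        INSERT INTO table1 VALUES ('AA'), ('BB'), ('CC');
        INSERT INTO table2 VALUES
            ('01', 'AA', 'model0001', 'part923'),
            ('02', 'AA', 'model0001', NULL),
            ('03', 'AA', NULL,        'part923'),
            ('04', 'BB', 'model0002', NULL),
            ('05', 'BB', NULL,        'part876'),
            ('06', 'CC', NULL,        'part551');
    """)
    # Collapse table2 to one row per fk first, then outer join it to table1.
    return conn.execute("""
        SELECT t1.id, t2.model_number, t2.part_code
        FROM table1 t1
        LEFT OUTER JOIN (SELECT fk,
                                MAX(model_number) AS model_number,
                                MAX(part_code)    AS part_code
                         FROM table2
                         GROUP BY fk) t2
          ON t1.id = t2.fk
        ORDER BY t1.id
    """).fetchall()
```

Aggregating before the join keeps the join one-to-one, so no duplicate rows from table1 can appear.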
You can try this SQL:
SELECT t1.ID, MAX(t2.model_number) AS model_number, MAX(t2.part_code) AS part_code FROM table1 t1
LEFT JOIN table2 t2 ON t1.ID = t2.FK
GROUP BY t1.ID
Hope that helps!

SQL script runs VERY slowly with small change

I am relatively new to SQL. I have a script that used to run very quickly (<0.5 seconds) but runs very slowly (>120 seconds) if I add one change - and I can't see why this change makes such a difference. Any help would be hugely appreciated!
This is the script; it runs quickly if I do NOT include "tt2.bulk_cnt" in line 26:
with bulksum1 as
(
select t1.membercode,
t1.schemecode,
t1.transdate
from mina_raw2 t1
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t1.membercode,
t1.schemecode,
t1.transdate
),
bulksum2 as
(
select t1.schemecode,
t1.transdate,
count(*) as bulk_cnt
from bulksum1 t1
group by t1.schemecode,
t1.transdate
having count(*) >= 10
),
results as
(
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join bulksum2 tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
)
select * from results
EDIT: I apologise for not putting enough detail in here previously - although I can use basic SQL code, I am a complete novice when it comes to databases.
Database: Oracle (I'm not sure which version, sorry)
Execution plans:
QUICK query:
Plan hash value: 1712123489
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | VIEW | |
| 3 | FILTER | |
| 4 | HASH GROUP BY | |
| 5 | VIEW | VM_NWVW_0 |
| 6 | HASH GROUP BY | |
| 7 | TABLE ACCESS FULL| MINA_RAW2 |
| 8 | TABLE ACCESS FULL | MINA_RAW2 |
---------------------------------------------
SLOW query:
Plan hash value: 1298175315
--------------------------------------------
| Id | Operation | Name |
--------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | FILTER | |
| 2 | HASH GROUP BY | |
| 3 | HASH JOIN | |
| 4 | VIEW | VM_NWVW_0 |
| 5 | HASH GROUP BY | |
| 6 | TABLE ACCESS FULL| MINA_RAW2 |
| 7 | TABLE ACCESS FULL | MINA_RAW2 |
--------------------------------------------
A few observations, and then some things to do:
1) More information is needed. In particular, how many rows are there in the MINA_RAW2 table, what indexes exist on this table, and when was the last time it was analyzed? To determine the answers to these questions, run:
SELECT COUNT(*) FROM MINA_RAW2;
SELECT TABLE_NAME, LAST_ANALYZED, NUM_ROWS
FROM USER_TABLES
WHERE TABLE_NAME = 'MINA_RAW2';
From looking at the plan output it looks like the database is doing two FULL SCANs on MINA_RAW2 - it would be nice if this could be reduced to no more than one, and hopefully none. It's always tough to tell without very detailed information about the data in the table, but at first blush it appears that an index on TRANSACTIONTYPE might be helpful. If such an index doesn't exist you might want to consider adding it.
2) Assuming that the statistics are out-of-date (as in, old, nonexistent, or a significant amount of data (> 10%) has been added, deleted, or updated since the last analysis) run the following:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(owner => 'YOUR-SCHEMA-NAME',
table_name => 'MINA_RAW2');
END;
substituting the correct schema name for "YOUR-SCHEMA-NAME" above. Remember to capitalize the schema name! If you don't know if you should or shouldn't gather statistics, err on the side of caution and do it. It shouldn't take much time.
3) Re-try your existing query after updating the table statistics. I think there's a fair chance that having up-to-date statistics in the database will solve your issues. If not:
4) This query is doing a GROUP BY on the results of a GROUP BY. This doesn't appear to be necessary, as the initial GROUP BY doesn't compute any aggregates. Instead, it appears to be there to get the unique combinations of MEMBERCODE, SCHEMECODE, and TRANSDATE so that the count of members by scheme and date can be determined. I think the whole query can be simplified to:
WITH cteWORKING_TRANS AS (SELECT *
FROM MINA_RAW2
WHERE TRANSACTIONTYPE IN ('RSP','SP','UNTV',
'ASTR','CN','TVIN',
'UCON','TRAS')),
cteBULKSUM AS (SELECT a.SCHEMECODE,
a.TRANSDATE,
COUNT(*) AS BULK_CNT
FROM (SELECT DISTINCT MEMBERCODE,
SCHEMECODE,
TRANSDATE
FROM cteWORKING_TRANS) a
GROUP BY a.SCHEMECODE,
a.TRANSDATE
HAVING COUNT(*) >= 10)
SELECT t.*, b.BULK_CNT
FROM cteWORKING_TRANS t
INNER JOIN cteBULKSUM b
ON b.SCHEMECODE = t.SCHEMECODE AND
b.TRANSDATE = t.TRANSDATE
I managed to remove an unnecessary subquery by using COUNT(DISTINCT ...). This is standard SQL and Oracle supports it, but check that it gives the result you want; I have mostly used it in PostgreSQL.
select t1.*, tt2.bulk_cnt
from mina_raw2 t1
inner join (select t2.schemecode,
t2.transdate,
count(DISTINCT membercode) as bulk_cnt
from mina_raw2 t2
where t2.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
group by t2.schemecode,
t2.transdate
having count(DISTINCT membercode) >= 10) tt2
on t1.schemecode = tt2.schemecode and t1.transdate = tt2.transdate
where t1.transactiontype in ('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')
When you use WITH queries where plain subqueries would do, you may be kneecapping the query optimizer.
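Whether the nested GROUP BY and the COUNT(DISTINCT membercode) form really agree can be checked on a toy table. This is a sketch under assumptions: it uses Python's built-in sqlite3 module instead of Oracle, invented sample rows, and a min_members parameter in place of the hard-coded 10 so that a handful of rows is enough.

```python
import sqlite3

TYPES = "('RSP','SP','UNTV','ASTR','CN','TVIN','UCON','TRAS')"

# Two-level form: deduplicate members first, then count rows per scheme and date.
NESTED = f"""
    SELECT b.schemecode, b.transdate, COUNT(*) AS bulk_cnt
    FROM (SELECT membercode, schemecode, transdate
          FROM mina_raw2
          WHERE transactiontype IN {TYPES}
          GROUP BY membercode, schemecode, transdate) AS b
    GROUP BY b.schemecode, b.transdate
    HAVING COUNT(*) >= ?
    ORDER BY b.schemecode, b.transdate
"""

# Single-level form: count distinct members directly.
DISTINCT = f"""
    SELECT schemecode, transdate, COUNT(DISTINCT membercode) AS bulk_cnt
    FROM mina_raw2
    WHERE transactiontype IN {TYPES}
    GROUP BY schemecode, transdate
    HAVING COUNT(DISTINCT membercode) >= ?
    ORDER BY schemecode, transdate
"""


def compare_bulk_counts(min_members):
    """Return the results of both query forms on the same invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mina_raw2 "
        "(membercode TEXT, schemecode TEXT, transdate TEXT, transactiontype TEXT)"
    )
    conn.executemany("INSERT INTO mina_raw2 VALUES (?, ?, ?, ?)", [
        ("m1", "S1", "2020-01-01", "SP"),
        ("m1", "S1", "2020-01-01", "RSP"),  # same member twice: counted once
        ("m2", "S1", "2020-01-01", "SP"),
        ("m3", "S1", "2020-01-01", "CN"),
        ("m4", "S1", "2020-01-01", "XX"),   # removed by the transactiontype filter
        ("m1", "S2", "2020-01-01", "SP"),   # single member: below the threshold
    ])
    nested = conn.execute(NESTED, (min_members,)).fetchall()
    distinct = conn.execute(DISTINCT, (min_members,)).fetchall()
    return nested, distinct
```

One caveat: COUNT(DISTINCT membercode) skips NULL member codes, while the nested GROUP BY counts NULL as its own group, so the two forms can differ if membercode is nullable.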

Join two tables juxtaposing columns with same name sql

I have two sqlite3 tables with same column names and I want to compare them. To do that, I need to join the tables and juxtapose the columns with same name.
The tables share an identical column which I want to put as the first column.
Let's imagine I have table t1 and table t2
Table t1:
SharedColumn | Height | Weight
A | 2 | 70
B | 10 | 100
Table t2:
SharedColumn | Height | Weight
A | 5 | 25
B | 32 | 30
What I want to get as a result of my query is:
SharedColumn | Height_1 | Height_2 | Weight_1 | Weight_2
A | 2 | 5 | 70 | 25
B | 10 | 32 | 100 | 30
In my real case I have a lot of columns, so I would like to avoid writing each column name twice to specify the order.
Renaming the columns is not my main concern, what interests me the most is the juxtaposition of columns with same name.
There is no way to do that directly in SQL, especially because you also want to rename the columns to identify their source. You would have to use dynamic SQL, and honestly? Don't!
Simply write out the column names. Most SQL tools provide a way to generate the SELECT list; just copy the names and put them in the right places:
SELECT t1.sharedColumn, t1.height AS height_1, t2.height AS height_2 ...
FROM t1
JOIN t2 ON (t1.sharedColumn = t2.sharedColumn)
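If the column list is too long to write by hand, the SELECT text itself can be generated from the table metadata. Below is a rough sketch assuming Python's built-in sqlite3 module and SQLite's PRAGMA table_info. The helper name build_juxtaposed_select is made up for illustration, and it assumes that both tables have the same columns and that the table names come from a trusted source, since they are pasted into the SQL text.

```python
import sqlite3


def build_juxtaposed_select(conn, left, right, shared):
    """Return SELECT text that pairs same-named columns of two tables."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({left})")]
    parts = [f'{left}."{shared}" AS "{shared}"']
    for col in cols:
        if col != shared:
            parts.append(f'{left}."{col}" AS "{col}_1"')
            parts.append(f'{right}."{col}" AS "{col}_2"')
    return (
        f'SELECT {", ".join(parts)} FROM {left} '
        f'JOIN {right} ON {left}."{shared}" = {right}."{shared}" '
        f'ORDER BY {left}."{shared}"'
    )


def juxtapose_demo():
    """Recreate t1 and t2 from the question and run the generated query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (SharedColumn TEXT, Height INTEGER, Weight INTEGER);
        CREATE TABLE t2 (SharedColumn TEXT, Height INTEGER, Weight INTEGER);
        INSERT INTO t1 VALUES ('A', 2, 70), ('B', 10, 100);
        INSERT INTO t2 VALUES ('A', 5, 25), ('B', 32, 30);
    """)
    cur = conn.execute(build_juxtaposed_select(conn, "t1", "t2", "SharedColumn"))
    names = [d[0] for d in cur.description]
    return names, cur.fetchall()
```

The generated text can also simply be printed once and pasted into a script, which avoids building SQL at run time.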
Try the following query to get the desired result:
SELECT t1.sharedColumn AS SharedColumn,
t1.Height AS Height_1, t2.Height AS Height_2,
t1.Weight AS Weight_1, t2.Weight AS Weight_2
FROM t1 INNER JOIN t2
ON t1.sharedColumn = t2.sharedColumn
ORDER BY t1.sharedColumn ASC
After that, you can read each row of the result with the following lines:
$result['SharedColumn'];
$result['Height_1'];
$result['Height_2'];
$result['Weight_1'];
$result['Weight_2'];

Finding the difference between two sets of data from the same table

My data looks like:
run | line | checksum | group
-----------------------------
1 | 3 | 123 | 1
1 | 7 | 123 | 1
1 | 4 | 123 | 2
1 | 5 | 124 | 2
2 | 3 | 123 | 1
2 | 7 | 123 | 1
2 | 4 | 124 | 2
2 | 4 | 124 | 2
and I need a query that returns me the new entries in run 2
run | line | checksum | group
-----------------------------
2 | 4 | 124 | 2
2 | 4 | 124 | 2
I tried several things, but I never got to a satisfying answer.
In this case I'm using H2, but of course I'm interested in a general explanation that would help me to wrap my head around the concept.
EDIT:
OK, it's my first post here so please forgive if I didn't state the question precisely enough.
Basically, given two run values (r1, r2, with r2 > r1), I want to determine which rows having run = r2 have a different line, checksum or group from any row where run = r1.
select * from yourtable
where run = 2 and checksum = (select max(checksum)
from yourtable)
Assuming your last run has a higher run value than the others, the SQL below will help:
select * from table1 t1
where t1.run in
(select max(t2.run) from table1 t2)
Update:
Above SQL may not give you the right rows because your requirement is not so clear. But the overall idea is to fetch the rows based on the latest run parameters.
SELECT line, checksum, group
FROM TableX
WHERE run = 2
EXCEPT
SELECT line, checksum, group
FROM TableX
WHERE run = 1
or (with a slightly different result, since this keeps duplicate rows and the run column):
SELECT *
FROM TableX x
WHERE run = 2
AND NOT EXISTS
( SELECT *
FROM TableX x2
WHERE run = 1
AND x2.line = x.line
AND x2.checksum = x.checksum
AND x2.group = x.group
)
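The "slightly different result" is easy to see on the sample data from the question. Here is a small sketch assuming Python's built-in sqlite3 module as a stand-in for H2. The group column is renamed to grp here (an assumption made for the demo) because GROUP is a reserved word.

```python
import sqlite3


def new_rows_both_ways():
    """Run the EXCEPT and NOT EXISTS queries on the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablex (run INTEGER, line INTEGER, checksum INTEGER, grp INTEGER)"
    )
    conn.executemany("INSERT INTO tablex VALUES (?, ?, ?, ?)", [
        (1, 3, 123, 1), (1, 7, 123, 1), (1, 4, 123, 2), (1, 5, 124, 2),
        (2, 3, 123, 1), (2, 7, 123, 1), (2, 4, 124, 2), (2, 4, 124, 2),
    ])
    # EXCEPT is a set operation: duplicates collapse and only the listed columns return.
    except_rows = conn.execute("""
        SELECT line, checksum, grp FROM tablex WHERE run = 2
        EXCEPT
        SELECT line, checksum, grp FROM tablex WHERE run = 1
    """).fetchall()
    # NOT EXISTS filters row by row: every unmatched run-2 row is kept, duplicates included.
    not_exists_rows = conn.execute("""
        SELECT *
        FROM tablex x
        WHERE x.run = 2
          AND NOT EXISTS (SELECT *
                          FROM tablex x2
                          WHERE x2.run = 1
                            AND x2.line = x.line
                            AND x2.checksum = x.checksum
                            AND x2.grp = x.grp)
    """).fetchall()
    return except_rows, not_exists_rows
```

On this data the EXCEPT form returns the new combination once, while the NOT EXISTS form returns both run-2 rows, which matches the output asked for in the question.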
A slightly different approach:
select min(run) run, line, checksum, group
from mytable
where run in (1,2)
group by line, checksum, group
having min(run)=2
Incidentally, I assume that the "group" column in your table isn't actually called group - this is a reserved word in SQL and would need to be enclosed in double quotes (or backticks or square brackets, depending on which RDBMS you are using).