Multiple Joins - Performance - sql

I have three tables:
Table A (approx. 500 000 records)
ID   ID_B   Text
---------------------
1    10     bla
2    10     blabla
3    30     blablabla
Table B (approx. 100 000 records)
ID   Text
------------
10   blab
20   blaba
30   blabb
Table C (approx. 600 000 records)
ID   ID_A
------------
1    1
2    1
3    2
Now I want to join these three tables:
SELECT A.Text
FROM A
JOIN B ON B.ID = A.ID_B
JOIN C ON C.ID_A = A.ID
I have created a clustered primary key index (ID) and non-clustered index (ID_B) on table A.
According to the execution plan, at the beginning the clustered index is used to join A and C.
Afterwards the result set is sorted on column ID_B and used then in a merge join with B.
Execution Plan
The sort operation is the most expensive one (about 40% of the total cost).
Is there any way to optimize this query in terms of overall performance?

You haven't mentioned whether you have any indexes on table B. Consider an index on its ID column, 'including' any columns you want to output.
Now, from what I gather in the comments, you're really joining to tables B and C primarily as a filter, not because you need to output data from those tables. If that's really the case, you should use EXISTS. You may shy away from subqueries, but the engine knows what to do with EXISTS: you'll see in the plan that it runs a 'semi join'.
select a.text
from a
where exists (select 0 from b where b.id = a.id_b)
and exists (select 0 from c where c.id_a = a.id)
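To make the difference concrete, here is a small sketch of the two forms on a miniature copy of the sample data. It uses SQLite via Python purely for illustration (the question looks like SQL Server, but the JOIN vs. EXISTS semantics are the same): the plain joins duplicate an A row when it matches several C rows, while the EXISTS form returns each A row at most once.

```python
import sqlite3

# Miniature copies of tables A, B and C from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, id_b INTEGER, text TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE c (id INTEGER PRIMARY KEY, id_a INTEGER);
INSERT INTO a VALUES (1, 10, 'bla'), (2, 10, 'blabla'), (3, 30, 'blablabla');
INSERT INTO b VALUES (10, 'blab'), (20, 'blaba'), (30, 'blabb');
INSERT INTO c VALUES (1, 1), (2, 1), (3, 2);
""")

# Plain joins: 'bla' comes back twice, because A.ID = 1 matches two C rows.
join_rows = sorted(r[0] for r in conn.execute("""
    SELECT a.text FROM a
    JOIN b ON b.id = a.id_b
    JOIN c ON c.id_a = a.id
"""))

# EXISTS (semi join): each A row is returned at most once.
exists_rows = sorted(r[0] for r in conn.execute("""
    SELECT a.text FROM a
    WHERE EXISTS (SELECT 0 FROM b WHERE b.id = a.id_b)
      AND EXISTS (SELECT 0 FROM c WHERE c.id_a = a.id)
"""))

print(join_rows)    # ['bla', 'bla', 'blabla']
print(exists_rows)  # ['bla', 'blabla']
```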

Related

Merging/Joining Multiple Tables SQL

I am working with a database that has 5 tables, each with a different number of observations. I include a description of the columns in each table below. As you can see, Tables 1, 2 and 5 have SecurID in common, Tables 3 and 4 have Factor in common, and lastly Tables 3 and 5 have BID in common. I need to perform an analysis of Table 1 vs. Table 2's exposure and return by date. To do this I need to do multiple merges/joins: join Tables 3 and 4, then join the result with Table 5, and lastly with Tables 1 and 2. What I tried was multiple joins like:
SELECT *
FROM Table3
INNER JOIN Table4 ON Table3.Factor = Table4.Factor
LEFT JOIN Table5 ON Table3.BID = Table5.BID
LEFT JOIN Table1 ON Table5.SecurID = Table1.SecurID
LEFT JOIN Table2 ON Table5.SecurID = Table2.SecurID
My problem is that when I run this query I get a crazy number of extra observations. Are multiple joins the most efficient way to combine all these tables? I'm very new to SQL, but each table has an index, which I believe speeds up data retrieval compared with a plain scan.
Table 1 (32,800 Observ.): SecurID, HoldingDate, Weight
Table 2 (2200 Observ.): SecurID, HoldingDate, Weight
Table 3 (808400 Observ.): BID, Factor, Exposure, Date
Table 4 (8000 Observ.): Factor, Return, FactorGrpName
Table 5 (1600 Observ.): SecurID, SecurName, BID
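The "extra observations" in a query like the one above usually come from join fan-out: every one-to-many match multiplies rows. A minimal sketch of the effect, using SQLite via Python with hypothetical data (the real tables also carry dates, which would need to be part of the join keys, or the many-side would need to be aggregated before joining):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table3 (BID TEXT, Factor TEXT, Exposure REAL, Date TEXT);
CREATE TABLE Table4 (Factor TEXT, "Return" REAL, FactorGrpName TEXT);
INSERT INTO Table3 VALUES ('B1', 'F1', 0.5, '2020-01-01');
INSERT INTO Table4 VALUES
    ('F1', 0.01, 'G1'), ('F1', 0.02, 'G1'), ('F1', 0.03, 'G1');
""")

# One Table3 row matched by three Table4 rows yields three output rows;
# chaining several one-to-many joins multiplies the counts further.
rows = conn.execute("""
    SELECT * FROM Table3
    INNER JOIN Table4 ON Table3.Factor = Table4.Factor
""").fetchall()
print(len(rows))  # 3
```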

How to display data in SQL from multiple tables, but only if one column data matches another column?

I'm still learning SQL, so this may just be my ignorance or an inability to express what I'm looking for in a search. I've spent roughly an hour searching for some variation of the title (both here and on Google). I apologize; I apparently also don't know how to format here. I'll try to clean it up now that I've posted.
I have a database of customer data that I did not design. In the GUI there are multiple tabs, and it seems like each tab earned its own table. The tables are linked together with a field called RecordID. One of the tables backs the Customer Data tab. The way it's organized, a single customer record from table A can have multiple rows in table B. I only want data when column B in table B is "CompanyA" and column A in table B = 1. Sample data is below.
Expected output:
CardNumber LastName FirstName CustomerID DataItem
------------------------------------------------------
32154 Clapton Eric 181212 CompanyA
Table A:
RecordID CardNumber LastName FirstName CustomerID
---------------------------------------------------------------
1 12345 Smith John 190201
2 12346 Jones Sandy 190202
3 23456 Petty Tom 190203
4 32154 Clapton Eric 181212
5 14728 Tyler Steven 180225
Table B:
RecordID DataID DataItem
--------------------------------
1 0 CompanyA
1 1 Yes
1 2 No
1 3 Revoked
1 4 NULL
1 5 CompanyB
2 0 CompanyB
2 1 Yes
2 2 No
2 3 NULL
2 4 24-54A
2 5 CompanyC
3 0 CompanyA
3 1 No
3 2 No
3 3 NULL
3 4 68-69B
3 5 NULL
4 0 CompanyA
4 1 Yes
4 2 Yes
5 0 CompanyB
5 1 No
5 2 No
5 5 CompanyA
The concept you're looking for is a JOIN; in this case, specifically an INNER JOIN. A join connects two tables based on criteria you specify (such as matching values in fields) and merges the result into one table in the output.
Here's an example to suit your scenario:
SELECT
A.CardNumber,
A.LastName,
A.FirstName,
A.CustomerID,
B.DataItem
FROM
TableA A
INNER JOIN TableB B -- join tableB onto tableA
ON A.RecordID = B.RecordID -- in the ON clause you specify the criteria by which you match the fields
WHERE
B.columnA = 'CompanyA'
AND B.columnB = 1
Here's the relevant SQL Server Documentation
Also, I'd advise you to take a comprehensive introductory SQL tutorial and/or find a book. A good one will introduce all of the basic, key concepts such as this in a logical way, so you're not grasping in the dark, googling things for which you don't know the correct terminology.
select a.CardNumber, a.LastName, a.FirstName, a.CustomerID, b.dataitem
from tableA A inner join TableB b
on a.recordid = b.recordid
where b.columnA= 'CompanyA' and b.columnB = 1
Here is your solution,
select a.CardNumber, a.LastName, a.FirstName, a.CustomerID, b.DataItem from
tableA a
inner join tableB b
on (a.RecordID = b.RecordID)
where
b.DataItem = 'CompanyA'
AND b.RecordID = 1;
Let me know if the result is not as expected.
Your question is quite hard to understand, but let me give you an example that resembles what I think you are asking.
SELECT a.*, b.DataItem FROM A a INNER JOIN B b
ON a.RecordID = b.RecordID AND
b.DataItem = 'CompanyA'
At the database engine level, if you are using Microsoft technology, the most efficient structure is an indexed foreign key constraint on Table B and a primary surrogate key (PSK) column on Table A. The primary surrogate key in your case is on the parent table, Table A, and is called RecordID. The foreign key column with the constraint is on Table B, on the column named RecordID. Once you verify that there is a foreign key constraint on Table B (which pins both RecordID columns between the two tables to matched values), then address the GUI. At the GUI, across the tabs, you generally have a parent table with a unique set of Record IDs (one RecordID column with absolutely unique values in each row and no empty rows in that column). There will also be child tables on each tab in your GUI, bound to the parent table in a "1 to many (1:M)" fashion, where one parent has many children. Your question indicates that you also want to filter, where RecordID on the child in one of the related tabs equals the integer value 1. So, there needs to be a query somewhere:
SELECT [columns]
FROM [Table B]
INNER JOIN [Table A]
ON A.RecordID = B.RecordID
AND B.RecordID = 1;
Does that help?
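As a runnable sketch of the join-plus-filter pattern from the answers above, here is a miniature subset of the sample data in SQLite via Python, assuming (hypothetically) that the DataID 0 row holds the company name. Note that all three of these customers have a 'CompanyA' row, so narrowing the output to a single customer, as in the expected output, would need conditions on more than one Table B row (for example a second EXISTS):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (RecordID INTEGER, CardNumber INTEGER,
                     LastName TEXT, FirstName TEXT, CustomerID INTEGER);
CREATE TABLE TableB (RecordID INTEGER, DataID INTEGER, DataItem TEXT);
INSERT INTO TableA VALUES
    (1, 12345, 'Smith', 'John', 190201),
    (3, 23456, 'Petty', 'Tom', 190203),
    (4, 32154, 'Clapton', 'Eric', 181212);
INSERT INTO TableB VALUES
    (1, 0, 'CompanyA'), (1, 1, 'Yes'),
    (3, 0, 'CompanyA'), (3, 1, 'No'),
    (4, 0, 'CompanyA'), (4, 1, 'Yes');
""")

# Join the child rows onto the parent, then filter on the child columns.
rows = conn.execute("""
    SELECT A.LastName, B.DataItem
    FROM TableA A
    INNER JOIN TableB B ON A.RecordID = B.RecordID
    WHERE B.DataItem = 'CompanyA' AND B.DataID = 0
""").fetchall()
print(sorted(rows))  # three customers, one row each
```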

Assigning a value from one table to other table

There are two tables, Table A and Table B. Both contain the columns item and cost. Table B contains a list of items and their corresponding costs, whereas Table A contains only the list of items.
Now we need to check the items of Table A: if they are present in Table B, then the corresponding cost should be assigned to the item's cost in Table A.
Can someone help me out by writing a query for this?
Consider the tables as shown:
Table A:
item cost
-------------
pen null
book null
watch null
Table B:
item cost
-------------
watch 1000
book 50
Expected output
Table A:
item cost
-------------
pen 0
book 50
watch 1000
Just add a foreign key (the primary key of Table A, say table A's ID) to Table B, then use a join (a right join, maybe) in the query to get the prices of the respective items.
The join would be like:
SELECT b.item, b.cost
FROM table_a a
RIGHT JOIN table_b b ON a.item = b.item;
Edit:
Just edit the table names, then run it.
I would structure the update like this:
with cost_data as (
select
item,
max (cost) as cost
from table_b
group by item
)
update table_a a
set cost = c.cost
from cost_data c
where
a.item = c.item and
a.cost is distinct from c.cost
In essence, I am using a common table expression to collapse Table B to a single cost per item. One caveat here: if there are multiple costs listed for the same item, max simply takes the largest, which may not be what you want, but then you would need to decide how to handle that case in almost any design.
Then I am doing an "update A from B" against the CTE.
The last part (the is distinct from predicate) is not critical, per se, but it is helpful: it limits the query to only the rows that actually need to change. It's best to limit DML if it doesn't need to occur (the best way to optimize something is to not do it).
There are plenty of ways you could do this, if you are taking table b to be the one containing the price then a left outer join would do the trick.
SELECT
table_a.item,
CASE
WHEN table_b.cost IS NULL
THEN 0
ELSE table_b.cost
END as cost
FROM table_a
LEFT OUTER JOIN table_b ON table_a.item = table_b.item
The desired result also suggests that pen, which is not in table b, should have a price of 0 (this is arguably bad practice), but for the sake of returning that result you will want a CASE expression to assign a value when the cost is null.
In order to update the table, as per the comment
update table_a set cost = some_alias.cost
from (
SELECT
table_a.item,
CASE
WHEN table_b.cost IS NULL
THEN 0
ELSE table_b.cost
END as cost
FROM table_a
LEFT OUTER JOIN table_b ON table_a.item = table_b.item
) some_alias
where table_a.item = some_alias.item
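A portability note: UPDATE ... FROM is not available (or only recently available) on every engine. A correlated-subquery variant of the same update, sketched here with SQLite via Python on the question's data, produces the expected output including the 0 for pen:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (item TEXT PRIMARY KEY, cost INTEGER);
CREATE TABLE table_b (item TEXT PRIMARY KEY, cost INTEGER);
INSERT INTO table_a VALUES ('pen', NULL), ('book', NULL), ('watch', NULL);
INSERT INTO table_b VALUES ('watch', 1000), ('book', 50);
""")

# Correlated subquery in place of UPDATE ... FROM; COALESCE supplies
# the 0 for items (pen) that have no Table B row.
conn.execute("""
    UPDATE table_a
    SET cost = COALESCE(
        (SELECT b.cost FROM table_b b WHERE b.item = table_a.item), 0)
""")

result = dict(conn.execute("SELECT item, cost FROM table_a"))
print(result)  # {'pen': 0, 'book': 50, 'watch': 1000}
```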

Multiple small deletes

I have a PL/SQL script that loops over records of people (~4 million) and executes multiple updates (~100) and a single delete statement (the updates and the delete are all on different tables). The problem I am facing is that the one delete statement takes about half the run time by itself. I understand that when you execute a delete statement it needs to update the indexes, but I find it rather ridiculous. I am currently testing this script with one thread using dbms_parallel_execute, but I plan to multithread it.
I am executing a query similar to the following:
DELETE FROM table1 t1
WHERE (t1.key1, t1.key2) IN (SELECT t2.key1, t2.key2
FROM table2 t2
WHERE t2.parm1 = 1234
AND t2.parm2 = 5678)
Following facts:
Table2 (~30 million records) is ~10 times larger than table1 (~3 million records).
There is a primary key on table1(key1, key2)
There is a primary key on table2(key1, key2)
There is an index on table2(parm1, parm2)
I have disabled the foreign key constraint on table1(key1, key2) that references table2(key1, key2)
There are no other constraints on table1, but many more constraints on table2.
All triggers on table1 have been disabled
The explain plan for this query comes up with a cost lower than that of many of my update statements (but I know this doesn't account for much).
Explain plan output:
ID  PARENT  DEPTH  OPERATION         OPTIONS          OBJECT_TYPE     COST  CARDINALITY  BYTES  CPU_COST
--  ------  -----  ----------------  ---------------  --------------  ----  -----------  -----  --------
0           0      DELETE STATEMENT  ALL_ROWS                         5     1            36     38043
1   0       1      DELETE
2   1       2      NESTED LOOPS                                       5     1            36     38043
3   2       3      TABLE ACCESS      BY INDEX ROWID   TABLE           4     1            25     29022
4   3       4      INDEX             RANGE SCAN       INDEX           3     1                   21564
5   2       3      INDEX             UNIQUE SCAN      INDEX (UNIQUE)  1     1            11     9021
I was wondering if there is any way to make this delete go faster. I tried a bulk delete, but it didn't seem to improve the run time. If there were a way to execute all the deletes and then update the index afterwards, I suspect it would run faster. Obviously doing a CREATE TABLE from a SELECT is out of the picture, since I am looping over records (and running through multiple conditions) from another table to do the delete.
Each of your delete calls runs a query against table2's ~30 million records, which definitely degrades performance and may also create locking issues, which in turn slow the query down further.
I suggest moving the inline query that selects data from table2 out of the delete. Table2 should drive the delete and hold the delete-candidate records; it can run as a cursor, or you can place the data in a temporary table. Then execute the delete in chunks of 500 or 1,000 rows, each followed by a commit. The chunk size can be tuned based on results.
The index update during a delete cannot be skipped as such, but if this process runs during non-working hours, you may disable the indexes and recreate them afterwards.
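The chunk-and-commit idea can be sketched as follows. This is a hypothetical illustration using SQLite via Python (in Oracle PL/SQL you would typically batch with ROWNUM or FORALL and commit periodically); the batch size and the filter are placeholders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (key1 INTEGER, key2 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(i, i) for i in range(10_000)])
conn.commit()

BATCH = 1000  # tune based on results, e.g. 500 or 1000
deleted = 0
while True:
    # Delete at most BATCH candidate rows, then commit the chunk.
    cur = conn.execute(
        "DELETE FROM table1 WHERE rowid IN "
        "(SELECT rowid FROM table1 WHERE key1 < 5000 LIMIT ?)",
        (BATCH,))
    conn.commit()
    if cur.rowcount == 0:
        break
    deleted += cur.rowcount

print(deleted)  # 5000 rows removed, in transactions of 1000
```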
I think that if the outer query is "small" and the inner query is "big", a WHERE EXISTS can be quite efficient.
Try a WHERE EXISTS clause instead of the IN clause, then check the explain plan and the performance.
DELETE FROM table1 t1
WHERE
Exists (select 1
FROM table2 t2
WHERE t2.parm1 = 1234
AND t2.parm2 = 5678
AND t2.key1 = t1.key1
AND t2.key2 = t1.key2
)
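As a quick sanity check of the EXISTS form, here is a miniature sketch in SQLite via Python with hypothetical rows; two of the three table1 rows have matching delete candidates in table2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (key1 INTEGER, key2 INTEGER, PRIMARY KEY (key1, key2));
CREATE TABLE table2 (key1 INTEGER, key2 INTEGER, parm1 INTEGER, parm2 INTEGER);
INSERT INTO table1 VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO table2 VALUES (1, 1, 1234, 5678), (2, 2, 9, 9), (3, 3, 1234, 5678);
""")

# Correlated EXISTS: delete table1 rows that have a matching
# table2 row with the requested parm values.
cur = conn.execute("""
    DELETE FROM table1
    WHERE EXISTS (SELECT 1 FROM table2 t2
                  WHERE t2.parm1 = 1234 AND t2.parm2 = 5678
                    AND t2.key1 = table1.key1
                    AND t2.key2 = table1.key2)
""")
remaining = conn.execute("SELECT key1, key2 FROM table1").fetchall()
print(cur.rowcount, remaining)  # 2 [(2, 2)]
```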

how to calculate and store "groupings" in sql

This is a SQL conceptual question.
Start with "Table 1" with a large number of records and a primary key.
Add a cross reference table called "Table 2" which holds key pairs from Table 1. Each key pair means that two records should be in the same group.
How do you quickly calculate those groups assuming a large number of records?
Example:
Table1
ID other data
-- ----------
A ...
B ...
C ...
D ...
E ...
F ...
Table 2
ID1 ID2
A B aka: A is equivalent to B. not a parent/child relationship
B C
D E
Final Result
ID Group
-- -----
A 1 A, B, & C are in a group
B 1
C 1
D 2 D & E are in a group
E 2
F 3 F is in a group by itself
Keep in mind that there are a large number of records, so fast processing is desirable. I'm not looking for someone to create something from scratch, but tell me whether there is an established technique for doing this sort of thing. I've already written something myself, but it seems overly complex.
Note: edited for clarification with regard to the answer by Paul. Table 2 is not a parent/child relationship; it's a relationship of equivalence.
If the data in table 2 can be considered as 'parent-child' relationships (with id1 as the 'parent' and id2 as the 'child') then the result that you want can be achieved using the T-SQL below. The key assumption being made here is that ids that don't appear in the 'child' column (id2) in table2 can be used as the root elements in a group.
with groups(parent, child) as
(
select t1.id as parent, t1.id as child
from dbo.Table1 as t1
where not exists
(
select 1
from dbo.Table2 as t2
where t2.id2 = t1.id
)
union all
select g.parent, t2.id2
from dbo.table2 as t2
inner join groups as g
on g.child = t2.id1
)
select g.child as id, DENSE_RANK() over (order by g.parent) as grp
from groups as g
order by g.child
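Since the edit clarifies that Table 2 encodes an equivalence relation rather than a parent/child hierarchy, this is the classic connected-components problem, and the established technique outside the database is a disjoint-set (union-find) pass. A minimal Python sketch, assuming the IDs and pairs have already been fetched from the two tables:

```python
def group_ids(ids, pairs):
    """Assign a group number to every id; pairs are equivalences (id1, id2)."""
    parent = {i: i for i in ids}

    def find(x):
        # Find the set representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two sets

    group_no, result = {}, {}
    for i in ids:
        root = find(i)
        group_no.setdefault(root, len(group_no) + 1)
        result[i] = group_no[root]
    return result

groups = group_ids(list("ABCDEF"), [("A", "B"), ("B", "C"), ("D", "E")])
print(groups)  # {'A': 1, 'B': 1, 'C': 1, 'D': 2, 'E': 2, 'F': 3}
```

The resulting (ID, Group) pairs can then be bulk-loaded back into a table, and unlike the recursive CTE this handles cycles and arbitrary pair orderings in Table 2.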
This may or may not be of any use to you, but you seem to have modeled yourself into a tricky situation. Which means, of course, that you can model your way out of it. Here are a couple of suggestions.
Since if A=B and B=C then A=C, you could enter the data in Table2 as follows. This has the advantage of leaving the structure of Table2 as it is, but it still leads to a moderately tricky query. And it severely complicates some actions such as moving A to a different group.
ID1 ID2
A A
A B
A C
D D
D E
F F
Or, if you don't mind making a slight change to Table2, the data could look like this.
GROUP ID
1 A
1 B
1 C
2 D
2 E
3 F
The advantage of this design, in case you haven't noticed already, is that it is amazingly similar to the output you wanted in the first place -- making for a correspondingly simple query. But if you work it out, you will see that the code to maintain this table will also be simple. You can easily insert a new ID as either a member of group 3 or as member of the group containing F. You can move any ID from one group to another, merge groups, split groups or even enter an ID into multiple groups (if that is allowed).
Good modeling can eliminate a lot of atrocious code.