Select and update rows in a table by joining multiple tables in SQL Server

I am trying to update 1.2 million rows in a table whose data was inserted incorrectly by a legacy application. I am not very good at writing efficient SQL queries, as this is my first time working with a data set this large.
I have written the query below, and it takes a very long time to run. I have commented out my update logic in the statement.
SELECT T1.Old_id,
       T1.Report_id,
       T2.New_id /* update a set file_id = T2.new_id */
FROM
  (SELECT A.File_id AS Old_id,
          A.Id AS Report_id,
          A.User_id AS User_id
   FROM A
   INNER JOIN B ON A.Id = B.A_id
   INNER JOIN C ON B.Id = C.B_id
   INNER JOIN D ON C.Id = D.C_id
   INNER JOIN E ON D.Id = E.D_id
   WHERE E.Name = 'student_report') AS T1
LEFT JOIN
  (SELECT MAX(C.Report_id) AS New_id,
          C.Created_by AS User_id
   FROM C
   INNER JOIN D ON C.Id = D.C_id
   INNER JOIN E ON D.Id = E.D_id
   WHERE E.Name = 'teacher_report'
   GROUP BY C.Created_by) AS T2 ON T1.User_id = T2.User_id /* where a.id = T1.report_id */
I need to update File_id in table A with the Report_id from C. With a small set of data, the select query works fine and gives the intended result. But on the server, which has 1.2 million rows, it takes an extremely long time.
Is there a way to combine those two sub-queries into one and make it work for the update as well? The update also fails because of the GROUP BY in the second sub-query.

The main problem is joining on subqueries.
The second problem: when the same result set is used multiple times, put the common result set in a CTE or a #temp table.
CREATE TABLE #temp (B_id int, Report_id int, Created_by int, Name varchar(100));

INSERT INTO #temp (B_id, Report_id, Created_by, Name)
SELECT C.B_id, C.Report_id, C.Created_by, E.Name
FROM B
INNER JOIN C ON B.Id = C.B_id
INNER JOIN D ON C.Id = D.C_id
INNER JOIN E ON D.Id = E.D_id
WHERE E.Name IN ('teacher_report', 'student_report');

WITH CTE AS
(
    SELECT MAX(t.Report_id) AS New_id,
           t.Created_by AS User_id
    FROM #temp t
    WHERE t.Name = 'teacher_report'
    GROUP BY t.Created_by
)
SELECT T1.Old_id,
       T1.Report_id,
       T2.New_id /* update a set file_id = T2.new_id */
FROM
  (SELECT A.File_id AS Old_id,
          A.Id AS Report_id,
          A.User_id AS User_id
   FROM A
   INNER JOIN B ON A.Id = B.A_id
   INNER JOIN #temp t ON B.Id = t.B_id
   WHERE t.Name = 'student_report') AS T1
INNER JOIN CTE AS T2 ON T2.User_id = T1.User_id; -- like the EXISTS check, keeps only users that have a teacher_report
My script is not tested, so you may need to fix minor bugs.
In the temp table, carefully define all the columns required for this query, along with their data types.
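The core idea of the answer above (compute the GROUP BY result once, then update from it) can be tried out on a small scale. The following is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for SQL Server. The table teacher_reports, the temp table latest, and the sample rows are hypothetical simplifications, and the student_report filter chain through B, C, D and E is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified stand-ins for the tables in the question.
cur.executescript("""
CREATE TABLE A (Id INTEGER PRIMARY KEY, File_id INTEGER, User_id INTEGER);
CREATE TABLE teacher_reports (Report_id INTEGER, Created_by INTEGER);

INSERT INTO A VALUES (1, 100, 10), (2, 101, 10), (3, 102, 20), (4, 103, 30);
INSERT INTO teacher_reports VALUES (500, 10), (501, 10), (600, 20);

-- Step 1: materialize the aggregated result once, one row per user.
CREATE TEMP TABLE latest (User_id INTEGER PRIMARY KEY, New_id INTEGER);
INSERT INTO latest (User_id, New_id)
SELECT Created_by, MAX(Report_id)
FROM teacher_reports
GROUP BY Created_by;
""")

# Step 2: update from the materialized rows. The EXISTS guard leaves
# users without any teacher report (user 30 here) untouched.
cur.execute("""
UPDATE A
SET File_id = (SELECT l.New_id FROM latest AS l WHERE l.User_id = A.User_id)
WHERE EXISTS (SELECT 1 FROM latest AS l WHERE l.User_id = A.User_id)
""")
updated = cur.rowcount
conn.commit()

rows = cur.execute("SELECT Id, File_id FROM A ORDER BY Id").fetchall()
print(updated, rows)
```

Because the aggregate lives in its own keyed table, the UPDATE itself no longer contains a GROUP BY. In SQL Server the second step is usually written with UPDATE ... FROM and a join to the temp table instead of the correlated subquery used here.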

Please analyze the query cost using the execution plan. Find the table that causes the delay, then check whether proper indexes are being used for that table.

Related

Pull columns from series of joins in SQL

I am stuck on a problem at work where I need to pull two columns from a base table and one column from a series of joins.
Please note that I cannot provide real data, so I am using dummy column and table names; the real project has hundreds of columns.
Select A.Name, B.Age, D.Sal
From A
Left join B on A.iD = B.id and B.Date = CURRENT_DATE
/* join A and B table and return distinct column which is B.XYZ */
inner join C on C.iD = B.XYZ
/* join B and C, take C.YYY column for next join */
inner join D on D.id = C.YYY
/* take out the D.Sal column from this join */
where A.Dept = 'IT'
I have written this query, but it takes forever to run because the B.XYZ column has a lot of duplicates. How can I get the distinct values of B.XYZ from that join?
When joining table B, first build a distinct set of the columns you need from B, then join to that.
SELECT
A.Name,
B.Age,
D.Sal
From A
LEFT JOIN ( -- Instead of all cols (*), just id, Date, Age and xyz might do
SELECT DISTINCT * FROM B
) B ON A.iD = B.id AND B.Date = CURRENT_DATE
--(/* join A and B table and return distinct column which is B.XYZ */)
INNER JOIN C ON C.iD = B.XYZ
--(/*join B and C take C.YYY column for next join */)
INNER JOIN D ON D.id = C.YYY
--(/* Take out the D.Sal column from this join */)
WHERE A.Dept='IT'
As I understand it, you get the same rows many times over because, for a given b.id, date and age, the same xyz appears more than once.
One option is to join with a subquery that gets the distinct data:
SELECT a.name, dist_b.age, d.sal
FROM a
LEFT JOIN
(
SELECT DISTINCT id, date, age, xyz FROM b
) dist_b ON dist_b.id = a.id and dist_b.date = CURRENT_DATE
INNER JOIN c ON c.id = dist_b.xyz
INNER JOIN d ON d.id = c.yyy
WHERE a.dept = 'IT';
Of course you can even move the date condition inside the subquery:
SELECT a.name, dist_b.age, d.sal
FROM a
LEFT JOIN
(
SELECT DISTINCT id, age, xyz FROM b WHERE date = CURRENT_DATE
) dist_b ON dist_b.id = a.id
INNER JOIN c ON c.id = dist_b.xyz
INNER JOIN d ON d.id = c.yyy
WHERE a.dept = 'IT';
By the way, your LEFT OUTER JOIN doesn't work. Because you inner join the following tables, a match must exist, so your outer join effectively becomes an inner join. For the outer join to work, you would have to outer join the following tables, too.
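The point about the outer join being cancelled out by a later inner join is easy to check on toy data. This is a minimal sketch in Python with the standard-library sqlite3 module; the three small tables and their rows are made up for illustration and only mirror the shape of the question's A, B and C.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, name TEXT);
CREATE TABLE b (id INTEGER, xyz INTEGER);
CREATE TABLE c (id INTEGER, yyy INTEGER);
INSERT INTO a VALUES (1, 'Ann'), (2, 'Bob');  -- Bob has no row in b
INSERT INTO b VALUES (1, 7);
INSERT INTO c VALUES (7, 70);
""")

# LEFT JOIN followed by INNER JOIN: Bob's NULL b.xyz can never match c.id,
# so the inner join discards the row the outer join had preserved.
inner_after_left = conn.execute("""
SELECT a.name, c.yyy
FROM a
LEFT JOIN b ON b.id = a.id
INNER JOIN c ON c.id = b.xyz
ORDER BY a.name
""").fetchall()

# Outer joining the following table as well keeps Bob, with NULL for c.yyy.
left_all_the_way = conn.execute("""
SELECT a.name, c.yyy
FROM a
LEFT JOIN b ON b.id = a.id
LEFT JOIN c ON c.id = b.xyz
ORDER BY a.name
""").fetchall()

print(inner_after_left)
print(left_all_the_way)
```

Only the second query returns a row for Bob, which is the behaviour an outer join is normally chosen for.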

When to open and close brackets surrounding joins in MS Access SQL

I want to understand when to open and close brackets when representing joins in MS Access queries. I am developing a query builder in C++ for MS Access queries, so that I can apply the same code to generate similar queries.
SELECT
MasterTool.Name, Toolsets.SlaveToolID, Tools.MachineID
FROM
Tools AS MasterTool
LEFT JOIN
(
Toolsets LEFT JOIN Tools ON Toolsets.SlaveToolID = Tools.ID
)
ON MasterTool.ID = Toolsets.MasterToolID
Edit:
@LeeMac, as per your explanation, I modified the query I presented earlier to this:
SELECT Tools.Name, Toolsets.SlaveToolID, Tools.MachineID FROM (Tools
LEFT JOIN Toolsets ON Toolsets.SlaveToolID = Tools.ID )
LEFT JOIN Tools ON Toolsets.MasterToolID = Tools.ID
but I am getting the error "Join expression not supported". Is there a simple way to write the above query?
Essentially, when an MS Access query references more than two tables, every successive join between a pair of tables should be nested within parentheses.
For example, a query with two tables requires no parentheses:
select *
from a inner join b on a.id = b.id
The addition of a third joined table necessitates parentheses surrounding the original join in order to distinguish it from the additional join:
select *
from
(
a inner join b on a.id = b.id
)
inner join c on a.id = c.id
Every successive addition of a table will then cause the existing joins to be nested within another level of parentheses:
select *
from
(
(
a inner join b on a.id = b.id
)
inner join c on a.id = c.id
)
inner join d on a.id = d.id
Hence, in general:
select *
from
(
(
(
(
table1 [inner/left/right] join table2 on [conditions]
)
[inner/left/right] join table3 on [conditions]
)
[inner/left/right] join table4 on [conditions]
)
...
)
[inner/left/right] join tableN on [conditions]
There is a subtlety where LEFT/RIGHT joins are concerned, in that the order of nesting must maintain the direction of the join, for example:
select *
from
(
c left join b on c.id = b.id
)
left join a on a.id = b.id
Could be permuted to:
select *
from
c left join
(
b left join a on b.id = a.id
)
on c.id = b.id
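Since the question is about a query builder, the left-to-right nesting rule described above can be sketched as a small function. The example is in Python rather than C++ to keep it short; the function name build_access_from and its tuple-based input format are hypothetical, and it only covers the simple case where each new table is joined onto everything built so far.

```python
def build_access_from(first_table, joins):
    """Build an MS Access style FROM clause.

    joins is a list of (join_type, table, condition) tuples applied
    left to right. Every join except the last is wrapped in parentheses,
    so each additional table nests the existing joins one level deeper.
    """
    clause = first_table
    for index, (join_type, table, condition) in enumerate(joins):
        clause = f"{clause} {join_type} JOIN {table} ON {condition}"
        if index < len(joins) - 1:
            clause = f"({clause})"
    return f"FROM {clause}"


sql = build_access_from(
    "a",
    [
        ("INNER", "b", "a.id = b.id"),
        ("INNER", "c", "a.id = c.id"),
        ("INNER", "d", "a.id = d.id"),
    ],
)
print(sql)
```

The generated string matches the four-table pattern shown earlier. The right-nested form with a trailing ON clause, as in the permuted LEFT JOIN example, would need a different builder and is not handled by this sketch.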

SQL NOT IN optimization

I'm having trouble optimizing a query. Here are two example tables I am working with:
Table 1:
UID
A
B
Table 2:
UID Parent
A 2
B 2
C 3
D 2
E 3
F 2
Here is what I am doing now:
Select R.UID
FROM Table1 R
INNER JOIN Table2 T ON
R.UID = T.UID
INNER JOIN Table2 E ON
T.PARENT = E.PARENT
AND E.UID NOT IN (SELECT UID FROM Table1)
I'm trying to avoid using the NOT IN clause because of obvious hindrances in performance for large numbers of records.
I know the typical ways to avoid NOT IN clauses like the LEFT JOIN where the other table is null, but can't seem to get what I want with all of the other Joins going on.
I will continue working and post if I find a solution.
EDIT: Here is what I am trying to end up with
After the first Inner Join I would have
A
B
After the second Inner Join I would have:
A D
A F
B D
B F
The second column above is just to represent that it is matching to the other UIDs with the same parent, but I still need the As and Bs as the UID.
EDIT: RDBMS is SQL server 2005, 2008r2, 2012
Table1 is declared in the query with no index
DECLARE @Table1 TABLE ( [UNIQUE_ID] INT PRIMARY KEY )
Table2 has a clustered index on Unique ID
The general approach is to use a LEFT JOIN with a WHERE clause that keeps only the non-matching rows:
Select R.UID
FROM Table1 R
JOIN Table2 T ON R.UID = T.UID
JOIN Table2 E ON T.PARENT = E.PARENT
LEFT JOIN Table1 E2 ON E2.UID = E.UID
WHERE E2.UID IS NULL
Alternatively, EXCEPT can build the list of UIDs that are in Table2 but not in Table1:
SELECT Table2.*
FROM Table2
INNER JOIN (
SELECT UID FROM Table2
EXCEPT
SELECT UID FROM Table1
) AS Filter ON (Table2.UID = Filter.UID)
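To confirm that the anti-join rewrites return the same rows as the original NOT IN query, here is a minimal sketch in Python with the standard-library sqlite3 module, loaded with the sample data from the question. The alias X and the extra NOT EXISTS variant are additions for illustration; SQLite is only a stand-in, so this shows equivalence of results, not SQL Server performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (UID TEXT PRIMARY KEY);
CREATE TABLE Table2 (UID TEXT PRIMARY KEY, Parent INTEGER);
INSERT INTO Table1 VALUES ('A'), ('B');
INSERT INTO Table2 VALUES
    ('A', 2), ('B', 2), ('C', 3), ('D', 2), ('E', 3), ('F', 2);
""")

# Shared part: each Table1 row paired with every Table2 row of the same parent.
BASE = """
SELECT R.UID, E.UID
FROM Table1 AS R
JOIN Table2 AS T ON R.UID = T.UID
JOIN Table2 AS E ON T.Parent = E.Parent
"""
ORDER = " ORDER BY R.UID, E.UID"

queries = {
    # Original formulation from the question.
    "not_in": BASE + "WHERE E.UID NOT IN (SELECT UID FROM Table1)" + ORDER,
    # Anti-join: keep only rows where the outer join found no partner.
    "left_join": BASE
    + "LEFT JOIN Table1 AS X ON X.UID = E.UID WHERE X.UID IS NULL"
    + ORDER,
    # Correlated NOT EXISTS, another common rewrite.
    "not_exists": BASE
    + "WHERE NOT EXISTS (SELECT 1 FROM Table1 AS X WHERE X.UID = E.UID)"
    + ORDER,
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three variants pair A and B with D and F, matching the result the question asks for.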

Using Join based on condition

Can anyone please explain how to apply a join conditionally?
Say I am filtering data based on a condition. If a particular BIT parameter's value is 1, the query should include one more join; otherwise it should return the same data as before.
There are three tables: A, B and C.
I want to write a procedure that takes a @bool bit parameter.
if @bool=0
then
select A.* from A
inner join B on B.id=A.id
and if @bool=1
then
select A.* from A
INNER JOIN B on B.id=A.id
inner join C on C.id=A.id
Thanks In Advance.
What you have will work (certainly in a SPROC in MS SQL Server anyway) with minor mods. Note that T-SQL's IF takes no THEN keyword:
IF @bool = 0
select A.* from A
inner join B on B.id=A.id
ELSE IF @bool = 1 -- or just ELSE, if @bool is limited to 0 and 1
select A.* from A
INNER JOIN B on B.id=A.id
inner join C on C.id=A.id
One caveat: because of parameter sniffing, SQL Server caches a plan based on the first call, which won't necessarily be optimal for the other paths through your code.
Also, if you take this 'multiple alternative query' approach in your procs, it is generally a good idea to ensure that the column names and types returned are identical in all cases (your query is fine because it returns A.*).
Edit
Assuming that you are using SQL Server, an alternative is to use dynamic sql:
DECLARE @sql NVARCHAR(MAX)
SET @sql = N'select A.* from A
inner join B on B.id=A.id'
IF @bool = 1
SET @sql = @sql + N' inner join C on C.id=A.id'
EXEC sp_executesql @sql
If you need to add filters etc, have a look at this post: Add WHERE clauses to SQL dynamically / programmatically
select A.* from A
inner join B on B.id = A.id
left outer join C on C.id = A.id and @bool = 1
where (@bool = 1 and C.id is not null) or @bool = 0
The @bool = 1 condition "activates" the left outer join, so to speak, and the WHERE clause then turns it, in effect, into an inner join. If @bool = 0, the left outer join returns nothing from C and the WHERE restriction no longer applies.
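The single-query variant with the "switchable" outer join can be checked for both parameter values. This is a minimal sketch in Python with the standard-library sqlite3 module, where the named parameter :bool plays the role of @bool; the sample rows and the helper fetch_ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER);
CREATE TABLE B (id INTEGER);
CREATE TABLE C (id INTEGER);
INSERT INTO A VALUES (1), (2), (3);
INSERT INTO B VALUES (1), (2), (3);
INSERT INTO C VALUES (2);  -- only id 2 has a partner in C
""")

# :bool = 1 switches the outer join on, and the WHERE clause then
# demands a match, which makes it behave like an inner join.
# :bool = 0 switches the join off, and the WHERE clause lets every row pass.
QUERY = """
SELECT A.id
FROM A
INNER JOIN B ON B.id = A.id
LEFT JOIN C ON C.id = A.id AND :bool = 1
WHERE :bool = 0 OR C.id IS NOT NULL
ORDER BY A.id
"""


def fetch_ids(flag):
    """Run the query with the given flag and return the matching ids."""
    return [row[0] for row in conn.execute(QUERY, {"bool": flag})]


without_c = fetch_ids(0)
with_c = fetch_ids(1)
print(without_c, with_c)
```

With the flag off, all three ids come back; with it on, only the id that also exists in C remains.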
Try the following query
SELECT A.*
FROM A
INNER JOIN B on B.id=A.id
INNER JOIN C on C.id=A.id and @bool=1
-- caution: when @bool = 0 the join condition is never true, so this returns no rows
You do it using a union:
SELECT A.*
FROM A
INNER JOIN B on B.id=A.id
WHERE bool = 0
UNION ALL
SELECT A.*
FROM A
INNER JOIN B on B.id=A.id
INNER JOIN C on C.id=A.id
WHERE bool = 1
I'm assuming that bool is stored in table A or B.

How to update with inner join in Oracle

Could someone please verify whether an inner join is valid in an UPDATE statement in PL/SQL?
e.g.
Update table t
set t.value='value'
from tableb b inner join
on t.id=b.id
inner join tablec c on
c.id=b.id
inner join tabled d on
d.id=c.id
where d.key=1
This syntax won't work in Oracle SQL.
In Oracle you can update a join if the tables are "key-preserved", i.e.:
UPDATE (SELECT a.val_a, b.val_b
FROM table_a a
JOIN table_b b ON a.b_pk = b.b_pk)
SET val_a = val_b
Assuming that b_pk is the primary key of b, here the join is updateable because for each row of A there is at most one row from B, therefore the update is deterministic.
In your case, since the updated value doesn't depend on another table, you could use a simple update with an EXISTS condition, something like this:
UPDATE mytable t
SET t.VALUE = 'value'
WHERE EXISTS
(SELECT NULL
FROM tableb b
INNER JOIN tablec c ON c.id = b.id
INNER JOIN tabled d ON d.id = c.id
WHERE t.id = b.id
AND d.key = 1)
update t T
set T.value = 'value'
where T.id in (select T2.id from t T2, b B, c C, d D
where T2.id=B.id and B.id=C.id and C.id=D.id and D.key=1)
-- t is the table name, T is the alias used to refer to this table
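The UPDATE ... WHERE EXISTS form from the earlier answer is portable enough to try outside Oracle. Here is a minimal sketch in Python with the standard-library sqlite3 module as a stand-in; the sample rows are made up, and the column d_key is a hypothetical rename of key chosen to stay clear of keyword clashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE tableb (id INTEGER);
CREATE TABLE tablec (id INTEGER);
CREATE TABLE tabled (id INTEGER, d_key INTEGER);
INSERT INTO t VALUES (1, 'old'), (2, 'old'), (3, 'old');
INSERT INTO tableb VALUES (1), (2);
INSERT INTO tablec VALUES (1), (2);
INSERT INTO tabled VALUES (1, 1), (2, 0);  -- only id 1 has d_key = 1
""")

# The joins live inside the EXISTS subquery, which is correlated to the
# row being updated through b.id = t.id.
cur = conn.execute("""
UPDATE t
SET value = 'value'
WHERE EXISTS (
    SELECT 1
    FROM tableb AS b
    INNER JOIN tablec AS c ON c.id = b.id
    INNER JOIN tabled AS d ON d.id = c.id
    WHERE b.id = t.id
      AND d.d_key = 1
)
""")
updated = cur.rowcount
conn.commit()

rows = conn.execute("SELECT id, value FROM t ORDER BY id").fetchall()
print(updated, rows)
```

Only the row whose id survives the whole join chain with d_key = 1 is changed; the other rows keep their old value.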