I have a query against a table that contains like 2 million rows using linked server.
Select * from OPENQUERY(LinkedServerName,
'SELECT
PV.col1
,PV.col2
,PV.col3
,VTR.col1
,CTR.col1
,PSR.col1
FROM
LinkedDbName.dbo.tbl1 PV
INNER JOIN LinkedDbName.dbo.tbl2 VTR
ON PV.col_id = VTR.col_id
INNER JOIN LinkedDbName.dbo.tbl3 CTR
ON PV.col_id = CTR.col_id
INNER JOIN LinkedDbName.dbo.tbl4 PSR
ON PV.col_id = PSR.col_id
WHERE
PV.col_id = ''80C53C9B-6272-11DA-BB34-000E0C7F3ED2''')
That query results into 365 rows and is executed within 0 second.
However when I make that query into a view it runs for about minimum of 20 seconds and sometimes it reaches to 40 seconds tops.
Here's my create view script
CREATE VIEW [dbo].[myview]
AS
Select * from OPENQUERY(LinkedServerName,
'SELECT
PV.col1
,PV.col2
,PV.col3
,VTR.col1
,CTR.col1
,PSR.col1
FROM
LinkedDbName.dbo.tbl1 PV
INNER JOIN LinkedDbName.dbo.tbl2 VTR
ON PV.col_id = VTR.col_id
INNER JOIN LinkedDbName.dbo.tbl3 CTR
ON PV.col_id = CTR.col_id
INNER JOIN LinkedDbName.dbo.tbl4 PSR
ON PV.col_id = PSR.col_id')
then
Select * from myview where PV.col_id = '80C53C9B-6272-11DA-BB34-000E0C7F3ED2'
Any idea ? Thanks !
Your queries are quite different. In the first, the where clause is part of the SQL statement passed to OPENQUERY(). This has two important effects:
The amount of data returned is much smaller, only being the rows that match the condition.
The query can be optimized with the WHERE clause.
If you need to share the table, I might suggest that you make a copy on the local server -- either using replication or scheduling a job to copy it over.
So I am trying to optimize a bunch of queries which are taking a lot of time. What I am trying to figure out is how to create an index on columns from different tables.
Here is a simple version of my problem.
What I did
After Googling I looked into bitmap index but I am not sure if this is the right way to solve the issue
Issue
There is a many to many relationship b/w Student(sid,...) and Report(rid, year, isdeleted)
StudentReport(id, sid, rid) is the join table
Query
Select *
from Report
inner join StudentReport on Report.rid = StudentReport.rid
where Report.isdeleted = 0 and StudentReport.sid = x and Report.year = y
What is the best way to create an index?
Please try this:
with TMP_REP AS (
Select * from Report where Report.isdeleted = 0 AND Report.year = y
)
,TMP_ST_REP AS(
Select *
from StudentReport where StudentReport.sid = x
)
SELECT * FROM TMP_REP R, TMP_ST_REP S WHERE S.rid = R.rid
I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min
It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.
what about this?
SELECT auth_user.*,
C1.tickets_captured__count
C2.assigned_tickets__count
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with beter performance)
The below query that I'm executing through SQL Server Management Studio is painfully slow.
The input table tbl_sb12_bhs has about 40000 records and after an hour only 40 records are processed.
What can be changed here to make this run a bit faster?
DECLARE #bsrange INT
SET #bsrange = 0
WHILE #bsrange <= (SELECT max([p_a_l_out])
FROM [DB001].[FD\f7].[tbl_sb12_bhs])
BEGIN
INSERT INTO [FD\f7].tbl_sb13_b_lin1
(aId,
p_a_l_out,
bs_id,
bs_db,
bs_tbl,
bs_column,
Int1,
cd1,
Hop1,
Int2,
cd2,
Hop2,
Int3,
cd3,
Hop3,
Int4,
cd4,
Hop4,
Int5,
cd5,
Hop5,
Int6,
cd6,
Hop6,
Int7,
cd7,
Hop7,
Int8,
cd8,
Hop8,
Int9,
cd9,
Hop9,
Int10,
cd10,
Hop10,
Int11,
cd11,
Hop11,
Int12,
cd12,
Hop12,
Int13,
cd13,
Hop13,
Int14,
cd14,
Hop14,
Int15,
cd15,
Hop15,
Int16,
cd16,
Hop16)
SELECT DISTINCT tbl_sb12_bhs.aId,
tbl_sb12_bhs.p_a_l_out,
tbl_sb12_bhs.bs_id,
tbl_sb12_bhs.bs_db,
tbl_sb12_bhs.bs_tbl,
tbl_sb12_bhs.bs_column,
tbl_rpt_val_pt_crl.pt_el_Int AS Int1,
tbl_rpt_val_pt_crl.user_cd AS cd1,
tbl_rpt_val_pt_crl.cfk_upel AS Hop1,
tbl_rpt_val_pt_crl_1.pt_el_Int AS Int2,
tbl_rpt_val_pt_crl_1.user_cd AS cd2,
tbl_rpt_val_pt_crl_1.cfk_upel AS Hop2,
tbl_rpt_val_pt_crl_2.pt_el_Int AS Int3,
tbl_rpt_val_pt_crl_2.user_cd AS cd3,
tbl_rpt_val_pt_crl_2.cfk_upel AS Hop3,
tbl_rpt_val_pt_crl_3.pt_el_Int AS Int4,
tbl_rpt_val_pt_crl_3.user_cd AS cd4,
tbl_rpt_val_pt_crl_3.cfk_upel AS Hop4,
tbl_rpt_val_pt_crl_4.pt_el_Int AS Int5,
tbl_rpt_val_pt_crl_4.user_cd AS cd5,
tbl_rpt_val_pt_crl_4.cfk_upel AS Hop5,
tbl_rpt_val_pt_crl_5.pt_el_Int AS Int6,
tbl_rpt_val_pt_crl_5.user_cd AS cd6,
tbl_rpt_val_pt_crl_5.cfk_upel AS Hop6,
tbl_rpt_val_pt_crl_6.pt_el_Int AS Int7,
tbl_rpt_val_pt_crl_6.user_cd AS cd7,
tbl_rpt_val_pt_crl_6.cfk_upel AS Hop7,
tbl_rpt_val_pt_crl_7.pt_el_Int AS Int8,
tbl_rpt_val_pt_crl_7.user_cd AS cd8,
tbl_rpt_val_pt_crl_7.cfk_upel AS Hop8,
tbl_rpt_val_pt_crl_8.pt_el_Int AS Int9,
tbl_rpt_val_pt_crl_8.user_cd AS cd9,
tbl_rpt_val_pt_crl_8.cfk_upel AS Hop9,
tbl_rpt_val_pt_crl_9.pt_el_Int AS Int10,
tbl_rpt_val_pt_crl_9.user_cd AS cd10,
tbl_rpt_val_pt_crl_9.cfk_upel AS Hop10,
tbl_rpt_val_pt_crl_10.pt_el_Int AS Int11,
tbl_rpt_val_pt_crl_10.user_cd AS cd11,
tbl_rpt_val_pt_crl_10.cfk_upel AS Hop11,
tbl_rpt_val_pt_crl_11.pt_el_Int AS Int12,
tbl_rpt_val_pt_crl_11.user_cd AS cd12,
tbl_rpt_val_pt_crl_11.cfk_upel AS Hop12,
tbl_rpt_val_pt_crl_12.pt_el_Int AS Int13,
tbl_rpt_val_pt_crl_12.user_cd AS cd13,
tbl_rpt_val_pt_crl_12.cfk_upel AS Hop13,
tbl_rpt_val_pt_crl_13.pt_el_Int AS Int14,
tbl_rpt_val_pt_crl_13.user_cd AS cd14,
tbl_rpt_val_pt_crl_13.cfk_upel AS Hop14,
tbl_rpt_val_pt_crl_14.pt_el_Int AS Int15,
tbl_rpt_val_pt_crl_14.user_cd AS cd15,
tbl_rpt_val_pt_crl_14.cfk_upel AS Hop15,
tbl_rpt_val_pt_crl_15.pt_el_Int AS Int16,
tbl_rpt_val_pt_crl_15.user_cd AS cd16,
tbl_rpt_val_pt_crl_15.cfk_upel AS Hop16
FROM (SELECT DISTINCT pk_a AS aId,
p_a_l_out,
bs_id,
bs_db,
bs_tbl,
bs_column,
hop_pt_id_1,
hop_pt_id_2,
hop_pt_id_3,
hop_pt_id_4,
hop_pt_id_5,
hop_pt_id_6,
hop_pt_id_7,
hop_pt_id_8,
hop_pt_id_9,
hop_pt_id_10,
hop_pt_id_11,
hop_pt_id_12,
hop_pt_id_13,
hop_pt_id_14,
hop_pt_id_15,
hop_pt_id_16
FROM [FD\f7].tbl_sb12_bhs
WHERE [p_a_l_out] >= #bsrange
AND [p_a_l_out] < ( #bsrange + 1 )) AS tbl_sb12_bhs
LEFT JOIN tbl_rpt_val_pt_crl
ON tbl_sb12_bhs.hop_pt_id_1 = tbl_rpt_val_pt_crl.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_1
ON tbl_sb12_bhs.hop_pt_id_2 = tbl_rpt_val_pt_crl_1.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_2
ON tbl_sb12_bhs.hop_pt_id_3 = tbl_rpt_val_pt_crl_2.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_3
ON tbl_sb12_bhs.hop_pt_id_4 = tbl_rpt_val_pt_crl_3.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_4
ON tbl_sb12_bhs.hop_pt_id_5 = tbl_rpt_val_pt_crl_4.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_5
ON tbl_sb12_bhs.hop_pt_id_6 = tbl_rpt_val_pt_crl_5.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_6
ON tbl_sb12_bhs.hop_pt_id_7 = tbl_rpt_val_pt_crl_6.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_7
ON tbl_sb12_bhs.hop_pt_id_8 = tbl_rpt_val_pt_crl_7.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_8
ON tbl_sb12_bhs.hop_pt_id_9 = tbl_rpt_val_pt_crl_8.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_9
ON tbl_sb12_bhs.hop_pt_id_10 = tbl_rpt_val_pt_crl_9.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_10
ON tbl_sb12_bhs.hop_pt_id_11 = tbl_rpt_val_pt_crl_10.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_11
ON tbl_sb12_bhs.hop_pt_id_12 = tbl_rpt_val_pt_crl_11.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_12
ON tbl_sb12_bhs.hop_pt_id_13 = tbl_rpt_val_pt_crl_12.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_13
ON tbl_sb12_bhs.hop_pt_id_14 = tbl_rpt_val_pt_crl_13.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_14
ON tbl_sb12_bhs.hop_pt_id_15 = tbl_rpt_val_pt_crl_14.sk_el_pt
LEFT JOIN tbl_rpt_val_pt_crl AS tbl_rpt_val_pt_crl_15
ON tbl_sb12_bhs.hop_pt_id_16 = tbl_rpt_val_pt_crl_15.sk_el_pt
SET #bsrange = #bsrange + 1
END
My best guess is that it's slow because you're doing a number of intensive operations all in one go. Without any sample data it's tough, but I can try to make a few suggestions.
From what you said about it only processing 40 records after an hour, it's what's going on inside the loop that's slowing you down.
SELECT DISTINCT isn't cheap because it has to compare all the data, and you're comparing quite a lot of columns as well. If you can, it might run quicker if you limit the number of columns to the bare minimum required for a distinct selection then self joining that to the original table. It should be simple enough to test in isolation to the rest of it to make sure you're getting the same results and whether or not it's quicker.
Also the more joins you have, the worse the performance is in general... the price we pay for normalisation.
Anyway, I would take a step back from it and try to break this down into its smallest units of work and then you can test each one individually until you find the culprit. In doing so, you might think of a much better way to do this. Again, without any sample data this is a difficult one for me to help with.
Well if you have an index or indexes on the target then SQL will reindex every row. I'd disable any indexes on the target table and then renable them when the insert is complete. Id batch the inserts inro ranges of (say) 5k records so any blocking will be reduced, or I'd create a temp file as a result of the select and bcp in the results. Because your doing that horrendous set of left joins each time prior to one record insert. SQL just cant optimise more than about 7 or 8 left or right joins. My guess is that there are little or no indexes on the table being inserted from which means a table scan on for each join or around 17 tables scans for each one row inserted. Sorry but this approach is wrong at every stage. Or you could get you boss to buy you a datecentre....
Another thing you might want to do is do the join initially into a temp table and then reference that. You would not have to do the distinct or joins every time. Just add the where clause for the bsrange.
So it would be something like:
Create temporary table with as much of the joins/distinct as you can.
while.....
insert into [FD\f7].tbl_sb13_b_lin1
select * from temptable where [p_a_l_out] >= #bsrange
AND [p_a_l_out] < ( #bsrange + 1 )
I`m working on some sql queries to get some data out of a table; I have made 2 queries for the
same data but both give another result. The 2 queries are:
SELECT Samples.Sample,
data_overview.Sample_Name,
data_overview.Sample_Group,
data_overview.NorTum,
data_overview.Sample_Plate,
data_overview.Sentrix_ID,
data_overview.Sentrix_Position,
data_overview.HybNR,
data_overview.Pool_ID
FROM tissue INNER JOIN (
( patient INNER JOIN data_overview
ON patient.Sample = data_overview.Sample)
INNER JOIN Samples ON
(data_overview.Sample_id = Samples.Sample_id) AND
(patient.Sample = Samples.Sample)
) ON
(tissue.Sample_Name = data_overview.Sample_Name) AND
(tissue.Sample_Name = patient.Sample_Name)
WHERE data_overview.Sentrix_ID= 1416198
OR data_overview.Pool_ID='GS0005701-OPA'
OR data_overview.Pool_ID='GS0005702-OPA'
OR data_overview.Pool_ID='GS0005703-OPA'
OR data_overview.Pool_ID='GS0005704-OPA'
OR data_overview.Sentrix_ID= 1280307
ORDER BY Samples.Sample;")
And the other is
SELECT Samples.Sample,
data_overview.Sample_Name,
data_overview.Sample_Group,
data_overview.NorTum,
data_overview.Sample_Plate,
data_overview.Sentrix_ID,
data_overview.Sentrix_Position,
data_overview.HybNR,
data_overview.Pool_ID
FROM tissue INNER JOIN
(
(patient INNER JOIN data_overview
ON patient.Sample = data_overview.Sample)
INNER JOIN Samples ON
(data_overview.Sample_id = Samples.Sample_id)
AND (patient.Sample = Samples.Sample)) ON
(tissue.Sample_Name = data_overview.Sample_Name)
AND (tissue.Sample_Name = patient.Sample_Name)
WHERE ((
(data_overview.Sentrix_ID)=1280307)
AND (
(data_overview.Pool_ID)="GS0005701-OPA"
OR (data_overview.Pool_ID)="GS0005702-OPA"
OR (data_overview.Pool_ID)="GS0005703-OPA"
OR (data_overview.Pool_ID)="GS0005704-OPA"))
OR (((data_overview.Sentrix_ID)=1416198))
ORDER BY data_overview.Sample;
The one in the top is working quite well but it still won't filter the sentrix_ID.
The second 1 is created with Access but when I try to run this Query in R it gave
a unexpected symbol error. So if anyone knows how to create a query that filter POOL_ID and Sentrix_id with the given parameters thanks in advance
Is it a case of making the where clause something like this:
WHERE Sentrix_ID = 1280307 AND (Pool_ID = 'VAL1' OR Pool_ID = 'VAL2' OR Pool_ID = 'VAL3')
i.e. making sure you have brackets around the "OR" components?
Maybe you meant:
...
WHERE data_overview.Sentrix_ID IN (1280307,1416198 )
AND data_overview.Pool_ID IN ("GS0005701-OPA", "GS0005702-OPA", "GS0005703-OPA" ,"GS0005704-OPA")
;