Optimize SQL query with many left join - sql

I have a SQL query with many left joins
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
LEFT JOIN T_PLAN_TYPE tp ON tp.plan_type_id = po.Plan_Type_Fk
LEFT JOIN T_PRODUCT_TYPE pt ON pt.PRODUCT_TYPE_ID = po.cust_product_type_fk
LEFT JOIN T_PROPOSAL_TYPE prt ON prt.PROPTYPE_ID = po.proposal_type_fk
LEFT JOIN T_BUSINESS_SOURCE bs ON bs.BUSINESS_SOURCE_ID = po.CONT_AGT_BRK_CHANNEL_FK
LEFT JOIN T_USER ur ON ur.Id = po.user_id_fk
LEFT JOIN T_ROLES ro ON ur.roleid_fk = ro.Role_Id
LEFT JOIN T_UNDERWRITING_DECISION und ON und.O_Id = po.decision_id_fk
LEFT JOIN T_STATUS st ON st.STATUS_ID = po.piv_uw_status_fk
LEFT OUTER JOIN T_MEMBER_INFO mi ON mi.proposal_info_fk = po.O_ID
WHERE 1 = 1
AND po.CUST_APP_NO LIKE '%100010233976%'
AND 1 = 1
AND po.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10
The performance seems to be not good and I would like to optimize the query.
Any suggestions please?

Try this one -
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
WHERE PO.CUST_APP_NO LIKE '%100010233976%'
AND PO.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10

First, check your indexes. Are they old? Did they get fragmented? Do they need rebuilding?
Then, check your "execution plan" (varies depending on the SQL Engine): are all joins properly understood? Are some of them 'out of order'? Do some of them transfer too many data?
Then, check your plan and indexes: are all important columns covered? Are there any outstandingly lengthy table scans or joins? Are the columns in indexes IN ORDER with the query?
Then, revise your query:
- can you extract some parts that normally would quickly generate small rowset?
- can you add new columns to indexes so join/filter expressions will get covered?
- or reorder them so they match the query better?
And, supporting the solution from #Devart:
Can you eliminate some tables on the way? does the where touch the other tables at all? does the data in the other tables modify the count significantly? If neither SELECT nor WHERE never touches the other joined columns, and if the COUNT exact value is not that important (i.e. does that T_PROPOSAL_INFO exist?) then you might remove all the joins completely, as Devart suggested. LEFTJOINs never reduce the number of rows. They only copy/expand/multiply the rows.

Related

Trying to write SQL query where I need to join 5 tables

I am fairly new to SQL. I am working on a query script where I need to join date from five tables. I have no way to test this, so I am not sure if it's right.
SELECT tblQCC.CSN,
tblQCC.question_id,
tblQCC.answer,
tblEncounters.CSN,
tblEncounters.department_id,
tblEncounters.prc_id,
tblEncounters.patient_id,
tblEncounters.account_id,
tblEncounters.ser_id,
tblEncounters.visit_date,
tblPatient.patient_id,
tblPatient.patient_name_last,
tblPatient.patient_name_first,
tblPatient.MRN,
tblPatient.DOB,
tblAccount.account_id,
tblAccount.benefit_plan_name,
tblSer.ser_id,
tblSer.provider_name
FROM theQCC
JOIN tblEncounters
ON tblQCC.CSN = tblEncouter.CSN
JOIN tblPatient
ON tblEncounters.patient_id = tblPatient.patient_id
JOIN tblAccount
ON tblEncounters.account_id = tblAccount.account_id
JOIN tblSer
ON tblEncounters.ser_id = tblSer.ser_id
WHERE tblEncounters.depatement_id = 500
AND tblQCC.answer = ‘yes’
AND tblEncounters.visit_date <= 2016-12-10;
First off, what database?
Specify your JOINs. Are they INNER, OUTER, LEFT, RIGHT?
Do you have tables with acceptable null values coming in?
Quotes around the date part ... etc.

How can I do a SQL join to get a value 4 tables farther from the value provided?

My title is probably not very clear, so I made a little schema to explain what I'm trying to achieve. The xxxx_uid labels are foreign keys linking two tables.
Goal: Retrieve a column from the grids table by giving a proj_uid value.
I'm not very good with SQL joins and I don't know how to build a single query that will achieve that.
Actually, I'm doing 3 queries to perform the operation:
1) This gives me a res_uid to work with:
select res_uid from results where results.proj_uid = VALUE order by res_uid asc limit 1"
2) This gives me a rec_uid to work with:
select rec_uid from receptor_results
inner join results on results.res_uid = receptor_results.res_uid
where receptor_results.res_uid = res_uid_VALUE order by rec_uid asc limit 1
3) Get the grid column I want from the grids table:
select grid_name from grids
inner join receptors on receptors.grid_uid = grids.grid_uid
where receptors.rec_uid = rec_uid_VALUE;
Is it possible to perform a single SQL that will give me the same results the 3 I'm actually doing ?
You're not limited to one JOIN in a query:
select grids.grid_name
from grids
inner join receptors
on receptors.grid_uid = grids.grid_uid
inner join receptor_results
on receptor_results.rec_uid = receptors.rec_uid
inner join results
on results.res_uid = receptor_results.res_uid
where results.proj_uid = VALUE;
select g.grid_name
from results r
join resceptor_results rr on r.res_uid = rr.res_uid
join receptors rec on rec.rec_uid = rr.rec_uid
join grids g on g.grid_uid = rec.grid_uid
where r.proj_uid = VALUE
a small note about names, typically in sql the table is named for a single item not the group. thus "result" not "results" and "receptor" not "receptors" etc. As you work with sql this will make sense and names like you have will seem strange. Also, one less character to type!

Ignore null values in select statement

I'm trying to retrieve a list of components via my computer_system, BUT if a computer system's graphics card is set to null (I.e. It has an onboard), the row isn't returned by my select statement.
I've been trying to use COALESCE without results. I've also tried with and OR in my WHERE clause, which then just returns my computer system with all different kinds of graphic cards.
Relevant code:
SELECT
computer_system.cs_id,
computer_system.cs_name,
motherboard.name,
motherboard.price,
cpu.name,
cpu.price,
gfx.name,
gfx.price
FROM
public.computer_case ,
public.computer_system,
public.cpu,
public.gfx,
public.motherboard,
public.ram
WHERE
computer_system.cs_ram = ram.ram_id AND
computer_system.cs_cpu = cpu.cpu_id AND
computer_system.cs_mb = motherboard.mb_id AND
computer_system.cs_case = computer_case.case_id AND
computer_system.cs_gfx = gfx.gfx_id; <-- ( OR computer_system.cs_gfx IS NULL)
Returns:
1;"Computer1";"Fractal Design"; 721.00; "MSI Z87"; 982.00; "Core i7 I7-4770K "; 2147.00; "Crucial Gamer"; 1253.00; "ASUS GTX780";3328.00
Should I use Joins? Is there no easy way to say return the requested row, even if there's a bloody NULL value. Been struggling with this for at least 2 hours.
Tables will be posted if needed.
EDIT: It should return a second row:
2;"Computer2";"Fractal Design"; 721.00; "MSI Z87"; 982.00; "Core i7 I7-4770K "; 2147.00; "Crucial Gamer"; 1253.00; "null/nothing";null/nothing
You want a LEFT OUTER JOIN.
First, clean up your code so you use ANSI joins so it's readable:
SELECT
computer_system.cs_id,
computer_system.cs_name,
motherboard.name,
motherboard.price,
cpu.name,
cpu.price,
gfx.name,
gfx.price
FROM
public.computer_system
INNER JOIN public.computer_case ON computer_system.cs_case = computer_case.case_id
INNER JOIN public.cpu ON computer_system.cs_cpu = cpu.cpu_id
INNER JOIN public.gfx ON computer_system.cs_gfx = gfx.gfx_id
INNER JOIN public.motherboard ON computer_system.cs_mb = motherboard.mb_id
INNER JOIN public.ram ON computer_system.cs_ram = ram.ram_id;
Then change the INNER JOIN on public.gfx to a LEFT OUTER JOIN:
LEFT OUTER JOIN public.gfx ON computer_system.cs_gfx = gfx.gfx_id
See PostgreSQL tutorial - joins.
I very strongly recommend reading an introductory tutorial to SQL - at least the PostgreSQL tutorial, preferably some more material as well.
It looks like it's just a bracket placement issue. Pull the null check and the graphics card id comparison into a clause by itself.
...
computer_system.cs_case = computer_case.case_id AND
(computer_system.cs_gfx IS NULL OR computer_system.cs_gfx = gfx.gfx_id)
Additionally, you ask if you should use joins. You are in fact using joins, by virtue of having multiple tables in your FROM clause and specifying the join criteria in the WHERE clause. Changing this to use the JOIN ON syntax might be a little easier to read:
FROM sometable A
JOIN someothertable B
ON A.somefield = B.somefield
JOIN somethirdtable C
ON A.somefield = C.somefield
etc
Edit:
You also likely want to make the join where you expect the null value to be a left outer join:
SELECT * FROM
first_table a
LEFT OUTER JOIN second_table b
ON a.someValue = b.someValue
If there is no match in the join, the row from the left side will still be returned.

Timeout running SQL query

I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min
It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.
what about this?
SELECT auth_user.*,
C1.tickets_captured__count
C2.assigned_tickets__count
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with beter performance)

Need help optimizing this tSQL Query

I'm definitely not a DBA and unfortunately we don't have a DBA to consult within at our company. I was wondering if someone could give me a recommendation on how to improve this query, either by changing the query itself or adding indexes to the database.
Looking at the execution plan of the query it seems like the outer joins are killing the query. This query only returns 350k results, but it takes almost 30 seconds to complete. I don't know much about DB's, but I don't think this is good? Perhaps I'm wrong?
Any suggestions would be greatly appreciated. Thanks in advance.
As a side note this is obviously being create by an ORM and not me directly. We are using Linq-to-SQL.
SELECT
[t12].[value] AS [DiscoveryEnabled],
[t12].[value2] AS [isConnected],
[t12].[Interface],
[t12].[Description] AS [InterfaceDescription],
[t12].[value3] AS [Duplex],
[t12].[value4] AS [IsEnabled],
[t12].[value5] AS [Host],
[t12].[value6] AS [HostIP],
[t12].[value7] AS [MAC],
[t12].[value8] AS [MACadded],
[t12].[value9] AS [PortFast],
[t12].[value10] AS [PortSecurity],
[t12].[value11] AS [ShortHost],
[t12].[value12] AS [SNMPlink],
[t12].[value13] AS [Speed],
[t12].[value14] AS [InterfaceStatus],
[t12].[InterfaceType],
[t12].[value15] AS [IsUserPort],
[t12].[value16] AS [VLAN],
[t12].[value17] AS [Code],
[t12].[Description2] AS [Description],
[t12].[Host] AS [DeviceName],
[t12].[NET_OUID],
[t12].[DisplayName] AS [Net_OU],
[t12].[Enclave]
FROM (
SELECT
[t1].[DiscoveryEnabled] AS [value],
[t1].[IsConnected] AS [value2],
[t0].[Interface],
[t0].[Description],
[t2].[Duplex] AS [value3],
[t0].[IsEnabled] AS [value4],
[t3].[Host] AS [value5],
[t6].[Address] AS [value6],
[t3].[MAC] AS [value7],
[t3].[MACadded] AS [value8],
[t2].[PortFast] AS [value9],
[t2].[PortSecurity] AS [value10],
[t4].[Host] AS [value11],
[t0].[SNMPlink] AS [value12],
[t2].[Speed] AS [value13],
[t2].[InterfaceStatus] AS [value14],
[t8].[InterfaceType],
[t0].[IsUserPort] AS [value15],
[t2].[VLAN] AS [value16],
[t9].[Code] AS [value17],
[t9].[Description] AS [Description2],
[t7].[Host], [t7].[NET_OUID],
[t10].[DisplayName],
[t11].[Enclave],
[t7].[Decommissioned]
FROM [dbo].[IDB_Interface] AS [t0]
LEFT OUTER JOIN [dbo].[IDB_InterfaceLayer2] AS [t1] ON [t0].[IDB_Interface_ID] = [t1].[IDB_Interface_ID]
LEFT OUTER JOIN [dbo].[IDB_LANinterface] AS [t2] ON [t1].[IDB_InterfaceLayer2_ID] = [t2].[IDB_InterfaceLayer2_ID]
LEFT OUTER JOIN [dbo].[IDB_Host] AS [t3] ON [t2].[IDB_LANinterface_ID] = [t3].[IDB_LANinterface_ID]
LEFT OUTER JOIN [dbo].[IDB_Infrastructure] AS [t4] ON [t0].[IDB_Interface_ID] = [t4].[IDB_Interface_ID]
LEFT OUTER JOIN [dbo].[IDB_AddressMapIPv4] AS [t5] ON [t3].[IDB_AddressMapIPv4_ID] = ([t5].[IDB_AddressMapIPv4_ID])
LEFT OUTER JOIN [dbo].[IDB_AddressIPv4] AS [t6] ON [t5].[IDB_AddressIPv4_ID] = [t6].[IDB_AddressIPv4_ID]
INNER JOIN [dbo].[ART_Asset] AS [t7] ON [t7].[ART_Asset_ID] = [t0].[ART_Asset_ID]
LEFT OUTER JOIN [dbo].[NSD_InterfaceType] AS [t8] ON [t8].[NSD_InterfaceTypeID] = [t0].[NSD_InterfaceTypeID]
INNER JOIN [dbo].[NSD_InterfaceCode] AS [t9] ON [t9].[NSD_InterfaceCodeID] = [t0].[NSD_InterfaceCodeID]
INNER JOIN [dbo].[NET_OU] AS [t10] ON [t10].[NET_OUID] = [t7].[NET_OUID]
INNER JOIN [dbo].[NET_Enclave] AS [t11] ON [t11].[NET_EnclaveID] = [t10].[NET_EnclaveID]
) AS [t12]
WHERE ([t12].[Enclave] = 'USMC') AND (NOT ([t12].[Decommissioned] = 1))
LINQ-TO-SQL Query:
return from t in db.IDB_Interfaces
join v in db.IDB_InterfaceLayer3s on t.IDB_Interface_ID equals v.IDB_Interface_ID
join u in db.ART_Assets on t.ART_Asset_ID equals u.ART_Asset_ID
join c in db.NET_OUs on u.NET_OUID equals c.NET_OUID
join w in
(from d in db.IDB_InterfaceIPv4s
select new { d.IDB_InterfaceIPv4_ID, d.IDB_InterfaceLayer3_ID, d.IDB_AddressMapIPv4_ID, d.IDB_AddressMapIPv4.IDB_AddressIPv4.Address })
on v.IDB_InterfaceLayer3_ID equals w.IDB_InterfaceLayer3_ID
join h in db.NET_Enclaves on c.NET_EnclaveID equals h.NET_EnclaveID into enclaveLeftJoin
from i in enclaveLeftJoin.DefaultIfEmpty()
join m in
(from z in db.IDB_StandbyIPv4s
select new
{
z.IDB_InterfaceIPv4_ID,
z.IDB_AddressMapIPv4_ID,
z.IDB_AddressMapIPv4.IDB_AddressIPv4.Address,
z.Preempt,
z.Priority
})
on w.IDB_InterfaceIPv4_ID equals m.IDB_InterfaceIPv4_ID into standbyLeftJoin
from k in standbyLeftJoin.DefaultIfEmpty()
where t.ART_Asset.Decommissioned == false
select new NetIDBGridDataResults
{
DeviceName = u.Host,
Host = u.Host,
Interface = t.Interface,
IPAddress = w.Address,
ACLIn = v.InboundACL,
ACLOut = v.OutboundACL,
VirtualAddress = k.Address,
VirtualPriority = k.Priority,
VirtualPreempt = k.Preempt,
InterfaceDescription = t.Description,
Enclave = i.Enclave
};
As a rule (and this is very general), you want an index on:
JOIN fields (both sides)
Common WHERE filter fields
Possibly fields you aggregate
For this query, start with checking your JOIN criteria. Any one of those missing will force a table scan which is a big hit.
Looking at the execution plan of the query it seems like the outer joins are killing the query.
This query only returns 350k results, but it takes almost 30 seconds to complete. I don't know
much about DB's, but I don't think this is good? Perhaps I'm wrong?
A man has got to do waht a mana has got to do.
The joins may kill you, but when you need them YOU NEED THEM. Some tasks take long.
Make sure you ahve all indices you need.
Make sure your sql server is not a sad joke hardware wise.
All you can do.
I woudl bet someone has no clue about SQL and needs to be enlighted to the power of indices.