Join optimization PostgreSQL - sql

I have two tables: Calls (10,000 rows) and CRM (25 million rows).
I want to do Calls left join CRM:
select *
from calls a
left join crm b
on (
(a.customerID = b.customerID)
OR
(a.Number1 in (b.Number_A,b.Number_B))
OR
(a.Number2 in (b.Number_A,b.Number_B))
);
When I join on customerID alone, it runs fine. But the query above times out and crashes.

I would suggest multiple left joins:
select c.*,
coalesce(cc.col1, c1a.col1, c1b.col1, c2a.col1, c2b.col1)
from calls c left join
crm cc
on c.customerID = cc.customerID left join
crm c1a
on c.Number1 = c1a.Number_A left join
crm c1b
on c.Number1 = c1b.Number_B left join
crm c2a
on c.Number2 = c2a.Number_A left join
crm c2b
on c.Number2 = c2b.Number_B;
This can then take advantage of indexes on crm(customerID), crm(Number_A), and crm(Number_B).
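As a rough illustration of the multiple-left-join approach, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The sample rows, the id columns, and the col1 column are invented for the example. It only shows the shape of the query and the indexes, not the performance on 25 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (id INTEGER PRIMARY KEY, customerID INTEGER, Number1 TEXT, Number2 TEXT);
CREATE TABLE crm (id INTEGER PRIMARY KEY, customerID INTEGER, Number_A TEXT, Number_B TEXT, col1 TEXT);
-- One single-column index per crm column that appears in a join condition.
CREATE INDEX crm_customer ON crm (customerID);
CREATE INDEX crm_number_a ON crm (Number_A);
CREATE INDEX crm_number_b ON crm (Number_B);
-- Hypothetical sample data: call 1 matches by customer, call 2 by number, call 3 not at all.
INSERT INTO calls VALUES (1, 10, '555-0001', NULL), (2, 99, '555-0002', NULL), (3, 98, NULL, NULL);
INSERT INTO crm VALUES (100, 10, '555-1111', NULL, 'by customer'), (101, 20, NULL, '555-0002', 'by number');
""")

# Each join uses a plain equality, so each one can use its own index.
rows = conn.execute("""
SELECT c.id, COALESCE(cc.col1, c1a.col1, c1b.col1, c2a.col1, c2b.col1) AS col1
FROM calls c
LEFT JOIN crm cc  ON c.customerID = cc.customerID
LEFT JOIN crm c1a ON c.Number1 = c1a.Number_A
LEFT JOIN crm c1b ON c.Number1 = c1b.Number_B
LEFT JOIN crm c2a ON c.Number2 = c2a.Number_A
LEFT JOIN crm c2b ON c.Number2 = c2b.Number_B
ORDER BY c.id
""").fetchall()
print(rows)
```

Note that when a call matches several crm rows through different conditions, the joins multiply and COALESCE picks the first non-NULL value in the listed order, so the result is not identical to the OR-based join.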

Sometimes, replacing one query that contains two conditions combined with OR by two queries glued together with UNION results in a better execution plan. I have never understood why DBMS optimizers don't take this into consideration themselves, and I don't know whether it holds for PostgreSQL. But it may be worth a try.
In your case there is an outer join in the query, which complicates matters. With the separate queries we may get both an outer-joined (unmatched) row and matching crm rows for the same call, and in that case we must get rid of the unmatched row.
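select *
from
(
-- call_id and crm_id are aliased so that the outer query can refer to them
select calls.id as call_id, crm.id as crm_id, calls.*, crm.* from calls left join crm on crm.customerID = calls.customerID
union
select calls.id as call_id, crm.id as crm_id, calls.*, crm.* from calls left join crm on crm.number_a = calls.number1
union
select calls.id as call_id, crm.id as crm_id, calls.*, crm.* from calls left join crm on crm.number_a = calls.number2
union
select calls.id as call_id, crm.id as crm_id, calls.*, crm.* from calls left join crm on crm.number_b = calls.number1
union
select calls.id as call_id, crm.id as crm_id, calls.*, crm.* from calls left join crm on crm.number_b = calls.number2
) data
order by rank() over (partition by call_id order by case when crm_id is null then 2 else 1 end)
fetch first row with ties;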
For this to work fast you should have one index per column in the query, i.e. six single-column indexes.
Whether this is faster than your original query depends on a lot of things. Mainly: the fewer matches the better.
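The UNION idea can be tried out on a toy data set. The following is a minimal sketch in Python with the built-in sqlite3 module, not PostgreSQL. SQLite has no FETCH FIRST ... WITH TIES, so the sketch filters on the rank in an outer query instead; it assumes SQLite 3.25 or later for window functions. The id columns and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (id INTEGER PRIMARY KEY, customerID INTEGER, number1 TEXT, number2 TEXT);
CREATE TABLE crm (id INTEGER PRIMARY KEY, customerID INTEGER, number_a TEXT, number_b TEXT);
-- Hypothetical data: call 1 matches two crm rows, call 2 matches none.
INSERT INTO calls VALUES (1, 10, '555-0001', NULL), (2, 99, NULL, NULL);
INSERT INTO crm VALUES (100, 10, NULL, NULL), (101, 20, '555-0001', NULL);
""")

# One left join per equality condition, glued together with UNION.
conditions = [
    "crm.customerID = calls.customerID",
    "crm.number_a = calls.number1",
    "crm.number_a = calls.number2",
    "crm.number_b = calls.number1",
    "crm.number_b = calls.number2",
]
branches = " UNION ".join(
    "SELECT calls.id AS call_id, crm.id AS crm_id "
    f"FROM calls LEFT JOIN crm ON {condition}"
    for condition in conditions
)

# Rank matched rows ahead of unmatched ones per call and keep only rank 1,
# which drops the NULL-extended row whenever a real match exists.
rows = conn.execute(f"""
SELECT call_id, crm_id
FROM (
    SELECT call_id, crm_id,
           RANK() OVER (PARTITION BY call_id
                        ORDER BY CASE WHEN crm_id IS NULL THEN 2 ELSE 1 END) AS rnk
    FROM ({branches}) data
) ranked
WHERE rnk = 1
ORDER BY call_id, crm_id
""").fetchall()
print(rows)
```

Call 1 keeps both of its matches and loses the unmatched rows produced by the other branches, while call 2 keeps its single NULL-extended row.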

Related

Request Optimization (CTE, multiple LEFT JOIN, WHERE with OR)

Can anyone give me advice on how to optimize this request?
The real query has more left joins than this example (20+), mainly to fetch values through foreign keys. What optimization is possible there?
A CTE is used to create the aggregates, but the tables in the CTE are also used in the main query. Is the CTE still useful?
The WHERE clause has a simple condition on the main table and a second condition that ORs fields from several tables. Would it be better to add a column holding the max date of the three fields and use a simple second condition (without OR)?
SQL Server 2015+
WITH
cte AS
(
SELECT
e_ofcte.id,
SUM(CASE WHEN f_ofcte.lib='G' THEN 1 ELSE 0 END) AS n1,
SUM(CASE WHEN f_ofcte.lib='H' THEN 1 ELSE 0 END) AS n2
FROM e_ofcte
INNER JOIN f_ofcte ON f_ofcte.id=e_ofcte.id
WHERE f_ofcte.lib IN ('G','H')
AND e_ofcte.date>=DATEFROMPARTS(YEAR(CURRENT_TIMESTAMP)-2,1,1)
GROUP BY
e_ofcte.id
)
SELECT
a.id,
b.sid,
c.sid,
cte.n1,
cte.n2
FROM a
LEFT JOIN cte ON a.id=cte.id
LEFT JOIN b ON a.id=b.id
LEFT JOIN c ON a.id=c.id
LEFT JOIN e_ofcte ON a.id=e_ofcte.id
LEFT JOIN i ON a.id=i.id
LEFT JOIN j ON a.id=j.id
LEFT JOIN f_ofcte ON a.id=f_ofcte.id
WHERE a.code='A'
AND
(
a.date>=DATEFROMPARTS(YEAR(CURRENT_TIMESTAMP)-2,1,1)
OR
b.date>=DATEFROMPARTS(YEAR(CURRENT_TIMESTAMP)-2,1,1)
OR
c.date>=DATEFROMPARTS(YEAR(CURRENT_TIMESTAMP)-2,1,1)
)
If you move the "OR" conditions to the JOIN, it will return different results. My answer would be "no", unless you know exactly what you are doing.
There are several approaches you can try to improve performance:
Move the CTE to a temporary table if you can. That makes the main query smaller, which helps the optimizer come up with a good plan. You can also tune the two parts separately.
Possibly build a filtered index on table A(id,date) with "WHERE code='A'". That would work if the number of filtered records is relatively small.
Possibly build a filtered index on table f_ofcte(id) with "WHERE lib IN ('G','H')".
Build indexes on the other tables on (id,date).
Not sure if you provided the full query, but it looks like the following part is completely unused:
LEFT JOIN e_ofcte ON a.id=e_ofcte.id
LEFT JOIN i ON a.id=i.id
LEFT JOIN j ON a.id=j.id
LEFT JOIN f_ofcte ON a.id=f_ofcte.id
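The first suggestion (materialise the aggregate in a temporary table, then join to it) might look roughly like the sketch below. It uses Python's built-in sqlite3 module rather than SQL Server, drops the date filter and most of the joins for brevity, and the temp table name agg and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE f_ofcte (id INTEGER, lib TEXT);
-- Hypothetical sample data.
INSERT INTO a VALUES (1, 'A'), (2, 'A'), (3, 'B');
INSERT INTO f_ofcte VALUES (1, 'G'), (1, 'G'), (1, 'H'), (2, 'X');

-- Step 1: compute the aggregate once and store it, instead of inlining it as a CTE.
CREATE TEMP TABLE agg AS
SELECT id,
       SUM(CASE WHEN lib = 'G' THEN 1 ELSE 0 END) AS n1,
       SUM(CASE WHEN lib = 'H' THEN 1 ELSE 0 END) AS n2
FROM f_ofcte
WHERE lib IN ('G', 'H')
GROUP BY id;
""")

# Step 2: the main query is now smaller and joins to the stored aggregate.
rows = conn.execute("""
SELECT a.id, agg.n1, agg.n2
FROM a
LEFT JOIN agg ON a.id = agg.id
WHERE a.code = 'A'
ORDER BY a.id
""").fetchall()
print(rows)
```

Splitting the work this way lets each step be inspected and tuned on its own, which is the point the answer makes about tuning the two parts separately.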

Inner Join and Left Join on 5 tables in Access using SQL

I am attempting to access data from the following tables:
OrgPlanYear
ProjOrgPlnYrJunction
DC
DCMaxEEContribLevel
DCNonDiscretionaryContribLevel
Basically, I need to inner join OrgPlanYear, DC and ProjOrgPlnYrJunction, and then left join the remaining tables (tables 4 and 5), because tables 1-3 have all the rows I need and only some of those rows have data in tables 4-5. I need several variables from each table. I also need the WHERE condition to apply across all tables (meaning I want all this data for a select group where projectID=919).
Please help!
I have tried many things and got errors, including when using the Query Design view (e.g. JOIN expression errors, a badly formatted FROM clause, etc.). Here is one example, which leaves out most of the variables I need:
SELECT
ProjOrgPlnYrJunction.fkeyProjectID, OrgPlanYear.OrgName, DC.PlanCode, DCNonDiscretionaryContribLevel.Age,DCNonDiscretionaryContribLevel.Service
FROM
(((OrgPlanYear INNER JOIN DC ON OrgPlanYear.OrgPlanYearID = DC.fkeyOrgPlanYearID) INNER JOIN ProjOrgPlnYrJunction ON OrgPlanYear.OrgPlanYearID = ProjOrgPlnYrJunction.fkeyOrgPlanYearID)
LEFT JOIN
(SELECT DCNonDiscretionaryContribLevel.Age AS Age, DCNonDiscretionaryContribLevel.Service AS Service FROM DCNonDiscretionaryContribLevel WHERE ProjOrgPlnYrJunction.fkeyProjectID)=919)
LEFT JOIN (
SELECT DCMaxEEContribLevel.EEContribRoth FROM EEContribRoth WHERE ProjOrgPlnYrJunction.fkeyProjectID)=919)
ORDER BY OrgPlanYear.OrgName;
Main issues with your query:
Missing ON clauses for each LEFT JOIN.
Referencing other table columns in SELECT and WHERE of a different subquery (e.g., FROM DCNonDiscretionaryContribLevel WHERE ProjOrgPlnYrJunction.fkeyProjectID).
Unmatched parentheses around subqueries and joins per Access SQL requirements.
See below adjusted SQL that now uses short table aliases. Be sure to adjust SELECT and ON clauses with appropriate columns.
SELECT p.fkeyProjectID, o.OrgName, DC.PlanCode, dcn.Age, dcn.Service, e.EEContribRoth
FROM (((OrgPlanYear o
INNER JOIN DC
ON o.OrgPlanYearID = DC.fkeyOrgPlanYearID)
INNER JOIN ProjOrgPlnYrJunction p
ON o.OrgPlanYearID = p.fkeyOrgPlanYearID)
LEFT JOIN
(SELECT fkeyProjectID, Age, Service
FROM DCNonDiscretionaryContribLevel
WHERE fkeyProjectID = 919) AS dcn
ON dcn.fkeyProjectID = p.fkeyProjectID)
LEFT JOIN
(SELECT fkeyProjectID, EEContribRoth
FROM DCMaxEEContribLevel
WHERE fkeyProjectID = 919) AS e
ON e.fkeyProjectID = p.fkeyProjectID
WHERE p.fkeyProjectID = 919
ORDER BY o.OrgName;

Finding multiple results in SQL Server 2012 (not duplicates)

I need some help. I've got a list of customers and services that they've used, but I need to narrow that list to customers that have used more than one service (excluding those who've only used one service). They have sometimes used the same service more than once, but I need a list of unique services.
The below brings back the main list of customers.
SELECT
DISTINCT M.CustID
,S.ServiceID
,R.ReceivedDate
,S.ServiceRequestID
FROM Customers AS M
LEFT OUTER JOIN CustomerDates AS R ON M.CustID = R.CustID
LEFT OUTER JOIN Service1 AS S ON R.ServiceRequestID = S.ServiceRequestID
WHERE S.CloseDate IS NULL
What I need is a list that excludes the first three lines as they have only used one service, whereas the next seven I need as they've used more than one service.
It is quite likely that you can just use EXISTS. This is likely to do what you want:
SELECT M.CustID, S.ServiceID, R.ReceivedDate, S.ServiceRequestID
FROM Customers M JOIN
CustomerDates R
ON M.CustID = R.CustID JOIN
Service1 S
ON R.ServiceRequestID = S.ServiceRequestID
WHERE S.CloseDate IS NULL AND
EXISTS (SELECT 1
FROM CustomerDates cd2
WHERE cd2.CustId = m.custId AND cd2.ServiceRequestID <> r.ServiceRequestID
);
I'm not 100% sure this is equivalent to your query. The SELECT DISTINCT should not be needed, unless you have duplicates in your tables. Also, you seem to require matches between the tables, so LEFT JOIN is not appropriate. It is not clear if the WHERE condition is relevant for finding duplicates.
Try the following, it should work
SELECT
CustID
,ServiceID
,ReceivedDate
,ServiceRequestID
FROM
(SELECT
DISTINCT M.CustID
,S.ServiceID
,R.ReceivedDate
,S.ServiceRequestID
,count(*) over (partition by M.CustID) as total
FROM Customers AS M
LEFT OUTER JOIN CustomerDates AS R ON M.CustID = R.CustID
LEFT OUTER JOIN Service1 AS S ON R.ServiceRequestID = S.ServiceRequestID
WHERE S.CloseDate IS NULL
) vals
where total > 1
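One caveat with the window-function version: count(*) over (...) counts joined rows rather than distinct services, so a customer who used the same service twice would also pass total > 1. A hedged alternative, not taken from either answer, is to pick customers with GROUP BY ... HAVING COUNT(DISTINCT ServiceID) > 1. The sketch below uses Python's built-in sqlite3 module with a single simplified table, service_usage, and made-up rows in place of the three joined tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened view of the Customers / CustomerDates / Service1 join.
CREATE TABLE service_usage (CustID INTEGER, ServiceID TEXT, ServiceRequestID INTEGER);
INSERT INTO service_usage VALUES
  (1, 'S1', 1001),                   -- one service, used once
  (2, 'S1', 1002), (2, 'S1', 1003),  -- one service, used twice
  (3, 'S1', 1004), (3, 'S2', 1005);  -- two different services
""")

# Keep only customers whose number of DISTINCT services is greater than one.
rows = conn.execute("""
SELECT DISTINCT u.CustID, u.ServiceID
FROM service_usage u
WHERE u.CustID IN (
    SELECT CustID
    FROM service_usage
    GROUP BY CustID
    HAVING COUNT(DISTINCT ServiceID) > 1
)
ORDER BY u.CustID, u.ServiceID
""").fetchall()
print(rows)
```

Customer 2 is excluded here even though they have two requests, because both requests are for the same service.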

Difference between Where and Join on Id

I recently saw this query, which finds all the parties a client can go to:
SELECT *
FROM Party
INNER JOIN Organizer on Organizer.OrganizerId = Party.OrganizerId
LEFT JOIN Client on Client.ClientID = 1
LEFT JOIN PartyRegistration on PartyRegistration.PartyId = Party.PartyId
WHERE Party.OrganizerId = 0
AND (Party.HasGuestList = 0 OR PartyRegistration.ClientId = Client.ClientId)
I had never seen a join on a specific value before. Is it normal to see SQL code like this?
I don't know much about left joins, but the same question applies to any join. For example, how would this:
SELECT *
FROM Party
INNER JOIN Organizer on Organizer.OrganizerId = 0
compare to this, given that the results are the same:
SELECT *
FROM Party
INNER JOIN Organizer on Organizer.OrganizerId = Party.OrganizerId
WHERE Organizer.OrganizerId = 0
This is very good practice -- in fact, you cannot (easily) get this logic in a WHERE clause.
A LEFT JOIN returns all rows in the first table -- even when there are no matches in the second.
So, this returns all rows in the preceding tables -- and any rows from Client where ClientId = 1. If there is no match on that ClientId, then the columns will be NULL, but the rows are not filtered.
This can only be a matter of good/bad practice if you compare to an alternative. Putting a test in a left join on vs a where does two different things--so it's not a matter of good/bad practice.
If that is the correct left join condition, meaning you want inner join rows on that condition plus unmatched left table rows, then that is the left join condition. It wouldn't go anywhere else.
Learn what left join on returns: inner join on rows plus unmatched left table rows extended by nulls. Always know what inner join you want as part of a left join.
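To make the difference concrete, here is a small sketch using Python's built-in sqlite3 module. It compares a constant test placed in the LEFT JOIN's ON clause with the same test placed in WHERE. The tables are cut down to their key columns and the rows are invented; note that no client with ClientId = 1 exists in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Party (PartyId INTEGER PRIMARY KEY);
CREATE TABLE Client (ClientId INTEGER PRIMARY KEY);
INSERT INTO Party VALUES (1), (2);
INSERT INTO Client VALUES (7);  -- hypothetical data: there is no ClientId = 1
""")

# Test in ON: unmatched Party rows are kept and extended with NULLs.
on_rows = conn.execute("""
SELECT Party.PartyId, Client.ClientId
FROM Party
LEFT JOIN Client ON Client.ClientId = 1
ORDER BY Party.PartyId
""").fetchall()

# Test in WHERE: it runs after the join and filters the rows away.
where_rows = conn.execute("""
SELECT Party.PartyId, Client.ClientId
FROM Party
LEFT JOIN Client ON 1 = 1
WHERE Client.ClientId = 1
ORDER BY Party.PartyId
""").fetchall()

print(on_rows)
print(where_rows)
```

With the test in ON, both parties survive with a NULL client; with the test in WHERE, nothing is returned.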
That is true: a left join on a specific value is generally bad practice. Sometimes, though, we need all the columns from a table that has no common column to join on, and so have to join on a specific condition such as A = "some value". Even then, a LEFT JOIN on a constant is bad practice, and it can be written a little better. The updated code is below; let me know if you have any questions.
SELECT *
FROM Party
INNER JOIN Organizer on Organizer.OrganizerId = Party.OrganizerId
LEFT JOIN Client USING(CLIENTID)
LEFT JOIN PartyRegistration on PartyRegistration.PartyId = Party.PartyId
WHERE CLIENTID=1 AND Party.OrganizerId = 0
AND (Party.HasGuestList = 0 OR PartyRegistration.ClientId = Client.ClientId)

Left join effectiveness when using IS NULL

I'm using a left join to check if certain types of information have been stored in the database.
I'm wondering whether a lot of resources will be wasted if the joined table contains many rows that match the JOIN clause.
i.e.:
SELECT Applications.*
FROM Applications
LEFT JOIN SomeFeatureRows ON (SomeFeatureRows.ApplicationId = Applications.Id)
WHERE SomeFeatureRows.Id IS NULL;
Does the DB scan through all rows in SomeFeatureRows to see if there is a row where Id is NULL?
I just want to check if there is a row or not in that table (with the specified application id).
Edit, might as well include the real SQL statement:
SELECT organizations.id AS OrganizationId,
organizations.Name,
applications.Id AS ApplicationId,
applications.Name AS ApplicationName,
Account.id AS AccountId,
Account.Email,
Account.Username,
SentEmails.SentAtUtc
FROM organizations
INNER JOIN applications ON ( organizations.id = applications.organizationid )
LEFT JOIN Incidents ON ( organizations.id = Incidents.organizationid )
LEFT JOIN SentEmails ON ( organizations.id = SentEmails.organizationid AND EmailTypeName = 'IncidentsReminder')
CROSS apply (SELECT accounts.id,
accounts.email,
accounts.username
FROM accounts,
organizationmembers
WHERE accounts.id = organizationmembers.accountid
AND organizationmembers.organizationid =
organizations.id)
Account
WHERE Incidents.id IS NULL
Here is a very good article explaining the different techniques and their performance: NOT EXISTS vs. NOT IN vs. LEFT JOIN / IS NULL
To summarize:
LEFT JOIN / IS NULL is less efficient, since it makes no attempt to skip the already matched values in the right table; it returns all results and filters them out afterwards. Use NOT EXISTS for best performance, as it produces a LEFT ANTI SEMI JOIN in the execution plan.
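The two formulations can be compared side by side in a small sketch. It uses Python's built-in sqlite3 module with made-up rows, so it only shows that both queries return the same applications; the execution-plan difference described above is specific to the database engine and is not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Applications (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE SomeFeatureRows (Id INTEGER PRIMARY KEY, ApplicationId INTEGER);
-- Hypothetical data: application 1 has three feature rows, application 2 has none.
INSERT INTO Applications VALUES (1, 'with rows'), (2, 'without rows');
INSERT INTO SomeFeatureRows VALUES (10, 1), (11, 1), (12, 1);
""")

# LEFT JOIN / IS NULL: join everything, then keep only the unmatched rows.
left_join = conn.execute("""
SELECT Applications.Id
FROM Applications
LEFT JOIN SomeFeatureRows ON SomeFeatureRows.ApplicationId = Applications.Id
WHERE SomeFeatureRows.Id IS NULL
""").fetchall()

# NOT EXISTS: ask directly whether any matching row exists.
not_exists = conn.execute("""
SELECT Applications.Id
FROM Applications
WHERE NOT EXISTS (
    SELECT 1
    FROM SomeFeatureRows
    WHERE SomeFeatureRows.ApplicationId = Applications.Id
)
""").fetchall()

print(left_join, not_exists)
```

In both cases the IS NULL test or the subquery is evaluated per application, so the database is not searching SomeFeatureRows for rows whose Id is NULL.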