Need help understanding a complex query with multiple join conditions - sql

I have a query that I am trying to understand. Can someone shed light on the details of what this query does?
I've only ever used a single condition in an ON clause. This one has multiple conditions for the LEFT JOIN, making it tricky to understand.
INSERT INTO nop_tbl
(q_date, community_id, newsletter_t, subscription_count)
SELECT date(now()), a.community_id,
a.newsletter_type,
count(a.subscriber_user_id)
FROM
newsletter_subscribers_main a
LEFT OUTER JOIN nop_tbl b
ON (a.community_id = b.community_id)
AND (a.newsletter_type = b.newsletter_t)
AND (a.created_at = b.q_date)
WHERE b.q_date is null
AND b.mailing_list is null
GROUP BY a.community_id, a.newsletter_t, a.created_at

You have your explanation:
The objective of the query is to count subscriptions per (q_date, community_id, newsletter_t) in newsletter_subscribers_main and write the result to nop_tbl.
The LEFT JOIN prevents rows from being added multiple times.
But I also think the query is inefficient and probably wrong.
The second WHERE condition:
AND b.mailing_list is null
is just noise and can be removed. If b.q_date is null, then b.mailing_list is guaranteed to be null in this query.
You don't need parentheses around JOIN conditions.
If subscriber_user_id is defined NOT NULL, count(*) does the same, cheaper.
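For illustration, a tiny hypothetical example of that difference (table and data are made up): count(col) skips NULLs, count(*) counts every row.
create table t (subscriber_user_id int);
insert into t values (1), (null);

select count(subscriber_user_id) as c_col  -- 1, the NULL is skipped
     , count(*) as c_all                   -- 2, every row is counted
from t;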
I suspect that grouping by a.created_at while you insert date(now()) is probably wrong; it hardly makes any sense. My educated guess (assuming that created_at is of type date):
INSERT INTO nop_tbl
(q_date, community_id, newsletter_t, subscription_count)
SELECT a.created_at
,a.community_id
,a.newsletter_type
,count(*)
FROM newsletter_subscribers_main a
LEFT JOIN nop_tbl b ON a.community_id = b.community_id
AND a.newsletter_type = b.newsletter_t
AND a.created_at = b.q_date
WHERE b.q_date IS NULL
GROUP BY a.created_at, a.community_id, a.newsletter_type;

The short short version is:
insert ... select ...
-> the query is filling nop_tbl
from ...
-> based on data in newsletter_subscribers_main
left join ... where ... is null
-> that are not already present in nop_tbl
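A minimal sketch of that anti-join pattern on made-up tables (names are hypothetical, only for illustration):
-- src holds incoming rows, dst holds rows already loaded
create table src (k int);
create table dst (k int);
insert into src values (1), (2);
insert into dst values (1);

-- only k = 2 survives: the matched row (k = 1) gets a non-null dst.k,
-- and the WHERE ... IS NULL filter keeps only the unmatched rows
select s.k
from src s
left join dst d on d.k = s.k
where d.k is null;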

Step by step
INSERT INTO nop_tbl
(q_date, community_id, newsletter_t, subscription_count)
This is the INSERT clause: it tells the database which table and which columns will receive the inserted rows.
SELECT date(now()), a.community_id,
a.newsletter_type,
count(a.subscriber_user_id)
These are the selected values that will be inserted.
FROM
newsletter_subscribers_main a
This tells the database to read from the table newsletter_subscribers_main, whose columns are referenced with the alias a.
LEFT OUTER JOIN nop_tbl b
This left joins another table, nop_tbl, aliased as b, from which other columns can be referenced.
ON (a.community_id = b.community_id)
AND (a.newsletter_type = b.newsletter_t)
AND (a.created_at = b.q_date)
These are the join conditions: they specify which columns the two tables are matched on. All three conditions must be true for a pair of rows to join.
WHERE b.q_date is null
AND b.mailing_list is null
These are the WHERE conditions; they limit the result to the requested rows, in this case rows where both columns are NULL, i.e. rows with no match in nop_tbl.
GROUP BY a.community_id, a.newsletter_t, a.created_at
The GROUP BY clause groups the result on the given columns.
You can find a visual explanation of joins here.
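To see why all three ON conditions matter, here is a minimal hypothetical sketch (table names a1 and b1 are made up): a pair of rows only joins when every condition is true at once.
create table a1 (community_id int, newsletter_type varchar(10), created_at date);
create table b1 (community_id int, newsletter_t varchar(10), q_date date);

insert into a1 values (1, 'daily', '2020-01-01');
insert into b1 values (1, 'weekly', '2020-01-01');

-- community_id and the date match, but the newsletter type does not,
-- so the LEFT JOIN returns NULLs for b1 and the row would count as
-- "not already present" in the main query
select a.*, b.newsletter_t
from a1 a
left join b1 b
  on a.community_id = b.community_id
 and a.newsletter_type = b.newsletter_t
 and a.created_at = b.q_date;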

Related

Multiple left joins in a single statement

I have multiple queries that I'm trying to combine into a single one with no luck. I'm using left join on the same table twice with a different field and that sounds wrong.
SELECT a.*
, b.CODE_DESCRIPTION AS highest_grade
FROM BBOP.EP_MAIN_FACT a
LEFT JOIN BBOP.EP_CODE_WORK b
ON a.HIGHESTGRADE_CA = b.code
AND code_type LIKE 'High%'
LEFT JOIN BBOP.EP_CODE_WORK ab
ON a.Goal_Steps = ab.code
AND code_type LIKE 'Goal%'
WHERE plan_date BETWEEN '01-mar-2019' AND '31-may-2019';
-- ORA-00918: column ambiguously defined
00918. 00000 - "column ambiguously defined"
Here are the 2 queries, separately they produce results with no problems.
-- Highest Grade
SELECT a.*
, b.CODE_DESCRIPTION AS highest_grade_desc
FROM BBOP.EP_MAIN_FACT a
LEFT JOIN BBOP.EP_CODE_WORK b
ON a.HIGHESTGRADE_CA = b.code
WHERE plan_date BETWEEN '01-mar-2019' AND '31-may-2019'
AND code_type LIKE 'High%';
-- Goals
SELECT a.*
, b.CODE_DESCRIPTION AS Goal
FROM BBOP.EP_MAIN_FACT a
LEFT JOIN BBOP.EP_CODE_WORK b
ON a.Goal_Steps = b.code
WHERE plan_date BETWEEN '01-mar-2019' AND '31-may-2019'
AND code_type LIKE 'Goal%';
I think you want:
select mf.*,
       coalesce(cwh.CODE_DESCRIPTION, cwg.CODE_DESCRIPTION) as highest_grade
from BBOP.EP_MAIN_FACT mf
left join BBOP.EP_CODE_WORK cwh
       on mf.HIGHESTGRADE_CA = cwh.code
      and cwh.code_type like 'High%'
left join BBOP.EP_CODE_WORK cwg
       on mf.Goal_Steps = cwg.code
      and cwg.code_type like 'Goal%'
where mf.plan_date >= date '2019-03-01'
  and mf.plan_date < date '2019-06-01';
Notes:
In a query that references multiple tables, qualify all column references. This is the root of your problem. You have "bare" column references in the on clauses.
Use meaningful table aliases, rather than arbitrary letters.
The coalesce() chooses the values based on the priority order of the joins.
Oracle supports the date keyword to introduce date literals. This is safer than relying on default formats which may change on a given server.
between is dangerous for dates, particularly in Oracle where the date type always has a time component. Inequalities capture the logic.
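A small, hypothetical illustration of the last two notes, in Oracle syntax (runs against DUAL, no real table needed):
select case
         when to_date('2019-05-31 12:00', 'YYYY-MM-DD HH24:MI')
              between date '2019-03-01' and date '2019-05-31'
         then 'kept' else 'lost'
       end as between_result,   -- 'lost': noon is after midnight on 31 May
       case
         when to_date('2019-05-31 12:00', 'YYYY-MM-DD HH24:MI') >= date '2019-03-01'
          and to_date('2019-05-31 12:00', 'YYYY-MM-DD HH24:MI') <  date '2019-06-01'
         then 'kept' else 'lost'
       end as half_open_result  -- 'kept'
from dual;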
As someone has already stated in the comments, the columns plan_date and code_type need a table alias specified wherever they are used.
I am guessing at least one of these columns is in the BBOP.EP_CODE_WORK table, and hence the table alias needs to be specified when referencing these columns.
Regards
Akash

SQL - Table Join To Compare NULL Values Where Join is NULL

I've been asked to basically write a report that displays data in two different databases and be able to see in either database if something is missing.
IE, the invoice number may exist in database1, but not in database2 and vice versa.
I've got the following query, but it only returns the data from the second table, with NULL values for the first. I'd like it to return the NULL values on both sides, but I think the problem is that my join is on the values that can be NULL, so it won't return the values that exist in the first table but not the second.
Can someone step me through how to resolve an issue like this?
As far as I'm aware, I don't necessarily have any other tables to join unless I try to join more tables from each database.
Query:
Select TC.PO_Number, TC.Invoice_Date, TC.Invoice_, H.RefPoNum, H.InvoiceNum
From Table1 TC
RIGHT JOIN [SERVERNAME].[DBNAME].[TABLE2] H ON (TC.Invoice_ = H.InvoiceNum)
Where TC.Invoice_Date Between '2018-10-31' AND '2018-10-31'
AND H.Company Like 'COMPANY'
You can do what you want with a full join. Filtering is tricky with a full join, so I recommend subqueries:
select tc.PO_Number, tc.Invoice_Date, tc.Invoice_, h.RefPoNum, h.InvoiceNum
from (select tc.*
      from Table1 tc
      where tc.Invoice_Date between '2018-10-31' and '2018-10-31'
     ) tc
full join
     (select h.*
      from [SERVERNAME].[DBNAME].[TABLE2] h
      where h.Company like 'COMPANY'
     ) h
  on tc.Invoice_ = h.InvoiceNum;
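To see why the FULL JOIN answers the original requirement, here is a tiny hypothetical example (table names and invoice numbers are made up): a row that exists on only one side comes back with NULLs on the other side.
create table db1_invoices (InvoiceNum varchar(10));
create table db2_invoices (InvoiceNum varchar(10));
insert into db1_invoices values ('A100'), ('B200');
insert into db2_invoices values ('A100'), ('C300');

select d1.InvoiceNum as in_db1, d2.InvoiceNum as in_db2
from db1_invoices d1
full join db2_invoices d2 on d1.InvoiceNum = d2.InvoiceNum;
-- A100 | A100   present in both
-- B200 | NULL   missing from database2
-- NULL | C300   missing from database1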
Just make sure that the columns you are comparing have the same data type, and you can safely use the queries below:
Server01 NOT IN Server02
select t1.InvoiceNumber from server01.dbo.Invoice t1
except
select t2.InvoiceNumber from server02.dbo.Invoice t2
Server02 NOT IN Server01
select t1.InvoiceNumber from server02.dbo.Invoice t1
except
select t2.InvoiceNumber from server01.dbo.Invoice t2
P.S.
This may not be the exact query you are looking for, but this template may help.
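If a single result set is preferred for the report, the two EXCEPT queries could be combined, with a label saying which side each invoice is missing from (a sketch using the same table names as above):
select x.InvoiceNumber, 'missing from server02' as note
from (select t1.InvoiceNumber from server01.dbo.Invoice t1
      except
      select t2.InvoiceNumber from server02.dbo.Invoice t2) x
union all
select y.InvoiceNumber, 'missing from server01' as note
from (select t1.InvoiceNumber from server02.dbo.Invoice t1
      except
      select t2.InvoiceNumber from server01.dbo.Invoice t2) y;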

SQL Query to count the records

I am writing a SQL query that will get all the transaction types from one table and, from the other table, count the frequency of each transaction type.
My query is this:
with CTE as
(
select a.trxType,a.created,b.transaction_key,b.description,a.mode
FROM transaction_data AS a with (nolock)
RIGHT JOIN transaction_types b with (nolock) ON b.transaction_key = a.trxType
)
SELECT COUNT (trxType) AS Frequency, description as trxType,mode
from CTE where created >='2017-04-11' and created <= '2018-04-13'
group by trxType ,description,mode
The transaction_types table contains all the types of transactions only and transaction_data contains the transactions which have occurred.
The problem I am facing is that even though it's a RIGHT join, it does not select all the records from the transaction_types table.
I need to select all the transactions from the transaction_types table and show the number of counts for each transaction, even if it's 0.
Please help.
LEFT JOIN is so much easier to follow.
I think you want:
select tt.transaction_key, tt.description, t.mode, count(t.trxType)
from transaction_types tt left join
transaction_data t
on tt.transaction_key = t.trxType and
t.created >= '2017-04-11' and t.created <= '2018-04-13'
group by tt.transaction_key, tt.description, t.mode;
Notes:
Use reasonable table aliases! a and b mean nothing. t and tt are abbreviations of the table name, so they are easier to follow.
t.mode will be NULL for non-matching rows.
The condition on dates needs to be in the ON clause. Otherwise, the outer join is turned into an inner join.
LEFT JOIN is easier to follow (at least for people whose native language reads left-to-right) because it means "keep all the rows in the table you have already read".
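A minimal hypothetical sketch of the note about the date condition (table names and data are made up): with the filter in the WHERE clause, a type with zero transactions disappears; with the filter in the ON clause, it is kept with a count of 0.
create table tt_types (transaction_key int, description varchar(50));
create table tt_data  (trxType int, created date);

insert into tt_types values (1, 'Deposit'), (2, 'Refund');
insert into tt_data  values (1, '2017-05-01');

-- filter in WHERE: the Refund row is lost, because its NULL created
-- value fails the predicate and the outer join degenerates to an inner join
select t.description, count(d.trxType) as Frequency
from tt_types t
left join tt_data d on t.transaction_key = d.trxType
where d.created >= '2017-04-11' and d.created <= '2018-04-13'
group by t.description;

-- filter in ON: Refund is kept, with Frequency = 0
select t.description, count(d.trxType) as Frequency
from tt_types t
left join tt_data d
  on t.transaction_key = d.trxType
 and d.created >= '2017-04-11' and d.created <= '2018-04-13'
group by t.description;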

SQL query not returning Null-value records

I am using SQL Server 2014 on a Windows 10 PC. I am sending SQL queries directly into Swiftpage’s Act! CRM system (via Topline Dash). I am trying to figure out how to get the query to give me records even when some of the records have certain Null values in the Opportunity_Name field.
I am using a series of Join statements in the query to connect 4 tables: History, Contacts, Opportunity, and Groups. History is positioned at the “center” of it all. They all have many-to-many relationships with each other, and are thus each linked by an intermediate table that sits “between” the main tables, like so:
History – Group_History – Group
History – Contact_History – Contact
History – Opportunity_History – Opportunity
The intermediate tables consist only of the PKs of the main tables. E.g. Group_History is only a listing of HistoryIDs and GroupIDs. Thus, any given History entry can have multiple Groups, and each Group has many Histories associated with it.
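A hedged sketch of the junction-table shape being described (column types and constraints are assumptions, not the actual Act! schema):
CREATE TABLE Group_History (
    HistoryID uniqueidentifier NOT NULL,  -- PK of History
    GroupID   uniqueidentifier NOT NULL,  -- PK of "Group"
    PRIMARY KEY (HistoryID, GroupID)
);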
Here’s what the whole SQL statement looks like:
SELECT Group_Name, Opportunity_Name, Start_Date_Time, Contact.Contact, Contact.Company, History_Type, (SQRT(SQUARE(Duration))/60) AS Hours, Regarding, HistoryID
FROM HISTORY
JOIN Group_History
ON Group_History.HistoryID = History.HistoryID
JOIN "Group"
ON Group_History.GroupID = "Group".GroupID
JOIN Contact_History
ON Contact_History.HistoryID = History.HistoryID
JOIN Contact
ON Contact_History.ContactID = Contact.ContactID
JOIN Opportunity_History
ON Opportunity_History.HistoryID = History.HistoryID
JOIN Opportunity
ON Opportunity_History.OpportunityID = Opportunity.OpportunityID
WHERE
( Start_Date_Time >= ('2018/02/02') AND
Start_Date_Time <= ('2018/02/16') )
ORDER BY Group_NAME, START_DATE_TIME;
The problem is that when the Opportunity table is linked in, any record that has no Opportunity (i.e. a Null value) won’t show up. If you remove the Opportunity references in the Join statement, the listing will show all history events in the Date range just fine, the way I want it, whether or not they have an Opportunity associated with them.
I tried adding the following to the WHERE part of the statement, and it did not work.
AND ( ISNULL(Opportunity_Name, 'x') = 'x' OR
ISNULL(Opportunity_Name, 'x') <> 'x' )
I also tried changing the Opportunity_Name reference up in the SELECT part of the statement to read: ISNULL(Opportunity_Name, 'x') – this didn’t work either.
Can anyone suggest a way to get the listing to contain all records regardless of whether they have a Null value in the Opportunity Name or not? Many thanks!!!
I believe this is because a plain JOIN is an INNER JOIN, which discards unmatched rows from both tables. You can fix this by using LEFT JOIN.
Example:
CREATE TABLE dataframe (
A int,
B int
);
insert into dataframe (A,B) values
(1, null),
(null, 1)
select a.A from dataframe a
join dataframe b ON a.A = b.A
select a.A from dataframe a
left join dataframe b ON a.A = b.A
You can see that the first query returns only 1 record, while the second returns both.
SELECT Group_Name, Opportunity_Name, Start_Date_Time, Contact.Contact, Contact.Company, History_Type, (SQRT(SQUARE(Duration))/60) AS Hours, Regarding, HistoryID
FROM HISTORY
LEFT JOIN Group_History
ON Group_History.HistoryID = History.HistoryID
LEFT JOIN "Group"
ON Group_History.GroupID = "Group".GroupID
LEFT JOIN Contact_History
ON Contact_History.HistoryID = History.HistoryID
LEFT JOIN Contact
ON Contact_History.ContactID = Contact.ContactID
LEFT JOIN Opportunity_History
ON Opportunity_History.HistoryID = History.HistoryID
LEFT JOIN Opportunity
ON Opportunity_History.OpportunityID = Opportunity.OpportunityID
WHERE
( Start_Date_Time >= ('2018/02/02') AND
Start_Date_Time <= ('2018/02/16') )
ORDER BY Group_NAME, START_DATE_TIME;
You will want to make sure you are using a LEFT JOIN with the Opportunity table. This keeps History records that have no related record in the Opportunity table.
Also, be careful not to filter on Opportunity columns in the WHERE clause when Opportunity is LEFT JOINed; that would turn the outer join back into an inner join. Include any filter conditions relating to Opportunity in the LEFT JOIN ... ON clause instead.

How to optimize the query? t-sql

This query runs for about 3 minutes and returns 7279 rows:
SELECT identity(int,1,1) as id, c.client_code, a.account_num,
       c.client_short_name, u.uso, us.fio, null as new, null as txt
INTO #ttable
FROM accounts a
INNER JOIN Clients c ON c.id = a.client_id
INNER JOIN Uso u ON c.uso_id = u.uso_id
INNER JOIN Magazin m ON a.account_id = m.account_id
LEFT JOIN Users us ON m.user_id = us.user_id
WHERE m.status_id IN ('1','5','9')
  AND m.account_new_num is null
  AND u.branch_id = #branch_id
ORDER BY c.client_code;
The type of 'client_code' field is VARCHAR(6).
Is it possible to somehow optimize this query?
Insert the records into the temporary table without an ORDER BY clause, and then sort them by c.client_code when you select from it. Hope this helps.
Create table #temp
(
your columns...
)
and insert the records into this table without using an ORDER BY clause. Then run the SELECT from the temp table with the ORDER BY clause, as in the sketch below.
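Filled in, the suggestion might look like this (column types are guesses; #branch_id is kept exactly as it appears in the original query):
CREATE TABLE #temp (
    id                int IDENTITY(1,1),
    client_code       varchar(6),
    account_num       varchar(30),
    client_short_name varchar(100),
    uso               varchar(100),
    fio               varchar(100),
    new               varchar(100),
    txt               varchar(1000)
);

INSERT INTO #temp (client_code, account_num, client_short_name, uso, fio, new, txt)
SELECT c.client_code, a.account_num, c.client_short_name, u.uso, us.fio, null, null
FROM accounts a
INNER JOIN Clients c ON c.id = a.client_id
INNER JOIN Uso u ON c.uso_id = u.uso_id
INNER JOIN Magazin m ON a.account_id = m.account_id
LEFT JOIN Users us ON m.user_id = us.user_id
WHERE m.status_id IN ('1','5','9')
  AND m.account_new_num IS NULL
  AND u.branch_id = #branch_id;            -- no ORDER BY on the insert

SELECT * FROM #temp ORDER BY client_code;  -- sort only when reading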
Do you have indexes set up for your tables? An index on the foreign key columns, as well as on Magazin.status_id, might help.
Make sure there is an index on every field used in the JOINs and in the WHERE clause
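As a hedged sketch, these are the kinds of indexes that advice points at for the query above (index names are made up; verify against the actual execution plan before adding them):
CREATE INDEX IX_Magazin_status  ON Magazin (status_id, account_new_num, account_id);
CREATE INDEX IX_Magazin_user    ON Magazin (user_id);
CREATE INDEX IX_Accounts_client ON accounts (client_id);
CREATE INDEX IX_Clients_uso     ON Clients (uso_id);
CREATE INDEX IX_Uso_branch      ON Uso (branch_id);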
If one or more of the tables you select from are actually views, the problem may be in the performance of those views.
Always try to list tables earlier if they are referenced in the where clause - it cuts off row combinations as early as possible. In this case, the Magazin table has some predicates in the where clause, but is listed way down in the tables list. This means that all the other joins have to be made before the Magazin rows can be filtered - possibly millions of extra rows.
Try this (and let us know how it went):
SELECT ...
INTO #ttable
FROM accounts a
INNER JOIN Magazin m ON a.account_id = m.account_id
INNER JOIN Clients c ON c.id = a.client_id
INNER JOIN Uso u ON c.uso_id = u.uso_id
LEFT JOIN Users us ON m.user_id = us.user_id
WHERE m.status_id IN ('1','5','9')
AND m.account_new_num is null
AND u.branch_id = #branch_id
ORDER BY c.client_code;
This kind of optimization can greatly improve query performance.