Difference between 2 queries? - sql

The 1st query returns 27384 rows. The 2nd query returns 142899 rows. Can someone please explain what is happening with the RIGHT JOIN and LEFT JOIN that is causing the output difference?
1st query :
SELECT u.id AS id,
MIN(q.creation_date) AS q_creation_date,
MIN(a.creation_date) AS a_creation_date
FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
FULL JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
ON q.owner_user_id = a.owner_user_id
LEFT JOIN `bigquery-public-data.stackoverflow.users` AS u
ON q.owner_user_id = u.id
WHERE u.creation_date >= '2019-01-01'
and u.creation_date < '2019-02-01'
GROUP BY id
2nd query :
SELECT u.id AS id,
MIN(q.creation_date) AS q_creation_date,
MIN(a.creation_date) AS a_creation_date
FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
FULL JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
ON q.owner_user_id = a.owner_user_id
RIGHT JOIN `bigquery-public-data.stackoverflow.users` AS u
ON q.owner_user_id = u.id
WHERE u.creation_date >= '2019-01-01' and u.creation_date < '2019-02-01'
GROUP BY id
I expected the result from the 1st query to be 142899 rows but I don't know why the LEFT JOIN returns a massively different result.

In the 1st query, the FULL JOIN first produces ALL records of 'q' AND ALL records of 'a' (where either table doesn't have data to match, the database fills those empty cells with NULLs). The LEFT JOIN then attaches 'u' to those rows, but the WHERE clause filters on u.creation_date, which is NULL wherever 'u' had no match, so those rows are discarded. In effect the LEFT JOIN behaves like an INNER JOIN: only records where both 'q' and 'u' have a match survive.
So the 1st query is limited to users created in January 2019 who have also asked at least one question. After the GROUP BY it can never return more rows than there are matching users in 'u'.
In the 2nd query, the RIGHT JOIN keeps ALL records of 'u', whether or not they match a row coming from 'q' and 'a' (the unmatched columns are filled with NULLs). The WHERE clause then keeps every user created in January 2019, including users who never asked a question.
So the 2nd query returns one row per user created in that month, which is why its row count is so much larger.

With a RIGHT JOIN, the table whose rows are always kept is the one on the right. Similarly, a LEFT JOIN keeps every row of the table to the left of the JOIN. The row counts differ because the preserved table contributes all of its rows, while the other table contributes only the rows that satisfy the join condition.
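The effect described in these answers can be reproduced on a small scale. The following is a rough sketch using Python's built-in sqlite3 module with made-up tables `users` and `questions` (hypothetical stand-ins for the BigQuery tables, with invented rows). Only LEFT JOIN is used, so it runs on older SQLite versions; `users LEFT JOIN questions` is equivalent to `questions RIGHT JOIN users`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, creation_date TEXT);
CREATE TABLE questions (owner_user_id INTEGER, creation_date TEXT);
-- Users 1, 2 and 3 were created in January 2019; only user 1 asked questions.
INSERT INTO users VALUES
  (1, '2019-01-05'), (2, '2019-01-10'), (3, '2019-01-20'), (4, '2018-06-01');
INSERT INTO questions VALUES
  (1, '2019-01-06'), (1, '2019-01-07'), (4, '2018-07-01'), (NULL, '2019-01-08');
""")

window = "u.creation_date >= '2019-01-01' AND u.creation_date < '2019-02-01'"

# Like the 1st query: questions are preserved, then the WHERE on u.*
# discards every row where no user matched.
left_rows = conn.execute(f"""
    SELECT u.id, MIN(q.creation_date)
    FROM questions AS q
    LEFT JOIN users AS u ON q.owner_user_id = u.id
    WHERE {window}
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

# Like the 2nd query: users are preserved, so users without questions stay.
users_first = conn.execute(f"""
    SELECT u.id, MIN(q.creation_date)
    FROM users AS u
    LEFT JOIN questions AS q ON q.owner_user_id = u.id
    WHERE {window}
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

print(left_rows)    # only user 1
print(users_first)  # users 1, 2 and 3
```

In this toy data the first form returns one row and the second returns three, which mirrors the 27384 versus 142899 gap in the question.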

Related

Left join in view with condition on the left table but still want all records

I have a date dimension table from which I want to left join to another table in order to show records (worker_availability) that exist for dates for the next couple weeks, for instance.
Date Dimension simply has every date for the next hundred years.
SELECT
dd.date_actual,
wa.worker_id,
string_agg(sh.name, ', ')
FROM public.worker_availability wa
LEFT JOIN public.d_date dd on wa.day = dd.date_actual --and wa.worker_id = '00000000-0000-0000-0000-000000000000'
LEFT JOIN public.shift sh on sh.shift_id = wa.shift_id
where
wa.worker_id = '00000000-0000-0000-0000-000000000000' AND
dd.date_actual >= NOW()
GROUP BY dd.date_actual, wa.worker_id
ORDER BY dd.date_actual asc
LIMIT 100
If I violate a principle of left join and use a where, then the results are incorrect.
If I add WHERE (worker_id = '00000000-0000-0000-0000-000000000000' OR wa.* IS NULL) then the results are still incorrect.
I want to see every date regardless of if there is a worker_availability record for the date.
The issue is that if I move the worker_id filter into the LEFT JOIN's ON clause (which works), then I can no longer make this query into a view and use it from EntityFramework, because I cannot use the worker_id column in a where clause.

SQL Query to count the records

I am making up a SQL query which will get all the transaction types from one table, and from the other table it will count the frequency of that transaction type.
My query is this:
with CTE as
(
select a.trxType,a.created,b.transaction_key,b.description,a.mode
FROM transaction_data AS a with (nolock)
RIGHT JOIN transaction_types b with (nolock) ON b.transaction_key = a.trxType
)
SELECT COUNT (trxType) AS Frequency, description as trxType,mode
from CTE where created >='2017-04-11' and created <= '2018-04-13'
group by trxType ,description,mode
The transaction_types table contains all the types of transactions only and transaction_data contains the transactions which have occurred.
The problem I am facing is that even though it's the RIGHT join, it does not select all the records from the transaction_types table.
I need to select all the transactions from the transaction_types table and show the number of counts for each transaction, even if it's 0.
Please help.
LEFT JOIN is so much easier to follow.
I think you want:
select tt.transaction_key, tt.description, t.mode, count(t.trxType)
from transaction_types tt left join
transaction_data t
on tt.transaction_key = t.trxType and
t.created >= '2017-04-11' and t.created <= '2018-04-13'
group by tt.transaction_key, tt.description, t.mode;
Notes:
Use reasonable table aliases! a and b mean nothing. t and tt are abbreviations of the table name, so they are easier to follow.
t.mode will be NULL for non-matching rows.
The condition on dates needs to be in the ON clause. Otherwise, the outer join is turned into an inner join.
LEFT JOIN is easier to follow (at least for people whose native language reads left-to-right) because it means "keep all the rows in the table you have already read".
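To see the note about the ON clause in action, here is a small sketch using Python's built-in sqlite3 module. The table and column names follow the question, but the rows are invented and the `mode` column is left out for brevity, so treat it as an illustration rather than the asker's real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_types (transaction_key INTEGER, description TEXT);
CREATE TABLE transaction_data (trxType INTEGER, created TEXT);
INSERT INTO transaction_types VALUES (1, 'deposit'), (2, 'withdrawal'), (3, 'refund');
-- Only the two deposits fall inside the date range used below.
INSERT INTO transaction_data VALUES
  (1, '2017-05-01'), (1, '2017-06-01'), (2, '2016-01-01');
""")

# Date condition in the ON clause: every transaction type is kept.
on_filter = conn.execute("""
    SELECT tt.description, COUNT(t.trxType)
    FROM transaction_types AS tt
    LEFT JOIN transaction_data AS t
      ON tt.transaction_key = t.trxType
     AND t.created >= '2017-04-11' AND t.created <= '2018-04-13'
    GROUP BY tt.description
    ORDER BY tt.description
""").fetchall()

# Date condition in the WHERE clause: rows with NULL t.created are removed,
# so the outer join effectively becomes an inner join.
where_filter = conn.execute("""
    SELECT tt.description, COUNT(t.trxType)
    FROM transaction_types AS tt
    LEFT JOIN transaction_data AS t
      ON tt.transaction_key = t.trxType
    WHERE t.created >= '2017-04-11' AND t.created <= '2018-04-13'
    GROUP BY tt.description
    ORDER BY tt.description
""").fetchall()

print(on_filter)     # all three types, with 0 for the ones without matches
print(where_filter)  # only 'deposit'
```

COUNT(t.trxType) counts non-NULL values only, which is why the unmatched types show 0 instead of 1.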

SQL query not returning Null-value records

I am using SQL Server 2014 on a Windows 10 PC. I am sending SQL queries directly into Swiftpage’s Act! CRM system (via Topline Dash). I am trying to figure out how to get the query to give me records even when some of the records have certain Null values in the Opportunity_Name field.
I am using a series of Join statements in the query to connect 4 tables: History, Contacts, Opportunity, and Groups. History is positioned at the “center” of it all. They all have many-to-many relationships with each other, and are thus each linked by an intermediate table that sits “between” the main tables, like so:
History – Group_History – Group
History – Contact_History – Contact
History – Opportunity_History – Opportunity
The intermediate tables consist only of the PKs in each of the main tables. E.g. Group_History is only a listing of HistoryIDs and GroupIDs. Thus, any given History entry can have multiple Groups, and each Group has many Histories associated with it.
Here’s what the whole SQL statement looks like:
SELECT Group_Name, Opportunity_Name, Start_Date_Time, Contact.Contact, Contact.Company, History_Type, (SQRT(SQUARE(Duration))/60) AS Hours, Regarding, HistoryID
FROM HISTORY
JOIN Group_History
ON Group_History.HistoryID = History.HistoryID
JOIN "Group"
ON Group_History.GroupID = "Group".GroupID
JOIN Contact_History
ON Contact_History.HistoryID = History.HistoryID
JOIN Contact
ON Contact_History.ContactID = Contact.ContactID
JOIN Opportunity_History
ON Opportunity_History.HistoryID = History.HistoryID
JOIN Opportunity
ON Opportunity_History.OpportunityID = Opportunity.OpportunityID
WHERE
( Start_Date_Time >= ('2018/02/02') AND
Start_Date_Time <= ('2018/02/16') )
ORDER BY Group_NAME, START_DATE_TIME;
The problem is that when the Opportunity table is linked in, any record that has no Opportunity (i.e. a Null value) won’t show up. If you remove the Opportunity references in the Join statement, the listing will show all history events in the Date range just fine, the way I want it, whether or not they have an Opportunity associated with them.
I tried adding the following to the WHERE part of the statement, and it did not work.
AND ( ISNULL(Opportunity_Name, 'x') = 'x' OR
ISNULL(Opportunity_Name, 'x') <> 'x' )
I also tried changing the Opportunity_Name reference up in the SELECT part of the statement to read: ISNULL(Opportunity_Name, 'x') – this didn’t work either.
Can anyone suggest a way to get the listing to contain all records regardless of whether they have a Null value in the Opportunity Name or not? Many thanks!!!
I believe this is because a default JOIN statement discards unmatched rows from both tables. You can fix this by using LEFT JOIN.
Example:
CREATE TABLE dataframe (
A int,
B int
);
insert into dataframe (A,B) values
(1, null),
(null, 1);
select a.A from dataframe a
join dataframe b ON a.A = b.A;
select a.A from dataframe a
left join dataframe b ON a.A = b.A;
You can see that the first query returns only 1 record, while the second returns both.
SELECT Group_Name, Opportunity_Name, Start_Date_Time, Contact.Contact, Contact.Company, History_Type, (SQRT(SQUARE(Duration))/60) AS Hours, Regarding, HistoryID
FROM HISTORY
LEFT JOIN Group_History
ON Group_History.HistoryID = History.HistoryID
LEFT JOIN "Group"
ON Group_History.GroupID = "Group".GroupID
LEFT JOIN Contact_History
ON Contact_History.HistoryID = History.HistoryID
LEFT JOIN Contact
ON Contact_History.ContactID = Contact.ContactID
LEFT JOIN Opportunity_History
ON Opportunity_History.HistoryID = History.HistoryID
LEFT JOIN Opportunity
ON Opportunity_History.OpportunityID = Opportunity.OpportunityID
WHERE
( Start_Date_Time >= ('2018/02/02') AND
Start_Date_Time <= ('2018/02/16') )
ORDER BY Group_NAME, START_DATE_TIME;
You will want to make sure you are using a LEFT JOIN with the table Opportunity. This will keep records that do not relate to records in the Opportunity table.
Also, BE CAREFUL you do not filter records using the WHERE clause for the Opportunity table being LEFT JOINED. Include those filter conditions relating to Opportunity instead in the LEFT JOIN ... ON clause.
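Because Opportunity is reached through the intermediate Opportunity_History table, both joins in that chain have to be LEFT JOINs. A minimal sketch with Python's built-in sqlite3 module, using cut-down versions of the tables (the columns and rows here are assumptions for illustration, not the real Act! schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE History (HistoryID INTEGER, Regarding TEXT);
CREATE TABLE Opportunity_History (HistoryID INTEGER, OpportunityID INTEGER);
CREATE TABLE Opportunity (OpportunityID INTEGER, Opportunity_Name TEXT);
-- History 2 has no opportunity linked to it.
INSERT INTO History VALUES (1, 'call'), (2, 'meeting');
INSERT INTO Opportunity_History VALUES (1, 100);
INSERT INTO Opportunity VALUES (100, 'Big deal');
""")

# Plain JOIN: the history entry without an opportunity disappears.
inner_rows = conn.execute("""
    SELECT h.HistoryID, o.Opportunity_Name
    FROM History AS h
    JOIN Opportunity_History AS oh ON oh.HistoryID = h.HistoryID
    JOIN Opportunity AS o ON o.OpportunityID = oh.OpportunityID
    ORDER BY h.HistoryID
""").fetchall()

# LEFT JOIN on both links: every history entry is kept, with NULL for the name.
left_rows = conn.execute("""
    SELECT h.HistoryID, o.Opportunity_Name
    FROM History AS h
    LEFT JOIN Opportunity_History AS oh ON oh.HistoryID = h.HistoryID
    LEFT JOIN Opportunity AS o ON o.OpportunityID = oh.OpportunityID
    ORDER BY h.HistoryID
""").fetchall()

print(inner_rows)
print(left_rows)
```

The NULL name can then be replaced in the SELECT list with ISNULL(Opportunity_Name, 'x') if a placeholder is wanted.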

SQL grouping. How to select row with the highest column value when joined. No CTEs please

I've been banging my head against the wall for something that I think should be simple but just can't get to work.
I'm trying to retrieve the row with the highest multi_flag value when I join table A and table B but I can't seem to get the SQL right because it returns all the rows rather than the one with the highest multi_flag value.
Here are my tables...
Table A
Table B
This is almost my desired result but only if I leave out the value_id row
SELECT CATALOG, VENDOR_CODE, INVLINK, NAME_ID, MAX(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE AS A
INNER JOIN TBLATTRIBUTE_VALUE AS B
ON A.VALUE_ID = B.VALUE_ID
GROUP BY CATALOG, VENDOR_CODE, INVLINK, NAME_ID
ORDER BY CATALOG DESC
This is close to what I want to retrieve, but not quite. Notice how it returns a unique name_id and the highest multi_flag, but I also need the value_id that belongs to each multi_flag / name_id grouping...
If I include the value_id in my SQL statement, then it returns all rows and is no longer grouped.
Notice in the results below how it no longer returns only the row with the highest multi_flag, and how all the different values for name_id (e.g. name_id 1) are also returned.
You can use a sub-query, derived table or CTE to solve this problem. Performance will depend on the amount of data you are querying. To get the max multi_flag, you must first compute the max value for the grouping you want; you can use a CTE or sub-query for this. The CTE below gives the max multi_flag per value_id, which you can then join back to your other tables. I have three joins in this example but this can be reduced, and as far as performance goes it may be better to use a sub-query, but you won't know until you see the actual execution plans side by side.
;with highest_multi_flag as
(
select value_id, max(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE
group by value_id
)
select A.CATALOG, a.VENDOR_CODE, a.INVLINK, b.NAME_ID,m.multiflag
from highest_multi_flag m
inner join TBLINVENT_ATTRIBUTE AS A on A.VALUE_ID = m.VALUE_ID
INNER JOIN TBLATTRIBUTE_VALUE AS B ON m.VALUE_ID = B.VALUE_ID
You can also use a lateral join; it's another solution.
SELECT
A.CATALOG, A.VENDOR_CODE, A.INVLINK, B.NAME_ID, M.maxmultiflag
FROM TBLINVENT_ATTRIBUTE AS A
inner join lateral
(
select max(C.multi_flag) as maxmultiflag from TBLINVENT_ATTRIBUTE C
where A.VALUE_ID = C.VALUE_ID
) M on 1=1
INNER JOIN TBLATTRIBUTE_VALUE AS B ON A.VALUE_ID = B.VALUE_ID
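Since the question asks for no CTEs, the same idea can be written with a derived table: compute the maximum per group, then join back on both the group key and the maximum to recover the value_id of the winning row. The sketch below uses Python's built-in sqlite3 module and a single hypothetical table `attr(name_id, value_id, multi_flag)` with invented rows, because the original table layouts are not shown in the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, flattened stand-in for the joined A/B tables.
CREATE TABLE attr (name_id INTEGER, value_id INTEGER, multi_flag INTEGER);
INSERT INTO attr VALUES
  (1, 10, 0), (1, 11, 2), (1, 12, 1),
  (2, 20, 1), (2, 21, 0);
""")

rows = conn.execute("""
    SELECT a.name_id, a.value_id, a.multi_flag
    FROM attr AS a
    JOIN (
        -- Highest multi_flag per name_id.
        SELECT name_id, MAX(multi_flag) AS max_flag
        FROM attr
        GROUP BY name_id
    ) AS m
      ON m.name_id = a.name_id
     AND m.max_flag = a.multi_flag
    ORDER BY a.name_id
""").fetchall()

print(rows)  # one row per name_id, carrying the value_id of the max row
```

If two rows in a group tie for the highest multi_flag, this form returns both of them, so a tie-breaker would be needed where that matters.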

what is the difference between these two queries

I have written two queries to check the differences between two tables, both shown below. Query 2 shows me the correct results.
In each table there is one record that is not in the other. So I wanted a query which would show both these records, which Query 2 does. It shows me the 90 rows where the tables match plus another 2 rows, one where a record is in tblIH but not in tblTempN and another record which is in tblTempN but not in tblIH.
Whereas Query 1 shows me only the 90 records where the tables match and one extra row where the record is in tblIH but not in tblTempN - it does NOT however show me the record in tblTempN which is not in tblIH - why? I thought using a full outer join would show me all records from both tables? I don't really understand the difference between the two queries as they seem the same to me?
Query 1
select coalesce(h.Sedol, nav.Sedol) Sedol,
coalesce(nav.Name, h.Name) Name,
isnull(h.Nominal, 0) - isnull(nav.Nominal, 0) NomDiff
from tblIH h full outer join tblTempN nav
on h.Sedol = nav.Sedol and h.Code = nav.Code
where h.FundCode = 'ABC' and h.DatePrice = '2015-03-20'
Query 2
;with hld as
(
select Sedol, Name, FX, Nominal from tblIH
where DatePrice = '2015-03-20' and FundCode = 'ABC'
), nav as
(
select Sedol, Name, Nominal from tblTempN
where DateAcc = '2015-03-20' and FundCode = 'ABC'
)
select coalesce(hld.Sedol, nav.Sedol) Sedol,
coalesce(nav.Name, hld.Name) Name,
isnull(hld.Nominal, 0) - isnull(nav.Nominal, 0) NomDiff
from hld full outer join nav
on hld.Sedol = nav.Sedol
In a full outer join, when a row from one table has no match in the other, the columns of the table without a match are returned as NULL.
I suppose you forgot to write the conditions
nav.FundCode = 'ABC' and nav.DateAcc = '2015-03-20'
but apart from this you are missing a more fundamental point: the where clause is applied to the result of the full outer join.
So you are actually getting 90+1+1 rows out of the full outer join, but your where condition filters one of them out, because for the record that exists only in tblTempN the h.FundCode and h.DatePrice values are NULL.
You can use the ISNULL function (the SQL Server counterpart of NVL) while checking these conditions, for example isnull(h.FundCode, 'ABC') = 'ABC', or filter each table before the join, as Query 2 does.
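The filtering effect can be checked with a few rows. This is a sketch in Python's built-in sqlite3 module with cut-down, invented versions of tblIH and tblTempN. Because older SQLite versions lack FULL OUTER JOIN, the full join is emulated here with a LEFT JOIN plus a UNION ALL of the unmatched right-hand rows; that emulation is a choice made for this example, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblIH (Sedol TEXT, FundCode TEXT, Nominal INTEGER);
CREATE TABLE tblTempN (Sedol TEXT, FundCode TEXT, Nominal INTEGER);
-- 'A' is in both tables, 'B' only in tblIH, 'C' only in tblTempN.
INSERT INTO tblIH VALUES ('A', 'ABC', 10), ('B', 'ABC', 20);
INSERT INTO tblTempN VALUES ('A', 'ABC', 10), ('C', 'ABC', 30);
""")

# Emulated FULL OUTER JOIN of tblIH (h) and tblTempN (nav) on Sedol.
full_join = """
    SELECT h.Sedol AS h_sedol, h.FundCode AS h_fund, nav.Sedol AS nav_sedol
    FROM tblIH AS h
    LEFT JOIN tblTempN AS nav ON h.Sedol = nav.Sedol
    UNION ALL
    SELECT NULL, NULL, nav.Sedol
    FROM tblTempN AS nav
    LEFT JOIN tblIH AS h ON h.Sedol = nav.Sedol
    WHERE h.Sedol IS NULL
"""

all_rows = conn.execute(full_join).fetchall()

# Like Query 1: a WHERE on the h side applied after the join.
filtered = conn.execute(
    f"SELECT * FROM ({full_join}) WHERE h_fund = 'ABC'"
).fetchall()

print(len(all_rows))  # 3: the match plus one unmatched row from each side
print(len(filtered))  # 2: the row that exists only in tblTempN is gone
```

The row for 'C' has NULL in every h column, so `h_fund = 'ABC'` is not true for it and the where clause drops it, exactly as described above.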