SQL Server JOINS - sql

Can someone help explain to me how when I have 12 rows in table A and 10 in B and I do an inner join , I would get more rows than
in both A and B ?
Same with left and right joins...
This is just a simplified example. Let me share one of my issues with you
I have 2 views ; which was originally SQL on 2 base tables Culture and Trials.
And then when attempting to add another table Culture Steps, one of the team members separated the SQL into 2 views
Since this produces an error when updating(modification cannot be done as it affects multiple base tables), I would like to get
back to changing the SQL such that I no longer use the views but achieve the same results.
One of the views has
SELECT some columns
FROM dbo.Culture RIGHT JOIN
dbo.Trial ON dbo.Culture.cultureID = dbo.Trial.CultureID LEFT OUTER JOIN
dbo.TrialCultureSteps_view_part1 ON dbo.Culture.cultureID = dbo.TrialCultureSteps_view_part1.cultureID
The other TrialCultureSteps_view_part1 view
SELECT DISTINCT dbo.Culture.cultureID,
(SELECT TOP (1) WeekNr
FROM dbo.CultureStep
WHERE (CultureID = dbo.Culture.cultureID)
ORDER BY CultureStepID) AS normalstartweek
FROM dbo.Culture INNER JOIN
dbo.CultureStep AS CultureStep_1 ON dbo.Culture.cultureID = CultureStep_1.CultureID
So how can I combine the joins the achieve the same results using SQL only on tables without the need for views?

Welcome to StackOverflow! This link might be a good place to start in your understanding of JOINs. Essentially, the 'problem' you describe boils down to the fact that one or more of your sources (Trial, Culture, or the TrialCultureSteps view) has more than one record per CultureID - in other words, the same CultureID (#1) shows up on multiple rows.
Based solely on that ID, I'd execute the following three queries. Anything that is returned by them is the 'cause' of your duplications - the culture ID shows up more than once, so you'll have to JOIN on more than just CultureID. If, as I half-suspect, your view is the one that has multiple Culture IDs, you'll need to modify it to only return one record, or change the way that you JOIN to it.
SELECT *
FROM Trial
WHERE CultureID IN
(
SELECT CultureID
FROM Trial
GROUP BY CultureID
HAVING COUNT(*) > 1
)
ORDER BY CultureID
SELECT *
FROM Culture
WHERE CultureID IN
(
SELECT CultureID
FROM Culture
GROUP BY CultureID
HAVING COUNT(*) > 1
)
ORDER BY CultureID
SELECT *
FROM TrialCultureSteps_view_part1
WHERE CultureID IN
(
SELECT CultureID
FROM TrialCultureSteps_view_part1
GROUP BY CultureID
HAVING COUNT(*) > 1
)
ORDER BY CultureID
Let me know if any of these return values!

The comments explain the JOIN issues. As for rewriting, any views could be replaced with CTEs.
One other way to rewrite the query, would be : (Though having sample data and expected result would make this easier to confirm that it's correct)
;with TrialCultureSteps_view_part1 AS
(
Select Row_number() OVER (Partition BY CultureID ORDER BY CultureStepID) RowNumber
, WeekNr
, CultureID
)
SELECT some columns
dbo.trial LEFT OUTER JOIN
dbo.Culture ON dbo.Culture.cultureID = dbo.Trial.CultureID LEFT OUTER JOIN
TrialCultureSteps_view_part1 ON dbo.Culture.cultureID = dbo.TrialCultureSteps_view_part1.cultureID and RowNumber=1
Access code, I'm less familiar with the syntax, but I know that Row_Number() isn't available and I don't believe it has CTE syntax either. So, we'd need to put in some more nested derived tables.
SELECT some columns
dbo.trial LEFT OUTER JOIN
dbo.Culture ON dbo.Culture.cultureID = dbo.Trial.CultureID LEFT OUTER JOIN
( Select cs.CultureID, cs.WeekNr FROM
( SELECT CultureID, MIN(CultureStepID) CultureStepID
FROM dbo.CultureStep
GROUP BY CultureID
) Fcs INNER JOIN
CultureStep cs ON fcs.cultureStepID=cs.CultureStepID
) TrialCultureSteps_view_part1 ON dbo.Culture.cultureID = TrialCultureSteps_view_part1.cultureID
Assumptions here, is that CultureStepID is a PK for CultureStep. No assumption that a step must exist for each Culture entry.

Related

How do I get all values for Store_ID pulled into my Snowflake Query?

I have a query below and am trying to get all the week_id's, upc_id's and upc_dsc's pulled in even if there is no net_amt or item_qty for them. I'm successfully pulling in all stores, but I also want to show a upc and week id for these stores so that they can see if they have 0 sales for a upc. I tried doing a right join with my date table under the right join of the s table as well as a right join for the upc table, but it messes up my data and doesn't pull in the columns I need. Does anyone know how to fix this?
Thank you
select
a.week_id
, s.district_cd
, s.store_id
, a.upc_id
, a.upc_dsc
, sum(a.net_amt) as current_week_sales
, sum(t.net_amt) as last_week_sales
, sum(a.item_qty) as current_week_units
, sum(t.item_qty) as last_week_units
from (
select
week_id
, district_cd
, str.store_id
, txn.upc_id
, upc_dsc
, dense_rank() over (order by week_id) as rank
, sum(txn.net_amt) as net_amt
, sum(txn.item_qty) as item_qty
from dw_dss.txn_facts txn
left join dw_dss.lu_store_finance_om STR
on str.store_id = txn.store_id
join dw_dss.lu_upc upc
on upc.upc_id = txn.upc_id
join lu_day_merge day
on day.d_date = txn.txn_dte
where district_cd in (72,73)
and txn.upc_id in (27610100000
,27610200000
,27610300000
,27610400000
)
and division_id = 19
and txn_dte between '2021-07-25' and current_date - 1
group by 1,2,3,4,5
) a
left join temp_tables.ab_week_ago t
on t.rank = a.rank and a.store_id = t.store_id and a.upc_id = t.upc_id
right join dw_dss.lu_store_finance_om s
on s.store_id = a.store_id
where s.division_id = 19
and s.district_cd in (72,73)
group by 1,2,3,4,5
As stated in a previous comment, the example is too long to debug, especially since the source tables are not provided.
However, as a general rule, when adding zeroes for missing dimensions, I follow these steps:
Construct the main query, this is the query with all the complexity
that pulls the data you need - just the available data, without
missing dimensions; test this query to make sure it gives correct
results, aggregated correctly by each dimension
Then use this query as a CTE in a WITH statement and to this query, you can right join all dimensions for which you want to add zero values for missing data
Be sure to double check filtering on the dimensions to ensure that you don't filter out too much in your WHERE conditions, for example, instead of filtering with WHERE on the final query, like in your example:
right join dw_dss.lu_store_finance_om s
on s.store_id = a.store_id
where s.division_id = 19
and s.district_cd in (72,73)
I might rather filter the dimension itself in a subquery:
right join (select store_id from dw_dss.lu_store_finance_om
where s.division_id = 19 and s.district_cd in (72,73)) s
on s.store_id = a.store_id
I have a query below and am trying to get all the week_id's, upc_id's and upc_dsc's pulled in even if there is no net_amt or item_qty for them.
You want to generate the rows that you want using cross join and then use left join to bring in the the data you want. In your case, you also want aggregation.
You have not explained the tables, and I find your query quite hard to follow. But the idea is:
select c.weekid, c.store_id, c.upc_id,
count(f.dayid) as num_sales,
sum(f.net_amt) as total_amt
from calendar c cross join
stores s cross join
upcs u left join
facts f
using (dayid, store_id, upc_id) -- or whatever the right conditions are
group by c.weekid, c.store_id, c.upc_id;
Obviously, you have additional filters. You would apply these filters in the where clause to the dimension tables (or use a subquery if the logic is more complicated).

SQL Left Join - OR clause

I am trying to join two tables. I want to join where all the three identifiers (Contract id, company code and book id) are a match in both tables, if not match using contract id and company code and the last step is to just look at contract id
Can the task be performed wherein you join using all three parameters, if does not, check the two parameters and then just the contract id ?
Code:
SELECT *
INTO #prem_claim_wtauto_test
FROM #contract_detail A
LEFT JOIN #claim_total C
ON ( ( C.contract_id_2 = A.contract_id
AND C.company_cd_2 = A.company_cd
AND C.book_id_2 = A.book_id )
OR ( C.contract_id_2 = A.contract_id
AND C.company_cd_2 = A.company_cd )
OR ( C.contract_id_2 = A.contract_id ) )
Your ON clause boils down to C.contract_id_2 = A.contract_id. This gets you all matches, no matter whether the most precise match including company and book or a lesser one. What you want is a ranking. Two methods come to mind:
Join on C.contract_id_2 = A.contract_id, then rank the rows with ROW_NUMBER and keep the best ranked ones.
Use a lateral join in order to only join the best match with TOP.
Here is the second option. You forgot to tell us which DBMS you are using. SELECT INTO looks like SQL Server. I hope I got the syntax right:
SELECT *
INTO #prem_claim_wtauto_test
FROM #contract_detail A
OUTER APPLY
(
SELECT TOP(1) *
FROM #claim_total C
WHERE C.contract_id_2 = A.contract_id
ORDER BY
CASE
WHEN C.company_cd_2 = A.company_cd AND C.book_id_2 = A.book_id THEN 1
WHEN C.company_cd_2 = A.company_cd THEN 2
ELSE 3
END
);
If you want to join all rows in case of ties (e.g. many rows matching contract, company and book), then make this TOP(1) WITH TIES.

How to get a result set containing the absence of a value?

Scenario: Have a table with four columns. District_Number, District_name, Data_Collection_Week, enrollments. Each week we get data, BUT sometimes we do not.
Task: My supervisor wants me to produce a query that will let us know, which districts did not submit a given week.
What I have tried is below, but I cannot get a NULL value on those that did not submit a week.
SELECT DISTINCT DistrictNumber, DistrictName, DataCollectionWeek
into #test4
FROM EDW_REQUESTS.INSTRUCTION_DELIVERY_ENROLLMENT_2021
order by DistrictNumber, DataCollectionWeek asc
select DISTINCT DataCollectionWeek
into #test5
from EDW_REQUESTS.INSTRUCTION_DELIVERY_ENROLLMENT_2021
order by DataCollectionWeek
select b.DistrictNumber, b.DistrictName, b.DataCollectionWeek
from #test5 a left outer join #test4 b on (a.DataCollectionWeek = b.DataCollectionWeek)
order by b.DistrictNumber, b.DataCollectionWeek asc
One option uses a cross join of two select distinct subqueries to generate all possible combinations of districts and weeks, and then not exists to identify those that are not available in the table:
select d.districtnumber, w.datacollectionweek
from (select distinct districtnumber from edw_requests.instruction_delivery_enrollment_2021) d
cross join (select distinct datacollectionweek from edw_requests.instruction_delivery_enrollment_2021) w
where not exists (
select 1
from edw_requests.instruction_delivery_enrollment_2021 i
where i.districtnumber = d.districtnumber and i.datacollectionweek = w.datacollectionweek
)
This would be simpler (and much more efficient) if you had referential tables to store the districts and weeks: you would then use them directly instead of the select distinct subqueries.

How to find the most frequent value in a select statement as a subquery?

I am trying to get the most frequent Zip_Code for the Location ID from table B. Table A(transaction) has one A.zip_code per Transaction but table B(Location) has multiple Zip_code for one area or City. I am trying to get the most frequent B.Zip_Code for the Account using Location_D that is present in both table.I have simplified my code and changed the names of the columns for easy understanding but this is the logic for my query I have so far.Any help would be appreciated. Thanks in advance.
Select
A.Account_Number,
A.Utility_Type,
A.Sum(usage),
A.Sum(Cost),
A.Zip_Code,
( select B.zip_Code from B where A.Location_ID= B.Location_ID having count(*)= max(count(B.Zip_Code)) as Location_Zip_Code,
A.Transaction_Date
From
Transaction_Table as A Left Join
Location Table as B On A.Location_ID= B.Location_ID
Group By
A.Account_Number,
A.Utility_Type,
A.Zip_Code,
A.Transaction_Date
This is what I come up with:
Select tt.Account_Number, tt.Utility_Type, Sum(tt.usage), Sum(tt.Cost),
tt.Zip_Code,
(select TOP 1 l.zip_Code
Location_Table l
where tt.Location_ID = l.Location_ID
group by l.zip_code
order by count(*) desc
) as Location_Zip_Code,
tt.Transaction_Date
From Transaction_Table tt
Group By tt.Account_Number, tt.Utility_Type, tt.Zip_Code, tt.Transaction_Date;
Notes:
Table aliases are a good thing. However, they should be abbreviations for the tables referenced, rather than arbitrary letters.
The table alias qualifies the column name, not the function. Hence sum(tt.usage) rather than tt.sum(usage).
There is no need for a join in the outer query. You are doing all the work in the subquery.
An order by with top seems the way to go to get the most common zip code (which, incidentally, is called the mode in statistics).

Filter a SQL Server table dynamically using multiple joins

I am trying to filter a single table (master) by the values in multiple other tables (filter1, filter2, filter3 ... filterN) using only joins.
I want the following rules to apply:
(A) If one or more rows exist in a filter table, then include only those rows from the master that match the values in the filter table.
(B) If no rows exist in a filter table, then ignore it and return all the rows from the master table.
(C) This solution should work for N filter tables in combination.
(D) Static SQL using JOIN syntax only, no Dynamic SQL.
I'm really trying to get rid of dynamic SQL wherever possible, and this is one of those places I truly think it's possible, but just can't quite figure it out. Note: I have solved this using Dynamic SQL already, and it was fairly easy, but not particularly efficient or elegant.
What I have tried:
Various INNER JOINS between master and filter tables - works for (A) but fails on (B) because the join removes all records from the master (left) side when the filter (right) side has no rows.
LEFT JOINS - Always returns all records from the master (left) side. This fails (A) when some filter tables have records and some do not.
What I really need:
It seems like what I need is to be able to INNER JOIN on each filter table that has 1 or more rows and LEFT JOIN (or not JOIN at all) on each filter table that is empty.
My question: How would I accomplish this without resorting to Dynamic SQL?
In SQL Server 2005+ you could try this:
WITH
filter1 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable1 f ON join_condition
),
filter2 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable2 f ON join_condition
),
…
SELECT m.*
FROM masterdata m
INNER JOIN filter1 f1 ON m.ID = f1.ID AND f1.HasMatched = f1.AllHasMatched
INNER JOIN filter2 f2 ON m.ID = f2.ID AND f2.HasMatched = f2.AllHasMatched
…
My understanding is, filter tables without any matches simply must not affect the resulting set. The output should only consist of those masterdata rows that have matched all the filters where matches have taken place.
SELECT *
FROM master_table mt
WHERE (0 = (select count(*) from filter_table_1)
OR mt.id IN (select id from filter_table_1)
AND (0 = (select count(*) from filter_table_2)
OR mt.id IN (select id from filter_table_2)
AND (0 = (select count(*) from filter_table_3)
OR mt.id IN (select id from filter_table_3)
Be warned that this could be inefficient in practice. Unless you have a specific reason to kill your existing, working, solution, I would keep it.
Do inner join to get results for (A) only and do left join to get results for (B) only (you will have to put something like this in the where clause: filterN.column is null) combine results from inner join and left join with UNION.
Left Outer Join - gives you the MISSING entries in master table ....
SELECT * FROM MASTER M
INNER JOIN APPRENTICE A ON A.PK = M.PK
LEFT OUTER JOIN FOREIGN F ON F.FK = M.PK
If FOREIGN has keys that is not a part of MASTER you will have "null columns" where the slots are missing
I think that is what you looking for ...
Mike
First off, it is impossible to have "N number of Joins" or "N number of filters" without resorting to dynamic SQL. The SQL language was not designed for dynamic determination of the entities against which you are querying.
Second, one way to accomplish what you want (but would be built dynamically) would be something along the lines of:
Select ...
From master
Where Exists (
Select 1
From filter_1
Where filter_1 = master.col1
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_1
)
Intersect
Select 1
From filter_2
Where filter_2 = master.col2
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_2
)
...
Intersect
Select 1
From filter_N
Where filter_N = master.colN
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_N
)
)
I have previously posted a - now deleted - answer based on wrong assumptions on you problems.
But I think you could go for a solution where you split your initial search problem into a matter of constructing the set of ids from the master table, and then select the data joining on that set of ids. Here I naturally assume you have a kind of ID on your master table. The filter tables contains the filter values only. This could then be combined into the statement below, where each SELECT in the eligble subset provides a set of master ids, these are unioned to avoid duplicates and that set of ids are joined to the table with data.
SELECT * FROM tblData INNER JOIN
(
SELECT id FROM tblData td
INNER JOIN fa on fa.a = td.a
UNION
SELECT id FROM tblData td
INNER JOIN fb on fb.b = td.b
UNION
SELECT id FROM tblData td
INNER JOIN fc on fc.c = td.c
) eligible ON eligible.id = tblData.id
The test has been made against the tables and values shown below. These are just an appendix.
CREATE TABLE tblData (id int not null primary key identity(1,1), a varchar(40), b datetime, c int)
CREATE TABLE fa (a varchar(40) not null primary key)
CREATE TABLE fb (b datetime not null primary key)
CREATE TABLE fc (c int not null primary key)
Since you have filter tables, I am assuming that these tables are probably dynamically populated from a front-end. This would mean that you have these tables as #temp_table (or even a materialized table, doesn't matter really) in your script before filtering on the master data table.
Personally, I use the below code bit for filtering dynamically without using dynamic SQL.
SELECT *
FROM [masterdata] [m]
INNER JOIN
[filter_table_1] [f1]
ON
[m].[filter_column_1] = ISNULL(NULLIF([f1].[filter_column_1], ''), [m].[filter_column_1])
As you can see, the code NULLs the JOIN condition if the column value is a blank record in the filter table. However, the gist in this is that you will have to actively populate the column value to blank in case you do not have any filter records on which you want to curtail the total set of the master data. Once you have populated the filter table with a blank, the JOIN condition NULLs in those cases and instead joins on itself with the same column from the master data table. This should work for all the cases you mentioned in your question.
I have found this bit of code to be faster in terms of performance.
Hope this helps. Please let me know in the comments.