How to get a result set containing the absence of a value? - sql

Scenario: Have a table with four columns. District_Number, District_name, Data_Collection_Week, enrollments. Each week we get data, BUT sometimes we do not.
Task: My supervisor wants me to produce a query that will let us know, which districts did not submit a given week.
What I have tried is below, but I cannot get a NULL value on those that did not submit a week.
SELECT DISTINCT DistrictNumber, DistrictName, DataCollectionWeek
into #test4
FROM EDW_REQUESTS.INSTRUCTION_DELIVERY_ENROLLMENT_2021
order by DistrictNumber, DataCollectionWeek asc
select DISTINCT DataCollectionWeek
into #test5
from EDW_REQUESTS.INSTRUCTION_DELIVERY_ENROLLMENT_2021
order by DataCollectionWeek
select b.DistrictNumber, b.DistrictName, b.DataCollectionWeek
from #test5 a left outer join #test4 b on (a.DataCollectionWeek = b.DataCollectionWeek)
order by b.DistrictNumber, b.DataCollectionWeek asc

One option uses a cross join of two select distinct subqueries to generate all possible combinations of districts and weeks, and then not exists to identify those that are not available in the table:
select d.districtnumber, w.datacollectionweek
from (select distinct districtnumber from edw_requests.instruction_delivery_enrollment_2021) d
cross join (select distinct datacollectionweek from edw_requests.instruction_delivery_enrollment_2021) w
where not exists (
select 1
from edw_requests.instruction_delivery_enrollment_2021 i
where i.districtnumber = d.districtnumber and i.datacollectionweek = w.datacollectionweek
)
This would be simpler (and much more efficient) if you had referential tables to store the districts and weeks: you would then use them directly instead of the select distinct subqueries.

Related

BigQuery/Sql cross join question: exclude from a column

I got 2 tables:
One player id can engage with different features. One can engage any number of features in a month.
What I want to achieve is to have the table produced below:i want to have a non-engaging table that will have all features that are not engaged by a player id.
WITH feature_table as (select distinct
Month,
Feature
From Feature_list),
engaged_player as(
select
'Y' Engaged_YN,
Month,
Engaged_Feature
FROM ga_data
),
select
from engaged_player
cross join ?????
)
What I have thought is to probably use cross join first.
Given your sample data, try this approach:
SELECT
'N' AS Engaged_YN,
gd.Month,
gd.Player_ID,
fl.Feature AS Not_Engaged_Feature
FROM mydataset.ga_data gd
CROSS JOIN mydataset.Feature_list fl
WHERE fl.Feature != gd.Engaged_Feature
ORDER BY Player_ID, Not_Engaged_Feature
Output:

Subtracting values of columns from two different tables

I would like to take values from one table column and subtract those values from another column from another table.
I was able to achieve this by joining those tables and then subtracting both columns from each other.
Data from first table:
SELECT max_participants FROM courses ORDER BY id;
Data from second table:
SELECT COUNT(id) FROM participations GROUP BY course_id ORDER BY course_id;
Here is some code:
SELECT max_participants - participations AS free_places FROM
(
SELECT max_participants, COUNT(participations.id) AS participations
FROM courses
INNER JOIN participations ON participations.course_id = courses.id
GROUP BY courses.max_participants, participations.course_id
ORDER BY participations.course_id
) AS course_places;
In general, it works, but I was wondering, if there is some way to make it simplier or maybe my approach isn't correct and this code will not work in some conditions? Maybe it needs to be optimized.
I've read some information about not to rely on natural order of result set in databases and that information made my doubts to appear.
If you want the values per course, I would recommend:
SELECT c.id, (c.max_participants - COUNT(p.id)) AS free_places
FROM courses c LEFT JOIN
participations p
ON p.course_id = c.id
GROUP BY c.id, c.max_participants
ORDER BY 1;
Note the LEFT JOIN to be sure all courses are included, even those with no participants.
The overall number is a little tricker. One method is to use the above as a subquery. Alternatively, you can pre-aggregate each table:
select c.max_participants - p.num_participants
from (select sum(max_participants) as max_participants from courses) c cross join
(select count(*) as num_participants from participants from participations) p;

SUM a column count from two tables

I have this simple unioned query in SQL Server 2014 where I am getting counts of rows from each table, and then trying to add a TOTAL row at the bottom that will SUM the counts from both tables. I believe the problem is the LEFT OUTER JOIN on the last union seems to be only summing the totals from the first table
SELECT A.TEST_CODE, B.DIVISION, COUNT(*)
FROM ALL_USERS B, SIGMA_TEST A
WHERE B.DOMID = A.DOMID
GROUP BY A.TEST_CODE, B.DIVISION
UNION
SELECT E.TEST_CODE, F.DIVISION, COUNT(*)
FROM BETA_TEST E, ALL_USERS F
WHERE E.DOMID = F.DOMID
GROUP BY E.TEST_CODE, F.DIVISION
UNION
SELECT 'TOTAL', '', COUNT(*)
FROM (SIGMA_TEST A LEFT OUTER JOIN BETA_TEST E ON A.DOMID
= E.DOMID )
Here is a sample of the results I am getting:
I would expect the TOTAL row to display a result of 6 (2+1+3=6)
I would like to avoid using a Common Table Expression (CTE) if possible. Thanks in advance!
Since you are counting users with matching DOMIDs in the first two statements, the final statement also needs to include the ALL_USERS table. The final statement should be:
SELECT 'TOTAL', '', COUNT(*)
FROM ALL_USERS G LEFT OUTER JOIN
SIGMA_TEST H ON G.DOMID = H.DOMID
LEFT OUTER JOIN BETA_TEST I ON I.DOMID = G.DOMID
WHERE (H.TEST_CODE IS NOT NULL OR I.TEST_CODE IS NOT NULL)
I would consider doing a UNION ALL first then COUNT:
SELECT COALESCE(TEST_CODE, 'TOTAL'),
DIVISION,
COUNT(*)
FROM (
SELECT A.TEST_CODE, B.DIVISION
FROM ALL_USERS B
INNER JOIN SIGMA_TEST A ON B.DOMID = A.DOMID
UNION ALL
SELECT E.TEST_CODE, F.DIVISION
FROM BETA_TEST E
INNER JOIN ALL_USERS F ON E.DOMID = F.DOMID ) AS T
GROUP BY GROUPING SETS ((TEST_CODE, DIVISION ), ())
Using GROUPING SETS you can easily get the total, so there is no need to add a third subquery.
Note: I assume you want just one count per (TEST_CODE, DIVISION). Otherwise you have to also group on the source table as well, as in #Gareth's answer.
I think you can achieve this with a single query. It seems your test tables have similar structures, so you can union them together and join to ALL_USERS, finally, you can use GROUPING SETS to get the total
SELECT ISNULL(T.TEST_CODE, 'TOTAL') AS TEST_CODE,
ISNULL(U.DIVISION, '') AS DIVISION,
COUNT(*)
FROM ALL_USERS AS U
INNER JOIN
( SELECT DOMID, TEST_CODE, 'SIGNMA' AS SOURCETABLE
FROM SIGMA_TEST
UNION ALL
SELECT DOMID, TEST_CODE, 'BETA' AS SOURCETABLE
FROM BETA_TEST
) AS T
ON T.DOMID = U.DOMID
GROUP BY GROUPING SETS ((T.TEST_CODE, U.DIVISION, T.SOURCETABLE), ());
As an aside, the implicit join syntax you are using was replaced over a quarter of a century ago in ANSI 92. It is not wrong, but there seems to be little reason to continue to use it, especially when you are mixing and matching with explicit outer joins and implicit inner joins. Anyone else that might read your SQL will certainly appreciate consistency.

SQL Server JOINS

Can someone help explain to me how when I have 12 rows in table A and 10 in B and I do an inner join , I would get more rows than
in both A and B ?
Same with left and right joins...
This is just a simplified example. Let me share one of my issues with you
I have 2 views ; which was originally SQL on 2 base tables Culture and Trials.
And then when attempting to add another table Culture Steps, one of the team members separated the SQL into 2 views
Since this produces an error when updating(modification cannot be done as it affects multiple base tables), I would like to get
back to changing the SQL such that I no longer use the views but achieve the same results.
One of the views has
SELECT some columns
FROM dbo.Culture RIGHT JOIN
dbo.Trial ON dbo.Culture.cultureID = dbo.Trial.CultureID LEFT OUTER JOIN
dbo.TrialCultureSteps_view_part1 ON dbo.Culture.cultureID = dbo.TrialCultureSteps_view_part1.cultureID
The other TrialCultureSteps_view_part1 view
SELECT DISTINCT dbo.Culture.cultureID,
(SELECT TOP (1) WeekNr
FROM dbo.CultureStep
WHERE (CultureID = dbo.Culture.cultureID)
ORDER BY CultureStepID) AS normalstartweek
FROM dbo.Culture INNER JOIN
dbo.CultureStep AS CultureStep_1 ON dbo.Culture.cultureID = CultureStep_1.CultureID
So how can I combine the joins the achieve the same results using SQL only on tables without the need for views?
Welcome to StackOverflow! This link might be a good place to start in your understanding of JOINs. Essentially, the 'problem' you describe boils down to the fact that one or more of your sources (Trial, Culture, or the TrialCultureSteps view) has more than one record per CultureID - in other words, the same CultureID (#1) shows up on multiple rows.
Based solely on that ID, I'd execute the following three queries. Anything that is returned by them is the 'cause' of your duplications - the culture ID shows up more than once, so you'll have to JOIN on more than just CultureID. If, as I half-suspect, your view is the one that has multiple Culture IDs, you'll need to modify it to only return one record, or change the way that you JOIN to it.
SELECT *
FROM Trial
WHERE CultureID IN
(
SELECT CultureID
FROM Trial
GROUP BY CultureID
HAVING COUNT(*) > 1
)
ORDER BY CultureID
SELECT *
FROM Culture
WHERE CultureID IN
(
SELECT CultureID
FROM Culture
GROUP BY CultureID
HAVING COUNT(*) > 1
)
ORDER BY CultureID
SELECT *
FROM TrialCultureSteps_view_part1
WHERE CultureID IN
(
SELECT CultureID
FROM TrialCultureSteps_view_part1
GROUP BY CultureID
HAVING COUNT(*) > 1
)
ORDER BY CultureID
Let me know if any of these return values!
The comments explain the JOIN issues. As for rewriting, any views could be replaced with CTEs.
One other way to rewrite the query, would be : (Though having sample data and expected result would make this easier to confirm that it's correct)
;with TrialCultureSteps_view_part1 AS
(
Select Row_number() OVER (Partition BY CultureID ORDER BY CultureStepID) RowNumber
, WeekNr
, CultureID
)
SELECT some columns
dbo.trial LEFT OUTER JOIN
dbo.Culture ON dbo.Culture.cultureID = dbo.Trial.CultureID LEFT OUTER JOIN
TrialCultureSteps_view_part1 ON dbo.Culture.cultureID = dbo.TrialCultureSteps_view_part1.cultureID and RowNumber=1
Access code, I'm less familiar with the syntax, but I know that Row_Number() isn't available and I don't believe it has CTE syntax either. So, we'd need to put in some more nested derived tables.
SELECT some columns
dbo.trial LEFT OUTER JOIN
dbo.Culture ON dbo.Culture.cultureID = dbo.Trial.CultureID LEFT OUTER JOIN
( Select cs.CultureID, cs.WeekNr FROM
( SELECT CultureID, MIN(CultureStepID) CultureStepID
FROM dbo.CultureStep
GROUP BY CultureID
) Fcs INNER JOIN
CultureStep cs ON fcs.cultureStepID=cs.CultureStepID
) TrialCultureSteps_view_part1 ON dbo.Culture.cultureID = TrialCultureSteps_view_part1.cultureID
Assumptions here, is that CultureStepID is a PK for CultureStep. No assumption that a step must exist for each Culture entry.

Distinct on multi-columns in sql

I have this query in sql
select cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
I want to get rows distinct by pageid ,so in the end I will not have rows with same pageid more then once(duplicate)
any Ideas
Thanks
Baaroz
Going by what you're expecting in the output and your comment that says "...if there rows in output that contain same pageid only one will be shown...," it sounds like you're trying to get the top record for each page ID. This can be achieved with ROW_NUMBER() and PARTITION BY:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID) rowNumber,
c.id,
c.pageId,
c.quantity,
c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
) a
WHERE a.rowNumber = 1
You can also use ROW_NUMBER() OVER(PARTITION BY ... along with TOP 1 WITH TIES, but it runs a little slower (despite being WAY cleaner):
SELECT TOP 1 WITH TIES c.id, c.pageId, c.quantity, c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
ORDER BY ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID)
If you wish to remove rows with all columns duplicated this is solved by simply adding a distinct in your query.
select distinct cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
If however, this makes no difference, it means the other columns have different values, so the combinations of column values creates distinct (unique) rows.
As Michael Berkowski stated in comments:
DISTINCT - does operate over all columns in the SELECT list, so we
need to understand your special case better.
In the case that simply adding distinct does not cover you, you need to also remove the columns that are different from row to row, or use aggregate functions to get aggregate values per cartlines.
Example - total quantity per distinct pageId:
select distinct cartlines.id,cartlines.pageId, sum(cartlines.quantity)
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
If this is still not what you wish, you need to give us data and specify better what it is you want.