Real life example, when to use OUTER / CROSS APPLY in SQL - sql

I have been looking at CROSS / OUTER APPLY with a colleague and we're struggling to find real life examples of where to use them.
I've spent quite a lot of time looking at When should I use CROSS APPLY over INNER JOIN? and googling but the main (only) example seems pretty bizarre (using the rowcount from a table to determine how many rows to select from another table).
I thought this scenario may benefit from OUTER APPLY:
Contacts Table (contains 1 record for each contact)
Communication Entries Table (can contain a phone, fax, email for each contact)
But using subqueries, common table expressions, OUTER JOIN with RANK() and OUTER APPLY all seem to perform equally. I'm guessing this means the scenario isn't applicable to APPLY.
Please share some real life examples and help explain the feature!

Some uses for APPLY are...
1) Top N per group queries (can be more efficient for some cardinalities)
SELECT pr.name,
pa.name
FROM sys.procedures pr
OUTER APPLY (SELECT TOP 2 *
FROM sys.parameters pa
WHERE pa.object_id = pr.object_id
ORDER BY pr.name) pa
ORDER BY pr.name,
pa.name
2) Calling a Table Valued Function for each row in the outer query
SELECT *
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle)
3) Reusing a column alias
SELECT number,
doubled_number,
doubled_number_plus_one
FROM master..spt_values
CROSS APPLY (SELECT 2 * CAST(number AS BIGINT)) CA1(doubled_number)
CROSS APPLY (SELECT doubled_number + 1) CA2(doubled_number_plus_one)
4) Unpivoting more than one group of columns
Assumes 1NF violating table structure....
CREATE TABLE T
(
Id INT PRIMARY KEY,
Foo1 INT, Foo2 INT, Foo3 INT,
Bar1 INT, Bar2 INT, Bar3 INT
);
Example using 2008+ VALUES syntax.
SELECT Id,
Foo,
Bar
FROM T
CROSS APPLY (VALUES(Foo1, Bar1),
(Foo2, Bar2),
(Foo3, Bar3)) V(Foo, Bar);
In 2005 UNION ALL can be used instead.
SELECT Id,
Foo,
Bar
FROM T
CROSS APPLY (SELECT Foo1, Bar1
UNION ALL
SELECT Foo2, Bar2
UNION ALL
SELECT Foo3, Bar3) V(Foo, Bar);

There are various situations where you cannot avoid CROSS APPLY or OUTER APPLY.
Consider you have two tables.
MASTER TABLE
x------x--------------------x
| Id | Name |
x------x--------------------x
| 1 | A |
| 2 | B |
| 3 | C |
x------x--------------------x
DETAILS TABLE
x------x--------------------x-------x
| Id | PERIOD | QTY |
x------x--------------------x-------x
| 1 | 2014-01-13 | 10 |
| 1 | 2014-01-11 | 15 |
| 1 | 2014-01-12 | 20 |
| 2 | 2014-01-06 | 30 |
| 2 | 2014-01-08 | 40 |
x------x--------------------x-------x
CROSS APPLY
There are many situation where we need to replace INNER JOIN with CROSS APPLY.
1. If we want to join 2 tables on TOP n results with INNER JOIN functionality
Consider if we need to select Id and Name from Master and last two dates for each Id from Details table.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
INNER JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID
The above query generates the following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
x------x---------x--------------x-------x
See, it generated results for last two dates with last two date's Id and then joined these records only in outer query on Id, which is wrong. To accomplish this, we need to use CROSS APPLY.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
CROSS APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D
and forms he following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
x------x---------x--------------x-------x
Here is the working. The query inside CROSS APPLY can reference the outer table, where INNER JOIN cannot do this(throws compile error). When finding the last two dates, joining is done inside CROSS APPLY ie, WHERE M.ID=D.ID.
2. When we need INNER JOIN functionality using functions.
CROSS APPLY can be used as a replacement with INNER JOIN when we need to get result from Master table and a function.
SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
CROSS APPLY dbo.FnGetQty(M.ID) C
And here is the function
CREATE FUNCTION FnGetQty
(
#Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=#Id
)
which generated the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
x------x---------x--------------x-------x
OUTER APPLY
1. If we want to join 2 tables on TOP n results with LEFT JOIN functionality
Consider if we need to select Id and Name from Master and last two dates for each Id from Details table.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
LEFT JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID
which forms the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | NULL | NULL |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
This will bring wrong results ie, it will bring only latest two dates data from Details table irrespective of Id even though we join with Id. So the proper solution is using OUTER APPLY.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
OUTER APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D
which forms the following desired result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
2. When we need LEFT JOIN functionality using functions.
OUTER APPLY can be used as a replacement with LEFT JOIN when we need to get result from Master table and a function.
SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
OUTER APPLY dbo.FnGetQty(M.ID) C
And the function goes here.
CREATE FUNCTION FnGetQty
(
#Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=#Id
)
which generated the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
Common feature of CROSS APPLY and OUTER APPLY
CROSS APPLY or OUTER APPLY can be used to retain NULL values when unpivoting, which are interchangeable.
Consider you have the below table
x------x-------------x--------------x
| Id | FROMDATE | TODATE |
x------x-------------x--------------x
| 1 | 2014-01-11 | 2014-01-13 |
| 1 | 2014-02-23 | 2014-02-27 |
| 2 | 2014-05-06 | 2014-05-30 |
| 3 | NULL | NULL |
x------x-------------x--------------x
When you use UNPIVOT to bring FROMDATE AND TODATE to one column, it will eliminate NULL values by default.
SELECT ID,DATES
FROM MYTABLE
UNPIVOT (DATES FOR COLS IN (FROMDATE,TODATE)) P
which generates the below result. Note that we have missed the record of Id number 3
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
x------x-------------x
In such cases a CROSS APPLY or OUTER APPLY will be useful
SELECT DISTINCT ID,DATES
FROM MYTABLE
OUTER APPLY(VALUES (FROMDATE),(TODATE))
COLUMNNAMES(DATES)
which forms the following result and retains Id where its value is 3
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
| 3 | NULL |
x------x-------------x

One real life example would be if you had a scheduler and wanted to see what the most recent log entry was for each scheduled task.
select t.taskName, lg.logResult, lg.lastUpdateDate
from task t
cross apply (select top 1 taskID, logResult, lastUpdateDate
from taskLog l
where l.taskID = t.taskID
order by lastUpdateDate desc) lg

To answer the point above knock up an example:
create table #task (taskID int identity primary key not null, taskName varchar(50) not null)
create table #log (taskID int not null, reportDate datetime not null, result varchar(50) not null, primary key(reportDate, taskId))
insert #task select 'Task 1'
insert #task select 'Task 2'
insert #task select 'Task 3'
insert #task select 'Task 4'
insert #task select 'Task 5'
insert #task select 'Task 6'
insert #log
select taskID, 39951 + number, 'Result text...'
from #task
cross join (
select top 1000 row_number() over (order by a.id) as number from syscolumns a cross join syscolumns b cross join syscolumns c) n
And now run the two queries with a execution plan.
select t.taskID, t.taskName, lg.reportDate, lg.result
from #task t
left join (select taskID, reportDate, result, rank() over (partition by taskID order by reportDate desc) rnk from #log) lg
on lg.taskID = t.taskID and lg.rnk = 1
select t.taskID, t.taskName, lg.reportDate, lg.result
from #task t
outer apply ( select top 1 l.*
from #log l
where l.taskID = t.taskID
order by reportDate desc) lg
You can see that the outer apply query is more efficient. (Couldn't attach the plan as I'm a new user... Doh.)

Related

SQL left join with latest record

I want to left join a table with the latest record only.
I have Customer1 table:
+--------+----------+
| CustID | CustName |
+--------+----------+
| 1 | ABC123 |
| 2 | 456XYZ |
| 3 | 5PQR3 |
| 4 | 789XYZ |
| 5 | 789A |
+--------+----------+
SalesInvoice table:
+------------+--------+-----------+
| InvDate | CustID | InvNumber |
+------------+--------+-----------+
| 2020-03-01 | 1 | IV236 |
| 2020-04-07 | 1 | IV644 |
| 2020-06-13 | 2 | IV869 |
| 2020-03-29 | 3 | IV436 |
| 2020-02-06 | 3 | IV126 |
+------------+--------+-----------+
And I want this required output:
+--------+------------+-----------+
| CustID | InvDate | InvNumber |
+--------+------------+-----------+
| 1 | 2020-04-07 | IV644 |
| 2 | 2020-06-13 | IV869 |
| 3 | 2020-03-29 | IV436 |
| 4 | | |
| 5 | | |
+--------+------------+-----------+
For quick and easy, below is the sample code.
drop table if exists #Customer1
create table #Customer1(CustID int, CustName varchar (100))
insert into #Customer1 values
(1,'ABC123'),
(2,'456XYZ'),
(3,'5PQR3'),
(4,'789XYZ'),
(5,'789A')
drop table if exists #SalesInvoice
create table #SalesInvoice(InvDate DATE, CustID INT, InvNumber varchar (100))
insert into #SalesInvoice values
('2020-03-01',1,'IV236'),
('2020-04-07',1,'IV644'),
('2020-06-13',2,'IV869'),
('2020-03-29',3,'IV436'),
('2020-02-06',3,'IV126')
I like using TOP 1 WITH TIES in this case:
SELECT TOP 1 WITH TIES c.CustID, i.InvDate, i.InvNumber
FROM #Customer1 c
LEFT JOIN #Invoices i ON c.CustID = i.CustID
ORDER BY ROW_NUMBER() OVER (PARTITION BY c.CustID ORDER BY i.InvDate DESC);
Demo
The top 1 trick here is to order by row number, assigning a sequence to each customer, with the sequence descending by invoice date. Then, this approach retains just the most recent invoice record for each customer.
I recommend outer apply:
select c.*, i.*
from #c c outer apply
(select top (1) i.*
from #invoices i
where i.custId = c.custId
order by i.invDate desc
) i;
outer apply implements a special type of join called a "lateral join". This is a very powerful construct. But when learning about them, you can think of a lateral join as a correlated subquery that can return more than one column and more than one row.
You can try ROW_NUMBER window function instead of lateral joins with this simple self-explaining T-SQL
SELECT c.CustID
, d.InvDate
, d.InvNumber
FROM #C c
LEFT JOIN (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY InvDate DESC) AS RowNo
FROM #D
) d
ON c.CustID = d.CustID
AND d.RowNo = 1
Basically ROW_NUMBER is used to filter the "last" invoice in one table scan, instead of performing SELECT TOP 1 ... ORDER BY in the correlated query which has to be executed multiple times -- as much as the number of customers.

Google BigQuery: From table of days get a table with all days of year

I have this (sample) table:
+------------+-------------------+-----------+
| Date | User | Attribute |
+------------+-------------------+-----------+
| 2019-01-01 | user1#example.com | apple |
| 2019-02-01 | user2#example.com | pear |
| 2019-03-01 | user1#example.com | carrot |
| 2019-03-01 | user2#example.com | orange |
+------------+-------------------+-----------+
I need to create a full permutation of all (date+user) couples filling all the missing days of the year 2019 (with attribute as null).
Like in my example I have 2 different users:
user1#example.com
user2#example.com
The resulting table should be:
+------------+-------------------+-----------+
| Date | User | Attribute |
+------------+-------------------+-----------+
| 2019-01-01 | user1#example.com | apple |
| ... | user1#example.com | null |
| 2019-03-01 | user1#example.com | carrot |
| ... | user1#example.com | null |
| 2019-12-31 | user1#example.com | null |
| 2019-01-01 | user2#example.com | null |
| ... | user2#example.com | null |
| 2019-02-01 | user2#example.com | pear |
| ... | user2#example.com | null |
| 2019-03-01 | user2#example.com | orange |
| ... | user2#example.com | null |
| 2019-12-31 | user2#example.com | null |
+------------+-------------------+-----------+
The ... implies that there is a row for each single day of the year, and the attribute has a value when the source table provides an actual value, otherwhise a null is used.
As first step, to create all the (date+user) permutations I thought of using bigquery-public-data.utility_eu.date_greg table, using a CROSS JOIN to create all the needed rows.
Here a sample table to be used:
#standardSQL
WITH sample AS (
SELECT DATE('2019-01-01') date, 'user1#example.com' user, 'apple' attribute
UNION ALL
SELECT DATE('2019-02-01'), 'user2#example.com', 'pear'
UNION ALL
SELECT DATE('2019-03-01'), 'user1#example.com', 'carrot'
UNION ALL
SELECT DATE('2019-03-01'), 'user2#example.com', 'orange'
)
And here a first query I attempted:
SELECT d.date,s.* EXCEPT(date)
FROM sample s
CROSS JOIN `bigquery-public-data.utility_eu.date_greg` d
WHERE d.year = 2019
ORDER BY date,user
But this is too much because also the attribute values is used inside the join and I'm getting the value replicated on all days that are not related to the original one.
I think I need to have some sort of DISTINCT in order to get only the unique (date+user) couples, and only then associate the attribute value, if any.
This is first working solution I found:
distinct_couples AS (
SELECT DISTINCT d.date,s.user
FROM sample s CROSS JOIN `bigquery-public-data.utility_eu.date_greg` d
WHERE d.year = 2019
)
SELECT d.*, s.attribute
FROM distinct_couples d
LEFT JOIN sample s USING(date,user)
ORDER BY date,user
But I'm doing a join with sample twice (first in temp table and second in main query), so I'm trying to understand if can be optimized.
Do you have any suggestion on how to make it works?
Thanks
Below is for BigQuery Standard SQL
#standardSQL
WITH users AS (
SELECT DISTINCT user
FROM `project.dataset.sample`
)
SELECT d.date, u.user, s.attribute
FROM `bigquery-public-data.utility_eu.date_greg` d
CROSS JOIN users u
LEFT JOIN `project.dataset.sample` s
ON s.date = d.date
AND s.user = u.user
WHERE d.year = 2019
As a side note - you don't really need using any extra dates table as you can generate it on fly - as in example below
#standardSQL
WITH users AS (
SELECT DISTINCT user
FROM `project.dataset.sample`
), dates AS (
SELECT `date`
FROM UNNEST(GENERATE_DATE_ARRAY('2019-01-01', '2019-12-31')) `date`
)
SELECT d.date, u.user, s.attribute
FROM dates d
CROSS JOIN users u
LEFT JOIN `project.dataset.sample` s
ON s.date = d.date
AND s.user = u.user

Access Queries comparing two tables

I have two tables in Access, Table A and Table B:
Table MasterLockInsNew:
+----+-------+----------+
| ID | Value | Date |
+----+-------+----------+
| 1 | 123 | 12/02/13 |
| 2 | 1231 | 11/02/13 |
| 4 | 1265 | 16/02/13 |
+----+-------+----------+
Table InitialPolData:
+----+-------+----------+---+
| ID | Value | Date |Type
+----+-------+----------+---+
| 1 | 123 | 12/02/13 | x |
| 2 | 1231 | 11/02/13 | x |
| 3 | 1238 | 10/02/13 | y |
| 4 | 1265 | 16/02/13 | a |
| 7 | 7649 | 18/02/13 | z |
+----+-------+----------+---+
All I want are the rows from table B for IDs not contained in A. My current code looks like this:
SELECT Distinct InitialPolData.*
FROM InitialPolData
WHERE InitialPolData.ID NOT IN (SELECT Distinct InitialPolData.ID
from InitialPolData INNER JOIN
MasterLockInsNew
ON InitialPolData.ID=MasterLockInsNew.ID);
But whenever I run this in Access it crashes!! The tables are fairly large but I don't think this is the reason.
Can anyone help?
Thanks
or try a left outer join:
SELECT b.*
FROM InitialPolData b left outer join
MasterLockInsNew a on
b.id = a.id
where
a.id is null
Simple subquery will do.
select * from InitialPolData
where id not in (
select id from MasterLockInsNew
);
Try using NOT EXISTS:
SELECT Distinct i.*
FROM InitialPolData AS i
WHERE NOT EXISTS (SELECT 1
FROM MasterLockInsNew AS m
WHERE m.ID = i.ID)

Update one table using the data from another table

I am dealing with 2 tables.
Users:
+----+----------+-------------------------+
| id | user_id | datetime |
+----+----------+-------------------------+
| 1 | 95678367 | 2015-07-03 02:02:29.863 |
| 2 | 72876424 | 2015-07-07 01:04:14.436 |
| 3 | 74293582 | 2015-07-11 10:02:45.523 |
+----+----------+-------------------------+
UserActivation:
+-----+----------+-------------------------+
| id | user_id | datetime |
+-----+----------+-------------------------+
| 1 | 95678367 | 2015-07-03 02:02:29.863 |
| 2 | 09235892 | 2015-07-03 02:02:29.863 |
| 3 | 90328574 | 2015-07-03 02:02:29.863 |
| 4 | 24714287 | 2015-07-03 02:02:29.863 |
| 5 | 02743723 | 2015-07-03 02:02:29.863 |
| 6 | 72876424 | 2015-07-07 01:04:14.436 |
| 7 | 09385732 | 2015-07-07 01:04:14.436 |
| 8 | 74576234 | 2015-07-07 01:04:14.436 |
| 9 | 75439273 | 2015-07-07 01:04:14.436 |
| 10 | 74293582 | 2015-07-11 10:02:45.523 |
| 11 | 94562872 | 2015-07-11 10:02:45.523 |
| 12 | 80367456 | 2015-07-11 10:02:45.523 |
| 13 | 76537924 | 2015-07-11 10:02:45.523 |
+-----+----------+-------------------------+
I am using SQL server 2012. I want to update the timings of table UserActivation. What is going on here that first code insert data in Users table when a user registers. After sometime he activates his account, that data is saving in UserActivation. UserActivation contains numerous columns, but I am showing only those which I am using. The problem is we added datetime column afterwards and till that time hundreds of data is there. What I am trying to do is to update the datetime of UserActivation like follows:
In User table, user_id: 95678367 is first. second is 72876424. I want to update datetime of UserActivation table of rows having id 1 to 5, because id 6 contains the user_id 72876424. So I want to update datetime of rows 1 to 5 in such a way that they comes in 3 seconds increment of datetime from User table.
User table first row has user_id 95678367 and datetime 2015-07-03 02:02:29.863, so update datetime of rows 1-5 of UserActivation(till second user_id of User encounters) as
row1 -> **2015-07-03 02:02:31.863**
row2 -> **2015-07-03 02:02:34.863**
row3 -> **2015-07-03 02:02:37.863**
row4 -> **2015-07-03 02:02:40.863**
row5 -> **2015-07-03 02:02:43.863**
After that if we strikes second id from Users table. take that datetime from Users table 2015-07-07 01:04:14.436
And start update datetime of UserActivation table with increments of three for rows 6-9 as 10th row cotains 3rd user_id of Users table.
Note: I am trying to write a script so that can I loop through both tables and check one by one user_id of both table and update accordingly, but I am not expert in sql server scripts. Showing how to loop through a SELECT result and update in loop will also help.
One way to calculate the "rolling" timestamp is to use a cross apply. See this:
WITH cte AS (
SELECT ua.id, ua.[datetime], CASE WHEN u.id IS NULL THEN -1 ELSE 0 END AS gapN
FROM UserActivation ua LEFT JOIN Users u
ON ua.user_id = u.user_id
)
SELECT co.id, co.datetime, DATEADD(second, 3*(co.id-cap.maxID), co.[datetime]) newDatetime
FROM cte co CROSS APPLY (
SELECT MAX(id) maxID
FROM cte ca
WHERE ca.id <= co.id AND ca.gapN = 0
) cap
You can see that "live" on SQLFiddle
In order to UPDATE the table, you need to replace the SELECT clause with:
/* SAME AS ABOVE UNTIL THE SELECT LINE */
UPDATE co
SET co.datetime = DATEADD(second, 3 * (co.id - cap.maxID), co.[datetime])
FROM UserActivation co CROSS APPLY (
/* SAME AS ABOVE AFTER THE FROM LINE */
Try this
update ua
set ua.datetime = u.datetime
from users u left join useractivation ua on u.user_id = ua.user_id
;with cte
as
(
select id,user_id,datetime from useractivation where id=1
union all
select ua.id,ua.user_id,
case when ua.datetime is null then DATEADD(ss,3,c.datetime) else ua.datetime end as datetime
from cte c inner join useractivation ua on c.id+1 = ua.id
)
update ua
set ua.datetime = c.datetime
from cte c inner join useractivation ua on c.id = ua.id

I need a specific output

I have to get a specific output format from my tables.
Let's say I have a simple table with 2 columns name and value.
table T1
+---------------+------------------+
| Name | Value |
+---------------+------------------+
| stuff1 | 1 |
| stuff1 | 1 |
| stuff2 | 2 |
| stuff3 | 1 |
| stuff2 | 4 |
| stuff2 | 2 |
| stuff3 | 4 |
+---------------+------------------+
I know the values are in the interval 1-4. I group it by name and value and count number of the same rows as Number and get the following table:
table T2
+---------------+------------------+--------+
| Name | Value | Number |
+---------------+------------------+--------+
| stuff1 | 1 | 2 |
| stuff2 | 2 | 2 |
| stuff3 | 1 | 1 |
| stuff3 | 4 | 1 |
+---------------+------------------+--------+
Here is the part when I need your help! What should I do if I want to get these format?
table T3
+---------------+------------------+--------+
| Name | Value | Number |
+---------------+------------------+--------+
| stuff1 | 1 | 2 |
| stuff1 | 2 | 0 |
| stuff1 | 3 | 0 |
| stuff1 | 4 | 0 |
| stuff2 | 1 | 0 |
| stuff2 | 2 | 2 |
| stuff2 | 3 | 0 |
| stuff2 | 4 | 0 |
| stuff3 | 1 | 1 |
| stuff3 | 2 | 0 |
| stuff3 | 3 | 0 |
| stuff3 | 4 | 1 |
+---------------+------------------+--------+
Thanks for any suggestions!
You start with a cross join to generate all possible combinations and then left-join in the results from your existing query:
select n.name, v.value, coalesce(nv.cnt, 0) as "Number"
from (select distinct name from table t) n cross join
(select distinct value from table t) v left outer join
(select name, value, count(*) as cnt
from table t
group by name, value
) nv
on nv.name = n.name and nv.value = v.value;
Variation on the theme.
Differences between Gordon Linoff and Owen existing answers.
I prefer GROUP BY to get the Names rather than a DISTINCT. This may have better performance in a case like this. (See Rob Farley's still relevant article.)
I explode the subqueries into a series of CTEs for clarity.
I use table T2 as the question now labels the group results set instead of showing that as as subquery.
WITH PossibleValue AS (
SELECT 1 Value
UNION ALL
SELECT Value + 1
FROM PossibleValue
WHERE Value < 4
),
Name AS (
SELECT Name
FROM T1
GROUP BY Name
),
NameValue AS (
SELECT Name
,Value
FROM Name
CROSS JOIN
PossibleValue
)
SELECT nv.Name
,nv.Value
,ISNULL(T2.Number,0) Number
FROM NameValue nv
LEFT JOIN
T2 ON nv.Name = T2.Name
AND nv.Value = T2.Value
Yet another solution, this time using a Table Value Constructor in a CTE to build a table of name value combinations.
WITH value AS
( SELECT DISTINCT t.name, v.value
FROM T1 AS t
CROSS JOIN (VALUES (1),(2),(3),(4)) AS v (value)
)
SELECT v.name AS 'Name', v.value AS 'Value', COUNT(t.name) AS 'Number'
FROM value AS v
LEFT JOIN T1 AS t ON t.value = v.value AND t.name = v.name
GROUP BY v.name, v.value, t.name;