Update one table using the data from another table - sql

I am dealing with 2 tables.
Users:
+----+----------+-------------------------+
| id | user_id | datetime |
+----+----------+-------------------------+
| 1 | 95678367 | 2015-07-03 02:02:29.863 |
| 2 | 72876424 | 2015-07-07 01:04:14.436 |
| 3 | 74293582 | 2015-07-11 10:02:45.523 |
+----+----------+-------------------------+
UserActivation:
+-----+----------+-------------------------+
| id | user_id | datetime |
+-----+----------+-------------------------+
| 1 | 95678367 | 2015-07-03 02:02:29.863 |
| 2 | 09235892 | 2015-07-03 02:02:29.863 |
| 3 | 90328574 | 2015-07-03 02:02:29.863 |
| 4 | 24714287 | 2015-07-03 02:02:29.863 |
| 5 | 02743723 | 2015-07-03 02:02:29.863 |
| 6 | 72876424 | 2015-07-07 01:04:14.436 |
| 7 | 09385732 | 2015-07-07 01:04:14.436 |
| 8 | 74576234 | 2015-07-07 01:04:14.436 |
| 9 | 75439273 | 2015-07-07 01:04:14.436 |
| 10 | 74293582 | 2015-07-11 10:02:45.523 |
| 11 | 94562872 | 2015-07-11 10:02:45.523 |
| 12 | 80367456 | 2015-07-11 10:02:45.523 |
| 13 | 76537924 | 2015-07-11 10:02:45.523 |
+-----+----------+-------------------------+
I am using SQL Server 2012 and want to update the timestamps in the UserActivation table. Here is what happens: when a user registers, the application inserts a row into Users. Some time later the user activates the account, and that data is saved in UserActivation. UserActivation has many more columns, but I am showing only the ones I use. The problem is that we added the datetime column later, and by then there were already hundreds of rows. I want to update the datetime of UserActivation as follows:
In the Users table, user_id 95678367 comes first and 72876424 second. I want to update the datetime of UserActivation rows 1 to 5, because row 6 contains user_id 72876424. The datetimes of rows 1 to 5 should follow the Users datetime in 3-second increments.
The first row of Users has user_id 95678367 and datetime 2015-07-03 02:02:29.863, so rows 1-5 of UserActivation (up to the row holding the second Users user_id) should become:
row1 -> **2015-07-03 02:02:31.863**
row2 -> **2015-07-03 02:02:34.863**
row3 -> **2015-07-03 02:02:37.863**
row4 -> **2015-07-03 02:02:40.863**
row5 -> **2015-07-03 02:02:43.863**
After that, when we reach the second user_id from Users, take its datetime from Users (2015-07-07 01:04:14.436) and update the UserActivation datetimes in 3-second increments for rows 6-9, since row 10 contains the third user_id of Users.
Note: I am trying to write a script that loops through both tables, compares the user_id values one by one, and updates accordingly, but I am not an expert in SQL Server scripting. An example of looping through a SELECT result and updating inside the loop would also help.
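The row-by-row loop the question asks about can be sketched as follows. This is a minimal sketch, not SQL Server code: Python's built-in sqlite3 module stands in for SQL Server, datetimes are stored as text, the new column starts out empty, and `build_sample` and `restamp_activations` are hypothetical helper names. Each UserActivation row whose user_id appears in Users restarts the clock at that user's datetime; every following row adds 3 more seconds.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"  # parses e.g. 2015-07-03 02:02:29.863


def build_sample(conn: sqlite3.Connection) -> None:
    """Create the question's two tables; UserActivation.datetime is still empty."""
    conn.execute("CREATE TABLE Users (id INTEGER, user_id TEXT, datetime TEXT)")
    conn.execute("CREATE TABLE UserActivation (id INTEGER, user_id TEXT, datetime TEXT)")
    conn.executemany("INSERT INTO Users VALUES (?, ?, ?)", [
        (1, "95678367", "2015-07-03 02:02:29.863"),
        (2, "72876424", "2015-07-07 01:04:14.436"),
        (3, "74293582", "2015-07-11 10:02:45.523"),
    ])
    activation_user_ids = [
        "95678367", "09235892", "90328574", "24714287", "02743723",
        "72876424", "09385732", "74576234", "75439273",
        "74293582", "94562872", "80367456", "76537924",
    ]
    conn.executemany("INSERT INTO UserActivation (id, user_id) VALUES (?, ?)",
                     list(enumerate(activation_user_ids, start=1)))


def restamp_activations(conn: sqlite3.Connection, step_seconds: int = 3) -> None:
    """Loop over UserActivation in id order and update one row at a time."""
    users = dict(conn.execute("SELECT user_id, datetime FROM Users"))
    rows = conn.execute("SELECT id, user_id FROM UserActivation ORDER BY id").fetchall()
    anchor = None  # registration time of the most recent matching user
    offset = 0     # rows seen since that user
    for ua_id, user_id in rows:
        if user_id in users:
            anchor = datetime.strptime(users[user_id], FMT)
            offset = 0
        elif anchor is None:
            continue  # no registered user seen yet: leave the row alone
        else:
            offset += 1
        stamp = anchor + timedelta(seconds=step_seconds * offset)
        # strftime gives microseconds; drop the last 3 digits to keep milliseconds
        conn.execute("UPDATE UserActivation SET datetime = ? WHERE id = ?",
                     (stamp.strftime(FMT)[:-3], ua_id))
    conn.commit()
```

In this sketch the matching row keeps the registration time and later rows get +3, +6, +9 seconds, so row 2 becomes 2015-07-03 02:02:32.863; change how `offset` starts if the first row should be shifted as well. On SQL Server the same walk would be a cursor, which is why the set-based answers below are usually preferred for large tables.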

One way to calculate the "rolling" timestamp is to use a cross apply. See this:
WITH cte AS (
SELECT ua.id, ua.[datetime], CASE WHEN u.id IS NULL THEN -1 ELSE 0 END AS gapN
FROM UserActivation ua LEFT JOIN Users u
ON ua.user_id = u.user_id
)
SELECT co.id, co.datetime, DATEADD(second, 3*(co.id-cap.maxID), co.[datetime]) newDatetime
FROM cte co CROSS APPLY (
SELECT MAX(id) maxID
FROM cte ca
WHERE ca.id <= co.id AND ca.gapN = 0
) cap
You can see that "live" on SQLFiddle
In order to UPDATE the table, you need to replace the SELECT clause with:
/* SAME AS ABOVE UNTIL THE SELECT LINE */
UPDATE co
SET co.datetime = DATEADD(second, 3 * (co.id - cap.maxID), co.[datetime])
FROM UserActivation co CROSS APPLY (
/* SAME AS ABOVE AFTER THE FROM LINE */
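The same "latest matching row, plus 3 seconds per step" rule can be checked as a single set-based UPDATE. This is a sketch under assumptions: SQLite (through Python's sqlite3 module) stands in for SQL Server, and since SQLite has no CROSS APPLY, a correlated subquery finds the anchor row. Unlike the answer, it takes the base time from Users rather than from UserActivation, so it also works while the new column is still NULL. Only the first seven sample rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (id INTEGER, user_id TEXT, datetime TEXT);
CREATE TABLE UserActivation (id INTEGER, user_id TEXT, datetime TEXT);
INSERT INTO Users VALUES
  (1, '95678367', '2015-07-03 02:02:29.863'),
  (2, '72876424', '2015-07-07 01:04:14.436');
INSERT INTO UserActivation (id, user_id) VALUES
  (1, '95678367'), (2, '09235892'), (3, '90328574'), (4, '24714287'),
  (5, '02743723'), (6, '72876424'), (7, '09385732');
""")

# Anchor = highest UserActivation id <= this row whose user_id is in Users.
# New time = that user's datetime + 3 seconds * (this id - anchor id).
conn.execute("""
UPDATE UserActivation
SET datetime = (
    SELECT strftime('%Y-%m-%d %H:%M:%f', u.datetime,
                    '+' || (3 * (UserActivation.id - a.id)) || ' seconds')
    FROM UserActivation AS a
    JOIN Users AS u ON u.user_id = a.user_id
    WHERE a.id <= UserActivation.id
    ORDER BY a.id DESC
    LIMIT 1
)
""")
conn.commit()
```

Like the answer's formula, this assumes ids are assigned in activation order; gaps in the ids simply widen the step.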

Try this
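-- 1) copy the registration time onto activation rows whose user_id is in Users
update ua
set ua.datetime = u.datetime
from users u inner join useractivation ua on u.user_id = ua.user_id
-- 2) walk the rows in id order (assumes consecutive ids);
--    a NULL datetime becomes the previous row's datetime + 3 seconds
;with cte
as
(
select id,user_id,datetime from useractivation where id=1
union all
select ua.id,ua.user_id,
case when ua.datetime is null then DATEADD(ss,3,c.datetime) else ua.datetime end as datetime
from cte c inner join useractivation ua on c.id+1 = ua.id
)
update ua
set ua.datetime = c.datetime
from cte c inner join useractivation ua on c.id = ua.id
option (maxrecursion 0) -- the default limit is 100 levels; the table has hundreds of rows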

Related

How do I do an Oracle SQL update from select in a specific order?

I have a table with old values (some NULL) and new values for various attributes, inserted at different times throughout the months. I'm trying to update a second table whose records have business month-end dates. Right now these records contain only the most recent new values for every month end. The goal is to build historical data by updating the earlier month-end values with the old values from the first table. I am a beginner and managed to write a query that updates one object with a single entry in the first table. Now I am trying to extend it to multiple objects, each possibly with multiple old values within the same month. I tried ORDER BY (since the updates for a month must be applied in ascending order so that the latest value wins), but read that it doesn't work in an UPDATE statement without a subquery. So I tried a more complicated query, without success. I am getting the following error: single-row subquery returns more than one row. Thanks!
TableA:
| ID | TYPE | OLD_VALUE | NEW_VALUE | ADD_TIME|
-----------------------------------------------
| 1 | A | 2 | 3 | 1/11/2019 8:00:00am |
| 1 | B | 3 | 4 | 12/10/2018 8:00:00am|
| 1 | B | 4 | 5 | 12/11/2018 8:00:00am|
| 2 | A | 5 | 1 | 12/5/2018 08:00:00am|
| 2 | A | 1 | 2 | 12/5/2019 09:00:00am|
| 2 | A | 2 | 3 | 12/5/2019 10:00:00am|
| 2 | B | 1 | 2 | 12/5/2019 10:00:00am|
TableB
| ID | MONTH_END | TYPE_A | TYPE_B |
-----------------------------------
| 1 | 1/31/19 | 3 | 5 |
| 1 | 12/31/18 | 3 | 5 |
| 1 | 11/30/18 | 3 | 5 |
| 2 | 12/31/18 | 3 | 2 |
| 2 | 11/30/18 | 3 | 2 |
Desired Output for TableB
| ID | MONTH_END | TYPE_A | TYPE_B |
-----------------------------------
| 1 | 1/31/19 | 3 | 5 |
| 1 | 12/31/18 | 2 | 5 |
| 1 | 11/30/18 | 2 | 3 |
| 2 | 12/31/18 | 3 | 2 |
| 2 | 11/30/18 | 5 | 2 |
My query for Type A (which I plan to adapt for Type B and run as well to get the desired output):
update TableB B
set b.type_a =
(
with aa as
(
select id, nvl(old_value, new_value) typea, add_time
from TableA
where type = 'A'
order by id, add_time asc
)
select typea
from aa
where aa.id = b.id
and b.month_end <= aa.add_time
)
where exists
(
with aa as
(
select id, nvl(old_value, new_value) typea, add_time
from TableA
where type = 'A'
order by id, add_time asc
)
select typea
from aa
where aa.id = b.id
and b.month_end <= aa.add_time
)
Kudos for giving example input data and desired output. I found your question a bit confusing, so let me rephrase it as: "Provide the last type A value from table A that is in the same month as the month end."
By matching on ID, type and date of entry, we can get your answer. The "ROWNUM = 1" condition limits the result set to a single row in case more than one row has the same add_time. This SQL is still a mess; maybe someone else can come up with a better one.
UPDATE tableb b
SET b.type_a =
(SELECT a.old_value
FROM tablea a
WHERE a.id = b.id
AND LAST_DAY( TRUNC( a.add_time ) ) = b.month_end
AND a.type = 'A'
AND a.add_time =
(SELECT MAX( a2.add_time )
FROM tablea a2
WHERE a2.id = b.id
AND a2.type = 'A'
AND LAST_DAY( TRUNC( a2.add_time ) ) = b.month_end)
AND ROWNUM = 1)
WHERE EXISTS
(SELECT a.old_value
FROM tablea a
WHERE a.id = b.id
AND LAST_DAY( TRUNC( a.add_time ) ) = b.month_end AND a.type = 'A');
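The desired output can also be read as an "as of" lookup: the value at a month end is the OLD_VALUE of the first change made after that month end, and when no later change exists the current value stays. The sketch below tests that reading, which is an assumption drawn from the sample rows rather than the rule used in the answer. SQLite (via Python's sqlite3 module) stands in for Oracle, and the dates are rewritten as ISO text so they compare correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ID INTEGER, TYPE TEXT, OLD_VALUE INTEGER,
                     NEW_VALUE INTEGER, ADD_TIME TEXT);
CREATE TABLE TableB (ID INTEGER, MONTH_END TEXT, TYPE_A INTEGER, TYPE_B INTEGER);
INSERT INTO TableA VALUES
  (1, 'A', 2, 3, '2019-01-11 08:00:00'),
  (1, 'B', 3, 4, '2018-12-10 08:00:00'),
  (1, 'B', 4, 5, '2018-12-11 08:00:00'),
  (2, 'A', 5, 1, '2018-12-05 08:00:00'),
  (2, 'A', 1, 2, '2019-12-05 09:00:00'),
  (2, 'A', 2, 3, '2019-12-05 10:00:00'),
  (2, 'B', 1, 2, '2019-12-05 10:00:00');
INSERT INTO TableB VALUES
  (1, '2019-01-31', 3, 5), (1, '2018-12-31', 3, 5), (1, '2018-11-30', 3, 5),
  (2, '2018-12-31', 3, 2), (2, '2018-11-30', 3, 2);
""")

for t in ("A", "B"):
    # OLD_VALUE of the earliest change dated after the month end;
    # COALESCE keeps the current value when there is no later change.
    conn.execute(f"""
        UPDATE TableB
        SET TYPE_{t} = COALESCE(
            (SELECT a.OLD_VALUE FROM TableA AS a
             WHERE a.ID = TableB.ID AND a.TYPE = ?
               AND date(a.ADD_TIME) > TableB.MONTH_END
             ORDER BY a.ADD_TIME
             LIMIT 1),
            TYPE_{t})""", (t,))
conn.commit()
```

With the rows exactly as posted, the three ID 1 rows come out as in the desired output; the ID 2 rows do not all match, partly because several of their changes are dated 12/5/2019, after both ID 2 month ends. In Oracle 12c and later, `FETCH FIRST 1 ROW ONLY` plays the role of SQLite's `LIMIT 1`.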

Query returned with an extra column in sql -ms access

So I am wondering about something. I came across an interesting suggestion from another developer. Basically, I have two tables that I join in a query, and I want the query's result to have an extra column that comes from the joined table.
Example:
#table A: contains the players' ratings, which change at arbitrary dates
#depending on the players' form
PID| Rating | DateChange |
1 | 2 | 10-May-2014 |
1 | 4 | 20-May-2015 |
1 | 20 | 1-June-2015 |
2 | 4 | 1-April-2014|
3 | 4 | 5-April-2014|
2 | 3 | 3-May-2015 |
#Table B: contains match sheets. Every player has a different match sheet
#and plays different dates.
MsID | PID | MatchDate | Win |
1 | 2 | 10-May-2014 | No |
2 | 1 | 15-May-2015 | Yes |
3 | 3 | 10-Apr-2014 | No |
4 | 1 | 21-Apr-2015 | Yes |
5 | 1 | 3-June-2015 | Yes |
6 | 2 | 5-May-2015 | No |
#I am trying to achieve this with an MS Access query: I want to get
#each player's rating at the time the match was played, not his current
#rating.
MsID | PID | MatchDate | Rating |
1 | 2 | 10-May-2014 | 4 |
2 | 1 | 15-May-2015 | 2 |
3 | 3 | 10-Apr-2014 | 4 |
4 | 1 | 21-Apr-2015 | 4 |
5 | 1 | 3-June-2015 | 20 |
6 | 2 | 5-May-2015 | 3 |
This is what I have tried below:
Select MsID, [B-table].PID, MatchDate, [A-table].Rating from [B-table]
left Join [A-table]
on [B-table].PID = [A-table].PID
where [B-table].MatchDate > [A-table].DateChange;
any help is appreciated. The solution can be in Vba as long as it returns something like a view/table I can manipulate using other queries or report.
Think of this in terms of sets of data: you need a set that lists the MAX DateChange for each player and match date.
So:
SELECT MAX(A.DateChange) AS MDC, A.PID, B.MatchDate
FROM [B-table] AS B
INNER JOIN [A-table] AS A
ON (B.PID = A.PID
AND A.DateChange <= B.MatchDate)
GROUP BY A.PID, B.MatchDate
Now we take this and join it back to your original join, limiting the rows from tables A and B to ONLY those matching that date, player and match date (my inline table C):
SELECT B.MsID, B.PID, B.MatchDate, A.Rating
FROM ([B-table] AS B
INNER JOIN [A-table] AS A
ON B.PID = A.PID)
INNER JOIN (
SELECT MAX(Y.DateChange) AS MDC, Y.PID, Z.MatchDate
FROM [B-table] AS Z
INNER JOIN [A-table] AS Y
ON (Z.PID = Y.PID AND Y.DateChange <= Z.MatchDate)
GROUP BY Y.PID, Z.MatchDate) AS C
ON (C.MDC = A.DateChange
AND A.PID = C.PID
AND B.MatchDate = C.MatchDate)
I didn't build a sample with your data, so this is untested, but I believe the logic is sound...
Now tested! SQL Fiddle, using SQL Server though...
My results don't match yours exactly. I think your expected result for MsID 4 is wrong given the rules you defined.
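The two-step logic above (find the latest DateChange on or before each match, then join back to pick up the rating in force on that date) can be checked on the sample data. This is a sketch, not Access SQL: SQLite via Python's sqlite3 module stands in for Access, `ratings` and `matches` are stand-in names for A-table and B-table, and dates are stored as ISO text so that `<=` compares them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (PID INTEGER, Rating INTEGER, DateChange TEXT);
CREATE TABLE matches (MsID INTEGER, PID INTEGER, MatchDate TEXT, Win TEXT);
INSERT INTO ratings VALUES
  (1, 2, '2014-05-10'), (1, 4, '2015-05-20'), (1, 20, '2015-06-01'),
  (2, 4, '2014-04-01'), (3, 4, '2014-04-05'), (2, 3, '2015-05-03');
INSERT INTO matches VALUES
  (1, 2, '2014-05-10', 'No'), (2, 1, '2015-05-15', 'Yes'),
  (3, 3, '2014-04-10', 'No'), (4, 1, '2015-04-21', 'Yes'),
  (5, 1, '2015-06-03', 'Yes'), (6, 2, '2015-05-05', 'No');
""")

QUERY = """
SELECT m.MsID, m.PID, m.MatchDate, r.Rating
FROM matches AS m
JOIN (  -- step 1: latest rating change on or before each player's match date
      SELECT y.PID, z.MatchDate, MAX(y.DateChange) AS MDC
      FROM matches AS z
      JOIN ratings AS y ON y.PID = z.PID AND y.DateChange <= z.MatchDate
      GROUP BY y.PID, z.MatchDate) AS c
  ON c.PID = m.PID AND c.MatchDate = m.MatchDate
JOIN ratings AS r  -- step 2: join back to fetch the rating set on that date
  ON r.PID = c.PID AND r.DateChange = c.MDC
ORDER BY m.MsID
"""
rows = conn.execute(QUERY).fetchall()
```

MsID 4 comes out as 2 rather than the 4 in the question's expected output, in line with the remark above. A match played before the player's first rating change would drop out of the inner joins entirely.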

Using a SQL cursor. Worried about performance. Any way to do this with set operations?

I am working on a production table that is constantly updated. It is in its infancy, but eventually there may be multiple inserts per minute (and it will likely reach millions of rows). I have no control over the table structure or how data is entered.
I need to count the number of _COFF rows recorded after a machine gets the status _SETUP. The count must be refreshed at least every 15 minutes. The ID is used as a key in another table and must be associated with the MachineNumber and PartNumber in the output (sample output below).
Here is what I am working with:
'Production' table:
+---------------+---------------------+---------+------------+--------+
| MachineNumber | DateTime | Comment | PartNumber | Status |
+---------------+---------------------+---------+------------+--------+
| 1 | 11/11/2014 12:12:32 | | 104 | _SETUP |
| 1 | 11/11/2014 12:12:40 | 155 | 104 | _ID |
| 1 | 11/11/2014 12:12:45 | | 104 | _CON |
| 1 | 11/11/2014 12:16:45 | | 104 | _COFF |
| 1 | 11/11/2014 12:16:46 | | 104 | _CON |
| 1 | 11/11/2014 12:20:46 | | 104 | _COFF |
| 2 | 11/11/2014 12:20:50 | | 223 | _SETUP |
| 1 | 11/11/2014 12:21:00 | | 104 | _CON |
| 1 | 11/11/2014 12:23:00 | | 104 | _COFF |
| 2 | 11/11/2014 12:25:00 | 543 | 223 | _ID |
| 2 | 11/11/2014 12:25:20 | | 223 | _CON |
| 2 | 11/11/2014 12:26:20 | | 223 | _COFF |
... ... ... ... ...
+---------------+---------------------+---------+------------+--------+
Currently I use a cursor to get the following output:
+---------------+------------+-----+-------------+
| MachineNumber | DateTime | ID | _COFF Count |
+---------------+------------+-----+-------------+
| 1 | 11/11/2014 | 155 | 3 |
| 2 | 11/11/2014 | 543 | 1 |
+---------------+------------+-----+-------------+
Is there any way to do this better than looping through (possibly) a million rows? What about deleting the records I have already processed and storing the output in another table?
EDIT: There will be only one _SETUP and one _ID per part per machine; however, there will be more than one part per machine (and therefore more than one _SETUP and _ID per machine in the table).
The best approach would be a trigger that maintains the last setup date for each machine, combined with an indexed view that uses it.
Assuming your table is called Status:
CREATE INDEX
IX_Status_Machine_Status_DateTime
ON Status (machineNumber, status, dateTime)
GO
CREATE UNIQUE INDEX
UX_Status_Machine_Part__Id
ON Status (machineNumber, partNumber)
WHERE status = '_ID'
CREATE TABLE
Status_LastSetup
(
machineNumber INT NOT NULL PRIMARY KEY,
partNumber INT NOT NULL,
lastSetup DATETIME NOT NULL
)
GO
CREATE TRIGGER
TR_Status_All
ON Status
AFTER INSERT, UPDATE, DELETE
AS
WITH c AS
(
-- machines touched by this statement; UNION (not UNION ALL) removes
-- duplicates so MERGE never matches the same row twice
SELECT machineNumber
FROM INSERTED
UNION
SELECT machineNumber
FROM DELETED
),
t AS
(
-- target: existing summary rows for those machines only
SELECT sls.*
FROM Status_LastSetup sls
WHERE sls.machineNumber IN (SELECT machineNumber FROM c)
),
s AS
(
-- source: latest _SETUP row for each touched machine
SELECT c.machineNumber, q.partNumber, q.dateTime
FROM c
CROSS APPLY
(
SELECT TOP 1
partNumber, dateTime
FROM Status st
WHERE st.machineNumber = c.machineNumber
AND st.status = '_SETUP'
ORDER BY
dateTime DESC
) q
)
MERGE t
USING s
ON t.machineNumber = s.machineNumber
WHEN NOT MATCHED BY TARGET THEN
INSERT (machineNumber, partNumber, lastSetup)
VALUES (s.machineNumber, s.partNumber, s.dateTime)
WHEN MATCHED AND EXISTS
(
SELECT t.lastSetup, t.partNumber
EXCEPT
SELECT s.dateTime, s.partNumber
) THEN
UPDATE
SET lastSetup = s.dateTime,
partNumber = s.partNumber
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
GO
CREATE VIEW
V_Status_CoffCount
WITH SCHEMABINDING
AS
SELECT s.machineNumber,
COUNT_BIG(*) cnt
FROM dbo.Status_LastSetup sls
JOIN dbo.Status s
ON s.machineNumber = sls.machineNumber
AND s.status = '_COFF'
AND s.dateTime >= sls.lastSetup
GROUP BY
s.machineNumber
GO
CREATE UNIQUE CLUSTERED INDEX
UX_V_Status_CoffCount_Machine
ON V_Status_CoffCount (machineNumber)
GO
Once it's all up and running, just run this query:
SELECT sls.machineNumber, sls.lastSetup, comment id, cnt
FROM V_Status_CoffCount scc WITH (NOEXPAND)
JOIN Status_LastSetup sls
ON sls.machineNumber = scc.machineNumber
LEFT JOIN
Status s
ON s.machineNumber = sls.machineNumber
AND s.partNumber = sls.partNumber
AND s.status = '_ID'
Its execution time is proportional to the number of machines you have and is (almost) independent of the number of rows in Status.
See this all on SQLFiddle: http://sqlfiddle.com/#!6/9671c/2
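The core idea of this answer, keeping a small per-machine summary current on every insert so the report never scans the full history, can be sketched with SQLite triggers through Python's sqlite3 module. SQLite has no indexed views, so in this swapped-in variant the triggers also keep the running _COFF count themselves. The `LastSetup` table and the trigger names are hypothetical, DateTime is stored as ISO text, and only INSERTs arriving in time order are handled; UPDATEs and DELETEs would need triggers of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Production (MachineNumber INTEGER, DateTime TEXT, Comment TEXT,
                         PartNumber INTEGER, Status TEXT);
CREATE TABLE LastSetup (MachineNumber INTEGER PRIMARY KEY, PartNumber INTEGER,
                        SetupTime TEXT, PartId TEXT, CoffCount INTEGER NOT NULL);

-- a new _SETUP replaces the machine's summary row and resets the count
CREATE TRIGGER tr_setup AFTER INSERT ON Production WHEN NEW.Status = '_SETUP'
BEGIN
  INSERT OR REPLACE INTO LastSetup
  VALUES (NEW.MachineNumber, NEW.PartNumber, NEW.DateTime, NULL, 0);
END;

-- the _ID row carries the id in its Comment column
CREATE TRIGGER tr_id AFTER INSERT ON Production WHEN NEW.Status = '_ID'
BEGIN
  UPDATE LastSetup SET PartId = NEW.Comment
  WHERE MachineNumber = NEW.MachineNumber;
END;

-- every _COFF at or after the last setup bumps the counter
CREATE TRIGGER tr_coff AFTER INSERT ON Production WHEN NEW.Status = '_COFF'
BEGIN
  UPDATE LastSetup SET CoffCount = CoffCount + 1
  WHERE MachineNumber = NEW.MachineNumber AND NEW.DateTime >= SetupTime;
END;
""")

conn.executemany("INSERT INTO Production VALUES (?, ?, ?, ?, ?)", [
    (1, "2014-11-11 12:12:32", None, 104, "_SETUP"),
    (1, "2014-11-11 12:12:40", "155", 104, "_ID"),
    (1, "2014-11-11 12:12:45", None, 104, "_CON"),
    (1, "2014-11-11 12:16:45", None, 104, "_COFF"),
    (1, "2014-11-11 12:16:46", None, 104, "_CON"),
    (1, "2014-11-11 12:20:46", None, 104, "_COFF"),
    (2, "2014-11-11 12:20:50", None, 223, "_SETUP"),
    (1, "2014-11-11 12:21:00", None, 104, "_CON"),
    (1, "2014-11-11 12:23:00", None, 104, "_COFF"),
    (2, "2014-11-11 12:25:00", "543", 223, "_ID"),
    (2, "2014-11-11 12:25:20", None, 223, "_CON"),
    (2, "2014-11-11 12:26:20", None, 223, "_COFF"),
])

# The 15-minute report now reads one row per machine.
report = conn.execute("SELECT MachineNumber, PartNumber, PartId, CoffCount "
                      "FROM LastSetup ORDER BY MachineNumber").fetchall()
```

The trade-off is the same as in the T-SQL version: each insert pays a small extra cost so that the periodic report no longer depends on the size of the history table.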
select p.MachineNumber, ts.starttime, id.comment, count(*) as coff_count
from Production p
join ( select MachineNumber, min(Comment) as comment
from Production
where Status = '_ID'
and Comment is not null
group by MachineNumber ) as id
on id.MachineNumber = p.MachineNumber
join ( select MachineNumber, min([DateTime]) as starttime
from Production
where Status = '_SETUP'
group by MachineNumber ) as ts
on ts.MachineNumber = p.MachineNumber
and ts.starttime < p.[DateTime]
where p.Status = '_COFF'
group by p.MachineNumber, ts.starttime, id.comment
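The set-based idea of this answer (take each machine's _SETUP time and _ID comment from grouped subqueries, then count the _COFF rows after that time) can be checked against the question's sample rows. This sketch uses SQLite through Python's sqlite3 module, keeps the question's column names, and stores DateTime as ISO text so comparisons are chronological. Like the query, it assumes one _SETUP and one _ID per machine, so it would need per-part grouping once a machine has several parts.

```python
import sqlite3

ROWS = [  # MachineNumber, DateTime, Comment, PartNumber, Status
    (1, "2014-11-11 12:12:32", None, 104, "_SETUP"),
    (1, "2014-11-11 12:12:40", "155", 104, "_ID"),
    (1, "2014-11-11 12:12:45", None, 104, "_CON"),
    (1, "2014-11-11 12:16:45", None, 104, "_COFF"),
    (1, "2014-11-11 12:16:46", None, 104, "_CON"),
    (1, "2014-11-11 12:20:46", None, 104, "_COFF"),
    (2, "2014-11-11 12:20:50", None, 223, "_SETUP"),
    (1, "2014-11-11 12:21:00", None, 104, "_CON"),
    (1, "2014-11-11 12:23:00", None, 104, "_COFF"),
    (2, "2014-11-11 12:25:00", "543", 223, "_ID"),
    (2, "2014-11-11 12:25:20", None, 223, "_CON"),
    (2, "2014-11-11 12:26:20", None, 223, "_COFF"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Production (MachineNumber INTEGER, DateTime TEXT, "
             "Comment TEXT, PartNumber INTEGER, Status TEXT)")
conn.executemany("INSERT INTO Production VALUES (?, ?, ?, ?, ?)", ROWS)

QUERY = """
SELECT p.MachineNumber, ts.starttime, id.comment, COUNT(*) AS coff_count
FROM Production AS p
JOIN (SELECT MachineNumber, MIN(Comment) AS comment     -- id from the _ID row
      FROM Production WHERE Status = '_ID' AND Comment IS NOT NULL
      GROUP BY MachineNumber) AS id
  ON id.MachineNumber = p.MachineNumber
JOIN (SELECT MachineNumber, MIN(DateTime) AS starttime  -- the _SETUP time
      FROM Production WHERE Status = '_SETUP'
      GROUP BY MachineNumber) AS ts
  ON ts.MachineNumber = p.MachineNumber AND ts.starttime < p.DateTime
WHERE p.Status = '_COFF'
GROUP BY p.MachineNumber, ts.starttime, id.comment
ORDER BY p.MachineNumber
"""
report = conn.execute(QUERY).fetchall()
```

The result has the same counts and ids as the cursor output in the question (3 for machine 1 with id 155, 1 for machine 2 with id 543), without any row-by-row loop.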

SQL Server Management Studio 2008 - why won't my left join provide null records?

I am trying to pull records from two tables, a master table, and a transaction table.
My Master table contains all of my account IDs. My transaction table has any transaction run by these accounts with 3 columns: activity date, income, and charge type.
In the transaction table, some of these accounts may not appear at all because they have not performed a transaction during a given date range. However, I still need these accounts to appear on my result list when I query them.
So my data would look like this:
Master Table:
| AccountID |
-------------
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
Transaction Table:
| AccountID | ChargeType | ActivityDate| Income |
-------------------------------------------------
| 2 | 2000 | 8/31/2012 | $99.00 |
| 3 | 2000 | 7/31/2012 | $79.00 |
| 5 | 2000 | 9/30/2012 | $79.00 |
My query currently looks like:
select
a.AccountID,
b.ChargeType,
b.ActivityDate,
b.Income
From
MasterTable as A
left join
TransactionTable as B on a.AccountID = b.AccountID
where
a.AccountID in ('1','2','3','4','5')
and
b.ActivityDate between '5/1/2012' and '11/30/2012'
From what I understand, this query should list all 5 accounts I've chosen, and display NULL values for the accounts not found in the TransactionTable.
Results I expect:
| AccountID | ChargeType | ActivityDate| Income |
-------------------------------------------------
| 1 | NULL | NULL | NULL |
| 2 | 2000 | 8/31/2012 | $99.00 |
| 3 | 2000 | 7/31/2012 | $79.00 |
| 4 | NULL | NULL | NULL |
| 5 | 2000 | 9/30/2012 | $79.00 |
The incorrect results I receive instead:
| AccountID | ChargeType | ActivityDate| Income |
-------------------------------------------------
| 2 | 2000 | 8/31/2012 | $99.00 |
| 3 | 2000 | 7/31/2012 | $79.00 |
| 5 | 2000 | 9/30/2012 | $79.00 |
I assume I am misunderstanding something fundamental here. Any help is greatly appreciated!
Thanks in advance!
The reason is that table "b" is referenced in the WHERE clause: for accounts without transactions every b column is NULL, so the date filter removes those rows.
Move the condition on b into the ON clause (the filter on a stays in WHERE):
From
MasterTable A left join
TransactionTable B
on a.AccountID=b.AccountID and
b.ActivityDate between '5/1/2012' and '11/30/2012'
where
a.AccountID in ('1','2','3','4','5')
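The difference between filtering the outer-joined table in WHERE and in ON can be reproduced in a few lines. This minimal sketch uses SQLite through Python's sqlite3 module with the question's sample rows; dates are stored as ISO text so BETWEEN compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MasterTable (AccountID INTEGER);
CREATE TABLE TransactionTable (AccountID INTEGER, ChargeType INTEGER,
                               ActivityDate TEXT, Income REAL);
INSERT INTO MasterTable VALUES (1), (2), (3), (4), (5);
INSERT INTO TransactionTable VALUES
  (2, 2000, '2012-08-31', 99.0), (3, 2000, '2012-07-31', 79.0),
  (5, 2000, '2012-09-30', 79.0);
""")

# Date filter in WHERE: unmatched accounts have b.ActivityDate = NULL,
# BETWEEN is not true for NULL, so those rows are discarded.
in_where = conn.execute("""
SELECT a.AccountID, b.ChargeType, b.ActivityDate, b.Income
FROM MasterTable AS a
LEFT JOIN TransactionTable AS b ON a.AccountID = b.AccountID
WHERE a.AccountID IN (1, 2, 3, 4, 5)
  AND b.ActivityDate BETWEEN '2012-05-01' AND '2012-11-30'
ORDER BY a.AccountID
""").fetchall()

# Date filter in ON: it only limits which b rows can match;
# every account from MasterTable survives, with NULLs where nothing matched.
in_on = conn.execute("""
SELECT a.AccountID, b.ChargeType, b.ActivityDate, b.Income
FROM MasterTable AS a
LEFT JOIN TransactionTable AS b
  ON a.AccountID = b.AccountID
 AND b.ActivityDate BETWEEN '2012-05-01' AND '2012-11-30'
WHERE a.AccountID IN (1, 2, 3, 4, 5)
ORDER BY a.AccountID
""").fetchall()
```

`in_where` returns only accounts 2, 3 and 5, while `in_on` returns all five, with `(1, None, None, None)` and `(4, None, None, None)` for the accounts without transactions.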
Change the last filter of your WHERE clause as follows (SQL Fiddle example):
DECLARE @d1 DATETIME, @d2 DATETIME
SELECT @d1 = '5/1/2012', @d2 = '11/30/2012'
SELECT ...
WHERE a.AccountID in (1,2,3,4,5) AND
ISNULL(b.ActivityDate, @d1) BETWEEN @d1 and @d2
I'd use the following:
(b.ActivityDate is NULL or b.ActivityDate between '5/1/2012' and '11/30/2012' )
so that the original query becomes
select
a.AccountID,
b.ChargeType,
b.ActivityDate,
b.Income
From
MasterTable as A
left join
TransactionTable as B on a.AccountID=b.AccountID
where
a.AccountID in ('1','2','3','4','5')
and
(b.ActivityDate is NULL or b.ActivityDate between '5/1/2012' and '11/30/2012' )
But the answer below is probably better.
You can simply use the below code:
SELECT
A.AccountID,
B.ChargeType,
B.ActivityDate,
B.Income
From MasterTable AS A
LEFT JOIN TransactionTable AS B
ON A.AccountID = B.AccountID
If you use the condition B.ActivityDate BETWEEN '5/1/2012' AND '11/30/2012', you will not get the results you expect: for AccountID = 1, ActivityDate is NULL, so that row does not satisfy the BETWEEN condition.

Real life example, when to use OUTER / CROSS APPLY in SQL

I have been looking at CROSS / OUTER APPLY with a colleague and we're struggling to find real life examples of where to use them.
I've spent quite a lot of time looking at When should I use CROSS APPLY over INNER JOIN? and googling, but the main (only) example seems pretty bizarre (using the row count from a table to determine how many rows to select from another table).
I thought this scenario may benefit from OUTER APPLY:
Contacts Table (contains 1 record for each contact)
Communication Entries Table (can contain a phone, fax, email for each contact)
But using subqueries, common table expressions, OUTER JOIN with RANK() and OUTER APPLY all seem to perform equally. I'm guessing this means the scenario isn't applicable to APPLY.
Please share some real life examples and help explain the feature!
Some uses for APPLY are...
1) Top N per group queries (can be more efficient for some cardinalities)
SELECT pr.name,
pa.name
FROM sys.procedures pr
OUTER APPLY (SELECT TOP 2 *
FROM sys.parameters pa
WHERE pa.object_id = pr.object_id
ORDER BY pa.name) pa
ORDER BY pr.name,
pa.name
2) Calling a Table Valued Function for each row in the outer query
SELECT *
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle)
3) Reusing a column alias
SELECT number,
doubled_number,
doubled_number_plus_one
FROM master..spt_values
CROSS APPLY (SELECT 2 * CAST(number AS BIGINT)) CA1(doubled_number)
CROSS APPLY (SELECT doubled_number + 1) CA2(doubled_number_plus_one)
4) Unpivoting more than one group of columns
Assumes 1NF violating table structure....
CREATE TABLE T
(
Id INT PRIMARY KEY,
Foo1 INT, Foo2 INT, Foo3 INT,
Bar1 INT, Bar2 INT, Bar3 INT
);
Example using 2008+ VALUES syntax.
SELECT Id,
Foo,
Bar
FROM T
CROSS APPLY (VALUES(Foo1, Bar1),
(Foo2, Bar2),
(Foo3, Bar3)) V(Foo, Bar);
In 2005 UNION ALL can be used instead.
SELECT Id,
Foo,
Bar
FROM T
CROSS APPLY (SELECT Foo1, Bar1
UNION ALL
SELECT Foo2, Bar2
UNION ALL
SELECT Foo3, Bar3) V(Foo, Bar);
There are various situations where you cannot avoid CROSS APPLY or OUTER APPLY.
Consider you have two tables.
MASTER TABLE
x------x--------------------x
| Id | Name |
x------x--------------------x
| 1 | A |
| 2 | B |
| 3 | C |
x------x--------------------x
DETAILS TABLE
x------x--------------------x-------x
| Id | PERIOD | QTY |
x------x--------------------x-------x
| 1 | 2014-01-13 | 10 |
| 1 | 2014-01-11 | 15 |
| 1 | 2014-01-12 | 20 |
| 2 | 2014-01-06 | 30 |
| 2 | 2014-01-08 | 40 |
x------x--------------------x-------x
CROSS APPLY
There are many situations where we need to replace INNER JOIN with CROSS APPLY.
1. If we want to join 2 tables on TOP n results with INNER JOIN functionality
Suppose we need to select Id and Name from Master and the last two dates for each Id from the Details table.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
INNER JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID
The above query generates the following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
x------x---------x--------------x-------x
The derived table picks the two latest dates across the whole Details table (both belong to Id 1) before the join, so Id 2 gets nothing, which is wrong. To get the last two dates for each Id, we need CROSS APPLY.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
CROSS APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D
which forms the following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
x------x---------x--------------x-------x
Here is how it works: the query inside CROSS APPLY can reference the outer table, which a derived table in an INNER JOIN cannot do (it throws a compile error). The join that finds the last two dates happens inside CROSS APPLY, i.e., WHERE M.ID=D.ID.
2. When we need INNER JOIN functionality using functions.
CROSS APPLY can be used as a replacement for INNER JOIN when we need to combine results from the Master table and a function.
SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
CROSS APPLY dbo.FnGetQty(M.ID) C
And here is the function
CREATE FUNCTION FnGetQty
(
@Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=@Id
)
which generated the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
x------x---------x--------------x-------x
OUTER APPLY
1. If we want to join 2 tables on TOP n results with LEFT JOIN functionality
Suppose we need to select Id and Name from Master and the last two dates for each Id from the Details table.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
LEFT JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID
which forms the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | NULL | NULL |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
This gives wrong results: it returns only the latest two dates from the Details table regardless of Id, even though we join on Id. The proper solution is OUTER APPLY.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
OUTER APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D
which forms the following desired result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
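The contrast between "top 2 over the whole table, then join" and "top 2 per master row" can be reproduced outside SQL Server. SQLite (via Python's sqlite3 module) has no APPLY, so this sketch swaps in a correlated subquery inside the LEFT JOIN's ON clause to get the same per-Id, NULL-preserving behaviour as OUTER APPLY; the lowercase table names mirror the example tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (Id INTEGER, Name TEXT);
CREATE TABLE details (Id INTEGER, Period TEXT, Qty INTEGER);
INSERT INTO master VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO details VALUES
  (1, '2014-01-13', 10), (1, '2014-01-11', 15), (1, '2014-01-12', 20),
  (2, '2014-01-06', 30), (2, '2014-01-08', 40);
""")

# Like the LEFT JOIN attempt: the two latest rows are chosen over the
# whole details table before the join, so only Id 1 gets data.
global_top2 = conn.execute("""
SELECT m.Id, m.Name, d.Period, d.Qty
FROM master AS m
LEFT JOIN (SELECT * FROM details ORDER BY Period DESC LIMIT 2) AS d
  ON d.Id = m.Id
ORDER BY m.Id, d.Period DESC
""").fetchall()

# Like OUTER APPLY: the correlated subquery picks the two latest periods
# for each Id (ties on Period could let more than two rows through).
per_id_top2 = conn.execute("""
SELECT m.Id, m.Name, d.Period, d.Qty
FROM master AS m
LEFT JOIN details AS d
  ON d.Id = m.Id
 AND d.Period IN (SELECT d2.Period FROM details AS d2
                  WHERE d2.Id = d.Id
                  ORDER BY d2.Period DESC LIMIT 2)
ORDER BY m.Id, d.Period DESC
""").fetchall()
```

`global_top2` matches the "wrong" result table above, and `per_id_top2` matches the OUTER APPLY result, including the `(3, 'C', None, None)` row.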
2. When we need LEFT JOIN functionality using functions.
OUTER APPLY can be used as a replacement for LEFT JOIN when we need to combine results from the Master table and a function.
SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
OUTER APPLY dbo.FnGetQty(M.ID) C
And the function goes here.
CREATE FUNCTION FnGetQty
(
@Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=@Id
)
which generated the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x
Common feature of CROSS APPLY and OUTER APPLY
CROSS APPLY and OUTER APPLY are interchangeable here: either can be used to retain NULL values when unpivoting.
Consider you have the below table
x------x-------------x--------------x
| Id | FROMDATE | TODATE |
x------x-------------x--------------x
| 1 | 2014-01-11 | 2014-01-13 |
| 1 | 2014-02-23 | 2014-02-27 |
| 2 | 2014-05-06 | 2014-05-30 |
| 3 | NULL | NULL |
x------x-------------x--------------x
When you use UNPIVOT to bring FROMDATE AND TODATE to one column, it will eliminate NULL values by default.
SELECT ID,DATES
FROM MYTABLE
UNPIVOT (DATES FOR COLS IN (FROMDATE,TODATE)) P
which generates the result below. Note that the record for Id 3 is missing.
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
x------x-------------x
In such cases a CROSS APPLY or OUTER APPLY will be useful
SELECT DISTINCT ID,DATES
FROM MYTABLE
OUTER APPLY(VALUES (FROMDATE),(TODATE))
COLUMNNAMES(DATES)
which forms the following result and retains the row for Id 3.
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
| 3 | NULL |
x------x-------------x
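The NULL-preserving unpivot can also be sketched without APPLY: stacking one SELECT per source column produces the same rows as the VALUES list inside OUTER APPLY. This minimal sketch uses SQLite through Python's sqlite3 module (which has neither UNPIVOT nor APPLY); `mytable` mirrors the example table, and UNION rather than UNION ALL removes duplicates the way SELECT DISTINCT does in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (Id INTEGER, FromDate TEXT, ToDate TEXT);
INSERT INTO mytable VALUES
  (1, '2014-01-11', '2014-01-13'), (1, '2014-02-23', '2014-02-27'),
  (2, '2014-05-06', '2014-05-30'), (3, NULL, NULL);
""")

# One SELECT per column, stacked: NULL dates are kept, and UNION collapses
# Id 3's two NULL rows into one, like SELECT DISTINCT.
kept = conn.execute("""
SELECT Id, FromDate AS Dates FROM mytable
UNION
SELECT Id, ToDate FROM mytable
ORDER BY Id, Dates
""").fetchall()

# Dropping the NULLs afterwards reproduces UNPIVOT's default behaviour.
dropped = [row for row in kept if row[1] is not None]
```

`kept` has seven rows ending with `(3, None)`, matching the OUTER APPLY result; `dropped` has the six rows of the UNPIVOT result.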
One real life example would be if you had a scheduler and wanted to see what the most recent log entry was for each scheduled task.
select t.taskName, lg.logResult, lg.lastUpdateDate
from task t
cross apply (select top 1 taskID, logResult, lastUpdateDate
from taskLog l
where l.taskID = t.taskID
order by lastUpdateDate desc) lg
To address the point above, here is a quick example:
create table #task (taskID int identity primary key not null, taskName varchar(50) not null)
create table #log (taskID int not null, reportDate datetime not null, result varchar(50) not null, primary key(reportDate, taskId))
insert #task select 'Task 1'
insert #task select 'Task 2'
insert #task select 'Task 3'
insert #task select 'Task 4'
insert #task select 'Task 5'
insert #task select 'Task 6'
insert #log
select taskID, 39951 + number, 'Result text...'
from #task
cross join (
select top 1000 row_number() over (order by a.id) as number from syscolumns a cross join syscolumns b cross join syscolumns c) n
Now run the two queries with the actual execution plan enabled.
select t.taskID, t.taskName, lg.reportDate, lg.result
from #task t
left join (select taskID, reportDate, result, rank() over (partition by taskID order by reportDate desc) rnk from #log) lg
on lg.taskID = t.taskID and lg.rnk = 1
select t.taskID, t.taskName, lg.reportDate, lg.result
from #task t
outer apply ( select top 1 l.*
from #log l
where l.taskID = t.taskID
order by reportDate desc) lg
You can see that the OUTER APPLY query is more efficient in this test. (I couldn't attach the plan because I'm a new user... Doh.)