Basic T-SQL user here. I'm having problems trying to complete a task and would appreciate some guidance. Apologies in advance for any errors as English is not my mother tongue.
I have a table with a lot of transactions. For the sake of simplicity, let's say that it only has two columns: CUSTOMER_ID, which identifies the customer, and DATE, which is the date of the transaction.
My customers make a lot of transactions while they're in town, but then weeks, months or even years can pass before they come back and start making transactions again. I would like to identify each of those "trips" and group the transactions involved; then I'd like to do things like calculate trip duration, number of transactions, etc.
I'd like to consider a Trip as any new transaction occurring after an IDLE period of 10 days.
Let me try to better explain my request by using some simple example:
This is my transactions table:
+-------------+------------+
| CUSTOMER_ID | DATE |
+-------------+------------+
| JHON | 01-01-2016 |
| JHON | 01-02-2016 |
| PEDRO | 01-02-2016 |
| JHON | 01-05-2016 |
| MIKE | 01-05-2016 |
| MIKE | 01-10-2016 |
| JHON | 01-07-2016 |
| … | … |
| JHON | 02-15-2016 |
| JHON | 02-18-2016 |
| MIKE | 02-19-2016 |
| MIKE | 02-19-2016 |
+-------------+------------+
So far I've made this query to enumerate the customer's visits:
SELECT
CUSTOMER_ID,
DATE,
ROW_NUMBER() OVER(PARTITION BY CUSTOMER_ID ORDER BY DATE) as VISIT_NUM
FROM
TRANSACTIONS
WHERE
CUSTOMER_ID IN ('JHON','MIKE','PEDRO')
Running that query would give a result similar to this:
+-------------+------------+-----------+
| CUSTOMER_ID | DATE | VISIT_NUM |
+-------------+------------+-----------+
| JHON | 01-01-2016 | 1 |
| JHON | 01-02-2016 | 2 |
| JHON | 01-07-2016 | 3 |
| JHON | 02-15-2016 | 4 |
| JHON | 02-18-2016 | 5 |
| MIKE | 01-05-2016 | 1 |
| MIKE | 01-10-2016 | 2 |
| MIKE | 02-19-2016 | 3 |
| MIKE | 02-19-2016 | 4 |
| PEDRO | 01-02-2016 | 1 |
+-------------+------------+-----------+
Now comes the tricky part: I need to create a query that (maybe using the above query as a previous step) shows each customer with their trip info. Continuing with the example, the ideal result would look like this:
+-------------+----------+---------------+-------------+---------------+--------------+
| CUSTOMER_ID | TRIP_NUM | TRIP_START_DT | TRIP_END_DT | TRIP_DURATION | TRANSACTIONS |
+-------------+----------+---------------+-------------+---------------+--------------+
| JHON | 1 | 01-01-2016 | 01-07-2016 | 7 | 3 |
| JHON | 2 | 02-15-2016 | 02-18-2016 | 3 | 2 |
| MIKE | 1 | 01-05-2016 | 01-10-2016 | 5 | 2 |
| MIKE | 2 | 02-19-2016 | 02-19-2016 | 1 | 2 |
| PEDRO | 1 | 01-02-2016 | 01-02-2016 | 1 | 1 |
+-------------+----------+---------------+-------------+---------------+--------------+
As you can see, Mr. Jhon came 3 times during January and came back again in February. As more than 10 days passed since his last transaction in January, I'd like to consider his new set of transactions as a new "trip" for him. Mike also had some activity in January and came back in February too; in his second trip he made two transactions on the same day, and I'd like to count both. If a customer only came on a single day and had some activity (as in the case of Mr. Pedro), I'd also like to consider that single-day, single-transaction record as a trip.
I would greatly appreciate any light on this. I honestly have no idea how to proceed (I've been reading about cursors, but they seem like dark magic at this point and I can't figure out a way to apply them here).
Apologies again for any grammatical errors and any possible omissions on my part. I'd further clarify anything if necessary.
The trip duration in your example is not calculated consistently for all customers, so I have tweaked it to follow the rule used for the first customer (counting both the start day and the end day) for everyone.
;with cte
as
(select cid,datee,datepart(month,datee) as monthh,
dense_rank () over (partition by cid order by datepart(month,datee)) as samemonth,
count(0) over (partition by cid,datepart(month,datee) ) as cnt
from #temp
)
,cte1 as
(
select cid,max(samemonth) as tripnumber,min(datee) as startdate,max(datee) as enddate,
max(cnt) as numberoftransactions
from cte
group by cid,samemonth
)
select *,datediff(day,startdate,dateadd(day,1,enddate))as duration
from cte1
Output:
cid tripnumber startdate enddate numberoftransactions duration
JHON 1 2016-01-01 2016-01-07 3 7
JHON 2 2016-02-15 2016-02-18 2 4
MIKE 1 2016-01-05 2016-01-10 2 6
MIKE 2 2016-02-19 2016-02-19 2 1
PEDRO 1 2016-01-02 2016-01-02 1 1
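To make the duration point concrete, here is a minimal Python sketch (not part of the original answer) comparing the two ways of counting days. The function names are made up for illustration. The question's expected table mixes both conventions (7 days for Jhon's first trip, 3 for his second), while the query above always counts both endpoints.

```python
from datetime import date

def duration_exclusive(start: date, end: date) -> int:
    """Plain date difference, like DATEDIFF(day, start, end)."""
    return (end - start).days

def duration_inclusive(start: date, end: date) -> int:
    """Counts both the first and the last day, like
    DATEDIFF(day, start, DATEADD(day, 1, end)) in the query above."""
    return (end - start).days + 1

# Jhon's two trips from the example
trip_1 = (date(2016, 1, 1), date(2016, 1, 7))
trip_2 = (date(2016, 2, 15), date(2016, 2, 18))

for start, end in (trip_1, trip_2):
    print(start, end, duration_inclusive(start, end), duration_exclusive(start, end))
```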
I found the perfect answer elsewhere. All credit goes to the Reddit user nvarscar for the amazing solution!
I'll just copy his/her answer below, in case someone else needs it in the future:
You may use a window function feature, which helps you to aggregate
rows between current row and all preceding ones. The code looks too
long, but at least you will see the steps taken.
-- Temp table for the sample data (a table variable would have to be named @t everywhere)
CREATE TABLE #t
([CUSTOMER_ID] varchar(5), [DATE] datetime)
;
INSERT INTO #t
([CUSTOMER_ID], [DATE])
VALUES
('JHON', '2016-01-01 00:00:00'),
('JHON', '2016-01-02 00:00:00'),
('PEDRO', '2016-01-02 00:00:00'),
('JHON', '2016-01-05 00:00:00'),
('MIKE', '2016-01-05 00:00:00'),
('MIKE', '2016-01-10 00:00:00'),
('JHON', '2016-01-07 00:00:00'),
('JHON', '2016-02-15 00:00:00'),
('JHON', '2016-02-18 00:00:00'),
('MIKE', '2016-02-19 00:00:00'),
('MIKE', '2016-02-19 00:00:00'),
('JHON', '2016-02-01 00:00:00'),
('JHON', '2016-02-02 00:00:00'),
('PEDRO', '2016-03-02 00:00:00'),
('JHON', '2016-03-05 00:00:00'),
('MIKE', '2016-05-05 00:00:00'),
('MIKE', '2016-05-10 00:00:00'),
('JHON', '2016-03-07 00:00:00'),
('JHON', '2016-04-15 00:00:00'),
('JHON', '2016-04-18 00:00:00'),
('MIKE', '2016-06-19 00:00:00'),
('MIKE', '2016-06-19 00:00:00')
;
WITH CTE1 AS (
SELECT
[CUSTOMER_ID]
, [DATE]
, COUNT(*) AS Transactions
FROM #t
GROUP BY
[CUSTOMER_ID]
, [DATE]
)
, CTE2 AS (
SELECT
[CUSTOMER_ID]
, [DATE]
, Transactions
, DATEDIFF(day,LAG([DATE]) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE]),[DATE]) AS DaysSinceLastTransaction
FROM CTE1
)
, CTE3 AS (
SELECT
[CUSTOMER_ID]
, [DATE]
, Transactions
, CASE WHEN DaysSinceLastTransaction > 10 THEN 1 ELSE 0 END AS TripTag --Here we set the idle tag
FROM CTE2
)
, CTE4 AS (
SELECT
[CUSTOMER_ID]
, [DATE]
, Transactions
, SUM(TripTag) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TripTag
FROM CTE3
)
SELECT
[CUSTOMER_ID]
, TripTag+1 AS TripNumber
, MIN ([DATE]) AS TripStartDate
, MAX ([DATE]) AS TripEndDate
, DATEDIFF(day, MIN ([DATE]), MAX ([DATE])) AS TripDuration
, SUM(Transactions) AS Transactions
FROM CTE4
GROUP BY [CUSTOMER_ID], TripTag
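For readers who want to trace the logic by hand, here is a rough procedural sketch in Python of what the CTEs do: compare each date with the customer's previous one, start a new trip when the gap exceeds 10 days, and aggregate per trip. It is not the SQL author's code; `find_trips` and `IDLE_DAYS` are hypothetical names, and the duration is the plain date difference, as in the final SELECT.

```python
from datetime import date
from itertools import groupby

IDLE_DAYS = 10  # a gap longer than this starts a new trip

def find_trips(transactions):
    """transactions: iterable of (customer_id, date) pairs, duplicates allowed.
    Returns (customer, trip_number, start, end, duration_days, n_transactions)."""
    trips = []
    for customer, rows in groupby(sorted(transactions), key=lambda r: r[0]):
        dates = [d for _, d in rows]
        trip_no, start, prev, count = 1, dates[0], dates[0], 0
        for d in dates:
            if (d - prev).days > IDLE_DAYS:
                # Gap too long: close the current trip and open a new one
                trips.append((customer, trip_no, start, prev, (prev - start).days, count))
                trip_no, start, count = trip_no + 1, d, 0
            prev = d
            count += 1
        trips.append((customer, trip_no, start, prev, (prev - start).days, count))
    return trips

# Rows taken from the question's enumerated visits
sample = [
    ("JHON", date(2016, 1, 1)), ("JHON", date(2016, 1, 2)), ("JHON", date(2016, 1, 7)),
    ("JHON", date(2016, 2, 15)), ("JHON", date(2016, 2, 18)),
    ("MIKE", date(2016, 1, 5)), ("MIKE", date(2016, 1, 10)),
    ("MIKE", date(2016, 2, 19)), ("MIKE", date(2016, 2, 19)),
    ("PEDRO", date(2016, 1, 2)),
]
for trip in find_trips(sample):
    print(trip)
```

The same-day duplicates for Mike stay in one trip because their gap is zero days, which is the effect the SQL gets by counting transactions per date before looking at gaps.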
My scenario started off similar to a gaps-and-islands problem, where I needed to find consecutive days of work. My current SQL query answers "ProductA was produced at LocationA from DateA through DateB, totaling X quantity".
However, this does not suffice when I needed to throw prices into the mix. Prices are in a separate table and handled in C# after the fact. Price changes are essentially a list of records that say "ProductA from LocationA is now Y value per unit effective DateC".
The end result is it works as long as the island does not overlap with a price-change date, but if it does overlap, I get a "close" answer, but it's not precise.
The C# code can handle applying the prices efficiently; what I need to do, though, is split the islands based on price changes. My goal is to make the SQL's partitioning take into account the ranking of days from the other table, but I'm having trouble applying what I want to do.
The current SQL that generates my islands is as follows:
SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as
EndDate, ProductId, DestinationId, SUM(Quantity) as TotalQuantity
FROM (
SELECT ScheduledDate, DestinationId, ProductId, PartitionGroup = DATEADD(DAY ,-1 * DENSE_RANK() OVER (ORDER BY ScheduledDate), ScheduledDate), Quantity
FROM History
) tmp
GROUP BY PartitionGroup, DestinationId, ProductId;
The current SQL that reads from the PriceChange table and ranks the dates is as follows:
DECLARE @PriceChangeDates TABLE(Rank int, SplitDate date);
INSERT INTO @PriceChangeDates
SELECT DENSE_RANK() OVER (ORDER BY EffectiveDate) AS Rank, EffectiveDate AS SplitDate
FROM ProductPriceChange
GROUP BY EffectiveDate;
My thought is to update the first query's inner SELECT statement to take advantage of the @PriceChangeDates table created by the second query. I would think we can multiply the DATEADD's increment parameter by the rank from the declared table, but I am struggling to write it.
If I were to do this with loops, my thought process would be to determine which rank the ScheduledDate has in the @PriceChangeDates table, where its rank is the rank of the closest date that is smaller than itself. Then I would take whatever rank that gives and, I would think, multiply it by the increment parameter being passed in (or do some other math, for example multiplying the existing parameter by @PriceChangeDates.Count() and then adding the new rank to avoid collisions). However, that's "loop" logic, not "set" logic, and in SQL I need to think in sets.
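As a hedged illustration of the "loop" idea described above, the lookup can be sketched in Python with a binary search over the sorted price-change dates. `price_period_rank` is a hypothetical helper. Note one assumption that goes slightly beyond the wording above: a change date counts as already in effect on its own day (on or before, not strictly before), which is what the desired results further down imply.

```python
from bisect import bisect_right
from datetime import date

def price_period_rank(split_dates, scheduled):
    """split_dates: sorted list of distinct price-change dates.
    Returns the 1-based rank of the latest change date that is on or
    before `scheduled`, or 0 if no price change is in effect yet."""
    return bisect_right(split_dates, scheduled)

# Distinct EffectiveDate values from the question's sample data, ranked globally
split_dates = sorted({date(2018, 2, 1), date(2018, 4, 2),
                      date(2018, 1, 1), date(2018, 5, 10)})

for d in (date(2018, 4, 1), date(2018, 4, 2), date(2018, 4, 3)):
    print(d, price_period_rank(split_dates, d))
```

Days that get different ranks fall under different prices, so the rank can serve as an extra grouping key alongside the island key.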
Any and all help/advice is greatly appreciated. Thank you :)
UPDATE:
Sample data & example on SQLFiddle: http://www.sqlfiddle.com/#!18/af568/1
Where the data is:
CREATE TABLE History
(
ProductId int,
DestinationId int,
ScheduledDate date,
Quantity float
);
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180401', 5),
(0, 1000, '20180402', 10),
(0, 1000, '20180403', 7),
(3, 5000, '20180507', 15),
(3, 5000, '20180508', 23),
(3, 5000, '20180509', 52),
(3, 5000, '20180510', 12),
(3, 5000, '20180511', 14);
CREATE TABLE PriceChange
(
ProductId int,
DestinationId int,
EffectiveDate date,
Price float
);
INSERT INTO PriceChange (ProductId, DestinationId, EffectiveDate, Price)
VALUES
(0, 1000, '20180201', 1),
(0, 1000, '20180402', 2),
(3, 5000, '20180101', 5),
(3, 5000, '20180510', 20);
The desired results would be to have a SQL statement that generates the result:
StartDate EndDate ProductId DestinationId TotalQuantity
2018-04-01 2018-04-01 0 1000 5
2018-04-02 2018-04-03 0 1000 17
2018-05-07 2018-05-09 3 5000 90
2018-05-10 2018-05-11 3 5000 26
To clarify, the end result does need the TotalQuantity of each split amount, so that the procedural code that manipulates the results and applies the pricing knows how much of each product was on each side of the price change and can accurately determine the values.
Here is one more variant that is likely to perform better than my first answer. I decided to put it as a second answer, because the approach is rather different and the answer would be too long. You should compare performance of all variants with your real data on your hardware, and don't forget about indexes.
In the first variant I was using APPLY to pick a relevant price for each row in the History table. For each row from the History table the engine searches for a relevant row in the PriceChange table. Even with an appropriate index on the PriceChange table, when each lookup is done via a single seek, it still means 3.7 million seeks in a loop join.
We can simply join History and PriceChange tables together and with appropriate indexes on both tables it will be an efficient merge join.
Here I'm also using an extended sample data set to illustrate the gaps. I added these rows to the sample data from the question.
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180601', 5),
(0, 1000, '20180602', 10),
(0, 1000, '20180603', 7),
(3, 5000, '20180607', 15),
(3, 5000, '20180608', 23),
(3, 5000, '20180609', 52),
(3, 5000, '20180610', 12),
(3, 5000, '20180611', 14);
Intermediate query
We do a FULL JOIN here, not a LEFT JOIN because it is possible that the date on which the price changed doesn't appear in the History table at all.
WITH
CTE_Join
AS
(
SELECT
ISNULL(History.ProductId, PriceChange.ProductID) AS ProductID
,ISNULL(History.DestinationId, PriceChange.DestinationId) AS DestinationId
,ISNULL(History.ScheduledDate, PriceChange.EffectiveDate) AS ScheduledDate
,History.Quantity
,PriceChange.Price
FROM
History
FULL JOIN PriceChange
ON PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate = History.ScheduledDate
)
,CTE2
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,Quantity
,Price
,MAX(CASE WHEN Price IS NOT NULL THEN ScheduledDate END)
OVER (PARTITION BY ProductID, DestinationId ORDER BY ScheduledDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM CTE_Join
)
SELECT *
FROM CTE2
ORDER BY
ProductID
,DestinationId
,ScheduledDate
Create the following indexes
CREATE UNIQUE NONCLUSTERED INDEX [IX_History] ON [dbo].[History]
(
[ProductId] ASC,
[DestinationId] ASC,
[ScheduledDate] ASC
)
INCLUDE ([Quantity])
CREATE UNIQUE NONCLUSTERED INDEX [IX_Price] ON [dbo].[PriceChange]
(
[ProductId] ASC,
[DestinationId] ASC,
[EffectiveDate] ASC
)
INCLUDE ([Price])
and the join will be an efficient MERGE join in the execution plan (not a LOOP join)
Intermediate result
+-----------+---------------+---------------+----------+-------+------------+
| ProductID | DestinationId | ScheduledDate | Quantity | Price | grp |
+-----------+---------------+---------------+----------+-------+------------+
| 0 | 1000 | 2018-02-01 | NULL | 1 | 2018-02-01 |
| 0 | 1000 | 2018-04-01 | 5 | NULL | 2018-02-01 |
| 0 | 1000 | 2018-04-02 | 10 | 2 | 2018-04-02 |
| 0 | 1000 | 2018-04-03 | 7 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-01 | 5 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-02 | 10 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-03 | 7 | NULL | 2018-04-02 |
| 3 | 5000 | 2018-01-01 | NULL | 5 | 2018-01-01 |
| 3 | 5000 | 2018-05-07 | 15 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-08 | 23 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-09 | 52 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-10 | 12 | 20 | 2018-05-10 |
| 3 | 5000 | 2018-05-11 | 14 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-07 | 15 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-08 | 23 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-09 | 52 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-10 | 12 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-11 | 14 | NULL | 2018-05-10 |
+-----------+---------------+---------------+----------+-------+------------+
You can see that the Price column has a lot of NULL values. We need to "fill" these NULL values with the preceding non-NULL value.
Itzik Ben-Gan wrote a nice article showing how to solve this efficiently: "The Last non NULL Puzzle". Also see "Best way to replace NULL with most recent non-null value".
This is done in CTE2 using MAX window function and you can see how it populates the grp column. This requires SQL Server 2012+. After the groups are determined we should remove rows where Quantity is NULL, because these rows are not from the History table.
Now we can do the same gaps-and-islands step using the grp column as an additional partitioning.
The rest of the query is pretty much the same as in the first variant.
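The "last non-NULL" step can be traced with a small Python sketch (an illustration only, not the SQL author's code; `fill_group_dates` is a made-up name). Because rows are visited in date order, the running MAX over "date if a price is present" is simply the most recent date that carried a price.

```python
from datetime import date

def fill_group_dates(rows):
    """rows: list of (scheduled_date, price_or_None), sorted by date, for ONE
    ProductId/DestinationId pair. Returns the grp value for each row: the
    latest date seen so far on which a price was present (None before any)."""
    grp, out = None, []
    for scheduled, price in rows:
        if price is not None:
            grp = scheduled  # dates ascend, so this equals the running MAX
        out.append(grp)
    return out

# Product 0 / destination 1000 rows from the intermediate result above
rows = [
    (date(2018, 2, 1), 1),     # price-change row with no History match
    (date(2018, 4, 1), None),
    (date(2018, 4, 2), 2),
    (date(2018, 4, 3), None),
    (date(2018, 6, 1), None),
]
print(fill_group_dates(rows))
```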
Final query
WITH
CTE_Join
AS
(
SELECT
ISNULL(History.ProductId, PriceChange.ProductID) AS ProductID
,ISNULL(History.DestinationId, PriceChange.DestinationId) AS DestinationId
,ISNULL(History.ScheduledDate, PriceChange.EffectiveDate) AS ScheduledDate
,History.Quantity
,PriceChange.Price
FROM
History
FULL JOIN PriceChange
ON PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate = History.ScheduledDate
)
,CTE2
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,Quantity
,Price
,MAX(CASE WHEN Price IS NOT NULL THEN ScheduledDate END)
OVER (PARTITION BY ProductID, DestinationId ORDER BY ScheduledDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM CTE_Join
)
,CTE_RN
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,grp
,Quantity
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, grp ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM CTE2
WHERE Quantity IS NOT NULL
)
SELECT
ProductId
,DestinationId
,MIN(ScheduledDate) AS StartDate
,MAX(ScheduledDate) AS EndDate
,SUM(Quantity) AS TotalQuantity
FROM
CTE_RN
GROUP BY
ProductId
,DestinationId
,grp
,rn2-rn1
ORDER BY
ProductID
,DestinationId
,StartDate
;
Final result
+-----------+---------------+------------+------------+---------------+
| ProductId | DestinationId | StartDate | EndDate | TotalQuantity |
+-----------+---------------+------------+------------+---------------+
| 0 | 1000 | 2018-04-01 | 2018-04-01 | 5 |
| 0 | 1000 | 2018-04-02 | 2018-04-03 | 17 |
| 0 | 1000 | 2018-06-01 | 2018-06-03 | 22 |
| 3 | 5000 | 2018-05-07 | 2018-05-09 | 90 |
| 3 | 5000 | 2018-05-10 | 2018-05-11 | 26 |
| 3 | 5000 | 2018-06-07 | 2018-06-11 | 116 |
+-----------+---------------+------------+------------+---------------+
This variant doesn't output the relevant price (as the first variant does), because I simplified the "last non-null" query. It wasn't required in the question. In any case, it is pretty easy to add the price if needed.
The straightforward method is to fetch the effective price for each row of History and then generate gaps and islands taking the price into account.
It is not clear from the question what the role of DestinationID is. The sample data is of no help here.
I'll assume that we need to join and partition on both ProductID and DestinationID.
The following query returns effective Price for each row from History.
You need to add an index to the PriceChange table
CREATE NONCLUSTERED INDEX [IX] ON [dbo].[PriceChange]
(
[ProductId] ASC,
[DestinationId] ASC,
[EffectiveDate] DESC
)
INCLUDE ([Price])
for this query to work efficiently.
Query for Prices
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
ORDER BY ProductID, ScheduledDate;
For each row from History there will be one seek in this index to pick the correct price.
This query returns:
Prices
+-----------+---------------+---------------+----------+-------+
| ProductId | DestinationId | ScheduledDate | Quantity | Price |
+-----------+---------------+---------------+----------+-------+
| 0 | 1000 | 2018-04-01 | 5 | 1 |
| 0 | 1000 | 2018-04-02 | 10 | 2 |
| 0 | 1000 | 2018-04-03 | 7 | 2 |
| 3 | 5000 | 2018-05-07 | 15 | 5 |
| 3 | 5000 | 2018-05-08 | 23 | 5 |
| 3 | 5000 | 2018-05-09 | 52 | 5 |
| 3 | 5000 | 2018-05-10 | 12 | 20 |
| 3 | 5000 | 2018-05-11 | 14 | 20 |
+-----------+---------------+---------------+----------+-------+
Now comes a standard gaps-and-islands step to collapse consecutive days with the same price together. I use a difference of two row number sequences here.
I've added some more rows to your sample data to see the gaps within the same ProductId.
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180601', 5),
(0, 1000, '20180602', 10),
(0, 1000, '20180603', 7),
(3, 5000, '20180607', 15),
(3, 5000, '20180608', 23),
(3, 5000, '20180609', 52),
(3, 5000, '20180610', 12),
(3, 5000, '20180611', 14);
If you run this intermediate query you'll see how it works:
WITH
CTE_Prices
AS
(
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
)
,CTE_rn
AS
(
SELECT
ProductId
,DestinationId
,ScheduledDate
,Quantity
,Price
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, Price ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM
CTE_Prices
)
SELECT *
,rn2-rn1 AS Diff
FROM CTE_rn
Intermediate result
+-----------+---------------+---------------+----------+-------+-----+------+------+
| ProductId | DestinationId | ScheduledDate | Quantity | Price | rn1 | rn2 | Diff |
+-----------+---------------+---------------+----------+-------+-----+------+------+
| 0 | 1000 | 2018-04-01 | 5 | 1 | 1 | 6665 | 6664 |
| 0 | 1000 | 2018-04-02 | 10 | 2 | 1 | 6666 | 6665 |
| 0 | 1000 | 2018-04-03 | 7 | 2 | 2 | 6667 | 6665 |
| 0 | 1000 | 2018-06-01 | 5 | 2 | 3 | 6726 | 6723 |
| 0 | 1000 | 2018-06-02 | 10 | 2 | 4 | 6727 | 6723 |
| 0 | 1000 | 2018-06-03 | 7 | 2 | 5 | 6728 | 6723 |
| 3 | 5000 | 2018-05-07 | 15 | 5 | 1 | 6701 | 6700 |
| 3 | 5000 | 2018-05-08 | 23 | 5 | 2 | 6702 | 6700 |
| 3 | 5000 | 2018-05-09 | 52 | 5 | 3 | 6703 | 6700 |
| 3 | 5000 | 2018-05-10 | 12 | 20 | 1 | 6704 | 6703 |
| 3 | 5000 | 2018-05-11 | 14 | 20 | 2 | 6705 | 6703 |
| 3 | 5000 | 2018-06-07 | 15 | 20 | 3 | 6732 | 6729 |
| 3 | 5000 | 2018-06-08 | 23 | 20 | 4 | 6733 | 6729 |
| 3 | 5000 | 2018-06-09 | 52 | 20 | 5 | 6734 | 6729 |
| 3 | 5000 | 2018-06-10 | 12 | 20 | 6 | 6735 | 6729 |
| 3 | 5000 | 2018-06-11 | 14 | 20 | 7 | 6736 | 6729 |
+-----------+---------------+---------------+----------+-------+-----+------+------+
Now simply group by the Diff to get one row per interval.
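Why the difference works can be checked with a short Python sketch (an illustration, not the answer's code; the function names are hypothetical). Within one partition, consecutive days increase the day number and the row number by one each, so their difference stays constant until a gap appears.

```python
from datetime import date
from itertools import groupby

BASE = date(2000, 1, 1)  # same anchor date as DATEDIFF(day, '20000101', ...)

def diff_keys(dates):
    """dates: sorted distinct dates of ONE partition (product, destination, price).
    Returns rn2 - rn1 for each date."""
    return [(d - BASE).days - rn for rn, d in enumerate(dates, start=1)]

def islands(dates):
    """Returns (start, end) for every run of consecutive days."""
    result = []
    for _, run in groupby(zip(diff_keys(dates), dates), key=lambda kd: kd[0]):
        run_dates = [d for _, d in run]
        result.append((run_dates[0], run_dates[-1]))
    return result

# Product 0 at price 2 from the intermediate result above
dates = [date(2018, 4, 2), date(2018, 4, 3),
         date(2018, 6, 1), date(2018, 6, 2), date(2018, 6, 3)]
print(diff_keys(dates))
print(islands(dates))
```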
Final query
WITH
CTE_Prices
AS
(
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
)
,CTE_rn
AS
(
SELECT
ProductId
,DestinationId
,ScheduledDate
,Quantity
,Price
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, Price ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM
CTE_Prices
)
SELECT
ProductId
,DestinationId
,MIN(ScheduledDate) AS StartDate
,MAX(ScheduledDate) AS EndDate
,SUM(Quantity) AS TotalQuantity
,Price
FROM
CTE_rn
GROUP BY
ProductId
,DestinationId
,Price
,rn2-rn1
ORDER BY
ProductID
,DestinationId
,StartDate
;
Final result
+-----------+---------------+------------+------------+---------------+-------+
| ProductId | DestinationId | StartDate | EndDate | TotalQuantity | Price |
+-----------+---------------+------------+------------+---------------+-------+
| 0 | 1000 | 2018-04-01 | 2018-04-01 | 5 | 1 |
| 0 | 1000 | 2018-04-02 | 2018-04-03 | 17 | 2 |
| 0 | 1000 | 2018-06-01 | 2018-06-03 | 22 | 2 |
| 3 | 5000 | 2018-05-07 | 2018-05-09 | 90 | 5 |
| 3 | 5000 | 2018-05-10 | 2018-05-11 | 26 | 20 |
| 3 | 5000 | 2018-06-07 | 2018-06-11 | 116 | 20 |
+-----------+---------------+------------+------------+---------------+-------+
I'm not sure I understand correctly, but here is my idea:
Select concat_ws(',',view2.StartDate, string_agg(view1.splitDate, ','),
view2.EndDate), view2.productId, view2.DestinationId from (
SELECT DENSE_RANK() OVER (ORDER BY EffectiveDate) as Rank, EffectiveDate as
SplitDate FROM PriceChange GROUP BY EffectiveDate) view1 join
(
SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as
EndDate,ProductId, DestinationId, SUM(Quantity) as TotalQuantity
FROM (
SELECT ScheduledDate, DestinationId, ProductId, PartitionGroup =
DATEADD(DAY ,-1 * DENSE_RANK() OVER (ORDER BY ScheduledDate),
ScheduledDate), Quantity
FROM History
) tmp
GROUP BY PartitionGroup, DestinationId, ProductId
) view2 on view1.SplitDate >= view2.StartDate
and view1.SplitDate <=view2.EndDate
group by view2.startDate, view2.endDate, view2.productId,
view2.DestinationId
The result from this query will be:
| ranges | productId | DestinationId |
|---------------------------------------------|-----------|---------------|
| 2018-04-01,2018-04-02,2018-04-03 | 0 | 1000 |
| 2018-05-07,2018-05-10,2018-05-11 | 3 | 5000 |
Then, with any procedural language, you can split the string for each row (with the appropriate inclusive or exclusive rule for each boundary) to get a list of conditions (:from, :to, :productId, :destinationId).
Finally, you can loop through the list of conditions and use UNION ALL to build one query (the union of one query per condition) that produces the final result. For example,
Select * from History where ScheduledDate >= '2018-04-01' and ScheduledDate <'2018-04-02' and productId = 0 and destinationId = 1000
union all
Select * from History where ScheduledDate >= '2018-04-02' and ScheduledDate <'2018-04-03' and productId = 0 and destinationId = 1000
----Update--------
Based on the idea above, I made some quick changes to produce your result set. Maybe you can optimize it later.
with view3 as
(Select concat_ws(',',view2.StartDate, string_agg(view1.splitDate, ','),
dateadd(day, 1, view2.EndDate)) dateRange, view2.productId, view2.DestinationId from (
SELECT DENSE_RANK() OVER (ORDER BY EffectiveDate) as Rank, EffectiveDate as
SplitDate FROM PriceChange GROUP BY EffectiveDate) view1 join
(
SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as
EndDate,ProductId, DestinationId, SUM(Quantity) as TotalQuantity
FROM (
SELECT ScheduledDate, DestinationId, ProductId, PartitionGroup =
DATEADD(DAY ,-1 * DENSE_RANK() OVER (ORDER BY ScheduledDate),
ScheduledDate), Quantity
FROM History
) tmp
GROUP BY PartitionGroup, DestinationId, ProductId
) view2 on view1.SplitDate >= view2.StartDate
and view1.SplitDate <=view2.EndDate
group by view2.startDate, view2.endDate, view2.productId,
view2.DestinationId
),
view4 as
(
select productId, destinationId, value from view3 cross apply string_split(dateRange, ',')
),
view5 as(
select *, row_number() over(partition by productId, destinationId order by value) rn from view4
),
view6 as (
select v52.value fr, v51.value t, v51.productid, v51. destinationid from view5 v51 join view5 v52
on v51.productid = v52.productid
and v51.destinationid = v52.destinationid
and v51.rn = v52.rn+1
)
select min(h.ScheduledDate) StartDate, max(h.ScheduledDate) EndDate, v6.productId, v6.destinationId, sum(h.quantity) TotalQuantity from view6 v6 join History h
on v6.destinationId = h.destinationId
and v6.productId = h.productId
and h.ScheduledDate >= v6.fr
and h.ScheduledDate <v6.t
group by v6.fr, v6.t, v6.productId, v6.destinationId
And the result is exactly the same as the one you gave.
| StartDate | EndDate | productId | destinationId | TotalQuantity |
|------------|------------|-----------|---------------|---------------|
| 2018-04-01 | 2018-04-01 | 0 | 1000 | 5 |
| 2018-04-02 | 2018-04-03 | 0 | 1000 | 17 |
| 2018-05-07 | 2018-05-09 | 3 | 5000 | 90 |
| 2018-05-10 | 2018-05-11 | 3 | 5000 | 26 |
Use outer apply to choose the nearest price, then do a group by:
Live test: http://www.sqlfiddle.com/#!18/af568/65
select
StartDate = min(h.ScheduledDate),
EndDate = max(h.ScheduledDate),
h.ProductId,
h.DestinationId,
TotalQuantity = sum(h.Quantity)
from History h
outer apply
(
select top 1 pc.*
from PriceChange pc
where
pc.ProductId = h.ProductId
and pc.Effectivedate <= h.ScheduledDate
order by pc.EffectiveDate desc
) UpToDate
group by UpToDate.EffectiveDate,
h.ProductId,
h.DestinationId
order by StartDate, EndDate, ProductId
Output:
| StartDate | EndDate | ProductId | DestinationId | TotalQuantity |
|------------|------------|-----------|---------------|---------------|
| 2018-04-01 | 2018-04-01 | 0 | 1000 | 5 |
| 2018-04-02 | 2018-04-03 | 0 | 1000 | 17 |
| 2018-05-07 | 2018-05-09 | 3 | 5000 | 90 |
| 2018-05-10 | 2018-05-11 | 3 | 5000 | 26 |
I have a table in a SQL database that holds information about the hours worked by employees across a number of years. Each employee can have more than one record for a specific date, and each employee's start date can be different.
I am trying to sum the weekly hours of each employee based on their first week.
So if the employee started on 17/04/2018, any hours logged in that week would be considered week 1 for this employee, the following week would be week 2, etc.
For another employee week one could start in a different day/month/year etc.
My data includes the following fields:
Sequence_ID: relates to an individual employee
Date_European: relates to each date an employee has logged hours with the minimum of this being the first date the employee started in the company
Hours: The amount of hours logged
I also have a year field in the data which is the year of the Date_European column.
The below is what I have attempted but I know it isn't even close to the format I need.
select
Sequence_ID
,DATEPART(week,Date_European) AS Week
,DATEPART(year,Date_European) AS Year
,SUM([Hours]) AS Weekly_Hours
from [AB_DCU_IP_2018].[dbo].[mytable]
group by
Sequence_ID
,DATEPART(week,Date_European)
,DATEPART(year,Date_European)
order by
Sequence_ID
,DATEPART(week,Date_European)
,DATEPART(year,Date_European)
I tried to create the 'Week' field. The above code just gives me the week of the calendar year that a date falls in. I then added the 'Year' column to distinguish between different years, but again this only gives me the calendar year.
Is there any way to create a 'Week' field in the format I am looking for? (The week of the earliest date and the surrounding dates would be week 1.)
I was attempting to use the rank and partition by functions but couldn't get this to work properly.
Any help would be greatly appreciated as I have been searching for a solution for hours.
Thanks in advance.
EDIT:
How to create the initial table
CREATE TABLE mytable(Sequence_ID VARCHAR(6) NOT NULL ,Date_European DATE NOT NULL ,Hours NUMERIC(5,1) NOT NULL);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/05/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/06/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/07/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/08/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/09/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/12/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/13/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/14/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/15/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/16/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/19/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/20/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/21/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/22/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/23/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/26/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/27/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/28/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/29/2016',7.3);
INSERT INTO mytable(Sequence_ID,Date_European,Hours) VALUES ('da6Wrw','09/30/2016',7.3);
What I want as the desired outcome:
| Sequence_ID | Date_European | DATEPART(week,Date_European) | Hours | Desired_OutCome_Week |
| da6Wrw | 05/09/2016 | 37 | 7.3 | 1 |
| da6Wrw | 06/09/2016 | 37 | 7.3 | 1 |
| da6Wrw | 07/09/2016 | 37 | 7.3 | 1 |
| da6Wrw | 08/09/2016 | 37 | 7.3 | 1 |
| da6Wrw | 09/09/2016 | 37 | 7.3 | 1 |
| da6Wrw | 12/09/2016 | 38 | 7.3 | 2 |
| da6Wrw | 13/09/2016 | 38 | 7.3 | 2 |
| da6Wrw | 14/09/2016 | 38 | 7.3 | 2 |
| da6Wrw | 15/09/2016 | 38 | 7.3 | 2 |
| da6Wrw | 16/09/2016 | 38 | 7.3 | 2 |
| da6Wrw | 19/09/2016 | 39 | 7.3 | 3 |
| da6Wrw | 20/09/2016 | 39 | 7.3 | 3 |
| da6Wrw | 21/09/2016 | 39 | 7.3 | 3 |
| da6Wrw | 22/09/2016 | 39 | 7.3 | 3 |
| da6Wrw | 23/09/2016 | 39 | 7.3 | 3 |
| da6Wrw | 26/09/2016 | 40 | 7.3 | 4 |
| da6Wrw | 27/09/2016 | 40 | 7.3 | 4 |
| da6Wrw | 28/09/2016 | 40 | 7.3 | 4 |
| da6Wrw | 29/09/2016 | 40 | 7.3 | 4 |
| da6Wrw | 30/09/2016 | 40 | 7.3 | 4 |
Set DateFirst 1
select
Sequence_ID,
(datediff(day , DQ.WeekStarted, Date_European) / 7 + 1) EmployeeWeekNumber
,SUM([Hours]) AS Weekly_Hours
--into [AB_DCU_IP_2018].[dbo].[Weekly_Work_Hours_Employee]
from [AB_DCU_IP_2018].[dbo].[All_IPower_HR_Assurance_4]
CROSS APPLY (SELECT DATEADD(day, -1 * (datepart(weekday,start_date) % 7), start_date) AS WeekStarted
FROM YourTable
WHERE <condition to get the start_date you need>
) DQ
group by
Sequence_ID,
(datediff(day , DQ.WeekStarted, Date_European) / 7 + 1)
-- ORDER BY may only reference grouped expressions or aggregates here
order by
Sequence_ID
,EmployeeWeekNumber
Here is another approach using the sample data you posted.
select mt.Sequence_ID
, mt.Date_European
, DATEPART(week, mt.Date_European)
, mt.Hours
, MyRow.GroupNum
from mytable mt
join
(
select WeekNum = DATEPART(week,Date_European)
, GroupNum = ROW_NUMBER() over(order by DATEPART(week,Date_European))
from mytable
group by DATEPART(week,Date_European)
) MyRow on MyRow.WeekNum = DATEPART(week, mt.Date_European)
Try this:
select *,rn-1 [Employee_week] from (
select *,dense_RANK() over(Partition by Sequence_ID order by iif(weekly_hours=0,0,week) ) [rn] from (
select
Sequence_ID
,DATEPART(week,Date_European) AS Week
,DATEPART(year,Date_European) AS Year
,SUM([Hours]) AS Weekly_Hours
--into [AB_DCU_IP_2018].[dbo].[Weekly_Work_Hours_Employee]
from [AB_DCU_IP_2018].[dbo].[All_IPower_HR_Assurance_4]
group by
Sequence_ID
,DATEPART(week,Date_European)
,DATEPART(year,Date_European)
-- ORDER BY is not allowed inside a derived table without TOP, so it is dropped here
)a)a
where rn = 2
This gives you the hours each employee worked in their first week. Use rn > 2 to get the remaining weeks.
I actually found an easier way to calculate the employee's week number, using the DENSE_RANK function.
I have included it below in case anyone has similar issues. I have commented out the DATEPART columns, as I was only using them as a check to ensure it was working correctly:
select
Sequence_ID
,Date_European
--,DATEPART(week,Date_European) AS Week
--,DATEPART(year,Date_European) AS Year
,DENSE_RANK() OVER (PARTITION BY Sequence_ID ORDER BY DATEPART(year,Date_European), DATEPART(week,Date_European) asc) AS EmployeeWeekNumber
,Hours
from [AB_DCU_IP_2018].[dbo].[All_IPower_HR_Assurance_4]
order by
Sequence_ID
,Date_European
--,DATEPART(week,Date_European)
--,DATEPART(year,Date_European)
I have a table called finance in which I store all customer payments. The main columns are: ID, COSTUMERID, DATEPAID, AMOUNTPAID.
What I need is a list of dates by COSTUMERID containing the date of the first payment and any other payment made more than 1 year after the previous one. Example:
+----+------------+------------+------------+
| ID | COSTUMERID | DATEPAID | AMOUNTPAID |
+----+------------+------------+------------+
| 1 | 1 | 2015-01-10 | 10 |
| 2 | 1 | 2016-01-05 | 30 |
| 2 | 1 | 2017-02-20 | 30 |
| 3 | 2 | 2016-03-15 | 100 |
| 4 | 2 | 2017-02-15 | 100 |
| 5 | 3 | 2017-05-01 | 25 |
+----+------------+------------+------------+
What I expect as result:
+------------+------------+
| COSTUMERID | DATEPAID |
+------------+------------+
| 1 | 2015-01-10 |
| 1 | 2017-02-20 |
| 2 | 2016-03-15 |
| 3 | 2017-05-01 |
+------------+------------+
Customer 1 has 2 dates: the first one, plus one more that is more than 1 year after the previous one.
I hope I have made myself clear.
I think you just want lag():
select t.*
from (select t.*,
lag(datepaid) over (partition by customerid order by datepaid) as prev_datepaid
from t
) t
where prev_datepaid is null or
datepaid > dateadd(year, 1, prev_datepaid);
Gordon's solution is correct as long as you only compare each payment with the previous row (the previous payment). But I wonder whether Antonio is looking for payments more than one year after the last payment that itself qualified, in which case this becomes a more complex problem to solve. Take the following example:
CREATE TABLE #Test (
CustomerID smallint
,DatePaid date
,AmountPaid smallint )
INSERT INTO #Test
SELECT 1, '2015-1-10', 10
INSERT INTO #Test
SELECT 1, '2016-1-05', 30
INSERT INTO #Test
SELECT 1, '2017-2-20', 30
INSERT INTO #Test
SELECT 1, '2017-6-30', 50
INSERT INTO #Test
SELECT 1, '2018-3-5', 50
INSERT INTO #Test
SELECT 1, '2018-5-15', 50
INSERT INTO #Test
SELECT 2, '2016-3-15', 100
INSERT INTO #Test
SELECT 2, '2017-6-15', 100
-- The statement before a CTE must be terminated with a semicolon
;WITH CTE AS (
SELECT
CustomerID
,DatePaid
,LAG(DatePaid) OVER (PARTITION BY CustomerID ORDER BY DatePaid) AS PreviousPaidDate
,AmountPaid
FROM #Test )
SELECT
*
,DATEDIFF(DAY, PreviousPaidDate, DatePaid) AS DayDiff
,CASE WHEN DATEDIFF(DAY, PreviousPaidDate, DatePaid) >= 365 THEN 1 ELSE 0 END AS Paid
FROM CTE
Row number 5 (2018-3-5) is more than 1 year after the last qualifying payment (2017-2-20), but it is less than 1 year after the previous row (2017-6-30), so subtracting from the previous row doesn't catch it. This may or may not matter, but I wanted to point it out in case that is what he means.
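If that chained reading is the one intended, one possible way to handle it is to walk each customer's payments in date order and carry the last qualifying date forward. This is only a sketch against the #Test table from the example above; the CTE names (Ordered, Walk) and the columns AnchorDate and IsAnchor are made up for illustration, and "more than 1 year" is interpreted as DatePaid > DATEADD(YEAR, 1, AnchorDate).

```sql
-- Assumes the #Test table created above already exists in this session.
;WITH Ordered AS (
    -- Number each customer's payments in date order
    SELECT CustomerID
          ,DatePaid
          ,ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY DatePaid) AS rn
    FROM #Test
), Walk AS (
    -- Anchor member: the first payment always qualifies
    SELECT CustomerID, DatePaid, rn
          ,DatePaid AS AnchorDate
          ,1 AS IsAnchor
    FROM Ordered
    WHERE rn = 1
    UNION ALL
    -- Recursive member: step to the next payment and compare it with the
    -- last qualifying date (AnchorDate), not with the previous row
    SELECT o.CustomerID, o.DatePaid, o.rn
          ,CASE WHEN o.DatePaid > DATEADD(YEAR, 1, w.AnchorDate)
                THEN o.DatePaid ELSE w.AnchorDate END
          ,CASE WHEN o.DatePaid > DATEADD(YEAR, 1, w.AnchorDate)
                THEN 1 ELSE 0 END
    FROM Walk w
    JOIN Ordered o
      ON o.CustomerID = w.CustomerID
     AND o.rn = w.rn + 1
)
SELECT CustomerID, DatePaid
FROM Walk
WHERE IsAnchor = 1
ORDER BY CustomerID, DatePaid
OPTION (MAXRECURSION 0);
```

With the sample rows above this should return 2015-01-10, 2017-02-20 and 2018-03-05 for customer 1, and 2016-03-15 and 2017-06-15 for customer 2. It processes one row per recursion step, so it may be slow on a large table.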
I am having trouble getting exactly what I need. Here are examples of my tables:
Plan_ID | PlanBeginDate | PlanEndDate |
1 | 1/1/2015 | 1/1/2016 |
2 | 1/1/2016 | 1/1/2017 |
3 | 1/1/2013 | 1/1/2014 |
4 | 1/1/2015 | 1/1/2016 |
SrvID | Srv_Plan_ID | Srv_Discipline_ID | SrvBeginDate | SrvEndDate |
1 | 1 | 1 | 1/1/2015 | 1/1/2016 |
2 | 1 | 3 | 1/1/2015 | 1/1/2016 |
3 | 2 | 2 | 1/1/2016 | 4/4/2016 |
4 | 2 | 2 | 4/5/2016 | 1/1/2017 |
5 | 3 | 1 | 1/1/2013 | 6/1/2013 |
6 | 3 | 2 | 1/1/2013 | 1/1/2014 |
7 | 4 | 3 | 1/1/2015 | 7/1/2016 |
8 | 4 | 3 | 8/1/2015 | 1/1/2016 |
I am looking to see all plans that have dates not covered by Service dates for each distinct discipline that is related to it.
Plan 1 should not show up, as both disciplines related to it cover all of the dates.
Plan 2 should not show up, because both related services have the same discipline and together cover the entire plan date range.
Plan 3 should show up once, because SrvID 5 does not cover the entire plan date range.
Plan 4 should show up, because the month of July is uncovered for discipline 3.
I need a select statement that would return with the following fields, following the criteria above.
Plan_ID | PlanBeginDate | PlanEndDate |Srv_Discipline_ID | SrvBeginDate | SrvEndDate |
Here is what I have, so far.
Select Plan_ID, Srv_Discipline_ID, PlanBeginDate, PlanEndDate, MIN(SrvBeginDate) EarliestStartDate, MAX(SrvEndDate) LatestEndDate
From dbo.[Plan]
JOIN Services
ON Plan_ID = Srv_Plan_ID
GROUP BY Plan_ID, Srv_Discipline_ID, PlanBeginDate, PlanEndDate
ORDER BY Plan_ID
I figured out one way to do this, but it involves costly recursive CTEs to expand the date ranges into individual days that can be joined against. So, to be clear, I'm not expecting performance to be very good with this query.
Note: I also changed the SrvEndDate value for SrvID = 7 from 7/1/2016 to 7/1/2015 in the test data that I used. That must have been your intention when you said that the month of July was uncovered.
Setup
create table Plans (
Plan_ID int not null primary key,
PlanBeginDate date not null,
PlanEndDate date not null
)
create table Services (
SrvID int not null primary key,
Srv_Plan_ID int not null,
Srv_Discipline_ID int not null,
SrvBeginDate date not null,
SrvEndDate date not null
)
alter table Services
add constraint Services_fk
foreign key (Srv_Plan_ID)
references Plans(Plan_ID)
insert into Plans (Plan_ID, PlanBeginDate, PlanEndDate)
values
(1, '2015-01-01', '2016-01-01'),
(2, '2016-01-01', '2017-01-01'),
(3, '2013-01-01', '2014-01-01'),
(4, '2015-01-01', '2016-01-01')
insert into Services (SrvID, Srv_Plan_ID, Srv_Discipline_ID, SrvBeginDate, SrvEndDate)
values
(1, 1, 1, '2015-01-01', '2016-01-01'),
(2, 1, 3, '2015-01-01', '2016-01-01'),
(3, 2, 2, '2016-01-01', '2016-04-04'),
(4, 2, 2, '2016-04-05', '2017-01-01'),
(5, 3, 1, '2013-01-01', '2013-06-01'),
(6, 3, 2, '2013-01-01', '2014-01-01'),
(7, 4, 3, '2015-01-01', '2015-07-01'),
(8, 4, 3, '2015-08-01', '2016-01-01')
Query
with ServiceCTE as (
select Srv_Plan_ID, Srv_Discipline_ID, SrvEndDate, SrvBeginDate as SrvDate
from Services
union all
select Srv_Plan_ID, Srv_Discipline_ID, SrvEndDate, dateadd(day, 1, SrvDate) as SrvDate
from ServiceCTE
where SrvDate < SrvEndDate -- "<" instead of "!=" so a row with end before begin cannot recurse forever
), PlanCTE as (
select Plan_ID, PlanEndDate, PlanBeginDate as PlanDate
from Plans
union all
select Plan_ID, PlanEndDate, dateadd(day, 1, PlanDate) as PlanDate
from PlanCTE
where PlanDate < PlanEndDate -- "<" instead of "!=" so a row with end before begin cannot recurse forever
), UncoveredPlanDisciplineCTE as (
select distinct pcte.Plan_ID, s.Srv_Discipline_ID
from PlanCTE pcte
join (select distinct Srv_Plan_ID, Srv_Discipline_ID
from Services) s
on s.Srv_Plan_ID = pcte.Plan_ID
where not exists (select null
from ServiceCTE scte
where scte.Srv_Plan_ID = s.Srv_Plan_ID
and scte.Srv_Discipline_ID = s.Srv_Discipline_ID
and scte.SrvDate = pcte.PlanDate)
)
select p.Plan_ID, p.PlanBeginDate, p.PlanEndDate,
s.Srv_Discipline_ID, s.SrvBeginDate, s.SrvEndDate
from UncoveredPlanDisciplineCTE c
join Plans p
on p.Plan_ID = c.Plan_ID
join Services s
on s.Srv_Plan_ID = c.Plan_ID
and s.Srv_Discipline_ID = c.Srv_Discipline_ID
order by p.Plan_ID, s.Srv_Discipline_ID, s.SrvBeginDate
option (maxrecursion 0)
Result
Plan_ID PlanBeginDate PlanEndDate Srv_Discipline_ID SrvBeginDate SrvEndDate
------- ------------- ----------- ----------------- ------------ ----------
3 2013-01-01 2014-01-01 1 2013-01-01 2013-06-01
4 2015-01-01 2016-01-01 3 2015-01-01 2015-07-01
4 2015-01-01 2016-01-01 3 2015-08-01 2016-01-01
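Since the day-by-day expansion above is expected to be slow, here is a rough sketch of a set-based alternative that avoids recursion. It is not part of the original answer: the CTE names (Ordered, Uncovered) and the PrevMaxEnd column are made up for illustration. It assumes the same Plans and Services tables as the setup above (including the corrected SrvID 7 row), treats begin and end dates as inclusive days, and needs SQL Server 2012 or later for the window frame.

```sql
WITH Ordered AS (
    -- For each service row, find the latest end date among the earlier
    -- rows of the same plan and discipline
    SELECT s.Srv_Plan_ID, s.Srv_Discipline_ID, s.SrvBeginDate, s.SrvEndDate
          ,MAX(s.SrvEndDate) OVER (
               PARTITION BY s.Srv_Plan_ID, s.Srv_Discipline_ID
               ORDER BY s.SrvBeginDate, s.SrvEndDate
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevMaxEnd
    FROM Services s
), Uncovered AS (
    SELECT o.Srv_Plan_ID, o.Srv_Discipline_ID
    FROM Ordered o
    JOIN Plans p
      ON p.Plan_ID = o.Srv_Plan_ID
    GROUP BY o.Srv_Plan_ID, o.Srv_Discipline_ID, p.PlanBeginDate, p.PlanEndDate
    HAVING MIN(o.SrvBeginDate) > p.PlanBeginDate   -- start of plan not covered
        OR MAX(o.SrvEndDate) < p.PlanEndDate       -- end of plan not covered
        -- a gap of at least one day between services that falls inside the plan range
        OR MAX(CASE WHEN o.SrvBeginDate > DATEADD(day, 1, o.PrevMaxEnd)
                     AND o.PrevMaxEnd < p.PlanEndDate
                     AND o.SrvBeginDate > p.PlanBeginDate
                    THEN 1 ELSE 0 END) = 1
)
SELECT p.Plan_ID, p.PlanBeginDate, p.PlanEndDate,
       s.Srv_Discipline_ID, s.SrvBeginDate, s.SrvEndDate
FROM Uncovered u
JOIN Plans p
  ON p.Plan_ID = u.Srv_Plan_ID
JOIN Services s
  ON s.Srv_Plan_ID = u.Srv_Plan_ID
 AND s.Srv_Discipline_ID = u.Srv_Discipline_ID
ORDER BY p.Plan_ID, s.Srv_Discipline_ID, s.SrvBeginDate;
```

On the test data above this should flag plan 3 / discipline 1 and plan 4 / discipline 3, the same three rows as the recursive version. I have not compared the two on real data, so treat it as a starting point.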
So this question is similar to one I've asked before, but slightly different.
I'm looking at data for clients who are admitted to and discharged from a program. For each admit and discharge they have an assessment done and are scored on it. Sometimes they are admitted and discharged multiple times during a time period.
I need to be able to pair each client's admit score with their following discharge score so I can look at all clients who improved a certain amount from admit to discharge, for each of their admits and discharges.
This is a dummy sample of how my data results are formatted right now:
And this is how I'd ideally like it formatted:
But I'd take any point in the right direction or similar formatting help that would allow me to be able to compare all of the instances of admit and discharge scores for all the clients.
Thanks!
In order to get the result, you can apply both the UNPIVOT and the PIVOT functions. The UNPIVOT will convert your multiple columns of date and score into rows, then you can pivot those rows back into columns.
The unpivot syntax will be similar to this:
select person,
casenumber,
ScoreType+'_'+col col,
value,
rn
from
(
select person,
casenumber,
convert(varchar(10), date, 101) date,
cast(score as varchar(10)) score,
scoreType,
row_number() over(partition by casenumber, scoretype
order by case scoretype when 'Admit' then 1 end, date) rn
from yourtable
) d
unpivot
(
value
for col in (date, score)
) unpiv
See SQL Fiddle with Demo. This gives a result:
| PERSON | CASENUMBER | COL | VALUE | RN |
-----------------------------------------------------------
| Jon | 3412 | Discharge_date | 01/03/2013 | 1 |
| Jon | 3412 | Discharge_score | 12 | 1 |
| Al | 3452 | Admit_date | 05/16/2013 | 1 |
| Al | 3452 | Admit_score | 15 | 1 |
| Al | 3452 | Discharge_date | 08/01/2013 | 1 |
| Al | 3452 | Discharge_score | 13 | 1 |
As you can see this query also creates the new columns to then pivot. So the final code will be:
select person, casenumber,
Admit_Date, Admit_Score, Discharge_Date, Discharge_Score
from
(
select person,
casenumber,
ScoreType+'_'+col col,
value,
rn
from
(
select person,
casenumber,
convert(varchar(10), date, 101) date,
cast(score as varchar(10)) score,
scoreType,
row_number() over(partition by casenumber, scoretype
order by case scoretype when 'Admit' then 1 end, date) rn
from yourtable
) d
unpivot
(
value
for col in (date, score)
) unpiv
) src
pivot
(
max(value)
for col in (Admit_Date, Admit_Score, Discharge_Date, Discharge_Score)
) piv;
See SQL Fiddle with Demo. This gives a result:
| PERSON | CASENUMBER | ADMIT_DATE | ADMIT_SCORE | DISCHARGE_DATE | DISCHARGE_SCORE |
-------------------------------------------------------------------------------------
| Al | 3452 | 05/16/2013 | 15 | 08/01/2013 | 13 |
| Cindy | 6578 | 01/02/2013 | 17 | 03/04/2013 | 14 |
| Cindy | 6578 | 03/04/2013 | 14 | 03/18/2013 | 12 |
| Jon | 3412 | (null) | (null) | 01/03/2013 | 12 |
| Kevin | 9868 | 01/18/2013 | 19 | 03/02/2013 | 15 |
| Kevin | 9868 | 03/02/2013 | 15 | (null) | (null) |
| Pete | 4765 | 02/06/2013 | 15 | (null) | (null) |
| Susan | 5421 | 04/06/2013 | 19 | 05/07/2013 | 15 |
SELECT
ad.person, ad.CaseNumber, ad.Date as AdmitScoreDate, ad.Score as AdmitScore,
dis.date as DischargeScoreDate, dis.Score as DischargeScore
From
yourTable ad, yourTable dis
WHERE
ad.person=dis.person
and
ad.ScoreType='Admit'
and
dis.ScoreType='Discharge';
If all the columns you mentioned are in the same table, you can join the table to itself:
SELECT t1.person,
t1.caseNumber,
t1.date adate,
t1.score ascore,
t1.scoreType ascoreType,
t2.date ddate,
t2.score dscore,
t2.scoreType dscoretype
FROM patient t1
join patient t2
on t1.casenumber=t2.casenumber
and t1.scoreType!=t2.scoreType
and t1.scoreType='Admit'
But this will not show records for people who have been admitted and not yet discharged. I don't know if you were also looking for that information.
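If those open admits do matter, one possible variant is to keep every admit row and look up the first discharge that comes after it. This is only a sketch: it reuses the patient table and column names from the query above, the aliases ad, d and dis are made up, and it assumes a discharge always has a later date than the admit it closes (a same-day admit and discharge would not be paired).

```sql
SELECT ad.person,
       ad.caseNumber,
       ad.[date]  AS adate,
       ad.score   AS ascore,
       dis.[date] AS ddate,
       dis.score  AS dscore
FROM patient AS ad
OUTER APPLY (
    -- earliest discharge for the same case after this admit, if any
    SELECT TOP (1) d.[date], d.score
    FROM patient AS d
    WHERE d.caseNumber = ad.caseNumber
      AND d.scoreType = 'Discharge'
      AND d.[date] > ad.[date]
    ORDER BY d.[date]
) AS dis
WHERE ad.scoreType = 'Admit'
ORDER BY ad.person, ad.[date];
```

Admits with no later discharge come back with NULL in ddate and dscore. Note that two admits in a row with no discharge between them would both be paired with the same later discharge.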
SQL Fiddle link
Hope this helps!