Datediff between two tables - SQL

I have those two tables
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
The TransID is the key between the two tables, and I can't modify those tables. What I want is to query the amount of time each transaction spent in the queue.
It's easy when there is one item in each table, but when an item gets queued more than once, how do I calculate that?

Assuming the order TransIDs are entered into the Add table is the same order they are removed, you can use the following:
WITH OrderedAdds AS
(   SELECT  TransID,
            AddDate,
            [RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
    FROM    AddTable
), OrderedRemoves AS
(   SELECT  TransID,
            RemovedDate,
            [RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
    FROM    RemoveTable
)
SELECT  OrderedAdds.TransID,
        OrderedAdds.AddDate,
        OrderedRemoves.RemovedDate,
        [DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM    OrderedAdds
        LEFT JOIN OrderedRemoves
            ON  OrderedAdds.TransID = OrderedRemoves.TransID
            AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a row number based on the transaction ID and the date it was entered; you can then join on both RowNumber and TransID to stop any cross joining.
Example on SQL Fiddle
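For reference, here is a minimal setup script for the sample data, assuming the table and column names used in the query above (AddTable(TransID, AddDate) and RemoveTable(TransID, RemovedDate)); adjust it to your actual schema before running:
-- Hypothetical table and column names matching the query above.
CREATE TABLE AddTable    (TransID INT, AddDate     DATE);
CREATE TABLE RemoveTable (TransID INT, RemovedDate DATE);

INSERT INTO AddTable (TransID, AddDate) VALUES
(10, '2012-10-10'), (11, '2012-10-14'), (11, '2012-11-18'),
(11, '2012-12-25'), (12, '2013-01-01');

INSERT INTO RemoveTable (TransID, RemovedDate) VALUES
(10, '2013-01-15'), (11, '2012-12-12'), (11, '2013-01-13'),
(11, '2013-01-20');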

DISCLAIMER: There is probably a problem with this, but I hope it sends you in one possible direction. Make sure to expect problems.
You can try the following direction (which might work in some way depending on your system, version, etc.):
SELECT transId, (SUM(remove_date_sum) - SUM(add_date_sum)) / (60*60*24)
FROM
(
    SELECT transId, SUM(UNIX_TIMESTAMP(add_date)) AS add_date_sum, 0 AS remove_date_sum
    FROM add_to_queue
    GROUP BY transId
    UNION ALL
    SELECT transId, 0 AS add_date_sum, SUM(UNIX_TIMESTAMP(remove_date)) AS remove_date_sum
    FROM remove_from_queue
    GROUP BY transId
) AS t
GROUP BY transId;
A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to some sort of timestamp. Check if UNIX_TIMESTAMP works for you, or figure out something else. Then you can sum the timestamps in each table, create a union while conveniently leaving the other column as zero, and subtract one total from the other in the outer query.
As for the division at the end of the first SELECT: UNIX_TIMESTAMP returns seconds, so you divide by 60*60*24 to get days - or whatever unit it is that you want.
This all said - I would probably solve this using a stored procedure or some client script. SQL is not a weapon for every battle. Making two separate queries can be much simpler.
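For what it's worth, the two separate queries that last paragraph hints at could be as simple as the sketch below (table and column names taken from the query above); the per-transaction subtraction would then happen in the stored procedure or client code:
-- One total per transaction from each table; combine the two result sets in client code.
SELECT transId, SUM(UNIX_TIMESTAMP(add_date)) AS added_total
FROM add_to_queue
GROUP BY transId;

SELECT transId, SUM(UNIX_TIMESTAMP(remove_date)) AS removed_total
FROM remove_from_queue
GROUP BY transId;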

Answer 2: after your comments. (As a side note, some of your dates, such as 15/1/2013 and 13/1/2013, are not in a proper date format.)
select transId, sum(numberOfDays) totalQueueTime
from (
    select a.transId,
           datediff(day, a.addDate, isnull(r.removeDate, a.addDate)) numberOfDays
    from AddTable a left join RemoveTable r on a.transId = r.transId
) X
group by transId
order by transId
Answer 1: before your comments
This assumes that a new record won't be added for a transaction unless the previous one has been removed. Also note that the following query will return numberOfDays as zero for unremoved records:
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate

Related

SQL: How to create supplemental time-series records "out of thin air" from existing records

Suppose I have a table CUSTEVENTS listing customers active in certain months. I now want to consider a customer as being active in a month even if it was active in either of the prior two months.
Simple example, the data might start as:
MONTH_ENDING    CUSTNUM
2022-10-31      72378
2022-11-30      72378
It should be transformed into the following, given the expanded definition of active (the last two rows are the added ones):
MONTH_ENDING    CUSTNUM
2022-10-31      72378
2022-11-30      72378
2022-12-31      72378
2023-01-31      72378
I'm trying to arrive at the simplest / most elegant way to get there. I could certainly explode out the data using a time-series reference table which would list all the pairs of MONTH_ENDING and "additional" MONTH_ENDING values that "count". Or perhaps I could UNION three subqueries that take MONTH_ENDING, add_months(MONTH_ENDING,1), and add_months(MONTH_ENDING,2). But maybe there's something even more concise that doesn't involve multiple unioned queries or an auxiliary time-mapping table.
I happen to be using Teradata but I'm not sure I care about platform-specificity; if there's a Teradata-only approach that works, I'll gladly take it.
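For concreteness, the three-way UNION I describe above might look roughly like this untested sketch (ADD_MONTHS preserves month-end dates, and DISTINCT collapses the overlaps):
-- Sketch only: each activity month also counts toward the following two month-ends.
SELECT DISTINCT MONTH_ENDING, CUSTNUM
FROM (
    SELECT MONTH_ENDING, CUSTNUM FROM CUSTEVENTS
    UNION ALL
    SELECT ADD_MONTHS(MONTH_ENDING, 1), CUSTNUM FROM CUSTEVENTS
    UNION ALL
    SELECT ADD_MONTHS(MONTH_ENDING, 2), CUSTNUM FROM CUSTEVENTS
) AS expanded;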
The general approach is to first calculate the "Last" event time for a given customer, which is handled by something like
LAG(EVENT_DT) OVER (PARTITION BY CUSTNUM ORDER BY EVENT_DT)
The next concept is islands. You want to calculate that an island begins if the event happened after {your window} has elapsed since the prior event, and vice versa to calculate the island's end.
You can find some great online articles about this classic problem: the Gaps and Islands problem.
If you understand CTEs, you can probably follow the example code below. The first CTE is there simply to let you easily add a condition (instead of 1=1) for the events you care about.
WITH CTE_CONDITION AS (
SELECT
EVENT_DT AS dtm,
CUSTNUM
FROM
My_First_Table
WHERE
1 = 1
AND EVENT_DT is not null
),
CTE_LAGGED AS (
SELECT
dtm,
CUSTNUM,
LAG(dtm) OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS previous_datetime,
LEAD(dtm) OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS next_datetime,
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
CTE_CONDITION.dtm
) AS island_location
FROM
CTE_CONDITION
),
CTE_ISLAND_START AS (
SELECT
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS island_number,
CUSTNUM,
dtm AS island_start_datetime,
island_location AS island_start_location
FROM
CTE_LAGGED
WHERE
(
DATEDIFF(MONTH, previous_datetime, dtm) > 2
OR CTE_LAGGED.previous_datetime IS NULL
)
),
CTE_ISLAND_END AS (
SELECT
ROW_NUMBER() OVER (
PARTITION BY CUSTNUM
ORDER BY
dtm
) AS island_number,
CUSTNUM,
dtm AS island_end_datetime,
island_location AS island_end_location
FROM
CTE_LAGGED
WHERE
DATEDIFF(MONTH, dtm, next_datetime) > 2
OR CTE_LAGGED.next_datetime IS NULL
)
SELECT
CTE_ISLAND_START.CUSTNUM,
CTE_ISLAND_START.island_start_datetime,
CTE_ISLAND_END.island_end_datetime,
DATEDIFF(
MONTH, CTE_ISLAND_START.island_start_datetime,
CTE_ISLAND_END.island_end_datetime
) AS ISLAND_DURATION_MONTH,
(
SELECT
COUNT(*)
FROM
CTE_LAGGED
WHERE
CTE_LAGGED.dtm BETWEEN CTE_ISLAND_START.island_start_datetime
AND CTE_ISLAND_END.island_end_datetime
AND CTE_LAGGED.CUSTNUM = CTE_ISLAND_START.CUSTNUM
) AS island_row_count
FROM
CTE_ISLAND_START
INNER JOIN CTE_ISLAND_END ON CTE_ISLAND_END.island_number = CTE_ISLAND_START.island_number
AND CTE_ISLAND_START.CUSTNUM = CTE_ISLAND_END.CUSTNUM
I wrote this into a Rasgo template using Snowflake syntax, but only minor adjustments should be needed to get this to work in Teradata.
Once you have this result, it tells you the periods of activity that include the 2-month window. You can then use a calendar table of month-end dates and check, for each month, whether the customer was "active" based on whether that date falls into one of these active ranges.
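As a rough sketch of that last step, assuming the island start/end result above has been materialized as CUSTOMER_ISLANDS(CUSTNUM, island_start_datetime, island_end_datetime) and that a helper CALENDAR_MONTH_ENDS(MONTH_ENDING) table with one row per month-end exists, the expansion back to monthly rows might look like:
-- Hypothetical final step: one output row per month-end covered by an island,
-- extended two months past the island's last event per the expanded definition.
SELECT c.MONTH_ENDING,
       i.CUSTNUM
FROM CALENDAR_MONTH_ENDS c
JOIN CUSTOMER_ISLANDS i
  ON c.MONTH_ENDING BETWEEN i.island_start_datetime
                        AND ADD_MONTHS(i.island_end_datetime, 2);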

Recursive subtraction from two separate tables to fill in historical data

I have two datasets hosted in Snowflake with social media follower counts by day. The main table we will be using going forward (follower_counts) shows follower counts by day:
This table is live as of 4/4/2020 and will be updated daily. Unfortunately, I am unable to get historical data in this format. Instead, I have a table with historical data (follower_gains) that shows net follower gains by day for several accounts:
Ideally - I want to take the follower_count value from the minimum date in the current table (follower_counts) and subtract the sum of gains (organic + paid gains) for each day, until the minimum date of the follower_gains table, to fill in the follower_count historically. In addition, there are several accounts with data in these tables, so it would need to be grouped by account. It should look like this:
I've only gotten as far as unioning these two tables together, but don't even know where to start with looping through these rows:
WITH a AS (
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
total_followers_count,
null AS paid_follower_gain,
null AS organic_follower_gain,
account_name,
last_update
FROM follower_counts
UNION ALL
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
null AS total_followers_count,
paid_follower_gain,
organic_follower_gain,
account_name,
last_update
FROM follower_gains)
SELECT
a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.total_followers_count,
a.organic_follower_gain,
a.paid_follower_gain,
a.account_name,
a.last_update
FROM a
ORDER BY date desc LIMIT 100
UPDATE: Changed union to union all and added not exists to remove duplicates. Made changes per the comments.
NOTE: Please make sure you don't post images of the tables. It's difficult to recreate your scenario and write a correct query from them. Test this solution and update me so that I can make modifications if necessary.
You don't loop through rows in SQL because it's not a procedural language. The operation you define in the query is performed for all the rows in a table.
with cte as (SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
(a.follower_count - (b.organic_gain+b.paid_gain)) AS follower_count,
a.account_name,
a.last_update,
b.organic_gain,
b.paid_gain
FROM follower_counts a
JOIN follower_gains b ON a.account_id = b.account_id
AND b.date < (select min(date) from
follower_counts c where a.account_id = c.account_id)
)
SELECT b.account_id,
b.date,
b.organizational_entity,
b.organizational_entity_type,
b.vanity_name,
b.localized_name,
b.localized_website,
b.organization_type,
b.follower_count,
b.account_name,
b.last_update,
b.organic_gain,
b.paid_gain
FROM cte b
UNION ALL
SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.follower_count,
a.account_name,
a.last_update,
NULL as organic_gain,
NULL as paid_gain
FROM follower_counts a where not exists (select 1 from
follower_gains c where a.account_id = c.account_id AND a.date = c.date)
You could do something like this. Instead of using the variable, you can just wrap it in another bracket and write ) AS FollowerGrowth at the end:
DECLARE @FollowerGrowth INT =
    ( SELECT total_followers_count
      FROM follower_gains
      WHERE AccountID = xx )
    -
    ( SELECT TOP 1 follower_count
      FROM follower_counts
      WHERE AccountID = xx
      ORDER BY date ASC )
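If you prefer the bracket-wrapping variant mentioned above (no variable), a sketch could look like this, with xx still a placeholder for the account id:
-- Same calculation, returned as a column instead of being assigned to a variable.
SELECT ( SELECT total_followers_count
         FROM follower_gains
         WHERE AccountID = xx )
       -
       ( SELECT TOP 1 follower_count
         FROM follower_counts
         WHERE AccountID = xx
         ORDER BY date ASC ) AS FollowerGrowth;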

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem -- they're both remittances for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per claim:
[screenshot of the two duplicate remittance rows omitted]
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
        claim_id
 FROM Claims_Group2
 WHERE BILLED_AMOUNT>0
 GROUP BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row per claim. Untested (so please run it and make changes as needed).
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them -- it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
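To illustrate the column-enumeration point above, the outer select could list only the remittance columns you actually need (the column list here is illustrative, not the table's full definition):
select t.remittance_uuid,
       t.billed_amount,
       t.active   -- add whichever other remittance columns you need
from (
    select
        r.*,
        row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
    from
        remittance r
        join claims_group2 cg2
            on r.remittance_uuid = cg2.remittance_uuid
    where
        r.active = 0
        and r.billed_amount > 0
        and cg2.active = 0
        and cg2.billed_amount > 0
) t
where t.rn = 1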
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

Get sum of previous 6 values including the group

I need to sum up the values for the last 7 days, so it should be the current day plus the previous 6. This should happen for each row, i.e. in each row the column value would be current + previous 6.
The case:
(Note: I will calculate the hours by summing up the seconds.)
I tried using the below query:
select SUM([drivingTime]) OVER(PARTITION BY driverid ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
from [f.DriverHseCan]
The problem I face is that I have to do the grouping on driver and asset for a date.
In the above case, the driving time should be summed up first, and then its previous 6 rows should be taken.
I can't do this using rank() because I need these rows as well, since I have to show them in the report.
I tried doing this in both SSRS and SQL.
In short, it is adding up the total driving time for the current + 6 previous days.
Try the following query
SELECT
s.date
, s.driverid
, s.assetid
, s.drivingtime
, SUM(s2.drivingtime) AS total_drivingtime
FROM f.DriverHseCan s
JOIN (
SELECT date,driverid, SUM(drivingtime) drivingtime
FROM f.DriverHseCan
GROUP BY date,driverid
) AS s2
ON s.driverid = s2.driverid AND s2.date BETWEEN DATEADD(d,-6,s.date) AND s.date
GROUP BY
s.date
, s.driverid
, s.assetid
, s.drivingtime
If you have week start/end dates, there could be better-performing alternatives to solve your problem, e.g. use the week number in SSRS expressions rather than doing the self join on SQL Server.
I think aggregation does what you want:
select sum(sum([drivingTime])) over (partition by driverid
order by date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
)
from [f.DriverHseCan]
group by driverid, date
I guess you need to use CROSS APPLY.
Something like the following?
SELECT YT.driverID,
       YT.date,
       CA.Last6DayDrivingTime
FROM YourTable YT
CROSS APPLY
(
    SELECT SUM(T.drivingTime) AS Last6DayDrivingTime
    FROM YourTable T
    WHERE T.driverID = YT.driverID
      AND T.date BETWEEN DATEADD(DAY,-6,YT.date) AND YT.date
) CA
Edit:
As you commented that CROSS APPLY slows down the performance, another option is to pre-calculate the weekly values in a temp table or a CTE and then use them in your main query.
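A rough sketch of that pre-calculation idea, assuming the same f.DriverHseCan table and that, after aggregation, there is at most one row per driver per day (date gaps would need a calendar join to be handled correctly):
-- Pre-aggregate per driver and day once, then apply the 7-day window to the smaller set.
WITH DailyTotals AS (
    SELECT driverid, date, SUM(drivingTime) AS drivingTime
    FROM [f.DriverHseCan]
    GROUP BY driverid, date
)
SELECT driverid,
       date,
       SUM(drivingTime) OVER (PARTITION BY driverid
                              ORDER BY date
                              ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS total_drivingtime
FROM DailyTotals;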

Calculating information by using values from previous line

I have the current balance for each account and I need to subtract the netamount for transactions to create the previous month's end balance for the past 24 months. Below is a sample dataset;
create table txn_by_month (
memberid varchar(15)
,accountid varchar(15)
,effective_year varchar(4)
,effective_month varchar(2)
,balance money
,netamt money
,prev_mnthendbal money)
insert into txn_by_month values
(10001,111222333,2012,12,634.15,-500,1134.15)
,(10001,111222333,2012,11,NULL,-1436,NULL)
,(10001,111222333,2012,10,NULL,600,NULL)
,(10002,111333444,2012,12,1544.20,1650,-105.80)
,(10002,111333444,2012,11,NULL,1210,NULL)
,(10002,111333444,2012,10,NULL,-622,NULL)
,(10003,111456456,2012,01,125000,1200,123800)
,(10003,111456456,2011,12,NULL,1350,NULL)
,(10003,111456456,2011,11,NULL,-102,NULL)
As you can see, I already have a table of all the transactions for each month totaled up. I just need to calculate the previous month-end balance on the first line and bring it down to the second, third line, etc. I have been trying to use CTEs, but am not overly familiar with them and seem to be stuck at the moment. This is what I have:
;
WITH CTEtest AS
(SELECT ROW_NUMBER() OVER (PARTITION BY memberid order by(accountid)) AS Sequence
,memberid
,accountid
,prev_mnthendbal
,netamt
FROM txn_by_month)
select c1.memberid
,c1.accountid
,c1.sequence
,c2.prev_mnthendbal as prev_mnthendbal
,c1.netamt,
COALESCE(c2.prev_mnthendbal, 0) - COALESCE(c1.netamt, 0) AS cur_mnthendbal
FROM CTEtest AS c1
LEFT OUTER JOIN CTEtest AS c2
ON c1.memberid = c2.memberid
and c1.accountid = c2.accountid
and c1.Sequence = c2.Sequence + 1
This is working only for the sequence = 2. I know that my issue is that I need to bring my cur_mnthendbal value down into the next line, but I can't seem to wrap my head around how. Do I need another CTE?
Any help would be greatly appreciated!
EDIT: Maybe I need to explain it better.... If I have this;
The balance for line 2 would be the prev_mnthendbal from line 1 ($1,134.15). Then the prev_mnthendbal for line 2 would be the balance - netamt ($1,134.15 - (-$1,436) = $2,570.15). I have been trying to use CTEs, but I can't seem to figure out how to populate the balance field with the prev_mnthendbal from the previous line (since it isn't calculated until the balance is available). Maybe I can't use a CTE? Do I need to use a cursor?
Turns out that I needed to combine a running total with the sequential CTE I was using to begin with.
;
with CTEtest AS
(SELECT ROW_NUMBER() OVER (PARTITION BY memberid ORDER BY effective_year DESC, effective_month DESC) AS Sequence, *
FROM txn_by_month)
,test
as (select * , balance - netamt as running_sum from CTEtest where sequence = 1
union all
select t.*, t1.running_sum - t.netamt from CTEtest t inner join test t1
on t.memberid = t1.memberid and t.sequence = t1.Sequence+1 where t.sequence > 1)
select * from test
order by memberid, Sequence
Hopefully this will help someone else in the future.
See LEAD/LAG analytic functions.
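For illustration, here is a minimal LAG sketch against the txn_by_month sample above; it only pulls the previous display line's prev_mnthendbal per account (newest month first), so the running-balance arithmetic from the accepted approach is still needed on top of it:
-- LAG over the newest-first ordering returns the prior line's value,
-- i.e. the later month's prev_mnthendbal, for each row.
SELECT memberid,
       accountid,
       effective_year,
       effective_month,
       netamt,
       LAG(prev_mnthendbal) OVER (PARTITION BY accountid
                                  ORDER BY effective_year DESC, effective_month DESC) AS carried_down_balance
FROM txn_by_month;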