Query to find Cumulative while subtracting other counts - sql

Here is my table structure
Id INT
RecId INT
Dated DATETIME
Status INT
and here is my data.
Status table (contains different statuses)
Id Status
1 Created
2 Assigned
Log table (contains logs for the different statuses that a record went through (RecId))
Id RecId Dated Status
1 1 2013-12-09 14:16:31.930 1
2 7 2013-12-09 14:27:26.620 1
3 1 2013-12-09 14:27:26.620 2
3 8 2013-12-10 11:14:13.747 1
3 9 2013-12-10 11:14:13.747 1
3 8 2013-12-10 11:14:13.747 2
I need to generate a report from this data in the following format.
Dated Created Assigned
2013-12-09 2 1
2013-12-10 3 1
The rows are calculated per date. Created is calculated as (previous date's Created count - previous date's Assigned count) + today's Created count.
For example, on 2013-12-10 three entries were made to the log table, of which two have the status Created and one has the status Assigned. So in the view that I want to build for the report, the row for 2013-12-10 returns Created as 2 + 1 = 3, where 2 is the number of newly inserted Created records in the log table and 1 is the previous day's remaining record count (Created - Assigned = 2 - 1).
I hope the scenario is clear. Please ask me if further information is required.
Please help me with the sql to construct the above view.

This matches the expected result for the provided sample, but may require more testing.
with CTE as (
    select
        *
      , row_number() over(order by dt ASC) as rn
    from (
        select
            cast(created.dated as date) as dt
          , count(created.status) as Created
          , count(Assigned.status) as Assigned
          , count(created.status)
            - count(Assigned.status) as Delta
        from LogTable created
        left join LogTable assigned
               on created.RecId = assigned.RecId
              and created.status = 1
              and assigned.Status = 2
              and created.Dated <= assigned.Dated
        where created.status = 1
        group by
            cast(created.dated as date)
    ) x
)
select
    dt.dt
  , dt.created + coalesce(nxt.delta,0) as created
  , dt.assigned
from CTE dt
-- despite its name, nxt is the row for the previous date (rn - 1)
left join CTE nxt on dt.rn = nxt.rn+1
;
Result:
| DT | CREATED | ASSIGNED |
|------------|---------|----------|
| 2013-12-09 | 2 | 1 |
| 2013-12-10 | 3 | 1 |
See this SQLFiddle demo
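As a cross-check, the carry-over rule stated in the question can be applied directly to the sample rows. This is a minimal Python sketch, not the query above: it counts assignments on the day they are logged and carries the remainder forward cumulatively, whereas the query attributes assignments to the created date and adds only the previous date's delta. The names `LOG` and `report` are illustrative assumptions.

```python
from collections import Counter

# Sample log rows from the question: (RecId, Dated, Status); 1 = Created, 2 = Assigned.
LOG = [
    (1, "2013-12-09 14:16:31.930", 1),
    (7, "2013-12-09 14:27:26.620", 1),
    (1, "2013-12-09 14:27:26.620", 2),
    (8, "2013-12-10 11:14:13.747", 1),
    (9, "2013-12-10 11:14:13.747", 1),
    (8, "2013-12-10 11:14:13.747", 2),
]


def report(log):
    """Return [(date, created, assigned)] using the carry-over rule from the question."""
    created, assigned = Counter(), Counter()
    for _rec_id, dated, status in log:
        day = dated[:10]  # keep only the YYYY-MM-DD part
        if status == 1:
            created[day] += 1
        elif status == 2:
            assigned[day] += 1

    rows, carry = [], 0
    for day in sorted(set(created) | set(assigned)):
        total_created = carry + created[day]      # remainder from earlier days + today's new records
        rows.append((day, total_created, assigned[day]))
        carry = total_created - assigned[day]     # what is still unassigned after today
    return rows


result = report(LOG)
for row in result:
    print(row)
```

On the sample data both approaches agree; they can diverge once a record is assigned on a later date than it was created, so the choice depends on which reading of the requirement is intended.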

Related

How can I calculate user session time from heart beat data in Presto SQL?

I'm currently recording when users are active via a heartbeat. It's stored in a table like so:
User ID | Minute of Day
1 | 3
1 | 4
1 | 5
1 | 8
1 | 9
2 | 2
2 | 3
2 | 4
User ID 1 is active from 3 to 5 but then is inactive from 6 to 7 and then becomes active again from 8 to 9.
User ID 1 was active for 3 minutes: (5-3 + 9-8) = 3
User ID 2 was active for 2 minutes: 4-2 = 2
How can I calculate this using a SQL (Presto) query?
Output should be like so:
User ID | Total Minutes
1 | 3
2 | 2
You may try the following which uses the lag function to determine active periods (diff = 1) before summing them
SELECT
    USERID,
    SUM(diff) as TotalMinutes
FROM (
    SELECT
        UserId,
        (MinuteofDay - LAG(MinuteofDay,1,MinuteofDay) OVER (PARTITION BY UserId ORDER BY MinuteofDay)) as diff
    FROM
        my_table
) t
WHERE
    diff = 1
GROUP BY
    UserID;
userid | TotalMinutes
1 | 3
2 | 2
View on DB Fiddle
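To try the LAG approach locally without a Presto cluster, the same query can be run against an in-memory SQLite database from Python. This is only a sketch under assumptions: SQLite stands in for Presto, window functions require SQLite 3.25 or later, and the `ORDER BY UserId` is added here just to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (UserId INTEGER, MinuteofDay INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, 3), (1, 4), (1, 5), (1, 8), (1, 9), (2, 2), (2, 3), (2, 4)],
)

# diff is the gap to the previous heartbeat of the same user; the first
# heartbeat of each user gets diff = 0 because LAG defaults to the row's own value.
QUERY = """
SELECT UserId, SUM(diff) AS TotalMinutes
FROM (
    SELECT UserId,
           MinuteofDay - LAG(MinuteofDay, 1, MinuteofDay)
               OVER (PARTITION BY UserId ORDER BY MinuteofDay) AS diff
    FROM my_table
) t
WHERE diff = 1          -- keep only consecutive minutes
GROUP BY UserId
ORDER BY UserId
"""

totals = conn.execute(QUERY).fetchall()
print(totals)
```

For user 1 the gaps are 0, 1, 1, 3, 1; the gap of 3 (minute 5 to minute 8) is filtered out, which is why the inactive stretch is not counted.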

How to select rows with conditional values of one column in SQL

Say I have this table:
id | timeline
1 | BASELINE
1 | MIDTIME
1 | ENDTIME
2 | BASELINE
2 | MIDTIME
3 | BASELINE
4 | BASELINE
5 | BASELINE
5 | MIDTIME
5 | ENDTIME
6 | MIDTIME
6 | ENDTIME
7 | RISK
7 | RISK
This is what the data looks like, except that the real data has more observations (a few thousand).
How do I get the output so that it will look like this:
id | timeline
1 | BASELINE
1 | MIDTIME
2 | BASELINE
2 | MIDTIME
5 | BASELINE
5 | MIDTIME
How do I select the first two rows of each ID that has the 2 specific timeline values (BASELINE and MIDTIME)? Notice that id 6 has MIDTIME and ENDTIME, and id 7 has two RISK rows; I don't want these two ids.
I used
SELECT *
FROM df
WHERE id IN (SELECT id FROM df GROUP BY id HAVING COUNT(*)=2)
and got IDs with two timeline values (output below) but don't know how to get rows with only BASELINE and MIDTIME.
id | timeline
1 | BASELINE
1 | MIDTIME
2 | BASELINE
2 | MIDTIME
5 | BASELINE
5 | MIDTIME
6 | MIDTIME   ---- don't want this
6 | ENDTIME   ---- don't want this
7 | RISK      ---- don't want this
7 | RISK      ---- don't want this
Many Thanks.
You can try using exists -
select *
from t t1
where timeline in ('BASELINE','MIDTIME')
  and exists (
    select 1 from t t2
    where t1.id = t2.id and timeline in ('BASELINE','MIDTIME')
    group by t2.id
    having count(distinct timeline) = 2)
OUTPUT:
id timeline
1 BASELINE
1 MIDTIME
2 BASELINE
2 MIDTIME
5 BASELINE
5 MIDTIME
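The EXISTS query can be checked against the question's sample rows with a small Python harness. This is a sketch under assumptions: SQLite is used as a stand-in engine, the table is named `df` as in the question, the inner column references are qualified with `t2`, and an `ORDER BY` is added so the result order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (id INTEGER, timeline TEXT)")
conn.executemany(
    "INSERT INTO df VALUES (?, ?)",
    [
        (1, "BASELINE"), (1, "MIDTIME"), (1, "ENDTIME"),
        (2, "BASELINE"), (2, "MIDTIME"),
        (3, "BASELINE"),
        (4, "BASELINE"),
        (5, "BASELINE"), (5, "MIDTIME"), (5, "ENDTIME"),
        (6, "MIDTIME"), (6, "ENDTIME"),
        (7, "RISK"), (7, "RISK"),
    ],
)

# Keep BASELINE/MIDTIME rows, but only for ids that have both values.
QUERY = """
SELECT t1.id, t1.timeline
FROM df t1
WHERE t1.timeline IN ('BASELINE', 'MIDTIME')
  AND EXISTS (
      SELECT 1
      FROM df t2
      WHERE t2.id = t1.id
        AND t2.timeline IN ('BASELINE', 'MIDTIME')
      GROUP BY t2.id
      HAVING COUNT(DISTINCT t2.timeline) = 2
  )
ORDER BY t1.id, t1.timeline
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Ids 3 and 4 drop out because they have only BASELINE, id 6 because it has no BASELINE, and id 7 because it has neither value.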
I think this query should give you the result you want.
NOTE: As I understand it, you don't want any ID for which an "ENDTIME" row exists, but in your sample data there is an "ENDTIME" row for ID 1. I assumed this was an error, so I wrote a query that excludes every id containing "ENDTIME" (or "RISK").
WITH CTE AS
(
    SELECT
        id
    FROM
        df
    WHERE
        timeline IN ('ENDTIME', 'RISK')
)
SELECT
    id,
    timeline
FROM
    df
WHERE
    id NOT IN (SELECT id FROM CTE);
There are probably a number of ways to do this. Here's one way that picks up the BASELINE and MIDTIME rows for ids that have both, ensuring there are only 2 rows per returned ID. Without knowing the ordering of timeline, I don't think it's possible to go further:
SELECT
    id
  , timeline
FROM (
    SELECT
        *
      , SUM(CASE WHEN timeline = 'BASELINE' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS BaselineCount
      , SUM(CASE WHEN timeline = 'MIDTIME' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS MidtimeCount
    FROM df
    WHERE df.timeline IN ('BASELINE', 'MIDTIME')
) subquery
WHERE subquery.BaselineCount > 0
  AND subquery.MidtimeCount > 0
GROUP BY
    id
  , timeline
;

Add auto incrementing number based on column

I am trying to wrap my head around a problem I hit exporting data from one system to another.
Let's say I have a table like:
id | item_num
1 1
2 1
3 2
4 3
5 3
6 3
I need to add a column to the table and update it to contain an incrementing product_num field based on item. This would be the end result given the above table.
id | item_num | product_num
1 1 1
2 1 2
3 2 1
4 3 1
5 3 2
6 3 3
Any ideas on going about this?
Edit: This is being done in Access 2010 from one system to another (sql server source, custom/unknown ODBC driven destination)
Perhaps you could create a view in your SQL Server database and then select from that in Access to insert into your destination.
Possible solutions in SQL Server:
-- Use row_number() to get product_num in SQL Server 2005+:
select id
     , item_num
     , row_number() over (partition by item_num order by id) as product_num
from MyTable;

-- Use a correlated subquery to get product_num in many databases:
select t.id
     , t.item_num
     , (select count(*) from MyTable where item_num = t.item_num and id <= t.id) as product_num
from MyTable t;
Same result:
id item_num product_num
----------- ----------- --------------------
1 1 1
2 1 2
3 2 1
4 3 1
5 3 2
6 3 3
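Since the question asks for the column to be added and populated, the correlated-subquery idea can also drive an UPDATE. The following is a rough sketch run against SQLite from Python; it is an assumption that the target engine accepts this UPDATE form (Access and the unknown ODBC destination may not), and the table name `MyTable` is taken from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, item_num INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 3), (5, 3), (6, 3)],
)

# Add the new column, then number each row within its item_num group by
# counting the rows of the same item whose id is less than or equal to its own.
conn.execute("ALTER TABLE MyTable ADD COLUMN product_num INTEGER")
conn.execute(
    """
    UPDATE MyTable
    SET product_num = (
        SELECT COUNT(*)
        FROM MyTable AS m
        WHERE m.item_num = MyTable.item_num
          AND m.id <= MyTable.id
    )
    """
)

rows = conn.execute(
    "SELECT id, item_num, product_num FROM MyTable ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

The numbering relies on `id` being unique within each `item_num`; duplicate ids would produce repeated product numbers.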

Complex Query Involving Search for Contiguous Dates (by Month)

I have a table that contains a list of accounts by month along with a field that indicates activity. I want to search through to find when an account has "died", based on the following criteria:
1. the account had consistent activity for a contiguous period of months
2. the account had a spike of activity on a final month (spike = 200% or more of average of all previous contiguous months of activity)
3. the month immediately following the spike of activity and the next 12 months all had 0 activity
So the table might look something like this:
ID | Date | Activity
1 | 1/1/2010 | 2
2 | 1/1/2010 | 3.2
1 | 2/3/2010 | 3
2 | 2/3/2010 | 2.7
1 | 3/2/2010 | 8
2 | 3/2/2010 | 9
1 | 4/6/2010 | 0
2 | 4/6/2010 | 0
1 | 5/2/2010 | 0
2 | 5/2/2010 | 2
So in this case both accounts 1 and 2 have activity in months Jan - Mar. Both accounts exhibit a spike of activity in March. Both accounts have 0 activity in April. Account 2 has activity again in May, but account 1 does not. Therefore, my query should return Account 1, but not Account 2. I would want to see this as my query result:
ID | Last Date
1 | 3/2/2010
I realize this is a complicated question and I'm not expecting anyone to write the whole query for me. The current best approach I can think of is to create a series of sub-queries and join them, but I don't even know what the subqueries would look like. For example: how do I look for a contiguous series of rows for a single ID where activity is all 0 (or all non-zero?).
My fall-back if the SQL is simply too involved is to use a brute-force search using Java where I would first find all unique IDs, and then for each unique ID iterate across the months to determine if and when the ID "died".
Once again: any help to move in the right direction is very much appreciated.
Processing in Java, or partially processing in SQL, and finishing the processing in Java is a good approach.
I'm not going to tackle how to define a spike.
I will suggest that you start with condition 3. It's easy to find the last non-zero value. That is the record you then want to test for a spike, and for consistent data before the spike.
SELECT out.*
FROM monthly_activity out
LEFT OUTER JOIN monthly_activity comp
    ON out.ID = comp.ID AND out.Date < comp.Date AND comp.Activity <> 0
WHERE comp.Date IS NULL
Not bad, but you don't want the result if this is because the record is the last for the month, so instead,
SELECT out.*
FROM monthly_activity out
INNER JOIN monthly_activity comp
    ON out.ID = comp.ID AND out.Date < comp.Date AND comp.Activity = 0
GROUP BY out.ID
Probably not the world's most efficient code, but I think this does what you're after:
create table #t (AccountId int, ActivityDate date, Activity float)
insert #t
      select 1, '2010-01-01', 2
union select 2, '2010-01-01', 3.2
union select 1, '2010-02-03', 3
union select 2, '2010-02-03', 2.7
union select 1, '2010-03-02', 8
union select 2, '2010-03-02', 9
union select 1, '2010-04-06', 0
union select 2, '2010-04-06', 0
union select 1, '2010-05-02', 0
union select 2, '2010-05-02', 2
select AccountId, ActivityDate LastActivityDate --, Activity
from #t a
where
--Part 2 --select only where the activity is a peak
Activity >= isnull
(
    (
        select 2 * avg(c.Activity)
        from #t c
        where c.AccountId = a.AccountId --compare against the same account's earlier activity
        and c.ActivityDate >= isnull
        (
            (
                select max(d.ActivityDate)
                from #t d
                where d.AccountId = c.AccountId
                and d.ActivityDate < c.ActivityDate
                and d.Activity = 0
            )
            ,
            (
                select min(e.ActivityDate)
                from #t e
                where e.AccountId = c.AccountId
            )
        )
        and c.ActivityDate < a.ActivityDate
    )
    , Activity + 1 --Part 1 (i.e. if no activity before today don't include the result)
)
--Part 3
and not exists --select only dates which have had no activity for the following 12 months on the same account (assumption: count no record as no activity / also ignore current date in this assumption)
(
    select 1
    from #t b
    where a.AccountId = b.AccountId
    and b.Activity > 0
    and b.ActivityDate between dateadd(DAY, 1, a.ActivityDate) and dateadd(YEAR, 1, a.ActivityDate)
)
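The brute-force fallback mentioned in the question (iterate over each account's months in application code) can be sketched as follows, here in Python rather than Java. Several things are assumptions made for illustration: the names `ROWS` and `find_died`, one row per account per month, "contiguous" meaning consecutive rows with non-zero activity, a 365-day window standing in for "12 months", and a missing row counting as no activity (the same assumption the query above makes).

```python
from datetime import date, timedelta

# Sample rows from the question: (account id, date, activity).
ROWS = [
    (1, date(2010, 1, 1), 2), (2, date(2010, 1, 1), 3.2),
    (1, date(2010, 2, 3), 3), (2, date(2010, 2, 3), 2.7),
    (1, date(2010, 3, 2), 8), (2, date(2010, 3, 2), 9),
    (1, date(2010, 4, 6), 0), (2, date(2010, 4, 6), 0),
    (1, date(2010, 5, 2), 0), (2, date(2010, 5, 2), 2),
]


def find_died(rows, quiet_period=timedelta(days=365)):
    """Return [(account, last_date)] for accounts that spiked and then went quiet."""
    by_account = {}
    for account, day, activity in sorted(rows):
        by_account.setdefault(account, []).append((day, activity))

    died = []
    for account, history in by_account.items():
        for i, (day, activity) in enumerate(history):
            # Criterion 1: collect the contiguous non-zero run just before row i.
            run = []
            j = i - 1
            while j >= 0 and history[j][1] > 0:
                run.append(history[j][1])
                j -= 1
            if activity <= 0 or not run:
                continue
            # Criterion 2: spike = at least 200% of the average of that run.
            if activity < 2 * sum(run) / len(run):
                continue
            # Criterion 3: no positive activity within the quiet period afterwards.
            horizon = day + quiet_period
            if any(a > 0 for d, a in history[i + 1:] if d <= horizon):
                continue
            died.append((account, day))
    return died


died = find_died(ROWS)
print(died)
```

On the sample data only account 1 qualifies: account 2 also spikes in March but shows activity again in May, so criterion 3 rejects it.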

Implementing Hierarchy in SQL

Suppose I have a table which has a "CDATE" representing the date when I retrieved the data, a "SECID" identifying the security I retrieved data for, a "SOURCE" designating where I got the data, and the "VALUE" which I got from the source. My data might look like the following:
CDATE | SECID | SOURCE | VALUE
--------------------------------
1/1/2012 1 1 23
1/1/2012 1 5 45
1/1/2012 1 3 33
1/4/2012 2 5 55
1/5/2012 1 5 54
1/5/2012 1 3 99
Suppose I have a HIERARCHY table like the following ("SOURCE" with greatest HIERARCHY number takes precedence):
SOURCE | NAME | HIERARCHY
---------------------------
1 ABC 10
3 DEF 5
5 GHI 2
Now let's suppose I want my results to be picked according to the hierarchy above. So applying the hierarchy and selecting the source with the greatest HIERARCHY number, I would like to end up with the following:
CDATE | SECID | SOURCE | VALUE
---------------------------------
1/1/2012 1 1 23
1/4/2012 2 5 55
1/5/2012 1 3 99
This joins on your hierarchy and selects the top-ranked source for each date and security.
SELECT CDATE, SECID, SOURCE, VALUE
FROM (
    SELECT t.CDATE, t.SECID, t.SOURCE, t.VALUE,
           ROW_NUMBER() OVER (PARTITION BY t.CDATE, t.SECID
                              ORDER BY h.HIERARCHY DESC) as nRow
    FROM table1 t
    INNER JOIN table2 h ON h.SOURCE = t.SOURCE
) A
WHERE nRow = 1
You can get the results you want with the query below. It combines your data with your hierarchies and ranks the rows by hierarchy, highest first. Note that if a source is repeated for the same date and security, only one of those rows is returned, chosen arbitrarily.
;with rankMyData as (
    select
        d.CDATE
      , d.SECID
      , d.SOURCE
      , d.VALUE
      , row_number() over(partition by d.CDate, d.SECID order by h.HIERARCHY desc) as ranking
    from DATA d
    inner join HIERARCHY h
        on h.source = d.source
)
SELECT
    CDATE
  , SECID
  , SOURCE
  , VALUE
FROM rankMyData
where ranking = 1
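The ranking approach can be verified against the question's sample rows with a short Python harness. This is a sketch under assumptions: SQLite (3.25 or later, for window functions) stands in for the real database, the tables are named `data` and `hierarchy`, dates are stored as ISO strings so they sort correctly, and the `ORDER BY` is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (CDATE TEXT, SECID INTEGER, SOURCE INTEGER, VALUE INTEGER)")
conn.execute("CREATE TABLE hierarchy (SOURCE INTEGER, NAME TEXT, HIERARCHY INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        ("2012-01-01", 1, 1, 23),
        ("2012-01-01", 1, 5, 45),
        ("2012-01-01", 1, 3, 33),
        ("2012-01-04", 2, 5, 55),
        ("2012-01-05", 1, 5, 54),
        ("2012-01-05", 1, 3, 99),
    ],
)
conn.executemany(
    "INSERT INTO hierarchy VALUES (?, ?, ?)",
    [(1, "ABC", 10), (3, "DEF", 5), (5, "GHI", 2)],
)

# Rank the sources within each (CDATE, SECID) group by HIERARCHY, highest first,
# and keep only the top-ranked row of each group.
QUERY = """
SELECT CDATE, SECID, SOURCE, VALUE
FROM (
    SELECT d.CDATE, d.SECID, d.SOURCE, d.VALUE,
           ROW_NUMBER() OVER (PARTITION BY d.CDATE, d.SECID
                              ORDER BY h.HIERARCHY DESC) AS ranking
    FROM data d
    INNER JOIN hierarchy h ON h.SOURCE = d.SOURCE
) ranked
WHERE ranking = 1
ORDER BY CDATE, SECID
"""

picked = conn.execute(QUERY).fetchall()
for row in picked:
    print(row)
```

Because the join is an inner join, a row whose SOURCE has no entry in the hierarchy table is dropped entirely; a LEFT JOIN with a default rank would be one way to keep such rows if that is preferred.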