Calculate status changes and time between statuses - SQL

I have a table as shown below, with columns that track an application and its status changes over time. The date column records when each status change happened. The table is sorted by application and date of status change, from oldest to newest.
+--------+-----------+--------+------------------+
| app_id | status_id | row_no | date             |
+--------+-----------+--------+------------------+
| 1      | a         | 10     | 2016-10-04 21:35 |
| 1      | b         | 11     | 2016-10-12 21:50 |
| 1      | c         | 12     | 2016-10-25 20:40 |
| 1      | d         | 13     | 2016-10-26 16:10 |
| 1      | e         | 14     | 2016-10-26 16:10 |
| 2      | a         | 20     | 2016-09-15 1:26  |
| 2      | c         | 21     | 2016-09-15 21:32 |
| 2      | d         | 22     | 2016-09-16 21:51 |
| 2      | e         | 23     | 2016-09-16 21:51 |
| 2      | f         | 24     | 2016-09-20 22:55 |
| 2      | g         | 25     | 2016-10-20 22:46 |
| 2      | g         | 26     | 2016-10-20 22:46 |
+--------+-----------+--------+------------------+
I am trying to work out how much time an application spends in each status before it reaches its final state. Below is a sample of the table I am trying to build in SQL. For every status I am trying to capture the next status: the previous status column shows the status in that row, while the next status column shows the status in the following row for that application. If the application is at its last status, the next status is marked as Last. I then calculate the time between statuses as the difference in hours between the two dates. I would really appreciate it if you could tell me how to achieve this in SQL. Thank you in advance.
+--------+-----------+--------+------------------+-----------------+-------------+--------------+
| app_id | status_id | row_no | date             | previous status | next status | time between |
+--------+-----------+--------+------------------+-----------------+-------------+--------------+
| 1      | a         | 10     | 2016-10-04 21:35 | a               | b           | 192.2333333  |
| 1      | b         | 11     | 2016-10-12 21:50 | b               | c           | 334.8333333  |
| 1      | c         | 12     | 2016-10-25 20:40 | c               | d           | 43.48333333  |
| 1      | d         | 13     | 2016-10-26 16:10 | d               | e           | 0            |
| 1      | e         | 14     | 2016-10-26 16:10 | e               | Last        | Last         |
| 2      | a         | 20     | 2016-09-15 1:26  | a               | c           | 20.08333333  |
| 2      | c         | 21     | 2016-09-15 21:32 | c               | d           | 24.31666667  |
| 2      | d         | 22     | 2016-09-16 21:51 | d               | e           | 0            |
| 2      | e         | 23     | 2016-09-16 21:51 | e               | f           | 97.06666667  |
| 2      | f         | 24     | 2016-09-20 22:55 | f               | g           | 743.8333333  |
| 2      | g         | 25     | 2016-10-20 22:46 | g               | g           | 0            |
| 2      | g         | 26     | 2016-10-20 22:46 | g               | Last        | Last         |
+--------+-----------+--------+------------------+-----------------+-------------+--------------+

It's a bit messy, but if you have a known set of status_ids, you can try building each one in a "with" clause and joining them all together at the end on app_id, then making a final table which calculates the steps between A and B, between B and C, etc. This wouldn't produce a table like the one you made, but it should get all the time differences.
with A_table as (
    select
        app_id,
        date as A_status
    from app_status --your source table
    where status_id = 'a'
)
, B_table as (
    select
        app_id,
        date as B_status
    from app_status
    where status_id = 'b'
)
--MORE STATUS TABLES HERE
, combined_table as (
    select
        a.app_id,
        a.A_status,
        b.B_status
        --MORE STATUS DATES HERE
    from A_table a
    left outer join B_table b on a.app_id = b.app_id
    --LEFT OUTER JOIN MORE STATUS TABLES ON A_TABLE HERE
    --YOU'RE MAKING ONE TABLE WITH EACH APP_ID ON ONE ROW WITH ALL TIMESTAMPS
)
select
    *,
    B_status - A_status as A_B
    --MORE TIME SUBTRACTIONS HERE
    --SINCE YOU'VE OUTER JOINED ABOVE, YOU'LL HAVE COLUMNS FOR ALL POSSIBLE
    --STATUS STEPS AND THOSE WHICH DIDN'T HAVE THAT STEP WILL BE NULL
from combined_table
It's kind of clunky, but with a fixed number of status steps it should get the job done. It doesn't account for which step is the "last" step, though; I don't know how important that is. You can always write a case statement that looks at the next step to see if it is null. What you want might also be achievable with loops, but I've never used those.
Also note that if you have duplicate rows of app_id, status_id and date, like the last two rows in your sample table, you'd need to sort that out in the with tables somehow, such as only taking the first row, or ranking them; see the sketch below.
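For that dedup step, a minimal sketch, assuming the hypothetical table name app_status used above (the question never names the table): rank the duplicates and keep only the first row of each group.
with deduped as (
    select
        app_id, status_id, row_no, date,
        -- duplicates share app_id/status_id/date; keep the lowest row_no
        row_number() over (
            partition by app_id, status_id, date
            order by row_no
        ) as rn
    from app_status
)
select app_id, status_id, row_no, date
from deduped
where rn = 1;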

Using the SQL LEAD and LAG window functions we can achieve this. Here is the query:
select
    app_id,
    status_id as prev_status,
    date as prev_date,
    lead(status_id) over (partition by app_id order by date) as next_status,
    lead(date) over (partition by app_id order by date) as next_date
from app_status
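A fuller sketch in the same vein, producing the asker's previous/next/time-between columns; it assumes SQL Server's DATEDIFF and the hypothetical table name app_status, so adjust the date arithmetic for your dialect:
select
    app_id,
    status_id,
    row_no,
    date,
    status_id as previous_status,
    -- LEAD is NULL on each application's final row; map that to 'Last'
    coalesce(
        lead(status_id) over (partition by app_id order by date, row_no),
        'Last'
    ) as next_status,
    -- hours (with decimals) until the next status; NULL on the final row
    datediff(minute, date,
             lead(date) over (partition by app_id order by date, row_no)
    ) / 60.0 as time_between_hours
from app_status;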


Get row for each unique user based on highest column value

I have the following data
+--------+-----------+--------+
| UserId | Timestamp | Rating |
+--------+-----------+--------+
| 1      | 1         | 1202   |
| 2      | 1         | 1198   |
| 1      | 2         | 1204   |
| 2      | 2         | 1196   |
| 1      | 3         | 1206   |
| 2      | 3         | 1194   |
| 1      | 4         | 1198   |
| 2      | 4         | 1202   |
+--------+-----------+--------+
I am trying to find the distribution of each user's Rating, based on their latest row in the table (latest is determined by Timestamp). On the path to that, I am trying to get a list of user IDs and Ratings which would look like the following
+--------+--------+
| UserId | Rating |
+--------+--------+
| 1      | 1198   |
| 2      | 1202   |
+--------+--------+
Trying to get here, I sorted the list on UserId and Timestamp (desc) which gives the following.
+--------+-----------+--------+
| UserId | Timestamp | Rating |
+--------+-----------+--------+
| 1      | 4         | 1198   |
| 2      | 4         | 1202   |
| 1      | 3         | 1206   |
| 2      | 3         | 1194   |
| 1      | 2         | 1204   |
| 2      | 2         | 1196   |
| 1      | 1         | 1202   |
| 2      | 1         | 1198   |
+--------+-----------+--------+
So now I just need to take the top N rows, where N is the number of players. But I can't use a LIMIT statement, since LIMIT needs a constant expression; I wanted to use count(id) as the input to LIMIT, which doesn't seem to work.
Any suggestions on how I can get the data I need?
Cheers!
Andy
This should work:
SELECT test.UserId, Rating
FROM test
JOIN (SELECT UserId, MAX(Timestamp) Timestamp
      FROM test
      GROUP BY UserId) m
  ON test.UserId = m.UserId AND test.Timestamp = m.Timestamp
If you can use WINDOW FUNCTIONS then you can use the following:
SELECT UserId, Rating
FROM (
    SELECT UserId, Rating,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Timestamp DESC) row_num
    FROM test
) m
WHERE row_num = 1
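If two rows could tie on a user's latest Timestamp and all tied rows should come back, RANK() is a drop-in alternative to ROW_NUMBER(); a minimal sketch against the same test table:
SELECT UserId, Rating
FROM (
    SELECT UserId, Rating,
           -- RANK gives every row tied for the newest Timestamp the value 1
           RANK() OVER (PARTITION BY UserId ORDER BY Timestamp DESC) row_num
    FROM test
) m
WHERE row_num = 1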

SQL to display value for different dates

I have a table named Reading_Hist containing columns such as Reading, ReadingDate and ID. This table holds the history of the readings. For example:
+----+---------+-------------+
| ID | Reading | ReadingDate |
+----+---------+-------------+
| 1  | 12      | 9/12/2018   |
| 2  | 15      | 9/12/2018   |
| 1  | 16      | 9/5/2018    |
| 4  | 1       | 9/12/2018   |
| 3  | 65      | 9/12/2018   |
| 1  | 23      | 8/29/2018   |
| 3  | 25      | 9/5/2018    |
| 2  | 23      | 9/5/2018    |
| 4  | 3       | 9/5/2018    |
+----+---------+-------------+
I want to write SQL to display each ID with its current Reading in the first column, the reading taken a week before in the next, the reading taken two weeks before in the third, and lastly the trend of the reading.
Example Result below.
+----+---------+------+------+-------+
| ID | Current | Wk_1 | Wk_2 | Trend |
+----+---------+------+------+-------+
| 1  | 12      | 16   | 23   | Down  |
| 2  | 15      | 23   | NULL | Down  |
| 3  | 65      | 25   | NULL | UP    |
| 4  | 1       | 3    | NULL | Down  |
+----+---------+------+------+-------+
You can use aggregation to get the maximum reading day per ID. Then left join the current readings, those of the last week and those of two weeks ago. Use CASE to calculate the trend.
It could look something like:
SELECT x.id,
       rh2.reading current,
       rh3.reading wk_1,
       rh4.reading wk_2,
       CASE
         WHEN rh2.reading > rh3.reading THEN 'Up'
         WHEN rh2.reading < rh3.reading THEN 'Down'
         WHEN rh2.reading = rh3.reading THEN '-'
       END trend
FROM (SELECT rh1.id,
             max(rh1.reading_date) reading_date
      FROM reading_hist rh1
      GROUP BY rh1.id) x
LEFT JOIN reading_hist rh2
       ON rh2.id = x.id
      AND rh2.reading_date = x.reading_date
LEFT JOIN reading_hist rh3
       ON rh3.id = x.id
      AND rh3.reading_date = dateadd(day, -7, x.reading_date)
LEFT JOIN reading_hist rh4
       ON rh4.id = x.id
      AND rh4.reading_date = dateadd(day, -14, x.reading_date);
Of course this requires that there are readings exactly 7 or 14 days from the last day of readings.
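If the readings can't be relied on to land exactly 7 or 14 days apart, a more tolerant variant (a sketch only, assuming SQL Server syntax and the same reading_hist table) buckets each reading by whole weeks before that ID's latest reading and keeps the newest reading per bucket; the Trend CASE from above would bolt on the same way:
WITH latest AS (
    SELECT id, reading, reading_date,
           MAX(reading_date) OVER (PARTITION BY id) AS last_date
    FROM reading_hist
), bucketed AS (
    SELECT id, reading,
           -- integer division: 0 = current week, 1 = last week, 2 = two weeks back
           DATEDIFF(day, reading_date, last_date) / 7 AS wk,
           ROW_NUMBER() OVER (
               PARTITION BY id, DATEDIFF(day, reading_date, last_date) / 7
               ORDER BY reading_date DESC
           ) AS rn
    FROM latest
)
SELECT id,
       MAX(CASE WHEN wk = 0 THEN reading END) AS current_reading,
       MAX(CASE WHEN wk = 1 THEN reading END) AS wk_1,
       MAX(CASE WHEN wk = 2 THEN reading END) AS wk_2
FROM bucketed
WHERE wk <= 2
  AND rn = 1
GROUP BY id;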

How do I select columns whenever they change?

I'm trying to create a slowly changing dimension (type 2 dimension) and am a bit lost on how to logically write it out. Say that we have a source table with a grain of Person | Country | Department | Login Time. I want to create this dimension table with Person | Country | Department | Eff Start time | Eff End Time.
Data could look like this:
Person | Country | Department | Login Time
------------------------------------------
Bob    | CANADA  | Marketing  | 2009-01-01
Bob    | CANADA  | Marketing  | 2009-02-01
Bob    | USA     | Marketing  | 2009-03-01
Bob    | USA     | Sales      | 2009-04-01
Bob    | MEX     | Product    | 2009-05-01
Bob    | MEX     | Product    | 2009-06-01
Bob    | MEX     | Product    | 2009-07-01
Bob    | CANADA  | Marketing  | 2009-08-01
What I want in the Type 2 dimension would look like this:
Person | Country | Department | Eff Start time | Eff End Time
--------------------------------------------------------------
Bob    | CANADA  | Marketing  | 2009-01-01     | 2009-03-01
Bob    | USA     | Marketing  | 2009-03-01     | 2009-04-01
Bob    | USA     | Sales      | 2009-04-01     | 2009-05-01
Bob    | MEX     | Product    | 2009-05-01     | 2009-08-01
Bob    | CANADA  | Marketing  | 2009-08-01     | NULL
Assume that Bob's name, Country and Department haven't been updated since 2009-08-01, so the effective end time is left as NULL.
What function would work best here? This is on Netezza, which uses a flavor of Postgres.
Obviously GROUP BY would not work here because of identical groupings later on (I added the Bob | CANADA | Marketing row at the end to show this).
EDIT
Including a hash column on Person, Country, and Department would make sense, correct? Thinking of using logic like:
SELECT person, country, department
FROM table t1
WHERE person = person
  AND t1.hash <> hash_function(person, country, department)
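The comparison the EDIT sketches can also be done without a stored hash column by looking at the previous row directly with LAG, which Netezza supports; a minimal sketch against the so table defined in the answer below:
select
    person, country, department, login_time,
    case
        when lag(country)    over (partition by person order by login_time) = country
         and lag(department) over (partition by person order by login_time) = department
        then 0
        -- the first row per person also lands here, since LAG returns NULL
        else 1
    end as is_change
from so;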
Answer
create table so (
    person varchar(32)
    ,country varchar(32)
    ,department varchar(32)
    ,login_time date
) distribute on random;
insert into so values ('Bob','CANADA','Marketing','2009-01-01');
insert into so values ('Bob','CANADA','Marketing','2009-02-01');
insert into so values ('Bob','USA','Marketing','2009-03-01');
insert into so values ('Bob','USA','Sales','2009-04-01');
insert into so values ('Bob','MEX','Product','2009-05-01');
insert into so values ('Bob','MEX','Product','2009-06-01');
insert into so values ('Bob','MEX','Product','2009-07-01');
insert into so values ('Bob','CANADA','Marketing','2009-08-01');
/* ************************************************************************** */
with prm as ( --Create an ordinal primary key.
    select
        *
        ,row_number() over (
            partition by person
            order by login_time
        ) rwn
    from
        so
), chn as ( --Chain events to their previous and next event.
    select
        cur.rwn
        ,cur.person
        ,cur.country
        ,cur.department
        ,cur.login_time cur_login
        ,case
            when
                cur.country = prv.country
                and cur.department = prv.department
            then 1
            else 0
        end prv_equal
        ,case
            when
                (
                    cur.country = nxt.country
                    and cur.department = nxt.department
                ) or nxt.rwn is null --No next record should be equivalent to matching.
            then 1
            else 0
        end nxt_equal
        ,case prv_equal
            when 0 then cur_login
            else null
        end eff_login_start_sparse
        ,case
            when eff_login_start_sparse is null
            then max(eff_login_start_sparse) over (
                partition by cur.person
                order by rwn
                rows unbounded preceding --The secret sauce.
            )
            else eff_login_start_sparse
        end eff_login_start
        ,case nxt_equal
            when 0 then cur_login
            else null
        end eff_login_end
    from
        prm cur
        left outer join prm nxt on
            cur.person = nxt.person
            and cur.rwn + 1 = nxt.rwn
        left outer join prm prv on
            cur.person = prv.person
            and cur.rwn - 1 = prv.rwn
), grp as ( --Group by login starts.
    select
        person
        ,country
        ,department
        ,eff_login_start
        ,max(eff_login_end) eff_login_end
    from
        chn
    group by
        person
        ,country
        ,department
        ,eff_login_start
), led as ( --Change the effective end to be the next start, if desired.
    select
        person
        ,country
        ,department
        ,eff_login_start
        ,case
            when eff_login_end is null
            then null
            else
                lead(eff_login_start) over (
                    partition by person
                    order by eff_login_start
                )
        end eff_login_end
    from
        grp
)
select * from led order by eff_login_start;
This code returns the following table.
 PERSON | COUNTRY | DEPARTMENT | EFF_LOGIN_START | EFF_LOGIN_END
--------+---------+------------+-----------------+---------------
 Bob    | CANADA  | Marketing  | 2009-01-01      | 2009-03-01
 Bob    | USA     | Marketing  | 2009-03-01      | 2009-04-01
 Bob    | USA     | Sales      | 2009-04-01      | 2009-05-01
 Bob    | MEX     | Product    | 2009-05-01      | 2009-08-01
 Bob    | CANADA  | Marketing  | 2009-08-01      |
Explanation
I must have solved this four or five times in the past few years and keep neglecting to write it down formally. I'm glad to have the chance to do it, so this is a great question.
When attempting this, I like writing down the problem in matrix form. Here's the input, presuming that all values have the same key in the SCD.
 Cv | Ce
----|----
 A  | 10
 A  | 11
 B  | 14
 C  | 16
 D  | 18
 D  | 25
 D  | 34
 A  | 40
Where Cv is the value that we'll need to compare against (again, presuming that the key value for the SCD is equal in this data; we'll be partitioning over the key value the entire time so it's irrelevant to the solution) and Ce is the event time.
First, we need an ordinal primary key. I've designated this Ck in the table. This will allow us to join the table to itself to get the previous and next events. I've called these columns Pk (previous key), Nk (next key), Pv, and Nv.
 Cv | Ce | Ck | Pk | Pv | Nk | Nv |
----|----|----|----|----|----|----|
 A  | 10 | 1  |    |    | 2  | A  |
 A  | 11 | 2  | 1  | A  | 3  | B  |
 B  | 14 | 3  | 2  | A  | 4  | C  |
 C  | 16 | 4  | 3  | B  | 5  | D  |
 D  | 18 | 5  | 4  | C  | 6  | D  |
 D  | 25 | 6  | 5  | D  | 7  | D  |
 D  | 34 | 7  | 6  | D  | 8  | A  |
 A  | 40 | 8  | 7  | D  |    |    |
Now we need some columns to see if we're at the beginning or end of a contiguous event block. I'll call these Pc and Nc, for contiguous. Pc is defined as Pv = Cv => true. 1 represents true and 0 represents false. Nc is defined similarly, except that the null case defaults to true (we'll see why in a minute)
 Cv | Ce | Ck | Pk | Pv | Nk | Nv | Pc | Nc |
----|----|----|----|----|----|----|----|----|
 A  | 10 | 1  |    |    | 2  | A  | 0  | 1  |
 A  | 11 | 2  | 1  | A  | 3  | B  | 1  | 0  |
 B  | 14 | 3  | 2  | A  | 4  | C  | 0  | 0  |
 C  | 16 | 4  | 3  | B  | 5  | D  | 0  | 0  |
 D  | 18 | 5  | 4  | C  | 6  | D  | 0  | 1  |
 D  | 25 | 6  | 5  | D  | 7  | D  | 1  | 1  |
 D  | 34 | 7  | 6  | D  | 8  | A  | 1  | 0  |
 A  | 40 | 8  | 7  | D  |    |    | 0  | 1  |
Now you can start to see how the 1,1 combination of Pc,Nc is a completely useless record. We know this intuitively, since Bob's Mex/Product combination on the 6th row is pretty much useless information when building an SCD.
So let's get rid of the useless information. I'll add two new columns here: an almost-complete effective start time called Sn and an actually-complete effective end time called Ee. Sn is populated with Ce when Pc is 0, and Ee is populated with Ce when Nc is 0.
 Cv | Ce | Ck | Pk | Pv | Nk | Nv | Pc | Nc | Sn | Ee |
----|----|----|----|----|----|----|----|----|----|----|
 A  | 10 | 1  |    |    | 2  | A  | 0  | 1  | 10 |    |
 A  | 11 | 2  | 1  | A  | 3  | B  | 1  | 0  |    | 11 |
 B  | 14 | 3  | 2  | A  | 4  | C  | 0  | 0  | 14 | 14 |
 C  | 16 | 4  | 3  | B  | 5  | D  | 0  | 0  | 16 | 16 |
 D  | 18 | 5  | 4  | C  | 6  | D  | 0  | 1  | 18 |    |
 D  | 25 | 6  | 5  | D  | 7  | D  | 1  | 1  |    |    |
 D  | 34 | 7  | 6  | D  | 8  | A  | 1  | 0  |    | 34 |
 A  | 40 | 8  | 7  | D  |    |    | 0  | 1  | 40 |    |
This looks really close, but we still have the problem that we can't group by Cv (person/country/department). What we need is for Sn to populate all those nulls with the previous value of Sn. You could join this table to itself on rwn < rwn and get the maximum, but I'm going to be lazy and use Netezza's analytic functions and the rows unbounded preceding clause. It's a shortcut to the method I just described. So we're going to create another column called Es, effective start, defined as follows.
case
    when Sn is null
    then max(Sn) over (
        partition by k --key value of the SCD
        order by Ck
        rows unbounded preceding
    )
    else Sn
end Es
With that definition, we get this.
 Cv | Ce | Ck | Pk | Pv | Nk | Nv | Pc | Nc | Sn | Ee | Es |
----|----|----|----|----|----|----|----|----|----|----|----|
 A  | 10 | 1  |    |    | 2  | A  | 0  | 1  | 10 |    | 10 |
 A  | 11 | 2  | 1  | A  | 3  | B  | 1  | 0  |    | 11 | 10 |
 B  | 14 | 3  | 2  | A  | 4  | C  | 0  | 0  | 14 | 14 | 14 |
 C  | 16 | 4  | 3  | B  | 5  | D  | 0  | 0  | 16 | 16 | 16 |
 D  | 18 | 5  | 4  | C  | 6  | D  | 0  | 1  | 18 |    | 18 |
 D  | 25 | 6  | 5  | D  | 7  | D  | 1  | 1  |    |    | 18 |
 D  | 34 | 7  | 6  | D  | 8  | A  | 1  | 0  |    | 34 | 18 |
 A  | 40 | 8  | 7  | D  |    |    | 0  | 1  | 40 |    | 40 |
The rest is trivial. Group by Es and grab the max of Ee to obtain this table.
 Cv | Es | Ee |
----|----|----|
 A  | 10 | 11 |
 B  | 14 | 14 |
 C  | 16 | 16 |
 D  | 18 | 34 |
 A  | 40 |    |
If you want to populate the effective end time with the next start, join the table again to itself or use the lead() window function to grab it.

How to partition by a customized sum value?

I have a table with the following columns: customer_id, event_date_time
I'd like to figure out how many times a customer triggers an event every 12 hours from the start of an event. In other words, aggregate the time between events for up to 12 hours by customer.
For example, if a customer triggers an event (in order) at noon, 1:30pm, 5pm, 2am, and 3pm, I would want to return the noon, 2am, and 3pm record.
I've written this query:
select
cust_id,
event_datetime,
nvl(24*(event_datetime - lag(event_datetime) over (partition BY cust_id ORDER BY event_datetime)),0) as difference
from
tbl
I feel like I'm close with this. Is there a way to add something like
over (partition BY cust_id, sum(difference)<12 ORDER BY event_datetime)
EDIT: I'm adding some sample data:
+---------+-----------------+-------------+---+
| cust_id | event_datetime  | DIFFERENCE  | X |
+---------+-----------------+-------------+---+
| 1       | 6/20/2015 23:35 | 0           | x |
| 1       | 6/21/2015 0:09  | 0.558611111 |   |
| 1       | 6/21/2015 0:49  | 0.667777778 |   |
| 1       | 6/21/2015 1:30  | 0.688333333 |   |
| 1       | 6/21/2015 9:38  | 8.133055556 |   |
| 1       | 6/21/2015 10:09 | 0.511111111 |   |
| 1       | 6/21/2015 10:45 | 0.600555556 |   |
| 1       | 6/21/2015 11:09 | 0.411111111 |   |
| 1       | 6/21/2015 11:32 | 0.381666667 |   |
| 1       | 6/21/2015 11:55 | 0.385       | x |
| 1       | 6/21/2015 12:18 | 0.383055556 |   |
| 1       | 6/21/2015 12:23 | 0.074444444 |   |
| 1       | 6/22/2015 10:01 | 21.63527778 | x |
| 1       | 6/22/2015 10:24 | 0.380555556 |   |
| 1       | 6/22/2015 10:46 | 0.373611111 |   |
+---------+-----------------+-------------+---+
The "x" are the records that should be pulled since they're the first records in the 12 hour block.
If I understand correctly, you want the first record in each 12-hour block where the blocks of time are defined by the first event time.
If so, you need to modify your query to get the difference from the first event time for each customer. The rest is just arithmetic. The query would look something like this:
with t as (
      select cust_id, event_datetime,
             24 * (event_datetime -
                   min(event_datetime) over (partition by cust_id)
                  ) as difference
      from tbl
     )
select t.*
from (select t.*,
             row_number() over (partition by cust_id, floor(difference / 12)
                                order by difference) as seqnum
      from t
     ) t
where seqnum = 1;

MS Access SQL query from 3 tables

I have 3 tables shown below in MS Access 2010:
Table: devices
id | device_id | Company | Version | Revision |
-----------------------------------------------
1  | dev_a     | Almaras | 1.5.1   | 0.2A     |
2  | dev_b     | Enigma  | 1.5.1   | 0.2A     |
3  | dev_c     | Almaras | 1.5.1   | 0.2C     |
*Field: device_id is Primary Key Unique String
*Field ID is just an auto-number column
Table: activities
id | act_id | act_date   | act_type | act_note |
------------------------------------------------
1  | dev_a  | 07/22/2013 | usb_axc  | ok       |
2  | dev_a  | 07/23/2013 | usb_axe  | ok       | (LAST ROW for dev_a)
3  | dev_c  | 07/22/2013 | usb_axc  | ok       | (LAST ROW for dev_c)
4  | dev_b  | 07/21/2013 | usb_axc  | ok       | (LAST ROW for dev_b)
*Field: act_id contains device_id; NOT UNIQUE
*Field ID is just an auto-number column
Table: matrix
id | mat_id | tc | ts | bat | cycles |
-----------------------------------------
1 | dev_a | 2811 | 10 | 99 | 200 |
2 | dev_a | 2911 | 10 | 97 | 400 |
3 | dev_a | 3007 | 10 | 94 | 600 |
4 | dev_a | 3210 | 10 | 92 | 800 | (LAST ROW for dev_d)
5 | dev_b | 1100 | 5 | 98 | 100 |
6 | dev_b | 1300 | 8 | 93 | 200 |
7 | dev_b | 1411 | 11 | 90 | 300 | (LAST ROW for dev_b)
8 | dev_c | 4000 | 27 | 77 | 478 | (LAST ROW for dev_c)
*Field: mat_id contains device_id; NOT UNIQUE
*Field ID is just an auto-number column
Is there any way to query the tables to get the results shown below (each device from devices, and only the last row added [see example output table] from each of the other two tables):
Query Results:
device_id | Company | act_date   | act_type | bat | cycles |
-------------------------------------------------------------
device_a  | Almaras | 07/23/2013 | usb_axe  | 92  | 800    |
device_b  | Enigma  | 07/21/2013 | usb_axc  | 90  | 300    |
device_c  | Almaras | 07/22/2013 | usb_axc  | 77  | 478    |
Any ideas? Thank you in advance for reading and helping me out :)
I think this is what you want:
SELECT a.device_id, a.Company,
b.act_date, b.act_type,
c.bat, c.cycles
FROM ((((devices AS a
INNER JOIN activities AS b
ON a.device_id = b.act_id)
INNER JOIN matrix AS c
ON a.device_id = c.mat_id)
INNER JOIN
(
SELECT act_id, MAX(act_date) AS max_date
FROM activities
GROUP BY act_id
) AS d ON b.act_id = d.act_id AND b.act_date = d.max_date)
INNER JOIN
(
SELECT mat_id, MAX(tc) AS max_tc
FROM matrix
GROUP BY mat_id
) AS e ON c.mat_id = e.mat_id AND c.tc = e.max_tc)
The subqueries d and e separately pick out the latest row for every act_id and mat_id, respectively.
Try
SELECT devices.device_id, devices.Company, activities.act_date,
       activities.act_type, matrix.bat, matrix.cycles
FROM (devices
LEFT JOIN activities
       ON devices.device_id = activities.act_id)
LEFT JOIN matrix
       ON devices.device_id = matrix.mat_id;
What do you consider the "last" row in Matrix?
You need to do something like
WHERE act_date IN (SELECT MAX(a.act_date)
                   FROM activities a
                   WHERE a.act_id = d.device_id
                   GROUP BY a.act_id)
and something similar for the join to matrix.
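Putting that together with the LEFT JOIN query above, a sketch in Access SQL; it assumes act_date identifies the last activities row and, following the question's annotations, that the highest tc identifies the last matrix row:
SELECT devices.device_id, devices.Company, activities.act_date,
       activities.act_type, matrix.bat, matrix.cycles
FROM (devices
LEFT JOIN activities
       ON devices.device_id = activities.act_id)
LEFT JOIN matrix
       ON devices.device_id = matrix.mat_id
WHERE activities.act_date = (SELECT MAX(a.act_date)
                             FROM activities a
                             WHERE a.act_id = devices.device_id)
  AND matrix.tc = (SELECT MAX(m.tc)
                   FROM matrix m
                   WHERE m.mat_id = devices.device_id);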