Rank based on sequence of dates - sql

I have data like this:
**Heading Date**
A 2009-02-01
B 2009-02-03
c 2009-02-05
d 2009-02-06
e 2009-02-08
I need a rank like this:
Heading Date Rank
A 2009-02-01 1
B 2009-02-03 2
c 2009-02-05 1
d 2009-02-06 2
e 2009-02-07 3
I need the rank to be based on the date. While the dates are consecutive, the rank should be 1, 2, 3, etc. Whenever there is a break in the dates, it should start over with 1, 2, ...
Can anyone help me with this?

SELECT heading, thedate
     , row_number() OVER (PARTITION BY grp ORDER BY thedate) AS rn
FROM  (
   SELECT *, thedate - (row_number() OVER (ORDER BY thedate))::int AS grp
   FROM   demo
   ) sub;
While you speak of "rank", you seem to want the result of the window function row_number().
Form groups of consecutive days (same date in grp) in the subquery sub.
Number the rows with another row_number() call, this time partitioned by grp.
One subquery is the bare minimum here, since window functions cannot be nested.
SQL Fiddle.
Note that I went with the second version of your contradictory sample data, and the result is as @mu suggested in his comment.
This also assumes that there are no duplicate dates; if there are, you'd have to aggregate first.
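To try the idea without a Postgres instance, here is a minimal sketch that runs the same two-step query through Python's built-in sqlite3 module. It assumes an SQLite build with window functions (3.25 or later). The table contents mirror the second data sample from the question, and julianday() stands in for Postgres date arithmetic, which is an adaptation for SQLite rather than part of the answer above.

```python
import sqlite3

# Demo table mirroring the second data sample in the question.
rows = [
    ("A", "2009-02-01"),
    ("B", "2009-02-03"),
    ("c", "2009-02-05"),
    ("d", "2009-02-06"),
    ("e", "2009-02-07"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (heading TEXT, thedate TEXT)")
conn.executemany("INSERT INTO demo VALUES (?, ?)", rows)

# julianday() turns the date into a day number, so subtracting the running
# row number gives the same value (grp) for every run of consecutive days.
query = """
SELECT heading, thedate,
       row_number() OVER (PARTITION BY grp ORDER BY thedate) AS rn
FROM (
    SELECT heading, thedate,
           CAST(julianday(thedate) AS INTEGER)
             - row_number() OVER (ORDER BY thedate) AS grp
    FROM demo
) sub
ORDER BY thedate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With this data, A and B each start their own group (rn = 1), while c, d and e form one run numbered 1, 2, 3.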

Hi, this is not a correct answer yet; I am still trying. It is interesting :) I am posting what I have got so far (sqlfiddle):
SELECT
rank() over (order by thedate asc) as rank,
heading, thedate
FROM
demo
Order by
rank asc;
Now I am trying to detect the breaks in the dates. I don't know how yet, but maybe these links are useful:
SQL — computing end dates from a given start date with arbitrary
breaks
How to rank in postgres query
I will update this if I find anything.
Edit:
I found this for MySQL; I am posting it because it may be helpful. Check "Emulate Row_Number()":
Given a table with two columns i and j, generate a resultset that has
a derived sequential row_number column taking the values 1,2,3,... for
a defined ordering of j which resets to 1 when the value of i changes

Bangalore BLR - Bagmane Tech Park 2013-10-11 Data Centre 0
Bangalore BLR - Bagmane Tech Park 2013-10-11 BMS 0
Bangalore BLR - Bagmane Tech Park 2013-10-12 BMS 0
Bangalore BLR - Bagmane Tech Park 2013-10-15 BMS 3
I have data like the above.
If the last column is zero, the rank should be based on all columns. If the dates are consecutive, like 2013-10-11 and 2013-10-12, the rank should be 1, 2, ...
If there is a break in the dates, as between 2013-10-12 and 2013-10-15, the rank should start again from 1 for 2013-10-15.

Related

How to get correct min, max date for each customer's changing label in wide format in BigQuery?

I have a table that records customer purchases, for example:
customer_id  label  date        purchase_id  price
2            A      2022-01-01  asd          10
3            A      2022-01-01  asdf         5
4            B      2022-02-04  asdfg        200
2            A      2022-01-03  asdjg        4
3            B      2022-02-01  dfs          20
2            G      2022-04-05  fdg          40
2            G      2022-04-10  fdg          40
2            A      2022-06-06  fgd          20
I want to see how many days and how much money each customer has spent under each label. So far, what I'm doing is:
SELECT
    customer_id,
    label,
    COUNT(DISTINCT purchase_id) as orders_count,
    SUM(price) as total_spent,
    min(date) as first_date,
    max(date) as last_date,
    DATE_DIFF(max(date), min(date), DAY) as days
FROM
    TABLE
WHERE
    date > '2022-01-01'
GROUP BY
    customer_id,
    label
which gives me a long table, like this:
customer_id  label  orders_count  total_spent  first_date  last_date   days
2            A      3             34           2022-01-01  2022-06-06  180
2            G      1             40           2022-04-05  2022-04-10  5
etc
For simplicity I show only a few columns, but customers have orders all the time. The issue with the above is that customer 2, for example, starts with label A, then changes to G, then goes back to A, and this is not visible in the results table (min(date) is correct, but max(date) picks up the max(date) of their second A period). I would also prefer to have the result in wide format. Ideally, there would be columns called next_label_{i} holding the values for each label change.
Could you advise me of a way of a) accommodating this label change (a future label being the same as an earlier label) and b) a way to produce it into a wide format?
Thanks
edit:
example output (correct date, wide format) [columns would go as wide as the max number of unique labels for any customer]
customer_id  first_label  first_first_date  first_last_date  first_total_spent  first_days  next_label  next_first_date  next_last_date  next_days  next_label_2  next_first_date_2  next_last_date_2  next_days_2
2            A            2022-01-01        2022-01-03       2                  14          G           2022-04-05       2022-04-05      0          A             2022-06-06         2022-06-06        0
etc
Sorry, this is not exactly accurate (it is missing the orders_count, total_spent), but it's a pain in the ass to format it here; hopefully you get the idea. In principle, it's as if you used Python's pivot_table on the previous dataset.
Alternatively, I'd be glad to have just a solution in the long format that distinguishes between a customer's label and the same customer's repeated label (as with customer 2, who starts with A and, after changing to G, returns to A).
Could you advise me of ... b) a way to produce it into a wide format?
First, I want to say that I hope you have a really good reason to want that output, as it is usually not considered best practice and is rather left for the presentation layer to handle.
With that in mind, consider the approach below:
select * from (
  select customer_id, offset, purchase.*
  from (
    select customer_id,
      array_agg((struct(label, date, purchase_id, price)) order by date) purchases
    from your_table
    group by customer_id
  ), unnest(purchases) purchase with offset
  order by customer_id, offset
)
pivot (
  any_value(label) label,
  any_value(date) date,
  any_value(purchase_id) purchase_id,
  any_value(price) price
  for offset in (0,1,2,3,4,5)
)
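Applied to the sample data in your question, this yields one row per customer_id, with label, date, purchase_id and price columns for each offset from 0 to 5.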
Note: the above makes the silly assumption that you know the max number of steps (in this case I used 6, from 0 to 5). There are plenty of posts here on SO that show how to use the same technique to make it dynamic. I do not want to duplicate them, as it is against SO policies. So, just do your extra homework on this :o)
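For the long-format alternative mentioned in the question (telling a customer's first A period apart from a later return to A), one possible sketch is to flag each label change with LAG and turn the running count of changes into a run id. The example below is not BigQuery: it runs the idea through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) on the sample rows from the question. The names changed, run_id, flagged and runs are made up for illustration, and it assumes a customer has at most one purchase per date.

```python
import sqlite3

# Sample rows copied from the question:
# (customer_id, label, date, purchase_id, price).
purchases = [
    (2, "A", "2022-01-01", "asd", 10),
    (3, "A", "2022-01-01", "asdf", 5),
    (4, "B", "2022-02-04", "asdfg", 200),
    (2, "A", "2022-01-03", "asdjg", 4),
    (3, "B", "2022-02-01", "dfs", 20),
    (2, "G", "2022-04-05", "fdg", 40),
    (2, "G", "2022-04-10", "fdg", 40),
    (2, "A", "2022-06-06", "fgd", 20),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases "
    "(customer_id INTEGER, label TEXT, date TEXT, purchase_id TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?, ?)", purchases)

query = """
SELECT customer_id, label,
       MIN(date) AS first_date, MAX(date) AS last_date,
       COUNT(DISTINCT purchase_id) AS orders_count,
       SUM(price) AS total_spent
FROM (
    -- running count of label changes = id of the current label run
    SELECT flagged.*,
           SUM(changed) OVER (PARTITION BY customer_id ORDER BY date) AS run_id
    FROM (
        -- changed = 1 on a customer's first row and whenever the label
        -- differs from that customer's previous row
        SELECT p.*,
               CASE WHEN label = LAG(label) OVER
                         (PARTITION BY customer_id ORDER BY date)
                    THEN 0 ELSE 1 END AS changed
        FROM purchases p
    ) flagged
) runs
GROUP BY customer_id, run_id, label
ORDER BY customer_id, first_date
"""
label_runs = conn.execute(query).fetchall()
for run in label_runs:
    print(run)
```

Customer 2 comes out as three separate rows (A, then G, then A again), each with its own first and last date. The same LAG plus running SUM pattern should carry over to BigQuery, though that is untested here.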

Oracle - Count based on previous and next column

I've got a rather unusual question about some database query with oracle.
I got asked if it's possible to get the number of cases where the patient got a resumption on the same station they were discharged from within 48 / 72 hours.
Consider the following example:
Case  Station  From                 To
1     Stat_1   2020-01-03 20:10:00  2020-01-04 17:40:00
1     Stat_2   2020-01-04 17:40:00  2020-01-05 09:35:00
1     Stat_1   2020-01-05 09:35:00  2020-01-10 12:33:00
In this example, I'd have to check the difference between the last discharge time from station one and the first admission time when he's again registered at station 1. This should then count as one readmission.
I've tried some stuff with LAG and LEAD, but you can't use them in the WHERE-Clause, so that's not too useful I guess.
LAG (o.OEBENEID, 1, 0) OVER (ORDER BY vfs.GUELTIG_BIS) AS Prev_Stat,
LEAD (o.OEBENEID, 1, 0) OVER (ORDER BY vfs.GUELTIG_BIS) AS Next_Stat,
LAG (vfs.GUELTIG_BIS, 1) OVER (ORDER BY vfs.GUELTIG_BIS) AS End_Prev_Stat,
LEAD (vfs.GUELTIG_AB, 1) OVER (ORDER BY vfs.GUELTIG_AB) AS Begin_Next_Stat
I am able to get the old values, but I can't do something like calculate the difference between those two dates.
Is this even possible to achieve? I can't really wrap my head around how to do it with SQL.
Thanks in advance!
You need a partition by clause to retrieve the previous discharge date of the same user in the same station. Then, you can filter in an outer query:
select count(*) as cnt
from (
    select case_no, station, dt_from, dt_to,
           lag(dt_to) over(partition by case_no, station order by dt_from) as lag_dt_to
    from mytable t
) t
where dt_from < lag_dt_to + 2
This counts how many rows have a gap of less than 2 days with the previous discharge date of the same user in the same station.
This assumes that you are storing your dates as dates. If you have timestamps instead, you need interval arithmetic, so:
where dt_from < lag_dt_to + interval '2' day
Note that case, from and to are reserved words in Oracle: I used alternative names in the query.
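To check the logic against the example from the question, here is a rough sketch that runs the same LAG-then-filter query through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) instead of Oracle. The date arithmetic is adapted with julianday(), which is SQLite-specific. Case 2 is a made-up control case that is not in the question, and count_readmissions is a hypothetical helper name.

```python
import sqlite3

# Case 1 is the example from the question; case 2 is a made-up control case
# whose return to Stat_1 happens four days after the earlier discharge.
stays = [
    (1, "Stat_1", "2020-01-03 20:10:00", "2020-01-04 17:40:00"),
    (1, "Stat_2", "2020-01-04 17:40:00", "2020-01-05 09:35:00"),
    (1, "Stat_1", "2020-01-05 09:35:00", "2020-01-10 12:33:00"),
    (2, "Stat_1", "2020-02-01 08:00:00", "2020-02-02 08:00:00"),
    (2, "Stat_2", "2020-02-02 08:00:00", "2020-02-06 08:00:00"),
    (2, "Stat_1", "2020-02-06 08:00:00", "2020-02-07 08:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (case_no INTEGER, station TEXT, dt_from TEXT, dt_to TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", stays)


def count_readmissions(max_hours):
    """Count stays that begin less than max_hours after the previous
    discharge of the same case from the same station."""
    query = """
    SELECT count(*)
    FROM (
        SELECT dt_from,
               lag(dt_to) OVER (PARTITION BY case_no, station
                                ORDER BY dt_from) AS lag_dt_to
        FROM mytable
    ) t
    WHERE julianday(dt_from) - julianday(lag_dt_to) < :max_days
    """
    # Rows without a previous stay have a NULL lag_dt_to and are filtered out.
    return conn.execute(query, {"max_days": max_hours / 24.0}).fetchone()[0]


print(count_readmissions(48), count_readmissions(72))
```

Only case 1 counts at 48 or 72 hours, because its return to Stat_1 comes roughly 16 hours after the earlier discharge from that station.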

SQL Flag consecutive (follow) records

I am relatively new to SQL and I tried to look for a similar question but I was not sure if the question related to my problem or that the answer might have been above my skill level.
I think that the question is simple but I am not sure if the solution is simple.
I have the following sql table output
Room  Name     Time in Room  Turnover  Date
11    Mansson  740           NA        1/21/2017
11    Klein    841           NA        1/21/2017
11    Klein    1035          28        1/21/2017
I would like to write a query that flags rows where consecutive records share the same Room, Name and Date.
This would flag the last 2 rows, where Name is Klein.
Is there a way to do this? If yes, can you please guide me?
You can add a room/name/date flag using ANSI standard window functions:
select t.*,
(case when count(*) over (partition by room, name, date) > 1
then 1 else 0
end) as HasDuplicatesFlag
from t;
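Note that the count(*) version flags every row whose room/name/date combination occurs more than once, whether or not the rows are next to each other. If "consecutive" matters, a possible variation is to compare each row with its neighbours using LAG and LEAD. The sketch below runs both flags through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). It assumes rows are ordered by the time column within a room and date; the fourth row and the column names time_in_room and adjacent_flag are made up for illustration.

```python
import sqlite3

# The first three rows are from the question; the fourth is a made-up row
# that repeats Mansson later the same day without being adjacent to the first.
rows = [
    (11, "Mansson", 740, "NA", "1/21/2017"),
    (11, "Klein", 841, "NA", "1/21/2017"),
    (11, "Klein", 1035, "28", "1/21/2017"),
    (11, "Mansson", 1200, "NA", "1/21/2017"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (room INTEGER, name TEXT, time_in_room INTEGER, "
    "turnover TEXT, date TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT name, time_in_room,
       -- flag from the answer: the combination occurs more than once
       CASE WHEN count(*) OVER (PARTITION BY room, name, date) > 1
            THEN 1 ELSE 0 END AS has_duplicates,
       -- variation: the previous or next row in the room has the same name
       CASE WHEN name = lag(name) OVER (PARTITION BY room, date
                                        ORDER BY time_in_room)
              OR name = lead(name) OVER (PARTITION BY room, date
                                         ORDER BY time_in_room)
            THEN 1 ELSE 0 END AS adjacent_flag
FROM t
ORDER BY time_in_room
"""
flags = conn.execute(query).fetchall()
for row in flags:
    print(row)
```

Here all four rows get has_duplicates = 1, but only the two Klein rows get adjacent_flag = 1.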

Exponential decay in SQL for different dates page views

I have different dates with the number of products viewed on a webpage over a 30-day time frame. I am trying to create an exponential decay model in SQL. I am using exponential decay because I want to emphasize the latest events over older ones. I am not sure how to write this in SQL without getting an error. I have never worked with this type of model before, so I want to make sure I am doing it correctly too.
=================================
Data looks like this
product views date
a 1 2014-05-15
a 2 2014-05-01
b 2 2014-05-10
c 4 2014-05-02
c 1 2014-05-12
d 3 2014-05-11
================================
Code:
create table decay model as
select product,views,date
case when......
from table abc
group by product;
I am not sure what to write to build the model.
I want to penalize products whose views are older relative to products that were viewed more recently.
Thank you for your help.
You can do it like this:
Choose the partition in which you want to apply exponential decay, then order descending by date within each such group.
Use the function ROW_NUMBER() to number the rows within each subgroup in ascending order, so that the most recent row gets 1.
Calculate pow(your_variable_in_[0,1], rownum) and apply it to your result.
Code might look like this (might work in Oracle SQL or DB2):
SELECT <your_partitioning>, date, <whatever>*power(<your_variable>,rownum-1)
FROM (SELECT a.*
, ROW_NUMBER() OVER (PARTITION BY <your_partitioning> ORDER BY a.date DESC) AS rownum
FROM YOUR_TABLE a)
ORDER BY <your_partitioning>, date DESC
EDIT: I read again over your problem and think I understood now what you asked for, so here is a solution which might work (decay factor is 0.9 here):
SELECT product, sum(adjusted_views) -- (i)
FROM (SELECT product, views*power(0.9, rownum-1) AS adjusted_views, date, rownum -- (ii)
      FROM (SELECT product, views, date -- (iii)
                 , ROW_NUMBER() OVER (PARTITION BY product ORDER BY a.date DESC) AS rownum
            FROM YOUR_TABLE a)
      ORDER BY product, date DESC)
GROUP BY product
The inner select statement (iii) creates a temporary table that might look like this
product views date rownum
--------------------------------------------------
a 1 2014-05-15 1
a 2 2014-05-14 2
a 2 2014-05-13 3
b 2 2014-05-10 1
b 3 2014-05-09 2
b 2 2014-05-08 3
b 1 2014-05-07 4
The next query (ii) then uses the rownumber to construct an exponentially decaying factor 0.9^(rownum-1) and applies it to views. The result is
product adjusted_views date rownum
--------------------------------------------------
a 1 * 0.9^0 2014-05-15 1
a 2 * 0.9^1 2014-05-14 2
a 2 * 0.9^2 2014-05-13 3
b 2 * 0.9^0 2014-05-10 1
b 3 * 0.9^1 2014-05-09 2
b 2 * 0.9^2 2014-05-08 3
b 1 * 0.9^3 2014-05-07 4
In a last step (the outer query) the adjusted views are summed up, as this seems to be the quantity you are interested in.
Note, however, that for this to be consistent, there should be regular distances between the dates, e.g., always one day (not one day here and a month there, because these would be weighted in a similar fashion although they shouldn't be).
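If the dates are not evenly spaced, as in the sample data from the question, one way around this caveat is to use the age of each row in days as the exponent instead of the row number. The following is a small Python sketch of that variation, not the query above. It assumes the reference point is the most recent date in the data and keeps the 0.9 decay factor; decayed_views is a made-up helper name.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (product, views, date).
views = [
    ("a", 1, date(2014, 5, 15)),
    ("a", 2, date(2014, 5, 1)),
    ("b", 2, date(2014, 5, 10)),
    ("c", 4, date(2014, 5, 2)),
    ("c", 1, date(2014, 5, 12)),
    ("d", 3, date(2014, 5, 11)),
]


def decayed_views(rows, factor=0.9):
    """Sum views per product, weighting each row by factor ** age_in_days."""
    # Assumption: age is measured from the most recent date in the data.
    reference = max(day for _, _, day in rows)
    scores = defaultdict(float)
    for product, count, day in rows:
        age_in_days = (reference - day).days
        scores[product] += count * factor ** age_in_days
    return dict(scores)


scores = decayed_views(views)
for product, score in sorted(scores.items()):
    print(product, round(score, 4))
```

With this weighting, the two views of product a on 2014-05-01 are discounted by 0.9^14 rather than 0.9^1, which is what the row-number version would apply.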

Oracle Database Temporal Query Implementation - Collapse Date Ranges

This is the result of one of my queries:
SURGERY_D
---------
01-APR-05
02-APR-05
03-APR-05
04-APR-05
05-APR-05
06-APR-05
07-APR-05
11-APR-05
12-APR-05
13-APR-05
14-APR-05
15-APR-05
16-APR-05
19-APR-05
20-APR-05
21-APR-05
22-APR-05
23-APR-05
24-APR-05
26-APR-05
27-APR-05
28-APR-05
29-APR-05
30-APR-05
I want to collapse the date ranges which are continuous into intervals. For example,
[01-APR-05, 07-APR-05], [11-APR-05, 16-APR-05] and so on.
In terms of temporal databases, I want to 'collapse' the dates. Any idea how to do that on Oracle? I am using version 11. I searched for it and read a book but couldn't find/understand how to do it. It might be simple, but everyone has their own flaws and Oracle is mine. Also, I am new to SO so my apologies if I have violated any rules. Thank You!
You can take advantage of the ROW_NUMBER analytical function to generate a unique, sequential number for each of the records (we'll assign that number to the dates in ascending order).
Then, you group the dates by difference between the date and the generated number - the consecutive dates will have the same difference:
Date Number Difference
01-APR-05 1 1 -- MIN(date_val) in group with diff. = 1
02-APR-05 2 1
03-APR-05 3 1
04-APR-05 4 1
05-APR-05 5 1
06-APR-05 6 1
07-APR-05 7 1 -- MAX(date_val) in group with diff. = 1
11-APR-05 8 3 -- MIN(date_val) in group with diff. = 3
12-APR-05 9 3
13-APR-05 10 3
14-APR-05 11 3
15-APR-05 12 3
16-APR-05 13 3 -- MAX(date_val) in group with diff. = 3
Finally, you select the minimal and maximal date in each of the groups to get the beginning and ending of each range.
Here's the query:
SELECT
MIN(date_val) start_date,
MAX(date_val) end_date
FROM (
SELECT
date_val,
row_number() OVER (ORDER BY date_val) AS rn
FROM date_tab
)
GROUP BY date_val - rn
ORDER BY 1
;
Output:
START_DATE END_DATE
------------ ----------
01-04-2005 07-04-2005
11-04-2005 16-04-2005
19-04-2005 24-04-2005
26-04-2005 30-04-2005
You can check how that works on SQLFiddle: Dates ranges example
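To see why the "date minus row number" key works, it can help to trace the same grouping outside the database. Below is a small Python sketch of the idea, not Oracle code: the position in the sorted list plays the role of ROW_NUMBER, and the day ordinal minus that position is constant within each run of consecutive dates. The dates are the ones listed in the question; collapse is a made-up helper name.

```python
from datetime import date
from itertools import groupby


def collapse(dates):
    """Collapse dates into (start, end) ranges of consecutive days."""
    ordered = sorted(set(dates))
    ranges = []
    # Consecutive days share the same value of (day ordinal - position).
    for _, group in groupby(
        enumerate(ordered), key=lambda pair: pair[1].toordinal() - pair[0]
    ):
        run = [day for _, day in group]
        ranges.append((run[0], run[-1]))
    return ranges


# Days of April 2005 taken from the question's SURGERY_D column.
april_days = (
    list(range(1, 8)) + list(range(11, 17))
    + list(range(19, 25)) + list(range(26, 31))
)
surgery_dates = [date(2005, 4, d) for d in april_days]

for start, end in collapse(surgery_dates):
    print(start.isoformat(), end.isoformat())
```

This prints the same four ranges as the query output above: 1 to 7, 11 to 16, 19 to 24 and 26 to 30 April 2005.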