Cumulative open subscriptions with start_date and end_date on Redshift - sql

I am trying to write a query that will allow me to count the number of active subscriptions by day in Redshift.
I have the following table:
sub_id | start_date | end_date
---------------------------------------
20001 | 2017-09-01 | NULL
20002 | 2017-08-01 | 2017-08-29
20003 | 2016-01-01 | 2017-04-25
20004 | 2016-07-01 | 2017-09-03
I would like to state, for each date between two given dates, how many subscriptions are active, such that:
date | active_subs
------------------------
2016-06-30 | 1
2016-07-01 | 2
... |
2017-04-24 | 2
2017-04-25 | 1
... |
2017-07-31 | 1
2017-08-01 | 2
... |
2017-08-28 | 2
2017-08-29 | 1
2017-08-30 | 1
2017-08-31 | 1
2017-09-01 | 2
2017-09-02 | 2
2017-09-03 | 1
I have a reference table from which a query can draw one row per day; the table is named date and the relevant column is date.ref_date (in YYYY-MM-DD format).
Do I write this query using window functions, or is there a better way?
Thanks

If I understood you correctly, you don't need window functions, joins (except to the date table), or a cumulative count. You can do this:
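SELECT t.ref_date,
COUNT(s.sub_id) AS active_subs
FROM dateTable t
LEFT JOIN YourTable s
-- end_date is treated as exclusive, which matches the expected output in the question
ON t.ref_date >= s.start_date
AND t.ref_date < COALESCE(s.end_date, <Put A late date here>)
GROUP BY t.ref_date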

I would do this as:
with cte as (
select start_date as dte, 1 as inc
from t
union all
select coalesce(end_date, current_date), -1 as inc
from t
)
select dte,
sum(sum(inc)) over (order by dte rows unbounded preceding) as active_subs -- Redshift requires an explicit frame clause here
from cte
group by dte
order by dte;
There may be off-by-one errors, depending on whether you count stops on the date given or on the next day.
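To make the off-by-one question concrete, here is a minimal sketch that runs the date-join approach against the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for Redshift, so the recursive `days` CTE (which replaces the reference table), the `subs` table name, and the function name are all assumptions for illustration. It treats end_date as exclusive, which is what the expected output in the question implies.

```python
import sqlite3

def active_subs_by_day(first_day, last_day):
    """Return (day, active_subs) for every day in [first_day, last_day]."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table name; the rows are the sample data from the question.
    con.executescript("""
        CREATE TABLE subs (sub_id INTEGER, start_date TEXT, end_date TEXT);
        INSERT INTO subs VALUES
            (20001, '2017-09-01', NULL),
            (20002, '2017-08-01', '2017-08-29'),
            (20003, '2016-01-01', '2017-04-25'),
            (20004, '2016-07-01', '2017-09-03');
    """)
    rows = con.execute("""
        -- Stand-in for the one-row-per-day reference table.
        WITH RECURSIVE days(ref_date) AS (
            SELECT ?
            UNION ALL
            SELECT date(ref_date, '+1 day') FROM days WHERE ref_date < ?
        )
        SELECT d.ref_date, COUNT(s.sub_id) AS active_subs
        FROM days d
        LEFT JOIN subs s
               ON d.ref_date >= s.start_date
              -- A subscription is no longer counted on its end_date.
              AND (s.end_date IS NULL OR d.ref_date < s.end_date)
        GROUP BY d.ref_date
        ORDER BY d.ref_date
    """, (first_day, last_day)).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for day, n in active_subs_by_day("2017-08-28", "2017-09-03"):
        print(day, n)
```

If a subscription should still count on its end_date, changing `<` to `<=` in the join condition is the only adjustment needed.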

Related

How to group date by week in PostgreSQL?

I have a pretty simple table with 2 columns. The first one shows the time (timestamp), and the second one shows the speed of the car at that time (float8).
| DATE_TIME | SPEED |
|---------------------|-------|
| 2018-11-09 00:00:00 | 256 |
| 2018-11-09 01:00:00 | 659 |
| 2018-11-09 02:00:00 | 256 |
| other dates | xxx |
| 2018-11-21 21:00:00 | 651 |
| 2018-11-21 22:00:00 | 515 |
| 2018-11-21 23:00:00 | 849 |
Let's say we have the period from 9 November to 21 November. How can I group that period by week? I want a result like this:
| DATE_TIME | AVG_SPEED |
|---------------------|-----------|
| 9-11 November | XXX |
| 12-18 November | YYY |
| 19-21 November | ZZZ |
I use PostgreSQL 10.4.
I use this SQL statement to get the week number of a given date:
SELECT EXTRACT(WEEK FROM TIMESTAMP '2018-11-09 00:00:00');
EDIT:
@tim-biegeleisen when I set the period from '2018-11-01' to '2018-11-13', your SQL statement returns 2 results.
In fact I need this result:
2018-11-01 00:00:00 | 2018-11-04 23:00:00
2018-11-05 00:00:00 | 2018-11-11 23:00:00
2018-11-12 00:00:00 | 2018-11-13 05:00:00
As you can see in the calendar, there are 3 weeks in that period.
We can do this using a calendar table. The query below groups by the week number from EXTRACT(week ...), which in PostgreSQL is the ISO 8601 week, so each week runs from Monday to Sunday. You could also define the week differently, e.g. starting from the first date in your data set.
WITH dates AS (
SELECT date_trunc('day', dd)::date AS dt
FROM generate_series
( '2018-11-09'::timestamp
, '2018-11-21'::timestamp
, '1 day'::interval) dd
),
cte AS (
SELECT t1.dt, t2.DATE_TIME, t2.SPEED,
EXTRACT(week from t1.dt) week
FROM dates t1
LEFT JOIN yourTable t2
ON t1.dt = t2.DATE_TIME::date
)
SELECT
MIN(dt)::text || '-' || MAX(dt) AS DATE_TIME,
AVG(SPEED) AS AVG_SPEED
FROM cte
GROUP BY
week
ORDER BY
MIN(dt);
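As a rough, runnable illustration of the same grouping idea, the sketch below uses Python's built-in sqlite3 module rather than PostgreSQL. SQLite has no EXTRACT(week ...), so it groups on strftime('%Y-%W', ...), a Monday-based week number that is not the ISO week number but yields the same Monday-to-Sunday buckets within a year. The table name `car` and the function name are assumptions, and this version skips the calendar table, so days without readings simply do not appear.

```python
import sqlite3

def weekly_avg_speed(readings):
    """Group (timestamp, speed) readings into Monday-based weeks.

    Returns a list of (first_day, last_day, avg_speed) tuples.
    """
    con = sqlite3.connect(":memory:")
    # Hypothetical table mirroring the DATE_TIME / SPEED columns.
    con.execute("CREATE TABLE car (date_time TEXT, speed REAL)")
    con.executemany("INSERT INTO car VALUES (?, ?)", readings)
    rows = con.execute("""
        SELECT MIN(date(date_time)) AS first_day,
               MAX(date(date_time)) AS last_day,
               AVG(speed)           AS avg_speed
        FROM car
        -- %W is the week of the year with Monday as the first day.
        GROUP BY strftime('%Y-%W', date_time)
        ORDER BY first_day
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    sample = [
        ("2018-11-09 00:00:00", 256.0),
        ("2018-11-12 01:00:00", 659.0),
        ("2018-11-21 23:00:00", 849.0),
    ]
    for row in weekly_avg_speed(sample):
        print(row)
```

Including the year in the grouping key keeps week numbers from different years apart, at the cost of splitting a week that straddles New Year.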

Generate date range for missing dates and assign the current maximum value

I have a table like below in DB2 -
Date | Catg| Amount
2018-05-21 | 2 | 583227.57485
2018-05-21 | 5 | 2200097.73226
2018-05-22 | 2 | 116246.63551
2018-05-22 | 4 | 231116.66241
2018-05-22 | 5 | 244093.91680
2018-05-31 | 1 | 244714.77015
2018-05-31 | 2 | 288946.64734
2018-05-31 | 3 | 330801.32189
2018-05-31 | 5 | 345984.62256
2018-06-05 | 4 | 228612.55653
2018-06-05 | 5 | 244944.22519
2018-06-11 | 2 | 288940.63303
2018-06-11 | 3 | 344938.50723
2018-06-11 | 4 | 346234.65196
2018-06-11 | 5 | 375935.22568
I want to generate a report for the month of June up to the 22nd, for every catg. So I want the report to be -
Date | Catg| Amount
2018-06-01 | 1 | 244714.77015 -- Being 5/31 is latest for 6/1
2018-06-01 | 2 | 288946.64734 -- Being 5/31 is latest for 6/1
2018-06-01 | 3 | 330801.32189 -- Being 5/31 is latest for 6/1
2018-06-01 | 4 | 231116.66241 -- Being 5/22 is latest for 6/1
2018-06-01 | 5 | 345984.62256 -- Being 5/31 is latest for 6/1
.
.
.
.
.
2018-06-22 | 1 | 244714.77015 -- Being 5/31 is latest for 6/22
2018-06-22 | 2 | 288940.63303 -- Being 6/11 is latest for 6/22
2018-06-22 | 3 | 344938.50723 -- Being 6/11 is latest for 6/22
2018-06-22 | 4 | 346234.65196 -- Being 6/11 is latest for 6/22
2018-06-22 | 5 | 375935.22568 -- Being 6/11 is latest for 6/22
I don't know if this is even doable with SQL. I have successfully generated the dates, but I am not sure how to assign the immediately preceding values to them.
I have generated the dates with the code below -
WITH DATE_TAB(DATES) AS (
SELECT DATE('2018-06-01') DATES
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT DATES + 1 DAYS AS DATES
FROM DATE_TAB
WHERE DATES < '2018-06-22')
SELECT DATES
FROM DATE_TAB
Any help is greatly appreciated.
Thanks in advance!!!
The remaining part is a CROSS JOIN with the distinct categories:
WITH DATE_TAB(DATES) AS (
SELECT DATE('2018-06-01') DATES
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT DATES + 1 DAYS AS DATES
FROM DATE_TAB
WHERE DATES < '2018-06-22'
)
-- your_table is a placeholder for the real table name
SELECT dt.DATES, dt1.Catg,
(SELECT t.Amount
FROM your_table t
WHERE t.Catg = dt1.Catg AND t.Date <= dt.DATES
ORDER BY t.Date DESC
FETCH FIRST 1 ROW ONLY
) AS Amount
FROM DATE_TAB dt
CROSS JOIN (SELECT DISTINCT Catg FROM your_table) dt1
ORDER BY dt.DATES, dt1.Catg;
You can do:
with date_tab ( . . . )
select d.dte, c.catg,
(select t.amount
from t
where t.catg = c.catg and t.dte <= d.dte
order by t.dte desc
fetch first 1 row only
) as amount
from date_tab d cross join
(select distinct catg from t) c
order by d.dte, c.catg;
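The cross-join-plus-correlated-subquery pattern above can be tried end to end with the sketch below. It runs on SQLite through Python's built-in sqlite3 module instead of DB2, so the date generator uses date(dte, '+1 day') and LIMIT 1 in place of + 1 DAYS and FETCH FIRST 1 ROW ONLY. The table name `t`, the column name `dte`, and the function name are assumptions.

```python
import sqlite3

# Sample rows from the question: (date, catg, amount).
SAMPLE = [
    ("2018-05-21", 2, 583227.57485), ("2018-05-21", 5, 2200097.73226),
    ("2018-05-22", 2, 116246.63551), ("2018-05-22", 4, 231116.66241),
    ("2018-05-22", 5, 244093.91680), ("2018-05-31", 1, 244714.77015),
    ("2018-05-31", 2, 288946.64734), ("2018-05-31", 3, 330801.32189),
    ("2018-05-31", 5, 345984.62256), ("2018-06-05", 4, 228612.55653),
    ("2018-06-05", 5, 244944.22519), ("2018-06-11", 2, 288940.63303),
    ("2018-06-11", 3, 344938.50723), ("2018-06-11", 4, 346234.65196),
    ("2018-06-11", 5, 375935.22568),
]

def carry_forward(rows, first_day, last_day):
    """For each day and category, return the latest amount on or before that day."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (dte TEXT, catg INTEGER, amount REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute("""
        WITH RECURSIVE date_tab(dte) AS (
            SELECT ?
            UNION ALL
            SELECT date(dte, '+1 day') FROM date_tab WHERE dte < ?
        )
        SELECT d.dte, c.catg,
               -- Latest row for this category that is not after the report date.
               (SELECT t.amount
                FROM t
                WHERE t.catg = c.catg AND t.dte <= d.dte
                ORDER BY t.dte DESC
                LIMIT 1) AS amount
        FROM date_tab d
        CROSS JOIN (SELECT DISTINCT catg FROM t) c
        ORDER BY d.dte, c.catg
    """, (first_day, last_day)).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    for row in carry_forward(SAMPLE, "2018-06-01", "2018-06-22")[:5]:
        print(row)
```

Selecting only the distinct categories in the cross join matters: including the amount there would multiply the rows by every historical amount per category.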

how to get second max date in postgres sql

I have the following situation, where I need to get several values between two invoice dates.
The query returns data based on invoices. What I need now is, for some values, to fetch data between this invoice's date and the previous invoice's date.
Approaches I have already tried:
1) A subquery would easily solve this, but I have to do it for 4-5 columns and it's a 15 GB database, so that's not feasible.
2) If I go like this:
left join (select inv.date, inv.actno from invoice inv) as invo on invo.actno=act.id and invo.date < inv.date
then it gives all rows earlier than that date, but I need only the one row immediately before the main invoice date.
3) We cannot get the second max value in a subquery in the FROM clause, because the outer invoice is not grouped, so it might be the max, the middle, or the least.
4) We cannot pass values from another table into the subquery of a joined table.
Example:
create table inv (id serial ,date timestamp without time zone);
insert into inv (date) values('2017-01-31 00:00:00'),('2017-01-30 00:00:00'),('2017-01-29 00:00:00'),('2017-01-28 00:00:00'),('2017-01-27 00:00:00');
select id, date from inv;
id | date
----+---------------------
1 | 2017-01-31 00:00:00
2 | 2017-01-30 00:00:00
3 | 2017-01-29 00:00:00
4 | 2017-01-28 00:00:00
5 | 2017-01-27 00:00:00
(5 rows)
I need this
id |date |date | id
1 | 2017-01-31 00:00:00 | 2017-01-30 00:00:00 | 2
2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00 | 3
3 | 2017-01-29 00:00:00 | 2017-01-28 00:00:00 | 4
4 | 2017-01-28 00:00:00 | 2017-01-27 00:00:00 | 5
5 | 2017-01-27 00:00:00 |
I can't use a subquery in the SELECT list, as the database is big and I need to do this for 4-5 columns.
UPDATE 1
I need this from the same table, used twice in the FROM clause. My requirement is to join several pieces of data from the invoice table, and there are 4-5 columns where I need things like the sum of the amount paid between the previous invoice and this one.
That way I can take both invoice dates into a subquery and get the data between them.
UPDATE 2
lag will not solve this:
select i.id,i.date, lag(date) over (order by date) from inv i order by id ;
id | date | lag
----+---------------------+---------------------
1 | 2017-01-31 00:00:00 | 2017-01-30 00:00:00
2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00
3 | 2017-01-29 00:00:00 | 2017-01-28 00:00:00
4 | 2017-01-28 00:00:00 | 2017-01-27 00:00:00
5 | 2017-01-27 00:00:00 |
(5 rows)
Time: 0.480 ms
test=# select i.id,i.date, lag(date) over (order by date) from inv i where id=2 order by id ;
id | date | lag
----+---------------------+-----
2 | 2017-01-30 00:00:00 |
(1 row)
Time: 0.525 ms
test=# select i.id,i.date, lag(date) over (order by date) from inv i where id in (2,3) order by id ;
id | date | lag
----+---------------------+---------------------
2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00
3 | 2017-01-29 00:00:00 |
lag is calculated only over the rows that the query returns, so it is bounded by that query. See here: 3 has a previous date, but lag could not return it because the WHERE clause filtered that row out. Something needs to be done in a left join so that the previous date can be taken from the same table by referencing it again in the FROM clause. Thanks again, buddy.
Like here?:
t=# select date as d1,
lag(date) over (order by date)
from inv
order by 1 desc;
d1 | lag
---------------------+---------------------
2017-01-31 00:00:00 | 2017-01-30 00:00:00
2017-01-30 00:00:00 | 2017-01-29 00:00:00
2017-01-29 00:00:00 | 2017-01-28 00:00:00
2017-01-28 00:00:00 | 2017-01-27 00:00:00
2017-01-27 00:00:00 |
(5 rows)
Time: 1.416 ms
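One possible way around the filtering problem described in UPDATE 2 is to compute lag over the whole table in a derived table and apply the id filter only in the outer query, so the window still sees the neighbouring rows. The sketch below shows this on the sample data using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later); the function name and the `prev_date` alias are assumptions, and on a large PostgreSQL table the inner query would still scan every invoice unless it is narrowed some other way.

```python
import sqlite3

def previous_invoice_dates(ids):
    """Return (id, date, prev_date) for the requested invoice ids."""
    con = sqlite3.connect(":memory:")
    # Same sample rows as in the question; ids are assigned 1..5 in insert order.
    con.executescript("""
        CREATE TABLE inv (id INTEGER PRIMARY KEY, date TEXT);
        INSERT INTO inv (date) VALUES
            ('2017-01-31 00:00:00'), ('2017-01-30 00:00:00'),
            ('2017-01-29 00:00:00'), ('2017-01-28 00:00:00'),
            ('2017-01-27 00:00:00');
    """)
    placeholders = ", ".join("?" for _ in ids)
    rows = con.execute(f"""
        SELECT id, date, prev_date
        FROM (
            -- The window runs over every invoice, before any filtering.
            SELECT id, date, LAG(date) OVER (ORDER BY date) AS prev_date
            FROM inv
        ) AS with_prev
        WHERE id IN ({placeholders})
        ORDER BY id
    """, list(ids)).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    print(previous_invoice_dates([2, 3]))
```

With the filter in the outer query, invoice 3 keeps its previous date even though invoice 4 is not in the requested set.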

SQL - How can I count days by comparing current row to the 1st row?

I have a table as below in the database, how can I write a SQL to show the expected result?
first_date: the first order_date on the table ORDER BY order_date ASC
days_to_date: (order_date - first_date) in number of days
My table:
id | order_date | order_ref
---+------------------------
1 | 2015-03-01 | BC101
2 | 2015-03-01 | BC102
3 | 2015-03-02 | BC103
4 | 2015-03-03 | BC104
Expected result:
id | order_date | first_date | days_to_date
---+------------+------------+-------------
1 | 2015-03-01 | 2015-03-01 | 0
2 | 2015-03-01 | 2015-03-01 | 0
3 | 2015-03-02 | 2015-03-01 | 1
4 | 2015-03-03 | 2015-03-01 | 2
Other notes:
I'm using HSQLDB 2.0.0, but I would prefer a general solution for displaying first_date on every row, if that's possible.
Thanks in advance
Try
select id, order_date,
(select min(order_date) from your_table) as first_date,
datediff('day', (select min(order_date) from your_table), order_date) as days_to_date
from your_table
order by order_date
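A variation on the same idea computes the minimum once in a one-row derived table and cross joins it to every order, which avoids repeating the scalar subquery. The sketch below runs on SQLite through Python's built-in sqlite3 module, not HSQLDB, so the day difference is taken with julianday() instead of datediff(); the table name `orders` and the function name are assumptions.

```python
import sqlite3

def days_since_first_order():
    """Return (id, order_date, first_date, days_to_date) for the sample orders."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table name; the rows are the sample data from the question.
    con.executescript("""
        CREATE TABLE orders (id INTEGER, order_date TEXT, order_ref TEXT);
        INSERT INTO orders VALUES
            (1, '2015-03-01', 'BC101'),
            (2, '2015-03-01', 'BC102'),
            (3, '2015-03-02', 'BC103'),
            (4, '2015-03-03', 'BC104');
    """)
    rows = con.execute("""
        SELECT o.id,
               o.order_date,
               f.first_date,
               -- julianday() differences are in days; whole dates give whole numbers.
               CAST(julianday(o.order_date) - julianday(f.first_date) AS INTEGER)
                   AS days_to_date
        FROM orders o
        -- One-row derived table holding the earliest order date.
        CROSS JOIN (SELECT MIN(order_date) AS first_date FROM orders) f
        ORDER BY o.order_date, o.id
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in days_since_first_order():
        print(row)
```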

TSQL query help structuring results

I have a table with the following columns:
timestamp | value | desc
example of the data:
2014-01-27 10:00:00.000 | 100 | 101
2014-01-27 10:00:00.000 | 105 | 101
2014-01-27 11:00:00.000 | 160 | 101
2014-01-27 12:00:00.000 | 200 | 101
...
...
2014-01-28 10:00:00.000 | 226 | 101
2014-01-28 10:00:00.000 | 325 | 101
2014-01-28 11:00:00.000 | 145 | 101
what I would like to obtain is a grouping by the hour part but without merging the period interval.
So the result would look like this (in the select I will pass a date interval and a condition on the description, like desc = '101'):
Structure:
hour | count
Data:
10 | 2 (referring to the 20140127)
11 | 1 (referring to the 20140127)
12 | 1 (referring to the 20140127)
...
...
10 | 2 (referring to the 20140128)
11 | 1 (referring to the 20140128)
I thought about using a cursor but I was wondering if it is possible to achieve this result without it.
I'm using SQL server 2012 SP1.
Thanks for your attention.
Bye,
F.
Try this:-
SELECT Count(*) AS [Count],
Datepart(hour, timestamp) AS [Hour]
FROM yourtable
GROUP BY CONVERT(DATE, timestamp),
Datepart(hour, timestamp)
ORDER BY CONVERT(DATE, timestamp),
Datepart(hour, timestamp)
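You may use this; it should work:
SELECT DATEPART(hh, timestamp) AS [hour], COUNT(*) AS [count]
FROM tablename
-- desc is a reserved word in T-SQL, so it is bracketed; row filtering belongs in WHERE, not HAVING
WHERE [desc] = 'yourvalue'
GROUP BY
DATETIMEFROMPARTS(YEAR(timestamp), MONTH(timestamp), DAY(timestamp), 0, 0, 0, 0),
DATEPART(hh, timestamp)
ORDER BY
DATETIMEFROMPARTS(YEAR(timestamp), MONTH(timestamp), DAY(timestamp), 0, 0, 0, 0),
DATEPART(hh, timestamp)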
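To check that grouping on both the date part and the hour part keeps the days separate, here is a small sketch on the sample rows. It uses Python's built-in sqlite3 module rather than SQL Server, so date() and strftime('%H', ...) stand in for CONVERT(DATE, ...) and DATEPART(hour, ...); the table name `readings`, the column names `ts` and `descr`, and the function name are assumptions.

```python
import sqlite3

def hourly_counts(rows, desc_value, first_day, last_day):
    """Count rows per (day, hour) for one description within a date interval."""
    con = sqlite3.connect(":memory:")
    # Hypothetical names; "descr" avoids the reserved word desc.
    con.execute("CREATE TABLE readings (ts TEXT, value INTEGER, descr TEXT)")
    con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT date(ts)                          AS day_part,
               CAST(strftime('%H', ts) AS INTEGER) AS hour_part,
               COUNT(*)                          AS cnt
        FROM readings
        WHERE descr = ?
          AND date(ts) BETWEEN ? AND ?
        -- Grouping on the day as well keeps 10:00 on different days apart.
        GROUP BY day_part, hour_part
        ORDER BY day_part, hour_part
    """, (desc_value, first_day, last_day)).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    sample = [
        ("2014-01-27 10:00:00.000", 100, "101"),
        ("2014-01-27 10:00:00.000", 105, "101"),
        ("2014-01-28 10:00:00.000", 226, "101"),
    ]
    print(hourly_counts(sample, "101", "2014-01-27", "2014-01-28"))
```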