get median in overlap time range

I'm using Vertica. For example, I have a table called revenue:
date revenue
2016-07-12 1
2016-07-12 10
2016-07-12 5
2016-07-12 3
2016-07-13 7
2016-07-13 120
2016-07-13 22
2016-07-14 5
2016-07-14 17
The tricky part is that I don't want the median for each date. I want the median revenue over the time range on or after each given day. For example, the result would look like:
daterange median_revenue
>= 2016-07-12 7
>= 2016-07-13 17
>= 2016-07-14 11
to be clear:
7 = median(1,10,5,3,7,120,22,5,17)
17 = median(7,120,22,5,17)
11 = median(5,17)
How can I write a SQL query for these date ranges? Is there an easy way to do it? I don't want to compute each date range separately and then union the results, because there are many days.

Would this help?
SELECT
date_table.[date],
MEDIAN (r.revenue) AS median_revenue
FROM
(SELECT DISTINCT [date] FROM revenue) date_table
LEFT JOIN revenue r ON r.[date] >= date_table.[date]
GROUP BY
date_table.[date]

I just figured it out:
select distinct date, median(revenue) over (partition by date) as rev_median
from (select a.date,b.revenue
from (select distinct date from revenue) a
left outer join revenue b
on a.date<=b.date order by a.date,b.date) a ;
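The self-join idea above can be tried locally as a minimal sketch. It assumes SQLite (through Python's sqlite3 module) instead of Vertica, and because SQLite has no built-in median, it registers a hypothetical user-defined aggregate named median. The table and column names follow the question; everything else is illustrative.

```python
import sqlite3
import statistics

# Sample rows copied from the question.
SAMPLE_ROWS = [
    ("2016-07-12", 1), ("2016-07-12", 10), ("2016-07-12", 5), ("2016-07-12", 3),
    ("2016-07-13", 7), ("2016-07-13", 120), ("2016-07-13", 22),
    ("2016-07-14", 5), ("2016-07-14", 17),
]


class Median:
    """User-defined aggregate (an assumption: SQLite ships no MEDIAN)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Ignore NULLs, as SQL aggregates normally do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return statistics.median(self.values) if self.values else None


def median_from_each_date(rows):
    """Return (date, median of all revenue on or after that date)."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("median", 1, Median)
    conn.execute("CREATE TABLE revenue (date TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO revenue VALUES (?, ?)", rows)
    query = """
        SELECT d.date, median(r.revenue) AS median_revenue
        FROM (SELECT DISTINCT date FROM revenue) AS d
        JOIN revenue AS r ON r.date >= d.date  -- every row on or after d.date
        GROUP BY d.date
        ORDER BY d.date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in median_from_each_date(SAMPLE_ROWS):
    print(row)
```

With the question's data this yields 7, 17 and 11 for the three dates. The join produces one copy of each revenue row per qualifying start date, so on a large table the intermediate result grows roughly quadratically with the number of days.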

Related

SQL query group by with null values is returning duplicates

I have the following query.
My #dates table has the following records:
month year saledate
9 2020 2020-09-01
10 2020 2020-10-01
11 2020 2020-11-01
with monthlysalesdata as(
select month(salesdate) as salemonth, year(salesdate) as saleyear,salesrepid, salespercentage
from salesrecords r
join #dates d on d.saledate = r.salesdate
group by salesrepid, salesdate),
averagefor3months as(
select 0 as salemonth, 0 as saleyear, salesrepid, salespercentage
from monthlysalesdata
group by salesrepid),
finallist as(
select * from monthlysalesdata
union
select * from averagefor3months)
select * from finallist
This query returns the following records. The averagefor3months result set contains a duplicate row whenever the first result set (monthlysalesdata) contains a null record. How can I get the three-month average as a single record instead of duplicates?
salesrepid salemonth saleyear percentage
232 0 0 null -------------this is the duplicate record
232 0 0 90
232 9 2020 80
232 10 2020 null
232 11 2020 100
My first cte has this result:
salerepid month year percentage
---------------------------------------------
232 9 2020 80
232 10 2020 null
232 11 2020 100
My second cte has this result:
salerepid month year percentage
---------------------------------------------
232 0 0 null
232 0 0 90
How can I avoid the duplicate record in my second CTE?

I suspect that you want a summary row per sales rep based on some aggregation. Your question is not clear on what is needed for the aggregation, but something like this:
with ym as (
select r.salesrepid, d.year, d.month, sum(<something>) as whatever
from salesrecords r join
#dates d
on d.saledate = r.salesdate
group by r.salesrepid, d.year, d.month
)
select ym.*
from ym
union all
select salesrepid, null, null, avg(whatever)
from ym
group by salesrepid;
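As a minimal sketch of the pattern above, the following uses SQLite through Python's sqlite3 module with the question's sample data. It assumes the intended aggregate is AVG over salespercentage (the answer leaves it as a placeholder), uses a regular table named dates in place of the #dates temp table, and keeps the 0/0 marker from the question for the summary row.

```python
import sqlite3


def monthly_rows_with_summary():
    """Per-month rows plus exactly one average row per sales rep."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dates (month INTEGER, year INTEGER, saledate TEXT);
        INSERT INTO dates VALUES
            (9, 2020, '2020-09-01'),
            (10, 2020, '2020-10-01'),
            (11, 2020, '2020-11-01');
        CREATE TABLE salesrecords (
            salesrepid INTEGER, salesdate TEXT, salespercentage REAL);
        INSERT INTO salesrecords VALUES
            (232, '2020-09-01', 80),
            (232, '2020-10-01', NULL),
            (232, '2020-11-01', 100);
    """)
    query = """
        WITH ym AS (
            SELECT r.salesrepid, d.year, d.month,
                   AVG(r.salespercentage) AS pct
            FROM salesrecords r
            JOIN dates d ON d.saledate = r.salesdate
            GROUP BY r.salesrepid, d.year, d.month
        )
        SELECT salesrepid, year, month, pct FROM ym
        UNION ALL
        -- Group by the rep only, so NULL months cannot split the summary.
        SELECT salesrepid, 0, 0, AVG(pct) FROM ym GROUP BY salesrepid
        ORDER BY salesrepid, year, month
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in monthly_rows_with_summary():
    print(row)
```

AVG skips NULL inputs, so the summary row is the average of 80 and 100. The duplicate in the question most likely comes from selecting the non-aggregated salespercentage column, which this sketch avoids.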

I changed the second CTE to select and group directly from the table instead of from the previous CTE, and got the results I wanted. Thank you all for helping.
with ym as (
select r.salesrepid, d.year, d.month, sum(<something>) as whatever
from salesrecords r join
#dates d
on d.saledate = r.salesdate
group by r.salesrepid, d.year, d.month
),
threemonthsaverage as(
select r.salesrepid, r.year, r.month, sum(something) as whatever
from salesrecords as r
group by salesrepid)
select * from ym
union
select * from threemonthsaverage

cumulative sum missing values of the month in sql

I have the input data below:
date amount
01-01-2020 10
01-02-2020 15
01-03-2020 10
01-05-2020 20
01-06-2020 30
01-08-2020 5
01-09-2020 6
01-10-2020 10
select sum(date),over(partition date) from table;
After adding the missing values, I need the following output:
Date amount cum_sum
01-01-2020 10 10
01-02-2020 15 25
01-03-2020 10 35
01-04-2020 0 35
01-05-2020 20 55
01-06-2020 30 85
01-07-2020 0 85
01-08-2020 5 90
01-09-2020 6 96
01-10-2020 10 106

You would typically generate the dates with a recursive query, then use window functions.
You don't say which database you use. The exact syntax of recursive queries and date arithmetic varies across vendors, but here is what it would look like:
with recursive all_dates (dt, max_dt) as (
select min(date) dt, max(date) max_dt from mytable
union all
select dt + interval '1' day, max_dt from all_dates where dt < max_dt
)
select d.dt, sum(t.amount) over(order by d.dt) amount
from all_dates d
left join mytable t on t.date = d.dt
order by d.dt
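The recursive-calendar approach can be checked with a small sketch in SQLite (via Python's sqlite3, which needs SQLite 3.25 or later for window functions). It makes two assumptions not stated in the question: dates are stored in ISO format (YYYY-MM-DD), and the gaps are missing days, as in the query above. The table name mytable follows the answer.

```python
import sqlite3

# The question's amounts, keyed by ISO dates (an assumed storage format).
SAMPLE = [
    ("2020-01-01", 10), ("2020-01-02", 15), ("2020-01-03", 10),
    ("2020-01-05", 20), ("2020-01-06", 30), ("2020-01-08", 5),
    ("2020-01-09", 6), ("2020-01-10", 10),
]


def cumulative_with_gaps(rows):
    """Fill missing dates with amount 0 and add a running total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (date TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    query = """
        WITH RECURSIVE all_dates (dt, max_dt) AS (
            SELECT MIN(date), MAX(date) FROM mytable
            UNION ALL
            SELECT date(dt, '+1 day'), max_dt
            FROM all_dates
            WHERE dt < max_dt
        )
        SELECT d.dt,
               COALESCE(t.amount, 0) AS amount,
               SUM(COALESCE(t.amount, 0)) OVER (ORDER BY d.dt) AS cum_sum
        FROM all_dates d
        LEFT JOIN mytable t ON t.date = d.dt
        ORDER BY d.dt
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in cumulative_with_gaps(SAMPLE):
    print(row)
```

If the gaps are months instead of days, replacing '+1 day' with '+1 month' should be the only change needed in this sketch.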

You simply want a window function:
select t.*, sum(amount) over (order by date)
from table t

SQL - Query to return active subscriptions on a given day

I have a table that shows when a user signs up for a subscription and when their membership will expire. A user can purchase a new subscription even if their current one is in force.
userid|purchasedate|expirydate
1 |2019-01-01 |2019-02-01
2 |2019-01-02 |2019-02-02
3 |2019-01-03 |2019-02-03
3 |2019-01-04 |2019-03-03
I need a SQL query that will GROUP BY the date and return the number of active subscriptions on that date. So it would return:
date |count
2019-01-01|1
2019-01-02|2
2019-01-03|3
2019-01-04|3

Below is for BigQuery Standard SQL
#standardSQL
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
You can test and experiment with the above using the dummy data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 userid, DATE '2019-01-01' purchasedate, DATE '2019-02-01' expirydate UNION ALL
SELECT 2, '2019-01-02', '2019-02-02' UNION ALL
SELECT 3, '2019-01-03', '2019-02-03' UNION ALL
SELECT 3, '2019-01-04', '2019-03-03'
)
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
with below output
Row day active_subscriptions
1 2019-01-01 1
2 2019-01-02 2
3 2019-01-03 3
4 2019-01-04 3
5 2019-01-05 3
6 2019-01-06 3
... ... ...
... ... ...
31 2019-01-31 3
32 2019-02-01 3
33 2019-02-02 2
34 2019-02-03 1
35 2019-02-04 1
... ... ...
... ... ...
61 2019-03-02 1
62 2019-03-03 1

You need a list of dates and count(distinct):
select d.dte, count(distinct t.userid) as num_users
from (select distinct purchase_date as dte from t) d left join
t
on d.dte >= t.purchase_date and
d.dte <= t.expiry_date
group by d.dte
order by d.dte;
EDIT:
BigQuery can be fickle about inequalities in the on clause. Here is another approach:
select dte, count(distinct t.userid) as num_users
from t cross join
unnest(generate_date_array(t.purchase_date, t.expiry_date, interval 1 day)) dte
group by dte
order by dte;
You can use a where clause to filter down to particular dates.
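For readers without BigQuery, the same calendar-plus-join idea can be sketched in SQLite through Python's sqlite3 module. The table name subscriptions is hypothetical; the columns and data come from the question, and a recursive CTE stands in for GENERATE_DATE_ARRAY.

```python
import sqlite3

# Rows from the question: (userid, purchasedate, expirydate).
SUBSCRIPTIONS = [
    (1, "2019-01-01", "2019-02-01"),
    (2, "2019-01-02", "2019-02-02"),
    (3, "2019-01-03", "2019-02-03"),
    (3, "2019-01-04", "2019-03-03"),
]


def active_per_day(rows):
    """Map each day in the covered range to its count of distinct active users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subscriptions "
        "(userid INTEGER, purchasedate TEXT, expirydate TEXT)"
    )
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE days (day, max_day) AS (
            SELECT MIN(purchasedate), MAX(expirydate) FROM subscriptions
            UNION ALL
            SELECT date(day, '+1 day'), max_day FROM days WHERE day < max_day
        )
        SELECT d.day, COUNT(DISTINCT s.userid) AS active_subscriptions
        FROM days d
        LEFT JOIN subscriptions s
               ON d.day BETWEEN s.purchasedate AND s.expirydate
        GROUP BY d.day
        ORDER BY d.day
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


counts = active_per_day(SUBSCRIPTIONS)
print(counts["2019-01-04"])
```

COUNT(DISTINCT userid) is what keeps user 3 from being counted twice while both of their subscriptions overlap.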

I created a table named 'test_expirydate' with your data, and this query works:
select
tb1.expirydate,
count(*) as total
from test_expirydate as tb1
left join (
select
expirydate
from test_expirydate as tb2
group by userid
) as tb2
on tb1.expirydate >= tb2.expirydate
group by tb1.expirydate
I'm not sure whether it works in other cases, but it is fine with the current data.
Note that I interpreted the left column as the expiration date.

getting first column blank postgres

SELECT CASE WHEN date_part('hour',created_at) BETWEEN 3 AND 15 THEN '9am-3pm'
WHEN date_part('hour',created_at) BETWEEN 15 AND 18 THEN '3pm-6pm' END "time window",COUNT(*) FROM tickets where created_at < now()
GROUP BY CASE WHEN date_part('hour',created_at) BETWEEN 3 AND 15 THEN '9am-3pm' WHEN date_part('hour',created_at) BETWEEN 15 AND 18 THEN '3pm-6pm' END;
time window | count
-------------+-------
| 6
9am-3pm | 69
Is it possible to group by date as well as time window, so that my result set looks like this?
Date | time window | count
------------+-------------+-------
12-01-2020 | 9am-3pm| 6
12-01-2020 | 3pm-6pm| 69
13-01-2020 | 9am-3pm| 12
13-01-2020 | 3pm-6pm| 14

We can handle this requirement using a calendar table approach:
WITH dates AS (
SELECT DATE '2020-01-12' AS created_at UNION ALL
SELECT DATE '2020-01-13'
),
tw AS (
SELECT '9am-3pm' AS "time window" UNION ALL
SELECT '3pm-6pm'
),
cte AS (
SELECT
created_at::date AS created_at,
CASE WHEN DATE_PART('hour', created_at) BETWEEN 3 AND 15 THEN '9am-3pm'
WHEN DATE_PART('hour', created_at) BETWEEN 15 AND 18 THEN '3pm-6pm' END "time window",
COUNT(*) AS cnt
FROM tickets
WHERE created_at < NOW()
GROUP BY 1, 2
)
SELECT
d.created_at,
tw."time window",
COALESCE(t.cnt, 0) AS count
FROM dates d
CROSS JOIN tw
LEFT JOIN cte t
ON d.created_at = t.created_at AND tw."time window" = t."time window"
ORDER BY
d.created_at,
tw."time window";

You are actually asking two questions:
The "empty space" (really an SQL NULL) is there because some rows have a created_at hour that does not fall within any of the time ranges, so the CASE expression returns NULL for them. You can exclude them with an additional WHERE condition.
To get the date part as well, add
CAST (created_at AS date)
to the SELECT list and the GROUP BY clause.
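Both points can be illustrated with a small sketch in SQLite through Python's sqlite3 module (not Postgres, so date() and strftime() stand in for the cast and date_part). The window boundaries here are an assumption: the labels are read literally as hours 9 to 14 and 15 to 17, with no overlap, which differs from the BETWEEN ranges in the question.

```python
import sqlite3


def tickets_per_window(timestamps):
    """Count tickets per (day, time window), dropping unmatched hours."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (created_at TEXT)")
    conn.executemany(
        "INSERT INTO tickets VALUES (?)", [(ts,) for ts in timestamps]
    )
    query = """
        WITH hours AS (
            SELECT date(created_at) AS day,
                   CAST(strftime('%H', created_at) AS INTEGER) AS hr
            FROM tickets
        ),
        labelled AS (
            SELECT day, hr,
                   CASE WHEN hr >= 9 AND hr < 15 THEN '9am-3pm'
                        WHEN hr >= 15 AND hr < 18 THEN '3pm-6pm'
                   END AS time_window
            FROM hours
        )
        SELECT day, time_window, COUNT(*) AS cnt
        FROM labelled
        WHERE time_window IS NOT NULL  -- removes the blank (NULL) group
        GROUP BY day, time_window
        ORDER BY day, MIN(hr)
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical timestamps, chosen only to exercise each branch.
SAMPLE = [
    "2020-01-12 09:30:00", "2020-01-12 14:59:00", "2020-01-12 15:00:00",
    "2020-01-12 02:00:00", "2020-01-13 17:45:00", "2020-01-13 18:00:00",
]
for row in tickets_per_window(SAMPLE):
    print(row)
```

The 02:00 and 18:00 tickets match neither window, so they would have formed the blank row; the WHERE condition filters them out.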

SQL Count by Active Date

If I have a table of records and active/inactive dates, is there a simple way to count active records by month? For example:
tbl_a
id dt_active dt_inactive
a 2013-01-01 2013-08-24
b 2013-01-01 2013-07-05
c 2012-02-01 2012-01-01
If I have to generate an output of active records by month like this:
active: dt_active < first_day_of_month <= dt_inactive
month count
2013-01 2
2013-02 2
2013-03 2
2013-04 2
2013-05 2
2013-06 2
2013-07 2
2013-08 1
2013-09 0
Is there any clever way to do this besides uploading a temp table of dates and using subqueries?

Here is one method that gives the count of actives on the beginning of the month. It creates a list of all the months and then joins this information to tbl_a.
with dates as (
select cast('2013-01-01' as date) as month
union all
select dateadd(month, 1, dates.month)
from dates
where month < cast('2013-09-01' as date)
)
select convert(varchar(7), month, 121), count(a.id)
from dates m left outer join
tbl_a a
on m.month between a.dt_active and a.dt_inactive
group by convert(varchar(7), month, 121)
order by 1;
Note: if dt_inactive is the first date of inactivity, then the on clause should be:
on m.month >= a.dt_active and m.month < a.dt_inactive
Here is a SQL Fiddle with the working query.
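The month-list join above uses SQL Server syntax. As a rough equivalent that can be run anywhere, here is a minimal sketch in SQLite through Python's sqlite3 module, assuming ISO date strings; the start and end months are passed in as parameters instead of being hard-coded.

```python
import sqlite3

# Rows from the question: (id, dt_active, dt_inactive).
TBL_A = [
    ("a", "2013-01-01", "2013-08-24"),
    ("b", "2013-01-01", "2013-07-05"),
    ("c", "2012-02-01", "2012-01-01"),
]


def active_by_month(rows, first_month, last_month):
    """Count records active on the first day of each month in the range."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl_a (id TEXT, dt_active TEXT, dt_inactive TEXT)"
    )
    conn.executemany("INSERT INTO tbl_a VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE months (month) AS (
            SELECT ?
            UNION ALL
            SELECT date(month, '+1 month') FROM months WHERE month < ?
        )
        SELECT substr(m.month, 1, 7) AS ym, COUNT(a.id) AS active
        FROM months m
        LEFT JOIN tbl_a a
               ON m.month BETWEEN a.dt_active AND a.dt_inactive
        GROUP BY substr(m.month, 1, 7)
        ORDER BY 1
    """
    result = conn.execute(query, (first_month, last_month)).fetchall()
    conn.close()
    return result


for row in active_by_month(TBL_A, "2013-01-01", "2013-09-01"):
    print(row)
```

COUNT(a.id) counts only matched rows, so months with no active records still appear with a count of 0 thanks to the LEFT JOIN.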
