I'm using Vertica. For example, I have a table called revenue:
date revenue
2016-07-12 1
2016-07-12 10
2016-07-12 5
2016-07-12 3
2016-07-13 7
2016-07-13 120
2016-07-13 22
2016-07-14 5
2016-07-14 17
The tricky part is that I don't want the median for each date. I want the median revenue over all rows dated on or after each given day. For example, the result would look like this:
daterange median_revenue
>= 2016-07-12 7
>= 2016-07-13 17
>= 2016-07-14 11
to be clear:
7 = median(1,10,5,3,7,120,22,5,17)
17 = median(7,120,22,5,17)
11 = median(5,17)
How could I write a SQL query for these date ranges? Is there an easy way to do it? I don't want to calculate each date range separately and then union the results, because there are many days.
Would this help?
SELECT date_table.[date], MEDIAN(r.revenue) AS median_revenue
FROM (SELECT DISTINCT [date] FROM revenue) date_table
LEFT JOIN revenue r ON r.[date] >= date_table.[date]
GROUP BY date_table.[date]
I just figured it out:
select distinct date, median(revenue) over (partition by date) as rev_median
from (select a.date, b.revenue
from (select distinct date from revenue) a
left outer join revenue b
on a.date <= b.date) a
order by date;
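As a quick cross-check of the expected numbers, here is a minimal Python sketch of the same "rows on or after each date" rule. It is not Vertica code; the function name median_from_each_date is made up for illustration, and the rows are copied from the question.

```python
from statistics import median

# Sample rows copied from the question: (date, revenue).
rows = [
    ("2016-07-12", 1), ("2016-07-12", 10), ("2016-07-12", 5), ("2016-07-12", 3),
    ("2016-07-13", 7), ("2016-07-13", 120), ("2016-07-13", 22),
    ("2016-07-14", 5), ("2016-07-14", 17),
]


def median_from_each_date(rows):
    """For every distinct date d, return the median revenue of rows dated >= d."""
    result = {}
    for start in sorted({d for d, _ in rows}):
        # ISO-formatted date strings sort chronologically, so >= works on text.
        result[start] = median(rev for d, rev in rows if d >= start)
    return result


medians = median_from_each_date(rows)
for start, value in medians.items():
    print(">=", start, value)
```

This mirrors the self-join in the query above: each distinct date is paired with every row whose date is not earlier, and the median is taken per starting date.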
I have the following query.
My #dates table has the following records:
month year saledate
9 2020 2020-09-01
10 2020 2020-10-01
11 2020 2020-11-01
with monthlysalesdata as(
select month(salesdate) as salemonth, year(salesdate) as saleyear, salesrepid, salespercentage
from salesrecords r
join #dates d on d.saledate = r.salesdate
group by salesrepid, salesdate),
averagefor3months as(
select 0 as salemonth, 0 as saleyear, salesrepid, salespercentage
from monthlysalesdata
group by salesrepid),
finallist as(
select * from monthlysalesdata
union all
select * from averagefor3months)
select * from finallist
This query returns the following records. The averagefor3months result set contains a duplicate row when there is a null record in the monthlysalesdata result set. How can I get the average for the 3 months as a single record instead of duplicates?
salesrepid salemonth saleyear percentage
232 0 0 null -------------this is the duplicate record
232 0 0 90
232 9 2020 80
232 10 2020 null
232 11 2020 100
My first cte has this result:
salerepid month year percentage
232 9 2020 80
232 10 2020 null
232 11 2020 100
My second cte has this result:
salerepid month year percentage
232 0 0 null
232 0 0 90
How can I avoid the duplicate record in my second CTE?
I suspect that you want a summary row per sales rep based on some aggregation. Your question is not clear on what is needed for the aggregation, but something like this:
with ym as (
select r.salesrepid, d.year, d.month, sum(<something>) as whatever
from salesrecords r join
#dates d
on d.saledate = r.salesdate
group by r.salesrepid, d.year, d.month
)
select ym.*
from ym
union all
select salesrepid, null, null, avg(whatever)
from ym
group by salesrepid;
I changed the second CTE to group directly from the table instead of from the previous CTE, and got the results I wanted. Thank you all for helping.
with ym as (
select r.salesrepid, d.year, d.month, sum(<something>) as whatever
from salesrecords r join
#dates d
on d.saledate = r.salesdate
group by r.salesrepid, d.year, d.month
),
threemonthsaverage as(
select r.salesrepid, 0 as year, 0 as month, sum(<something>) as whatever
from salesrecords as r
group by r.salesrepid)
select * from ym
union all
select * from threemonthsaverage
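The summary-row idea from this thread can be tried out with a small sketch using Python's built-in sqlite3 module. Several things here are assumptions for illustration: the sample percentages are taken from the question's output, the #dates join is left out for brevity, and AVG is assumed to be the intended aggregate. The point is that grouping the summary query by salesrepid alone yields exactly one row per rep, and AVG skips NULL inputs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesrecords (salesrepid INTEGER, salesdate TEXT, salespercentage REAL);
    INSERT INTO salesrecords VALUES
        (232, '2020-09-01', 80),
        (232, '2020-10-01', NULL),
        (232, '2020-11-01', 100);
""")

query = """
    SELECT salesrepid,
           CAST(strftime('%m', salesdate) AS INTEGER) AS salemonth,
           CAST(strftime('%Y', salesdate) AS INTEGER) AS saleyear,
           salespercentage
    FROM salesrecords
    UNION ALL
    -- Summary row: grouping only by salesrepid gives one row per rep.
    -- AVG ignores NULL inputs, so this is (80 + 100) / 2.
    SELECT salesrepid, 0, 0, AVG(salespercentage)
    FROM salesrecords
    GROUP BY salesrepid
    ORDER BY saleyear, salemonth
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

If the percentage column also appears in the GROUP BY of the summary query, each distinct value (including NULL) produces its own row, which is one way to end up with the kind of duplicates described in the question.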
I have the input data below:
date amount
01-01-2020 10
01-02-2020 15
01-03-2020 10
01-05-2020 20
01-06-2020 30
01-08-2020 5
01-09-2020 6
01-10-2020 10
select sum(date),over(partition date) from table;
After adding the missing month values, I need this output:
Date amount cum_sum
01-01-2020 10 10
01-02-2020 15 25
01-03-2020 10 35
01-04-2020 0 35
01-05-2020 20 55
01-06-2020 30 85
01-07-2020 0 85
01-08-2020 5 90
01-09-2020 6 96
01-10-2020 10 106
You would typically generate the dates with a recursive query, then use window functions.
You don't say which database you use. The exact syntax of recursive queries and date arithmetic varies across vendors, but here is what it would look like:
with recursive all_dates (dt, max_dt) as (
select min(date) dt, max(date) max_dt from mytable
union all
select dt + interval '1' month, max_dt from all_dates where dt < max_dt
)
select d.dt, coalesce(t.amount, 0) amount, sum(t.amount) over(order by d.dt) cum_sum
from all_dates d
left join mytable t on t.date = d.dt
order by d.dt
You simply want a window function:
select t.*, sum(amount) over (order by date)
from table t
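To see the two steps together (generate the missing months, then take a running total), here is a small sketch using Python's sqlite3 module. It assumes the dates in the question are day-month-year and stores them as ISO text, uses SQLite's date() function for the month arithmetic, and needs SQLite 3.25 or later for window functions. The table name mytable follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (date TEXT, amount INTEGER);
    INSERT INTO mytable VALUES
        ('2020-01-01', 10), ('2020-02-01', 15), ('2020-03-01', 10),
        ('2020-05-01', 20), ('2020-06-01', 30), ('2020-08-01', 5),
        ('2020-09-01', 6), ('2020-10-01', 10);
""")

query = """
    WITH RECURSIVE all_dates(dt, max_dt) AS (
        SELECT MIN(date), MAX(date) FROM mytable
        UNION ALL
        -- Step one month at a time until the last date in the table.
        SELECT date(dt, '+1 month'), max_dt FROM all_dates WHERE dt < max_dt
    )
    SELECT d.dt,
           COALESCE(t.amount, 0) AS amount,
           SUM(COALESCE(t.amount, 0)) OVER (ORDER BY d.dt) AS cum_sum
    FROM all_dates d
    LEFT JOIN mytable t ON t.date = d.dt
    ORDER BY d.dt
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The left join keeps the generated months that have no matching row, and COALESCE turns their missing amount into 0 so the running total carries forward unchanged.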
I have a table that shows when a user signs up for a subscription and when their membership will expire. A user can purchase a new subscription even if their current one is in force.
userid |purchasedate |expirydate
1 |2019-01-01 |2019-02-01
2 |2019-01-02 |2019-02-02
3 |2019-01-03 |2019-02-03
3 |2019-01-04 |2019-03-03
I need a SQL query that will GROUP BY the date and return the number of active subscriptions on that date. So it would return:
date |count
Below is for BigQuery Standard SQL
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
You can test and play with the above using the dummy data from your question, as in the example below:
WITH `project.dataset.table` AS (
SELECT 1 userid, DATE '2019-01-01' purchasedate, DATE '2019-02-01' expirydate UNION ALL
SELECT 2, '2019-01-02', '2019-02-02' UNION ALL
SELECT 3, '2019-01-03', '2019-02-03' UNION ALL
SELECT 3, '2019-01-04', '2019-03-03'
)
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
with the output below:
Row day active_subscriptions
1 2019-01-01 1
2 2019-01-02 2
3 2019-01-03 3
4 2019-01-04 3
5 2019-01-05 3
6 2019-01-06 3
... ... ...
... ... ...
31 2019-01-31 3
32 2019-02-01 3
33 2019-02-02 2
34 2019-02-03 1
35 2019-02-04 1
... ... ...
... ... ...
61 2019-03-02 1
62 2019-03-03 1
You need a list of dates and count(distinct):
select d.dte, count(distinct t.userid) as num_users
from (select distinct purchase_date as dte from t) d left join
t
on d.dte >= t.purchase_date and
d.dte <= t.expiry_date
group by d.dte
order by d.dte;
BigQuery can be fickle about inequalities in the on clause. Here is another approach:
select dte, count(distinct t.userid) as num_users
from t cross join
unnest(generate_date_array(t.purchase_date, t.expiry_date, interval 1 day)) dte
group by dte
order by dte;
You can use a where clause to filter down to particular dates.
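For readers without BigQuery at hand, the same counting logic can be sketched with Python's sqlite3 module. This is an illustration only: the table name subs is hypothetical, the calendar is built with a recursive CTE instead of GENERATE_DATE_ARRAY, and the dates are stored as ISO text so that BETWEEN compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subs (userid INTEGER, purchasedate TEXT, expirydate TEXT);
    INSERT INTO subs VALUES
        (1, '2019-01-01', '2019-02-01'),
        (2, '2019-01-02', '2019-02-02'),
        (3, '2019-01-03', '2019-02-03'),
        (3, '2019-01-04', '2019-03-03');
""")

query = """
    WITH RECURSIVE days(day, max_day) AS (
        SELECT MIN(purchasedate), MAX(expirydate) FROM subs
        UNION ALL
        SELECT date(day, '+1 day'), max_day FROM days WHERE day < max_day
    )
    -- COUNT(DISTINCT ...) counts a user once even with overlapping subscriptions.
    SELECT d.day, COUNT(DISTINCT s.userid) AS active_subscriptions
    FROM days d
    LEFT JOIN subs s ON d.day BETWEEN s.purchasedate AND s.expirydate
    GROUP BY d.day
    ORDER BY d.day
"""
rows = conn.execute(query).fetchall()
counts = dict(rows)
print(rows[:4], rows[-1])
```

User 3 holds two overlapping subscriptions, which is why the distinct count matters: without it, 2019-01-04 would be counted as four active subscriptions rather than three active users.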
I named the table 'test_expirydate' and used your data,
and this one works:
count(*) as total
from test_expirydate as tb1
left join (
from test_expirydate as tb2
group by userid
) as tb2
on tb1.expirydate >= tb2.expirydate
group by tb1.expirydate
I'm not sure whether it works in other cases, but it is fine with the current data.
Oh, I interpreted the left column as the expiration date.
SELECT CASE WHEN date_part('hour',created_at) BETWEEN 3 AND 15 THEN '9am-3pm'
WHEN date_part('hour',created_at) BETWEEN 15 AND 18 THEN '3pm-6pm' END "time window",COUNT(*) FROM tickets where created_at < now()
GROUP BY CASE WHEN date_part('hour',created_at) BETWEEN 3 AND 15 THEN '9am-3pm' WHEN date_part('hour',created_at) BETWEEN 15 AND 18 THEN '3pm-6pm' END;
time window | count
| 6
9am-3pm | 69
Is it possible to group by date as well as time window, so that my result set looks like this?
Date | time window | count
12-01-2020 | 9am-3pm| 6
12-01-2020 | 3pm-6pm| 69
13-01-2020 | 9am-3pm| 12
13-01-2020 | 3pm-6pm| 14
We can handle this requirement using a calendar table approach:
WITH dates AS (
SELECT DATE '2020-01-12' AS created_at UNION ALL
SELECT DATE '2020-01-13'
),
tw AS (
SELECT '9am-3pm' AS "time window" UNION ALL
SELECT '3pm-6pm'
),
cte AS (
SELECT
created_at::date AS created_at,
CASE WHEN DATE_PART('hour', created_at) BETWEEN 3 AND 15 THEN '9am-3pm'
WHEN DATE_PART('hour', created_at) BETWEEN 15 AND 18 THEN '3pm-6pm' END "time window",
COUNT(*) AS cnt
FROM tickets
WHERE created_at < NOW()
GROUP BY 1, 2
)
SELECT
d.created_at,
tw."time window",
COALESCE(t.cnt, 0) AS count
FROM dates d
CROSS JOIN tw
LEFT JOIN cte t
ON d.created_at = t.created_at AND tw."time window" = t."time window"
ORDER BY
d.created_at,
tw."time window";
You are actually asking two questions:
The "empty space" (really an SQL NULL) is there because some rows have a created_at hour that does not fall within any of the time ranges. You can exclude them with an additional WHERE condition.
To get the date part as well, add
CAST (created_at AS date)
to the SELECT list and the GROUP BY clause.
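Both answers come down to the same two moves: bucket each timestamp by date and window, then report every date and window pair, including empty ones. Here is a minimal Python sketch of that logic. The hour boundaries (9 to 15 and 15 to 18, end exclusive) are an assumption based on the window labels rather than on the BETWEEN conditions in the question, and the sample tickets and function names are invented for illustration.

```python
from collections import Counter
from datetime import date, datetime
from itertools import product

# (label, first hour, end hour exclusive): assumed from the window labels.
WINDOWS = [("9am-3pm", 9, 15), ("3pm-6pm", 15, 18)]


def window_for(ts):
    """Return the window label for a timestamp, or None if it fits no window."""
    for label, start, end in WINDOWS:
        if start <= ts.hour < end:
            return label
    return None


def counts_by_date_and_window(timestamps, dates):
    """Count tickets per (date, window), reporting 0 for empty combinations."""
    counts = Counter()
    for ts in timestamps:
        label = window_for(ts)
        if label is not None:  # skip rows outside every window (the NULL group)
            counts[(ts.date(), label)] += 1
    return [(d, label, counts[(d, label)])
            for d, (label, _, _) in product(dates, WINDOWS)]


tickets = [
    datetime(2020, 1, 12, 9, 30), datetime(2020, 1, 12, 16, 5),
    datetime(2020, 1, 12, 17, 45), datetime(2020, 1, 12, 20, 0),  # evening: no window
    datetime(2020, 1, 13, 10, 15),
]
report = counts_by_date_and_window(tickets, [date(2020, 1, 12), date(2020, 1, 13)])
for row in report:
    print(row)
```

The product of dates and windows plays the role of the calendar table: it guarantees a row for 2020-01-13 in the 3pm-6pm window even though no ticket falls there.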
If I have a table of records with active/inactive dates, is there a simple way to count active records by month? For example:
id dt_active dt_inactive
a 2013-01-01 2013-08-24
b 2013-01-01 2013-07-05
c 2012-02-01 2012-01-01
If I have to generate an output of active records by month like this:
active: dt_active < first_day_of_month <= dt_inactive
month count
2013-01 2
2013-02 2
2013-03 2
2013-04 2
2013-05 2
2013-06 2
2013-07 2
2013-08 1
2013-09 0
Is there any clever way to do this besides uploading a temp table of dates and using subqueries?
Here is one method that gives the count of actives on the beginning of the month. It creates a list of all the months and then joins this information to tbl_a.
with dates as (
select cast('2013-01-01' as date) as month
union all
select dateadd(month, 1, dates.month)
from dates
where month < cast('2013-09-01' as date)
)
select convert(varchar(7), month, 121), count(a.id)
from dates m left outer join
tbl_a a
on m.month between a.dt_active and a.dt_inactive
group by convert(varchar(7), month, 121)
order by 1;
Note: if dt_inactive is the first date of inactivity, then the on clause should be:
on m.month >= a.dt_active and m.month < a.dt_inactive
Here is a SQL Fiddle with the working query.
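As a way to check the month-list approach against the sample data, here is a small sketch using Python's sqlite3 module. It is a translation for illustration, not the SQL Server query itself: the recursive CTE is named months, the dates are stored as ISO text, and date() and substr() stand in for dateadd and convert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_a (id TEXT, dt_active TEXT, dt_inactive TEXT);
    INSERT INTO tbl_a VALUES
        ('a', '2013-01-01', '2013-08-24'),
        ('b', '2013-01-01', '2013-07-05'),
        ('c', '2012-02-01', '2012-01-01');
""")

query = """
    WITH RECURSIVE months(month_start) AS (
        SELECT '2013-01-01'
        UNION ALL
        SELECT date(month_start, '+1 month') FROM months
        WHERE month_start < '2013-09-01'
    )
    -- COUNT(a.id) counts only matched rows, so months with no match show 0.
    SELECT substr(m.month_start, 1, 7) AS month, COUNT(a.id) AS active
    FROM months m
    LEFT JOIN tbl_a a ON m.month_start BETWEEN a.dt_active AND a.dt_inactive
    GROUP BY m.month_start
    ORDER BY m.month_start
"""
monthly = conn.execute(query).fetchall()
for row in monthly:
    print(row)
```

Counting a column from the joined table rather than using COUNT(*) is what lets 2013-09 come out as 0: the left join still produces one row for that month, but its id is NULL and is not counted.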