Hi, I am trying to calculate the average of the previous 4 Tuesdays. I have daily sales data, and I want the average over the previous 4 weeks for the same weekday.
Attached is a snapshot of what my dataset looks like.
For March 6, for example, I would like to know the average over the previous 4 weeks (namely Feb 6, Feb 13, Feb 20 and Feb 27). This value needs to be assigned to the Monthly Average column.
I am using a PostgreSQL DB.
Thanks
You can use window functions:
select t.*,
avg(dailycount) over (partition by seller_name, day
order by date
rows between 3 preceding and current row
) as avg_4_weeks
from t
where day = 'Tuesday';
This assumes that "previous 4 weeks" is the current date plus the previous three weeks. If it starts the week before, only the windowing clause needs to change:
select t.*,
avg(dailycount) over (partition by seller_name, day
order by date
rows between 4 preceding and 1 preceding
) as avg_4_weeks
from t
where day = 'Tuesday';
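To see what the second frame does, here is a minimal sketch that runs the same windowing clause against SQLite through Python's sqlite3 module. The sample rows (ten consecutive Tuesdays with made-up counts) are an assumption for illustration only, and it assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical sample: one seller, ten consecutive Tuesdays starting 2018-01-02,
# with made-up daily counts 10, 20, ..., 100.
rows = [
    ("ABC", "Tuesday", (date(2018, 1, 2) + timedelta(weeks=i)).isoformat(), 10 * (i + 1))
    for i in range(10)
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (seller_name TEXT, day TEXT, date TEXT, dailycount INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Same windowing clause as the second query above: the four rows before the current one.
query = """
SELECT date,
       AVG(dailycount) OVER (PARTITION BY seller_name, day
                             ORDER BY date
                             ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS avg_4_weeks
FROM t
WHERE day = 'Tuesday'
ORDER BY date
"""
avg_by_date = dict(conn.execute(query).fetchall())
conn.close()
```

With this sample, `avg_by_date["2018-03-06"]` averages the counts for Feb 6, 13, 20 and 27. Note that a ROWS frame counts rows, not weeks: if a Tuesday is missing from the data, the frame reaches one row further back.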
I decided to post my answer as well, for anyone else searching. It lets you put in any date and get the average for the previous 4 weeks (the given day plus the same weekday in the previous 3 weeks).
SQL Fiddle
PostgreSQL 9.3 Schema Setup:
CREATE TABLE sales (sellerName varchar(10), dailyCount int, saleDay date) ;
INSERT INTO sales (sellerName, dailyCount, saleDay)
SELECT 'ABC',10,to_date('2018-03-15','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',11,to_date('2018-03-14','YYYY-MM-DD') UNION ALL
SELECT 'ABC',12,to_date('2018-03-12','YYYY-MM-DD') UNION ALL
SELECT 'ABC',13,to_date('2018-03-11','YYYY-MM-DD') UNION ALL
SELECT 'ABC',14,to_date('2018-03-10','YYYY-MM-DD') UNION ALL
SELECT 'ABC',15,to_date('2018-03-09','YYYY-MM-DD') UNION ALL
SELECT 'ABC',16,to_date('2018-03-08','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',17,to_date('2018-03-07','YYYY-MM-DD') UNION ALL
SELECT 'ABC',18,to_date('2018-03-06','YYYY-MM-DD') UNION ALL
SELECT 'ABC',19,to_date('2018-03-05','YYYY-MM-DD') UNION ALL
SELECT 'ABC',20,to_date('2018-03-04','YYYY-MM-DD') UNION ALL
SELECT 'ABC',21,to_date('2018-03-03','YYYY-MM-DD') UNION ALL
SELECT 'ABC',22,to_date('2018-03-02','YYYY-MM-DD') UNION ALL
SELECT 'ABC',23,to_date('2018-03-01','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',24,to_date('2018-02-28','YYYY-MM-DD') UNION ALL
SELECT 'ABC',25,to_date('2018-02-22','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',26,to_date('2018-02-15','YYYY-MM-DD') UNION ALL
SELECT 'ABC',27,to_date('2018-02-08','YYYY-MM-DD') UNION ALL
SELECT 'ABC',28,to_date('2018-02-01','YYYY-MM-DD')
;
Now For The Query:
WITH theDay AS (
SELECT to_date('2018-03-15','YYYY-MM-DD') AS inDate
)
SELECT AVG(dailyCount) AS totalCount /* 18.5 = (10(3/15)+16(3/8)+23(3/1)+25(2/22))/4 */
FROM sales
CROSS JOIN theDay
WHERE extract(dow from saleDay) = extract(dow from theDay.inDate)
AND saleDay <= theDay.inDate
AND saleDay >= theDay.inDate-INTERVAL '3 weeks' /* Since we want to include the entered
day, for the INTERVAL we need 1 less week than we want */
Results:
| totalcount |
|------------|
| 18.5 |
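The filter logic of that query (same weekday, no later than the input date, no earlier than 3 weeks before it) can be sketched outside the database as well. The function name `same_weekday_average` is hypothetical, and the rows are a subset of the sample data above.

```python
from datetime import date, timedelta

def same_weekday_average(sales, in_date, weeks=4):
    """Average the counts for in_date and the (weeks - 1) same-weekday dates before it."""
    # Including the input day itself means going back one week less than `weeks`.
    earliest = in_date - timedelta(weeks=weeks - 1)
    counts = [
        count
        for sale_day, count in sales
        if sale_day.weekday() == in_date.weekday() and earliest <= sale_day <= in_date
    ]
    return sum(counts) / len(counts) if counts else None

# Subset of the sample rows: (saleDay, dailyCount)
sales = [
    (date(2018, 3, 15), 10),
    (date(2018, 3, 14), 11),
    (date(2018, 3, 8), 16),
    (date(2018, 3, 1), 23),
    (date(2018, 2, 22), 25),
    (date(2018, 2, 15), 26),
]
result = same_weekday_average(sales, date(2018, 3, 15))
```

Here `result` is computed from the rows for 3/15, 3/8, 3/1 and 2/22, the same four rows marked in the schema setup.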
I'm trying, unsuccessfully, to calculate MAU (monthly distinct active users) using window functions.
I need the calculation for each day during the month, covering the preceding 30 days.
This is what I have so far:
select
t.datee
, t.app,i.sourcee
, i.campaign
, t.mobile
, sum(count(distinct t.user_id)) over (
PARTITION BY
date_trunc('month',datee)
, t.app
, i.sourcee
, i.campaign
, t.mobile
ORDER BY datee asc
ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
)
FROM dim_x i
JOIN agg_y t
ON i.app=t.app
AND i.mobile=t.mobile
WHERE t.datee>=CURRENT_DATE-30
AND t.datee<CURRENT_DATE
GROUP BY 1,2,3,4,5
order by 1 desc
But all I get is the sum of the daily active-user counts over all days instead of the number of distinct users. I'm using Vertica.
Any suggestions?
I don't really see why you would need an OLAP expression for that.
Aren't you looking for the total number of distinct users per:
year-month combination out of datee
app
sourcee (whatever that might be)
campaign
mobile (probably mobile number)
?
A simple GROUP BY would do, as far as I'm concerned. If I disregard sourcee, campaign and mobile, and select from just one table (called input, for argument's sake) with some sample data I just made up, this query:
SELECT
YEAR(datee) * 100 + MONTH(datee) AS yearmonth
, app
, COUNT(DISTINCT user_id) AS monthly_active_users
FROM input
GROUP BY 1,2
ORDER BY 1
;
... would return:
YEARMONTH|app  |monthly_active_users
201601   |app-a|                   2
201601   |app-b|                   2
201602   |app-a|                   2
201602   |app-b|                   2
201603   |app-a|                   2
201603   |app-b|                   2
201604   |app-a|                   2
201604   |app-b|                   2
201605   |app-a|                   2
201605   |app-b|                   2
201606   |app-a|                   1
201606   |app-b|                   1
Just editing my previous answer. You seem to need the running COUNT DISTINCT of user IDs, partitioned by several expressions.
With the input from the WITH clause below, would you need a report like this (only showing the first 12 rows of 53, ordered by datee, app)?
datee |app |user_id |running_active_users
2016-01-01|app-a|arthur | 1
2016-01-04|app-b|ford | 1
2016-01-07|app-a|trillian| 2
2016-01-10|app-b|zaphod | 2
2016-01-13|app-a|arthur | 2
2016-01-16|app-b|ford | 2
2016-01-19|app-a|trillian| 2
2016-01-22|app-b|zaphod | 2
2016-01-25|app-a|arthur | 2
2016-01-28|app-b|ford | 2
2016-01-31|app-a|trillian| 2
2016-02-03|app-b|zaphod | 2
?
If that's the case, I don't see why you need your GROUP BY clause, though.
Below is the GROUP BY query from above, with the test data that produces those results supplied in a WITH clause. Treat that input as the join between your two tables.
WITH
input(datee,app,user_id) AS (
SELECT DATE '2016-01-01','app-a','arthur'
UNION ALL SELECT DATE '2016-01-04','app-b','ford'
UNION ALL SELECT DATE '2016-01-07','app-a','trillian'
UNION ALL SELECT DATE '2016-01-10','app-b','zaphod'
UNION ALL SELECT DATE '2016-01-25','app-a','arthur'
UNION ALL SELECT DATE '2016-01-28','app-b','ford'
UNION ALL SELECT DATE '2016-03-04','app-b','ford'
UNION ALL SELECT DATE '2016-03-25','app-a','arthur'
UNION ALL SELECT DATE '2016-04-09','app-b','ford'
UNION ALL SELECT DATE '2016-04-30','app-a','arthur'
UNION ALL SELECT DATE '2016-05-06','app-a','trillian'
UNION ALL SELECT DATE '2016-05-09','app-b','zaphod'
UNION ALL SELECT DATE '2016-05-15','app-b','ford'
UNION ALL SELECT DATE '2016-06-05','app-a','arthur'
UNION ALL SELECT DATE '2016-01-13','app-a','arthur'
UNION ALL SELECT DATE '2016-01-16','app-b','ford'
UNION ALL SELECT DATE '2016-01-31','app-a','trillian'
UNION ALL SELECT DATE '2016-02-03','app-b','zaphod'
UNION ALL SELECT DATE '2016-02-06','app-a','arthur'
UNION ALL SELECT DATE '2016-02-09','app-b','ford'
UNION ALL SELECT DATE '2016-02-12','app-a','trillian'
UNION ALL SELECT DATE '2016-02-15','app-b','zaphod'
UNION ALL SELECT DATE '2016-02-18','app-a','arthur'
UNION ALL SELECT DATE '2016-02-21','app-b','ford'
UNION ALL SELECT DATE '2016-02-24','app-a','trillian'
UNION ALL SELECT DATE '2016-02-27','app-b','zaphod'
UNION ALL SELECT DATE '2016-03-01','app-a','arthur'
UNION ALL SELECT DATE '2016-03-10','app-b','zaphod'
UNION ALL SELECT DATE '2016-03-13','app-a','arthur'
UNION ALL SELECT DATE '2016-03-16','app-b','ford'
UNION ALL SELECT DATE '2016-03-28','app-b','ford'
UNION ALL SELECT DATE '2016-03-31','app-a','trillian'
UNION ALL SELECT DATE '2016-04-06','app-a','arthur'
UNION ALL SELECT DATE '2016-04-12','app-a','trillian'
UNION ALL SELECT DATE '2016-04-15','app-b','zaphod'
UNION ALL SELECT DATE '2016-04-27','app-b','zaphod'
UNION ALL SELECT DATE '2016-05-03','app-b','ford'
UNION ALL SELECT DATE '2016-05-27','app-b','ford'
UNION ALL SELECT DATE '2016-05-30','app-a','trillian'
UNION ALL SELECT DATE '2016-01-19','app-a','trillian'
UNION ALL SELECT DATE '2016-01-22','app-b','zaphod'
UNION ALL SELECT DATE '2016-03-07','app-a','trillian'
UNION ALL SELECT DATE '2016-03-19','app-a','trillian'
UNION ALL SELECT DATE '2016-03-22','app-b','zaphod'
UNION ALL SELECT DATE '2016-04-03','app-b','zaphod'
UNION ALL SELECT DATE '2016-04-18','app-a','arthur'
UNION ALL SELECT DATE '2016-04-21','app-b','ford'
UNION ALL SELECT DATE '2016-04-24','app-a','trillian'
UNION ALL SELECT DATE '2016-05-12','app-a','arthur'
UNION ALL SELECT DATE '2016-05-18','app-a','trillian'
UNION ALL SELECT DATE '2016-05-21','app-b','zaphod'
UNION ALL SELECT DATE '2016-05-24','app-a','arthur'
UNION ALL SELECT DATE '2016-06-02','app-b','zaphod'
)
SELECT
YEAR(datee) * 100 + MONTH(datee) AS YEARMONTH
, app
, COUNT(DISTINCT user_id) AS monthly_active_users
FROM input
GROUP BY 1,2
ORDER BY 1
;
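For the rolling variant the question asks about (distinct users over the preceding 30 days, for each day), daily distinct counts cannot simply be summed, because a user who is active on several days would be counted once per day. A minimal sketch of the intended result, assuming a plain list of (date, user_id) events and a hypothetical helper `rolling_distinct_users`; it is a brute-force illustration, not a Vertica query:

```python
from datetime import date, timedelta

def rolling_distinct_users(events, days=30):
    """For each date with activity, count distinct users in the `days`-day window ending on it."""
    result = {}
    for day in sorted({d for d, _ in events}):
        window_start = day - timedelta(days=days - 1)
        # A set keeps each user once, however many days they were active in the window.
        result[day] = len({user for d, user in events if window_start <= d <= day})
    return result

# A few made-up events in the style of the sample data above.
events = [
    (date(2016, 1, 1), "arthur"),
    (date(2016, 1, 7), "trillian"),
    (date(2016, 1, 13), "arthur"),
    (date(2016, 2, 12), "trillian"),
]
mau = rolling_distinct_users(events)
```

On 2016-01-13 the window holds three events but only two users, which is exactly the difference between summing daily counts and counting distinct users.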
I have been going around for a while trying to get an answer to my issue. I think it revolves around cursors in SQL, but I am not sure. I think I know how to write the loop for a single row of data, but I don't know how to run it for all the records.
Hopefully there is an easy answer:
I have a table, let's call it A, that has Product_Code, Start_Date, End_Date and Value.
I need an output table B with the columns Product_Code, Month, Year and Value, with a row when Month * Year is in between Start_Date and End_date.
Each record of A should then create several records in B. I hope that's fairly clear; I'm happy to elaborate if not! :)
CREATE TABLE YearMonth(
Year int not null,
Month int not null,
FirstDay date not null,
LastDay date not null
);
Fill this table with enough years and months to cover the range of your data (it's no problem if there are too many).
You could do this with a statement like this:
WITH y(year) AS (
SELECT 2007
union all
SELECT 2008
union all
SELECT 2009
union all
SELECT 2010
union all
SELECT 2011
union all
SELECT 2012
union all
SELECT 2013
union all
SELECT 2014
union all
SELECT 2015
union all
SELECT 2016
),
m(month) AS (
SELECT 1
union all
SELECT 2
union all
SELECT 3
union all
SELECT 4
union all
SELECT 5
union all
SELECT 6
union all
SELECT 7
union all
SELECT 8
union all
SELECT 9
union all
SELECT 10
union all
SELECT 11
union all
SELECT 12
)
INSERT INTO YearMonth(Year, Month, FirstDay, LastDay)
SELECT y.year
,m.month
,convert(date, convert(nvarchar(4), y.year) + '.' + convert(nvarchar(2), m.month) + '.01', 102)
,DateAdd(day, - 1,
CASE WHEN m.month = 12 THEN
convert(date, convert(nvarchar(4), y.year + 1) + '.01.01', 102)
ELSE
convert(date, convert(nvarchar(4), y.year) + '.' + convert(nvarchar(2), m.month + 1) + '.01', 102)
END)
FROM y CROSS JOIN m
The tricky part to calculate the LastDay works like this: create a date that is the first of the following month, then subtract one day from it. This handles the problem that the last day of the month can be 28, 29, 30, or 31.
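The same first-of-next-month-minus-one-day trick can be sketched in a few lines; `month_bounds` is a hypothetical helper name, used here only to illustrate the arithmetic.

```python
from datetime import date, timedelta

def month_bounds(year, month):
    """Return (first_day, last_day) of the given month."""
    first_day = date(year, month, 1)
    # First day of the following month; December rolls over into January of the next year.
    if month == 12:
        first_of_next = date(year + 1, 1, 1)
    else:
        first_of_next = date(year, month + 1, 1)
    # One day before the first of the next month is the last day of this month.
    return first_day, first_of_next - timedelta(days=1)

february_2016 = month_bounds(2016, 2)
```

Because the subtraction is done on real dates, leap years need no special handling.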
Then just use a join:
INSERT INTO B(Product_Code, Month, Year, Value)
SELECT A.Product_Code
,YearMonth.Month
,YearMonth.Year
,A.Value
FROM A
JOIN YearMonth ON A.Start_Date <= YearMonth.LastDay
AND YearMonth.FirstDay <= A.End_Date
Depending on the exact interpretation of "Month*Year is in between Start_Date and End_date", you might have to switch one or both of the <=s to <.
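One way to read the join is as an interval-overlap test: a month belongs to a row's range when the month ends on or after the start date and begins on or before the end date. A minimal sketch under that reading, with a hypothetical `expand_to_months` function and a made-up product row:

```python
import calendar
from datetime import date

def expand_to_months(product_code, start_date, end_date, value, years):
    """Return one (product_code, month, year, value) row per month overlapping the range."""
    rows = []
    for year in years:
        for month in range(1, 13):
            first_day = date(year, month, 1)
            # calendar.monthrange returns (weekday of the 1st, number of days in the month).
            last_day = date(year, month, calendar.monthrange(year, month)[1])
            # Overlap test: the month must not end before the range starts
            # and must not begin after the range ends.
            if start_date <= last_day and first_day <= end_date:
                rows.append((product_code, month, year, value))
    return rows

rows = expand_to_months("P1", date(2015, 11, 15), date(2016, 2, 10), 42, range(2015, 2017))
```

This inclusive version counts a month even when the range only touches its first or last day; tightening either comparison changes that.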
I have this simple query:
Select
To_Date('2012-sep-03','yyyy-mon-dd')as Date_Of_Concern,
Count(Player_Id) as Retained
From Player
Where
(To_Date('2012-sep-03','yyyy-mon-dd')-Trunc(Init_Dtime))<=7
Which Results In:
Date_Of_Concern Retained
03-Sep-12 81319
This query counts all of the players in my database who have logged in(init_dtime) within 7 days of a specific date.
As it stands, I will have to run this query multiple times, for every "Day of Concern" that I wish to know about. Is there a better solution?
If you need to run this query for multiple dates, you need some means of holding more than one value. I suggest you use a NESTED TABLE object:
CREATE TYPE my_dates AS TABLE OF DATE;
/
SELECT d.column_value AS Date_Of_Concern, count(Player_Id) AS Retained
FROM Player
JOIN TABLE (my_dates(to_date('2012-sep-03', 'yyyy-mon-dd'),
to_date('2012-sep-04', 'yyyy-mon-dd'),
to_date('2012-sep-05', 'yyyy-mon-dd'))) d
ON d.column_value - trunc(Init_Dtime) BETWEEN 0 AND 7
GROUP BY d.column_value
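The join condition pairs every date of concern with the logins that fall in the 7 days up to and including it. A small sketch of that pairing, assuming an in-memory list of login timestamps; `retained_counts` and the sample logins are made up for illustration:

```python
from datetime import date, datetime

def retained_counts(init_times, dates_of_concern, days=7):
    """Count logins whose day lies 0..days days before each date of concern."""
    counts = {}
    for concern in dates_of_concern:
        counts[concern] = sum(
            1 for t in init_times
            # t.date() plays the role of trunc(Init_Dtime)
            if 0 <= (concern - t.date()).days <= days
        )
    return counts

logins = [
    datetime(2012, 8, 20, 12, 0),
    datetime(2012, 8, 27, 9, 30),
    datetime(2012, 9, 3, 23, 0),
    datetime(2012, 9, 4, 8, 0),
]
counts = retained_counts(logins, [date(2012, 9, 3), date(2012, 9, 4), date(2012, 9, 11)])
```

The lower bound of 0 matters: without it, logins after the date of concern would also be counted, as they are in the original single-date query.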
Simply use GROUP BY to get the count by day:
Select
Trunc(Init_Dtime) as Date_Of_Concern,
Count(Player_Id) as Retained
From Player
Where
(To_Date('2012-sep-03','yyyy-mon-dd') - Trunc(Init_Dtime)) <= 7
GROUP BY Trunc(Init_Dtime)
ORDER BY Trunc(Init_Dtime)
"within 7 days of a specific date"
To be able to do what you want you will have to know what "specific date" you are talking about by either a formula or a date range. Any random date would obviously require the user to either enter that date or modify the query to run for that date (the way you mentioned).
Not sure if I understood you correctly but this is probably what you want. Might have suboptimal performance though.
with player(id, dt) as (
select 1, date '2012-01-01' from dual union all
select 2, date '2012-01-01' from dual union all
select 3, date '2012-01-02' from dual union all
select 4, date '2012-01-03' from dual union all
select 5, date '2012-01-04' from dual union all
select 6, date '2012-01-05' from dual union all
select 7, date '2012-01-06' from dual union all
select 8, date '2012-01-07' from dual union all
select 9, date '2012-01-08' from dual union all
select 10, date '2012-01-09' from dual union all
select 11, date '2012-01-10' from dual
)
select distinct
to_char(dt, 'dd-mm-yyyy') dt
,count(*) over (order by trunc(dt) range interval '7' day preceding) week_cnt
from player
order by 1, 2;
DT WEEK_CNT
---------- ----------
01-01-2012 2
02-01-2012 3
03-01-2012 4
04-01-2012 5
05-01-2012 6
06-01-2012 7
07-01-2012 8
08-01-2012 9
09-01-2012 8
10-01-2012 8
10 rows selected.
Elapsed: 00:00:00.01
p.s. do not code like
(To_Date('2012-sep-03','yyyy-mon-dd')-Trunc(Init_Dtime))<=7
code like this, which is equivalent but leaves the column bare:
Init_Dtime >= to_date('2012-SEP-03', 'yyyy-mon-dd') - 7
Unless you don't care about indexes, of course :)
Can any of these queries be done in SQL?
SELECT dates FROM system
WHERE dates > 'January 5, 2010' AND dates < 'January 30, 2010'
SELECT number FROM system
WHERE number > 10 AND number < 20
I'd like to create a generate_series, and that's why I'm asking.
I assume you want to generate a recordset of arbitrary number of values, based on the first and last value in the series.
In PostgreSQL:
SELECT num
FROM generate_series (11, 19) num
In SQL Server:
WITH q (num) AS
(
SELECT 11
UNION ALL
SELECT num + 1
FROM q
WHERE num < 19
)
SELECT num
FROM q
OPTION (MAXRECURSION 0)
In Oracle:
SELECT level + 10 AS num
FROM dual
CONNECT BY
level < 10
In MySQL:
Sorry.
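The recursive CTE shown for SQL Server can be tried on other engines that support recursive WITH. As a rough sketch, here it is run against SQLite through Python's sqlite3 module; the wrapper function `int_series` is hypothetical, and it assumes first <= last.

```python
import sqlite3

def int_series(first, last):
    """Return the integers first..last using a recursive CTE (assumes first <= last)."""
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(
        """
        WITH RECURSIVE q(num) AS (
            SELECT ?              -- anchor row: the first value
            UNION ALL
            SELECT num + 1        -- recursive step: next value
            FROM q
            WHERE num < ?         -- stop once the last value has been produced
        )
        SELECT num FROM q
        """,
        (first, last),
    ).fetchall()
    conn.close()
    return [num for (num,) in rows]

series = int_series(11, 19)
```

The anchor row is always emitted, which is why the sketch states the first <= last assumption.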
Sort of for dates...
Michael Valentine Jones from SQL Team has an AWESOME date function
Check it out here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=61519
In Oracle
WITH
START_DATE AS
(
SELECT TO_CHAR(TO_DATE('JANUARY 5 2010','MONTH DD YYYY'),'J')
JULIAN FROM DUAL
),
END_DATE AS
(
SELECT TO_CHAR(TO_DATE('JANUARY 30 2010','MONTH DD YYYY'),'J')
JULIAN FROM DUAL
),
DAYS AS
(
SELECT END_DATE.JULIAN - START_DATE.JULIAN DIFF
FROM START_DATE, END_DATE
)
SELECT TO_CHAR(TO_DATE(N + START_DATE.JULIAN, 'J'), 'MONTH DD YYYY')
DESIRED_DATES
FROM
START_DATE,
(
SELECT LEVEL N
FROM DUAL, DAYS
CONNECT BY LEVEL < DAYS.DIFF
)
If you want to get the list of days, with a SQL like
select ... as days where date is between '2010-01-20' and '2010-01-24'
And return data like:
days
----------
2010-01-20
2010-01-21
2010-01-22
2010-01-23
2010-01-24
This solution uses no loops, procedures, or temp tables. The subquery generates dates for the last thousand days, and could be extended to go as far back or forward as you wish.
select a.Date
from (
select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY as Date
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) a
where a.Date between '2010-01-20' and '2010-01-24'
Output:
Date
----------
2010-01-24
2010-01-23
2010-01-22
2010-01-21
2010-01-20
Notes on Performance
Testing it out here, the performance is surprisingly good: the above query takes 0.0009 sec.
If we extend the subquery to generate approx. 100,000 numbers (and thus about 274 years worth of dates), it runs in 0.0458 sec.
Incidentally, this is a very portable technique that works with most databases with minor adjustments.
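The digit cross join is easier to follow when written out: every combination of three digits a, b, c gives one offset a + 10*b + 100*c, so the subquery covers offsets 0 to 999 exactly once. A sketch of the same idea, where `dates_between` and its `today` parameter (standing in for curdate()) are assumptions made for illustration:

```python
from datetime import date, timedelta
from itertools import product

def dates_between(start, end, today, digits=3):
    """Mimic the cross join: build offsets 0..10**digits - 1 from decimal digits, then filter."""
    found = []
    for combo in product(range(10), repeat=digits):
        # Equivalent of a.a + (10 * b.a) + (100 * c.a) for three digit tables
        offset = sum(d * 10 ** i for i, d in enumerate(combo))
        candidate = today - timedelta(days=offset)
        if start <= candidate <= end:
            found.append(candidate)
    return sorted(found)

recent = dates_between(date(2010, 1, 20), date(2010, 1, 24), today=date(2010, 2, 1))
too_old = dates_between(date(2010, 1, 20), date(2010, 1, 24), today=date(2020, 1, 1))
```

The second call returns nothing because the requested range lies more than 999 days before `today`; a fourth digit table (`digits=4`) would be needed to reach it, which is the extension the answer describes.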
Not sure if this is what you're asking, but if you want to select something that does not come from a table, you can use 'DUAL':
select 1, 2, 3 from dual;
will return a row with 3 columns, containing those three numbers.
Selecting from dual is useful for running functions. A function can be run with manual input instead of selecting something else into it. For example:
select some_func('First Parameter', 'Second parameter') from dual;
will return the results of some_func.
In SQL Server you can use the BETWEEN keyword.
Link:
http://msdn.microsoft.com/nl-be/library/ms187922(en-us).aspx
You can select a range by combining two conditions with AND in the WHERE clause. I can't speak to performance, but it's possible.
The simplest solution to this problem is a Tally or Numbers table. That is a table that simply stores a sequence of integers and/or dates.
Create Table dbo.Tally (
NumericValue int not null Primary Key Clustered
, DateValue datetime NOT NULL
, Constraint UK_Tally_DateValue Unique ( DateValue )
)
GO
;With TallyItems
As (
Select 0 As Num
Union All
Select ROW_NUMBER() OVER ( Order By C1.object_id ) As Num
From sys.columns as c1
cross join sys.columns as c2
)
Insert dbo.Tally(NumericValue, DateValue)
Select Num, DateAdd(d, Num, '19000101')
From TallyItems
Where Num
Once you have that table populated, you never need to touch it unless you want to expand it. I combined the dates and numbers into a single table, but if you needed more numbers than dates, you could break it into two tables. In addition, I arbitrarily filled the table with 100K rows, but you could obviously add more. Every day from 1900-01-01 to 9999-12-31 takes about 2.96 million rows. You probably won't need that many, but even if you did, the storage is tiny.
Regardless, this is a common technique for solving many gaps-and-sequences problems. For example, your original queries all ran in less than a tenth of a second. You can also use this sort of table to solve gaps problems like:
Select NumericValue
From dbo.Tally
Left Join MyTable
On Tally.NumericValue = MyTable.IdentityColumn
Where Tally.NumericValue Between SomeLowValue And SomeHighValue
And MyTable.IdentityColumn Is Null
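To make the gaps idea concrete, here is a small sketch that runs a left join of this shape against SQLite through Python's sqlite3 module. The table contents are made up, and the `IS NULL` filter is what restricts the result to tally values with no matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A tiny tally table holding 0..19 (the real one would hold far more rows).
conn.execute("CREATE TABLE Tally (NumericValue INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Tally VALUES (?)", [(n,) for n in range(20)])

# Hypothetical table whose identity values have gaps at 4, 7 and 8.
conn.execute("CREATE TABLE MyTable (IdentityColumn INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO MyTable VALUES (?)", [(n,) for n in (1, 2, 3, 5, 6, 9)])

query = """
SELECT Tally.NumericValue
FROM Tally
LEFT JOIN MyTable ON Tally.NumericValue = MyTable.IdentityColumn
WHERE Tally.NumericValue BETWEEN ? AND ?
  AND MyTable.IdentityColumn IS NULL   -- keep only tally values with no match
ORDER BY Tally.NumericValue
"""
missing = [n for (n,) in conn.execute(query, (1, 9))]
conn.close()
```

Here `missing` lists the identity values absent from MyTable within the requested bounds.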