What is the best way to aggregate data for the last 7, 30, 60... days in SQL - sql

Hi, I have a table with a date and the number of views our channel had on that day:
date views
03/06/2020 5
08/06/2020 49
09/06/2020 50
10/06/2020 1
13/06/2020 1
16/06/2020 1
17/06/2020 102
23/06/2020 97
29/06/2020 98
07/07/2020 2
08/07/2020 198
12/07/2020 1
14/07/2020 168
23/07/2020 292
Now we want to see, for each calendar date, the sum of views over the past 7 and 30 days,
so the result will be
date sum_of_7d sum_of_30d
01/06/2020 0 0
02/06/2020 0 0
03/06/2020 5 5
04/06/2020 5 5
05/06/2020 5 5
06/06/2020 5 5
07/06/2020 5 5
08/06/2020 54 54
09/06/2020 104 104
10/06/2020 100 105
11/06/2020 100 105
12/06/2020 100 105
13/06/2020 101 106
14/06/2020 101 106
15/06/2020 52 106
16/06/2020 3 107
17/06/2020 104 209
18/06/2020 104 209
So I was wondering what the best SQL is to get this result.
I'm working on Redshift, and the actual table (not this example) includes over 40B rows.
I used to do something like this:
select dates_helper.date
, tbl1.cnt
-- the frame includes the current row, so 7 days = 6 preceding rows (and 30 days = 29)
, sum(tbl1.cnt) over (order by dates_helper.date rows between 6 preceding and current row) as sum_7d
, sum(tbl1.cnt) over (order by dates_helper.date rows between 29 preceding and current row) as sum_30d
from bi_db.dates_helper
left join tbl1
on tbl1.invite_date = dates_helper.date
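A minimal sketch of the calendar-table plus window-frame approach, run against SQLite from Python so it can be tried locally. It is not tuned for Redshift. The recursive `dates_helper` CTE is a stand-in for `bi_db.dates_helper`, the `daily` CTE is an assumed pre-aggregation step to one row per calendar day, and the dates are rewritten in ISO format. Because the frame counts the current row, a 7-day window is `6 preceding` and a 30-day window is `29 preceding`.

```python
import sqlite3

# Sample rows from the question, rewritten as ISO dates so they sort correctly as text.
views = [
    ("2020-06-03", 5), ("2020-06-08", 49), ("2020-06-09", 50),
    ("2020-06-10", 1), ("2020-06-13", 1), ("2020-06-16", 1), ("2020-06-17", 102),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (invite_date TEXT, cnt INTEGER)")
conn.executemany("INSERT INTO tbl1 VALUES (?, ?)", views)

query = """
WITH RECURSIVE dates_helper(date) AS (      -- stand-in for bi_db.dates_helper
    SELECT '2020-06-01'
    UNION ALL
    SELECT date(date, '+1 day') FROM dates_helper WHERE date < '2020-06-18'
),
daily AS (                                   -- one row per calendar day, 0 when no views
    SELECT d.date AS date, COALESCE(SUM(t.cnt), 0) AS cnt
    FROM dates_helper AS d
    LEFT JOIN tbl1 AS t ON t.invite_date = d.date
    GROUP BY d.date
)
SELECT date, cnt,
       SUM(cnt) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)  AS sum_7d,
       SUM(cnt) OVER (ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS sum_30d
FROM daily
ORDER BY date
"""

# Map each date to its (sum_7d, sum_30d) pair.
rolling = {date: (s7, s30) for date, _cnt, s7, s30 in conn.execute(query)}
print(rolling["2020-06-10"])  # (100, 105)
```

Aggregating to one row per day before applying the window keeps the window input at calendar size rather than fact-table size, which is likely to matter on a table with billions of rows.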

Related

Group repeating pattern in pandas Dataframe

I have a DataFrame with a repeating number series that I want to group like this:
Number Pattern  Value  Desired Group  Value.1
1               723    1              Max of Group
2               400    1              Max of Group
8               235    1              Max of Group
5               387    2              Max of Group
7               911    2              Max of Group
3               365    3              Max of Group
4               270    3              Max of Group
5               194    3              Max of Group
7               452    3              Max of Group
100             716    4              Max of Group
104             69     4              Max of Group
2               846    5              Max of Group
3               474    5              Max of Group
4               524    5              Max of Group
So essentially the number pattern is always monotonically increasing within a group.
Any ideas?
You can compare Number Pattern with 1 and take the cumulative sum with Series.cumsum to build the group ids, then use GroupBy.transform with max:
df['Desired Group'] = df['Number Pattern'].eq(1).cumsum()
df['Value.1'] = df.groupby('Desired Group')['Value'].transform('max')
print (df)
Number Pattern Value Desired Group Value.1
0 1 723 1 723
1 2 400 1 723
2 3 235 1 723
3 1 387 2 911
4 2 911 2 911
5 1 365 3 452
6 2 270 3 452
7 3 194 3 452
8 4 452 3 452
9 1 716 4 716
10 2 69 4 716
11 1 846 5 846
12 2 474 5 846
13 3 524 5 846
If the pattern does not always restart at 1 but is monotonically increasing within each group, use:
df['Desired Group'] = (~df['Number Pattern'].diff().gt(0)).cumsum()
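To make the idea behind the one-liner explicit, here is a plain-Python sketch (no pandas) of the same three steps: flag rows where the pattern does not increase, take a running sum of the flags to get group ids, then broadcast each group's max back to its rows. The list names are illustrative, and the data is the frame printed in the answer above.

```python
from itertools import accumulate

number_pattern = [1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 1, 2, 3]
values = [723, 400, 235, 387, 911, 365, 270, 194, 452, 716, 69, 846, 474, 524]

# A new group starts at the first row and wherever the pattern does not increase
# (the same flags as ~diff().gt(0), where the first diff is NaN).
starts = [i == 0 or number_pattern[i] <= number_pattern[i - 1]
          for i in range(len(number_pattern))]

# Running sum of the flags gives the group id (the cumsum step).
groups = list(accumulate(int(flag) for flag in starts))

# Max of Value per group, broadcast back to every row (the transform('max') step).
group_max = {}
for g, v in zip(groups, values):
    group_max[g] = max(v, group_max.get(g, v))
value_1 = [group_max[g] for g in groups]

print(groups)   # [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5]
print(value_1)  # [723, 723, 723, 911, 911, 452, 452, 452, 452, 716, 716, 846, 846, 846]
```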

In Azure Databricks I want to get the start date of every week, with week numbers, from a datetime column

This is a sample DataFrame
Date Items_Sold
12/29/2019 10
12/30/2019 20
12/31/2019 30
1/1/2020 40
1/2/2020 50
1/3/2020 60
1/4/2020 35
1/5/2020 56
1/6/2020 34
1/7/2020 564
1/8/2020 6
1/9/2020 45
1/10/2020 56
1/11/2020 45
1/12/2020 37
1/13/2020 36
1/14/2020 479
1/15/2020 47
1/16/2020 47
1/17/2020 578
1/18/2020 478
1/19/2020 3578
1/20/2020 67
1/21/2020 578
1/22/2020 478
1/23/2020 4567
1/24/2020 7889
1/25/2020 8999
1/26/2020 99
1/27/2020 66
1/28/2020 678
1/29/2020 889
1/30/2020 990
1/31/2020 58585
2/1/2020 585
2/2/2020 555
2/3/2020 56
2/4/2020 66
2/5/2020 66
2/6/2020 6634
2/7/2020 588
2/8/2020 2588
2/9/2020 255
I am running this query
%sql
use my_items_table;
select weekofyear(Date), count(items_sold) as Sum
from my_items_table
where year(Date)=2020
group by weekofyear(Date)
order by weekofyear(Date)
I am getting this output. (Important: I have added random values in Sum.)
Week Sum
1 | 300091
2 | 312756
3 | 309363
4 | 307312
5 | 310985
6 | 296889
7 | 315611
But I want an additional column alongside the week number that holds the start date of each week, like this:
Start_Date Week Sum
12/29/2019 1 300091
1/5/2020 2 312756
1/12/2020 3 309363
1/19/2020 4 307312
1/26/2020 5 310985
2/2/2020 6 296889
2/9/2020 7 315611
I am running the query on Azure Databricks.
If you have data for all days, then just use min():
select min(date), weekofyear(Date), count(items_sold) as Sum
from my_items_table
where year(Date) = 2020
group by weekofyear(Date)
order by weekofyear(Date);
Note: year() is the calendar year starting on Jan 1, so this query will not return dates from other years. If that is an issue, I would suggest asking a new question on how to get the first day of the first week of the year.
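One way around the year-boundary issue is to group by the week's start date instead of the week number. Below is a minimal Python sketch of that idea, assuming weeks start on Sunday as in the desired output. The helper `week_start` and the inline sample list are illustrative and not part of the Databricks query. It sums Items_Sold rather than counting rows.

```python
from collections import defaultdict
from datetime import date, timedelta

def week_start(d):
    """Return the Sunday on or before d (assumes Sunday-based weeks)."""
    # date.weekday(): Monday=0 ... Sunday=6, so (weekday + 1) % 7 is days since Sunday.
    return d - timedelta(days=(d.weekday() + 1) % 7)

# First nine rows of the sample data.
sales = [
    (date(2019, 12, 29), 10), (date(2019, 12, 30), 20), (date(2019, 12, 31), 30),
    (date(2020, 1, 1), 40), (date(2020, 1, 2), 50), (date(2020, 1, 3), 60),
    (date(2020, 1, 4), 35), (date(2020, 1, 5), 56), (date(2020, 1, 6), 34),
]

totals = defaultdict(int)
for day, items in sales:
    totals[week_start(day)] += items

# Number the weeks in chronological order of their start date.
weekly = [(start, n, total)
          for n, (start, total) in enumerate(sorted(totals.items()), start=1)]
for start, n, total in weekly:
    print(start, n, total)
```

Because the key is a real date, the last days of December 2019 fall into the same group as the first days of January 2020 and do not need a year filter.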

SQL query to find max value within 60 seconds

Suppose I have a table like the one below (with different ids; this example uses id 99):
id hist_timestamp DP mints Secnds value
99 2016-08-01 00:09:40 1 9 40 193.214
99 2016-08-01 00:10:20 1 10 20 198.573
99 2016-08-01 00:12:00 1 12 0 194.432
99 2016-08-01 00:52:10 1 52 10 430.455
99 2016-08-01 00:55:50 1 55 50 400.739
99 2016-08-01 01:25:10 2 25 10 193.214
99 2016-08-01 01:25:50 2 25 50 193.032
99 2016-08-01 01:34:30 2 34 30 403.113
99 2016-08-01 01:37:10 2 37 10 417.18
99 2016-08-01 01:38:10 2 38 10 400.495
99 2016-08-01 03:57:00 4 57 0 190.413
99 2016-08-01 03:58:40 4 58 40 191.936
Here I have a value column. Starting from the first record, I need to find the max value within the next 60 seconds, which results in the output below. Within each 60-second group, I need to select the one record with the max value.
id hist_timestamp DP mints Secnds value
99 2016-08-01 00:10:20 1 10 20 198.573
99 2016-08-01 00:12:00 1 12 0 194.432
99 2016-08-01 00:52:10 1 52 10 430.455
99 2016-08-01 00:55:50 1 55 50 400.739
99 2016-08-01 01:25:10 2 25 10 193.214
99 2016-08-01 01:34:30 2 34 30 403.113
99 2016-08-01 01:37:10 2 37 10 417.18
99 2016-08-01 03:57:00 4 57 0 190.413
99 2016-08-01 03:58:40 4 58 40 191.936
How to build SQL query to get desired output?
Thanks !!!
It's simple. Use GROUP BY with the MAX() function.
Example:
select id, max(value) as value
from table
where hist_timestamp between '2016-08-01 00:10:00' and '2016-08-01 00:10:59'
group by id
This gives you the maximum value per id within that one-minute range. Hope it helps.
EDIT: If you need a more complicated query, such as the max value for every 60 seconds counted from the first timestamp in the table, you would need, for example, a recursive CTE or a more complicated query with GROUP BY.
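The "every 60 seconds counted from the first record" variant is sequential: each window is anchored at the first row not covered by the previous window. Here is a minimal Python sketch of that logic for a single id, assuming a row exactly 60 seconds after the anchor still belongs to the window (which is what the desired output implies). The function name is illustrative. A recursive CTE would follow the same steps.

```python
from datetime import datetime, timedelta

# (hist_timestamp, value) rows for id 99 from the question.
rows = [
    ("2016-08-01 00:09:40", 193.214), ("2016-08-01 00:10:20", 198.573),
    ("2016-08-01 00:12:00", 194.432), ("2016-08-01 00:52:10", 430.455),
    ("2016-08-01 00:55:50", 400.739), ("2016-08-01 01:25:10", 193.214),
    ("2016-08-01 01:25:50", 193.032), ("2016-08-01 01:34:30", 403.113),
    ("2016-08-01 01:37:10", 417.18), ("2016-08-01 01:38:10", 400.495),
    ("2016-08-01 03:57:00", 190.413), ("2016-08-01 03:58:40", 191.936),
]

def max_per_anchored_window(rows, width=timedelta(seconds=60)):
    """Keep the max-value row of each window anchored at the first uncovered row."""
    result, anchor, best = [], None, None
    for ts_text, value in sorted(rows):  # ISO timestamps sort chronologically as text
        ts = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S")
        if anchor is None or ts - anchor > width:
            # This row is outside the current window: emit the window's best row
            # and start a new window anchored here.
            if best is not None:
                result.append(best)
            anchor, best = ts, (ts_text, value)
        elif value > best[1]:
            best = (ts_text, value)
    if best is not None:
        result.append(best)
    return result

picked = max_per_anchored_window(rows)
for ts_text, value in picked:
    print(ts_text, value)
```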

Getting average of product sales each day and calculate number of days that have positive sales

I have a table TARGETSALE with the following columns:
SELECT DATE, WEEK, BRANCH, PROD, TARGETREACH
FROM TARGETSALE
WHERE BRANCH = 1
AND WEEK BETWEEN 52 AND 53;
DATE WEEK BRANCH PROD TARGETREACH
-------------------------------------------------------------------
01/09/2014 52 1 1 50
02/09/2014 52 1 1 -10
03/09/2014 52 1 1 50
04/09/2014 52 1 1 50
05/09/2014 52 1 1 40
06/09/2014 52 1 1 -10
07/09/2014 53 1 1 -5
08/09/2014 53 1 1 0
09/09/2014 53 1 1 10
10/09/2014 53 1 1 20
11/09/2014 53 1 1 30
12/09/2014 53 1 1 40
13/09/2014 53 1 1 0
01/09/2014 52 1 2 20
02/09/2014 52 1 2 0
03/09/2014 52 1 2 0
04/09/2014 52 1 2 10
05/09/2014 52 1 2 20
06/09/2014 52 1 2 10
07/09/2014 53 1 2 -10
08/09/2014 53 1 2 10
09/09/2014 53 1 2 -10
10/09/2014 53 1 2 20
11/09/2014 53 1 2 20
12/09/2014 53 1 2 40
13/09/2014 53 1 2 0
01/09/2014 52 1 3 30
02/09/2014 52 1 3 30
03/09/2014 52 1 3 5
04/09/2014 52 1 3 0
05/09/2014 52 1 3 10
06/09/2014 52 1 3 -10
07/09/2014 53 1 3 -10
08/09/2014 53 1 3 -10
09/09/2014 53 1 3 20
10/09/2014 53 1 3 10
11/09/2014 53 1 3 40
12/09/2014 53 1 3 10
13/09/2014 53 1 3 10
TARGETREACH shows how far above the target the sales were; a negative value means how far below the target they were. How can I do the following:
1. I need to get the average across all products for each day. Something like this:
DATE BRANCH AVERAGE_SALES_OF_ALL_PRODUCT
01/09/2014 1 33.33
02/09/2014 1 -1.67
...and so on
And then I need another query that shows how many days within those two weeks have a positive average. Something like this:
BRANCH 2WEEKS_SINCE DAYS_WITH_POSITIVE_AVERAGE_SALES
1 53 9
The above is just an example, not a real result.
Sorry, I hope this is not too confusing. Thank you so much.
In Oracle, the date type might still have a time component. If you do not know whether it is there, use trunc() to remove it:
select trunc(date), branch, avg(targetreach)
from targetsale
group by trunc(date), branch
order by 1, 2;
For the second query, you want to use case. Note that this counts the dates on which at least one row has a positive targetreach:
select branch, count(distinct case when targetreach > 0 then date end) as DaysWithPositiveSales
from targetsale
group by branch;
If you know there is one row per date per branch -- and the time component of the date is empty -- then the distinct is not necessary.
1)
SELECT TRUNC(DATE, 'DD'), BRANCH, AVG(TARGETREACH)
FROM TARGETSALE WHERE BRANCH = 1 AND WEEK BETWEEN 52 AND 53
GROUP BY TRUNC(DATE, 'DD'), BRANCH;
2)
SELECT BRANCH, SUM(DECODE(SIGN(TARGETREACH), 1, 1, 0))
FROM TARGETSALE WHERE BRANCH = 1 AND WEEK BETWEEN 52 AND 53
GROUP BY BRANCH;
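Since the question asks for days whose average across products is positive, the count has to be taken after the per-day average is computed. In SQL that means wrapping the first query in a subquery and filtering on the average. A small Python sketch of that two-step logic on the branch 1 sample data follows. The dictionary layout is only an illustrative way to hold the rows.

```python
from collections import defaultdict
from statistics import mean

# Branch 1 sample data: one list of TARGETREACH values per PROD, 01/09 to 13/09/2014.
days = [f"{d:02d}/09/2014" for d in range(1, 14)]
targetreach = {
    1: [50, -10, 50, 50, 40, -10, -5, 0, 10, 20, 30, 40, 0],
    2: [20, 0, 0, 10, 20, 10, -10, 10, -10, 20, 20, 40, 0],
    3: [30, 30, 5, 0, 10, -10, -10, -10, 20, 10, 40, 10, 10],
}

# Step 1: collect all products' values per day and average them.
by_day = defaultdict(list)
for prod, series in targetreach.items():
    for day, reach in zip(days, series):
        by_day[day].append(reach)
daily_avg = {day: mean(vals) for day, vals in by_day.items()}

# Step 2: count the days whose average is strictly positive.
positive_days = sum(1 for avg in daily_avg.values() if avg > 0)

print(round(daily_avg["01/09/2014"], 2))  # 33.33
print(positive_days)                      # 10
```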

Generate start and end dates in a date range Oracle

I wrote the query below and get the following output. But the date ranges should not share boundary dates: each new quarter should start the day after the previous one ends.
SELECT x.* , end_dt-st_dt FROM
(SELECT 12-(LEVEL-1) AS Quater ,trunc(sysdate) - 90*LEVEL AS st_dt,trunc(sysdate) - 90*(LEVEL-1) AS end_dt
FROM dual
connect BY LEVEL <= 12
ORDER BY 1
) x
1 8/17/2011 11/15/2011 90
2 11/15/2011 2/13/2012 90
3 2/13/2012 5/13/2012 90
4 5/13/2012 8/11/2012 90
5 8/11/2012 11/9/2012 90
6 11/9/2012 2/7/2013 90
7 2/7/2013 5/8/2013 90
8 5/8/2013 8/6/2013 90
9 8/6/2013 11/4/2013 90
10 11/4/2013 2/2/2014 90
11 2/2/2014 5/3/2014 90
12 5/3/2014 8/1/2014 90
EXPECTED output :
....
...
10 11/2/2013 1/31/2014 90
11 2/1/2014 5/2/2014 90
12 5/3/2014 8/1/2014 90
Is this what you want? I am not sure.
SELECT x.* , end_dt-st_dt FROM
(SELECT 12-(LEVEL-1) AS Quater ,
(CASE WHEN ( trunc(sysdate) - 90*LEVEL = TO_DATE('17-AUG-11','DD-MON-YY'))
THEN trunc(sysdate) - 90*LEVEL
ELSE trunc(sysdate)+1 - 90*LEVEL
END) AS st_dt,trunc(sysdate) - 90*(LEVEL-1) AS end_dt
FROM dual
connect BY LEVEL <= 12
ORDER BY 1
) x;
My output:
1 17-AUG-11 15-NOV-11 90
2 16-NOV-11 13-FEB-12 89
3 14-FEB-12 13-MAY-12 89
4 14-MAY-12 11-AUG-12 89
5 12-AUG-12 09-NOV-12 89
6 10-NOV-12 07-FEB-13 89
7 08-FEB-13 08-MAY-13 89
8 09-MAY-13 06-AUG-13 89
9 07-AUG-13 04-NOV-13 89
10 05-NOV-13 02-FEB-14 89
11 03-FEB-14 03-MAY-14 89
12 04-MAY-14 01-AUG-14 89
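The expected output in the question keeps every range at 90 days and starts each one the day after the previous range ends, which means stepping back 91 days per level instead of 90. Here is a small Python sketch of that arithmetic, assuming the run date was 1 Aug 2014 (taken from the last end date in the output). The function name and parameters are illustrative. In the Oracle query, the equivalent would be replacing `90*(LEVEL-1)` with `91*(LEVEL-1)` for end_dt and subtracting a further 90 for st_dt.

```python
from datetime import date, timedelta

def quarter_ranges(end, periods=12, span_days=90):
    """Build (quarter, st_dt, end_dt) tuples, newest period ending at `end`."""
    ranges = []
    for level in range(1, periods + 1):
        # Step back span_days + 1 per level so ranges do not share a boundary date.
        end_dt = end - timedelta(days=(span_days + 1) * (level - 1))
        st_dt = end_dt - timedelta(days=span_days)
        ranges.append((periods - (level - 1), st_dt, end_dt))
    return sorted(ranges)

ranges = quarter_ranges(date(2014, 8, 1))
for quarter, st_dt, end_dt in ranges[-3:]:
    print(quarter, st_dt, end_dt, (end_dt - st_dt).days)
# 10 2013-11-02 2014-01-31 90
# 11 2014-02-01 2014-05-02 90
# 12 2014-05-03 2014-08-01 90
```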