How do I count a range in SQL?

I have data that looks like this:
$ Time : int 0 1 5 8 10 11 15 17 18 20 ...
$ NumOfFlights: int 1 6 144 91 504 15 1256 1 1 578 ...
The Time column is just 24-hour time, running from 0 all the way up to 2400.
What I hope to get is:
hour | number of flights
-------------------------------------
1st | 240
2nd | 223
... | ...
24th | 122
Here the 1st hour is from midnight to 1am, the 2nd is from 1am to 2am, and so on until finally the 24th, which is from 11pm to midnight. The number of flights is just the total of NumOfFlights within each range.
I've tried:
dbGetQuery(conn,"
SELECT
flights.CRSDepTime AS Time,
COUNT(flights.CRSDepTime) AS NumOnTimeFlights
FROM flights
GROUP BY CRSDepTime/60
")
But I realise it can't be done this way; the results I get have 40 distinct values for Time.
> head
Time NumOnTimeFlights
1 50 6055
2 105 2383
3 133 674
4 200 446
5 245 266
6 310 34
> tail
Time NumOnTimeFlights
35 2045 48136
36 2120 103229
37 2215 15737
38 2245 36416
39 2300 15322
40 2355 8018

If your CRSDepTime column is an integer-encoded time like HHmm, then CRSDepTime/100 will extract the hour.
SELECT
CRSDepTime/100 AS hh,
COUNT(flights.CRSDepTime) AS NumOnTimeFlights
FROM flights
GROUP BY CRSDepTime/100
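If the flights table instead stores pre-aggregated counts per departure time, as the NumOfFlights column in the question's sample suggests, then summing that column rather than counting rows gives the desired totals. A minimal sketch, assuming such a NumOfFlights column exists; adding 1 maps hours 0-23 onto the 1st-24th labels from the desired output:
SELECT
    CRSDepTime/100 + 1 AS hour,        -- integer division: 0059 -> hour 1, 2359 -> hour 24
    SUM(NumOfFlights) AS NumOfFlights  -- sum pre-aggregated counts instead of counting rows
FROM flights
GROUP BY CRSDepTime/100
ORDER BY hour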

Related

SQL percentage calculation over the hour

I have a table consisting of thousands of devices similar to the one below, and I want to calculate the time spent by the devices in certain locations as a percentage on an hourly basis using this table.
(Values are given as an example.)
device | geohash | gridtype | total_hour_count | total_day_count | avg_spent_hour
67a47cd76baff7e2 | sxk9g3 | Work | 500 | 25 | 20.00
67a47cd76baff7e2 | swy9g3 | Home | 590 | 27 | 18.00
67a47cd76baff7e2 | szbvfd | Other | 420 | 18 | 9.28
02d171810d7ae1f5 | swdvdf | Home | 274 | 30 | 18.54
02d171810d7ae1f5 | sdefvx | Work | 184 | 22 | 17.51
02d171810d7ae1f5 | dfvcxv | Other | 122 | 19 | 14.12
... | ... | ... | ... | ... | ...
As an example, the desired output:
deviceid | home_percent | work_percent | other_percent
67a47cd76baff7e2 | 35 | 35 | 30
02d171810d7ae1f5 | 50 | 25 | 25
784faeff1c8b76c1 | 90 | 5 | 5
28fa9ca3dfff8a6f | 80 | 10 | 10
f2f6324d5149e336 | 80 | 0 | 20
d84410d139981c19 | 25 | 50 | 25
... | ... | ... | ...
Thanks for your help.
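The excerpt ends without an answer here. One common approach is conditional aggregation: take each gridtype's share of the device's total total_hour_count. A sketch only, assuming the percentages are defined that way and assuming a table name of device_hours (the question does not give one):
SELECT
    device AS deviceid,
    -- each gridtype's hour count as a share of the device's total hours
    100.0 * SUM(CASE WHEN gridtype = 'Home'  THEN total_hour_count ELSE 0 END) / SUM(total_hour_count) AS home_percent,
    100.0 * SUM(CASE WHEN gridtype = 'Work'  THEN total_hour_count ELSE 0 END) / SUM(total_hour_count) AS work_percent,
    100.0 * SUM(CASE WHEN gridtype = 'Other' THEN total_hour_count ELSE 0 END) / SUM(total_hour_count) AS other_percent
FROM device_hours
GROUP BY device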

How to sum multiple columns ending in a certain word and keep the sums in a new column?

Date A ZONE_GEN A ZONE_LOAD B ZONE_GEN B ZONE_LOAD
1-1-2010 20 15 30 25
1-2-2010 30 25 40 35
.... ... ... ... ...
1-12-2010 15 20 20 14
I want to create two new columns named "Gen" and "Load", summing every column ending with "GEN" into the Gen column, and likewise every column ending with "LOAD" into the Load column.
I would like to get output as below:
Date Gen Load
1-1-2010 50 40
1-2-2010 70 60
...
1-12-2010 35 34
Try:
# group columns by the suffix after the last underscore ('GEN' or 'LOAD')
def f(c):
    return c.rsplit('_', 1)[1]

df.set_index('Date').groupby(f, axis=1).sum().reset_index()
Date GEN LOAD
0 1-1-2010 50 40
1 1-2-2010 70 60
2 1-12-2010 35 34
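Passing a function to groupby applies it to the labels of the chosen axis, here the column names, so every column sharing the same suffix after the last underscore is summed into a single output column. (Recent pandas versions deprecate groupby(..., axis=1); the same grouping can be done after transposing the frame.)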

In Azure Databricks I want to get the start date of every week, with week numbers, from a datetime column

This is a sample DataFrame:
Date Items_Sold
12/29/2019 10
12/30/2019 20
12/31/2019 30
1/1/2020 40
1/2/2020 50
1/3/2020 60
1/4/2020 35
1/5/2020 56
1/6/2020 34
1/7/2020 564
1/8/2020 6
1/9/2020 45
1/10/2020 56
1/11/2020 45
1/12/2020 37
1/13/2020 36
1/14/2020 479
1/15/2020 47
1/16/2020 47
1/17/2020 578
1/18/2020 478
1/19/2020 3578
1/20/2020 67
1/21/2020 578
1/22/2020 478
1/23/2020 4567
1/24/2020 7889
1/25/2020 8999
1/26/2020 99
1/27/2020 66
1/28/2020 678
1/29/2020 889
1/30/2020 990
1/31/2020 58585
2/1/2020 585
2/2/2020 555
2/3/2020 56
2/4/2020 66
2/5/2020 66
2/6/2020 6634
2/7/2020 588
2/8/2020 2588
2/9/2020 255
I am running this query
%sql
use my_items_table;
select weekofyear(Date), count(items_sold) as Sum
from my_items_table
where year(Date)=2020
group by weekofyear(Date)
order by weekofyear(Date)
I am getting this output. (IMP: I have added random values in Sum)
Week Sum
1 | 300091
2 | 312756
3 | 309363
4 | 307312
5 | 310985
6 | 296889
7 | 315611
But I also want a column that holds the start date of each week, next to the week number. Like this:
Start_Date Week Sum
12/29/2019 1 300091
1/5/2020 2 312756
1/12/2020 3 309363
1/19/2020 4 307312
1/26/2020 5 310985
2/2/2020 6 296889
2/9/2020 7 315611
I am running the query on Azure Databricks.
If you have data for all days, then just use min():
select min(date), weekofyear(Date), count(items_sold) as Sum
from my_items_table
where year(Date) = 2020
group by weekofyear(Date)
order by weekofyear(Date);
Note: year() is the calendar year starting on Jan 1, so you are not going to get dates from other years using this query. If that is an issue, I would suggest asking a new question about how to get the first day of the first week of the year.
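If some days are missing from the data, min(date) can land mid-week. A hedged variation is to snap each week's earliest date back to the preceding Sunday using date_sub() and dayofweek() (which returns 1 for Sunday in Spark SQL). This is a sketch assuming Sunday-start weeks, as in the desired output; weekofyear() itself follows the ISO Monday-start convention, so the two can disagree at week boundaries:
select date_sub(min(Date), dayofweek(min(Date)) - 1) as Start_Date,  -- snap back to the preceding Sunday
       weekofyear(Date) as Week,
       count(items_sold) as Sum
from my_items_table
where year(Date) = 2020
group by weekofyear(Date)
order by Week;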

Sum every 7 rows from a sales column, with ints representing n days away from the installation of promotion material (before and after the installation)

Two stores, each with its own daily sales data. Both get equipped with promotion material, but not on the same day. After the pr_day the promotion material stays in place, meaning there should be a sales boost from the day of the installation onwards.
Installation Date:
Store A - 05/15/2019
Store B - 05/17/2019
To see if the promotion was a success, we measure the sales before and after the pr-date by returning the number of sales (pieces sold, not revenue) next to an int indicating how far that day was from the pr-day (sum of sales from both stores):
pr_date| sales
-28 | 35
-27 | 40
-26 | 21
-25 | 36
-24 | 29
-23 | 36
-22 | 43
-21 | 31
-20 | 32
-19 | 21
-18 | 17
-17 | 34
-16 | 34
-15 | 37
-14 | 32
-13 | 29
-12 | 25
-11 | 45
-10 | 43
-9 | 26
-8 | 27
-7 | 33
-6 | 36
-5 | 17
-4 | 34
-3 | 33
-2 | 21
-1 | 28
1 | 16
2 | 6
3 | 16
4 | 29
5 | 32
6 | 30
7 | 30
8 | 30
9 | 17
10 | 12
11 | 35
12 | 30
13 | 15
14 | 28
15 | 14
16 | 16
17 | 13
18 | 27
19 | 22
20 | 34
21 | 33
22 | 22
23 | 13
24 | 35
25 | 28
26 | 19
27 | 17
28 | 29
You may have noticed that I already removed the day of the installation of the promotion material itself.
The issue starts with the different installation dates of the pr-material. If I group by week, it combines sales that are different distances from the installation; it just starts at whatever weekday SQL defines:
Select DATEDIFF(wk, change_date, sales_date), sum(sales)
from tbl_sales
group by DATEDIFF(wk, change_date, sales_date)
result:
week | sales
-4 | 75
-3 | 228
-2 | 204
-1 | 235
0 | 149
1 | 173
2 | 151
3 | 167
4 | 141
The numbers are not from the right days, and there is one week too many. I guess this comes from SQL grouping the sales into weeks starting on Sunday; because the pr_dates are different, it generates more than just the 8 weeks (4 before, 4 after).
Trying to find a sustainable solution, I couldn't find the right fit and decided to post it here. I'm very thankful for any thoughts from the community on this topic. I'm quite sure there is a smart solution, because this doesn't look like a rare request to me.
I tried it with OVER as well, but I don't see how to sum the 7 days together, as they are no longer calendar days but deltas to the pr-date.
Desired Result:
week | sales
-4 | 240
-3 | 206
-2 | 227
-1 | 202
1 | 159
2 | 167
3 | 159
4 | 163
The desired result above comes from my analysis by hand of what the results should be.
Why do I need the weekly summary? The stores perform differently depending on the weekday. By summing 7 days together I make sure we don't compare Mondays to Sundays, and so on. Furthermore, the result will be shown in a line or bar chart, where the weekday variation looks ugly and makes it hard to see the trend in the sales numbers, whereas the weekly comparison absorbs these variations.
If anything is unclear, please feel free to let me know so I can provide further details. Thank you very much.
Additionally, an overview of the different installation dates:
Shop A:
store A
delta date sales
-28 17.04.2019 20
-27 18.04.2019 20
-26 19.04.2019 13
-25 20.04.2019 25
-24 21.04.2019 16
-23 22.04.2019 20
-22 23.04.2019 26
-21 24.04.2019 15
-20 25.04.2019 20
-19 26.04.2019 13
-18 27.04.2019 13
-17 28.04.2019 20
-16 29.04.2019 21
-15 30.04.2019 20
-14 01.05.2019 17
-13 02.05.2019 13
-12 03.05.2019 9
-11 04.05.2019 34
-10 05.05.2019 28
-9 06.05.2019 19
-8 07.05.2019 14
-7 08.05.2019 23
-6 09.05.2019 18
-5 10.05.2019 9
-4 11.05.2019 22
-3 12.05.2019 17
-2 13.05.2019 14
-1 14.05.2019 19
0 15.05.2019 11
1 16.05.2019 0
2 17.05.2019 0
3 18.05.2019 1
4 19.05.2019 19
5 20.05.2019 18
6 21.05.2019 14
7 22.05.2019 11
8 23.05.2019 12
9 24.05.2019 8
10 25.05.2019 7
11 26.05.2019 19
12 27.05.2019 15
13 28.05.2019 15
14 29.05.2019 11
15 30.05.2019 5
16 31.05.2019 8
17 01.06.2019 10
18 02.06.2019 19
19 03.06.2019 14
20 04.06.2019 21
21 05.06.2019 22
22 06.06.2019 7
23 07.06.2019 6
24 08.06.2019 23
25 09.06.2019 17
26 10.06.2019 9
27 11.06.2019 8
28 12.06.2019 23
Shop B:
store B
delta date sales
-28 19.04.2019 15
-27 20.04.2019 20
-26 21.04.2019 8
-25 22.04.2019 11
-24 23.04.2019 13
-23 24.04.2019 16
-22 25.04.2019 17
-21 26.04.2019 16
-20 27.04.2019 12
-19 28.04.2019 8
-18 29.04.2019 4
-17 30.04.2019 14
-16 01.05.2019 13
-15 02.05.2019 17
-14 03.05.2019 15
-13 04.05.2019 16
-12 05.05.2019 16
-11 06.05.2019 11
-10 07.05.2019 15
-9 08.05.2019 7
-8 09.05.2019 13
-7 10.05.2019 10
-6 11.05.2019 18
-5 12.05.2019 8
-4 13.05.2019 12
-3 14.05.2019 16
-2 15.05.2019 7
-1 16.05.2019 9
0 17.05.2019 9
1 18.05.2019 16
2 19.05.2019 6
3 20.05.2019 15
4 21.05.2019 10
5 22.05.2019 14
6 23.05.2019 16
7 24.05.2019 19
8 25.05.2019 18
9 26.05.2019 9
10 27.05.2019 5
11 28.05.2019 16
12 29.05.2019 15
13 30.05.2019 17
14 31.05.2019 9
15 01.06.2019 8
16 02.06.2019 3
17 03.06.2019 8
18 04.06.2019 8
19 05.06.2019 13
20 06.06.2019 11
21 07.06.2019 15
22 08.06.2019 7
23 09.06.2019 12
24 10.06.2019 11
25 11.06.2019 10
26 12.06.2019 9
27 13.06.2019 6
28 14.06.2019 9
Try:
select wk, sum(sales)
from (
    select
        isnull(sa.sales, 0) + isnull(sb.sales, 0) sales
        , isnull(sa.delta, sb.delta) delta
        , case when isnull(sa.delta, sb.delta) = 0 then 0
               else case when isnull(sa.delta, sb.delta) > 0 then (isnull(sa.delta, sb.delta) - 1) / 7 + 1
                         else (isnull(sa.delta, sb.delta) + 1) / 7 - 1
                    end
          end wk
    from shopA sa
    full join shopB sb on sa.delta = sb.delta
) t
group by wk;
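With integer division this maps deltas 1..7 to week 1 ((1-1)/7+1 = 1 up to (7-1)/7+1 = 1), deltas 8..14 to week 2, and deltas -1..-7 to week -1 ((-1+1)/7-1 = -1 down to (-7+1)/7-1 = -1, since T-SQL integer division truncates toward zero), matching the desired 4-weeks-before/4-weeks-after buckets.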
A more readable version (it doesn't run any faster): using CROSS APPLY this way allows introducing a sort of intermediate variable for cleaner code.
select wk, sum(sales)
from (
    select
        isnull(sa.sales, 0) + isnull(sb.sales, 0) sales
        , dlt delta
        , case when dlt = 0 then 0
               else case when dlt > 0 then (dlt - 1) / 7 + 1
                         else (dlt + 1) / 7 - 1
                    end
          end wk
    from shopA sa
    full join shopB sb on sa.delta = sb.delta
    cross apply (
        select dlt = isnull(sa.delta, sb.delta)
    ) tmp
) t
group by wk;
Finally, if you already have a query which produces a dataset with the (pr_date, sales) columns:
select wk, sum(sales)
from (
    select sales
        , case when pr_date = 0 then 0
               else case when pr_date > 0 then (pr_date - 1) / 7 + 1
                         else (pr_date + 1) / 7 - 1
                    end
          end wk
    from (
        -- ... your query here ...
    ) pr_date_sales
) t
group by wk;
I think you just need to take the day difference and use arithmetic. Using datediff() with week counts week boundaries, which is not what you want; that is, it normalizes the weeks to calendar weeks.
You want to leave out the day of the promotion, which makes this a wee bit more complicated.
I think this is the logic:
Select v.week_diff, sum(sales)
from tbl_sales s cross apply
     (values (case when change_date < sales_date
                   then (datediff(day, change_date, sales_date) + 1) / 7
                   else (datediff(day, change_date, sales_date) - 1) / 7
              end)
     ) v(week_diff)
where change_date <> sales_date
group by v.week_diff;
There might be an off-by-one problem, depending on what you really want to do when the dates are the same.

Exclude the specific kind of record

I am using SQL Server 2008 R2. I have records like the ones below in a table:
Id Sys Dia Type UniqueId
1 156 20 first 12345
2 157 20 first 12345
3 150 15 last 12345
4 160 17 Average 12345
5 150 15 additional 12345
6 157 35 last 891011
7 156 25 Average 891011
8 163 35 last 789521
9 145 25 Average 789521
10 156 20 first 963215
11 150 15 last 963215
12 160 17 Average 963215
13 156 20 first 456878
14 157 20 first 456878
15 150 15 last 456878
16 160 17 Average 456878
17 150 15 last 246977
18 160 17 Average 246977
19 150 15 additional 246977
These records form groups sharing a common UniqueId. A record can be of type "first", "last", "average", or "additional". From these records I want to select the "average" type records only if their group also has a "first" or "additional" reading; otherwise I want to exclude them from the selection.
The expected result is :
Id Sys Dia Type UniqueId
1 156 20 first 12345
2 157 20 first 12345
3 150 15 last 12345
4 160 17 Average 12345
5 150 15 additional 12345
6 157 35 last 891011
7 163 35 last 789521
8 156 20 first 963215
9 150 15 last 963215
10 160 17 Average 963215
11 156 20 first 456878
12 157 20 first 456878
13 150 15 last 456878
14 160 17 Average 456878
15 150 15 last 246977
16 160 17 Average 246977
17 150 15 additional 246977
In short, I don't want to select records that have type "Average" when there are only "last" type records with the same UniqueId. Any solution?
Using the EXISTS operator with a correlated subquery:
SELECT * FROM dbo.Table1 t1
WHERE [Type] != 'Average'
   OR EXISTS (SELECT * FROM Table1 t2
              WHERE t1.UniqueId = t2.UniqueId
                AND t1.[Type] = 'Average'
                AND t2.[Type] IN ('first', 'additional'))
Try something like this:
SELECT * FROM MyTable WHERE [Type] <> 'Average'
UNION ALL
SELECT * FROM MyTable T WHERE [Type] = 'Average'
AND EXISTS (SELECT * FROM MyTable
WHERE [Type] IN ('first', 'additional')
AND UniqueId = T.UniqueId)
The first SELECT statement gets all records except the ones with Type = 'Average'. The second SELECT statement gets only the Type = 'Average' records whose UniqueId also appears on at least one record of type 'first' or 'additional'.
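Since SQL Server 2008 R2 supports windowed aggregates, the same filter can also be expressed without a correlated subquery by counting the qualifying rows per group. A sketch, reusing the hypothetical MyTable name from the answer above:
SELECT Id, Sys, Dia, [Type], UniqueId
FROM (
    SELECT *,
           -- per UniqueId, count rows of type 'first' or 'additional'
           SUM(CASE WHEN [Type] IN ('first', 'additional') THEN 1 ELSE 0 END)
               OVER (PARTITION BY UniqueId) AS qualifying
    FROM MyTable
) t
WHERE [Type] <> 'Average' OR qualifying > 0;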