Get temperature from live data if available, else avg over historical data - sql

I am trying to get the live temperature for a trip; if live data is not available, I want an average temperature from historical data.
I have made a simple version of my problem, with these tables:
Trip
id departure_time arrival_time location_id
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1
Location
id name
1 Location
Weather
id temperature date location_id
1 20 2018-04-07 1
2 20 2018-04-08 1
3 20 2018-04-09 1
4 20 2018-04-10 1
5 20 2018-04-11 1
6 20 2018-04-12 1
7 20 2018-04-13 1
8 20 2018-04-14 1
9 15 2016-04-07 1
10 15 2016-04-08 1
11 15 2016-04-09 1
12 15 2016-04-10 1
13 15 2016-04-11 1
14 15 2016-04-12 1
15 15 2016-04-13 1
16 15 2016-04-14 1
17 19 2017-04-07 1
18 19 2017-04-08 1
19 19 2017-04-09 1
20 19 2017-04-10 1
21 19 2017-04-11 1
22 19 2017-04-12 1
23 19 2017-04-13 1
24 19 2017-04-14 1
25 15 2017-04-15 1
26 15 2017-04-16 1
27 15 2017-04-17 1
28 15 2017-04-18 1
29 15 2017-04-19 1
30 15 2017-04-20 1
31 15 2017-04-21 1
32 19 2016-04-15 1
33 19 2016-04-16 1
34 19 2016-04-17 1
35 19 2016-04-18 1
36 19 2016-04-19 1
37 19 2016-04-20 1
38 19 2016-04-21 1
The problem I am having is that, since these are last-minute trips, I only have "live" data for trips departing within the next week.
So I would like to get the live forecast if available, else an average temperature over the previous years.
http://sqlfiddle.com/#!17/bce59/3
Here is the approach I took to try to solve the problem.
If any details have been left out, please ask.
Expected result:
id departure_time arrival_time location_id temperature
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
1 2018-04-07 07:00:00 2018-04-14 17:00:00 1 20
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 20
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17
2 2018-04-14 07:00:00 2018-04-21 17:00:00 1 17

Use the generate_series function in a subquery to build a calendar (one row per day) from the trip table.
Then LEFT JOIN the weather table on those dates, so each day that has a matching weather row gets its temperature. If w.temperature is NULL, fall back to the average temperature.
You can try this.
SELECT t.id,
t.departure_time,
t.arrival_time,
l.id as "location_id",
coalesce(w.temperature,(select FLOOR(avg(temperature)) from weather)) as "temperature"
FROM
location l inner join
(
select id,
location_id,
departure_time,
arrival_time,
generate_series(departure_time :: timestamp,arrival_time::timestamp,'1 day'::interval) as dates
from trip
) t on t.location_id = l.id LEFT JOIN weather w on t.dates::date = w.date::date
and w.location_id = t.location_id -- also match the location, not just the date
SQL Fiddle: http://sqlfiddle.com/#!17/bce59/48
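A quick check of why this fallback comes out as 17 on the sample data: the scalar subquery averages every row in weather, across all years (including the 2018 "live" rows) and regardless of location. The Python sketch below is only an illustration, with the rows re-typed from the Weather table in the question, that reproduces FLOOR(AVG(temperature)):

```python
import math

# Weather rows for location 1, re-typed from the question as (temperature, date) pairs
weather = ([(20, f"2018-04-{d:02d}") for d in range(7, 15)]      # "live" rows
           + [(15, f"2016-04-{d:02d}") for d in range(7, 15)]
           + [(19, f"2017-04-{d:02d}") for d in range(7, 15)]
           + [(15, f"2017-04-{d:02d}") for d in range(15, 22)]
           + [(19, f"2016-04-{d:02d}") for d in range(15, 22)])

total = sum(t for t, _ in weather)
# Equivalent of: SELECT FLOOR(AVG(temperature)) FROM weather
fallback = math.floor(total / len(weather))
print(len(weather), total, fallback)  # 38 670 17
```

Because the live rows are part of that average, adding more live data will shift the fallback value.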
EDIT
You could use a CTE to get the average by year, instead of the subquery inside COALESCE in the SELECT clause.
WITH weather_avg AS (
SELECT floor(avg(a)) avgTemp
from
(
SELECT
extract(YEAR from weather.date) AS YEAR,
floor(avg(weather.temperature)) a
FROM weather
group by extract(YEAR from weather.date)
) t
)
SELECT t.id,
t.departure_time,
t.arrival_time,
t.location_id as "location_id",
coalesce(w.temperature,(select avgTemp from weather_avg)) as "temperature"
FROM
(
select t.id,
t.location_id,
t.departure_time,
t.arrival_time,
generate_series(departure_time :: timestamp,arrival_time::timestamp,'1 day'::interval) as dates
from trip t inner join location l on t.location_id = l.id
) t LEFT JOIN weather w
on t.dates::date = w.date::date
and w.location_id = t.location_id -- also match the location, not just the date
SQL Fiddle: http://sqlfiddle.com/#!17/bce59/76
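Both queries use a single table-wide number as the fallback (the EDIT's average of yearly averages, floor of the mean of 16, 17 and 20, also comes out as 17 here). If the fallback should instead be the average of earlier years for the same location and calendar day, which reproduces the expected 20 followed by seven 17s for trip 2, the sketch below shows one way to do it. It is an assumption-based port, not the answer's code: it runs on SQLite through Python's sqlite3 module, a recursive CTE stands in for PostgreSQL's generate_series, and matching on strftime('%m-%d', ...) is my interpretation of "average over the previous years".

```python
import sqlite3

# In-memory copy of the question's sample data (SQLite, not PostgreSQL)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trip (id INTEGER, departure_time TEXT, arrival_time TEXT, location_id INTEGER);
CREATE TABLE weather (temperature REAL, date TEXT, location_id INTEGER);
INSERT INTO trip VALUES
  (1, '2018-04-07 07:00:00', '2018-04-14 17:00:00', 1),
  (2, '2018-04-14 07:00:00', '2018-04-21 17:00:00', 1);
""")

def add_weather(year, first_day, last_day, temp):
    """Insert one row per April day in [first_day, last_day] for location 1."""
    rows = [(temp, f"{year}-04-{d:02d}", 1) for d in range(first_day, last_day + 1)]
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", rows)

add_weather(2018, 7, 14, 20)   # "live" forecast rows
add_weather(2016, 7, 14, 15)
add_weather(2017, 7, 14, 19)
add_weather(2017, 15, 21, 15)
add_weather(2016, 15, 21, 19)

QUERY = """
WITH RECURSIVE trip_day(id, location_id, day, last_day) AS (
    -- one row per calendar day of each trip (stand-in for generate_series)
    SELECT id, location_id, date(departure_time), date(arrival_time) FROM trip
    UNION ALL
    SELECT id, location_id, date(day, '+1 day'), last_day
    FROM trip_day WHERE day < last_day
)
SELECT d.id, d.day,
       COALESCE(
           live.temperature,                  -- live data for that exact date
           (SELECT AVG(h.temperature)         -- else same month-day in earlier years
              FROM weather h
             WHERE h.location_id = d.location_id
               AND strftime('%m-%d', h.date) = strftime('%m-%d', d.day)
               AND h.date < date(d.day, 'start of year'))
       ) AS temperature
FROM trip_day d
LEFT JOIN weather live
       ON live.location_id = d.location_id AND live.date = d.day
ORDER BY d.id, d.day
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

In PostgreSQL the month-day comparison could be written with to_char(date, 'MM-DD') inside the same COALESCE.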

Related

How to count the number of campaigns per day based on the start and end dates of the campaigns in SQL

I need to count the number of campaigns per day based on the start and end dates of the campaigns
Input Table:
Campaign name  Start date  End date
Campaign A     2022-07-10  2022-09-25
Campaign B     2022-08-06  2022-10-07
Campaign C     2022-07-30  2022-09-10
Campaign D     2022-08-26  2022-10-24
Campaign E     2022-07-17  2022-09-29
Campaign F     2022-08-24  2022-09-12
Campaign G     2022-08-11  2022-10-24
Campaign H     2022-08-26  2022-11-22
Campaign I     2022-08-29  2022-09-25
Campaign J     2022-08-21  2022-11-15
Campaign K     2022-07-20  2022-09-18
Campaign L     2022-07-31  2022-11-20
Campaign M     2022-08-17  2022-10-10
Campaign N     2022-07-27  2022-09-07
Campaign O     2022-07-29  2022-09-26
Campaign P     2022-07-06  2022-09-15
Campaign Q     2022-07-16  2022-09-22
Output needed (result):
Date        Count unique campaigns
2022-07-02  17
2022-07-03  47
2022-07-04  5
2022-07-05  5
2022-07-06  25
2022-07-07  27
2022-07-08  17
2022-07-09  58
2022-07-10  23
2022-07-11  53
2022-07-12  18
2022-07-13  29
2022-07-14  52
2022-07-15  7
2022-07-16  17
2022-07-17  37
2022-07-18  33
How should I write the SQL query to get the above result? Thanks, all.
In the following solutions we use string_split in combination with replicate to generate one row per day of each campaign.
select dt as date
,count(*) as Count_unique_campaigns
from
(
select *
,dateadd(day, row_number() over(partition by Campaign_name order by (select null))-1, Start_date) as dt
from (
select *
from t
outer apply string_split(replicate(',',datediff(day, Start_date, End_date)),',')
) t
) t
group by dt
order by dt
date        Count_unique_campaigns
2022-07-06  1
2022-07-07  1
2022-07-08  1
2022-07-09  1
2022-07-10  2
2022-07-11  2
2022-07-12  2
2022-07-13  2
2022-07-14  2
2022-07-15  2
2022-07-16  3
2022-07-17  4
2022-07-18  4
2022-07-19  4
2022-07-20  5
2022-07-21  5
2022-07-22  5
2022-07-23  5
2022-07-24  5
2022-07-25  5
2022-07-26  5
2022-07-27  6
2022-07-28  6
2022-07-29  7
2022-07-30  8
2022-07-31  9
2022-08-01  9
2022-08-02  9
2022-08-03  9
2022-08-04  9
2022-08-05  9
2022-08-06  10
2022-08-07  10
2022-08-08  10
2022-08-09  10
2022-08-10  10
2022-08-11  11
2022-08-12  11
2022-08-13  11
2022-08-14  11
2022-08-15  11
2022-08-16  11
2022-08-17  12
2022-08-18  12
2022-08-19  12
2022-08-20  12
2022-08-21  13
2022-08-22  13
2022-08-23  13
2022-08-24  14
2022-08-25  14
2022-08-26  16
2022-08-27  16
2022-08-28  16
2022-08-29  17
2022-08-30  17
2022-08-31  17
2022-09-01  17
2022-09-02  17
2022-09-03  17
2022-09-04  17
2022-09-05  17
2022-09-06  17
2022-09-07  17
2022-09-08  16
2022-09-09  16
2022-09-10  16
2022-09-11  15
2022-09-12  15
2022-09-13  14
2022-09-14  14
2022-09-15  14
2022-09-16  13
2022-09-17  13
2022-09-18  13
2022-09-19  12
2022-09-20  12
2022-09-21  12
2022-09-22  12
2022-09-23  11
2022-09-24  11
2022-09-25  11
2022-09-26  9
2022-09-27  8
2022-09-28  8
2022-09-29  8
2022-09-30  7
2022-10-01  7
2022-10-02  7
2022-10-03  7
2022-10-04  7
2022-10-05  7
2022-10-06  7
2022-10-07  7
2022-10-08  6
2022-10-09  6
2022-10-10  6
2022-10-11  5
2022-10-12  5
2022-10-13  5
2022-10-14  5
2022-10-15  5
2022-10-16  5
2022-10-17  5
2022-10-18  5
2022-10-19  5
2022-10-20  5
2022-10-21  5
2022-10-22  5
2022-10-23  5
2022-10-24  5
2022-10-25  3
2022-10-26  3
2022-10-27  3
2022-10-28  3
2022-10-29  3
2022-10-30  3
2022-10-31  3
2022-11-01  3
2022-11-02  3
2022-11-03  3
2022-11-04  3
2022-11-05  3
2022-11-06  3
2022-11-07  3
2022-11-08  3
2022-11-09  3
2022-11-10  3
2022-11-11  3
2022-11-12  3
2022-11-13  3
2022-11-14  3
2022-11-15  3
2022-11-16  2
2022-11-17  2
2022-11-18  2
2022-11-19  2
2022-11-20  2
2022-11-21  1
2022-11-22  1
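Why the string_split trick yields one row per active day: replicate(',', n), with n = datediff(day, Start_date, End_date), builds a string of n commas, and splitting it produces n + 1 empty pieces. Numbering those pieces from 0 and adding the number to Start_date therefore covers every day from start to end inclusive. Below is a Python sketch of the same arithmetic (an illustration only, not T-SQL), using the campaign dates from the question; a few of its counts can be compared with the table above.

```python
from collections import Counter
from datetime import date, timedelta

# (campaign, start, end) copied from the question's input table
campaigns = [
    ("A", "2022-07-10", "2022-09-25"), ("B", "2022-08-06", "2022-10-07"),
    ("C", "2022-07-30", "2022-09-10"), ("D", "2022-08-26", "2022-10-24"),
    ("E", "2022-07-17", "2022-09-29"), ("F", "2022-08-24", "2022-09-12"),
    ("G", "2022-08-11", "2022-10-24"), ("H", "2022-08-26", "2022-11-22"),
    ("I", "2022-08-29", "2022-09-25"), ("J", "2022-08-21", "2022-11-15"),
    ("K", "2022-07-20", "2022-09-18"), ("L", "2022-07-31", "2022-11-20"),
    ("M", "2022-08-17", "2022-10-10"), ("N", "2022-07-27", "2022-09-07"),
    ("O", "2022-07-29", "2022-09-26"), ("P", "2022-07-06", "2022-09-15"),
    ("Q", "2022-07-16", "2022-09-22"),
]

per_day = Counter()
for name, start_s, end_s in campaigns:
    start, end = date.fromisoformat(start_s), date.fromisoformat(end_s)
    n = (end - start).days                    # datediff(day, Start_date, End_date)
    pieces = ("," * n).split(",")             # replicate(',', n) split on ',' -> n + 1 pieces
    for offset in range(len(pieces)):         # row_number() - 1, or ordinal - 1
        per_day[start + timedelta(days=offset)] += 1   # dateadd(day, offset, Start_date)

for day in sorted(per_day)[:5]:
    print(day, per_day[day])
```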
For Azure SQL and SQL Server 2022 there is a cleaner solution based on the ordinal output column of string_split:
"The enable_ordinal argument and ordinal output column are currently
supported in Azure SQL Database, Azure SQL Managed Instance, and Azure
Synapse Analytics (serverless SQL pool only). Beginning with SQL
Server 2022 (16.x) Preview, the argument and output column are
available in SQL Server."
select dt as date
,count(*) as Count_unique_campaigns
from
(
select *
,dateadd(day, ordinal-1, Start_date) as dt
from (
select *
from t
outer apply string_split(replicate(',',datediff(day, Start_date, End_date)),',', 1)
) t
) t
group by dt
order by dt
Your sample data doesn't seem to match your desired results, but I think what you're after is this:
DECLARE @Start date, @End date;
-- first, find the earliest and last date:
SELECT @Start = MIN([Start date]), @End = MAX([End date])
FROM dbo.Campaigns;
-- now use a recursive CTE to build a date range,
-- and count the number of campaigns that have a row
-- where the campaign was active on that date:
WITH d(d) AS
(
SELECT @Start
UNION ALL
SELECT DATEADD(DAY, 1, d) FROM d WHERE d < @End
)
SELECT
[Date] = d,
[Count unique campaigns] = COUNT(*)
FROM d
INNER JOIN dbo.Campaigns AS c
ON d.d >= c.[Start date] AND d.d <= c.[End date]
GROUP BY d.d OPTION (MAXRECURSION 32767);
Working example in this fiddle.
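The same calendar-table idea can be tried without SQL Server. The sketch below is a port to SQLite via Python's sqlite3 module, using only three of the question's campaigns (A, P and Q) and a hypothetical campaigns table with start_date/end_date columns. SQLite's WITH RECURSIVE plays the role of the recursive CTE above; SQLite has no MAXRECURSION option, and the end date is carried along as a column instead of a variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE campaigns (name TEXT, start_date TEXT, end_date TEXT);
INSERT INTO campaigns VALUES
  ('Campaign A', '2022-07-10', '2022-09-25'),
  ('Campaign P', '2022-07-06', '2022-09-15'),
  ('Campaign Q', '2022-07-16', '2022-09-22');
""")

QUERY = """
WITH RECURSIVE d(day, last_day) AS (
    -- calendar from the earliest start to the latest end
    SELECT MIN(start_date), MAX(end_date) FROM campaigns
    UNION ALL
    SELECT date(day, '+1 day'), last_day FROM d WHERE day < last_day
)
SELECT d.day, COUNT(*) AS active_campaigns
FROM d
JOIN campaigns c ON d.day BETWEEN c.start_date AND c.end_date   -- inclusive range
GROUP BY d.day
ORDER BY d.day
"""
counts = dict(conn.execute(QUERY).fetchall())
print(counts["2022-07-06"], counts["2022-07-16"], counts["2022-09-25"])
```

ISO-formatted date strings compare correctly as text, which is why BETWEEN works here without converting them.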

Filter rows of a table based on a condition that involves: 1) the value of a field within a range, 2) the id of the business, and 3) the date?

I want to filter TableA, keeping only the rows whose "TotalInvoice" value lies between the minimum and maximum values in ViewB for the same month, year and RepairShopId (the sample data has only one RepairShopId, but the full data has multiple IDs).
In the view I have minimum and maximum values for each business and each month and year.
TableA
RepairOrderDataId RepairShopId LastUpdated TotalInvoice
1 10 2017-06-01 07:00:00.000 765
1 10 2017-06-05 12:15:00.000 765
2 10 2017-02-25 13:00:00.000 400
3 10 2017-10-19 12:15:00.000 295679
4 10 2016-11-29 11:00:00.000 133409.41
5 10 2016-10-28 12:30:00.000 127769
6 10 2016-11-25 16:15:00.000 122400
7 10 2016-10-18 11:15:00.000 1950
8 10 2016-11-07 16:45:00.000 79342.7
9 10 2016-11-25 19:15:00.000 1950
10 10 2016-12-09 14:00:00.000 111559
11 10 2016-11-28 10:30:00.000 106333
12 10 2016-12-13 18:00:00.000 23847.4
13 10 2016-11-01 17:00:00.000 22782.9
14 10 2016-10-07 15:30:00.000 NULL
15 10 2017-01-06 15:30:00.000 138958
16 10 2017-01-31 13:00:00.000 244484
17 10 2016-12-05 09:30:00.000 180236
18 10 2017-02-14 18:30:00.000 92752.6
19 10 2016-10-05 08:30:00.000 161952
20 10 2016-10-05 08:30:00.000 8713.08
ViewB
RepairShopId Orders Average MinimumValue MaximumValue year month yearMonth
10 1 370343 370343 370343 2015 7 2015-7
10 1 109645 109645 109645 2015 10 2015-10
10 1 148487 148487 148487 2015 12 2015-12
10 1 133409.41 133409.41 133409.41 2016 3 2016-3
10 1 19261 19261 19261 2016 8 2016-8
10 4 10477.3575 2656.65644879821 18298.0585512018 2016 9 2016-9
10 69 15047.709565 10 90942.6052417394 2016 10 2016-10
10 98 22312.077244 10 147265.581935242 2016 11 2016-11
10 96 20068.147395 10 99974.1750708773 2016 12 2016-12
10 86 25334.053372 10 184186.985160105 2017 1 2017-1
10 69 21410.63855 10 153417.00126689 2017 2 2017-2
10 100 13009.797 10 59002.3589332934 2017 3 2017-3
10 101 11746.191287 10 71405.3391452842 2017 4 2017-4
10 123 11143.49756 10 55306.8202091131 2017 5 2017-5
10 197 15980.55406 10 204538.144334771 2017 6 2017-6
10 99 10852.496969 10 63283.9899761938 2017 7 2017-7
10 131 52601.981526 10 1314998.61355187 2017 8 2017-8
10 124 10983.221854 10 59444.0535811233 2017 9 2017-9
10 115 12467.148434 10 72996.6054527277 2017 10 2017-10
10 123 14843.379593 10 129673.931373139 2017 11 2017-11
10 111 8535.455945 10 50328.1495501884 2017 12 2017-12
I've tried:
SELECT *
FROM TableA
INNER JOIN ViewB ON TableA.RepairShopId = ViewB.RepairShopId
WHERE TotalInvoice > MinimumValue AND TotalInvoice < MaximumValue
AND TableA.RepairShopId = ViewB.RepairShopId
But I'm not sure how to compare the yearMonth field with the datetime field "LastUpdated".
Any help is very appreciated!
Here is how you can do it.
I assumed LastUpdated is the TableA column whose year and month should match ViewB's year and month columns:
SELECT *
FROM TableA A
INNER JOIN ViewB B
ON A.RepairShopId = B.RepairShopId
AND A.TotalInvoice > B.MinimumValue
AND A.TotalInvoice < B.MaximumValue
AND YEAR(LastUpdated) = B.year
AND MONTH(LastUpdated) = B.month
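A runnable sketch of this join, using SQLite through Python's sqlite3 module (so YEAR() and MONTH() become strftime('%Y', ...) and strftime('%m', ...) cast to integers) and a handful of rows copied from the question. Comparing the integer year and month columns avoids parsing yearMonth, which is not zero-padded (2016-9 versus 2016-10). Row 14, whose TotalInvoice is NULL, drops out because comparisons with NULL are never true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (RepairOrderDataId INTEGER, RepairShopId INTEGER,
                     LastUpdated TEXT, TotalInvoice REAL);
CREATE TABLE ViewB (RepairShopId INTEGER, MinimumValue REAL, MaximumValue REAL,
                    year INTEGER, month INTEGER);
-- subset of the rows from the question
INSERT INTO TableA VALUES
  (3,  10, '2017-10-19 12:15:00.000', 295679),
  (4,  10, '2016-11-29 11:00:00.000', 133409.41),
  (6,  10, '2016-11-25 16:15:00.000', 122400),
  (14, 10, '2016-10-07 15:30:00.000', NULL),
  (19, 10, '2016-10-05 08:30:00.000', 161952),
  (20, 10, '2016-10-05 08:30:00.000', 8713.08);
INSERT INTO ViewB VALUES
  (10, 10, 90942.6052417394, 2016, 10),
  (10, 10, 147265.581935242, 2016, 11),
  (10, 10, 72996.6054527277, 2017, 10);
""")

QUERY = """
SELECT A.RepairOrderDataId, A.TotalInvoice
FROM TableA A
INNER JOIN ViewB B
        ON A.RepairShopId = B.RepairShopId
       AND A.TotalInvoice > B.MinimumValue
       AND A.TotalInvoice < B.MaximumValue
       AND CAST(strftime('%Y', A.LastUpdated) AS INTEGER) = B.year    -- YEAR(LastUpdated)
       AND CAST(strftime('%m', A.LastUpdated) AS INTEGER) = B.month   -- MONTH(LastUpdated)
ORDER BY A.RepairOrderDataId
"""
kept = [row[0] for row in conn.execute(QUERY)]
print(kept)
```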

My SQL Query is working on one date, but I want start date to end date

I am using SQL Server 2005
I have two tables:
CheckInOut
TR BadgeNum USERID Dated Time CHECKTYPE
------- --------- ------ ----------------------- ----------------------- ----------
2337334 4 1 2018-04-01 00:00:00.000 2018-04-14 10:10:58.000 I
2337334 4 1 2018-04-01 00:00:00.000 2018-04-14 18:10:00.000 O
2337334 4 1 2018-04-02 00:00:00.000 2018-04-14 10:00:10.000 I
2337335 4 1 2018-04-02 00:00:00.000 2018-04-14 18:14:27.000 O
2337336 4 1 2018-04-03 00:00:00.000 2018-04-14 10:22:10.000 I
2337334 4 1 2018-04-03 00:00:00.000 2018-04-14 18:03:11.000 O
2337337 44 5 2018-04-01 00:00:00.000 2018-04-14 09:27:03.000 I
2337337 44 5 2018-04-01 00:00:00.000 2018-04-14 18:27:42.000 O
2337337 44 5 2018-04-02 00:00:00.000 2018-04-14 10:00:50.000 I
2337337 44 5 2018-04-02 00:00:00.000 2018-04-14 18:02:25.000 O
2337337 44 5 2018-04-03 00:00:00.000 2018-04-14 08:58:36.000 I
2337337 44 5 2018-04-03 00:00:00.000 2018-04-14 18:12:18.000 O
UserInfo
Tr UserID BadgeNumber Name
----- ------- ----------- --------------
13652 44 5 SAMIA NAZ
13653 4 1 Waqar Yousufzai
I need to calculate presence hours for each user for each day. My query below works fine for a single day, but I need to calculate it for a date range. How do I get the expected result?
Select isnull(max(ch.userid), 0)As 'ID'
,isnull(max(ch.badgenum), 0)as 'Badge#'
,isnull(max(convert(Char(10), ch.dated, 103)), '00:00')as 'Date'
,isnull(max(ui.name),'Empty')as 'Name'
,isnull(min(convert(VARCHAR(26), ch.time, 108)), '00:00') as 'Time In'
,case when min(ch.time) = max(ch.time) then '' else isnull(max(convert(VARCHAR(26), ch.time, 108)), '00:00') end as 'TimeOut'
,case when min(ch.time) = max(ch.time) then 'Absent' else 'Present' end as 'Status'
,isnull(CONVERT(varchar(3),DATEDIFF(minute,min(ch.time), max(ch.time))/60) + ' hrs and ' +
RIGHT('0' + CONVERT(varchar(2),DATEDIFF(minute,min(ch.time),max(ch.time))%60),2) + 'Min' , 0) as 'Total Hrs'
From CHECKINOUT ch left Join userinfo ui on ch.badgenum = ui.badgenumber
Where ch.Dated between '2018-04-01' and '2018-04-03' GROUP BY ch.badgenum
Query result
ID Badge# Date Name Time In TimeOut Status Total Hrs
--- ------ ---------- --------------- -------- ---------- -------- -----------------
4 1 03/04/2018 Waqar Yousufzai 11:33:34 18:24:23 Present 30 hrs and 14Min
82 3 03/04/2018 TANVEER ANSARI 09:37:14 19:18:22 Present 32 hrs and 37Min
13 4 03/04/2018 07:19:26 09:30:17 Present 21 hrs and 49Min
44 5 03/04/2018 SAMIA NAZ 08:53:15 18:25:21 Present 33 hrs and 24Min
28 7 03/04/2018 Anees Ahmad 08:34:57 22:00:38 Present 61 hrs and 25Min
46 8 03/04/2018 Shazia - OT 08:10:41 16:15:05 Present 32 hrs and 01Min
Expected result
ID Badge# Date Name Time In TimeOut Status Total Hrs
--- ------ ---------- --------------- -------- ---------- -------- -----------------
4 1 01/04/2018 Waqar Yousufzai 10:30:00 18:00:00 Present 7 hrs and 30Min
4 1 02/04/2018 Waqar Yousufzai 10:30:00 18:00:00 Present 7 hrs and 30Min
4 1 03/04/2018 Waqar Yousufzai 10:00:00 18:00:00 Present 8 hrs and 00Min
44 5 01/04/2018 SAMIA 08:00:00 18:00:00 Present 10 hrs and 00Min
44 5 02/04/2018 SAMIA 08:30:00 18:00:00 Present 9 hrs and 30Min
44 5 03/04/2018 SAMIA 08:00:00 18:00:00 Present 10 hrs and 00Min
You shouldn't aggregate the date value; it must be part of the grouping. Get the time in and time out using conditional aggregation, then compute the total hours worked. Your query should be something like:
select
BadgeNum, USERID, Dated, Name
, [Total Hrs] = right('0' + cast(datediff(mi, [in], [out]) / 60 as varchar(10)), 2) + ':'
+ right('0' + cast(datediff(mi, [in], [out]) % 60 as varchar(10)), 2)
from (
select
ch.BadgeNum, ch.USERID, ui.Name
-- SQL Server 2005 has no DATE type, so strip the time part with DATEADD/DATEDIFF
, Dated = dateadd(day, datediff(day, 0, ch.Dated), 0)
, [in] = min(case when ch.CHECKTYPE = 'I' then ch.Time end)   -- first check-in of the day
, [out] = max(case when ch.CHECKTYPE = 'O' then ch.Time end)  -- last check-out of the day
from
CheckInOut ch
left join UserInfo ui on ch.USERID = ui.badgenumber
where
ch.Dated >= '20180401'
and ch.Dated < '20180404'
group by ch.BadgeNum, ch.USERID, dateadd(day, datediff(day, 0, ch.Dated), 0), ui.Name
) t
order by BadgeNum, Dated
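To try the grouping logic without a SQL Server instance, here is a sketch in SQLite via Python's sqlite3, loaded with the CheckInOut rows from the question (the UserInfo join is left out for brevity). It groups by badge, user and day, takes the earliest 'I' and the latest 'O' per group, and counts minutes the way DATEDIFF(minute, ...) does, as minute boundaries crossed, by flooring both times to whole minutes before subtracting. strftime('%s', ...) and printf are SQLite functions, not SQL Server ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CheckInOut (BadgeNum INTEGER, USERID INTEGER, "
             "Dated TEXT, Time TEXT, CHECKTYPE TEXT)")
# (BadgeNum, USERID, day of April, check-in time, check-out time) from the sample
punches = [
    (4, 1, "01", "10:10:58", "18:10:00"), (4, 1, "02", "10:00:10", "18:14:27"),
    (4, 1, "03", "10:22:10", "18:03:11"), (44, 5, "01", "09:27:03", "18:27:42"),
    (44, 5, "02", "10:00:50", "18:02:25"), (44, 5, "03", "08:58:36", "18:12:18"),
]
for badge, user, day, t_in, t_out in punches:
    dated = f"2018-04-{day} 00:00:00.000"
    for kind, t in (("I", t_in), ("O", t_out)):
        conn.execute("INSERT INTO CheckInOut VALUES (?, ?, ?, ?, ?)",
                     (badge, user, dated, f"2018-04-14 {t}.000", kind))

QUERY = """
SELECT BadgeNum, day, printf('%02d:%02d', mins / 60, mins % 60) AS total_hrs
FROM (
    SELECT BadgeNum, USERID, date(Dated) AS day,
           -- floor both times to whole minutes, then subtract (like DATEDIFF(minute, ...))
           CAST(strftime('%s', MAX(CASE WHEN CHECKTYPE = 'O' THEN Time END)) AS INTEGER) / 60
         - CAST(strftime('%s', MIN(CASE WHEN CHECKTYPE = 'I' THEN Time END)) AS INTEGER) / 60 AS mins
    FROM CheckInOut
    WHERE Dated >= '2018-04-01' AND Dated < '2018-04-04'
    GROUP BY BadgeNum, USERID, date(Dated)
) t
ORDER BY BadgeNum, day
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```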

sql query for selecting 30 days data with time interval

SQL Query not giving expected answer
SELECT CAST(PR.DateTimeStamp as date) AS PRDate,COUNT(PR.ID) AS PRCount
FROM tbl_Purchase PR
INNER JOIN tbl_PurchaseCategory PTC ON PR.ID = PTC.ID
WHERE PR.DateTimeStamp BETWEEN DATEADD(DAY,-30,'2017-12-07 09:00:00') AND
'2017-12-07 09:00:00' and PR.DepartmentID=1 and PTC.CategoryID=1 group by
CAST(PR.DateTimeStamp as date) order by CAST(PR.DateTimeStamp as date)
I want to select data like this:
PRDate PRCount
2017-12-07 3 // from 2017-12-08 09:00:00 to 2017-12-07 09:00:00
2017-12-06 31 // from 2017-12-07 09:00:00 to 2017-12-06 09:00:00
2017-12-05 10 // from 2017-12-06 09:00:00 to 2017-12-05 09:00:00
2017-12-04 23
2017-12-03 27
2017-12-02 15
2017-12-01 27
2017-11-30 39
2017-11-29 25
2017-11-28 27
2017-11-27 36
2017-11-26 30
2017-11-25 23
2017-11-24 18
2017-11-23 13
2017-11-22 16
2017-11-21 25
2017-11-20 15
2017-11-19 41
2017-11-18 11
2017-11-17 9
2017-11-16 19
2017-11-15 23
2017-11-14 17
2017-11-13 23
2017-11-12 20
2017-11-11 31
2017-11-10 29
2017-11-09 18
2017-11-08 29
2017-11-07 24
The above query gives me data grouped from midnight to midnight (12 to 12),
not from 9 to 9.
You should subtract 9 hours from the timestamp before casting it to a date, in the SELECT, GROUP BY and ORDER BY.
SELECT
CAST( DATEADD(HOUR,-9, PR.DateTimeStamp) as date) AS PRDate
, COUNT(PR.ID) AS PRCount
FROM tbl_Purchase PR
INNER JOIN tbl_PurchaseCategory PTC ON PR.ID = PTC.ID
WHERE
PR.DateTimeStamp BETWEEN DATEADD(DAY,-30,'2017-12-07 09:00:00') AND '2017-12-07 09:00:00'
AND PR.DepartmentID=1 and PTC.CategoryID=1
group by
CAST(DATEADD(HOUR,-9, PR.DateTimeStamp) as date)
order by
CAST(DATEADD(HOUR,-9, PR.DateTimeStamp) as date)
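A small SQLite check (via Python's sqlite3) of the shifted-day bucketing, using three hypothetical purchase timestamps around the 09:00 cut-off; date(ts, '-9 hours') is SQLite's counterpart of CAST(DATEADD(HOUR, -9, ts) AS date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (id INTEGER, DateTimeStamp TEXT)")
# Hypothetical timestamps straddling the 09:00 boundary
conn.executemany("INSERT INTO purchase VALUES (?, ?)", [
    (1, "2017-12-06 10:00:00"),   # after 09:00 on Dec 6  -> bucket 2017-12-06
    (2, "2017-12-07 08:59:59"),   # before 09:00 on Dec 7 -> still bucket 2017-12-06
    (3, "2017-12-07 09:00:00"),   # 09:00 on Dec 7        -> bucket 2017-12-07
])

QUERY = """
SELECT date(DateTimeStamp, '-9 hours') AS PRDate, COUNT(id) AS PRCount
FROM purchase
GROUP BY date(DateTimeStamp, '-9 hours')
ORDER BY PRDate
"""
buckets = conn.execute(QUERY).fetchall()
print(buckets)  # [('2017-12-06', 2), ('2017-12-07', 1)]
```

Each bucket is labelled with the date on which its 9-to-9 window starts, which matches the labels in the desired output.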

pandas group By select columns

I work with Cloudera VM 5.2.0 and pandas 0.18.0.
I have the following data:
adclicksDF = pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv',
parse_dates=['timestamp'],
skipinitialspace=True).assign(adCount=1)
adclicksDF.head(n=5)
Out[65]:
timestamp txId userSessionId teamId userId adId adCategory \
0 2016-05-26 15:13:22 5974 5809 27 611 2 electronics
1 2016-05-26 15:17:24 5976 5705 18 1874 21 movies
2 2016-05-26 15:22:52 5978 5791 53 2139 25 computers
3 2016-05-26 15:22:57 5973 5756 63 212 10 fashion
4 2016-05-26 15:22:58 5980 5920 9 1027 20 clothing
adCount
0 1
1 1
2 1
3 1
4 1
I want to group by the timestamp field:
adCategoryclicks = adclicksDF[['timestamp','adId','adCategory','userId','adCount']]
agrupadoDF = adCategoryclicks.groupby(pd.Grouper(key='timestamp', freq='1H'))['adCount'].agg(['count','sum'])
agrupadoDF.head(n=5)
Out[68]:
count sum
timestamp
2016-05-26 15:00:00 14 14
2016-05-26 16:00:00 24 24
2016-05-26 17:00:00 13 13
2016-05-26 18:00:00 16 16
2016-05-26 19:00:00 16 16
I want to add more columns to agrupadoDF: adCategory and userId.
How can I do this?
There are multiple values of userId and adCategory in each group, so aggregate them with join.
In this sample the last two timestamps are changed for better output:
print (adclicksDF)
timestamp txId userSessionId teamId userId adId adCategory \
0 2016-05-26 15:13:22 5974 5809 27 611 2 electronics
1 2016-05-26 15:17:24 5976 5705 18 1874 21 movies
2 2016-05-26 15:22:52 5978 5791 53 2139 25 computers
3 2016-05-26 16:22:57 5973 5756 63 212 10 fashion
4 2016-05-26 16:22:58 5980 5920 9 1027 20 clothing
adCount
0 1
1 1
2 1
3 1
4 1
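# cast int to string so the ids can be joined
adclicksDF['userId'] = adclicksDF['userId'].astype(str)
adCategoryclicks = adclicksDF[['timestamp','adId','adCategory','userId','adCount']]
agrupadoDF = (adCategoryclicks.groupby(pd.Grouper(key='timestamp', freq='1H'))
              .agg({'adCount': ['count','sum'],
                    'userId': ', '.join,
                    'adCategory': ', '.join}))
# flatten the MultiIndex columns: ('adCount','count') -> 'count', ('userId', ...) -> 'userId'
# (name by name, so the result does not depend on the order pandas returns the columns in)
agrupadoDF.columns = [func if col == 'adCount' else col for col, func in agrupadoDF.columns]
agrupadoDF = agrupadoDF[['adCategory','count','sum','userId']]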
print (agrupadoDF)
adCategory count sum \
timestamp
2016-05-26 15:00:00 electronics, movies, computers 3 3
2016-05-26 16:00:00 fashion, clothing 2 2
userId
timestamp
2016-05-26 15:00:00 611, 1874, 2139
2016-05-26 16:00:00 212, 1027
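On newer pandas versions (0.25 and later, so not the 0.18.0 from the question), named aggregation avoids renaming MultiIndex columns by hand. The sketch below uses a three-row frame modelled on the sample above; the output column names count, sum, adCategory and userId are my choice, and '60min' is used as the hourly frequency alias.

```python
import pandas as pd

# Three clicks modelled on the question's sample (two in the 15:00 hour, one at 16:00)
adclicksDF = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-05-26 15:13:22", "2016-05-26 15:17:24",
                                 "2016-05-26 16:22:57"]),
    "userId": [611, 1874, 212],
    "adCategory": ["electronics", "movies", "fashion"],
    "adCount": 1,
})
adclicksDF["userId"] = adclicksDF["userId"].astype(str)   # str.join needs strings

# Named aggregation: output_column=(input_column, aggregation)
agrupadoDF = adclicksDF.groupby(pd.Grouper(key="timestamp", freq="60min")).agg(
    count=("adCount", "count"),
    sum=("adCount", "sum"),
    adCategory=("adCategory", ", ".join),
    userId=("userId", ", ".join),
)
print(agrupadoDF)
```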