I have this data in a table:
RowID PerID Date Time RowNumber
------------------------------------------------
2393 1856 2015-07-29 00:52:55 1
2408 1856 2015-07-29 19:13:32 2
2394 1864 2015-07-29 00:57:17 1
2399 1864 2015-07-29 11:07:26 2
2403 1864 2015-07-29 15:25:42 3
2406 1864 2015-07-29 19:06:37 4
2395 1877 2015-07-29 01:10:23 1
2407 1877 2015-07-29 19:13:26 2
2409 1881 2015-07-29 19:13:52 1
2391 1882 2015-07-29 00:32:15 1
2396 1882 2015-07-29 11:05:51 2
2397 1882 2015-07-29 11:05:53 3
2398 1882 2015-07-29 11:06:01 4
2401 1882 2015-07-29 15:20:16 5
2404 1882 2015-07-29 19:04:07 6
2392 1883 2015-07-29 00:35:50 1
2400 1883 2015-07-29 11:17:30 2
2402 1883 2015-07-29 15:24:10 3
2405 1883 2015-07-29 19:06:20 4
I want to create this table from the data above:
RowID PerID io_num ioDate InTime OutTime
----------------------------------------------
1 1856 1 2015-07-29 00:52:55 19:13:32
2 1864 1 2015-07-29 00:57:17 11:07:26
3 1864 2 2015-07-29 15:25:42 19:06:37
4 1877 1 2015-07-29 01:10:23 19:13:26
5 1881 1 2015-07-29 19:13:52 null
6 1882 1 2015-07-29 00:32:15 11:05:51
7 1882 2 2015-07-29 11:05:53 11:06:01
8 1882 3 2015-07-29 15:20:16 19:04:07
9 1883 1 2015-07-29 00:35:50 11:17:30
10 1883 2 2015-07-29 15:24:10 19:06:20
Please help me.
Thanks.
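WITH calc_time AS (
    -- pair each odd "in" row with the next row (the "out") of the same person
    SELECT
        t1.PerID,
        t1.Date AS ioDate,
        t1.Time AS InTime,
        t2.Time AS OutTime
    FROM mytable t1
    LEFT JOIN mytable t2
        ON  t1.PerID = t2.PerID
        AND t1.RowNumber = t2.RowNumber - 1
    WHERE (t1.RowNumber % 2) = 1
)
SELECT
    -- order by the times too, so RowID is deterministic within a PerID
    ROW_NUMBER() OVER (ORDER BY c.PerID, c.ioDate, c.InTime) AS RowID,
    c.PerID,
    ROW_NUMBER() OVER (PARTITION BY c.PerID ORDER BY c.ioDate, c.InTime) AS io_num,
    c.ioDate,
    c.InTime,
    c.OutTime
FROM calc_time c
ORDER BY RowID;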
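To make the pairing rule concrete, here is a plain-Python sketch (an illustration, not SQL) of what the self-join does: within each PerID, every odd RowNumber is an "in" punch and the next RowNumber, if it exists, is its "out" punch. The tuple layout and the function name pair_in_out are assumptions made for this sketch.

```python
from itertools import groupby

# (PerID, Date, Time, RowNumber) rows copied from the question's table
rows = [
    (1856, "2015-07-29", "00:52:55", 1), (1856, "2015-07-29", "19:13:32", 2),
    (1864, "2015-07-29", "00:57:17", 1), (1864, "2015-07-29", "11:07:26", 2),
    (1864, "2015-07-29", "15:25:42", 3), (1864, "2015-07-29", "19:06:37", 4),
    (1877, "2015-07-29", "01:10:23", 1), (1877, "2015-07-29", "19:13:26", 2),
    (1881, "2015-07-29", "19:13:52", 1),
    (1882, "2015-07-29", "00:32:15", 1), (1882, "2015-07-29", "11:05:51", 2),
    (1882, "2015-07-29", "11:05:53", 3), (1882, "2015-07-29", "11:06:01", 4),
    (1882, "2015-07-29", "15:20:16", 5), (1882, "2015-07-29", "19:04:07", 6),
    (1883, "2015-07-29", "00:35:50", 1), (1883, "2015-07-29", "11:17:30", 2),
    (1883, "2015-07-29", "15:24:10", 3), (1883, "2015-07-29", "19:06:20", 4),
]


def pair_in_out(rows):
    """Return (RowID, PerID, io_num, ioDate, InTime, OutTime) tuples."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[3]))  # by PerID, then RowNumber
    for per_id, group in groupby(ordered, key=lambda r: r[0]):
        by_number = {r[3]: r for r in group}
        io_num = 0
        for number in sorted(by_number):
            if number % 2 == 0:
                continue  # even rows are only used as the "out" side
            io_num += 1
            in_row = by_number[number]
            out_row = by_number.get(number + 1)  # None plays the LEFT JOIN's NULL
            result.append((len(result) + 1, per_id, io_num, in_row[1],
                           in_row[2], out_row[2] if out_row else None))
    return result


for row in pair_in_out(rows):
    print(row)
```

Person 1881 has a single punch, so the sketch yields None for the out time, just as the LEFT JOIN yields NULL.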
I need to count the number of campaigns per day based on the start and end dates of the campaigns.
Input table:
Campaign name  Start date  End date
-----------------------------------
Campaign A     2022-07-10  2022-09-25
Campaign B     2022-08-06  2022-10-07
Campaign C     2022-07-30  2022-09-10
Campaign D     2022-08-26  2022-10-24
Campaign E     2022-07-17  2022-09-29
Campaign F     2022-08-24  2022-09-12
Campaign G     2022-08-11  2022-10-24
Campaign H     2022-08-26  2022-11-22
Campaign I     2022-08-29  2022-09-25
Campaign J     2022-08-21  2022-11-15
Campaign K     2022-07-20  2022-09-18
Campaign L     2022-07-31  2022-11-20
Campaign M     2022-08-17  2022-10-10
Campaign N     2022-07-27  2022-09-07
Campaign O     2022-07-29  2022-09-26
Campaign P     2022-07-06  2022-09-15
Campaign Q     2022-07-16  2022-09-22
Output needed (result):
Date        Count unique campaigns
----------------------------------
2022-07-02  17
2022-07-03  47
2022-07-04  5
2022-07-05  5
2022-07-06  25
2022-07-07  27
2022-07-08  17
2022-07-09  58
2022-07-10  23
2022-07-11  53
2022-07-12  18
2022-07-13  29
2022-07-14  52
2022-07-15  7
2022-07-16  17
2022-07-17  37
2022-07-18  33
How should I write the SQL query to get the above result? Thanks, all.
The following solutions use string_split in combination with replicate to generate one row for every day of each campaign.
select dt as date
      ,count(*) as Count_unique_campaigns
from
(
  select *
        ,dateadd(day, row_number() over(partition by Campaign_name order by (select null))-1, Start_date) as dt
  from (
    select *
    from t
    outer apply string_split(replicate(',',datediff(day, Start_date, End_date)),',')
  ) t
) t
group by dt
order by dt
date        Count_unique_campaigns
----------------------------------
2022-07-06  1
2022-07-07  1
2022-07-08  1
2022-07-09  1
2022-07-10  2
2022-07-11  2
2022-07-12  2
2022-07-13  2
2022-07-14  2
2022-07-15  2
2022-07-16  3
2022-07-17  4
2022-07-18  4
2022-07-19  4
2022-07-20  5
2022-07-21  5
2022-07-22  5
2022-07-23  5
2022-07-24  5
2022-07-25  5
2022-07-26  5
2022-07-27  6
2022-07-28  6
2022-07-29  7
2022-07-30  8
2022-07-31  9
2022-08-01  9
2022-08-02  9
2022-08-03  9
2022-08-04  9
2022-08-05  9
2022-08-06  10
2022-08-07  10
2022-08-08  10
2022-08-09  10
2022-08-10  10
2022-08-11  11
2022-08-12  11
2022-08-13  11
2022-08-14  11
2022-08-15  11
2022-08-16  11
2022-08-17  12
2022-08-18  12
2022-08-19  12
2022-08-20  12
2022-08-21  13
2022-08-22  13
2022-08-23  13
2022-08-24  14
2022-08-25  14
2022-08-26  16
2022-08-27  16
2022-08-28  16
2022-08-29  17
2022-08-30  17
2022-08-31  17
2022-09-01  17
2022-09-02  17
2022-09-03  17
2022-09-04  17
2022-09-05  17
2022-09-06  17
2022-09-07  17
2022-09-08  16
2022-09-09  16
2022-09-10  16
2022-09-11  15
2022-09-12  15
2022-09-13  14
2022-09-14  14
2022-09-15  14
2022-09-16  13
2022-09-17  13
2022-09-18  13
2022-09-19  12
2022-09-20  12
2022-09-21  12
2022-09-22  12
2022-09-23  11
2022-09-24  11
2022-09-25  11
2022-09-26  9
2022-09-27  8
2022-09-28  8
2022-09-29  8
2022-09-30  7
2022-10-01  7
2022-10-02  7
2022-10-03  7
2022-10-04  7
2022-10-05  7
2022-10-06  7
2022-10-07  7
2022-10-08  6
2022-10-09  6
2022-10-10  6
2022-10-11  5
2022-10-12  5
2022-10-13  5
2022-10-14  5
2022-10-15  5
2022-10-16  5
2022-10-17  5
2022-10-18  5
2022-10-19  5
2022-10-20  5
2022-10-21  5
2022-10-22  5
2022-10-23  5
2022-10-24  5
2022-10-25  3
2022-10-26  3
2022-10-27  3
2022-10-28  3
2022-10-29  3
2022-10-30  3
2022-10-31  3
2022-11-01  3
2022-11-02  3
2022-11-03  3
2022-11-04  3
2022-11-05  3
2022-11-06  3
2022-11-07  3
2022-11-08  3
2022-11-09  3
2022-11-10  3
2022-11-11  3
2022-11-12  3
2022-11-13  3
2022-11-14  3
2022-11-15  3
2022-11-16  2
2022-11-17  2
2022-11-18  2
2022-11-19  2
2022-11-20  2
2022-11-21  1
2022-11-22  1
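As a plain-Python sketch of the same expansion (an illustration, not T-SQL): each campaign contributes one entry for every day from its start date to its end date inclusive, which is what the datediff + 1 empty pieces from replicate/string_split provide, and the entries are then counted per date. The dictionary layout and the function name count_per_day are assumptions made for this sketch.

```python
from collections import Counter
from datetime import date, timedelta

# Campaign name -> (Start date, End date), copied from the question
campaigns = {
    "Campaign A": ("2022-07-10", "2022-09-25"),
    "Campaign B": ("2022-08-06", "2022-10-07"),
    "Campaign C": ("2022-07-30", "2022-09-10"),
    "Campaign D": ("2022-08-26", "2022-10-24"),
    "Campaign E": ("2022-07-17", "2022-09-29"),
    "Campaign F": ("2022-08-24", "2022-09-12"),
    "Campaign G": ("2022-08-11", "2022-10-24"),
    "Campaign H": ("2022-08-26", "2022-11-22"),
    "Campaign I": ("2022-08-29", "2022-09-25"),
    "Campaign J": ("2022-08-21", "2022-11-15"),
    "Campaign K": ("2022-07-20", "2022-09-18"),
    "Campaign L": ("2022-07-31", "2022-11-20"),
    "Campaign M": ("2022-08-17", "2022-10-10"),
    "Campaign N": ("2022-07-27", "2022-09-07"),
    "Campaign O": ("2022-07-29", "2022-09-26"),
    "Campaign P": ("2022-07-06", "2022-09-15"),
    "Campaign Q": ("2022-07-16", "2022-09-22"),
}


def count_per_day(campaigns):
    """Expand every campaign into one entry per day, then count entries per date."""
    counts = Counter()
    for start, end in campaigns.values():
        first, last = date.fromisoformat(start), date.fromisoformat(end)
        # one entry for every day from start to end, both inclusive
        for offset in range((last - first).days + 1):
            counts[first + timedelta(days=offset)] += 1
    return dict(sorted(counts.items()))


per_day = count_per_day(campaigns)
print(per_day[date(2022, 8, 29)])  # prints 17
```

Its counts agree with the result table above, for example 17 campaigns on 2022-08-29 and 9 on 2022-09-26.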
For Azure SQL and SQL Server 2022 there is a cleaner solution based on the ordinal output column of string_split:
"The enable_ordinal argument and ordinal output column are currently
supported in Azure SQL Database, Azure SQL Managed Instance, and Azure
Synapse Analytics (serverless SQL pool only). Beginning with SQL
Server 2022 (16.x) Preview, the argument and output column are
available in SQL Server."
select dt as date
      ,count(*) as Count_unique_campaigns
from
(
  select *
        ,dateadd(day, ordinal-1, Start_date) as dt
  from (
    select *
    from t
    outer apply string_split(replicate(',',datediff(day, Start_date, End_date)),',', 1)
  ) t
) t
group by dt
order by dt
Your sample data doesn't seem to match your desired results, but I think what you're after is this:
DECLARE @Start date, @End date;
-- first, find the earliest and last date:
SELECT @Start = MIN([Start date]), @End = MAX([End date])
FROM dbo.Campaigns;
-- now use a recursive CTE to build a date range,
-- and count the number of campaigns that have a row
-- where the campaign was active on that date:
WITH d(d) AS
(
  SELECT @Start
  UNION ALL
  SELECT DATEADD(DAY, 1, d) FROM d WHERE d < @End
)
SELECT
  [Date] = d.d,
  [Count unique campaigns] = COUNT(*)
FROM d
INNER JOIN dbo.Campaigns AS c
  ON d.d >= c.[Start date] AND d.d <= c.[End date]
GROUP BY d.d
ORDER BY d.d
OPTION (MAXRECURSION 32767);
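The calendar approach can also be sketched in plain Python (an illustration, not T-SQL): generate every date from the earliest start to the latest end, as the recursive CTE does, and count the campaigns whose range contains each date, as the INNER JOIN with GROUP BY does. To keep it short, the sketch uses only Campaigns A, B and C from the question; the function name count_with_calendar is an assumption.

```python
from datetime import date, timedelta

# (Start date, End date) for Campaigns A, B and C from the question
campaigns = [
    (date(2022, 7, 10), date(2022, 9, 25)),
    (date(2022, 8, 6), date(2022, 10, 7)),
    (date(2022, 7, 30), date(2022, 9, 10)),
]


def count_with_calendar(campaigns):
    first = min(start for start, _ in campaigns)  # like MIN([Start date])
    last = max(end for _, end in campaigns)       # like MAX([End date])
    result = {}
    day = first
    while day <= last:  # one step per level of the recursive CTE
        active = sum(1 for start, end in campaigns if start <= day <= end)
        if active:  # the INNER JOIN drops dates with no active campaign
            result[day] = active
        day += timedelta(days=1)
    return result


counts = count_with_calendar(campaigns)
```

Unlike the string_split version, this scans every campaign for every calendar date, which is simple but grows with (number of days) × (number of campaigns).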
I have three tables.
Order_Status
Order_ID order_status_id Timestamp
1 2 12/24/19 0:00
1 3 12/24/19 0:10
1 4 12/24/19 0:30
1 5 12/24/19 1:00
2 2 12/24/19 15:00
2 3 12/24/19 15:07
2 9 12/24/19 15:10
2 8 12/24/19 15:33
2 10 12/24/19 16:00
4 4 12/24/19 19:00
4 2 12/24/19 19:30
4 3 12/24/19 19:32
4 4 12/24/19 19:40
4 5 12/24/19 19:45
5 2 1/28/19 19:30
5 6 1/28/19 19:48
Contact
Order_id Contact_time
1 12/24/19 0:25
2 12/24/19 15:30
4 12/24/19 19:38
5 1/28/19 19:46
meta_status
order_status_id status_description
1 desc1
2 desc2
3 desc3
4 desc4
5 desc5
I am trying to retrieve the latest order timestamp before the earliest contact time for each Order_ID. I also need the order_status_id and the status_description.
This is my query so far:
SELECT a.Order_ID,
a.order_status_id,
c.status_description,
MAX(CASE
WHEN a.order_timestamp < b.Contact_Time then
a.order_timestamp
ELSE
null
END) AS beforeContact
FROM Order_Status a
LEFT JOIN Contact b
ON b.Order_ID = a.Order_ID
LEFT JOIN meta_status c
ON c.order_status_id = a.order_status_id
GROUP BY a.Order_ID, a.order_status_id, c.status_description
But it still returns every row of the tables. I need only 4 rows, one for each of the orders 1, 2, 4 and 5, with the latest order timestamp before the contact time.
Do I need to use a subquery or a window function for this?
This is it: rank each order's rows that come before its first contact time, newest first, and keep the top-ranked row:
SELECT x.Order_ID,
       x.order_status_id,
       x.status_description,
       x.beforeContact
FROM (SELECT a.Order_ID,
             a.order_status_id,
             c.status_description,
             a.order_timestamp AS beforeContact,
             RANK() OVER (PARTITION BY a.Order_ID
                          ORDER BY a.order_timestamp DESC) AS rank1
      FROM Order_Status a
      JOIN (SELECT Order_ID, MIN(Contact_Time) AS Contact_Time
            FROM Contact
            GROUP BY Order_ID) b
        ON b.Order_ID = a.Order_ID
      LEFT JOIN meta_status c
        ON c.order_status_id = a.order_status_id
      WHERE a.order_timestamp < b.Contact_Time) AS x
WHERE x.rank1 = 1;
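Here is a plain-Python sketch of the same logic on the question's sample rows (an illustration, not SQL): take the earliest contact per order, keep only status rows strictly before it, and pick the newest one. The timestamp format string and the function name latest_status_before_contact are assumptions. On a timestamp tie this sketch keeps the first row it sees, whereas RANK() would return all tied rows.

```python
from datetime import datetime

FMT = "%m/%d/%y %H:%M"  # matches timestamps like "12/24/19 0:00"

# (Order_ID, order_status_id, Timestamp) from the Order_Status table
order_status = [
    (1, 2, "12/24/19 0:00"), (1, 3, "12/24/19 0:10"), (1, 4, "12/24/19 0:30"),
    (1, 5, "12/24/19 1:00"), (2, 2, "12/24/19 15:00"), (2, 3, "12/24/19 15:07"),
    (2, 9, "12/24/19 15:10"), (2, 8, "12/24/19 15:33"), (2, 10, "12/24/19 16:00"),
    (4, 4, "12/24/19 19:00"), (4, 2, "12/24/19 19:30"), (4, 3, "12/24/19 19:32"),
    (4, 4, "12/24/19 19:40"), (4, 5, "12/24/19 19:45"), (5, 2, "1/28/19 19:30"),
    (5, 6, "1/28/19 19:48"),
]
contact = [(1, "12/24/19 0:25"), (2, "12/24/19 15:30"),
           (4, "12/24/19 19:38"), (5, "1/28/19 19:46")]
meta_status = {1: "desc1", 2: "desc2", 3: "desc3", 4: "desc4", 5: "desc5"}


def latest_status_before_contact(order_status, contact, meta_status):
    """Return {Order_ID: (order_status_id, status_description, timestamp)}."""
    first_contact = {}
    for order_id, ts in contact:
        t = datetime.strptime(ts, FMT)
        if order_id not in first_contact or t < first_contact[order_id]:
            first_contact[order_id] = t  # like MIN(Contact_Time) per order
    best = {}
    for order_id, status_id, ts in order_status:
        t = datetime.strptime(ts, FMT)
        cutoff = first_contact.get(order_id)
        if cutoff is None or t >= cutoff:
            continue  # keep only rows strictly before the first contact
        if order_id not in best or t > best[order_id][1]:
            best[order_id] = (status_id, t)  # the rank1 = 1 row
    # .get returns None for status ids missing from meta_status, like the LEFT JOIN
    return {oid: (sid, meta_status.get(sid), t)
            for oid, (sid, t) in sorted(best.items())}


result = latest_status_before_contact(order_status, contact, meta_status)
```

Order 2's latest status before contact is 9, which has no row in meta_status, so its description comes back as None (NULL in SQL).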
I have a table, RESERVED_BOOKINGS_OVERRIDDEN:
booking_product_id on_site_from_dt on_site_to_dt venue_id
4 2021-08-07 16:00:00.000 2021-08-14 10:00:00.000 12
4 2021-08-07 16:00:00.000 2021-08-10 10:00:00.000 12
6 2021-08-02 16:00:00.000 2021-08-09 10:00:00.000 12
and another table, ALLOCATED_PRODUCTS:
Date booking_product_id venue_id ReservedQuant
2021-08-05 00:00:00.000 4 12 3
2021-08-06 00:00:00.000 4 12 3
2021-08-07 00:00:00.000 4 12 3
2021-08-08 00:00:00.000 4 12 3
2021-08-05 00:00:00.000 6 12 1
Now I need to update the ReservedQuant column in the ALLOCATED_PRODUCTS table based on the rows in RESERVED_BOOKINGS_OVERRIDDEN.
ReservedQuant must be reduced by the number of RESERVED_BOOKINGS_OVERRIDDEN rows where ALLOCATED_PRODUCTS.Date falls between RESERVED_BOOKINGS_OVERRIDDEN.on_site_from_dt and RESERVED_BOOKINGS_OVERRIDDEN.on_site_to_dt, and ALLOCATED_PRODUCTS.booking_product_id = RESERVED_BOOKINGS_OVERRIDDEN.booking_product_id.
This should be the state of the data after the update:
Date booking_product_id venue_id ReservedQuant
2021-08-05 00:00:00.000 4 12 3
2021-08-06 00:00:00.000 4 12 3
2021-08-07 00:00:00.000 4 12 1
2021-08-08 00:00:00.000 4 12 1
2021-08-05 00:00:00.000 6 12 0
update a
set a.ReservedQuant = a.ReservedQuant - (select count(1) from RESERVED_BOOKINGS_OVERRIDDEN b
    where a.booking_product_id = b.booking_product_id
      and a.date between cast(b.on_site_from_dt as date) and cast(b.on_site_to_dt as date))
from ALLOCATED_PRODUCTS a
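A plain-Python sketch of what the correlated subquery computes for each ALLOCATED_PRODUCTS row (an illustration, not SQL): count the overridden bookings for the same product whose date range, truncated to whole days as CAST(... AS date) does, contains the row's date, then subtract that count. The list layouts and the helper names as_day and apply_overrides are assumptions. Truncating to days is what makes 2021-08-07 count even though on_site_from_dt is 16:00 that day.

```python
from datetime import datetime

# (booking_product_id, on_site_from_dt, on_site_to_dt, venue_id)
overridden = [
    (4, "2021-08-07 16:00:00", "2021-08-14 10:00:00", 12),
    (4, "2021-08-07 16:00:00", "2021-08-10 10:00:00", 12),
    (6, "2021-08-02 16:00:00", "2021-08-09 10:00:00", 12),
]
# [Date, booking_product_id, venue_id, ReservedQuant]
allocated = [
    ["2021-08-05 00:00:00", 4, 12, 3],
    ["2021-08-06 00:00:00", 4, 12, 3],
    ["2021-08-07 00:00:00", 4, 12, 3],
    ["2021-08-08 00:00:00", 4, 12, 3],
    ["2021-08-05 00:00:00", 6, 12, 1],
]


def as_day(text):
    """Drop the time part, like CAST(... AS date)."""
    return datetime.fromisoformat(text).date()


def apply_overrides(allocated, overridden):
    for row in allocated:
        day, product_id = as_day(row[0]), row[1]
        # the correlated COUNT(1): same product, day inside the booking range
        hits = sum(1 for pid, start, end, _venue in overridden
                   if pid == product_id and as_day(start) <= day <= as_day(end))
        row[3] -= hits  # ReservedQuant = ReservedQuant - count
    return allocated


apply_overrides(allocated, overridden)
print([row[3] for row in allocated])  # prints [3, 3, 1, 1, 0]
```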
I want to build a column df['days_since_last'] in my DataFrame that shows, for each event_id, the number of days since the player's previous match, and NaN if the row is the player's first match in the dataset.
Example of my data:
event_id player_id match_date
0 1470993 227485 2015-11-29
1 1492031 227485 2016-07-23
2 1489240 227485 2016-06-19
3 1495581 227485 2016-09-02
4 1490222 227485 2016-07-03
5 1469624 227485 2015-11-14
6 1493822 227485 2016-08-13
7 1428946 313444 2014-08-10
8 1483245 313444 2016-05-21
9 1472260 313444 2015-12-13
I tried the code in "Find days since last event pandas dataframe" but got nonsensical results.
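It seems you need to sort first:
df['match_date'] = pd.to_datetime(df['match_date'])  # diff() needs datetimes, not strings
# the sorted result is assigned back by index, so it lines up with the original rows
df['days_since_last_event'] = (df.sort_values(['player_id','match_date'])
                                 .groupby('player_id')['match_date'].diff()
                                 .dt.days)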
print (df)
event_id player_id match_date days_since_last_event
0 1470993 227485 2015-11-29 15.0
1 1492031 227485 2016-07-23 20.0
2 1489240 227485 2016-06-19 203.0
3 1495581 227485 2016-09-02 20.0
4 1490222 227485 2016-07-03 14.0
5 1469624 227485 2015-11-14 NaN
6 1493822 227485 2016-08-13 21.0
7 1428946 313444 2014-08-10 NaN
8 1483245 313444 2016-05-21 160.0
9 1472260 313444 2015-12-13 490.0
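The same idea without pandas, as a plain-Python sketch: sort by player and date, then subtract each player's previous match date from the current one. The tuple layout and the function name days_since_last are assumptions; None stands in for NaN.

```python
from datetime import date

# (event_id, player_id, match_date) from the question
matches = [
    (1470993, 227485, "2015-11-29"), (1492031, 227485, "2016-07-23"),
    (1489240, 227485, "2016-06-19"), (1495581, 227485, "2016-09-02"),
    (1490222, 227485, "2016-07-03"), (1469624, 227485, "2015-11-14"),
    (1493822, 227485, "2016-08-13"), (1428946, 313444, "2014-08-10"),
    (1483245, 313444, "2016-05-21"), (1472260, 313444, "2015-12-13"),
]


def days_since_last(matches):
    """Return {event_id: days since the player's previous match, or None}."""
    result = {}
    previous = {}  # player_id -> date of that player's most recent match so far
    # sort by player, then date (ISO date strings sort chronologically)
    for event_id, player_id, match_date in sorted(matches, key=lambda m: (m[1], m[2])):
        current = date.fromisoformat(match_date)
        prev = previous.get(player_id)
        result[event_id] = (current - prev).days if prev is not None else None
        previous[player_id] = current
    return result


gaps = days_since_last(matches)
```

The values match the pandas output above, for example 203 days for event 1489240 and None (NaN) for each player's first match.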
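Demo (note that this computes the days from each match to the player's most recent match, not the gap since the previous match):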
In [174]: df['days_since_last'] = (df.groupby('player_id')['match_date']
.transform(lambda x: (x.max()-x).dt.days))
In [175]: df
Out[175]:
event_id player_id match_date days_since_last
0 1470993 227485 2015-11-29 278
1 1492031 227485 2016-07-23 41
2 1489240 227485 2016-06-19 75
3 1495581 227485 2016-09-02 0
4 1490222 227485 2016-07-03 61
5 1469624 227485 2015-11-14 293
6 1493822 227485 2016-08-13 20
7 1428946 313444 2014-08-10 650
8 1483245 313444 2016-05-21 0
9 1472260 313444 2015-12-13 160
I work with Cloudera VM 5.2.0 and pandas 0.18.0.
I have the following data:
adclicksDF = pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv',
parse_dates=['timestamp'],
skipinitialspace=True).assign(adCount=1)
adclicksDF.head(n=5)
Out[65]:
timestamp txId userSessionId teamId userId adId adCategory \
0 2016-05-26 15:13:22 5974 5809 27 611 2 electronics
1 2016-05-26 15:17:24 5976 5705 18 1874 21 movies
2 2016-05-26 15:22:52 5978 5791 53 2139 25 computers
3 2016-05-26 15:22:57 5973 5756 63 212 10 fashion
4 2016-05-26 15:22:58 5980 5920 9 1027 20 clothing
adCount
0 1
1 1
2 1
3 1
4 1
I want to group by the timestamp field:
adCategoryclicks = adclicksDF[['timestamp','adId','adCategory','userId','adCount']]
agrupadoDF = adCategoryclicks.groupby(pd.Grouper(key='timestamp', freq='1H'))['adCount'].agg(['count','sum'])
agrupadoDF.head(n=5)
Out[68]:
count sum
timestamp
2016-05-26 15:00:00 14 14
2016-05-26 16:00:00 24 24
2016-05-26 17:00:00 13 13
2016-05-26 18:00:00 16 16
2016-05-26 19:00:00 16 16
I want to add more columns, adCategory and userId, to agrupadoDF.
How can I do this?
There are multiple values of userId and adCategory in each group, so aggregate them with join.
In this sample, the last two timestamps are changed for better output:
print (adclicksDF)
timestamp txId userSessionId teamId userId adId adCategory \
0 2016-05-26 15:13:22 5974 5809 27 611 2 electronics
1 2016-05-26 15:17:24 5976 5705 18 1874 21 movies
2 2016-05-26 15:22:52 5978 5791 53 2139 25 computers
3 2016-05-26 16:22:57 5973 5756 63 212 10 fashion
4 2016-05-26 16:22:58 5980 5920 9 1027 20 clothing
adCount
0 1
1 1
2 1
3 1
4 1
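# cast userId from int to string, because str.join only accepts strings
adclicksDF['userId'] = adclicksDF['userId'].astype(str)
adCategoryclicks = adclicksDF[['timestamp','adId','adCategory','userId','adCount']]
agrupadoDF = (adCategoryclicks.groupby(pd.Grouper(key='timestamp', freq='1H'))
                              .agg({'adCount': ['count','sum'],
                                    'userId': ', '.join,
                                    'adCategory': ', '.join}))
# flatten the MultiIndex columns by name, because the column order after
# agg with a dict can differ between pandas versions
agrupadoDF.columns = [stat if col == 'adCount' else col for col, stat in agrupadoDF.columns]
agrupadoDF = agrupadoDF[['adCategory','count','sum','userId']]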
print (agrupadoDF)
adCategory count sum \
timestamp
2016-05-26 15:00:00 electronics, movies, computers 3 3
2016-05-26 16:00:00 fashion, clothing 2 2
userId
timestamp
2016-05-26 15:00:00 611, 1874, 2139
2016-05-26 16:00:00 212, 1027
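To show what each aggregation contributes, here is a plain-Python sketch of the same hourly grouping on the modified sample (an illustration, not pandas): floor every timestamp to the hour, then count and sum adCount and join the categories and user ids per hour. The tuple layout and the function name hourly_summary are assumptions. One difference: pd.Grouper can also produce rows for empty hours between the first and last timestamp, while this dictionary only holds hours that have clicks.

```python
from datetime import datetime

# (timestamp, userId, adCategory, adCount) from the modified sample above
clicks = [
    ("2016-05-26 15:13:22", 611, "electronics", 1),
    ("2016-05-26 15:17:24", 1874, "movies", 1),
    ("2016-05-26 15:22:52", 2139, "computers", 1),
    ("2016-05-26 16:22:57", 212, "fashion", 1),
    ("2016-05-26 16:22:58", 1027, "clothing", 1),
]


def hourly_summary(clicks):
    buckets = {}
    for ts, user_id, category, ad_count in clicks:
        # floor to the start of the hour, like pd.Grouper(freq='1H')
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        b = buckets.setdefault(hour, {"adCategory": [], "count": 0, "sum": 0, "userId": []})
        b["adCategory"].append(category)
        b["count"] += 1  # every adCount in this sample is non-null
        b["sum"] += ad_count
        b["userId"].append(str(user_id))  # join needs strings, as in the answer
    return {
        hour: {"adCategory": ", ".join(b["adCategory"]), "count": b["count"],
               "sum": b["sum"], "userId": ", ".join(b["userId"])}
        for hour, b in sorted(buckets.items())
    }


summary = hourly_summary(clicks)
```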