I have time series data with some other fields.
Now I want to create more columns like
valueonsamehour1daybefore, valueonsamehour2daybefore,
valueonsamehour3daybefore, valueonsamehour1weekbefore,
valueonsamehour1monthbefore.
If no value is present at a given hour, the new column should be set to zero.
The DataFrame can be loaded from here:
url = 'https://drive.google.com/file/d/1BXvJqKGLwG4hqWJvh9gPAHqCbCcCKkUT/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
df = pd.read_csv(path, index_col=0, delimiter=",")
The DataFrame looks like the following:
time                 StartCity  District  Id    stype  EndCity  Count
2021-09-15 09:00:00 1 104 2713 21 9 2
2021-05-16 11:00:00 1 107 1044 11 6 1
2021-05-16 12:00:00 1 107 1044 11 6 0
2021-05-16 13:00:00 1 107 1044 11 6 0
2021-05-16 14:00:00 1 107 1044 11 6 0
2021-05-16 15:00:00 1 107 1044 11 6 0
2021-05-16 16:00:00 1 107 1044 11 6 0
2021-05-16 17:00:00 1 107 1044 11 6 0
2021-05-16 18:00:00 1 107 1044 11 6 0
2021-05-16 19:00:00 1 107 1044 11 6 0
2021-05-16 20:00:00 1 107 1044 11 6 0
2021-05-16 21:00:00 1 107 1044 11 6 0
2021-05-16 22:00:00 1 107 1044 11 6 0
2021-05-16 23:00:00 1 107 1044 11 6 0
2021-05-17 00:00:00 1 107 1044 11 6 0
2021-05-17 01:00:00 1 107 1044 11 6 0
2021-05-17 02:00:00 1 107 1044 11 6 0
2021-05-17 03:00:00 1 107 1044 11 6 0
2021-05-17 04:00:00 1 107 1044 11 6 0
2021-05-17 05:00:00 1 107 1044 11 6 0
2021-05-17 06:00:00 1 107 1044 11 6 0
2021-05-17 07:00:00 1 107 1044 11 6 0
2021-05-17 08:00:00 1 107 1044 11 6 0
2021-05-17 09:00:00 1 107 1044 11 6 0
2021-05-17 10:00:00 1 107 1044 11 6 0
2021-05-17 11:00:00 1 107 1044 11 6 0
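A possible approach (a minimal sketch, assuming Count is the value to shift and that each timestamp occurs at most once in df; with repeated timestamps you would do the same per Id group): build a time-indexed lookup Series, then map each shifted timestamp through it and fill the misses with zero.
import pandas as pd

# if time ended up as the index after read_csv, first do df = df.reset_index()
df['time'] = pd.to_datetime(df['time'])
# lookup table: value at each hour (assumes unique timestamps)
lookup = df.set_index('time')['Count']
offsets = {
    'valueonsamehour1daybefore': pd.Timedelta(days=1),
    'valueonsamehour2daybefore': pd.Timedelta(days=2),
    'valueonsamehour3daybefore': pd.Timedelta(days=3),
    'valueonsamehour1weekbefore': pd.Timedelta(weeks=1),
    'valueonsamehour1monthbefore': pd.DateOffset(months=1),
}
for col, off in offsets.items():
    # value at the same hour `off` earlier; 0 when that hour is absent
    df[col] = (df['time'] - off).map(lookup).fillna(0).astype(int)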
What I have is below.
DOG       Date        Steps
Tiger     2021-11-01  164
Oakley    2021-11-01  76
Piper     2021-11-01  65
Millie    2021-11-01  188
Oscar     2021-11-02  152
Foster    2021-11-02  191
Zeus      2021-11-02  101
Benji     2021-11-02  94
Lucy      2021-11-02  186
Rufus     2021-11-02  65
Hank      2021-11-03  98
Olive     2021-11-03  122
Ellie     2021-11-03  153
Thor      2021-11-03  152
Nala      2021-11-03  181
Mia       2021-11-03  48
Bella     2021-11-03  23
Izzy      2021-11-03  135
Pepper    2021-11-03  22
Diesel    2021-11-04  111
Dixie     2021-11-04  34
Emma      2021-11-04  56
Abbie     2021-11-04  32
Guinness  2021-11-04  166
Kobe      2021-11-04  71
What I want is below: rank by the value of the ['Steps'] column within each Date.
DOG       Date        Steps  Rank
Tiger     2021-11-01  164    2
Oakley    2021-11-01  76     3
Piper     2021-11-01  65     4
Millie    2021-11-01  188    1
Oscar     2021-11-02  152    3
Foster    2021-11-02  191    1
Zeus      2021-11-02  101    4
Benji     2021-11-02  94     5
Lucy      2021-11-02  186    2
Rufus     2021-11-02  65     6
Hank      2021-11-03  98     6
Olive     2021-11-03  122    5
Ellie     2021-11-03  153    2
Thor      2021-11-03  152    3
Nala      2021-11-03  181    1
Mia       2021-11-03  48     7
Bella     2021-11-03  23     8
Izzy      2021-11-03  135    4
Pepper    2021-11-03  22     9
Diesel    2021-11-04  111    2
Dixie     2021-11-04  34     5
Emma      2021-11-04  56     4
Abbie     2021-11-04  32     6
Guinness  2021-11-04  166    1
Kobe      2021-11-04  71     3
I tried below, but it failed.
df['Rank'] = df.groupby('Date')['Steps'].rank(ascending=False)
Your attempt nearly works: rank() runs fine, but it returns floats and splits ties by default. You just need method='dense' and a cast to integers:
df['Rank'] = df.groupby('Date')['Steps'].rank(ascending=False, method='dense').astype(int)
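For reference, here is a tiny demo on made-up data showing why method='dense' and the cast matter:
import pandas as pd
demo = pd.DataFrame({'Date': ['2021-11-01'] * 3, 'Steps': [10, 10, 7]})
print(demo.groupby('Date')['Steps'].rank(ascending=False).tolist())
# [1.5, 1.5, 3.0]  -> default method='average' splits ties and returns floats
print(demo.groupby('Date')['Steps'].rank(ascending=False, method='dense').astype(int).tolist())
# [1, 1, 2]        -> dense integer ranks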
Here's my sample data
shop_code product_id doc_date ship_count mark_1 outputer y
-----------------------------------------------------------------------------
1 00664НСК 11628 2015-01-03 00:00:00.000 12 1 8 1
2 00664НСК 11628 2015-01-05 00:00:00.000 7 1 8 1
3 00664НСК 11628 2015-01-06 00:00:00.000 24 0 8 1
4 00664НСК 11628 2015-01-07 00:00:00.000 18 1 8 1
5 00664НСК 11628 2015-01-08 00:00:00.000 12 1 8 1
6 00664НСК 11628 2015-01-09 00:00:00.000 18 0 8 1
7 00664НСК 11628 2015-01-10 00:00:00.000 6 0 6 1
8 00664НСК 11628 2015-01-11 00:00:00.000 6 1 6 1
9 00664НСК 11628 2015-01-12 00:00:00.000 6 1 6 1
10 00664НСК 11628 2015-01-13 00:00:00.000 18 1 12 0
11 00664НСК 11628 2015-01-14 00:00:00.000 6 1 6 0
12 00664НСК 11628 2015-01-15 00:00:00.000 18 1 12 0
13 00664НСК 11628 2015-01-16 00:00:00.000 12 1 12 1
14 00664НСК 11628 2015-01-17 00:00:00.000 18 1 12 1
15 00664НСК 11628 2015-01-18 00:00:00.000 12 1 12 1
16 00664НСК 11628 2015-01-19 00:00:00.000 10 1 10 0
17 00664НСК 11628 2015-01-20 00:00:00.000 24 1 12 0
18 00664НСК 11628 2015-01-21 00:00:00.000 6 1 6 0
19 00664НСК 11628 2015-01-24 00:00:00.000 6 1 6 0
20 00664НСК 11628 2015-01-25 00:00:00.000 6 0 6 0
21 00664НСК 11628 2015-01-26 00:00:00.000 10 0 10 1
22 00664НСК 11628 2015-01-27 00:00:00.000 6 1 6 0
23 00664НСК 11628 2015-01-28 00:00:00.000 10 1 10 0
24 00664НСК 11628 2015-01-29 00:00:00.000 70 0 12 1
25 00664НСК 11628 2015-01-30 00:00:00.000 100 1 12 1
I asked a similar question for R and got a working solution, but now I want to do it in T-SQL.
I need to enforce the following condition: if y = 1 and mark_1 = 1, then the outputer value for that row must be replaced by the first outputer value that occurs for y = 0 and mark_1 = 1.
If that first y = 0, mark_1 = 1 outputer value is greater than ship_count, then keep the actual ship_count value in outputer.
Rows with mark_1 = 0 are not touched.
This operation must be done per shop_code + product_id group.
So the desired output should look like this:
shop_code product_id doc_date ship_count mark_1 outputer y
----------------------------------------------------------------------------
1 00664НСК 11628 2015-01-03 00:00:00.000 12 1 *12 1
2 00664НСК 11628 2015-01-05 00:00:00.000 7 1 *7 1
3 00664НСК 11628 2015-01-06 00:00:00.000 24 0 24 1
4 00664НСК 11628 2015-01-07 00:00:00.000 18 1 *12 1
5 00664НСК 11628 2015-01-08 00:00:00.000 12 1 *12 1
6 00664НСК 11628 2015-01-09 00:00:00.000 18 0 18 1
7 00664НСК 11628 2015-01-10 00:00:00.000 6 0 6 1
8 00664НСК 11628 2015-01-11 00:00:00.000 6 1 6 1
9 00664НСК 11628 2015-01-12 00:00:00.000 6 1 6 1
10 00664НСК 11628 2015-01-13 00:00:00.000 18 1 *12 0
11 00664НСК 11628 2015-01-14 00:00:00.000 6 1 6 0
12 00664НСК 11628 2015-01-15 00:00:00.000 18 1 12 0
13 00664НСК 11628 2015-01-16 00:00:00.000 12 1 *10 1
14 00664НСК 11628 2015-01-17 00:00:00.000 18 1 *10 1
15 00664НСК 11628 2015-01-18 00:00:00.000 12 1 *10 1
16 00664НСК 11628 2015-01-19 00:00:00.000 10 1 10 0
17 00664НСК 11628 2015-01-20 00:00:00.000 24 1 12 0
18 00664НСК 11628 2015-01-21 00:00:00.000 6 1 6 0
19 00664НСК 11628 2015-01-24 00:00:00.000 6 1 6 0
20 00664НСК 11628 2015-01-25 00:00:00.000 6 0 6 0
21 00664НСК 11628 2015-01-26 00:00:00.000 10 0 10 1
22 00664НСК 11628 2015-01-27 00:00:00.000 6 1 6 1
23 00664НСК 11628 2015-01-28 00:00:00.000 20 1 *12 0
24 00664НСК 11628 2015-01-29 00:00:00.000 70 1 12 0
25 00664НСК 11628 2015-01-30 00:00:00.000 100 1 12 1
Good evening.
You should use a CASE expression to do this job.
To find the first value for the conditions described, use a subquery in which you keep the order you want (ORDER BY) and select the TOP 1 value.
Give it a try, and if you face any issues, ask again.
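To make that concrete, here is a minimal sketch, assuming the table is named dbo.Shipments (a hypothetical name) and interpreting "first" as the next y = 0, mark_1 = 1 row by doc_date within the same shop_code + product_id group:
SELECT s.shop_code, s.product_id, s.doc_date, s.ship_count, s.mark_1, s.y,
       CASE
           WHEN s.y = 1 AND s.mark_1 = 1 AND f.outputer IS NOT NULL
               THEN IIF(f.outputer > s.ship_count, s.ship_count, f.outputer)
           ELSE s.outputer
       END AS outputer
FROM dbo.Shipments AS s
OUTER APPLY (
    SELECT TOP (1) x.outputer
    FROM dbo.Shipments AS x
    WHERE x.shop_code = s.shop_code
      AND x.product_id = s.product_id
      AND x.mark_1 = 1
      AND x.y = 0
      AND x.doc_date > s.doc_date   -- assumption: "first" = next such row in time
    ORDER BY x.doc_date
) AS f;
OUTER APPLY keeps rows that have no following y = 0 match, and the CASE falls back to the original outputer for them.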
So here is a question.
I have a table FacebookInfo with a column name.
Another table is FacebookPost, with a column created_time and a foreign key facebookinfoid mapped to the FacebookInfo id column.
So basically FacebookInfo holds records of Facebook pages and FacebookPost holds the posts of those pages.
What I want to find out is how frequently the pages post on Facebook: the average number of posts per day, the difference in hours between posts, the average time of the first post of a day, and the average time of the last post of a day.
Thanks for the help.
Here is some sample data:
FacebookInfo
id name
3 Qatar Airways
4 KLM Royal Dutch Airlines
5 LATAM Airlines
6 Southwest Airlines
FacebookPost
id facebookinfoid created_time
777 3 2016-12-06 12:54:31.000
778 3 2016-12-05 09:54:09.000
779 3 2016-12-02 12:40:46.000
780 3 2016-12-01 13:00:00.000
781 3 2016-11-30 11:29:53.000
782 3 2016-11-30 09:00:00.000
783 3 2016-11-29 10:09:45.000
784 3 2016-11-28 14:00:00.000
785 3 2016-11-27 11:21:11.000
786 3 2016-11-26 12:00:01.000
787 3 2016-11-25 11:58:55.000
788 3 2016-11-24 10:28:19.000
789 3 2016-11-23 16:20:29.000
790 3 2016-11-23 11:19:42.000
791 3 2016-11-21 12:03:07.000
792 3 2016-11-18 13:36:41.000
793 3 2016-11-17 11:08:41.000
794 3 2016-11-16 12:01:00.000
795 3 2016-11-15 13:39:06.000
796 3 2016-11-11 15:11:56.000
1454 4 2016-12-06 15:00:22.000
1455 4 2016-12-05 14:59:04.000
1456 4 2016-12-05 09:00:07.000
1457 4 2016-12-04 15:00:07.000
1458 4 2016-12-03 10:00:08.000
1459 4 2016-12-02 15:00:15.000
1460 4 2016-12-01 14:00:00.000
1461 4 2016-11-30 13:30:24.000
1462 4 2016-11-29 15:00:07.000
1463 4 2016-11-28 15:00:19.000
1464 4 2016-11-28 09:00:09.000
1465 4 2016-11-26 10:00:06.000
1466 4 2016-11-24 15:00:04.000
1467 4 2016-11-23 09:00:09.000
1468 4 2016-11-22 15:01:04.000
1469 4 2016-11-21 15:00:07.000
1470 4 2016-11-21 05:00:10.000
1471 4 2016-11-19 10:00:07.000
1472 4 2016-11-18 09:00:10.000
1473 4 2016-11-17 15:00:01.000
2454 5 2016-12-05 16:00:01.000
2455 5 2016-12-02 16:02:37.000
2456 5 2016-12-01 16:00:09.000
2457 5 2016-11-30 16:00:48.000
2458 5 2016-11-29 16:01:34.000
2459 5 2016-11-28 16:00:00.000
2460 5 2016-11-25 16:00:01.000
2461 5 2016-11-23 16:00:00.000
2462 5 2016-11-22 16:00:00.000
2463 5 2016-11-21 16:00:00.000
2464 5 2016-11-19 16:00:03.000
2465 5 2016-11-18 16:00:00.000
2466 5 2016-11-17 16:00:01.000
2467 5 2016-11-16 16:00:03.000
2468 5 2016-11-15 16:00:01.000
2469 5 2016-11-12 16:00:00.000
2470 5 2016-11-11 16:00:00.000
2471 5 2016-11-10 16:00:01.000
2472 5 2016-11-09 16:00:00.000
2473 5 2016-11-08 16:00:02.000
3059 6 2016-12-06 15:14:30.000
3060 6 2016-12-04 21:38:33.000
3061 6 2016-12-03 22:27:40.000
3062 6 2016-12-02 21:29:42.000
3063 6 2016-12-01 23:00:04.000
3064 6 2016-11-30 22:00:02.000
3065 6 2016-11-30 20:28:17.000
3066 6 2016-11-29 17:57:02.000
3067 6 2016-11-28 20:49:59.000
3068 6 2016-11-26 17:10:55.000
3069 6 2016-11-26 12:50:45.000
3070 6 2016-11-25 21:16:31.000
3071 6 2016-11-25 01:27:09.000
3072 6 2016-11-24 15:50:16.000
3073 6 2016-11-23 22:00:01.000
3074 6 2016-11-23 15:10:32.000
3075 6 2016-11-22 21:42:42.000
3076 6 2016-11-22 16:29:28.000
3077 6 2016-11-22 03:03:21.000
3078 6 2016-11-22 01:45:41.000
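Not a definitive answer, but a sketch along those lines using the table and column names from the question. It averages posts per day over the days that actually have posts, measures gaps between consecutive posts with LAG(), and averages first/last post clock times as seconds since midnight:
WITH posts AS (
    SELECT facebookinfoid,
           created_time,
           CAST(created_time AS DATE) AS post_date,
           -- seconds since this page's previous post
           DATEDIFF(SECOND,
                    LAG(created_time) OVER (PARTITION BY facebookinfoid
                                            ORDER BY created_time),
                    created_time) AS gap_seconds
    FROM FacebookPost
),
per_day AS (
    SELECT facebookinfoid, post_date,
           COUNT(*) AS posts_per_day,
           MIN(created_time) AS first_post,
           MAX(created_time) AS last_post
    FROM posts
    GROUP BY facebookinfoid, post_date
)
SELECT i.name,
       AVG(CAST(d.posts_per_day AS FLOAT)) AS avg_posts_per_day,   -- over days with posts
       (SELECT AVG(p.gap_seconds) / 3600.0
        FROM posts AS p
        WHERE p.facebookinfoid = i.id) AS avg_gap_hours,
       CAST(DATEADD(SECOND, AVG(DATEDIFF(SECOND, CAST(d.post_date AS DATETIME), d.first_post)), 0) AS TIME) AS avg_first_post_time,
       CAST(DATEADD(SECOND, AVG(DATEDIFF(SECOND, CAST(d.post_date AS DATETIME), d.last_post)), 0) AS TIME) AS avg_last_post_time
FROM per_day AS d
JOIN FacebookInfo AS i ON i.id = d.facebookinfoid
GROUP BY i.id, i.name;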
I am trying to get a running count of rows over incrementing dates.
My table looks like this:
ID name status create_date
1 John AC 2016-01-01 00:00:26.513
2 Jane AC 2016-01-02 00:00:26.513
3 Kane AC 2016-01-02 00:00:26.513
4 Carl AC 2016-01-03 00:00:26.513
5 Dave AC 2016-01-04 00:00:26.513
6 Gina AC 2016-01-04 00:00:26.513
Now what I want to return from the SQL is something like this:
Date Count
2016-01-01 1
2016-01-02 3
2016-01-03 4
2016-01-04 6
You can make use of COUNT() OVER () without PARTITION BY, using only ORDER BY; it gives you the running (cumulative) count. Use DISTINCT to filter out the duplicate rows.
SELECT DISTINCT CAST(create_date AS DATE) [Date],
COUNT(create_date) OVER (ORDER BY CAST(create_date AS DATE)) as [COUNT]
FROM [YourTable]
If you only need the per-day counts rather than the running total, a plain GROUP BY is enough:
SELECT create_date, COUNT(create_date) AS [COUNT]
FROM (
    SELECT CAST(create_date AS DATE) create_date
    FROM [YourTable]
) T
GROUP BY create_date
Per your description, you need a continuous list of dates. Does that make sense?
This sample only generates one month of data.
CREATE TABLE #tt(ID INT, name VARCHAR(10), status VARCHAR(10), create_date DATETIME)
INSERT INTO #tt
SELECT 1,'John','AC','2016-01-01 00:00:26.513' UNION
SELECT 2,'Jane','AC','2016-01-02 00:00:26.513' UNION
SELECT 3,'Kane','AC','2016-01-02 00:00:26.513' UNION
SELECT 4,'Carl','AC','2016-01-03 00:00:26.513' UNION
SELECT 5,'Dave','AC','2016-01-04 00:00:26.513' UNION
SELECT 6,'Gina','AC','2016-01-04 00:00:26.513' UNION
SELECT 7,'Tina','AC','2016-01-08 00:00:26.513'
SELECT * FROM #tt
SELECT CONVERT(DATE, DATEADD(d, sv.number, n.FirstDate)) AS [Date], COUNT(n.num) AS [Count]
FROM master.dbo.spt_values AS sv
LEFT JOIN (
    SELECT MIN(t.create_date) OVER () AS FirstDate,
           DATEDIFF(d, MIN(t.create_date) OVER (), t.create_date) AS num
    FROM #tt AS t
) AS n ON n.num <= sv.number
WHERE sv.type = 'P' AND sv.number >= 0
  AND MONTH(DATEADD(d, sv.number, n.FirstDate)) = MONTH(n.FirstDate)
  AND YEAR(DATEADD(d, sv.number, n.FirstDate)) = YEAR(n.FirstDate)  -- year check too, so only the first month is generated
GROUP BY CONVERT(DATE, DATEADD(d, sv.number, n.FirstDate))
Date Count
---------- -----------
2016-01-01 1
2016-01-02 3
2016-01-03 4
2016-01-04 6
2016-01-05 6
2016-01-06 6
2016-01-07 6
2016-01-08 7
2016-01-09 7
2016-01-10 7
2016-01-11 7
2016-01-12 7
2016-01-13 7
2016-01-14 7
2016-01-15 7
2016-01-16 7
2016-01-17 7
2016-01-18 7
2016-01-19 7
2016-01-20 7
2016-01-21 7
2016-01-22 7
2016-01-23 7
2016-01-24 7
2016-01-25 7
2016-01-26 7
2016-01-27 7
2016-01-28 7
2016-01-29 7
2016-01-30 7
2016-01-31 7
select r.date, count(r.date) as [count]
from (
    select id, name, substring(convert(nvarchar(50), create_date, 120), 1, 10) date
    from tblName
) r
group by r.date
In the subquery, I take the first 10 characters of create_date after converting it from datetime to nvarchar with style 120, so the value looks like '2016-01-01'. (The string conversion is not strictly necessary, but I prefer it this way to make the code more readable.)
Then a simple GROUP BY gives each date and its count. Note that this returns per-day counts; to get the running total the question asks for, wrap the grouped count in a window SUM, as sketched below.
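For completeness, a sketch of that wrap: the same per-day grouping, with a window SUM over the grouped counts to produce the cumulative totals from the question:
SELECT r.[date],
       -- running total of the per-day counts, ordered by date
       SUM(COUNT(r.[date])) OVER (ORDER BY r.[date]) AS [count]
FROM (
    SELECT CAST(create_date AS DATE) AS [date]
    FROM tblName
) AS r
GROUP BY r.[date];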