I have the following table, ticks:
datetime             lowest_tick  tick_lower
-------------------  -----------  ----------
2022-10-01 00:02:00  204406       204306
2022-10-01 00:03:00  204395       204295
2022-10-01 00:04:00  204487       204387
2022-10-01 00:05:00  204200       204100
2022-10-01 00:06:00  204220       204120
2022-10-01 00:07:00  204120       204020
What I want is a new column, tick_lower_position, that shows the first value of tick_lower for which tick_lower <= lowest_tick.
So the resulting table should look like this:
datetime             lowest_tick  tick_lower  tick_lower_position
-------------------  -----------  ----------  -------------------
2022-10-01 00:02:00  204406       204306      204306
2022-10-01 00:03:00  204395       204295      204306
2022-10-01 00:04:00  204487       204387      204306
2022-10-01 00:05:00  204200       204100      204100
2022-10-01 00:06:00  204220       204120      204100
2022-10-01 00:07:00  204120       204020      204100
So far, I have tried to apply the lag function but cannot figure out how to use lag function with the desired condition.
You don't mention the database you are using so I'll assume it's PostgreSQL. You can do:
select y.*,
       first_value(tick_lower)
         over(partition by g order by datetime) as tick_lower_position
from (
  select x.*, sum(i) over(order by datetime) as g
  from (
    select t.*,
           case when lowest_tick < lag(tick_lower) over(order by datetime)
                then 1 else 0 end as i
    from t
  ) x
) y
Result:
datetime lowest_tick tick_lower i g tick_lower_position
-------------------- ------------ ----------- -- -- -------------------
2022-10-01 00:02:00 204406 204306 0 0 204306
2022-10-01 00:03:00 204395 204295 0 0 204306
2022-10-01 00:04:00 204487 204387 0 0 204306
2022-10-01 00:05:00 204200 204100 1 1 204100
2022-10-01 00:06:00 204220 204120 0 1 204100
2022-10-01 00:07:00 204120 204020 0 1 204100
See running example at db<>fiddle.
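For readers who want to trace the three query levels (flag, running sum, first value per group) outside the database, here is a minimal plain-Python sketch of the same logic. It is an illustration only, not part of the query above; the helper name tick_lower_positions and the tuple layout are assumptions made for this example.

```python
from itertools import accumulate

# Rows copied from the question: (datetime, lowest_tick, tick_lower)
rows = [
    ("2022-10-01 00:02:00", 204406, 204306),
    ("2022-10-01 00:03:00", 204395, 204295),
    ("2022-10-01 00:04:00", 204487, 204387),
    ("2022-10-01 00:05:00", 204200, 204100),
    ("2022-10-01 00:06:00", 204220, 204120),
    ("2022-10-01 00:07:00", 204120, 204020),
]


def tick_lower_positions(rows):
    """Hypothetical helper mirroring the three levels of the query."""
    if not rows:
        return []
    rows = sorted(rows)  # order by datetime
    # i: 1 when lowest_tick is below the previous row's tick_lower (the lag);
    # the first row has no previous row, so its flag is 0
    flags = [0] + [
        1 if cur[1] < prev[2] else 0 for prev, cur in zip(rows, rows[1:])
    ]
    # g: running sum of the flags, used as the group number
    groups = list(accumulate(flags))
    # first_value(tick_lower) within each group
    first_in_group = {}
    for g, row in zip(groups, rows):
        first_in_group.setdefault(g, row[2])
    return [first_in_group[g] for g in groups]


positions = tick_lower_positions(rows)
print(positions)
```

For the sample rows this gives 204306 for the first three rows and 204100 for the last three, matching the tick_lower_position column in the result above.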
I tried countless answers to similar problems here on SO but couldn't find anything that works for this scenario. It's driving me nuts.
I have these two Dataframes:
df_op:
index  Date                 Close   Name  LogRet
0      2022-11-29 00:00:00  240.33  MSFT  -0.0059
1      2022-11-29 00:00:00  280.57  QQQ   -0.0076
2      2022-12-13 00:00:00  342.46  ADBE   0.0126
3      2022-12-13 00:00:00  256.92  MSFT   0.0173
df_quotes:
index  Date                 Close   Name
72     2022-11-29 00:00:00  141.17  AAPL
196    2022-11-29 00:00:00  240.33  MSFT
73     2022-11-30 00:00:00  148.03  AAPL
197    2022-11-30 00:00:00  255.14  MSFT
11     2022-11-30 00:00:00  293.36  QQQ
136    2022-12-01 00:00:00  344.11  ADBE
198    2022-12-01 00:00:00  254.69  MSFT
12     2022-12-02 00:00:00  293.72  QQQ
I would like to add a column to df_op that indicates the close of the stock in df_quotes 2 days later. For example, the first row of df_op should become:
index  Date                 Close   Name  LogRet   Next
0      2022-11-29 00:00:00  240.33  MSFT  -0.0059  254.69
In other words:
for each row in df_op, find the corresponding Name in df_quotes with Date of 2 days later and copy its Close to df_op in column 'Next'.
I tried tens of combinations like this without success:
df_quotes[df_quotes['Date'].isin(df_op['Date'] + pd.DateOffset(days=2)) & df_quotes['Name'].isin(df_op['Name'])]
How can I do this without resorting to loops?
Try this:
import pandas as pd

# First convert both Date columns to datetime
df_op['Date'] = pd.to_datetime(df_op['Date'])
df_quotes['Date'] = pd.to_datetime(df_quotes['Date'])

# Merge on Date and Name, with the quote date shifted back 2 business days
(pd.merge(df_op,
          df_quotes[['Date', 'Close', 'Name']].rename({'Close': 'Next'}, axis=1),
          left_on=['Date', 'Name'],
          right_on=[df_quotes['Date'] - pd.tseries.offsets.BDay(2), 'Name'],
          how='left')
   .drop(['Date_x', 'Date_y'], axis=1))
Output:
Date index Close Name LogRet Next
0 2022-11-29 0 240.33 MSFT -0.0059 254.69
1 2022-11-29 1 280.57 QQQ -0.0076 NaN
2 2022-12-13 2 342.46 ADBE 0.0126 NaN
3 2022-12-13 3 256.92 MSFT 0.0173 NaN
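Conceptually, the merge is a keyed lookup on (Date shifted by two business days, Name). Here is a rough, dependency-free sketch of that idea in plain Python, using a subset of the question's rows. The helper add_business_days is hypothetical and only skips Saturdays and Sundays (no holiday calendar), so treat it as an approximation of BDay(2), not a replacement for it.

```python
from datetime import date, timedelta


def add_business_days(d, n):
    """Hypothetical helper: step forward n weekdays (Mon-Fri), ignoring holidays."""
    while n > 0:
        d += timedelta(days=1)
        if d.weekday() < 5:  # 0-4 are Monday to Friday
            n -= 1
    return d


# Subset of the question's data as plain tuples: (date, close, name)
quotes = [
    (date(2022, 11, 29), 240.33, "MSFT"),
    (date(2022, 11, 30), 255.14, "MSFT"),
    (date(2022, 11, 30), 293.36, "QQQ"),
    (date(2022, 12, 1), 254.69, "MSFT"),
    (date(2022, 12, 2), 293.72, "QQQ"),
]
ops = [
    (date(2022, 11, 29), 240.33, "MSFT"),
    (date(2022, 11, 29), 280.57, "QQQ"),
]

# Index the quotes by (date, name), then look each op row up two business days later
close_by_key = {(d, name): close for d, close, name in quotes}
next_close = [
    close_by_key.get((add_business_days(d, 2), name)) for d, _, name in ops
]
print(next_close)
```

As in the merged output, MSFT picks up the 2022-12-01 close, while QQQ has no quote on that date and stays empty (None here, NaN in pandas).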
I need to count the number of campaigns per day based on the start and end dates of the campaigns
Input Table:
Campaign name  Start date  End date
-------------  ----------  ----------
Campaign A     2022-07-10  2022-09-25
Campaign B     2022-08-06  2022-10-07
Campaign C     2022-07-30  2022-09-10
Campaign D     2022-08-26  2022-10-24
Campaign E     2022-07-17  2022-09-29
Campaign F     2022-08-24  2022-09-12
Campaign G     2022-08-11  2022-10-24
Campaign H     2022-08-26  2022-11-22
Campaign I     2022-08-29  2022-09-25
Campaign J     2022-08-21  2022-11-15
Campaign K     2022-07-20  2022-09-18
Campaign L     2022-07-31  2022-11-20
Campaign M     2022-08-17  2022-10-10
Campaign N     2022-07-27  2022-09-07
Campaign O     2022-07-29  2022-09-26
Campaign P     2022-07-06  2022-09-15
Campaign Q     2022-07-16  2022-09-22
Output needed (result):
Date        Count unique campaigns
----------  ----------------------
2022-07-02  17
2022-07-03  47
2022-07-04  5
2022-07-05  5
2022-07-06  25
2022-07-07  27
2022-07-08  17
2022-07-09  58
2022-07-10  23
2022-07-11  53
2022-07-12  18
2022-07-13  29
2022-07-14  52
2022-07-15  7
2022-07-16  17
2022-07-17  37
2022-07-18  33
How should I write the SQL query to get the above result? Thanks, all.
The following solutions use string_split in combination with replicate to generate the new records.
select   dt as date
        ,count(*) as Count_unique_campaigns
from
(
  select *
        ,dateadd(day, row_number() over(partition by Campaign_name order by (select null))-1, Start_date) as dt
  from (
    select *
    from t
    outer apply string_split(replicate(',',datediff(day, Start_date, End_date)),',')
  ) t
) t
group by dt
order by dt
date        Count_unique_campaigns
----------  ----------------------
2022-07-06  1
2022-07-07  1
2022-07-08  1
2022-07-09  1
2022-07-10  2
2022-07-11  2
2022-07-12  2
2022-07-13  2
2022-07-14  2
2022-07-15  2
2022-07-16  3
2022-07-17  4
2022-07-18  4
2022-07-19  4
2022-07-20  5
2022-07-21  5
2022-07-22  5
2022-07-23  5
2022-07-24  5
2022-07-25  5
2022-07-26  5
2022-07-27  6
2022-07-28  6
2022-07-29  7
2022-07-30  8
2022-07-31  9
2022-08-01  9
2022-08-02  9
2022-08-03  9
2022-08-04  9
2022-08-05  9
2022-08-06  10
2022-08-07  10
2022-08-08  10
2022-08-09  10
2022-08-10  10
2022-08-11  11
2022-08-12  11
2022-08-13  11
2022-08-14  11
2022-08-15  11
2022-08-16  11
2022-08-17  12
2022-08-18  12
2022-08-19  12
2022-08-20  12
2022-08-21  13
2022-08-22  13
2022-08-23  13
2022-08-24  14
2022-08-25  14
2022-08-26  16
2022-08-27  16
2022-08-28  16
2022-08-29  17
2022-08-30  17
2022-08-31  17
2022-09-01  17
2022-09-02  17
2022-09-03  17
2022-09-04  17
2022-09-05  17
2022-09-06  17
2022-09-07  17
2022-09-08  16
2022-09-09  16
2022-09-10  16
2022-09-11  15
2022-09-12  15
2022-09-13  14
2022-09-14  14
2022-09-15  14
2022-09-16  13
2022-09-17  13
2022-09-18  13
2022-09-19  12
2022-09-20  12
2022-09-21  12
2022-09-22  12
2022-09-23  11
2022-09-24  11
2022-09-25  11
2022-09-26  9
2022-09-27  8
2022-09-28  8
2022-09-29  8
2022-09-30  7
2022-10-01  7
2022-10-02  7
2022-10-03  7
2022-10-04  7
2022-10-05  7
2022-10-06  7
2022-10-07  7
2022-10-08  6
2022-10-09  6
2022-10-10  6
2022-10-11  5
2022-10-12  5
2022-10-13  5
2022-10-14  5
2022-10-15  5
2022-10-16  5
2022-10-17  5
2022-10-18  5
2022-10-19  5
2022-10-20  5
2022-10-21  5
2022-10-22  5
2022-10-23  5
2022-10-24  5
2022-10-25  3
2022-10-26  3
2022-10-27  3
2022-10-28  3
2022-10-29  3
2022-10-30  3
2022-10-31  3
2022-11-01  3
2022-11-02  3
2022-11-03  3
2022-11-04  3
2022-11-05  3
2022-11-06  3
2022-11-07  3
2022-11-08  3
2022-11-09  3
2022-11-10  3
2022-11-11  3
2022-11-12  3
2022-11-13  3
2022-11-14  3
2022-11-15  3
2022-11-16  2
2022-11-17  2
2022-11-18  2
2022-11-19  2
2022-11-20  2
2022-11-21  1
2022-11-22  1
For Azure SQL and SQL Server 2022 we have a cleaner solution based on the ordinal output column of string_split.
"The enable_ordinal argument and ordinal output column are currently
supported in Azure SQL Database, Azure SQL Managed Instance, and Azure
Synapse Analytics (serverless SQL pool only). Beginning with SQL
Server 2022 (16.x) Preview, the argument and output column are
available in SQL Server."
select   dt as date
        ,count(*) as Count_unique_campaigns
from
(
  select *
        ,dateadd(day, ordinal-1, Start_date) as dt
  from (
    select *
    from t
    outer apply string_split(replicate(',',datediff(day, Start_date, End_date)),',', 1)
  ) t
) t
group by dt
order by dt
Fiddle
Your sample data doesn't seem to match your desired results, but I think what you're after is this:
DECLARE @Start date, @End date;

-- first, find the earliest and last date:
SELECT @Start = MIN([Start date]), @End = MAX([End date])
  FROM dbo.Campaigns;

-- now use a recursive CTE to build a date range,
-- and count the number of campaigns that have a row
-- where the campaign was active on that date:
WITH d(d) AS
(
  SELECT @Start
  UNION ALL
  SELECT DATEADD(DAY, 1, d) FROM d WHERE d < @End
)
SELECT
  [Date] = d,
  [Count unique campaigns] = COUNT(*)
FROM d
INNER JOIN dbo.Campaigns AS c
  ON d.d >= c.[Start date] AND d.d <= c.[End date]
GROUP BY d.d OPTION (MAXRECURSION 32767);
Working example in this fiddle.
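If you want to sanity-check the per-day counts outside the database, the following is a small plain-Python sketch. Note that it uses a different technique from the queries above: instead of expanding every campaign into one row per day, it adds +1 on each start date and -1 on the day after each end date, then takes a running sum. The helper name active_per_day and the dictionary layout are assumptions for this example; the dates are the ones from the question.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate

# Campaign name -> (start date, end date), copied from the question
campaigns = {
    "A": ("2022-07-10", "2022-09-25"), "B": ("2022-08-06", "2022-10-07"),
    "C": ("2022-07-30", "2022-09-10"), "D": ("2022-08-26", "2022-10-24"),
    "E": ("2022-07-17", "2022-09-29"), "F": ("2022-08-24", "2022-09-12"),
    "G": ("2022-08-11", "2022-10-24"), "H": ("2022-08-26", "2022-11-22"),
    "I": ("2022-08-29", "2022-09-25"), "J": ("2022-08-21", "2022-11-15"),
    "K": ("2022-07-20", "2022-09-18"), "L": ("2022-07-31", "2022-11-20"),
    "M": ("2022-08-17", "2022-10-10"), "N": ("2022-07-27", "2022-09-07"),
    "O": ("2022-07-29", "2022-09-26"), "P": ("2022-07-06", "2022-09-15"),
    "Q": ("2022-07-16", "2022-09-22"),
}


def active_per_day(campaigns):
    """Hypothetical helper: active campaigns per day, end dates inclusive."""
    if not campaigns:
        return {}
    delta = Counter()
    for start, end in campaigns.values():
        delta[date.fromisoformat(start)] += 1                    # campaign opens
        delta[date.fromisoformat(end) + timedelta(days=1)] -= 1  # closes the next day
    first, last = min(delta), max(delta)
    # Every calendar day from the first start date to the last end date
    days = [first + timedelta(days=i) for i in range((last - first).days)]
    # The running sum of the deltas is the number of campaigns active that day
    return dict(zip(days, accumulate(delta[d] for d in days)))


counts = active_per_day(campaigns)
print(counts[date(2022, 8, 29)], counts[date(2022, 11, 22)])
```

For the sample data this reproduces the counts listed earlier, for example 17 active campaigns on 2022-08-29 and 1 on 2022-11-22.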
I am trying to use a CASE statement within the WHERE clause in Snowflake, but I’m not quite sure how I should go about it.
What I’m trying to do is this: if the current month is January, the WHERE clause for the date should be between the start of the previous year and today. Otherwise, it should be between the start of the current year and today.
WHERE
CASE MONTH(CURRENT_DATE()) = 1 THEN DATE BETWEEN DATE_TRUNC(‘YEAR’, DATEADD(YEAR, -1, CURRENT_DATE())) AND CURRENT_DATE()
CASE MONTH(CURRENT_DATE()) != 1 THEN DATE BETWEEN DATE_TRUNC(‘YEAR’, CURRENT_DATE()) AND CURRENT_DATE()
END
Appreciate any help on this!
Use a CASE expression that returns -1 if the current month is January or 0 for any other month, so that you can get with DATEADD() a date of the previous or the current year to use in DATE_TRUNC():
WHERE DATE BETWEEN
DATE_TRUNC('YEAR', DATEADD(YEAR, CASE WHEN MONTH(CURRENT_DATE()) = 1 THEN -1 ELSE 0 END, CURRENT_DATE()))
AND
CURRENT_DATE()
I suspect that you don't even need to use CASE here:
WHERE
    (MONTH(CURRENT_DATE()) = 1 AND
     DATE BETWEEN DATE_TRUNC('YEAR', DATEADD(YEAR, -1, CURRENT_DATE())) AND
                  CURRENT_DATE()) OR
    (MONTH(CURRENT_DATE()) != 1 AND
     DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_DATE()) AND CURRENT_DATE())
So the other answers are quite good, but the answer can be even simpler.
Here is a little table to break down what is happening:
select
    row_number() over (order by null) - 1 as rn,
    dateadd('day', rn * 5, date_trunc('year',current_date())) as pretend_current_date,
    DATEADD(YEAR, -1, pretend_current_date) as pcd_sub1,
    month(pretend_current_date) as pcd_month,
    DATE_TRUNC(year, iff(pcd_month = 1, pcd_sub1, pretend_current_date)) as _from,
    pretend_current_date as _to
from table(generator(ROWCOUNT => 30))
order by rn;
This shows:
RN  PRETEND_CURRENT_DATE  PCD_SUB1    PCD_MONTH  _FROM       _TO
--  --------------------  ----------  ---------  ----------  ----------
0   2022-01-01            2021-01-01  1          2021-01-01  2022-01-01
1   2022-01-06            2021-01-06  1          2021-01-01  2022-01-06
2   2022-01-11            2021-01-11  1          2021-01-01  2022-01-11
3   2022-01-16            2021-01-16  1          2021-01-01  2022-01-16
4   2022-01-21            2021-01-21  1          2021-01-01  2022-01-21
5   2022-01-26            2021-01-26  1          2021-01-01  2022-01-26
6   2022-01-31            2021-01-31  1          2021-01-01  2022-01-31
7   2022-02-05            2021-02-05  2          2022-01-01  2022-02-05
8   2022-02-10            2021-02-10  2          2022-01-01  2022-02-10
9   2022-02-15            2021-02-15  2          2022-01-01  2022-02-15
10  2022-02-20            2021-02-20  2          2022-01-01  2022-02-20
11  2022-02-25            2021-02-25  2          2022-01-01  2022-02-25
12  2022-03-02            2021-03-02  3          2022-01-01  2022-03-02
13  2022-03-07            2021-03-07  3          2022-01-01  2022-03-07
14  2022-03-12            2021-03-12  3          2022-01-01  2022-03-12
15  2022-03-17            2021-03-17  3          2022-01-01  2022-03-17
16  2022-03-22            2021-03-22  3          2022-01-01  2022-03-22
17  2022-03-27            2021-03-27  3          2022-01-01  2022-03-27
18  2022-04-01            2021-04-01  4          2022-01-01  2022-04-01
19  2022-04-06            2021-04-06  4          2022-01-01  2022-04-06
20  2022-04-11            2021-04-11  4          2022-01-01  2022-04-11
21  2022-04-16            2021-04-16  4          2022-01-01  2022-04-16
22  2022-04-21            2021-04-21  4          2022-01-01  2022-04-21
23  2022-04-26            2021-04-26  4          2022-01-01  2022-04-26
24  2022-05-01            2021-05-01  5          2022-01-01  2022-05-01
25  2022-05-06            2021-05-06  5          2022-01-01  2022-05-06
26  2022-05-11            2021-05-11  5          2022-01-01  2022-05-11
27  2022-05-16            2021-05-16  5          2022-01-01  2022-05-16
28  2022-05-21            2021-05-21  5          2022-01-01  2022-05-21
29  2022-05-26            2021-05-26  5          2022-01-01  2022-05-26
Your logic asks: if the current date is in January, take the prior year and truncate to the start of that year; otherwise, truncate the current date to the start of its year. The result is the lower bound of the BETWEEN test.
This is the same as subtracting one month from the current date and truncating the result to the year.
Thus there is no need for any IFF or CASE:
WHERE date BETWEEN DATE_TRUNC(year, DATEADD(month,-1, CURRENT_DATE())) AND CURRENT_DATE()
And if you would like to drop some parentheses, CURRENT_DATE can be used without them if you leave it in upper case, so it can be even shorter:
WHERE date BETWEEN DATE_TRUNC(year, DATEADD(month,-1, CURRENT_DATE)) AND CURRENT_DATE
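To convince yourself that "subtract one month, then truncate to the year" gives the same lower bound as the January test, here is a small sketch in Python (not Snowflake SQL, purely for illustration). The two function names are made up for this example; the check runs over every day of 2022 and of the leap year 2024.

```python
from datetime import date, timedelta


def start_with_case(today):
    """CASE/IFF version: previous year in January, otherwise the current year."""
    year = today.year - 1 if today.month == 1 else today.year
    return date(year, 1, 1)


def start_with_month_shift(today):
    """Month-shift version: go back one month, then truncate to the year."""
    months = today.year * 12 + (today.month - 1) - 1  # month index, minus one month
    return date(months // 12, 1, 1)


# Compare both versions for every day of 2022 and of the leap year 2024
days = [date(2022, 1, 1) + timedelta(days=i) for i in range(365)]
days += [date(2024, 1, 1) + timedelta(days=i) for i in range(366)]
mismatches = [d for d in days if start_with_case(d) != start_with_month_shift(d)]
print(mismatches)  # an empty list means the two versions agree
```

Only the year of the shifted date matters after truncation, and going back one month changes the year only when the current month is January, which is why the two versions agree.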
Given are two series, like this:
#period1
DATE
2020-06-22 310.62
2020-06-26 300.05
2020-09-23 322.64
2020-10-30 326.54
#period2
DATE
2020-06-23 312.05
2020-09-02 357.70
2020-10-12 352.43
2021-01-25 384.39
These two series are correlated to each other, i.e. they each mark either the beginning or the end of a date period. The first series marks the end of a period1 period, the second series marks the end of period2 period. The end of a period2 period is at the same time also the start of a period1 period, and vice versa.
I've been looking for a way to aggregate these periods as date ranges, but apparently this is not easily possible with Pandas dataframes. Suggestions extremely welcome.
In the easiest case, the output layout should reflect the end dates of periods, which period type it was, and the amount of change between start and stop of the period.
Explicit output:
DATE CHG PERIOD
2020-06-22 NaN 1
2020-06-23 1.43 2
2020-06-26 12.0 1
2020-09-02 57.65 2
2020-09-23 35.06 1
2020-10-12 29.79 2
2020-10-30 25.89 1
2021-01-25 57.85 2
However, if there is any possibility of actually grouping by a date range consisting of start AND stop date, that would be much more favorable
Thank you!
import pandas as pd

p1 = pd.DataFrame(data={'Date': ['2020-06-22', '2020-06-26', '2020-09-23', '2020-10-30'], 'val': [310.62, 300.05, 322.64, 326.54]})
p2 = pd.DataFrame(data={'Date': ['2020-06-23', '2020-09-02', '2020-10-12', '2021-01-25'], 'val': [312.05, 357.7, 352.43, 384.39]})
p1['period'] = 1
p2['period'] = 2
# DataFrame.append was removed in pandas 2.0, so combine the frames with pd.concat
df = pd.concat([p1, p2]).sort_values('Date').reset_index(drop=True)
# Absolute change relative to the previous row, i.e. the previous period end
df['CHG'] = abs(df['val'].diff(periods=1))
df.drop('val', axis=1)
Output:
Date period CHG
0 2020-06-22 1 NaN
1 2020-06-23 2 1.43
2 2020-06-26 1 12.00
3 2020-09-02 2 57.65
4 2020-09-23 1 35.06
5 2020-10-12 2 29.79
6 2020-10-30 1 25.89
7 2021-01-25 2 57.85
EDIT: matching the format START - STOP - CHANGE - PERIOD
Starting from the above data frame:
df['Start'] = df.Date.shift(periods=1)
df.rename(columns={'Date': 'Stop'}, inplace=True)
df = df[['Start', 'Stop', 'CHG', 'period']]
df
Output:
Start Stop CHG period
0 NaN 2020-06-22 NaN 1
1 2020-06-22 2020-06-23 1.43 2
2 2020-06-23 2020-06-26 12.00 1
3 2020-06-26 2020-09-02 57.65 2
4 2020-09-02 2020-09-23 35.06 1
5 2020-09-23 2020-10-12 29.79 2
6 2020-10-12 2020-10-30 25.89 1
7 2020-10-30 2021-01-25 57.85 2
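The same START - STOP - CHANGE - PERIOD layout can also be sketched without pandas, which makes the pairing of consecutive period ends explicit. This is only an illustration using the sample values from the question; the variable names are assumptions, and unlike the data frame above it skips the first row, which has no start date.

```python
# End-of-period values from the question, keyed by ISO date string
p1 = {"2020-06-22": 310.62, "2020-06-26": 300.05,
      "2020-09-23": 322.64, "2020-10-30": 326.54}
p2 = {"2020-06-23": 312.05, "2020-09-02": 357.70,
      "2020-10-12": 352.43, "2021-01-25": 384.39}

# Tag every point with its period type and sort by date
# (ISO date strings sort chronologically)
events = sorted(
    [(d, v, 1) for d, v in p1.items()] + [(d, v, 2) for d, v in p2.items()]
)

# Each period runs from the previous event (start) to the current one (stop)
ranges = [
    (start, stop, round(abs(v1 - v0), 2), period)
    for (start, v0, _), (stop, v1, period) in zip(events, events[1:])
]

for row in ranges:
    print(row)
```

Each tuple is (start, stop, change, period), so the ranges can be filtered or grouped by period with ordinary list comprehensions.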
# If needed (assuming df1 and df2 are the period1 and period2 series, indexed by date):
df1.index = pd.to_datetime(df1.index)
df2.index = pd.to_datetime(df2.index)
df = pd.concat([df1, df2], axis=1)
df.columns = ['start','stop']
df['CNG'] = df.bfill(axis=1)['start'].diff().abs()
df['PERIOD'] = 1
df.loc[df.stop.notna(), 'PERIOD'] = 2
df = df[['CNG', 'PERIOD']]
print(df)
Output:
CNG PERIOD
Date
2020-06-22 NaN 1
2020-06-23 1.43 2
2020-06-26 12.00 1
2020-09-02 57.65 2
2020-09-23 35.06 1
2020-10-12 29.79 2
2020-10-30 25.89 1
2021-01-25 57.85 2
2021-01-29 14.32 1
2021-02-12 22.57 2
2021-03-04 15.94 1
2021-05-07 45.42 2
2021-05-12 16.71 1
2021-09-02 47.78 2
2021-10-04 24.55 1
2021-11-18 41.09 2
2021-12-01 19.23 1
2021-12-10 20.24 2
2021-12-20 15.76 1
2022-01-03 22.73 2
2022-01-27 46.47 1
2022-02-09 26.30 2
2022-02-23 35.59 1
2022-03-02 15.94 2
2022-03-08 21.64 1
2022-03-29 45.30 2
2022-04-29 49.55 1
2022-05-04 17.06 2
2022-05-12 36.72 1
2022-05-17 15.98 2
2022-05-19 18.86 1
2022-06-02 27.93 2
2022-06-17 51.53 1
df.head():
start_date end_date
0 03.09.2013 03.09.2025
1 09.08.2019 14.05.2020
2 03.08.2015 03.08.2019
3 31.03.2014 31.03.2019
4 02.02.2015 02.02.2019
5 21.08.2019 21.08.2024
when I do df.tail():
start_date end_date
30373 2019-07-05 00:00:00 2023-07-05 00:00:00
30374 2019-06-11 00:00:00 2023-06-11 00:00:00
30375 19.01.2017 2020-02-09 00:00:00 #these 2 start dates are just same as in head
30376 11.12.2009 2011-12-11 00:00:00
30377 2019-07-30 00:00:00 2023-07-30 00:00:00
When I do
df['start_date'] = pd.to_datetime(df['start_date'])
some dates have the month and the day swapped.
The format is inconsistent throughout the column. How can I convert it properly?
Use the dayfirst=True parameter:
df['start_date'] = pd.to_datetime(df['start_date'], dayfirst=True)
Or specify the format explicitly (see http://strftime.org/):
df['start_date'] = pd.to_datetime(df['start_date'], format='%d.%m.%Y')
df['start_date'] = pd.to_datetime(df['start_date'], dayfirst=True)
df['end_date'] = pd.to_datetime(df['end_date'], dayfirst=True)
print (df)
start_date end_date
0 2013-09-03 2025-09-03
1 2019-08-09 2020-05-14
2 2015-08-03 2019-08-03
3 2014-03-31 2019-03-31
4 2015-02-02 2019-02-02
5 2019-08-21 2024-08-21
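If you prefer to be explicit about which layouts are accepted, one option is to try each known format in turn. Below is a minimal standard-library sketch of that idea; the helper parse_mixed and the FORMATS tuple are assumptions for this example, covering only the two layouts visible in the question (dd.mm.yyyy and yyyy-mm-dd hh:mm:ss).

```python
from datetime import datetime

# The two layouts that appear in the question's column (an assumption:
# extend this tuple if the real data contains other layouts)
FORMATS = ("%d.%m.%Y", "%Y-%m-%d %H:%M:%S")


def parse_mixed(text):
    """Hypothetical helper: try each known format and return the first match."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


samples = ["03.09.2013", "2019-07-05 00:00:00", "19.01.2017"]
parsed = [parse_mixed(s) for s in samples]
print(parsed)
```

Because every value must match one of the listed formats exactly, a string in an unexpected layout raises an error instead of being parsed with day and month swapped.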