I am trying to merge two dataframes with different time intervals. One holds the daily returns of an asset (df2); the other is the inflation rate (df1), which is published once a month but not at a regular interval. I am trying to merge those two.
df1 =
First Release
Original Release Date
30 Jun 2010 10:01 1.4%
30 Jul 2010 10:00 1.7%
31 Aug 2010 10:00 1.6%
30 Sep 2010 10:00 1.8%
29 Oct 2010 10:02 1.9%
... ...
17 Mar 2022 11:00 5.9%
21 Apr 2022 10:00 7.4%
18 May 2022 10:00 7.4%
17 Jun 2022 10:00 8.1%
19 Jul 2022 10:00 8.6%
[145 rows x 1 columns]
df2 =
Date
2010-08-11 -0.001654
2010-08-12 -0.028538
2010-08-13 0.001072
2010-08-16 -0.007665
2010-08-17 0.002667
...
2022-01-25 0.029663
2022-01-26 0.026082
2022-01-27 -0.000115
2022-01-28 0.002425
2022-01-31 0.007184
Obviously, the inflation rate should be placed in the new column from the day after it is released until there is a new release. For example, 30 June is the first announcement and 30 July the second, so the value from 1 July to 30 July should be 1.4%. The figure is published on the 30th, but to avoid look-ahead bias it is more appropriate to apply it from the following day. Does anyone have an idea, or has anyone encountered a similar problem?
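One possible way to get this behaviour is pandas' merge_asof with a backward search that excludes exact matches, so each trading day picks up the most recent release published strictly before it. The following is only a sketch under assumptions: the sample rows, the column name ret for the returns, and the helper columns release_day and inflation are hypothetical and not taken from the original data.

```python
import pandas as pd

# Hypothetical sample data shaped like df1 (releases) and df2 (daily returns).
df1 = pd.DataFrame(
    {"First Release": ["1.4%", "1.7%", "1.6%"]},
    index=pd.Index(
        ["30 Jun 2010 10:01", "30 Jul 2010 10:00", "31 Aug 2010 10:00"],
        name="Original Release Date",
    ),
)
df2 = pd.DataFrame(
    {"ret": [0.001, -0.002, 0.003, 0.004, -0.005]},  # "ret" is an assumed name
    index=pd.Index(
        pd.to_datetime(
            ["2010-06-30", "2010-07-01", "2010-07-30", "2010-08-02", "2010-09-01"]
        ),
        name="Date",
    ),
)

# Parse the release timestamps, drop the time of day, and turn "1.4%" into 1.4.
releases = df1.reset_index()
releases["release_day"] = (
    pd.to_datetime(releases["Original Release Date"], format="%d %b %Y %H:%M")
    .dt.normalize()
    .astype("datetime64[ns]")
)
releases["inflation"] = releases["First Release"].str.rstrip("%").astype(float)

daily = df2.reset_index()
daily["Date"] = daily["Date"].astype("datetime64[ns]")

# direction="backward" looks for the latest release at or before each date;
# allow_exact_matches=False excludes the release day itself, so a value only
# applies from the day after publication (avoiding look-ahead bias).
merged = pd.merge_asof(
    daily.sort_values("Date"),
    releases[["release_day", "inflation"]].sort_values("release_day"),
    left_on="Date",
    right_on="release_day",
    direction="backward",
    allow_exact_matches=False,
).set_index("Date")

print(merged[["ret", "inflation"]])
```

In this sketch 30 June itself gets no value (NaN), 1 July through 30 July get 1.4, and the 1.7 release only takes effect after 30 July. Both key columns must be sorted and share the same dtype for merge_asof to accept them.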
I'm trying to extract the names of space organisations from a table, but the closest I can get is each organisation's name next to the number of times it appears. I just want the names of the organisations, not how many times each is listed in the table.
If you can help, please leave a comment on my Google Colab.
https://colab.research.google.com/drive/1m4zI4YGguQ5aWdDVyc7Bdpr-78KHdxhR?usp=sharing
What I get:
variable number  organisation  time of launch
0                SpaceX        Fri Aug 07, 2020 05:12 UTC
1                CASC          Thu Aug 06, 2020 04:01 UTC
2                SpaceX        Tue Aug 04, 2020 23:57 UTC
3                Roscosmos     Thu Jul 30, 2020 21:25 UTC
4                ULA           Thu Jul 30, 2020 11:50 UTC
...              ...           ...
4319             US Navy       Wed Feb 05, 1958 07:33 UTC
4320             AMBA          Sat Feb 01, 1958 03:48 UTC
4321             US Navy       Fri Dec 06, 1957 16:44 UTC
4322             RVSN USSR     Sun Nov 03, 1957 02:30 UTC
4323             RVSN USSR     Fri Oct 04, 1957 19:28 UTC
etc              etc           etc
What I want:
organisation
RVSN USSR
Arianespace
CASC
General Dynamics
NASA
VKS RF
US Air Force
ULA
Boeing
Martin Marietta
etc
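If the table is a pandas DataFrame, one way to get only the names is unique() or drop_duplicates() rather than a counting method such as value_counts(), which returns each name together with its count. This is a minimal sketch; the frame df and its sample rows are hypothetical, and only the column names are taken from the table in the question.

```python
import pandas as pd

# Hypothetical frame using the column names shown in the question.
df = pd.DataFrame(
    {
        "organisation": ["SpaceX", "CASC", "SpaceX", "Roscosmos", "ULA"],
        "time of launch": [
            "Fri Aug 07, 2020 05:12 UTC",
            "Thu Aug 06, 2020 04:01 UTC",
            "Tue Aug 04, 2020 23:57 UTC",
            "Thu Jul 30, 2020 21:25 UTC",
            "Thu Jul 30, 2020 11:50 UTC",
        ],
    }
)

# unique() returns each distinct name once, in order of first appearance,
# without any counts attached.
names = df["organisation"].unique().tolist()
print(names)

# drop_duplicates() gives the same names as a one-column DataFrame.
names_df = df[["organisation"]].drop_duplicates().reset_index(drop=True)
print(names_df)
```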
My data is organized in partitions. Data is partitioned by the year, month, and day when the records were received by the servers. The dataset contains a column with the timestamp of when an event happened and another with the timestamp of when the corresponding data was received by the servers.
I need to go to each partition from 06/2021 to 06/2022, collect all rows that correspond to events that happened during the week of Jan. 18, 2021 to Jan. 24, 2021, and create a new table with the rows collected.
This is an example of what my dataset looks like:
year  month  day  event_timestamp          server_timestamp
2021  07     01   2021-01-19 01:48:20.000  2021-07-01 01:48:20.000
2022  04     09   2022-04-08 01:48:20.000  2022-04-09 01:48:20.000
2023  01     19   2023-01-08 01:48:20.000  2023-01-19 01:48:20.000
2022  02     21   2022-01-09 01:48:20.000  2022-02-21 01:48:20.000
2021  08     05   2021-01-23 01:48:20.000  2021-08-05 01:48:20.000
What is the best way to solve this using SQL?
It seems you do not need the year, month and day columns, since you already have server_timestamp, and you do not need a for loop.
If I understood your question correctly, the answer could look something like this:
-- Create the target table first
create table new_table (
    year int,
    month nvarchar(2),
    day nvarchar(2),
    event_timestamp timestamp,
    server_timestamp timestamp
);
-- Copy the matching rows; timestamp literals must be quoted
insert into new_table (year, month, day, event_timestamp, server_timestamp)
select year, month, day, event_timestamp, server_timestamp
from dataset
where server_timestamp >= '2021-06-01 00:00:00.000'
  and server_timestamp < '2022-07-01 00:00:00.000'
  and event_timestamp >= '2021-01-18 00:00:00.000'
  and event_timestamp < '2021-01-25 00:00:00.000';
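To try the filter on the sample rows from the question, here is a small sketch using Python's built-in sqlite3 module. The in-memory SQLite database is only a stand-in for the real engine (an assumption), the timestamps are stored as ISO-formatted text, and the table is created with CREATE TABLE ... AS SELECT, which may not be the syntax your database uses.

```python
import sqlite3

# In-memory SQLite database standing in for the real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table dataset (year int, month text, day text, "
    "event_timestamp text, server_timestamp text)"
)
rows = [
    (2021, "07", "01", "2021-01-19 01:48:20.000", "2021-07-01 01:48:20.000"),
    (2022, "04", "09", "2022-04-08 01:48:20.000", "2022-04-09 01:48:20.000"),
    (2023, "01", "19", "2023-01-08 01:48:20.000", "2023-01-19 01:48:20.000"),
    (2022, "02", "21", "2022-01-09 01:48:20.000", "2022-02-21 01:48:20.000"),
    (2021, "08", "05", "2021-01-23 01:48:20.000", "2021-08-05 01:48:20.000"),
]
conn.executemany("insert into dataset values (?, ?, ?, ?, ?)", rows)

# Half-open ranges: each upper bound is the first instant after the period,
# so the whole of 30 June 2022 and 24 January 2021 is included.
conn.execute(
    """
    create table new_table as
    select year, month, day, event_timestamp, server_timestamp
    from dataset
    where server_timestamp >= '2021-06-01 00:00:00.000'
      and server_timestamp <  '2022-07-01 00:00:00.000'
      and event_timestamp  >= '2021-01-18 00:00:00.000'
      and event_timestamp  <  '2021-01-25 00:00:00.000'
    """
)

selected = conn.execute(
    "select event_timestamp from new_table order by event_timestamp"
).fetchall()
print(selected)
```

Of the five sample rows, only the events of 2021-01-19 and 2021-01-23 fall inside both ranges.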
Good afternoon -
I have a table in Teradata that stores a rolling cumulative sum that resets every month. I would like to calculate the incremental gain between each day of the month. Is this something I can accomplish with OLAP functions, or should it be handled in a recursive CTE? I would love assistance thinking through this. Thanks!
example source
date        month      cum_value
2022-07-02  July 2022  25
2022-07-01  July 2022  5
2022-06-30  June 2022  100
2022-06-29  June 2022  70
2022-06-28  June 2022  65
2022-06-27  June 2022  50
example result
date        month      cum_value  incremental_value
2022-07-02  July 2022  25         20
2022-07-01  July 2022  5          5
2022-06-30  June 2022  100        30
2022-06-29  June 2022  70         5
2022-06-28  June 2022  65         15
2022-06-27  June 2022  50         ..
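A window (OLAP) function looks sufficient for this: subtract the previous day's cumulative value within the same month, and treat a missing previous row as 0 so the first day of each month keeps its own value. The sketch below runs that query against the sample rows in an in-memory SQLite database through Python's sqlite3 module. The table name source is hypothetical, and whether LAG is available depends on the Teradata release, so treat the SQL as an illustration of the idea rather than tested Teradata syntax.

```python
import sqlite3

# In-memory SQLite database used only to demonstrate the window function.
conn = sqlite3.connect(":memory:")
conn.execute('create table source ("date" text, "month" text, cum_value int)')
conn.executemany(
    "insert into source values (?, ?, ?)",
    [
        ("2022-07-02", "July 2022", 25),
        ("2022-07-01", "July 2022", 5),
        ("2022-06-30", "June 2022", 100),
        ("2022-06-29", "June 2022", 70),
        ("2022-06-28", "June 2022", 65),
        ("2022-06-27", "June 2022", 50),
    ],
)

# LAG fetches the previous row's cum_value inside the same month; COALESCE
# turns the missing predecessor of the first row in a partition into 0.
query = """
select "date", "month", cum_value,
       cum_value - coalesce(
           lag(cum_value) over (partition by "month" order by "date"), 0
       ) as incremental_value
from source
order by "date" desc
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With only these six rows, 2022-06-27 is the first June row, so its increment comes out as 50. With the full table it would be the difference from 2022-06-26 instead.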
I have a datetime column (data type timestamp without time zone) named time. I can best explain my issue with an example.
For example, I have the following data in this column (timestamps prettified for this example):
ID TIME
1 1 Mar 2022 - 1PM
2 1 Mar 2022 - 2PM
3 1 Mar 2022 - 1PM
4 1 Mar 2022 - 3PM
5 1 Mar 2022 - 2PM
6 2 Mar 2022 - 2PM
7 2 Mar 2022 - 1PM
8 2 Mar 2022 - 3PM
9 2 Mar 2022 - 1PM
10 1 Mar 2022 - 3PM
11 2 Mar 2022 - 2PM
12 2 Mar 2022 - 3PM
13 3 Mar 2022 - 4PM
14 3 Mar 2022 - 3PM
15 3 Mar 2022 - 3PM
16 3 Mar 2022 - 4PM
If I do ORDER BY time, I get the following result:
ID TIME
1 1 Mar 2022 - 1PM
3 1 Mar 2022 - 1PM
2 1 Mar 2022 - 2PM
5 1 Mar 2022 - 2PM
4 1 Mar 2022 - 3PM
10 1 Mar 2022 - 3PM
7 2 Mar 2022 - 1PM
9 2 Mar 2022 - 1PM
6 2 Mar 2022 - 2PM
11 2 Mar 2022 - 2PM
8 2 Mar 2022 - 3PM
12 2 Mar 2022 - 3PM
14 3 Mar 2022 - 3PM
15 3 Mar 2022 - 3PM
13 3 Mar 2022 - 4PM
16 3 Mar 2022 - 4PM
But I want the result in this way:
ID TIME
1 1 Mar 2022 - 1PM
2 1 Mar 2022 - 2PM
4 1 Mar 2022 - 3PM
13 3 Mar 2022 - 4PM
3 1 Mar 2022 - 1PM
5 1 Mar 2022 - 2PM
10 1 Mar 2022 - 3PM
16 3 Mar 2022 - 4PM
7 2 Mar 2022 - 1PM
6 2 Mar 2022 - 2PM
8 2 Mar 2022 - 3PM
9 2 Mar 2022 - 1PM
11 2 Mar 2022 - 2PM
12 2 Mar 2022 - 3PM
14 3 Mar 2022 - 3PM
13 3 Mar 2022 - 4PM
As you can see, the first 4 rows have unique timestamps, and the sequence should repeat based on time (1PM, 2PM, 3PM).
How can we do this in SQL? I'm using PostgreSQL as my DB and Rails for my backend.
EDIT:
Have added more context to example to explain my scenario.
One way is to use the ROW_NUMBER window function together with the REPLACE function:
SELECT time
FROM (
SELECT *,REPLACE(time,'PM','') val,
ROW_NUMBER() OVER(PARTITION BY REPLACE(time,'PM','')) rn
FROM T
) t1
ORDER BY rn,val
For example, to repeat the sequence of column a:
with tbl(a, othercol) as
(
SELECT 1,1 UNION ALL
SELECT 1,2 UNION ALL
SELECT 1,3 UNION ALL
SELECT 2,4 UNION ALL
SELECT 2,5 UNION ALL
SELECT 2,6 UNION ALL
SELECT 3,7 UNION ALL
SELECT 3,8 UNION ALL
SELECT 3,9
),
cte as (
SELECT *, row_number() over(partition by a order by a) rn
from tbl
)
select a, othercol
from cte
order by rn, a
The problem you have at hand is a direct result of not choosing the correct data type for the values you store.
To get the sorting correct, you need to convert the string to a proper time value. There is no to_time() function in Postgres, but you can convert it to a timestamp then cast it to a time:
order by to_timestamp("time", 'hham')::time
You should fix your database design and convert that column to a proper time type, which will also prevent storing invalid values ('3 in the afternoon' or '128foo') in that column.
As you can see, Date and Time columns are being saved in this CSV file. The problem is that the date and time are in a format like 30-1-2022 and 20:08:00.
But I want them to look like 30th Jan 22 and 8:08 PM.
Is there any code for that?
import requests
import pandas as pd
from datetime import datetime
from datetime import date
currentd = date.today()
s = requests.Session()
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://www.nseindia.com/'
step = s.get(url,headers=headers)
today = datetime.now().strftime('%d-%m-%Y')
api_url = f'https://www.nseindia.com/api/corporate-announcements?index=equities&from_date={today}&to_date={today}'
resp = s.get(api_url,headers=headers).json()
result = pd.DataFrame(resp)
result.drop(['difference', 'dt','exchdisstime','csvName','old_new','orgid','seq_id','sm_isin','bflag','symbol','sort_date'], axis = 1, inplace = True)
result.rename(columns = {'an_dt':'DateandTime', 'attchmntFile':'Source','attchmntText':'Topic','desc':'Type','smIndustry':'Sector','sm_name':'Company Name'}, inplace = True)
result[['Date','Time']] = result.DateandTime.str.split(expand=True)
result.drop(['DateandTime'], axis = 1, inplace = True)
result.to_csv( ( str(currentd.day) +'-'+str(currentd.month) +'-'+'CA.csv'), index=True)
print('Saved the CSV File')
Try creating a temporary column:
result['Full_date']=pd.to_datetime(result['Date']+' '+result['Time'])
Then format 'Date' and 'Time'
result['Date']=result['Full_date'].dt.strftime('%b %d, %Y')
result['Time']=result['Full_date'].dt.strftime('%I:%M %p')
Try this:
# Remove comment if needed
# import locale
# locale.setlocale(locale.LC_TIME, 'C')
# https://stackoverflow.com/a/16671271
def ordinal(n):
    # Numbers ending in 11, 12 and 13 take "th" instead of "st", "nd", "rd"
    return str(n)+("th" if 4<=n%100<=20 else {1:"st",2:"nd",3:"rd"}.get(n%10, "th"))
result['Date'] = pd.to_datetime(result['Date'], format='%d-%b-%Y')
result['Date'] = result['Date'].dt.day.map(ordinal) + result['Date'].dt.strftime(' %b %Y')
result['Time'] = pd.to_datetime(result['Time']).dt.strftime('%-H:%M %p')
# Now you can export
Output:
>>> result[['Date', 'Time']]
Date Time
0 30th Jan 2022 21:07 PM
1 30th Jan 2022 20:57 PM
2 30th Jan 2022 19:40 PM
3 30th Jan 2022 18:55 PM
4 30th Jan 2022 18:53 PM
5 30th Jan 2022 18:09 PM
6 30th Jan 2022 17:44 PM
7 30th Jan 2022 16:01 PM
8 30th Jan 2022 15:21 PM
9 30th Jan 2022 15:16 PM
10 30th Jan 2022 15:10 PM
11 30th Jan 2022 15:06 PM
12 30th Jan 2022 14:29 PM
13 30th Jan 2022 14:15 PM
14 30th Jan 2022 13:41 PM
15 30th Jan 2022 12:20 PM
16 30th Jan 2022 12:09 PM
17 30th Jan 2022 12:07 PM
18 30th Jan 2022 10:58 AM
19 30th Jan 2022 10:42 AM
20 30th Jan 2022 10:40 AM
21 30th Jan 2022 10:39 AM
22 30th Jan 2022 10:06 AM
23 30th Jan 2022 9:39 AM
24 30th Jan 2022 9:36 AM
25 30th Jan 2022 9:25 AM
26 30th Jan 2022 8:43 AM
27 30th Jan 2022 1:00 AM
28 30th Jan 2022 0:59 AM
29 30th Jan 2022 0:13 AM
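Note that %H is the 24-hour hour, so pairing it with %p produces strings such as 21:07 PM. A 12-hour variant uses %I instead. The sketch below is one way to reach the 30th Jan 22 and 8:08 PM format asked for in the question. It assumes the Date and Time columns hold strings like 30-1-2022 and 20:08:00 as described there; the sample rows, the stamp variable and the ordinal helper are hypothetical. It strips the leading zero with str.lstrip rather than relying on the platform-specific %-I directive.

```python
import pandas as pd


def ordinal(n):
    """Return the day number with its English suffix, e.g. 1 -> '1st'."""
    n = int(n)
    if 11 <= n % 100 <= 13:
        return f"{n}th"
    return f"{n}" + {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


# Hypothetical rows in the format described in the question.
result = pd.DataFrame(
    {
        "Date": ["30-1-2022", "1-2-2022", "22-2-2022"],
        "Time": ["20:08:00", "00:13:00", "12:20:00"],
    }
)

# Parse date and time together so both output columns come from one timestamp.
stamp = pd.to_datetime(
    result["Date"] + " " + result["Time"], format="%d-%m-%Y %H:%M:%S"
)

# Day with suffix, abbreviated month, two-digit year.
result["Date"] = stamp.dt.day.map(ordinal) + stamp.dt.strftime(" %b %y")

# %I is the zero-padded 12-hour hour; lstrip("0") turns "08:08 PM" into "8:08 PM".
result["Time"] = stamp.dt.strftime("%I:%M %p").str.lstrip("0")

print(result)
```

The month abbreviation produced by %b depends on the active locale, which is why the answer above mentions setting LC_TIME.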