Cumulative product of returns per company over time (BHARs) - pandas

I have the following DataFrame, organized as panel data. It contains daily returns of many companies on different days following the IPO date. day_diff is the number of days that have passed since the IPO, and return_1 is that company's individual return for that day, to which I have already added +1. Each company has its own company_tic and there are about 300 companies. My goal is to compute, for each company_tic and each day_diff, the buy-and-hold component of the BHAR formula, i.e. the cumulative product of return_1 always starting at day 0: from day 0 to day 1, then from day 0 to day 2, from day 0 to day 3, and so on until the last day of data, which is day 730. I have tried df.groupby(['company_tic', 'day_diff'])['return_1'].expanding().prod(), but it doesn't work. Any alternatives?
Index  day_diff  company_tic  return_1
0      0         xyz          1.8914
1      1         xyz          1.0542
2      2         xyz          1.0016
3      0         abc          1.4398
4      1         abc          1.1023
5      2         abc          1.0233
...    ...       ...          ...

[159236 rows x 3 columns]

Not sure I fully get what you want, but you might want to use cumprod instead of expanding().prod().
Here's what I tried:
df['return_1_prod'] = df.groupby('company_tic')['return_1'].cumprod()
Output:
   day_diff company_tic  return_1  return_1_prod
0         0         xyz    1.8914       1.891400
1         1         xyz    1.0542       1.993914
2         2         xyz    1.0016       1.997104
3         0         abc    1.4398       1.439800
4         1         abc    1.1023       1.587092
5         2         abc    1.0233       1.624071
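As a self-contained sketch (the frame construction is an assumption based on the sample above; note that cumprod multiplies in row order, so rows must be sorted by day_diff within each company):

import pandas as pd

# Rebuild the sample panel: one row per company per day since the IPO,
# with return_1 already holding (1 + daily return)
df = pd.DataFrame({
    'day_diff':    [0, 1, 2, 0, 1, 2],
    'company_tic': ['xyz', 'xyz', 'xyz', 'abc', 'abc', 'abc'],
    'return_1':    [1.8914, 1.0542, 1.0016, 1.4398, 1.1023, 1.0233],
})

# cumprod runs in row order, so sort by day within each company first
df = df.sort_values(['company_tic', 'day_diff'])

# Running product of (1 + return) from day 0 up to each day_diff, per company
df['return_1_prod'] = df.groupby('company_tic')['return_1'].cumprod()
print(df)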


How to produce monthly count when given a date range in pandas?

I have a dataframe that records users, a label, and the start and end dates between which each user carried that label,
e.g.
user  label  start_date  end_date
1     x      2018-01-01  2018-10-01
2     x      2019-05-10  2020-01-01
3     y      2019-04-01  2022-04-20
1     b      2018-10-01  2020-05-08
etc.
where each row is for a given user and a label; a user can appear multiple times with different labels.
I want to get a count of users for every month for each label, such as this:
date     count_label_x  count_label_y  count_label_b  count_label_
2018-01  10             0              20             5
2018-02  2              5              15             3
2018-03  20             6              8              3
etc.
where, for instance, for the first row of the previous table, that user should be counted once for every month between their start and end date. The problem boils down to this, and since I only have a few labels I can filter them one by one and produce one output per label. But how do I check and count users given an interval?
Thanks
You can use date_range combined with to_period to generate the active months, then pivot_table with aggfunc='nunique' to count the unique users (if you want to count duplicated users, use aggfunc='count'):
out = (df
   # one list of active months per (user, label) row
   .assign(period=[pd.date_range(a, b, freq='M').to_period('M')
                   for a, b in zip(df['start_date'], df['end_date'])])
   # one row per (user, label, month)
   .explode('period')
   # months as rows, labels as columns, unique users as values
   .pivot_table(index='period', columns='label', values='user',
                aggfunc='nunique', fill_value=0)
)
Output:
label b x y
period
2018-01 0 1 0
2018-02 0 1 0
2018-03 0 1 0
2018-04 0 1 0
2018-05 0 1 0
...
2021-12 0 0 1
2022-01 0 0 1
2022-02 0 0 1
2022-03 0 0 1
Handling NaT:
If the start and end dates lie so close together that no month end falls between them, date_range returns an empty range and explode leaves NaT; if you still want to count those rows, fill the period from start_date:
out = (df
.assign(period=[pd.date_range(a, b, freq='M').to_period('M')
for a,b in zip(df['start_date'], df['end_date'])])
.explode('period')
.assign(period=lambda d: d['period'].fillna(d['start_date'].dt.to_period('M')))
.pivot_table(index='period', columns='label', values='user',
aggfunc='nunique', fill_value=0)
)
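A minimal reproduction sketch using the sample rows above; the one assumption is that start_date and end_date are parsed as datetimes, which date_range and .dt.to_period require:

import pandas as pd

# Sample input from the question; the date columns must be datetimes
df = pd.DataFrame({
    'user':  [1, 2, 3, 1],
    'label': ['x', 'x', 'y', 'b'],
    'start_date': pd.to_datetime(['2018-01-01', '2019-05-10',
                                  '2019-04-01', '2018-10-01']),
    'end_date':   pd.to_datetime(['2018-10-01', '2020-01-01',
                                  '2022-04-20', '2020-05-08']),
})

out = (df
   .assign(period=[pd.date_range(a, b, freq='M').to_period('M')
                   for a, b in zip(df['start_date'], df['end_date'])])
   .explode('period')
   .pivot_table(index='period', columns='label', values='user',
                aggfunc='nunique', fill_value=0)
)
print(out)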

Is it possible to set a dynamic window frame bound in a SQL OVER (ROWS BETWEEN ...) clause?

Consider the following table, describing a patient's medication plan. For example, the first row describes that the patient with patient_id = 1 is treated from timestamp 0 to 4. At time = 0, the patient has not yet received any medication (kum_amount_start = 0). At time = 4, the patient has received a cumulated amount of 100 units of a certain drug. It can be assumed that the drug is given at a constant rate; for the first row, this means a rate of 25 units/h.
patient_id  starttime [h]  endtime [h]  kum_amount_start  kum_amount_end
1           0              4            0                 100
1           4              5            100               300
1           5              15           300               550
1           15             18           550               700
2           0              3            0                 150
2           3              6            150               350
2           6              10           350               700
2           10             15           700               1100
2           15             19           1100              1500
I want to add two columns, "kum_amount_start_last_6hr" and "kum_amount_end_last_6hr", that describe the amount that has been given within the last 6 hours of the treatment (at the respective start and end timestamps).
I've been stuck on this problem for a while now.
I tried to tackle it with something like this
SUM(kum_amount) OVER (PARTITION BY patient_id ROWS BETWEEN "dynamic window size" AND CURRENT ROW)
but I'm not sure whether this is the right approach.
I would be very happy if you could help me out here, thanks!
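One possible direction, sketched under assumptions: a ROWS frame counts rows, while the 6-hour bound is a condition on the time axis, which a RANGE frame expresses directly. The sketch below assumes PostgreSQL 11+ (which supports RANGE with a numeric offset) and a table named medication_plan; note that intervals only partially inside the 6-hour window are counted in full, so exact proration of the constant rate would need additional logic:

-- Amount given per interval, summed over a trailing 6-hour RANGE window
SELECT patient_id,
       starttime,
       endtime,
       SUM(kum_amount_end - kum_amount_start) OVER (
           PARTITION BY patient_id
           ORDER BY endtime
           RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS amount_last_6hr
FROM medication_plan;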

Customers who bought and did not buy some product in the last 90 days

I need a DAX measure which shows me which customers bought products B and C in the last 90 days, and another one which shows me those who did not buy products B and C in the last 90 days (based on my filter date context).
Can someone help me?
Here is some sample data if needed:
FactSales
KeyDate KeyCustomer KeyProduct Total
1 1 1 12,9
1 2 2 13
1 3 1 156,4
1 4 1 564,8
2 1 1 894,8
2 2 1 56,5
3 1 2 564,85
3 2 3 564,8
4 1 1 1325,6
4 2 1 132,3
Customer
KeyCustomer Name
1 Jean
2 Mari
3 Lisa
4 Julian
5 Jhonny
Calendar
KeyDate Date
1 01/01/2018
2 02/01/2018
3 01/05/2018
4 01/08/2018
Product
KeyProduct Product
1 A
2 B
3 C
Try something along these lines:
IfBought =
IF (
    COUNTROWS (
        FILTER (
            FactSales,
            RELATED ( 'Product'[Product] ) IN { "B", "C" }
                && RELATED ( 'Calendar'[Date] ) > TODAY () - 90
        )
    ) > 0,
    1,
    0
)
Note that May 1st is more than 90 days ago as of today, though, so you won't get the result you asked for unless you change 90 to 114 or greater.
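For the second measure (customers who did not buy B or C in the window), a minimal sketch that simply inverts the branches of the same test; the name IfNotBought is an assumption:

IfNotBought =
IF (
    COUNTROWS (
        FILTER (
            FactSales,
            RELATED ( 'Product'[Product] ) IN { "B", "C" }
                && RELATED ( 'Calendar'[Date] ) > TODAY () - 90
        )
    ) > 0,
    0,    // bought B or C in the window
    1     // no qualifying purchase
)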

Get data for a given number of days by converting rows to columns dynamically

This is a follow-up to my previous question: Get records for last 10 dates
I have to generate reports for all books of a store along with sold count (if any) for the last N dates, by passing storeId.
BOOK                   BOOK SOLD                       STORE
------------           --------------------            ----------------
Id  Name  SID          Id  Bid  Count  Date            SID  Name
1   ABC   1            1   1    20     11/12/2015      1    MNA
2   DEF   1            2   1    30     12/12/2015      2    KLK
3   DF2   2            3   2    20     11/12/2015      3    KJH
4   DF3   3            4   3    10     13/12/2015
5   GHB   3            5   4    5      14/12/2015
The number of dates N is supplied by the user. This is the expected output for the last 4 dates for storeIds 1, 2 & 3.
BookName 11/12/2015 12/12/2015 13/12/2015 14/12/2015
ABC 20 30 -- --
DEF 20 -- -- --
DF2 -- -- 10 --
DF3 -- -- -- 5
GHB -- -- -- --
If the user passes 5, then data for the last 5 dates shall be generated, with 14/12/2015 as the most recent date.
I am using Postgres 9.3.
Cross table without crosstab function:
SELECT
    book.Name,
    SUM(CASE WHEN booksold.Date = '11/12/2015' THEN booksold.Count ELSE 0 END) AS "11/12/2015",
    SUM(CASE WHEN booksold.Date = '12/12/2015' THEN booksold.Count ELSE 0 END) AS "12/12/2015",
    SUM(CASE WHEN booksold.Date = '13/12/2015' THEN booksold.Count ELSE 0 END) AS "13/12/2015",
    SUM(CASE WHEN booksold.Date = '14/12/2015' THEN booksold.Count ELSE 0 END) AS "14/12/2015"
FROM
    book
    LEFT JOIN booksold ON booksold.Bid = book.Id
WHERE
    book.SID IN (1, 2, 3)
GROUP BY
    book.Name
ORDER BY
    book.Name ASC;
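Because the column list above is hard-coded, a user-supplied N means the SQL text itself has to be generated. A minimal sketch under the same (assumed) table names: it builds the CASE columns for the last N dates with format() and string_agg, both available in Postgres 9.3, and the application then executes the resulting string:

-- Generate the pivot SQL for the last N dates (N = 4, anchored at 14/12/2015);
-- run the produced text in a second step. Table and column names are assumed.
SELECT 'SELECT book.Name, '
       || string_agg(
              format('SUM(CASE WHEN booksold.Date = %L THEN booksold.Count ELSE 0 END) AS %I',
                     d::date, to_char(d, 'DD/MM/YYYY')),
              ', ' ORDER BY d)
       || ' FROM book LEFT JOIN booksold ON booksold.Bid = book.Id'
       || ' GROUP BY book.Name ORDER BY book.Name' AS pivot_sql
FROM generate_series(date '2015-12-14' - 3, date '2015-12-14', interval '1 day') AS g(d);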

How to get a moving average in SQL

If I have data from week 1 to week 52 and I want a 4-week moving average that advances 1 week at a time, how can I write a SQL query for this? For example, for week 5 I want the week1-week4 average, for week 6 the week2-week5 average, and so on.
I have the columns week and target_value in table A.
Sample data is like this:
Week target_value
1 20
2 10
3 10
4 20
5 60
6 20
So the output I want will start from week 5, since only weeks 1-4 are available before that.
Output data will look like:
Week  Output
5     15     (20+10+10+20)/4 = 15, moving average of weeks 1-4
6     25     (10+10+20+60)/4 = 25, moving average of weeks 2-5
The data is in Hive, but I can move it to Oracle if it is simpler to do this there.
SELECT
    A.Week,
    (SELECT ISNULL(AVG(B.target_value), A.target_value)
     FROM tblA B
     WHERE B.Week < A.Week
       AND B.Week >= A.Week - 4
    ) AS Moving_Average
FROM tblA A
The ISNULL keeps you from getting a null for your first week, since there is no week 0. If you want it to be null, then just leave the ISNULL function out. (In Hive or Oracle, COALESCE is the equivalent.)
If you want it to start at week 5 only, then add the following line to the end of the SQL that I wrote:
WHERE A.Week > 4
Results:
Week Moving_Average
1 20
2 20
3 15
4 13
5 15
6 25
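Since the data is in Hive, which supports window functions, the same trailing average can also be written without a correlated subquery. A sketch against the tblA table from the answer above (here weeks 1-4 come out as NULL instead of echoing target_value):

-- Trailing average of the 4 preceding weeks via an explicit window frame
SELECT
    Week,
    AVG(target_value) OVER (
        ORDER BY Week
        ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
    ) AS Moving_Average
FROM tblA;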