Calculating the rolling exponential weighted moving average for each share price over time - pandas

This question is similar to my previous one: Shifting elements of column based on index given condition on another column
I have a dataframe (df) with 2 columns and a datetime index.
The index is in the format 2001-01-30, is ordered by date, and contains thousands of identical dates (the dates are monthly). Column A is the company name, and Column B is that company's share price for the date in the index.
There are multiple companies in Column A for each date, and the set of companies varies over time (so the data is not fully predictable).
I want to create a Column C holding the 3-period rolling exponentially weighted moving average of the price for each company in Column A, using the current date and the 2 dates before it for that company.
I have tried a few methods but have failed. Thanks.

Try:
df.groupby('ColumnA', as_index=False).apply(lambda g: g['ColumnB'].ewm(span=3).mean())
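A fuller sketch of the same idea, assuming the company column is literally named 'ColumnA', the price column 'ColumnB', and the datetime index is what orders the rows (adjust the names to your data). span=3 matches "the current date and the two before it" in the ewm sense; note that ewm weights all earlier rows with exponentially decaying weights rather than hard-limiting the window to three rows:

import pandas as pd

# assumed column names; substitute your real company / price column names
df = df.sort_index()  # make sure each company's prices are in date order

# span=3 sets the decay so the most recent three observations carry most of the weight
df['ColumnC'] = (
    df.groupby('ColumnA')['ColumnB']
      .transform(lambda s: s.ewm(span=3).mean())
)

transform keeps the result aligned with the original rows, so it can be assigned straight back as a new column.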

Microsoft Access- How to create dynamic variables that queries a selection of Columns

Example Data Picture
My main data table is constructed in the following way:
1. State
2. Product
3. Account Name
4. Jan-20
5. Feb-20
...
N. Recent Month - Recent Year
My goal is to get 6 total sums based on 6 different contiguous column ranges that are user-selected. For example, if someone wanted the value of an Account given a State and Product for FY-2020, they would sum column 4 through column 15 (twelve months).
I am going to be running joins and queries off of the State, Product, Account combinations (first three columns), but I need to create a method to sum the Data table given a list of Column Numbers.
At this point, I am not looking to put non-contiguous columns in a selected Time Period (i.e. all time period selections will be from Col.Beg_TPn to and including Col.End_TPn). The Data Table houses monthly data that will have one new column every month. The Column Number should stay consistent as we are not looking back further than FY-2020.
This is a much easier problem in Excel, as you can do a simple SUMIFS with an INDEX of the column range as the sum range and then filter on columns 1, 2, and 3. However, my data table is about 30,000 rows, so Excel freezes on me when I run calculations and functions on the entire data set.
What is the best way to go about this in Microsoft Access? Ideally, I would like to create a CTE_TimePeriodTotals that houses the first 3 columns (State, Product, Account) and then 6 TP columns (tp-1, tp-2, ...) that hold the sum of each time period for each row, based on the time period column starts and column ends.

Designing a database to house 1 dimensional data

I have a dataset of trading dates for different exchanges in a CSV file. What is the best way to store them in a database? The dataset looks like this...
US,DE,JP
20040102,20040102,20040102
20040105,20040105,20040105
20040106,20040106,20040106
...
20210608,20210524,20210715
20210609,20210525,20210716
...
Essentially, each country has a column of the dates its stock exchange is open. Once I have this table created, I want to query it to return the number of trading days between two dates on a specific exchange, e.g. the number of trading days between 5/29/2020 and 6/19/2020 in the 'US' is 15.
What's the most efficient way to build the table, and what query would I use to get the number of trading days between two dates?
Thanks
Use one table with two columns. One column for the country (code) and the other for the day. Something like:
CREATE TABLE trading_days (
    country varchar(2),
    day date,
    UNIQUE (country, day)
);
(Data types might vary depending on what DBMS you use, you weren't specific on that.)
A query to get the number of trading days for a country in a certain period could then look something like:
SELECT count(*)
FROM trading_days
WHERE day >= <lower bound of period>
AND day < <upper bound of period plus one day>
AND country = <country code>;

Trying to create a well count to compare to BOE using the on production date and comparing it to Capital spends and total BOE

I have data that includes the below columns:
Date
Total Capital
Total BOED
On Production Date
UWI
I'm trying to create a well count based on the unique UWI for each On Production Date and graph it against the Total BOED/Total Capital with Date as the x-axis.
I've tried unique count by UWI but it then populates ALL rows of that UWI with the same well count total, so when it is summed the numbers are multiplied by the row count.
I plot the x-axis as Date and the y-axis with Total BOED and Well Count.
Add a calculated column to create a row id using the rowid() function. Then, in the calculation you already have, the one that populates all rows of the UWI with the same well count, add the following logic...
if([rowid] = min([rowid]) over [UWI], uniquecount([UWI]) over [Production Date], null)
This will make it so that the count only populates once.

Pandas groupby out of memory

I am adding a column to a dataframe that calculates the number of days between each date and the previous date for each customer, using the following statement, but I end up running out of memory:
lapsed['Days'] = lapsed[['Customer Number', 'GL Date']].groupby('Customer Number').diff()
The dataframe contains more than 1 million records.
Customer Number is an int64, and I was thinking of running the above statement within ranges of customer numbers, but I do not know if this is the best approach.
Any suggestion?
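One lower-memory sketch, assuming 'GL Date' is already a datetime64 column and the column names mirror the question (adjust as needed): sort once by customer and date, diff the whole date column in a single vectorised pass, and blank out the first row of each customer, which avoids the two-column intermediate copy and the per-group overhead of groupby:

import pandas as pd

# assumed: 'GL Date' is datetime64; column names mirror the question
lapsed = lapsed.sort_values(['Customer Number', 'GL Date'])

# diff over the whole column in one pass
days = lapsed['GL Date'].diff()

# the first row of each customer has no previous date, so blank it out
new_customer = lapsed['Customer Number'].ne(lapsed['Customer Number'].shift())
days[new_customer] = pd.NaT

lapsed['Days'] = days.dt.days  # Timedelta -> whole days

Whether this actually saves memory over groupby().diff() depends on your data, but it keeps everything in flat, column-wise operations.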

how to group rows based on a dates in a single column

I have a range of data (A1:CY7026) with a column for start dates. This column has a large number of repeated dates within it. I need these dates grouped together based on the working week they fall in (e.g. all values from 16/07/18 to 22/07/18 would be one group, and the following week would make up another).
Use the formula below in cell B2 (assuming you have a header row) and drag it down to the rest. Sort to obtain the desired result.
=CONCATENATE(YEAR(A2),"-",WEEKNUM(A2))