dataframe fill in next row with previous row value - pandas

AMA(1) = close(1)
AMA(t) = AMA(t-1) + alpha(t) * (close(t) - AMA(t-1))
id date close alpha AMA
1 2016-01-04 343 1 343
2 2016-01-05 335 2 327 (343+2*(335-343))
3 2016-01-06 337 3 357 (327+3*(337-327))
4 2016-01-07 338 -1 376 (357-1*(338-357))
I want to calculate the AMA column of a dataframe. The rule is:
if id == 1, AMA(1) = close(1)
if id > 1, AMA(t) = AMA(t-1) + alpha(t) * (close(t) - AMA(t-1))
I used a for loop, but it's very slow when I'm dealing with large data.
Does anyone know how to do this without a for loop? Many thanks!
for i in range(len(df)):
    if i == 0:
        # first row: AMA equals close
        df.loc[i, 'AMA'] = df.loc[i, 'close']
    else:
        # recursive update: AMA(t) = AMA(t-1) + alpha(t) * (close(t) - AMA(t-1))
        df.loc[i, 'AMA'] = (df.loc[i - 1, 'AMA']
                            + df.loc[i, 'alpha'] * (df.loc[i, 'close'] - df.loc[i - 1, 'AMA']))
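Because each AMA value depends on the previous one, this recursion cannot be expressed as a single vectorized pandas operation when alpha varies by row. One common workaround is to compile the loop with numba; a minimal sketch, assuming numba is installed:
import numpy as np
from numba import njit

@njit
def compute_ama(close, alpha):
    # out[0] = close[0]; out[t] = out[t-1] + alpha[t] * (close[t] - out[t-1])
    out = np.empty_like(close)
    out[0] = close[0]
    for t in range(1, close.shape[0]):
        out[t] = out[t - 1] + alpha[t] * (close[t] - out[t - 1])
    return out

df['AMA'] = compute_ama(df['close'].to_numpy(dtype=np.float64),
                        df['alpha'].to_numpy(dtype=np.float64))
The Python-level loop still exists, but numba compiles it to machine code, which is typically far faster than row-by-row .loc assignment.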

Related

Pandas adding row to categorical index

I have a scenario where I would like to group my dataset by personally defined week indexes, average the groups, and then aggregate the averages into a "Total" row. I am able to achieve the first half of my scenario, but when I try to append/insert a new "Total" row that sums these rows I receive error messages.
I attempted to create this row via two different methods:
Method 1:
week_index_avg_unit.loc['Total'] = week_index_avg_unit.sum()
TypeError: cannot append a non-category item to a CategoricalIndex
Method 2:
week_index_avg_unit.index.insert(['Total'], week_index_avg_unit.sum())
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I have used the first approach in this scenario multiple times, but this is the first time where I'm cutting the data into multiple categories and clearly see where the CategoricalIndex type is the problem.
Here is the format of my data:
date organic ppc oa other content_partnership total \
0 2018-01-01 379 251 197 51 0 878
1 2018-01-02 880 527 405 217 0 2029
2 2018-01-03 859 589 403 323 0 2174
3 2018-01-04 835 533 409 335 0 2112
4 2018-01-05 760 449 355 272 0 1836
year_month day weekday weekday_name week_index
0 2018-01 1 0 Monday Week 1
1 2018-01 2 1 Tuesday Week 1
2 2018-01 3 2 Wednesday Week 1
3 2018-01 4 3 Thursday Week 1
4 2018-01 5 4 Friday Week 1
Here is the code:
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
historicals = pd.read_csv("2018-2019_plants.csv")
# Capture dates for additional date columns
date_col = pd.to_datetime(historicals['date'])
historicals['year_month'] = date_col.dt.strftime("%Y-%m")
historicals['day'] = date_col.dt.day
historicals['weekday'] = date_col.dt.dayofweek
historicals['weekday_name'] = date_col.dt.day_name()
# create week ranges segment (7 day range)
historicals['week_index'] = pd.cut(historicals['day'],[0,7,14,21,28,32], labels=['Week 1','Week 2','Week 3','Week 4','Week 5'])
# Week Index Average (Units)
week_index_avg_unit = historicals[df_monthly_average].groupby(['week_index']).mean().astype(int)
type(week_index_avg_unit.index)
pandas.core.indexes.category.CategoricalIndex
Here is the week_index_avg_unit table:
organic ppc oa other content_partnership total day weekday
week_index
Week 1 755 361 505 405 22 2027 4 3
Week 2 787 360 473 337 19 1959 11 3
Week 3 781 382 490 352 18 2006 18 3
...
pd.CategoricalIndex is a special animal. It is immutable, so to do the trick you may need to use something like pd.CategoricalIndex.set_categories to add a new category.
See pandas docs: https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.CategoricalIndex.html
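A minimal sketch of that approach, using add_categories (a closely related method) to register the new label before assigning the row:
# CategoricalIndex is immutable, so this builds a new index that
# includes 'Total' as a valid category
week_index_avg_unit.index = week_index_avg_unit.index.add_categories(['Total'])
# now the .loc assignment from Method 1 works
week_index_avg_unit.loc['Total'] = week_index_avg_unit.sum()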

How to remove unwanted values in data when reading csv file

Reading Pina_Indian_Diabities.csv, some of the values are strings, something like this:
+AC0-5.4128147485
734 2
735 4
736 0
737 8
738 +AC0-5.4128147485
739 1
740 NaN
741 3
742 1
743 9
744 13
745 12
746 1
747 1
Like in row 738, there are such values in other rows and columns as well.
How can I drop them?
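One way is to coerce every value to numeric and drop whatever fails to parse. A minimal sketch, assuming all columns are meant to be numeric (the +AC0- prefix looks like a UTF-7 encoding of a minus sign, so re-reading the file with encoding='utf_7' may also fix it at the source):
import pandas as pd

df = pd.read_csv("Pina_Indian_Diabities.csv")

# non-numeric strings such as "+AC0-5.4128147485" become NaN
df = df.apply(pd.to_numeric, errors="coerce")

# drop rows containing any NaN (including pre-existing ones)
df = df.dropna().reset_index(drop=True)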

Adding the most recent data to a pandas data frame

I am trying to build, and keep up to date, a data frame/time series where I scrape the data from a website table, and I want to take the most recent data and add it to the data I've already got. A sample of what the data frame looks like is:
Date Price
0 10/01/19 100
1 09/01/19 95
2 08/01/19 96
3 07/01/19 97
What I then want to do is run my little program and have it identify that I am missing data for the 11th and 12th of Jan, and then add it to the top of the data frame. I am quite comfortable with compiling a data frame using .read_html and generally building a data frame, but this is a bit beyond my talents currently.
I know the done thing is usually to show what I have attempted so far, but to be honest I don't know where to begin with this one.
Many thanks
Let's say the old dataframe is df, which looks like:
Date Price
0 2019-01-10 100
1 2019-01-09 95
2 2019-01-08 96
3 2019-01-07 97
After 2 days you download data which gives you 2 rows, for 2019-01-11 and 2019-01-12; let's name it new_df (values are just examples):
Date Price
0 2019-01-12 67
1 2019-01-11 89
2 2019-01-10 100
3 2019-01-09 95
Note: there are a few values in the new df which are present in the old df.
Using df.append(), df.drop_duplicates() and df.sort_values():
>>> df.append(new_df, ignore_index=True).drop_duplicates().sort_values(by='Date', ascending=False)
Date Price
4 2019-01-12 67
5 2019-01-11 89
0 2019-01-10 100
1 2019-01-09 95
2 2019-01-08 96
3 2019-01-07 97
This will append the new values and sort them in descending order based on the Date column, keeping the latest date at the top.
If you also want the index renumbered sequentially after sorting, add .reset_index(drop=True) at the end: df.append(new_df, ignore_index=True).drop_duplicates().sort_values(by='Date', ascending=False).reset_index(drop=True)
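Note that DataFrame.append was deprecated and then removed in pandas 2.0; on current versions the same result can be built with pd.concat. A sketch:
import pandas as pd

# pd.concat replaces the removed DataFrame.append
combined = (
    pd.concat([df, new_df], ignore_index=True)
      .drop_duplicates()
      .sort_values(by='Date', ascending=False)
      .reset_index(drop=True)
)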

How to group by a set of numbers in a column

I have a table as below. I want to do a group by in such a way that weeknums 1-4 are joined together and weeknums 5-8 are joined together. In other words, I want to get the monthly total from the fields below.
table1
weeknum amount
1 1000
2 1100
3 1200
4 1300
5 1400
6 1500
7 1600
8 1700
The output i need is as below
output
max(weeknum) sum(amount)
4 4600
8 6200
The answer below did not work exactly for my actual values, shown here. I want the grouping to start from the first weeknum, in blocks of 4 weeks. The formula (weeknum-1)/4 returns 3 groups, while the expected output has only 2:
weeknum Group Expr Expected Group Expr
1855 463 463
1856 463 463
1857 464 463
1858 464 463
1859 464 464
1860 464 464
1861 465 464
1862 465 464
The query needs to run in Oracle.
Try using FLOOR, which rounds the number down, in the GROUP BY clause:
SELECT MAX(t.weeknum),sum(amount)
FROM table1 t
GROUP BY FLOOR((t.weeknum-1)/4)
This will make sure every 4 weeks are treated as a group:
(1-1)/4 -> 0
(2-1)/4 -> 0
...
(5-1)/4 -> 1
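For the follow-up case where the weeknums start at 1855 rather than 1, one way to keep the 4-week blocks anchored to the first weeknum is to subtract the minimum value, computed here with an analytic MIN. A sketch, not verified against the full data:
SELECT MAX(weeknum) AS max_weeknum,
       SUM(amount)  AS sum_amount
FROM (
  SELECT weeknum,
         amount,
         MIN(weeknum) OVER () AS min_week  -- earliest weeknum in the table
  FROM table1
)
GROUP BY FLOOR((weeknum - min_week) / 4)
With weeknums 1855-1862 this puts 1855-1858 in group 0 and 1859-1862 in group 1, matching the two expected groups.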

SQL Query: How to pull counts of two columns from respective tables

Given two tables:
1st Table Name: FACETS_Business_NPI_Provider
Buss_ID NPI Bussiness_Desc
11 222 Eleven 222
12 223 Twelve 223
13 224 Thirteen 224
14 225 Fourteen 225
11 226 Eleven 226
12 227 Twelve 227
12 228 Twelve 228
2nd Table: FACETS_PROVIDERs_Practitioners
NPI PRAC_NO PROV_NAME PRAC_NAME
222 943 P222 PR943
222 942 P222 PR942
223 931 P223 PR931
224 932 P224 PR932
224 933 P224 PR933
226 950 P226 PR950
227 951 P227 PR951
228 952 P228 PR952
228 953 P228 PR953
With the query below I'm getting the following results, whereas it is expected to have the provider counts from table FACETS_Business_NPI_Provider (i.e. 3 instead of 4 for Buss_ID 12 and 2 instead of 3 for Buss_ID 11, etc.):
SELECT BP.Buss_ID,
       COUNT(BP.NPI) PROVIDER_COUNT,
       COUNT(PP.PRAC_NO) PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP
  ON PP.NPI = BP.NPI
GROUP BY BP.Buss_ID
Buss_ID PROVIDER_COUNT PRACTITIONER_COUNT
11 3 3
12 4 4
13 2 2
14 1 0
If I understood it correctly, you might want to add DISTINCT inside the COUNT calls.
Here is an SQL Fiddle, which we can probably use to discuss further.
http://sqlfiddle.com/#!2/d9a0e6/3
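Concretely, a sketch of the DISTINCT version (it should return 2 providers for Buss_ID 11 and 3 for Buss_ID 12, matching the expected counts):
SELECT BP.Buss_ID,
       COUNT(DISTINCT BP.NPI)     AS PROVIDER_COUNT,
       COUNT(DISTINCT PP.PRAC_NO) AS PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP
  ON PP.NPI = BP.NPI
GROUP BY BP.Buss_ID
The DISTINCT inside COUNT collapses the row multiplication introduced by the join, so each NPI and each PRAC_NO is counted once per Buss_ID.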