Smoothed Average over rows and columns with pandas

I am trying to create a function that averages over both row and column. For example:
State  1943 1944 1945 1946 1947 1947_AVG 1948 1948_AVG
Alaska 1    2    3    4    5    2        6    3
CA     234  234  234  6677 34
I want code that gives me an average for 1947 using 1943, 1944, and 1945, one for 1948 using 1944, 1945, and 1946, and so on.
I currently have:
d3['pandas_SMA_Year'] = d3.iloc[:,1].rolling(window=3).mean()
But this is simply working over the rows, not the columns, and it doesn't take into account the fact that I'm looking 2 years back. Please and thank you for any guidance!
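No answer appears in this thread. One hedged approach is to transpose the frame so the years become rows, apply the same `rolling(window=3).mean()` the question already uses, and then `shift(2)` the result so each year only sees the three years ending two years earlier (1947 from 1943 to 1945). This sketch assumes the year columns are string labels that are consecutive and sorted; `lagged_column_average` is a hypothetical helper name.

```python
import pandas as pd

# Toy data based on the Alaska row in the question; year columns are
# assumed to be consecutive and in ascending order.
df = pd.DataFrame(
    {"1943": [1], "1944": [2], "1945": [3], "1946": [4], "1947": [5], "1948": [6]},
    index=pd.Index(["Alaska"], name="State"),
)

def lagged_column_average(frame, window=3, lag=2):
    """For each year column Y, average the `window` columns that end `lag`
    columns before Y (e.g. 1947 -> mean of 1943, 1944, 1945)."""
    # Transpose so years run down the rows, where rolling() works by default.
    rolled = frame.T.rolling(window=window).mean()
    # shift(lag) moves each window result `lag` years later, so the two
    # most recent years before the target year are skipped.
    return rolled.shift(lag).T

avgs = lagged_column_average(df)
result = df.join(avgs.add_suffix("_AVG"))
print(result[["1947", "1947_AVG", "1948", "1948_AVG"]])
```

Years without three full prior years (1943 to 1946 here) come out as NaN, which is the expected behavior of an incomplete rolling window.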


How can I group and get an MS Access query to show only rows with a maximum value in a specified field for a consecutive number of times?

I have a large Access table that I need to pull specific data from with a query.
I need to get a list of all the IDs that meet a specific criterion, i.e. 3 months in a row with a cage number less than 50.
The SQL code I'm currently working with is below, but it only gives me which months of the past 3 had a cage number below 50.
SELECT [AbBehWeeklyMonitor Database].AnimalID, [AbBehWeeklyMonitor Database].Date, [AbBehWeeklyMonitor Database].Cage
FROM [AbBehWeeklyMonitor Database]
WHERE ((([AbBehWeeklyMonitor Database].Date)>=DateAdd("m",-3,Date())) AND (([AbBehWeeklyMonitor Database].Cage)<50))
ORDER BY [AbBehWeeklyMonitor Database].AnimalID DESC;
I would need it to look at the past 3 months for each ID, and only output if all 3 met the specific criteria, but I'm not sure where to go from here.
Any help would be appreciated.
Data Sample:
Date        AnimalID  Cage
6/28/2022   12345     50
5/19/2021   12345     32
3/20/2008   12345     75
5/20/2022   23569     4
8/20/2022   23569     4
5/20/2022   44444     71
8/1/2012    44444     4
4/1/2022    78986     30
1/20/2022   78986     1
9/14/2022   65659     59
8/10/2022   65659     48
7/14/2022   65659     30
6/14/2022   95659     12
8/14/2022   91111     51
7/14/2022   91111     5
6/14/2022   91111     90
8/14/2022   88888     4
7/14/2022   88888     5
6/14/2022   88888     15
Consider:
Query1:
SELECT AnimalID, Count(*) AS Cnt
FROM Table1
WHERE (((Cage)<50) AND (([Date]) Between #6/1/2022# And #8/31/2022#))
GROUP BY AnimalID
HAVING (((Count(*))=3));
Query2:
SELECT Table1.*
FROM Query1 INNER JOIN Table1 ON Query1.AnimalID = Table1.AnimalID
WHERE ((([Date]) Between #6/1/2022# And #8/31/2022#));
Output:
Date AnimalID Cage
6/14/2022 65659 12
7/14/2022 65659 30
8/10/2022 65659 48
6/14/2022 88888 15
7/14/2022 88888 5
8/14/2022 88888 4
Date is a reserved word, and reserved words really should not be used as field names.
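For comparison, a minimal pandas sketch of the same two-step logic: filter to the window and Cage < 50, keep IDs that qualify three times, then pull their rows. It uses only IDs 23569, 88888 and 91111 from the sample and hard-codes the June to August 2022 window as the answer does. One deliberate variation: it counts distinct months rather than rows, since `Count(*)=3` assumes exactly one record per animal per month.

```python
import pandas as pd

# A subset of the question's sample data (IDs 23569, 88888 and 91111).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-05-20", "2022-08-20",
                            "2022-08-14", "2022-07-14", "2022-06-14",
                            "2022-08-14", "2022-07-14", "2022-06-14"]),
    "AnimalID": [23569, 23569, 91111, 91111, 91111, 88888, 88888, 88888],
    "Cage": [4, 4, 51, 5, 90, 4, 5, 15],
})

# Like the WHERE clause: rows inside the June-August 2022 window.
in_window = df["Date"].between(pd.Timestamp("2022-06-01"), pd.Timestamp("2022-08-31"))
hits = df[in_window & (df["Cage"] < 50)]

# Like Query1, but counting distinct months instead of rows, so two readings
# in one month cannot stand in for a missing month.
months = hits.assign(Month=hits["Date"].dt.to_period("M"))
month_counts = months.groupby("AnimalID")["Month"].nunique()
qualifying = month_counts[month_counts == 3].index

# Like Query2: every in-window row for the qualifying IDs.
result = df[in_window & df["AnimalID"].isin(qualifying)].sort_values(["AnimalID", "Date"])
print(result)
```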

Need assistance with below query

I'm getting this error:
Error tokenizing data. C error: Expected 2 fields in line 11, saw 3
Code:
import webbrowser
import pandas as pd
website = 'https://en.wikipedia.org/wiki/Winning_percentage'
webbrowser.open(website)
league_frame = pd.read_clipboard()
Running this produces the error above.
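A likely cause of the error, stated as an assumption: `read_clipboard()` passes the clipboard text to `read_csv` with a whitespace separator (`sep=r'\s+'` by default), so table text copied from a web page breaks as soon as a cell, such as a team name, contains a space. The sketch reproduces this with a made-up snippet; `read_html` avoids the problem by parsing the HTML table structure instead of splitting on whitespace.

```python
import io
import pandas as pd

# Hypothetical copied text: the header has two whitespace-separated tokens,
# but line 3 has three because the team name contains a space.
copied = "Team Wins\nCubs 116\nChicago Cubs 116\n"

message = ""
try:
    # Same parsing read_clipboard() applies by default.
    pd.read_csv(io.StringIO(copied), sep=r"\s+")
except pd.errors.ParserError as exc:
    message = str(exc)
print(message)
```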
I believe you need read_html, which returns a list of all parsed tables; then select the DataFrame by position:
import pandas as pd
website = 'https://en.wikipedia.org/wiki/Winning_percentage'
#select first parsed table
df1 = pd.read_html(website)[0]
print (df1.head())
Win % Wins Losses Year Team Comment
0 0.798 67 17 1882 Chicago White Stockings best pre-modern season
1 0.763 116 36 1906 Chicago Cubs best 154-game NL season
2 0.721 111 43 1954 Cleveland Indians best 154-game AL season
3 0.716 116 46 2001 Seattle Mariners best 162-game AL season
4 0.667 108 54 1975 Cincinnati Reds best 162-game NL season
#select second parsed table
df2 = pd.read_html(website)[1]
print (df2)
Win % Wins Losses Season Team \
0 0.890 73 9 2015–16 Golden State Warriors
1 0.110 9 73 1972–73 Philadelphia 76ers
2 0.106 7 59 2011–12 Charlotte Bobcats
Comment
0 best 82 game season
1 worst 82-game season
2 worst season statistically

Create all combinations of summations given criteria in Access VBA

I have a subset-sum problem I cannot find the answer to. I am trying to write something in VBA for Access that will take all combinations of sums meeting certain criteria and place them in a table, so I can match a different table against it. Right now I am more concerned with creating the table of combinations. This is the first time I have asked a question, so sorry if I mess something up.
Example:
Access Table: ImpTable
Fields: ID, Year-Month, Name, Country, Quantity
I need to make every combination of sums where the Country and Year-Month are the same, while keeping track of which IDs were included in each sum. If the new table records which IDs were included in each combination, I can reference the original table for the names.
Expected Ending Table Results:
NewID, Year-Month, Country, SumQuantity, ComboName (IDs from the original table)
Any help is appreciated.
Raw Data:
ID Year-Month Name Country Quantity
1 2016-06 Person1 US 10
2 2016-06 Person2 US 12
3 2016-10 Person3 US 4
4 2016-06 Person4 UK 5
5 2016-06 Person5 UK 6
6 2016-06 Person6 US 3
Desired Results:
NewID Year-Month Country SumQuantity ComboName
1 2016-06 US 22 1,2
2 2016-06 US 13 1,6
3 2016-06 US 25 1,2,6
4 2016-06 US 15 2,6
5 2016-06 UK 11 4,5
6 2016-10 US 4 3
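The thread has no answer. The combination logic itself can be sketched in Python (not VBA) with `itertools.combinations`, grouping rows by (Year-Month, Country). To match the desired results, it emits combinations of two or more IDs, except that a one-item group yields its single ID. NewID values follow generation order, so they will not line up exactly with the numbering above. A group of n rows yields roughly 2^n combinations, so this only stays practical for small groups.

```python
from collections import defaultdict
from itertools import combinations

# Raw data from the question: (ID, Year-Month, Name, Country, Quantity)
rows = [
    (1, "2016-06", "Person1", "US", 10),
    (2, "2016-06", "Person2", "US", 12),
    (3, "2016-10", "Person3", "US", 4),
    (4, "2016-06", "Person4", "UK", 5),
    (5, "2016-06", "Person5", "UK", 6),
    (6, "2016-06", "Person6", "US", 3),
]

# Group (ID, Quantity) pairs by Year-Month and Country.
groups = defaultdict(list)
for row_id, year_month, _name, country, quantity in rows:
    groups[(year_month, country)].append((row_id, quantity))

# One output row per combination: (NewID, Year-Month, Country, SumQuantity, ComboName)
combo_table = []
for (year_month, country), items in groups.items():
    smallest = min(2, len(items))  # singletons only for one-item groups
    for size in range(smallest, len(items) + 1):
        for combo in combinations(items, size):
            combo_name = ",".join(str(row_id) for row_id, _ in combo)
            sum_quantity = sum(quantity for _, quantity in combo)
            new_id = len(combo_table) + 1
            combo_table.append((new_id, year_month, country, sum_quantity, combo_name))

for record in combo_table:
    print(record)
```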

Pandas: Error when merging two tables, Error with set_index

Thanks in advance for your help, here's my question:
I've successfully loaded my df into an IPython notebook and then ran a groupby on it:
station_count = station.groupby('landmark').count()
which produced a table of counts indexed by landmark.
Now I'm trying to merge it with another table:
dock_count_by_station = station.groupby('landmark').sum()
which is also a simple groupby on the same table, but the merge produces an error:
TypeError: cannot concatenate a non-NDFrame object
with this code:
dock_count_by_station.merge(station_count)
I think the problem is that I need to set the index of the two tables before merging them, but I keep getting the error below for this code:
station_count.set_index('landmark')
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)()
KeyError: 'landmark'
Using join
You can use join, which merges the tables on their index. You may also wish to specify the join type (e.g. 'outer', 'inner', 'left' or 'right'). You have overlapping column names (e.g. station_id), so you need to specify a suffix.
>>> dock_count_by_station.join(station_count, rsuffix='_rhs')
dockcount lat long station_id dockcount_rhs installation lat_rhs long_rhs name station_id_rhs
landmark
Mountain View 117 261.767433 -854.623012 210 7 7 7 7 7 7
Palo Alto 75 187.191873 -610.767939 180 5 5 5 5 5 5
Redwood City 115 262.406232 -855.602755 224 7 7 7 7 7 7
San Francisco 665 1322.569239 -4284.054814 2126 35 35 35 35 35 35
San Jose 249 560.039892 -1828.370075 200 15 15 15 15 15 15
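A self-contained sketch of the join approach on a hypothetical four-row `station` table (the real data set has more columns). It selects the numeric columns before `sum()`, because recent pandas versions may try to concatenate string columns such as `name` instead of dropping them.

```python
import pandas as pd

# Hypothetical stand-in for the question's `station` table.
station = pd.DataFrame({
    "station_id": [2, 3, 4, 5],
    "name": ["A", "B", "C", "D"],
    "landmark": ["San Jose", "San Jose", "Palo Alto", "Palo Alto"],
    "dockcount": [27, 15, 11, 19],
})

# groupby moves "landmark" out of the columns and into the index.
station_count = station.groupby("landmark").count()
dock_count_by_station = station.groupby("landmark")[["dockcount", "station_id"]].sum()

# Both frames share the landmark index, so join() aligns them directly;
# rsuffix renames the overlapping columns coming from the right frame.
joined = dock_count_by_station.join(station_count, rsuffix="_rhs")
print(joined)
```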
Using merge
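Note that landmark became the index by default when you did the groupby, which is also why station_count.set_index('landmark') raised KeyError: 'landmark' (there is no longer a landmark column). You can use as_index=False if you don't want this to happen, but then you would have to use merge instead of join.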
dock_count_by_station = station.groupby('landmark', as_index=False).sum()
station_count = station.groupby('landmark', as_index=False).count()
>>> dock_count_by_station.merge(station_count, on='landmark', suffixes=['_lhs', '_rhs'])
landmark dockcount_lhs lat_lhs long_lhs station_id_lhs dockcount_rhs installation lat_rhs long_rhs name station_id_rhs
0 Mountain View 117 261.767433 -854.623012 210 7 7 7 7 7 7
1 Palo Alto 75 187.191873 -610.767939 180 5 5 5 5 5 5
2 Redwood City 115 262.406232 -855.602755 224 7 7 7 7 7 7
3 San Francisco 665 1322.569239 -4284.054814 2126 35 35 35 35 35 35
4 San Jose 249 560.039892 -1828.370075 200 15 15 15 15 15 15

MS Access, Excel, SQL, and New Tables

I'm just starting out with MS Access 2010 and have the following setup: three Excel files, masterlist.x (which contains every product that I sell), vender1.x (which contains all products from vender1; I only sell some of these products), and vender2.x (which contains all products from vender2; again, I only sell some of these). Here's an example data set:
masterlist.x
ID NAME PRICE
23 bananas .50
33 apples .75
35 nuts .87
38 raisins .25
vender1.x
ID NAME PRICE
23 bananas .50
25 pears .88
vender2.x
ID NAME PRICE
33 apples .75
35 nuts .87
38 raisins .25
49 kiwis .88
The vendor lists get periodically updated with new items for sale and new prices. For example, if vender1 raises the price of bananas to $.75, my masterlist.x needs to be updated to reflect this.
Where I'm at now: I know how to import the three Excel sheets into Access. From there, I've been researching whether I need to set up relationships, create a macro, or write a SQL query to accomplish my goals. I'm not necessarily looking for a solution, but being pointed in the right direction would be great!
Also, once the masterlist.x table is updated, what feature would I use to see which line items were affected?
Update: I discovered SQL JOINs and have the following:
SELECT * FROM master
LEFT JOIN vender1
ON master.ID = vender1.ID
where master.PRICE <> vender1.PRICE;
For the banana price change described above, this gives me:
ID NAME PRICE ID NAME PRICE
23 bananas .50 23 bananas .75
What feature would instead give me:
masterlist.x
ID NAME PRICE
23 bananas .75
33 apples .75
35 nuts .87
38 raisins .25
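None of the posts here shows the update step itself. A hedged sketch using Python's built-in `sqlite3` module: first a join like the asker's lists which rows will change (one way to see which line items were affected), then an UPDATE copies the vendor price onto the matching master rows. This is SQLite syntax; in Access the usual equivalent is an update query that joins the tables, e.g. `UPDATE master INNER JOIN vender1 ON master.ID = vender1.ID SET master.PRICE = vender1.PRICE`, which is not verified by this sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE master (ID INTEGER PRIMARY KEY, NAME TEXT, PRICE REAL)")
cur.execute("CREATE TABLE vender1 (ID INTEGER PRIMARY KEY, NAME TEXT, PRICE REAL)")
cur.executemany("INSERT INTO master VALUES (?, ?, ?)",
                [(23, "bananas", 0.50), (33, "apples", 0.75),
                 (35, "nuts", 0.87), (38, "raisins", 0.25)])
# vender1 after the banana price rise described in the question
cur.executemany("INSERT INTO vender1 VALUES (?, ?, ?)",
                [(23, "bananas", 0.75), (25, "pears", 0.88)])

# Rows that are about to change (old price, new price).
affected = cur.execute("""
    SELECT m.ID, m.NAME, m.PRICE, v.PRICE
    FROM master m INNER JOIN vender1 v ON m.ID = v.ID
    WHERE m.PRICE <> v.PRICE
""").fetchall()

# Copy the vendor price onto matching master rows whose price differs.
cur.execute("""
    UPDATE master
    SET PRICE = (SELECT v.PRICE FROM vender1 v WHERE v.ID = master.ID)
    WHERE EXISTS (SELECT 1 FROM vender1 v
                  WHERE v.ID = master.ID AND v.PRICE <> master.PRICE)
""")
changed = cur.rowcount
con.commit()

rows = cur.execute("SELECT ID, NAME, PRICE FROM master ORDER BY ID").fetchall()
print("affected:", affected)
for row in rows:
    print(row)
```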
Since you were asking for design ideas, here is a heads-up: I don't really like your current table schema. The following queries were built in SQL Server 2008, the closest syntax to MS Access SQL that I could get in SQL Fiddle.
Please take a look:
Proposed table design:
vendor table:
VID VNAME
1 smp farms
2 coles
3 cold str
4 Anvil NSW
product table:
PID VID PNAME PPRICE
203 2 bananas 0.5
205 2 pears 0.88
301 3 bananas 0.78
303 3 apples 0.75
305 3 nuts 0.87
308 3 raisins 0.25
409 4 kiwis 0.88
masterlist:
ID PID MPRICE
1 203 0.5
2 303 0.75
3 305 0.87
4 308 0.25
With this design, join queries make it easy to keep your masterlist up to date, for example when a vendor updates the prices of the fruit they supply you, or when they stop supplying a product. You can add WHERE clauses to the query for whatever conditions you need.
Query:
SELECT m.id, p.vid, p.pname, p.pprice
FROM masterlist m
LEFT JOIN product p ON p.pid = m.pid
;
Results:
ID VID PNAME PPRICE
1 2 bananas 0.5
2 3 apples 0.75
3 3 nuts 0.87
4 3 raisins 0.25
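A runnable sketch of the proposed design in SQLite (via Python's `sqlite3`, not SQL Server or Access), loaded with the sample rows above. An ORDER BY is added so the output order is deterministic. The price change for product 203 is hypothetical and only illustrates that, once prices live in the product table, a single UPDATE there is picked up by the join without touching masterlist.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE vendor (vid INTEGER PRIMARY KEY, vname TEXT);
    CREATE TABLE product (pid INTEGER PRIMARY KEY, vid INTEGER REFERENCES vendor(vid),
                          pname TEXT, pprice REAL);
    CREATE TABLE masterlist (id INTEGER PRIMARY KEY, pid INTEGER REFERENCES product(pid),
                             mprice REAL);
""")
con.executemany("INSERT INTO vendor VALUES (?, ?)",
                [(1, "smp farms"), (2, "coles"), (3, "cold str"), (4, "Anvil NSW")])
con.executemany("INSERT INTO product VALUES (?, ?, ?, ?)",
                [(203, 2, "bananas", 0.5), (205, 2, "pears", 0.88),
                 (301, 3, "bananas", 0.78), (303, 3, "apples", 0.75),
                 (305, 3, "nuts", 0.87), (308, 3, "raisins", 0.25),
                 (409, 4, "kiwis", 0.88)])
con.executemany("INSERT INTO masterlist VALUES (?, ?, ?)",
                [(1, 203, 0.5), (2, 303, 0.75), (3, 305, 0.87), (4, 308, 0.25)])

QUERY = """
    SELECT m.id, p.vid, p.pname, p.pprice
    FROM masterlist m
    LEFT JOIN product p ON p.pid = m.pid
    ORDER BY m.id
"""
before = con.execute(QUERY).fetchall()

# Hypothetical vendor price change: one UPDATE on product, seen by the join.
con.execute("UPDATE product SET pprice = 0.75 WHERE pid = 203")
after = con.execute(QUERY).fetchall()
print(before[0], "->", after[0])
```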
Please comment; happy to help if you have any doubts.