Pandas/Matplotlib :: plot 36 months instead of only 12 over a 3-year period

The code snippet below shows the revenue growth/decline of 3 companies over a 3-year period; however, in this format I can't work out how to do the following:
1) Instead of seeing either only 12 months out of the 3 years (df.PublishedAtUtc.dt.month) or a very rugged graph based on year (df.PublishedAtUtc.dt.year), how do I see all 36 months, or any other number of months? (I tried different parameters with no luck.)
https://i.imgur.com/beiO8w2.png
sums = df.groupby(['Company Name', df.Datetime.dt.month])['PaidTotal'].sum().reset_index(level=0)
for company in sums['Company Name'].unique():
    sums[sums['Company Name'] == company]['PaidTotal'].plot();
Data sample only - original has thousands of lines:
Company Name PaidTotal Datetime
585 CompanyA 218916.0 2016-10-14 10:51:07
586 CompanyB 430000.0 2016-01-23 11:05:08
591 CompanyB 546217.0 2016-09-26 14:20:00
592 CompanyC 73780.0 2016-12-07 07:52:01
593 CompanyA 132720.0 2017-10-04 16:14:10
595 CompanyC 52065.0 2017-11-12 14:32:40
585 CompanyA 234566.0 2017-10-14 10:51:07
586 CompanyB 252325.0 2017-01-23 11:05:08
591 CompanyB 546217.0 2018-09-26 14:20:00
592 CompanyC 745780.0 2018-12-07 07:52:01
593 CompanyA 1322320.0 2018-10-04 16:14:10
595 CompanyC 5432065.0 2018-11-12 14:32:40

It looks like you're storing your dates as datetime objects, so you can use Matplotlib's plt.plot_date() function to solve your problem.
With xdate=True (which is already the default) this function treats the x values as dates, so passing it explicitly, as in the code below, just makes the intent clear. Note that recent Matplotlib releases discourage plot_date() in favour of plain plt.plot(), which accepts datetime x values as well.
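To plot any range of datetimes (dates) against their paid totals (totals), use something like this:
import matplotlib.pyplot as plt
# dates: your datetime values (e.g. df['Datetime']); totals: the matching PaidTotal values
plt.plot_date(dates, totals, xdate=True)
plt.xlabel('date')
plt.ylabel('totals')
plt.show()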
This should give you what you're looking for; with your sample data it draws each total as a point at its date.
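plot_date() plots each transaction individually. If you instead want monthly totals spread over all 36 months, a minimal sketch (assuming the column is a real datetime64 column named Datetime, as in the sample data) is to group by dt.to_period('M') rather than dt.month: a period such as 2016-10 keeps the year, so the months of different years no longer fold onto the same 1..12 values. The reindex step, the Agg backend line and the variable names are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in Jupyter or a desktop session
import matplotlib.pyplot as plt
import pandas as pd

# The sample rows from the question
rows = [
    ("CompanyA", 218916.0, "2016-10-14 10:51:07"), ("CompanyB", 430000.0, "2016-01-23 11:05:08"),
    ("CompanyB", 546217.0, "2016-09-26 14:20:00"), ("CompanyC", 73780.0, "2016-12-07 07:52:01"),
    ("CompanyA", 132720.0, "2017-10-04 16:14:10"), ("CompanyC", 52065.0, "2017-11-12 14:32:40"),
    ("CompanyA", 234566.0, "2017-10-14 10:51:07"), ("CompanyB", 252325.0, "2017-01-23 11:05:08"),
    ("CompanyB", 546217.0, "2018-09-26 14:20:00"), ("CompanyC", 745780.0, "2018-12-07 07:52:01"),
    ("CompanyA", 1322320.0, "2018-10-04 16:14:10"), ("CompanyC", 5432065.0, "2018-11-12 14:32:40"),
]
df = pd.DataFrame(rows, columns=["Company Name", "PaidTotal", "Datetime"])
df["Datetime"] = pd.to_datetime(df["Datetime"])

# Year-and-month key (e.g. Period('2016-10')) instead of dt.month (1..12 only)
month = df["Datetime"].dt.to_period("M").rename("Month")

# Sum per company per month, then pivot companies into columns (0 = no purchase)
monthly = (
    df.groupby(["Company Name", month])["PaidTotal"]
      .sum()
      .unstack("Company Name", fill_value=0)
)

# Put every month of the covered range on the axis, even months with no purchases
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
monthly = monthly.reindex(full_range, fill_value=0)

ax = monthly.plot(marker="o")  # one line per company, 36 points for 2016-01..2018-12
ax.set_xlabel("month")
ax.set_ylabel("PaidTotal")
```

Grouping by dt.to_period('Q') or resampling to a coarser frequency works the same way if 36 points turn out to be too noisy.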

Related

Smoothed Average over rows and columns with pandas

I am trying to create a function that averages over both row and column. For example:
State 1943 1944 1945 1946 1947 (1947_AVG) 1948 (1948_AVG)
Alaska 1 2 3 4 5 2 6 3
CA 234 234 234 6677 34
I want code that gives me an average for 1947 using 1943, 1944 and 1945, an average for 1948 using 1944, 1945 and 1946, etc.
I currently have:
d3['pandas_SMA_Year'] = d3.iloc[:,1].rolling(window=3).mean()
But this only works down the rows, not across the columns, and it doesn't take into account the fact that I'm looking 2 years back. Please and thank you for any guidance!
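One way to get a lagged average across the year columns, sketched here under the assumption that the year columns are strings in chronological order and using only the Alaska row (the CA row in the question is incomplete): transpose so the years become rows, take rolling(window=3).mean(), then shift(2) so that year Y holds the mean of Y-4, Y-3 and Y-2. Transposing avoids relying on rolling over axis=1, which newer pandas versions deprecate.

```python
import pandas as pd

# The Alaska row from the question, with years as column labels
df = pd.DataFrame(
    {"State": ["Alaska"], "1943": [1], "1944": [2], "1945": [3],
     "1946": [4], "1947": [5], "1948": [6]}
).set_index("State")

years = list(df.columns)  # year columns, oldest first

# Years become rows after .T; rolling(3) gives mean(Y-2, Y-1, Y) at year Y,
# and shift(2) moves that down two rows, so year Y gets mean(Y-4, Y-3, Y-2).
lagged_avg = df[years].T.rolling(window=3).mean().shift(2).T

for year in ["1947", "1948"]:
    df[f"{year}_AVG"] = lagged_avg[year]

print(df)
```

For Alaska this reproduces the values in the question's table: 2 for 1947_AVG and 3 for 1948_AVG.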

Pandas :: How to plot based on sum amount each month

I made the data frame shown below, which has 3 companies: A, B and C. The companies buy a certain amount of vouchers during the 2016 to 2018 period. On some days, e.g., Company A buys 100 pieces for $3000; on other days no company buys any.
I'd like to see how these three companies compare over the last two years in terms of money spent on vouchers, so my idea was the following:
Sum all the money spent each month by each company, and plot the sums in a bar graph or just a standard line chart, i.e. three lines, each in a different color.
Since it's 2 years of data, there would be roughly 24 data points on the x axis.
I tried something like: plt.bar(A['Datetime'], A['PaidTotal'])
But I get: ufunc subtract cannot use operands with types dtype('
And this only covers one company anyway, not all 3 in one graph.
(I can sort those dates; that's not a problem.)
Company Name PaidTotal Datetime
585 CompanyA 218916.0 2016-10-14 10:51:07
586 CompanyB 430000.0 2017-01-23 11:05:08
591 CompanyB 546217.0 2016-09-26 14:20:00
592 CompanyC 73780.0 2016-12-07 07:52:01
593 CompanyA 132720.0 2016-10-04 16:14:10
595 CompanyC 52065.0 2016-11-12 14:32:40
For a bar chart of each company's total spend, you can call df.groupby('Company Name')['PaidTotal'].sum().plot.bar().
To see a line chart of all three over time, you can try this (the axes are wrong, but this is the general idea):
sums = df.groupby(['Company Name', 'Datetime'])['PaidTotal'].sum().reset_index(level=0)
for company in sums['Company Name'].unique():
    sums[sums['Company Name'] == company]['PaidTotal'].plot();
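Grouping on the raw Datetime gives one point per transaction, which is why the axes come out wrong. A sketch of the monthly-sum idea, using only the six sample rows above: pd.Grouper(key='Datetime', freq='MS') bins the timestamps into calendar months (labelled by month start), and unstack turns the companies into columns, so a single plot call draws all three. In the sample the months happen to be contiguous; with real data, months in which nobody bought anything may be absent, in which case reindexing onto a pd.date_range of month starts is one option. The Agg backend line and the tick formatting are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in Jupyter or a desktop session
import matplotlib.pyplot as plt
import pandas as pd

# The six sample rows from the question
rows = [
    ("CompanyA", 218916.0, "2016-10-14 10:51:07"),
    ("CompanyB", 430000.0, "2017-01-23 11:05:08"),
    ("CompanyB", 546217.0, "2016-09-26 14:20:00"),
    ("CompanyC", 73780.0, "2016-12-07 07:52:01"),
    ("CompanyA", 132720.0, "2016-10-04 16:14:10"),
    ("CompanyC", 52065.0, "2016-11-12 14:32:40"),
]
df = pd.DataFrame(rows, columns=["Company Name", "PaidTotal", "Datetime"])
df["Datetime"] = pd.to_datetime(df["Datetime"])

# Bin timestamps into calendar months ("MS" labels each bin by its first day),
# sum per company, and pivot companies into columns (0 where a company bought nothing)
monthly = (
    df.groupby([pd.Grouper(key="Datetime", freq="MS"), "Company Name"])["PaidTotal"]
      .sum()
      .unstack("Company Name", fill_value=0)
)

ax = monthly.plot.bar()  # grouped bars, one colour per company
ax.set_xticklabels(monthly.index.strftime("%Y-%m"), rotation=45)
ax.set_xlabel("month")
ax.set_ylabel("PaidTotal")
plt.tight_layout()
```

Replacing monthly.plot.bar() with monthly.plot() draws the three-line version instead.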

groupby pandas dataframe, take difference between value of latest and earliest date

I have a Cumulative column and I want to groupby index and take the values corresponding to the latest date minus the values corresponding to the earliest date.
Very similar to this: group by pandas dataframe and select latest in each group
But take the difference between latest and earliest in each group.
I'm a python rookie, and here is my solution:
import pandas as pd
from io import StringIO
csv = StringIO("""index id product date
0 220 6647 2014-09-01
1 220 6647 2014-09-03
2 220 6647 2014-10-16
3 826 3380 2014-11-11
4 826 3380 2014-12-09
5 826 3380 2015-05-19
6 901 4555 2014-09-01
7 901 4555 2014-10-05
8 901 4555 2014-11-01""")
df = pd.read_table(csv, sep=r'\s+', index_col='index')  # raw string: split on any run of whitespace
df['date']=pd.to_datetime(df['date'],errors='coerce')
df_sort=df.sort_values('date')
df_sort.drop(['product'], axis=1,inplace=True)
df_sort.groupby('id').tail(1).set_index('id')-df_sort.groupby('id').head(1).set_index('id')
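The same "latest minus earliest" idea can be written a little more compactly with GroupBy.first() and GroupBy.last() on the date-sorted frame. This is a sketch on the question's sample, not the asker's method. One difference worth knowing: first() and last() take the first/last non-null value per column, whereas head(1) and tail(1) take whole rows, which matters if errors='coerce' has produced NaT dates.

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647] * 3 + [3380] * 3 + [4555] * 3,
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# Sort by date so that, within each id, first() is the earliest row and last() the latest
g = df.sort_values("date").groupby("id")

# Column-wise difference, indexed by id: date becomes a Timedelta, product becomes 0
span = g.last() - g.first()
print(span)
```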

Resample and interpolate pandas df

I have a df that looks like the following:
TotalSpend Date
100 2001-04-26
230 2001-05-12
340 2001-06-16
610 2001-07-31
770 2001-08-31
I'm trying to interpolate the data so I can see how much was spent during each month, like so:
TotalSpend Date MonthlySpend
110 2001-04-30
310 2001-05-31 200
400 2001-06-30 90
610 2001-07-31 210
770 2001-08-31 160
I set the date column as the index and tried to upsample the data (below) so that I have every day of the year, then interpolate the missing values and select the month ends; however, this is proving troublesome.
resample = realdf.resample('d').mean()
Any help would be much appreciated.
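A minimal sketch of the upsample, interpolate, select-month-ends plan, assuming TotalSpend is a running total and that linear interpolation in time is acceptable: resample('D').asfreq() inserts the missing days as NaN, interpolate(method='time') fills them proportionally to elapsed days, and DatetimeIndex.is_month_end picks the month-end rows. The figures in the desired output above look illustrative, so the interpolated values differ; for example, 2001-04-30 comes out as 132.5 rather than 110.

```python
import pandas as pd

# The data from the question, with Date as the index
df = pd.DataFrame({
    "TotalSpend": [100, 230, 340, 610, 770],
    "Date": pd.to_datetime(["2001-04-26", "2001-05-12", "2001-06-16",
                            "2001-07-31", "2001-08-31"]),
}).set_index("Date")

# Upsample to one row per day (new days are NaN), then fill linearly in time
daily = df["TotalSpend"].astype(float).resample("D").asfreq().interpolate(method="time")

# Keep only calendar month-end rows
month_end = daily[daily.index.is_month_end]

out = month_end.to_frame("TotalSpend")
out["MonthlySpend"] = out["TotalSpend"].diff()  # spend between consecutive month ends
print(out.round(2))
```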

TSQL: Returning customers by date descending?

Each time my user looks up a customer, I store the customer ID, Name and timestamp (timestamp = when the user performed the look up).
Kinda like:
ID Name Timestamp
1 CompanyA 2012-10-01 10:00
2 CompanyB 2012-10-01 10:11
3 CompanyA 2012-10-01 10:22
4 CompanyA 2012-10-01 10:25
4 CompanyC 2012-10-01 10:32
My question is ...
I want to return TOP 30 distinct customers sorted by date descending - how do I do that?
I want to return this:
CompanyC
CompanyA
CompanyB
... only a single instance of each, sorted by date descending.
SELECT TOP 30 Name
FROM Customer
GROUP BY Name
ORDER BY MAX(Timestamp) DESC
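For readers doing the same thing in pandas rather than T-SQL, a rough analogue of the query above (an illustration, not part of the original answer) maps each clause onto a method: GROUP BY Name with MAX(Timestamp) becomes groupby("Name")["Timestamp"].max(), ORDER BY ... DESC becomes sort_values(ascending=False), and TOP 30 becomes head(30). The frame name log is hypothetical.

```python
import pandas as pd

# The example lookups, one row per lookup
log = pd.DataFrame({
    "Name": ["CompanyA", "CompanyB", "CompanyA", "CompanyA", "CompanyC"],
    "Timestamp": pd.to_datetime([
        "2012-10-01 10:00", "2012-10-01 10:11", "2012-10-01 10:22",
        "2012-10-01 10:25", "2012-10-01 10:32",
    ]),
})

recent = (
    log.groupby("Name")["Timestamp"].max()  # latest lookup per customer
       .sort_values(ascending=False)        # most recent first
       .head(30)                            # keep at most 30 customers
)
names = recent.index.tolist()
print(names)
```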