My data looks like this:
data_dte  Year  Month  usg_apt  Total
01/1990   1990  1      JFK      80
01/1990   1990  1      MIA      100
01/1990   1990  1      ORD      58
I want to have a yearly total for each "usg_apt" instead of monthly.
"usg_apt" stands for "US Gateway Airport Code".
Assuming that the dataframe in question is called df, maybe you could try df.groupby(["usg_apt", "Year"])['Total'].agg('sum').
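A minimal, self-contained sketch of that groupby is below. The January rows come from the question; the February rows are made up purely so that the yearly sum differs from a single month, and as_index=False is an optional choice to get a flat table back instead of a MultiIndex.

```python
import pandas as pd

# January rows are from the question; February rows are invented for illustration.
df = pd.DataFrame({
    "data_dte": ["01/1990", "01/1990", "01/1990", "02/1990", "02/1990"],
    "Year": [1990, 1990, 1990, 1990, 1990],
    "Month": [1, 1, 1, 2, 2],
    "usg_apt": ["JFK", "MIA", "ORD", "JFK", "MIA"],
    "Total": [80, 100, 58, 20, 50],
})

# One row per (airport, year), with the monthly totals summed.
yearly = df.groupby(["usg_apt", "Year"], as_index=False)["Total"].sum()
print(yearly)
```

If you prefer the airports as rows and years as columns, calling .unstack("Year") on the version without as_index=False should give that layout.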
I'm working in PySpark and I have a long-format dataframe like this:
KPI      GROUP  TIME    VALUE
Sales    A      Before  100
Sales    A      After   135
Sales    B      Before  90
Sales    B      After   98
Revenue  A      Before  10
Revenue  A      After   12
Revenue  B      Before  5
Revenue  B      After   8
And what I expect to have is something like this:
KPI      GROUP  BEFORE  AFTER
Sales    A      100     135
Sales    B      90      98
Revenue  A      10      12
Revenue  B      5       8
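Just pivot on TIME (first comes from pyspark.sql.functions; the new columns take their names from the TIME values, i.e. Before and After):
from pyspark.sql.functions import first
df1.groupBy('KPI', 'GROUP').pivot('TIME').agg(first('VALUE')).show()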
My dataset consists of a date column in 'datetime64[ns]' dtype; it also has a price and a no. of sales column.
I want to calculate the monthly VWAP (Volume Weighted Average Price ) of the stock.
( VWAP = sum(price*no.of sales)/sum(no. of sales) )
What I have done so far:
I created new dataframe columns for month and year using pandas functions.
Now I want the monthly VWAP from this modified dataset, and it should be distinct by year.
For example, March 2016 and March 2017 should have their separate monthly VWAP values.
Start by defining a function to compute the VWAP for the current
month (group of rows):
def vwap(grp):
    return (grp.price * grp.salesNo).sum() / grp.salesNo.sum()
Then apply it to monthly groups:
df.groupby(df.dat.dt.to_period('M')).apply(vwap)
Using the following test DataFrame:
         dat  price  salesNo
0 2018-05-14  120.5       10
1 2018-05-16   80.0       22
2 2018-05-20   30.2       12
3 2018-08-10   75.1       41
4 2018-08-20   92.3       18
5 2019-05-10   10.0       33
6 2019-05-20   20.0       41
(containing data from the same months in different years), I got:
dat
2018-05    75.622727
2018-08    80.347458
2019-05    15.540541
Freq: M, dtype: float64
As you can see, the result contains separate entries for May in both
years from the source data.
I made the data frame shown below, which has 3 companies: A, B and C. The companies buy a certain number of vouchers during the 2016 to 2018 period. On some days, e.g., Company A buys 100 pieces for $3000; on other days no company buys any.
I'd like to see how these three companies compare over the last two years in terms of money spent on vouchers, so my idea was the following:
Sum all money spent each month for each company, and plot the sums in a bar graph or just a standard line chart, so three lines, each with a different color.
Since it's 2 years of data, there would be roughly 24 date points on the x axis.
I tried something like: plt.bar(A['Datetime'], A['PaidTotal'])
But get: ufunc subtract cannot use operands with types dtype('
But this is just for one company anyway, not for all 3 in one graph.
(I can sort those dates, that's not a problem.)
    Company Name  PaidTotal             Datetime
585     CompanyA   218916.0  2016-10-14 10:51:07
586     CompanyB   430000.0  2017-01-23 11:05:08
591     CompanyB   546217.0  2016-09-26 14:20:00
592     CompanyC    73780.0  2016-12-07 07:52:01
593     CompanyA   132720.0  2016-10-04 16:14:10
595     CompanyC    52065.0  2016-11-12 14:32:40
For a bar chart of the total per company, you can call df.groupby('Company Name')['PaidTotal'].sum().plot.bar().
To see a line chart of all three over time, you can try this (the axes are wrong, but this is the general idea):
sums = df.groupby(['Company Name', 'Datetime'])['PaidTotal'].sum().reset_index(level=0)
for company in sums['Company Name'].unique():
    sums[sums['Company Name'] == company]['PaidTotal'].plot()
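The question actually asks for monthly sums per company, so one possible variation is to group by calendar month and pivot the companies into columns. This is only a sketch built on the six sample rows from the question; it assumes the Datetime column has already been parsed with pd.to_datetime, and the name monthly is just an illustrative choice.

```python
import pandas as pd

# The six sample rows from the question.
df = pd.DataFrame({
    "Company Name": ["CompanyA", "CompanyB", "CompanyB",
                     "CompanyC", "CompanyA", "CompanyC"],
    "PaidTotal": [218916.0, 430000.0, 546217.0, 73780.0, 132720.0, 52065.0],
    "Datetime": pd.to_datetime([
        "2016-10-14 10:51:07", "2017-01-23 11:05:08", "2016-09-26 14:20:00",
        "2016-12-07 07:52:01", "2016-10-04 16:14:10", "2016-11-12 14:32:40",
    ]),
})

# Group by calendar month and company, sum, then move companies into columns.
# Months in which a company bought nothing are filled with 0.
monthly = (
    df.groupby([df["Datetime"].dt.to_period("M"), "Company Name"])["PaidTotal"]
      .sum()
      .unstack("Company Name", fill_value=0)
)
print(monthly)
```

With matplotlib installed, monthly.plot() should then draw one line per company, and monthly.plot.bar() grouped bars, with one x-axis position per month.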
I have a series of events with a Start and End date, with a Value and a series of other attributes.
Country   Location   Start-Date   End-Date     Value per day
Italy     Rome       2018-01-01   2018-03-15   50
Belgium   BXL        2017-12-04   2017-12-6    120
Italy     Milan      2018-03-17   2018-04-12   80
I want to convert this, in DAX, to a monthly time-series like:
Country   Location   Month     Value per day
Italy     Rome       2018-01   50
Italy     Rome       2018-02   50
Italy     Rome       2018-03   22.58 (= 50 /31*(31-17) days)
The value is a weighted average of industrial capacity.
I have done this with a CROSS JOIN with the Calendar table, but this is quite heavy and requires calculating every possible value, while an on-the-fly calculation would likely be faster.
Any help?
Many thanks
The DAX would be similar to this (assuming the table is called Table1):
Total =
VAR DayDiff =
    SUMMARIZE(
        Table1,
        Table1[End-Date],
        "DayDiff", DATEDIFF(MIN(Table1[Start-Date]), MAX(Table1[End-Date]), DAY)
    )
RETURN
    SUMX(DayDiff, [DayDiff])
You do not have to use Country, Location and Month in the filter (in the DAX above), as they will be available in the filter context of whatever visual you use (e.g. a PivotTable).
Please paste sample rows if this does not work.
http://www.geocities.com/colinpriley/sql/sqlitepg09.htm has a nice technique for creating a tabular report where the column names for the table can be coded in the query but in my case, the columns should be values from the database. Say I have daily sales figures like:
Transaction  Date    Rep  Product  Amount
1            July 1  Bob  A12      $10
2            July 2  Bob  B24      $12
3            July 2  Ted  A12      $25
...
and I want a weekly summary report that shows how much of each product each rep sold:
     A12  B24
Bob  $10  $12
Ted  $25  $0
My column names come from the Product column. Say, any product that has a row in the specified date range should have a column in the report. But other products -- which weren't sold in that time frame -- should not have a column of all 0s. How can I do that? Bonus points if it works in SQLite.
TIA.
http://weblogs.asp.net/wallen/archive/2005/02/18/376150.aspx has a good way to extract columns
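SQLite has no built-in PIVOT, so one common workaround is a two-step approach: first query the distinct products in the date range, then generate one SUM(CASE ...) column per product. The sketch below does this from Python's sqlite3 module. It is not the technique from either linked page; the table name sales, its column names, the ISO date strings and the year 2009 are all assumptions made for illustration.

```python
import sqlite3

def pivot_sales(conn, start, end):
    """Return (header, rows) with one column per product sold in [start, end]."""
    cur = conn.cursor()
    # Step 1: only products with at least one row in the range become columns.
    products = [row[0] for row in cur.execute(
        "SELECT DISTINCT product FROM sales "
        "WHERE sale_date BETWEEN ? AND ? ORDER BY product",
        (start, end),
    )]
    # Step 2: one SUM(CASE ...) per product. Product values are bound as
    # parameters; only generated aliases (p0, p1, ...) are put into the SQL text.
    case_exprs = [
        f"SUM(CASE WHEN product = ? THEN amount ELSE 0 END) AS p{i}"
        for i in range(len(products))
    ]
    select_list = ", ".join(["rep", *case_exprs])
    sql = (
        f"SELECT {select_list} FROM sales "
        "WHERE sale_date BETWEEN ? AND ? GROUP BY rep ORDER BY rep"
    )
    rows = cur.execute(sql, (*products, start, end)).fetchall()
    return ["rep", *products], rows

# Hypothetical schema and data modelled on the question (the year is made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, rep TEXT, product TEXT, amount REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2009-07-01", "Bob", "A12", 10),
    ("2009-07-02", "Bob", "B24", 12),
    ("2009-07-02", "Ted", "A12", 25),
    ("2009-08-15", "Ted", "C36", 40),  # outside the requested week
])

header, rows = pivot_sales(conn, "2009-07-01", "2009-07-07")
print(header)
for row in rows:
    print(row)
```

Because the column list is built from the rows in the requested range, product C36 gets no column here, while Ted's missing B24 sale shows up as 0.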