Pyspark - Divide the customers into income buckets - dataframe

How can I divide customers into income buckets using PySpark?
Assume I have two columns, Customer and Income, where Income ranges from 10K to 50K.
I want to divide the customers into income buckets (10K-20K, 20K-30K, etc.) using a PySpark DataFrame.
TIA!
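A minimal sketch of fixed-width bucketing, assuming 10K-wide buckets whose lower edge is inclusive. The pure-Python helpers show the arithmetic (floor the income to a multiple of the width), and `add_income_bucket` applies the same expression to a DataFrame with `pyspark.sql.functions` (`floor`, `col`, `concat`, `lit`). The function and column names (`bucket_bounds`, `bucket_label`, `add_income_bucket`, `IncomeBucket`) are hypothetical, not from the question.

```python
def bucket_bounds(income, width=10_000):
    """Return (lower, upper) of the [lower, upper) bucket that income falls in."""
    lower = (income // width) * width
    return lower, lower + width


def bucket_label(income, width=10_000):
    """Readable label such as '10K-20K' (assumes width is a multiple of 1000)."""
    lower, upper = bucket_bounds(income, width)
    return f"{lower // 1000}K-{upper // 1000}K"


def add_income_bucket(df, income_col="Income", width=10_000):
    """Add a hypothetical IncomeBucket column to a PySpark DataFrame, same arithmetic."""
    # Imported here so the pure-Python helpers above work without Spark installed.
    from pyspark.sql import functions as F

    lower = F.floor(F.col(income_col) / width) * width  # e.g. 27500 -> 20000
    upper = lower + width
    return df.withColumn(
        "IncomeBucket",
        F.concat(
            (lower / 1000).cast("int").cast("string"),
            F.lit("K-"),
            (upper / 1000).cast("int").cast("string"),
            F.lit("K"),
        ),
    )
```

Usage: with `df = spark.createDataFrame([("Alice", 15000), ("Bob", 27500)], ["Customer", "Income"])`, calling `add_income_bucket(df).groupBy("IncomeBucket").count().show()` counts customers per bucket. Because the lower edge is inclusive, an income of exactly 50K lands in a "50K-60K" bucket; if the top bucket should be closed, `pyspark.ml.feature.Bucketizer` with splits `[10000, 20000, 30000, 40000, 50000]` includes the upper bound in its last bucket (it returns a numeric bucket index rather than a label).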

Related

SQL Weighted Average - Population & Income

I am trying to find the average household income while using a weighted average. I have a data source of ZIP codes with the total population and the average household income. I want to be able to select multiple ZIP codes and still pull an accurate average household income.
Can I use SQL to pull a weighted average like this?
ZIP    TOTAL_POP  AVG_HH_INC
12345  130350     66000
54321  55750      78000
44668  17300      89000
If you want the overall average, then use arithmetic:
select sum(total_pop * avg_hh_inc) / sum(total_pop)
from t;
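Note: If the values are stored as 4-byte integers, this runs the risk of overflow (130350 * 66000 alone exceeds the 32-bit maximum of 2,147,483,647). Cast to a wider type such as BIGINT or DECIMAL if that is an issue. On databases that divide integers as integers, also multiply by 1.0 so the result keeps its fractional part.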
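To check the weighted-average query on the sample rows, here is a small sketch using Python's built-in `sqlite3` with an in-memory table named `t`, as in the answer. The `where zip in (...)` clause is an assumed way to select several ZIP codes, and `weighted_avg_income` is a hypothetical helper name. The `* 1.0` matters here because SQLite divides an integer by an integer using integer division.

```python
import sqlite3

# Sample rows from the question: (ZIP, TOTAL_POP, AVG_HH_INC).
ROWS = [(12345, 130350, 66000), (54321, 55750, 78000), (44668, 17300, 89000)]


def weighted_avg_income(zips):
    """Population-weighted average household income over the selected ZIP codes."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (zip integer, total_pop integer, avg_hh_inc integer)")
    con.executemany("insert into t values (?, ?, ?)", ROWS)
    placeholders = ",".join("?" * len(zips))  # one "?" per selected ZIP
    sql = (
        "select sum(total_pop * avg_hh_inc) * 1.0 / sum(total_pop) "
        f"from t where zip in ({placeholders})"
    )
    (avg,) = con.execute(sql, list(zips)).fetchone()
    con.close()
    return avg
```

For all three ZIP codes this gives about 71245.33, noticeably below the unweighted mean of the three averages (about 77666.67), because the largest ZIP has the lowest income.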

Pivot Table Calculated Field to Avg the Sums totaled

I need this to average 7.41,
but that is not what I get when I use AVG.
I am trying to calculate each employee's daily average hours based on the number of days worked.
I already used a pivot table to calculate each employee's total hours per day, but I cannot figure out how to get the pivot table to display the average workday. When I change the field settings, it averages the source data rows, which is not what I need.
The employees worked different numbers of workdays, so the average must be based on the number of days for each employee. When I highlight the first employee's data, Excel returns an average of 7.41.
Further down the list, though, some employees have 0.00 hours on a date, and those dates are being included in the averages.
How do I get this pivot table to give me a true snapshot of each person's daily hours worked, without having to manually delete the 0.00-hour entries from the source data?
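The calculation being asked for has two stages: sum the source rows into one total per employee per day, then average those daily totals while skipping days whose total is 0.00. The Python sketch below only illustrates that logic; it is not an Excel feature, and the sample rows and the name `average_daily_hours` are hypothetical. In a worksheet, `AVERAGEIF` over an employee's daily totals with a `">0"` criterion expresses the same rule.

```python
from collections import defaultdict


def average_daily_hours(entries):
    """Average hours per worked day for each employee, ignoring 0.00-hour days.

    entries: iterable of (employee, day, hours) rows, like the pivot's source data.
    """
    # Stage 1: total hours per employee per day (what the pivot already shows).
    daily_totals = defaultdict(float)
    for employee, day, hours in entries:
        daily_totals[(employee, day)] += hours

    # Stage 2: keep only days with a non-zero total, grouped by employee.
    worked_days = defaultdict(list)
    for (employee, _day), total in daily_totals.items():
        if total > 0:
            worked_days[employee].append(total)

    # Average over the number of days each employee actually worked.
    return {emp: sum(totals) / len(totals) for emp, totals in worked_days.items()}
```

With two 4-hour rows on one day, 6 hours the next day and a 0.00-hour day, the average is (8 + 6) / 2 = 7.0 rather than (8 + 6 + 0) / 3.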

Calculating the rolling exponential weighted moving average for each share price over time

This question is similar to my previous one: Shifting elements of column based on index given condition on another column
I have a dataframe (df) with 2 columns and 1 index.
The index is a datetime index with values such as 2001-01-30; it is sorted by date, the dates are monthly, and each date appears thousands of times. Column A holds the company name for that date, and column B holds that company's share price on the date in the index.
There are multiple companies in column A for each date, and the set of companies varies over time, so the data is not fully predictable.
I want to create a column C holding the 3-period rolling exponentially weighted average of the price for each company, using the current date and the two previous dates for that company in column A.
I have tried a few methods but have failed. Thanks.
Try:
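df['ColumnC'] = df.groupby('ColumnA')['ColumnB'].transform(lambda s: s.ewm(span=3).mean())
Note that ewm(3) sets com=3, whereas span=3 is the usual reading of a 3-period EWMA; transform keeps the result aligned with df's rows so it can be assigned as a new column.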
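`ewm` weights the whole price history (older prices decay but never drop out), while the question asks for only the current and two previous dates. The sketch below shows both readings, assuming the rows are already sorted by date, `span=3` (so alpha = 0.5), and the column names `ColumnA`, `ColumnB` and `ColumnC` from the question. The windowed variant is an assumption about what "3 day rolling" means: it applies the same exponential weights but only inside a 3-row window per company.

```python
import numpy as np
import pandas as pd


def add_ewma(df, span=3):
    """ColumnC = exponentially weighted mean of ColumnB per company (full history)."""
    out = df.copy()
    # transform returns a result aligned positionally with the original rows.
    out["ColumnC"] = out.groupby("ColumnA")["ColumnB"].transform(
        lambda s: s.ewm(span=span).mean()
    )
    return out


def add_windowed_ewma(df, window=3):
    """ColumnC = EW mean over only the current and previous window-1 rows per company."""
    alpha = 2 / (window + 1)  # same alpha that ewm(span=window) uses
    weights = (1 - alpha) ** np.arange(window - 1, -1, -1)  # oldest -> newest

    def weighted(values):
        w = weights[-len(values):]  # shorter windows at the start of each company
        return float(np.dot(values, w) / w.sum())

    out = df.copy()
    out["ColumnC"] = out.groupby("ColumnA")["ColumnB"].transform(
        lambda s: s.rolling(window, min_periods=1).apply(weighted, raw=True)
    )
    return out
```

The two versions agree for a company's first three dates and differ from the fourth onward, when the windowed version stops looking at the oldest price.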

Powerpivot sum from dimension table

I am a graduate intern at a big company and I'm having some trouble with creating a measure in PowerPivot.
I'm quite new to PowerPivot and I need some help. I am the first person to use PowerPivot in this office, so I can't ask for help here.
I have a fact table that contains basically all journal entries (see the next table). Every entry is tagged with the product's unique ID (serial number).
ID DATE ACCOUNT# AMOUNT
110 2010-1-1 900 $1000
There is a dimension table that maps every account to a specific country and to expense or revenue.
ACCOUNT# Expense Country
900 Revenue Germany
Another dimension table breaks down the dates.
The third dimension table contains product information, but also contains a column with a certain expense (Expense X).
ID   Expense X  ProductName  ProductColour
110  $50        Flower       Green
I have set up the relationships between the tables correctly, and slicing works in general.
To calculate the margin I need to deduct this expense x from the revenue. I already made a measure that shows total Revenue, that one was easy.
Now I need a measure that shows the total of Expense X for the related product IDs, so that I can slice it in a pivot table by date, product name, etc.
The problem is that I can't use RELATED function because the serial number is used multiple times in the fact table (journal entries can have the same serial number)
And if I use the SUM or CALCULATE function it won't slice properly.
So how can I calculate the total for expense X so it will slice properly?
Check the function RELATEDTABLE.
If you create a dummy dataset I can play around and send you a solution.
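One way to pin down what the measure should return, assuming each product's Expense X should be counted once per product that appears in the filtered journal entries (not once per journal line): collect the distinct IDs that survive the slicers, then sum their Expense X. The Python sketch below only illustrates those semantics and is not DAX; `expense_x_total`, the `keep` filter, and the product 111 row are hypothetical.

```python
def expense_x_total(journal, expense_x_by_id, keep=lambda row: True):
    """Sum Expense X once per distinct product ID among the filtered journal rows.

    journal: list of dicts with at least an "ID" key (the serial number).
    expense_x_by_id: mapping ID -> Expense X from the product dimension table.
    keep: predicate standing in for the pivot-table slicers (date, account, ...).
    """
    # Distinct product IDs in the journal rows that pass the slicer filter.
    ids = {row["ID"] for row in journal if keep(row)}
    # Each product's Expense X counted once, however many journal lines it has.
    return sum(expense_x_by_id.get(i, 0) for i in ids)
```

This is also why a plain per-row lookup double-counts: product 110 with two journal lines would contribute $100 instead of $50.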

How to calculate the number of days from two dates in a table and store it in another field of the same table in Access 2013

I am making a basic hospital management system in Access 2013. I have two tables named "Bed" and "Receipt".
Bed(BedID,AssignedDate,PatientID,DischargeDate,BedCharges)
Receipt(ReceiptID,PatientID,BedCharges)
I want to calculate "BedCharges" by counting the number of days between "AssignedDate" and "DischargeDate" and multiplying that by a fixed charge per day.
The BedCharges value calculated in the "Bed" table also needs to appear in the "Receipt" table.
How can I count the number of days and then calculate the "BedCharges" in both the tables?
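A minimal sketch of the two update steps, using Python's built-in `sqlite3` as a stand-in for Access: SQLite's `julianday` difference plays the role of Access's `DateDiff("d", [AssignedDate], [DischargeDate])`, and the Receipt row is filled by matching `PatientID`. The daily rate of 500, the helper names, and summing several beds per patient are assumptions, not from the question.

```python
import sqlite3

DAILY_RATE = 500  # hypothetical flat charge per bed-day; not from the question


def create_schema(con):
    """Tables mirroring the question's Bed and Receipt layouts (dates as ISO text)."""
    con.execute(
        "CREATE TABLE Bed (BedID INTEGER PRIMARY KEY, AssignedDate TEXT, "
        "PatientID INTEGER, DischargeDate TEXT, BedCharges INTEGER)"
    )
    con.execute(
        "CREATE TABLE Receipt (ReceiptID INTEGER PRIMARY KEY, PatientID INTEGER, "
        "BedCharges INTEGER)"
    )


def compute_bed_charges(con, daily_rate=DAILY_RATE):
    # Days between the two dates times the flat daily rate.
    # Patients not yet discharged (NULL DischargeDate) are left untouched.
    con.execute(
        "UPDATE Bed SET BedCharges = "
        "CAST(julianday(DischargeDate) - julianday(AssignedDate) AS INTEGER) * ? "
        "WHERE DischargeDate IS NOT NULL",
        (daily_rate,),
    )
    # Copy the charges into Receipt by PatientID (summed in case of several beds).
    con.execute(
        "UPDATE Receipt SET BedCharges = "
        "(SELECT SUM(b.BedCharges) FROM Bed AS b WHERE b.PatientID = Receipt.PatientID)"
    )
    con.commit()
```

A stay from 2024-01-01 to 2024-01-05 counts as 4 days, so a same-day discharge costs nothing; add 1 to the day count if the first day should be billed. Computing BedCharges in a query instead of storing it in both tables is another option, since it avoids the two copies drifting apart.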