Concatenate multiple row and sum on a specific column - pandas

I want to concatenate multiple rows into a single row. I managed to concatenate the rows; however, when I try to sum a specific column, it gives me an error: TypeError: can only concatenate str (not "float") to str
Item  Sum  Brand  Type  User  ID
ABC     5  High   Zinc  John  20A
CDD     3  Low    Iron  Bail  10B
ABC    10  High   Zinc  John  20A
CDD   200  Low    Iron  Bail  10B
Below is my code:
df = df.groupby(['ID','User','Type','Brand']).agg({'Item':''.join, 'Sum':'sum'}).reset_index()
Desired Output:
Item  Sum  Brand  Type  User  ID
ABC    15  High   Zinc  John  20A
CDD   203  Low    Iron  Bail  10B
Thank You in advance!

df = df.pivot_table(index=['Brand', 'Type', 'User', 'ID'],values=['Sum'], columns=['Item'], aggfunc=sum).stack().reset_index()
  Brand  Type  User   ID Item    Sum
0  High  Zinc  John  20A  ABC   15.0
1   Low  Iron  Bail  10B  CDD  203.0
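
The TypeError in the original groupby usually means the Sum column came in as strings (e.g. from a CSV), so 'sum' falls back to string concatenation. A minimal sketch of the fix, with the sample data typed in directly (using 'first' for Item, since each group repeats the same item):

import pandas as pd

df = pd.DataFrame({
    'Item': ['ABC', 'CDD', 'ABC', 'CDD'],
    'Sum': ['5', '3', '10', '200'],  # strings, as they might arrive from a CSV
    'Brand': ['High', 'Low', 'High', 'Low'],
    'Type': ['Zinc', 'Iron', 'Zinc', 'Iron'],
    'User': ['John', 'Bail', 'John', 'Bail'],
    'ID': ['20A', '10B', '20A', '10B'],
})

# Coerce Sum to numeric so .sum() adds instead of concatenating
df['Sum'] = pd.to_numeric(df['Sum'], errors='coerce')

out = (df.groupby(['ID', 'User', 'Type', 'Brand'], as_index=False)
         .agg({'Item': 'first', 'Sum': 'sum'}))
print(out)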

Related

Pandas Display Format on a specific column

So I want to display a single column with a currency format. Basically with a dollar sign, thousand comma separators, and two decimal places.
Input:
Invoice  Name      Amount     Tax
0001     Best Buy  1324       .08
0002     Target    1238593.1  .12
0003     Walmart   10.32      .55
Output:
Invoice  Name      Amount         Tax
0001     Best Buy  $1,324.00      .08
0002     Target    $1,238,593.10  .12
0003     Walmart   $10.32         .55
Note: I still want to be able to do calculations on it, so it would only be a display feature.
If you just want to format it for printing, you can try:
df.apply(lambda x: [f'${y:,}' for y in x] if x.name == 'Amount' else x)
which creates a new dataframe that looks like:
   Invoice      Name        Amount   Tax
0        1  Best Buy      $1,324.0  0.08
1        2    Target  $1,238,593.1  0.12
2        3   Walmart        $10.32  0.55
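To get exactly two decimal places, as in the desired output, the same approach works with an explicit format spec (a small variation on the snippet above, not the original poster's code):

df.apply(lambda x: [f'${y:,.2f}' for y in x] if x.name == 'Amount' else x)

Note this returns strings, so the Amount column is display-only afterwards; keep the numeric original around if you still need to calculate with it.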
You can simply add this line (before printing your data frame, of course):
pd.options.display.float_format = '${:,.2f}'.format
It will print the columns in your data frame (but only the float columns) like this:
$12,500.00
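
A minimal, self-contained sketch of that option (invented sample data; note the option applies to every float column of every data frame you print, and the underlying values stay numeric, so calculations still work):

import pandas as pd

df = pd.DataFrame({'Invoice': ['0001', '0002', '0003'],
                   'Name': ['Best Buy', 'Target', 'Walmart'],
                   'Amount': [1324.0, 1238593.1, 10.32],
                   'Tax': [0.08, 0.12, 0.55]})

pd.options.display.float_format = '${:,.2f}'.format
print(df)                   # Amount shows as $1,324.00; Tax shows as $0.08 too
print(df['Amount'].sum())   # still a plain float: 1239927.42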

Calculate average of non-numeric columns in pandas

I have a df "data" as below
Name   Quality  city
Tom    High     A
nick   Medium   B
krish  Low      A
Jack   High     A
Kevin  High     B
Phil   Medium   B
I want to group it by city, create new columns based on the column "Quality", and calculate the averages, as below:
city  High  Medium  Low  High_Avg  Medium_Avg  Low_Avg
A     2     0       1    66.66     0           33.33
B     1     1       0    50        50          0
I tried the script below, and I know it is completely wrong.
data_average = data_df.groupby(['city'], as_index = False).count()
Get a count of the frequencies, divide the outcome by the sum across columns, and finally concatenate the dataframes into one:
result = pd.crosstab(df.city, df.Quality)
averages = result.div(result.sum(1).array, axis=0).mul(100).round(2).add_suffix("_Avg")
#combine the dataframes
pd.concat((result, averages), axis=1)
Quality  High  Low  Medium  High_Avg  Low_Avg  Medium_Avg
city
A           2    1       0     66.67    33.33        0.00
B           1    0       2     33.33     0.00       66.67
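
A self-contained version of that answer, for reference (data typed in from the question; sum(axis=1) is used in place of sum(1).array, which computes the same row totals):

import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'Jack', 'Kevin', 'Phil'],
                   'Quality': ['High', 'Medium', 'Low', 'High', 'High', 'Medium'],
                   'city': ['A', 'B', 'A', 'A', 'B', 'B']})

# Frequency table: one row per city, one column per Quality level
counts = pd.crosstab(df.city, df.Quality)

# Row-wise percentages, rounded, with "_Avg" appended to the column names
averages = (counts.div(counts.sum(axis=1), axis=0)
                  .mul(100).round(2).add_suffix('_Avg'))

print(pd.concat((counts, averages), axis=1))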

In UniQuery, how do you get the count of unique values found while doing a BREAK.ON

I know I can get the counts of how many individual entries are in each unique group of records with the following.
LIST CUSTOMER BREAK-ON CITY TOTAL EVAL "1" COL.HDG "Customer Count" TOTAL CUR_BALANCE BY CITY
And I end up with something like this.
Cust...... City...... Customer Count Currently Owes
6 Arvada 1 4.54
********** -------------- --------------
Arvada 1 4.54
190 Boulder 1 0.00
1 Boulder 1 13.65
********** -------------- --------------
Boulder 2 13.65
...
============== ==============
TOTAL 29 85.28
29 records listed
Which becomes this, after we suppress the details and focus on the groups themselves.
City...... Customer Count Currently Owes
Arvada 1 4.54
Boulder 2 13.65
Chicago 3 4.50
Denver 6 0.00
...
============== ==============
TOTAL 29 85.28
29 records listed
But can I get a count of how many unique groupings are in the same report? Something like this.
City...... Customer Count Currently Owes City Count
Arvada 1 4.54 1
Boulder 2 13.65 1
Chicago 3 4.50 1
Denver 6 0.00 1
...
============== ============== ==========
TOTAL 29 85.28 17
29 records listed
Essentially, I want the unique value count integrated into the other report so that I don't have to create an extra report just for something so simple.
SELECT CUSTOMER SAVING UNIQUE CITY
17 records selected to list 0.
I swear that this should be easier. I see various # variables in the documentation that hint at the possibility of doing this easily, but I have never been able to get one of them to work.
If your data is structured so that your ID is what you are grouping by, the data you want is stored in a value-delimited field, and you don't need to include or exclude anything, you can use something like the following.
In UniVerse, using the CUSTOMER table in the demo HS.SALES account installed on many systems, you can do this. CUSTID is the record #ID, and attribute 13 is where the PRICE is stored in a value-delimited array.
LIST CUSTOMER BREAK-ON CUSTID TOTAL EVAL "DCOUNT(#RECORD<13>,#VM)" TOTAL PRICE AS P.PRICE BY CUSTID DET.SUP
Which outputs this.
DCOUNT(#RECORD<13>,#
Customer ID VM)................. P.PRICE
1 1 $4,200
2 3 $19,500
3 1 $4,250
4 1 $16,500
5 2 $3,800
6 0 $0
7 2 $5,480
8 2 $12,900
9 0 $0
10 3 $10,390
11 0 $0
12 0 $0
==================== =======
15 $77,020
That is a little juice for a lot of squeeze, but I hope you find it useful.
Good Luck!
Since the system variable #NB is set only on the total lines, this will allow your counter to calculate the number of TOTAL lines, which occur per unique city, excluding the grand total.
LIST CUSTOMER BREAK-ON CITY TOTAL EVAL "IF #NB < 127 THEN 1 ELSE 0" COL.HDG "Customer Count" TOTAL CUR_BALANCE BY CITY
I don't have a system to try this on, but this is my understanding of the variable.

Unexpected results on groupby([]).sum()

n = df1.groupby(['Year', 'State', 'Regulator', 'Industry','Product', 'Count']).sum() # <-- this produces the error
Problem description
Hi, I think there's a problem with the groupby.sum function dropping/excluding data points. I ran the code above, which in hindsight seemed OK, until I compared the results against the same data in Excel and with a simple plot of the dataset. In addition, removing 'Count' throws off the values in other df columns. Thanks for checking this out.
Expected Output
Year | 2012
State | Alabama
Regulator | SEC
Insurance/Annuity Products | 2
Stocks | 4
Year | 2012
State | Alabama
Regulator | FDIC
Debit Card | 1
Residential Mortgage | 3
Actual Output
Year | 2012
State | Alabama
Regulator | FDIC
Debit Card | 1
Residential Mortgage | 1
Problem solved. I ran the code both including and excluding the 'Count' column, which gave me a mix of good and bad results. For some reason the CSV wasn't being read correctly, if that makes any sense: the 'Count' column should have been int dtype, but it was being read as strings. So I applied .apply(pd.to_numeric), removed 'Count' from the groupby keys, and re-ran the cell, which solved the issue.
Here's the final code for groupby/sum:
n = df1.groupby(['Year', 'State', 'Regulator', 'Industry','Product'])['Count'].sum()
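
A hedged sketch of that fix end to end (the CSV file name is hypothetical; the column names are from the question):

import pandas as pd

df1 = pd.read_csv('complaints.csv')   # hypothetical file name

# If 'Count' was read as strings, .sum() would concatenate rather than add,
# so coerce it to numeric first
df1['Count'] = pd.to_numeric(df1['Count'], errors='coerce')

n = df1.groupby(['Year', 'State', 'Regulator', 'Industry', 'Product'])['Count'].sum()
print(n.head())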

vba sum value in column except one

I have a matrix in Microsoft Reports. It looks like this:
Product              | Sold
apple                | 1000
melon                | 200
banana               | 500
orange               | 2000
sum (without orange) | x
sum                  | 3700
How do I write an expression to sum all the values except orange? The number of fruit rows can vary, so I can't use a static index to identify the product.
=sum(IIf(Fields!Product.Value<>"orange", Fields!Sold.Value, 0))