So I want to display a single column with a currency format. Basically with a dollar sign, thousand comma separators, and two decimal places.
Input:
Invoice Name Amount Tax
0001 Best Buy 1324 .08
0002 Target 1238593.1 .12
0003 Walmart 10.32 .55
Output:
Invoice Name Amount Tax
0001 Best Buy $1,324.00 .08
0002 Target $1,238,593.10 .12
0003 Walmart $10.32 .55
Note: I still want to be able to do calculations on it, so it would only be a display feature.
If you just want to format the output for printing, you can try:
df.apply(lambda x: [f'${y:,.2f}' for y in x] if x.name == 'Amount' else x)
which creates a new dataframe that looks like:
Invoice Name Amount Tax
0 1 Best Buy $1,324.00 0.08
1 2 Target $1,238,593.10 0.12
2 3 Walmart $10.32 0.55
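A runnable version of this approach, with sample data assumed from the question (column names and values are taken from the post; the result is a new DataFrame whose Amount column holds strings, so keep the original around for calculations):

```python
import pandas as pd

# Sample data mirroring the question
df = pd.DataFrame({
    'Invoice': ['0001', '0002', '0003'],
    'Name': ['Best Buy', 'Target', 'Walmart'],
    'Amount': [1324, 1238593.1, 10.32],
    'Tax': [0.08, 0.12, 0.55],
})

# Format only the 'Amount' column for printing; every other column
# is passed through unchanged.
formatted = df.apply(
    lambda x: [f'${y:,.2f}' for y in x] if x.name == 'Amount' else x
)
print(formatted)
```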
You can simply add this line (before printing your data frame, of course):
pd.options.display.float_format = '${:,.2f}'.format
It will format every float column in your data frame (and only float columns) like this:
$12,500.00
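A small sketch showing that this option only changes how values are printed; the data stays numeric, so calculations still work (toy values, not the question's data):

```python
import pandas as pd

df = pd.DataFrame({'Amount': [1324.0, 1238593.1], 'Tax': [0.08, 0.12]})

# Display-only: the option changes how *all* float columns print,
# but the underlying dtype stays float, so arithmetic still works.
pd.options.display.float_format = '${:,.2f}'.format
shown = df.to_string()
print(shown)                    # Amount prints as $1,324.00 etc.
print(df['Amount'].sum() * 2)   # calculations operate on the raw floats

pd.reset_option('display.float_format')  # undo when done
```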
I am reading a gzip file and converting it to a Dataframe by the below method
df = pd.read_csv('file.gz', compression='gzip', header=0, sep=',', quotechar='"', error_bad_lines=False)
This populates the first row as the column header. Since the data in the gzip varies every time, the column header also changes. There is also no fixed column count; it differs per file, as below.
File 1
01-10-2019 Samsung Owned
-----------------------------
01-10-2019 Samsung Owned
03-10-2019 Motorolla Sold
File 2
SAMSUNG Walmart DHL 300$ Sold Alaska
--------------------------------------------------
SAMSUNG Walmart DHL 300$ Sold Alaska
Sony Motorolla Fedex 250$ Sold Chicago
To do some data manipulation, it would help to have fixed column labels such as 1, 2, 3, based on the number of columns the dataframe has, like:
File 1
1 2 3
-----------------------------
01-10-2019 Samsung Owned
03-10-2019 Sony Sold
File 2
1 2 3 4 5 6
--------------------------------------------------
SAMSUNG Walmart DHL 300$ Sold Alaska
Sony Motorolla Fedex 250$ Sold Chicago
If I understood you correctly, you don't want to read the header from the csv file.
That can be done using header=None.
If your csv file contain a header that you want to ignore, then you can also add skiprows=1.
df = pd.read_csv('file.gz', compression='gzip', header=None, sep=',', quotechar='"', error_bad_lines=False)
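A quick sketch with a hypothetical in-memory sample standing in for one of the gzip files: with header=None, pandas assigns positional labels instead of taking the first data row as the header, and you can renumber them to start at 1 as in the desired output.

```python
import io
import pandas as pd

# Hypothetical sample data (the real files vary per the question)
data = io.StringIO(
    "01-10-2019,Samsung,Owned\n"
    "03-10-2019,Motorolla,Sold\n"
)
df = pd.read_csv(data, header=None, sep=',')
print(df.columns.tolist())   # positional labels: [0, 1, 2]

# If you prefer labels starting at 1, as in the desired output:
df.columns = range(1, len(df.columns) + 1)
print(df)
```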
I want to concatenate multiple rows into a single row. I managed to concatenate the rows; however, when I try to sum a specific column, I get the error TypeError: can only concatenate str (not "float") to str
Item Sum Brand Type User ID
ABC 5 High Zinc John 20A
CDD 3 Low Iron Bail 10B
ABC 10 High Zinc John 20A
CDD 200 Low Iron Bail 10B
Below is my code:
df = df.groupby(['ID','User','Type','Brand']).agg({'Item':''.join, 'Sum':'sum'}).reset_index()
Desired Output:
Item Sum Brand Type User ID
ABC 15 High Zinc John 20A
CDD 203 Low Iron Bail 10B
Thank You in advance!
df = df.pivot_table(index=['Brand', 'Type', 'User', 'ID'],values=['Sum'], columns=['Item'], aggfunc=sum).stack().reset_index()
Brand Type User ID Item Sum
0 High Zinc John 20A ABC 15.0
1 Low Iron Bail 10B CDD 203.0
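An alternative sketch, assuming the sample data above and that Item is constant within each group: keep the first Item per group instead of joining, so the string column never gets mixed into the numeric aggregation.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Item': ['ABC', 'CDD', 'ABC', 'CDD'],
    'Sum': [5, 3, 10, 200],
    'Brand': ['High', 'Low', 'High', 'Low'],
    'Type': ['Zinc', 'Iron', 'Zinc', 'Iron'],
    'User': ['John', 'Bail', 'John', 'Bail'],
    'ID': ['20A', '10B', '20A', '10B'],
})

# 'first' keeps one Item per group instead of concatenating copies,
# while 'Sum' is added up numerically.
out = (df.groupby(['ID', 'User', 'Type', 'Brand'], as_index=False)
         .agg({'Item': 'first', 'Sum': 'sum'}))
print(out)
```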
n = df1.groupby(['Year', 'State', 'Regulator', 'Industry','Product', 'Count']).sum() # <-- this produces the error
Problem description
Hi, I think there is a problem with groupby().sum() dropping/excluding data points. I ran the code above, which at first seemed fine, until I compared the results against the same data in Excel and a simple plot of the dataset. In addition, removing 'Count' throws off the values in other columns. Thanks for checking this out.
Expected Output
Year | 2012
State | Alabama
Regulator | SEC
Insurance/Annuity Products | 2
Stocks | 4
Year | 2012
State | Alabama
Regulator | FDIC
Debit Card | 1
Residential Mortgage | 3
Actual output:
Year | 2012
State | Alabama
Regulator | FDIC
Debit Card | 1
Residential Mortgage | 1
Problem solved. I ran the code both including and excluding the 'Count' column, which gave a mix of good and bad results. For some reason the CSV wasn't being read correctly, if that makes any sense: the 'Count' column should have been dtype int, but it was being read as strings. So I applied .apply(pd.to_numeric), removed 'Count' from the groupby keys, and re-ran the cell, which solved the issue.
Here's the final code for groupby/sum:
n = df1.groupby(['Year', 'State', 'Regulator', 'Industry','Product'])['Count'].sum()
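A minimal sketch of the diagnosis above, with toy data rather than the original CSV: when a numeric-looking column is read as strings, summing it concatenates instead of adding, and pd.to_numeric restores the intended behavior.

```python
import pandas as pd

# 'Count' arrives as strings, e.g. from a CSV read gone wrong.
df1 = pd.DataFrame({
    'State': ['Alabama', 'Alabama'],
    'Count': ['1', '3'],          # object dtype, not int
})
bad = df1.groupby('State')['Count'].sum()
print(bad)    # '13': string concatenation, not addition

# Convert, then aggregate as intended.
df1['Count'] = pd.to_numeric(df1['Count'])
good = df1.groupby('State')['Count'].sum()
print(good)   # 4
```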
I find myself struggling with data import for further nMDS and Bioenv analysis with "vegan" and "ggplot2". I have a data frame "Taxa" that looks like this (the values are placeholders; the columns are numeric):
head(Taxa)
X1 Station1 Stations1_2 Stations1_3 ...
Species1 123 456 789
Species2 123 456 789
Species3 123 456 789
...
After I transpose my data to have the stations (observations) as rows
Taxa <- t(Taxa)
X_1 Species1 Species2 Species3 ...
Station1 123 456 789
Species1_2 123 456 789
Species1_3 123 456 789
...
Now if I check how the data has been transposed I see that it has been converted into a "matrix"
class(Taxa)
[1] "matrix"
Now I can change again the matrix into a data frame
Taxa.df <- data.frame(Taxa)
And what I get then is the following:
head(Taxa.df)
X1 X2 X3
X_1 Species1 Species2 Species3 ...
Station1 123 456 789
Species1_2 123 456 789
Species1_3 123 456 789
...
Now I need the first row to become the column headers so that I can restore the initial structure:
colnames(Taxa.df)=Taxa.df[1,]
When I do this, the data frame ends up like this:
23 10 16 ....
X_1 Species1 Species2 Species3 ...
Station1 123 456 789
Species1_2 123 456 789
Species1_3 123 456 789
...
I can't manage to make the first row the header.
Without this I can't run the transformation I need, nor any of the stats analyses. I spent the whole day just trying to import the data from xlsx into RStudio for Mac and solve this issue. I hope you can help. I have already looked around a lot and found the two links below, which seemed like useful answers, but nothing solved my exact problem.
http://r.789695.n4.nabble.com/Transposing-Data-Frame-does-not-return-numeric-entries-td852889.html
Why does the transpose function change numeric to character in R?
The first variable in your data frame is X1, with values Species1 etc. You should read your data so that the first column becomes the row names (Species1, ...), which you can achieve with the argument row.names=1 in the read.* command. Alternatively, you can transpose only the numeric data and then label the rows and columns from the original data. The following may work:
mat <- t(Taxa[,-1]) # remove col 1
colnames(mat) <- rownames(Taxa)
mat <- as.data.frame(mat)
However, I think you have not posted the actual output of your R commands, but typed by hand what you think is essential to the structure. So your data may differ from what you display, and you may also have non-numeric rows. Just check sum(Taxa), which is a number if your data are all numeric; sum(Taxa[,-1]), which is a number if removing the first column is sufficient; and summary(Taxa), which gives Mean and Median for columns that are entirely numeric (including the first row).
I have the following table on SQL:
Category | Requests
Cat1 | 150
Cat2 | 200
Cat3 | 550
Cat4 | 100
Cat5 | 50
SUM | 1050
How can I create an expression to calculate the percentage of Cat5 relative to the total? (About 4.8% in this case.)
Try this:
=Lookup("Cat5",Fields!Category.Value,Fields!Requests.Value,"DataSetName")/
Sum(Fields!Requests.Value,"DataSetName")
Replace "DataSetName" by the actual name of your dataset.
Assuming you want 150 to represent 150% within the rdl, you can do the following:
First apply the formula =Fields!field.Value/100, where Fields!field.Value is the field you want to display as a percentage; so if your field is called Requests, you would use =Fields!Requests.Value/100.
Then change the format of the textbox to Percentage in the Textbox Properties, and the value will be displayed as a percentage.