I have a small issue trying to do a simple pivot with pandas. One column holds values that appear more than once, each time with a different value in a second column and a year in a third column. What I want is the sum of the second column per year, using the values of the first column as rows.
import pandas as pd

year = 2022
base = pd.read_csv("Database.csv")
raw_monthly = pd.read_csv("Monthly.csv")
raw_types = pd.read_csv("types.csv")
monthly = raw_monthly.assign(Year=year)
ty = raw_types[['cparty', 'sales']]
typ = ty.rename(columns={"Sales": "sales"})
type = typ.assign(Year=year)
fin = pd.concat([base, monthly, type])
fin.drop(fin.tail(1).index, inplace=True)
currentYear = fin.loc[fin['Year'] == 2022]
final = pd.pivot_table(currentYear, index=['cparty', 'sales'], values='sales', aggfunc='sum')
With the above, I am getting this result, but what I want is to have the 2 sales values of '3' for 2022 summed into a single value, so that later I can also break it down by year. Any help appreciated!
Edit: The issue seems to come from the fact that the 3 CSVs are concatenated into a single dataframe. Doing the 3->1 CSV conversion manually in Excel and then using the groupby answer works as intended, but it does not work if I combine the 3 CSVs into 1 automatically using:
fin = pd.concat([base, monthly, type])
The 3 csvs look like this.
Base looks like this:
   cparty       sales  year
0   50969  -146602.14  2016
1   51056  -104626.62  2016
2   51129  -101742.99  2016
3   51036   -81801.84  2016
4   51649   -35992.60  2016
monthly looks like this, missing the year
   cparty        sales
0  818243  -330,052.47
1   82827  -178,630.85
2  508637  -156,369.87
3   29253  -104,028.30
4  596037   -95,312.07
type is like this.
    cparty       sales
0   582454  -16,056.46
1   597321   24,336.16
2   567172   20,736.78
3   614070   18,590.45
4  5601295   -3,661.46
What I am attempting to do is add a new column to the last 2, with Year set to 2022, so that later I can do the groupby per year. When I try to concat the 3 CSVs, it breaks down.
Suppose cparty is a categorical metric
# build a sample DataFrame with year, cparty and sales columns
df = pd.DataFrame({
'year':[2022, 2022, 2018, 2019, 2020, 2021, 2022, 2022, 2022, 2021, 2019, 2018],
'cparty':['cparty1', 'cparty1', 'cparty1', 'cparty2', 'cparty2', 'cparty2', 'cparty2', 'cparty3', 'cparty4', 'cparty4', 'cparty4', 'cparty4'],
'sales':[230, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
})
df
###
    year   cparty  sales
0   2022  cparty1    230
1   2022  cparty1    100
2   2018  cparty1    200
3   2019  cparty2    300
4   2020  cparty2    400
5   2021  cparty2    500
6   2022  cparty2    600
7   2022  cparty3    700
8   2022  cparty4    800
9   2021  cparty4    900
10  2019  cparty4   1000
11  2018  cparty4   1100
output = df.groupby(['year','cparty']).sum()
output
###
              sales
year cparty
2018 cparty1    200
     cparty4   1100
2019 cparty2    300
     cparty4   1000
2020 cparty2    400
2021 cparty2    500
     cparty4    900
2022 cparty1    330
     cparty2    600
     cparty3    700
     cparty4    800
Filter by year
final = output.query('year == 2022')
final
###
              sales
year cparty
2022 cparty1    330
     cparty2    600
     cparty3    700
     cparty4    800
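Since the question asked for a pivot, the same totals can also be laid out with cparty as rows and one column per year. This is only a sketch of an alternative layout, reusing the sample data from this answer; the name `pivot` is an assumption and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'year': [2022, 2022, 2018, 2019, 2020, 2021, 2022, 2022, 2022, 2021, 2019, 2018],
    'cparty': ['cparty1', 'cparty1', 'cparty1', 'cparty2', 'cparty2', 'cparty2',
               'cparty2', 'cparty3', 'cparty4', 'cparty4', 'cparty4', 'cparty4'],
    'sales': [230, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100],
})

# One row per cparty, one column per year; missing combinations become 0.
pivot = pd.pivot_table(df, index='cparty', columns='year',
                       values='sales', aggfunc='sum', fill_value=0)
print(pivot)

# A single year is then just one column of the pivot.
print(pivot[2022])
```

Note that `sales` is used only as `values` here, not as part of `index`; putting it in the index keeps every distinct sales value on its own row.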
I have figured out the issue.
result = res.groupby(['Year', 'cparty']).sum()
output = result.query('Year == 2022')
output
##
                 sales
Year cparty
2022 3       -20409.04
     4        12064.34
     5         9656.64
     8081     51588.55
     8099      5625.22
...                ...
Baron's groupby method was the way to go. The issue is that it only works if I have all the data in 1 CSV from the beginning. I was trying to add the year manually for the 2 new CSVs that I concat to the base, setting Year = 2022. The errors come when I concat the 3 different CSVs. If I don't add Year = 2022, it works, giving this:
       cparty       sales    Year
87174       3       -3.89  2022.0
27          3  -20,405.15     NaN
If I do .fillna(2022), then it won't work as expected.
C:\Users\user\AppData\Local\Temp/ipykernel_14252/1015456002.py:32: FutureWarning: Dropping invalid columns in DataFrameGroupBy.add is deprecated. In a future version, a TypeError will be raised. Before calling .add, select only columns which should be valid for the function.
result = fin.groupby(['Year', 'cparty']).sum()
       cparty       sales    Year
87174       3       -3.89  2022.0
27          3  -20,405.15  2022.0
The year is added, but the two rows are not summed into 'cparty' 3, 'sales' -20,409.04, Year 2022.
Any feedback appreciated.
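One possible explanation, judging only from the samples shown above: the sales values in the monthly and types CSVs contain thousands separators (for example -20,405.15), so pandas may read them as strings, and after concat the sales column holds a mix of floats and strings that sum() cannot add up. The sketch below uses small hypothetical stand-ins for the CSVs (the rows other than cparty 3 are invented for illustration) and converts the column to numbers before grouping.

```python
import pandas as pd

# Hypothetical stand-ins for the CSVs; in 'monthly' the sales values are
# strings with thousands separators, as in the samples above.
base = pd.DataFrame({"cparty": [3, 4], "sales": [-3.89, 10.0], "Year": [2022, 2016]})
monthly = pd.DataFrame({"cparty": [3, 5], "sales": ["-20,405.15", "1,000.50"]})

monthly = monthly.assign(Year=2022)
fin = pd.concat([base, monthly], ignore_index=True)

# The concatenated column mixes floats and strings (object dtype).
# Strip the separators and convert so that sum() adds real numbers.
fin["sales"] = pd.to_numeric(
    fin["sales"].astype(str).str.replace(",", "", regex=False)
)

result = fin.groupby(["Year", "cparty"], as_index=False)["sales"].sum()
current = result[result["Year"] == 2022]
print(current)
```

If this is indeed the cause, passing thousands=',' to pd.read_csv when loading the two new files may be a simpler fix, since the values then arrive as numbers.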
Could you help me out with an issue I have in Oracle?
Let's say I have a table that tells me about how many items were sold in each month, and looks like so:
Item      January  February  March  April
Computer        3         5      2      9
TV             10        12     16     14
Camera         22        25     20     27
What I need in the output is a table that would count the total number of items sold over the period, and would look like this:
Item      January  February  March  April  Total
Computer        3         5      2      9     19
TV             10        12     16     14     52
Camera         22        25     20     27     94
I am honestly not sure how to do that. Should I use grouping()?
Thank you in advance.
You don't need grouping at all. Just add up all the month columns and expose the result as a new column, Total.
SELECT T.*,
(January + February + March + April) Total
FROM T
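To check the query against the sample data, here is a small sketch that runs it from Python. It uses an in-memory SQLite database as a stand-in for Oracle, which is an assumption made purely so the example is runnable; the table name T and the columns come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table T.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (Item TEXT, January INTEGER, February INTEGER, "
    "March INTEGER, April INTEGER)"
)
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?)",
    [("Computer", 3, 5, 2, 9), ("TV", 10, 12, 16, 14), ("Camera", 22, 25, 20, 27)],
)

# Same query as in the answer: add the month columns row by row.
rows = conn.execute(
    "SELECT T.*, (January + February + March + April) AS Total FROM T"
).fetchall()

totals = {row[0]: row[-1] for row in rows}
print(totals)
conn.close()
```

One caveat: if any month can be NULL, the whole sum becomes NULL for that row. Wrapping each column in COALESCE(col, 0), or NVL(col, 0) in Oracle, avoids that.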
I have a hard time figuring out how to compare the same period (e.g. ISO week 48) from different years for a certain metric in different columns. I am new to SQL and haven't fully understood how PARTITION BY works, but I guess that I'll need it for my desired output.
How can I sum the data from column "metric" and compare the same periods of different date ranges (e.g. YEAR) in a table?
current table
date        iso_week  iso_year  metric
2021-12-01  48        2021      1000
2021-11-30  48        2021      850
...
2020-11-28  48        2020      800
2020-11-27  48        2020      950
...
2019-11-27  48        2019      700
2019-11-26  48        2019      820
desired output
iso_week  metric_thisYear  metric_prevYear  metric_prev2Year
48        1850             1750             1520
...
Consider the simple approach below:
select * from (
select * except(date)
from your_table
)
pivot (sum(metric) as metric for iso_year in (2021, 2020, 2019))
Applied to the sample data in your question, this returns one row per iso_week, with the summed metric for each listed year in its own column.
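For readers who want to verify the reshaping on the sample rows without a database, the same idea can be sketched in pandas. This is an analogue of the pivot, not the SQL itself; the DataFrame name your_table mirrors the query, and the metric_ column prefix is an assumption chosen for readability.

```python
import pandas as pd

# The six sample rows shown in the question.
your_table = pd.DataFrame({
    "date": ["2021-12-01", "2021-11-30", "2020-11-28",
             "2020-11-27", "2019-11-27", "2019-11-26"],
    "iso_week": [48, 48, 48, 48, 48, 48],
    "iso_year": [2021, 2021, 2020, 2020, 2019, 2019],
    "metric": [1000, 850, 800, 950, 700, 820],
})

# Sum metric per iso_week, spreading iso_year across the columns.
pivoted = your_table.pivot_table(
    index="iso_week", columns="iso_year", values="metric", aggfunc="sum"
)

# Newest year first, with a readable column prefix.
pivoted = pivoted[[2021, 2020, 2019]].add_prefix("metric_")
print(pivoted)
```

The date column is simply left out of the pivot, which plays the same role as excluding it in the inner select.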
I need to sort partial dates. I have a table with the following columns.
Day   Month  Year
----  -----  ----
NULL  03     1990
26    10     1856
03    07     NULL
31    NULL   2018
NULL  NULL   NULL
I have a grid in which one of the columns is Date, where I combine the above three columns and display the dates.
Now I want to sort on this date column in the grid. The sort order of the dates should be as follows:
[blank date]
22 [day]
March
April 12
May
July 29
August
September
September 14
October
1948
October 1948
October 1 1948
July 1976
1977
July 1977
July 23 1977
December 1981
December 29 1981
I have tried various ways to achieve this. But I am not able to get the desired result. Following are some of the ways I have applied.
I have tried sorting by creating the stored procedure in which I am creating the whole date by combining 3 columns and converting them in standard date formats and comparing the values. I have also tried by creating the computed property in the model and sorting them accordingly.
How can I do this in SQL?
I think you could do:
order by coalesce(year, '0000'), coalesce(month, '00'), coalesce(day, '')
You can be more explicit, but this puts the NULL values before the other values in the column.
Note: COALESCE() is standard SQL, but the string defaults assume the three columns are stored as strings, so you might need to tweak the code for your database.
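The ordering rule behind that ORDER BY (year first, then month, then day, with missing parts sorting before present ones) can be sketched outside the database as well. The example below is an illustration in Python under the assumption that the parts are integers or None; PartialDate and sort_key are hypothetical names, not part of the original answer.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    day: Optional[int]
    month: Optional[int]
    year: Optional[int]


def sort_key(d: PartialDate) -> tuple:
    # A missing part is replaced by 0, which sorts before any real value,
    # mirroring coalesce(year, '0000'), coalesce(month, '00'), ...
    return (d.year or 0, d.month or 0, d.day or 0)


# The five rows from the question's table.
rows = [
    PartialDate(None, 3, 1990),
    PartialDate(26, 10, 1856),
    PartialDate(3, 7, None),
    PartialDate(31, None, 2018),
    PartialDate(None, None, None),
]

ordered = sorted(rows, key=sort_key)
for d in ordered:
    print(d)
```

The fully blank date comes first, then dates without a year, then dated rows in chronological order, which matches the order requested in the question.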
On my dataset I select information from four different years sorted by date and how many subscriptions I had on said date, which looks something like this:
Date        Year  Subs  Day
15/09/2014  2015  57    1
16/09/2014  2015  18    2
17/09/2014  2015  16    3
14/09/2015  2016  10    1
15/09/2015  2016  45    2
16/09/2015  2016  28    3
12/09/2016  2017  32    1
13/09/2016  2017  11    2
14/09/2016  2017  68    3
24/08/2017  2018  23    1
25/08/2017  2018  53    2
26/08/2017  2018  13    3
What I'm trying to do is create a 'Year' Column Group to align them horizontally, but when I do that, this is the result:
result
Expected result:
expected result
Is this achievable in SSRS? I've tried removing the group =(Details), which gives me the desired result, except it only returns one line of information.
Any insight appreciated.
By default, the Details group causes you to get one row per row in the dataset. In your case, I would suggest grouping the rows by the Day column and creating a column group by Year.
First, create the two groups and add columns inside the column group.
Then, add a row outside and above the Day row group. Place the headings here and then delete the top row. It should look like this:
Now these 4 columns will repeat to the right for each year and you will get rows based on the number of days in your dataset.
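To make the target shape concrete (one row per Day, one block of columns per Year), here is a rough pandas sketch of the same reshaping applied to the sample data. It only illustrates the layout that the row group and column group produce; it is not SSRS code, and the DataFrame names are assumptions.

```python
import pandas as pd

# Sample data from the question (Year, Day and Subs columns only).
data = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2016, 2016, 2016,
             2017, 2017, 2017, 2018, 2018, 2018],
    "Day":  [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    "Subs": [57, 18, 16, 10, 45, 28, 32, 11, 68, 23, 53, 13],
})

# Rows grouped by Day, columns grouped by Year: the same layout as a
# matrix with a Day row group and a Year column group.
matrix = data.pivot(index="Day", columns="Year", values="Subs")
print(matrix)
```

Each Day appears once, and the values for the four years sit side by side, which is the alignment the question is after.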