Replacing Pandas Dataframe Header with Date column but in ascending order - pandas

I have a DataFrame of 5k+ rows that looks like this. It has a Date column in Month/Year format, stored as strings.
Name Date Friends
A June 2017 100
A April 2017 45
A March 2016 180
B June 2017 43
B April 2017 23
B March 2016 23
C June 2017 64
C April 2017 643
C March 2016 344
I want to reshape it as follows, so that the unique values of the Date column become the headers, in ascending order by month and year.
Name March 2016 April 2017 June 2017
A 180 45 100
B 23 23 43
C 344 643 64
I tried using the pandas pivot function.
df=df.pivot(index='Name',columns='Date',values='Friends')
But this sorts the month/year headers alphabetically instead of chronologically. Pivot also leaves the DataFrame in a stacked format.
Any ideas on how to achieve the desired format?

Something like this,
df['Date']=pd.to_datetime(df['Date'])
df=df.sort_values(['Date'], ascending=True)  # ascending, as the question asks
df.groupby(['Name', 'Date'], sort=False)['Friends'].sum().unstack('Date')
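If the original labels such as "March 2016" should stay as the headers, one option is to pivot on the strings and then reorder the columns by their parsed dates. This is a minimal sketch, assuming every label parses with the format %B %Y; the sample frame is rebuilt from the question and the name wide is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Date": ["June 2017", "April 2017", "March 2016"] * 3,
    "Friends": [100, 45, 180, 43, 23, 23, 64, 643, 344],
})

# Pivot on the string labels; the columns come out in alphabetical order.
wide = df.pivot(index="Name", columns="Date", values="Friends")

# Reorder the columns chronologically while keeping the original labels.
ordered = sorted(wide.columns, key=lambda label: pd.to_datetime(label, format="%B %Y"))
wide = wide[ordered]
print(wide)
```

Sorting the labels rather than converting the column keeps the headers readable, at the cost of parsing each label once more.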

Related

Pivot table with Pandas

I have a small issue trying to do a simple pivot with pandas. One column has values that appear more than once, each time with a different value in a second column and a year in a third column. What I want is the sum of the second column per year, using the values of the first column as rows.
import pandas as pd
year = 2022
base = pd.read_csv("Database.csv")
raw_monthly = pd.read_csv("Monthly.csv")
raw_types = pd.read_csv("types.csv")
monthly = raw_monthly.assign(Year= year)
ty= raw_types[['cparty', 'sales']]
typ= ty.rename(columns={"Sales": "sales"})
type= typ.assign(Year=year)
fin = pd.concat([base, monthly, type])
fin.drop(fin.tail(1).index,inplace=True)
currentYear = fin.loc[fin['Year'] == 2022]
final = pd.pivot_table(currentYear, index=['cparty', 'sales'], values='sales', aggfunc='sum')
With the above, I am getting this result, but what I want is to have the 2 sales values of '3' for 2022 summed into a single value, so that later I can also break it down by year. Any help appreciated!
Edit: The issue seems to come from the fact that the 3 CSVs are concatenated into a single DataFrame. Doing the 3->1 CSV conversion manually in Excel and then using the groupby answer works as intended, but it does not work if I try to automatically combine the 3 CSVs into 1 using
fin = pd.concat([base, monthly, type])
The 3 csvs look like this.
Base looks like this:
cparty sales year
0 50969 -146602.14 2016
1 51056 -104626.62 2016
2 51129 -101742.99 2016
3 51036 -81801.84 2016
4 51649 -35992.60 2016
monthly looks like this, missing the year
cparty sales
0 818243 -330,052.47
1 82827 -178,630.85
2 508637 -156,369.87
3 29253 -104,028.30
4 596037 -95,312.07
type is like this.
cparty sales
0 582454 -16,056.46
1 597321 24,336.16
2 567172 20,736.78
3 614070 18,590.45
4 5601295 -3,661.46
What I am attempting to do is add a new column to the last 2 with the Year set to 2022, so that later I can do the groupby per year. When I try to concat the 3 CSVs, it breaks down.
Suppose cparty is a categorical metric:
# create a sample DataFrame with year, cparty, and sales
df = pd.DataFrame({
'year':[2022, 2022, 2018, 2019, 2020, 2021, 2022, 2022, 2022, 2021, 2019, 2018],
'cparty':['cparty1', 'cparty1', 'cparty1', 'cparty2', 'cparty2', 'cparty2', 'cparty2', 'cparty3', 'cparty4', 'cparty4', 'cparty4', 'cparty4'],
'sales':[230, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
})
df
###
year cparty sales
0 2022 cparty1 230
1 2022 cparty1 100
2 2018 cparty1 200
3 2019 cparty2 300
4 2020 cparty2 400
5 2021 cparty2 500
6 2022 cparty2 600
7 2022 cparty3 700
8 2022 cparty4 800
9 2021 cparty4 900
10 2019 cparty4 1000
11 2018 cparty4 1100
output = df.groupby(['year','cparty']).sum()
output
###
sales
year cparty
2018 cparty1 200
cparty4 1100
2019 cparty2 300
cparty4 1000
2020 cparty2 400
2021 cparty2 500
cparty4 900
2022 cparty1 330
cparty2 600
cparty3 700
cparty4 800
Filter by year
final = output.query('year == 2022')
final
###
sales
year cparty
2022 cparty1 330
cparty2 600
cparty3 700
cparty4 800
I have figured out the issue.
result = res.groupby(['Year', 'cparty']).sum()
output = result.query('Year == 2022')
output
##
sales
Year cparty
2022 3 -20409.04
4 12064.34
5 9656.64
8081 51588.55
8099 5625.22
... ...
Baron's groupby method was the way to go. The issue is that it only works if I have all the data in one CSV from the beginning. I was trying to add the year manually to the 2 new CSVs that I concat to the base, setting Year = 2022. The errors come when I concat the 3 different CSVs. If I don't add Year = 2022, it works, giving this:
cparty sales Year
87174 3 -3.89 2022.0
27 3 -20,405.15 NaN
If I do .fillna(2022), it does not work as expected:
C:\Users\user\AppData\Local\Temp/ipykernel_14252/1015456002.py:32: FutureWarning: Dropping invalid columns in DataFrameGroupBy.add is deprecated. In a future version, a TypeError will be raised. Before calling .add, select only columns which should be valid for the function.
result = fin.groupby(['Year', 'cparty']).sum()
cparty sales Year
87174 3 -3.89 2022.0
27 3 -20,405.15 2022.0
This adds the year, but the two rows are still not summed into a single row with 'cparty' 3, 'sales' -20,409.04, Year 2022.
Any feedback appreciated.
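Judging by the sample data, a likely cause is that the monthly and types files write sales with thousands separators (for example -330,052.47), so pandas reads that column as text and the sum cannot combine it with the numeric values from the base file. This is an assumption, since the actual files are not shown. A minimal sketch with small made-up in-memory CSVs standing in for those files, using the thousands parameter of pd.read_csv:

```python
import io
import pandas as pd

# Hypothetical stand-ins for Database.csv and Monthly.csv.
base_csv = "cparty,sales,Year\n3,-3.89,2022\n"
monthly_csv = 'cparty,sales\n3,"-20,405.15"\n'

# Without thousands=",", the sales column is read as text, not numbers.
raw = pd.read_csv(io.StringIO(monthly_csv))

# With thousands=",", the same column is parsed as a float.
base = pd.read_csv(io.StringIO(base_csv))
monthly = pd.read_csv(io.StringIO(monthly_csv), thousands=",").assign(Year=2022)

# Concatenate and sum per year and cparty.
fin = pd.concat([base, monthly], ignore_index=True)
result = fin.groupby(["Year", "cparty"], as_index=False)["sales"].sum()
print(result)
```

If the files cannot be re-read, stripping the commas with str.replace and converting with pd.to_numeric on the existing column should have the same effect.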

Total of Various Columns as a Separate Column in Oracle

Could you help me out with an issue I have in Oracle?
Let's say I have a table that tells me about how many items were sold in each month, and looks like so:
Item January February March April
Computer 3 5 2 9
TV 10 12 16 14
Camera 22 25 20 27
What I need in the output is a table that would count the total number of items sold over the period, and would look like this:
Item January February March April Total
Computer 3 5 2 9 19
TV 10 12 16 14 52
Camera 22 25 20 27 94
I am honestly not sure how to do that. Should I use grouping()?
Thank you in advance.
You don't need grouping at all. Just add up all the month columns as a new column, Total.
SELECT T.*,
(January + February + March + April) Total
FROM T

How to compare same period of different date ranges in columns in BigQuery standard SQL

I have a hard time figuring out how to compare the same period (e.g. ISO week 48) from different years for a certain metric in different columns. I am new to SQL and haven't fully understood how PARTITION BY works, but I guess that I'll need it for my desired output.
How can I sum the data from the column "metric" and compare the same periods of different date ranges (e.g. YEAR) in a table?
current table
date iso_week iso_year metric
2021-12-01 48 2021 1000
2021-11-30 48 2021 850
...
2020-11-28 48 2020 800
2020-11-27 48 2020 950
...
2019-11-27 48 2019 700
2019-11-26 48 2019 820
desired output
iso_week metric_thisYear metric_prevYear metric_prev2Year
48 1850 1750 1520
...
Consider the simple approach below:
select * from (
select * except(date)
from your_table
)
pivot (sum(metric) as metric for iso_year in (2021, 2020, 2019))
Applied to the sample data in your question, the output has one row per iso_week with the columns metric_2021, metric_2020 and metric_2019.
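To check the arithmetic outside BigQuery, the same reshaping can be sketched in pandas. This is only an illustration, assuming the six week-48 rows shown in the question are the whole input; the frame names rows and wide are made up.

```python
import pandas as pd

# The six sample rows from the question (date column left out, as in the query).
rows = pd.DataFrame({
    "iso_week": [48, 48, 48, 48, 48, 48],
    "iso_year": [2021, 2021, 2020, 2020, 2019, 2019],
    "metric": [1000, 850, 800, 950, 700, 820],
})

# Sum metric per iso_week, with one column per iso_year.
wide = rows.pivot_table(index="iso_week", columns="iso_year",
                        values="metric", aggfunc="sum")

# Order the years newest first and name the columns like the pivot output.
wide = wide[[2021, 2020, 2019]].add_prefix("metric_")
print(wide)
```

The three sums match the desired output in the question: 1850, 1750 and 1520.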

How to manage complex sorting in SQL?

I need to have date sorting with the partial dates. I have a table with the following columns.
Day Month Year
-- ---- -----
NULL 03 1990
26 10 1856
03 07 Null
31 NULL 2018
NULL NULL NULL
I have a grid in which one of the columns is Date, where I combine the above three columns and display the dates.
Now I want to sort on this date column in the grid. The sort order of the dates should be as follows:
[blank date]
22 [day]
March
April 12
May
July 29
August
September
September 14
October
1948
October 1948
October 1 1948
July 1976
1977
July 1977
July 23 1977
December 1981
December 29 1981
I have tried various ways to achieve this, but I am not able to get the desired result. These are some of the approaches I have tried.
I tried sorting in a stored procedure that builds the whole date by combining the 3 columns, converts it to a standard date format and compares the values. I also tried creating a computed property in the model and sorting on that.
How can I do this in SQL?
I think you could do:
order by coalesce(year, '0000'), coalesce(month, '00'), coalesce(day, '')
You can be more explicit, but this puts the NULL values before the other values in the column.
Note: This assumes the three columns are stored as strings, as the sample data suggests. If they are numeric in your database, use numeric defaults such as 0 instead.
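The idea behind that ORDER BY is a three-part sort key (year, month, day) in which a missing part sorts before any real value. A small Python sketch of the same key, using the five sample rows from the question and assuming the parts are integers or None (the function name sort_key is illustrative):

```python
# Rows are (day, month, year); None stands for NULL.
rows = [
    (None, 3, 1990),
    (26, 10, 1856),
    (3, 7, None),
    (31, None, 2018),
    (None, None, None),
]

def sort_key(row):
    """Order by year, then month, then day, with missing parts first."""
    day, month, year = row
    # A missing part becomes 0, which is lower than any real year, month or day.
    return (year or 0, month or 0, day or 0)

ordered = sorted(rows, key=sort_key)
for row in ordered:
    print(row)
```

The fully blank row comes first, then the row with no year, then the remaining rows by year.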

SSRS: Horizontal alignment on a group

On my dataset I select information from four different years sorted by date and how many subscriptions I had on said date, which looks something like this:
Date Year Subs Day
15/09/2014 2015 57 1
16/09/2014 2015 18 2
17/09/2014 2015 16 3
14/09/2015 2016 10 1
15/09/2015 2016 45 2
16/09/2015 2016 28 3
12/09/2016 2017 32 1
13/09/2016 2017 11 2
14/09/2016 2017 68 3
24/08/2017 2018 23 1
25/08/2017 2018 53 2
26/08/2017 2018 13 3
What I'm trying to do is create a 'Year' column group to align them horizontally, but when I do that, this is the result:
result
Expected result:
expected result
Is this achievable in SSRS? I've tried removing the group =(Details), which gives me the desired result, except it only returns one line of information.
Any insight appreciated.
By default, the Details group gives you one row in the report per row in the dataset. In your case, I would suggest grouping the rows by the Day column and creating a column group by Year.
First, create the two groups and add columns inside the column group.
Then, add a row outside and above the Day row group. Place the headings here and then delete the top row. It should look like this:
Now these 4 columns will repeat to the right for each year and you will get rows based on the number of days in your dataset.