pivoting and multi-indexing - pandas

How do I transform a pandas dataframe that is in the following format:
Index Code Year Week Count
0 AE 2005 1 0
1 AE 2005 2 0
2 AE 2005 3 2
3 AE 2005 4 0
4 AE 2005 5 0
.....
51 AE 2005 52 1
52 AE 2006 1 3
53 AE 2006 2 0
54 AE 2006 3 1
55 AE 2006 4 0
56 AE 2006 5 0
.....
102 AE 2006 52 1
103 AU 2005 1 0
104 AU 2005 2 0
105 AU 2005 3 2
106 AU 2005 4 0
107 AU 2005 5 0
.....
153 AU 2005 52 1
154 AU 2006 1 3
155 AU 2006 2 0
156 AU 2006 3 1
157 AU 2006 4 0
158 AU 2006 5 0
.....
203 AU 2006 52 1
There are multiple codes, multiple years, 52 weeks entries for each year and a count value for each week
The required format:
Year 2005 2006
Week 1 2 3 ... 52 1 2 3 ...
Code
AE 0 0 2 ... 0 0 1 2 ...
AU
...
ZC
I have tried looking up different solutions here as well as have tried pivot, pivot_table, combinations of stack and unstack, but haven't been able to workout the solution :(.

You can do with set_index and stack
df.set_index(['Code','Year','Week']).Count.unstack([1,2])
Year 2005
Week 1 2 3 4 5
Code
AE 0 0 2 0 0

Related

Pandas - creating new column based on data from other records

I have a pandas dataframe which has the folowing columns -
Day, Month, Year, City, Temperature.
I would like to have a new column that has the average (mean) temperature in same date (day\month) of all previous years.
Can someone please assist?
Thanks :-)
Try:
dti = pd.date_range('2000-1-1', '2021-12-1', freq='D')
temp = np.random.randint(10, 20, len(dti))
df = pd.DataFrame({'Day': dti.day, 'Month': dti.month, 'Year': dti.year,
'City': 'Nice', 'Temperature': temp})
out = df.set_index('Year').groupby(['City', 'Month', 'Day']) \
.expanding()['Temperature'].mean().reset_index()
Output:
>>> out
Day Month Year City Temperature
0 1 1 2000 Nice 12.000000
1 1 1 2001 Nice 12.000000
2 1 1 2002 Nice 11.333333
3 1 1 2003 Nice 12.250000
4 1 1 2004 Nice 11.800000
... ... ... ... ... ...
8001 31 12 2016 Nice 15.647059
8002 31 12 2017 Nice 15.555556
8003 31 12 2018 Nice 15.631579
8004 31 12 2019 Nice 15.750000
8005 31 12 2020 Nice 15.666667
[8006 rows x 5 columns]
Focus on 1st January of the dataset:
>>> df[df['Day'].eq(1) & df['Month'].eq(1)]
Day Month Year City Temperature # Mean
0 1 1 2000 Nice 12 # 12
366 1 1 2001 Nice 12 # 12
731 1 1 2002 Nice 10 # 11.33
1096 1 1 2003 Nice 15 # 12.25
1461 1 1 2004 Nice 10 # 11.80
1827 1 1 2005 Nice 12 # and so on
2192 1 1 2006 Nice 17
2557 1 1 2007 Nice 16
2922 1 1 2008 Nice 19
3288 1 1 2009 Nice 12
3653 1 1 2010 Nice 10
4018 1 1 2011 Nice 16
4383 1 1 2012 Nice 13
4749 1 1 2013 Nice 15
5114 1 1 2014 Nice 14
5479 1 1 2015 Nice 13
5844 1 1 2016 Nice 15
6210 1 1 2017 Nice 13
6575 1 1 2018 Nice 15
6940 1 1 2019 Nice 18
7305 1 1 2020 Nice 11
7671 1 1 2021 Nice 14

How to create one variable conditional on other variables in R

I have a very big data-frame like the following:
Region year prate
1 2005 24
1 2006 17
1 2007 56
2 2005 13
2 2006 65
2 2007 43
3 2005 91
3 2006 65
3 2007 12
.....
I want to create a new variable called prate07 in which the variable for year 2007 is the value of prate in that year and the value for other years is 0. Something like the following:
Region year prate prate07
1 2005 24 0
1 2006 17 0
1 2007 56 56
2 2005 13 0
2 2006 65 0
2 2007 43 43
3 2005 91 0
3 2006 65 0
3 2007 12 12
.....
May someone please help me to find the code for it?
Thanks for the help in advance
I used the following code, but it does not work:
library(tidyverse)
dat2 <- dat %>%
mutate(group2 = str_c("p_rate", year), prate07 = prate) %>%
spread(group2, prate07, fill = 0)

Oracle - Group By Creating Duplicate Rows

I have a query that looks like this:
select nvl(trim(a.code), 'Blanks') as Ward, count(b.apcasekey) as UNSP, count(c.apcasekey) as GRAPH,
count(d.apcasekey) as "ANI/PIG",
(count(b.apcasekey) + count(c.apcasekey) + count(d.apcasekey)) as "TOTAL ACTIVE",
count(a.apcasekey) as "TOTAL OPEN" from (etc...)
group by a.code
order by Ward
The reason I have nvl(trim(a.code), 'Blanks') as Ward is that sometimes a.code is a blank string, sometimes it's a null.
The problem is that when I use the Group By statement, I can't use Ward or I get the error
Ward: Invalid Identifier
I can only use a.code so I get 2 rows for 'Blanks', as per below
1 Blanks 7 0 0 7 7
2 Blanks 23 1 1 25 30
3 W01 75 4 0 79 91
4 W02 62 1 0 63 72
5 W03 140 2 0 142 162
6 W04 6 1 0 7 7
7 W05 46 0 1 47 48
8 W06 322 46 1 369 425
9 W07 91 0 1 92 108
10 W08 93 2 0 95 104
11 W09 28 1 0 29 30
12 W10 25 0 0 25 28
What I need, is for the row with 'Blanks' to combined into 1 row. Little help?
Thanks.
You can not use the alias in the GROUP BY, but you can use the expression that builds the value:
GROUP BY nvl(trim(a.code), 'Blanks')

Creating Pivots based for a rowcount in SQL

I have 2 tables T1 and T2.
T1 has the following columns:
Id, EntityId, TypeofValue, Year, value
EntityId can have 7 values:
1,2,3,4,5,100,101
Typeofvalue can have 2 values
1 indicates Actual
2 indicates Target
T2 has the following columns:
NoOfRecordsToDisplay
I need to fetch the number of records (if existing) for Target corresponding to an Id. The record count (1 if any record is present 0 if none) for Target needs to be grouped into two categories: First having Entityid 1 and Maximum records under Second group with entityids 2,3,4,100,101 [Not 5]
However, the catches are:
Sometimes Target value might not be present for a year
I need to get only last records for targets on the basis of NoOfRecordsToDisplay (The number of records to display comes from T2) for actual
Example1:
NoOfRecordsToDisplay =3, ID =123
The data below should return
CountGroup1: 1, Countgroup2: 1
as Entityid 1 has least one value for target for last 3 years -2015, 2014,2013 in this case
as Entityid 2 or 3 has at least 1 value for years -2015, 2014
Id EntityId TypeofValue Year Value
123 1 1 2015 55
123 1 1 2014 56
123 1 1 2013 57
123 1 1 2012 58
123 1 2 2015 50
123 1 2 2014 50
123 1 2 2013 50
123 1 2 2012 50
123 2 1 2015 55
123 2 1 2014 56
123 3 1 2015 57
123 3 1 2014 58
123 2 2 2015 55
123 2 2 2014 56
123 3 2 2015 57
124 1 1 2015 55
124 1 1 2014 56
124 2 1 2013 57
124 2 1 2012 58
124 1 2 2015 50
124 1 2 2014 50
124 2 2 2013 50
124 2 2 2012 50
Another dataset
NoOfRecordsToDisplay =3, ID =123
The data below should return:
CountGroup1: 0, Countgroup2: 1
as Entityid 1 has no target value for last 3 years (entityid 1 has a target value but for 2012)
as Entityid 2 has at one value for target years -2015 (entityid 3 has a target value but for 2010)
Id EntityId TypeofValue Year Value
123 1 1 2015 55
123 1 1 2014 56
123 1 1 2013 57
123 1 1 2012 58
123 1 2 2012 58
123 2 1 2015 55
123 2 1 2014 56
123 2 2 2015 55
123 2 2 2011 56
123 3 2 2010 57
Thank you so much for your help.
I have been trying to find this solution for a long time, I am not sure if Pivot will help
The question is different from the other one that I had posted as I am trying to create a group count based on entity groups.

SQL terminology to combine a NOT EXIST query with latest value

I am a beginner with basic knowledge.
I have a single table that I am trying to pull all UID's that have not had a particular code in the table within the past year.
My table looks like this: (but much larger of course)
FACID DPID EID DID UID DT Code Units Charge ET Ord
1 1 6 2 1002 15-Mar-07 99204 1 180 09:36.7 1
1 1 7 5 10004 15-Mar-07 99213 1 68 02:36.9 1
1 1 24 55 25887 15-Mar-07 99213 1 68 43:55.3 1
1 1 25 2 355688 15-Mar-07 99213 1 68 53:20.2 1
1 1 26 5 555654 15-Mar-07 99213 1 68 42:22.6 1
1 1 27 44 135514 15-Mar-07 99213 1 68 00:36.8 1
1 1 28 2 3244522 15-Mar-07 99214 1 98 34:59.4 1
1 1 29 5 235445 15-Mar-07 99213 1 68 56:42.1 1
1 1 30 3 3214444 15-Mar-07 99213 1 68 54:56.5 1
1 1 33 1 221444 15-Mar-07 99204 1 180 37:44.5 1
I am attempting to use the following, but this is not working for my time frame limits.
select distinct UID from PtProcTbl
where DT<'20120101'
and NOT EXISTS (Select Distinct UID
where Code in ('99203','99204','99205','99213',
'99214','99215','99244','99245'))
I need to know how to make sure the UID's that I am pulling are the ones don't have a DT after the 1/1/2012 cut off date that contains one of the NOT Exists codes.
The above query returned UID's that actually dates after 1/1/2012 that does contain one of the above codes...
Not sure what I am doing wrong or if I am totally off base on this..
Thanks in advance.
Are you sure you need the NOT EXISTS? How about instead:
AND Code NOT IN ('99203','99204','99205','99213','99214','99215','99244','99245')