pandas pivot_table SQL equivalent

Can we mimic the pandas function pivot_table in SQL (preferably PostgreSQL)?
For example, let's say we have a table with the following 3 columns:
Name Day Value
John Sunday 6
John Monday 3
John Tuesday 2
Mary Sunday 6
Mary Monday 4
Mary Tuesday 7
Alex Tuesday 1
I want to pivot the table so that the index is the name, the columns are the days, and cells are the values:
names Monday Sunday Tuesday
John 3 6 2
Mary 4 6 7
Alex null null 1
Part of the example was taken from the question "Transform a 3-column dataframe into a matrix".
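A common way to get this shape in plain SQL is conditional aggregation: GROUP BY the index column and build one output column per day with `MAX(CASE WHEN day = '...' THEN value END)`. The sketch below runs that query against an in-memory SQLite table so it can be executed as-is; the table name `t` and the lowercase column names are assumptions. The same SELECT should also work in PostgreSQL, where `MAX(value) FILTER (WHERE day = 'Monday')` is an equivalent spelling. Unlike pivot_table, SQL needs the output columns listed up front; PostgreSQL's `tablefunc` extension offers `crosstab()`, but it too requires declaring the result columns.

```python
import sqlite3

# Sample rows from the question: (name, day, value)
ROWS = [
    ("John", "Sunday", 6), ("John", "Monday", 3), ("John", "Tuesday", 2),
    ("Mary", "Sunday", 6), ("Mary", "Monday", 4), ("Mary", "Tuesday", 7),
    ("Alex", "Tuesday", 1),
]

# Conditional aggregation: GROUP BY the index column and build one output
# column per day. A name with no row for a day gets NULL in that column.
PIVOT_SQL = """
SELECT name,
       MAX(CASE WHEN day = 'Monday'  THEN value END) AS monday,
       MAX(CASE WHEN day = 'Sunday'  THEN value END) AS sunday,
       MAX(CASE WHEN day = 'Tuesday' THEN value END) AS tuesday
FROM t
GROUP BY name
"""


def pivot(rows):
    """Return {name: (monday, sunday, tuesday)} computed in an in-memory SQLite DB."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (name TEXT, day TEXT, value INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return {name: (mon, sun, tue) for name, mon, sun, tue in con.execute(PIVOT_SQL)}
    finally:
        con.close()


if __name__ == "__main__":
    for name, cols in pivot(ROWS).items():
        print(name, *cols)
```

MAX is used because each (name, day) pair occurs once here. pivot_table's default aggfunc is the mean, so AVG would be the closer match when a pair can repeat.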

Related

Sum of field in a consecutive period based on a condition

I did this in Python without a complicated query, but I'm looking for a way to do it with the Django ORM.
I have a table as follows:
user date point
Mary 2022/01/04 13
John 2022/01/04 10
Mary 2022/01/03 0
John 2022/01/03 5
Mary 2022/01/01 1
John 2022/01/01 1
Mary 2021/12/31 5
I want to calculate the Sum of points from now() to the date when the point value is greater than one.
Desired Output:
user sum
Mary 14 (13+1)
John 10 (10)
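The rule is not fully spelled out, so the sketch below assumes one interpretation that reproduces the desired output: starting from each user's most recent entry, keep adding older entries until reaching the next one whose point is greater than 1, and exclude that one (Mary: 13 + 0 + 1, stopping at 5; John: 10, stopping at 5). It uses window functions in SQLite; the table name `points` is hypothetical. Django exposes window functions through `django.db.models.Window`, or the SQL could be run via `Manager.raw()`; that translation is not shown here.

```python
import sqlite3

# Sample rows from the question: (user, date, point). Dates are YYYY/MM/DD
# text, which sorts chronologically as a string.
ROWS = [
    ("Mary", "2022/01/04", 13), ("John", "2022/01/04", 10),
    ("Mary", "2022/01/03", 0), ("John", "2022/01/03", 5),
    ("Mary", "2022/01/01", 1), ("John", "2022/01/01", 1),
    ("Mary", "2021/12/31", 5),
]

STREAK_SQL = """
WITH ranked AS (
    -- number each user's rows from newest (rn = 1) to oldest
    SELECT "user", point,
           ROW_NUMBER() OVER (PARTITION BY "user" ORDER BY date DESC) AS rn
    FROM points
),
flagged AS (
    -- running count of older rows (rn > 1) whose point is greater than 1
    SELECT "user", point,
           SUM(CASE WHEN rn > 1 AND point > 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY "user" ORDER BY rn
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS breaks
    FROM ranked
)
-- keep only the rows before the first break and add them up
SELECT "user", SUM(point) FROM flagged WHERE breaks = 0 GROUP BY "user"
"""


def streak_sums(rows):
    """Return {user: sum} under the assumed rule, using an in-memory SQLite DB."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE points ("user" TEXT, date TEXT, point INTEGER)')
        con.executemany("INSERT INTO points VALUES (?, ?, ?)", rows)
        return dict(con.execute(STREAK_SQL).fetchall())
    finally:
        con.close()


if __name__ == "__main__":
    print(streak_sums(ROWS))
```

`"user"` is quoted because USER is a reserved word in PostgreSQL.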

Function to get rolling average with lowest 2 values eliminated?

This is my sample data; the Current_rating column is my desired output.
Date Name Subject Importance Location Time Rating Current_rating
12/08/2020 David Work 1 London - - 4
1/08/2020 David Work 3 London 23.50 4 3.66
2/10/2019 David Emails 3 New York 18.20 3 4.33
2/08/2019 David Emails 3 Paris 18.58 4 4
11/07/2019 David Work 1 London - 3 4
1/06/2019 David Work 3 London 23.50 4 4
2/04/2019 David Emails 3 New York 18.20 3 5
2/03/2019 David Emails 3 Paris 18.58 5 -
12/08/2020 George Updates 2 New York - - 2
1/08/2019 George New Appointments 5 London 55.10 2 -
I need a function to compute the values in the Current_rating column. For each name, Current_rating takes the previous 5 results from the Rating column, eliminates the lowest 2, and averages the remaining 3. To get the right 5 previous results, the rows need to be sorted by date. Some names may not have 5 results: with 3 or fewer, I just need the average of those results; with 4, I need to eliminate the lowest value and average the remaining 3. Is this possible? Thanks for your time in advance.

What a pain! I think the simplest method might be to use arrays and then unnest() and aggregate:
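-- Assumes a table t(name, date, rating) where date is a real date type,
-- so that ORDER BY date is chronological
select t.*, r.current_rating
from (select t.*,
             -- the 5 ratings before the current row (the current row is excluded)
             array_agg(rating) over (partition by name
                                     order by date
                                     rows between 5 preceding and 1 preceding) as rating_5
      from t
     ) t cross join lateral
     (select avg(top3.r) as current_rating
      from (select u.r
            from unnest(t.rating_5) with ordinality u(r, n)
            where u.r is not null
            order by u.r desc   -- keep the 3 highest, i.e. drop the lowest 2
            limit 3
           ) top3
     ) r;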
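To check the query's output, the same rule can be written as a small pure-Python reference: for each row, take up to 5 earlier rows, ignore missing ratings, keep the 3 highest, and average them. This is a sketch for verification, not a replacement for the SQL; the function name and the `window`/`keep` parameters are hypothetical. It rounds to two decimals, while the question's 3.66 appears to be truncated.

```python
from statistics import mean


def current_ratings(ratings, window=5, keep=3):
    """For each position, average the `keep` highest of the previous `window` ratings.

    `ratings` holds one name's ratings in ascending date order; None marks a
    missing rating. The current row is excluded, mirroring the SQL frame
    ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING.
    """
    result = []
    for i in range(len(ratings)):
        previous = [r for r in ratings[max(0, i - window):i] if r is not None]
        best = sorted(previous, reverse=True)[:keep]  # drop the lowest values
        result.append(round(mean(best), 2) if best else None)
    return result


# David's ratings from the question, oldest (2/03/2019) to newest (12/08/2020)
DAVID = [5, 3, 4, 3, 4, 3, 4, None]
# George's ratings, oldest (1/08/2019) to newest (12/08/2020)
GEORGE = [2, None]

if __name__ == "__main__":
    print(current_ratings(DAVID))
    print(current_ratings(GEORGE))
```

Read newest-first, the David results match the Current_rating column in the question (4, 3.66, 4.33, 4, 4, 4, 5, -).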

How to number the occurrences of a particular value in multiple cells

I have a table that represents the purchases of a list of customers by date. The data is sorted in order by customer, and purchase date.
I need to place the total number of orders a particular customer has made in a third column (probably by checking the number of previous instances of the customer's name).
My table currently looks like this:
Column A Column B Column C
1 12/03/13 Angela
2 01/05/14 Angela
3 03/07/14 Angela
4 04/01/14 Angela
5 03/06/13 Ben
6 04/02/13 Ben
7 11/11/15 Carl
8 12/11/15 Carl
9 01/01/16 Carl
10 02/03/17 David
11 04/04/17 Ethan
And what I need to see is (where Column C is the Total Orders for that customer)
Column A Column B Column C
1 12/03/13 Angela 1
2 01/05/14 Angela 2
3 03/07/14 Angela 3
4 04/01/14 Angela 4
5 03/06/13 Ben 1
6 04/02/13 Ben 2
7 11/11/15 Carl 1
8 12/11/15 Carl 2
9 01/01/16 Carl 3
10 02/03/17 David 1
11 04/04/17 Ethan 1
Any help is greatly appreciated!

Try the following in C2
=COUNTIF($B$2:$B2,$B2)
Drag down for as many rows as required.
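The expanding range `$B$2:$B2` makes COUNTIF count how many times the current name has appeared so far, which is a running count per customer. In SQL the same idea is `ROW_NUMBER() OVER (PARTITION BY customer ORDER BY ...)`. The sketch below runs it in SQLite and orders by the row position from Column A rather than by the date text, because mm/dd/yy strings do not sort chronologically; the table and column names are hypothetical.

```python
import sqlite3

# (row position from Column A, purchase date as shown, customer)
ORDERS = [
    (1, "12/03/13", "Angela"), (2, "01/05/14", "Angela"),
    (3, "03/07/14", "Angela"), (4, "04/01/14", "Angela"),
    (5, "03/06/13", "Ben"), (6, "04/02/13", "Ben"),
    (7, "11/11/15", "Carl"), (8, "12/11/15", "Carl"), (9, "01/01/16", "Carl"),
    (10, "02/03/17", "David"), (11, "04/04/17", "Ethan"),
]

# ROW_NUMBER() restarts at 1 for each customer, like the expanding COUNTIF range.
NUMBER_SQL = """
SELECT row_id, customer,
       ROW_NUMBER() OVER (PARTITION BY customer ORDER BY row_id) AS order_no
FROM orders
ORDER BY row_id
"""


def number_orders(rows):
    """Return [(row_id, customer, order_no), ...] in the sheet's row order."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE orders (row_id INTEGER, purchase_date TEXT, customer TEXT)")
        con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return con.execute(NUMBER_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in number_orders(ORDERS):
        print(*row)
```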

SQL : Group By on range of dynamic values

This is similar to some other questions here, but those use CASE, which I cannot use. This is on Oracle, and I will be running the query from an Excel sheet. (By the way, this setup does not support WITH either, which makes life much harder.)
I have a range of dates in one big table - like 1/3/2011, 4/5/2012, 7/1/2013, 9/1/2013.....
Then I have another table with hours worked by employees on certain dates. What I need is the sum of hours worked by each employee in each intervening time period. The tables look like this:
Dates
1-May-2011
5-Aug-2011
4-Apr-2012
....
and another
Employee Hours Date
Sam 4 1-Jan-2011
Sam 7 5-Jan-2011
Mary 12 7-Jan-2012
Mary 5 12-Dec-2013
......
so the result should be
Employee Hours In Date Range Till
Sam 11 1-May-2011
Sam 0 5-Aug-2011
Sam 0 4-Apr-2012
Mary 0 1-May-2011
Mary 0 5-Aug-2011
Mary 12 4-Apr-2012
....
Any pointers on how to achieve this please?

I'm unfamiliar with Oracle SQL and its abilities/limitations, but since you asked for pointers, here's my take:
Join the tables (INNER JOIN) with the join rule being EmployeeHours.Date < Dates.Dates. Then GROUP BY Employee, Dates.Dates and select the grouping columns plus SUM(Hours). What you'd end up with (using your sample data) is:
Employee | Dates | Hours
Sam | 1-May-2011 | 11
Sam | 5-Aug-2011 | 11
Sam | 4-Apr-2012 | 11
Mary | 1-May-2011 | 0
Mary | 5-Aug-2011 | 0
Mary | 4-Apr-2012 | 12
With other (more complex) data, there will be more "interesting" results, but basically each row contains total hours up to that point.
You could then use that as an input to an outer query to find MAX(Hours) for all rows where Dates < currentDates and subtract that from your result.
Again, this is not a complete answer, but it's a direction that should work.
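A variation that avoids the subtraction step is to give each date its own range directly: look up the previous date with a scalar subquery in an inline view (no WITH, no CASE), then sum only the hours that fall in [previous date, date). A plain INNER JOIN would drop the zero rows, so this sketch cross joins the distinct employees with the ranges and outer joins the hours. It runs in SQLite; the table and column names (`dates.till`, `employee_hours.work_date`) are hypothetical, and whether the boundaries should be inclusive is an assumption. The SQL uses only standard constructs, so it should carry over to Oracle, but it has not been tested there.

```python
import sqlite3

# Sample data from the question, with dates as ISO text so they compare correctly
DATES = ["2011-05-01", "2011-08-05", "2012-04-04"]
HOURS = [("Sam", 4, "2011-01-01"), ("Sam", 7, "2011-01-05"),
         ("Mary", 12, "2012-01-07"), ("Mary", 5, "2013-12-12")]

BUCKET_SQL = """
SELECT e.employee, r.till, COALESCE(SUM(h.hours), 0) AS hours
FROM (SELECT DISTINCT employee FROM employee_hours) e
CROSS JOIN (SELECT d.till,
                   -- the previous boundary date, NULL for the first range
                   (SELECT MAX(p.till) FROM dates p WHERE p.till < d.till) AS prev_till
            FROM dates d) r
LEFT JOIN employee_hours h
       ON h.employee = e.employee
      AND h.work_date < r.till
      AND (r.prev_till IS NULL OR h.work_date >= r.prev_till)
GROUP BY e.employee, r.till
ORDER BY e.employee, r.till
"""


def hours_per_range(dates, hours):
    """Return {(employee, till): hours worked in the range ending at till}."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE dates (till TEXT)")
        con.execute("CREATE TABLE employee_hours (employee TEXT, hours INTEGER, work_date TEXT)")
        con.executemany("INSERT INTO dates VALUES (?)", [(d,) for d in dates])
        con.executemany("INSERT INTO employee_hours VALUES (?, ?, ?)", hours)
        return {(emp, till): total for emp, till, total in con.execute(BUCKET_SQL)}
    finally:
        con.close()


if __name__ == "__main__":
    for key, total in hours_per_range(DATES, HOURS).items():
        print(*key, total)
```

Mary's 12-Dec-2013 entry falls after the last listed date, so it is not counted in any range here.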

Showing Null Value Based on Three Tables

I’m working on showing employees that have not entered any hours for a previous week. I’m currently working with three tables. The first is a calendar that has the first date of each week; weeks run Sunday to Saturday. The second, the Time table, lists the hours entered, with the date the time was entered and the employee's name. The third is the list of all employees. I can’t seem to get the joins to work the way I would like. In the end result, I would like to see that Bob entered time in weeks 7 and 8, but week 9 is NULL. Thank you for your help; it's greatly appreciated.
Current Code
SELECT
d.Resource
,SUM(p.Hours) AS Hours
,m.[WeeksSundayToSaturday]
,DatePart(wk, m.[WeeksSundayToSaturday]) AS WeekNumber
FROM CalendarWeeks m
LEFT JOIN [TimeTracking] p ON
(m.[WeeksSundayToSaturday] BETWEEN p.Date AND p.Date + 7)
RIGHT JOIN [DepartmentMembers] d ON
d.Resource = p.CreatedBy
GROUP BY
d.Resource
,m.WeeksSundayToSaturday
Data Tables
Department Members
Name Department
Bob Engineer
Sue HR
John Operations
Time Tracking
Resource Hours Date
Bob 13 2/9/2014
Sue 12 2/10/2014
John 2 2/11/2014
Bob 6 2/12/2014
Bob 8 2/13/2014
John 8 2/14/2014
John 8 2/15/2014
Bob 8 2/16/2014
Bob 1 2/17/2014
Bob 2 2/18/2014
Bob 1 2/19/2014
Bob 8 2/20/2014
Bob 9 2/21/2014
Bob 6 2/22/2014
Sue 8 2/23/2014
John 2 2/24/2014
Calendar
WeeksSundayToSaturday
1/5/2014
1/12/2014
1/19/2014
1/26/2014
2/2/2014
2/9/2014
2/16/2014
2/23/2014
3/2/2014
3/9/2014
3/16/2014
3/23/2014
3/30/2014
Desired Result
Bob
Week 7 = 27
Week 8 = 35
Week 9 = NULL

Your query above gives a compilation error. Please try the query below; I think it will help you.
SELECT
d.Resource
,SUM(p.Hours) AS Hours
,m.[WeeksSundayToSaturday]
,DatePart(wk, m.[WeeksSundayToSaturday]) AS WeekNumber
FROM CalendarWeeks m
-- match each time entry to the Sunday-to-Saturday week that contains it
LEFT JOIN [TimeTracking] p ON (p.Date BETWEEN
m.[WeeksSundayToSaturday] AND DATEADD(d, 6, m.[WeeksSundayToSaturday]))
RIGHT JOIN [DepartmentMembers] d ON d.Resource = p.CreatedBy
GROUP BY d.Resource, m.WeeksSundayToSaturday
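To get a row for every employee and every week, including weeks with no entries (the "Week 9 = NULL" case), one option is to CROSS JOIN the members with the calendar first and then LEFT JOIN the time entries. SUM over no matching rows returns NULL, which gives the desired empty week. The sketch runs in SQLite with hypothetical snake_case table names and only three calendar weeks; in SQL Server the week end would be `DATEADD(d, 6, m.WeeksSundayToSaturday)` instead of SQLite's `date(..., '+6 days')`.

```python
import sqlite3

MEMBERS = [("Bob", "Engineer"), ("Sue", "HR"), ("John", "Operations")]
# (resource, hours, entry date) from the question's Time Tracking table, ISO dates
TIME_TRACKING = [
    ("Bob", 13, "2014-02-09"), ("Sue", 12, "2014-02-10"), ("John", 2, "2014-02-11"),
    ("Bob", 6, "2014-02-12"), ("Bob", 8, "2014-02-13"), ("John", 8, "2014-02-14"),
    ("John", 8, "2014-02-15"), ("Bob", 8, "2014-02-16"), ("Bob", 1, "2014-02-17"),
    ("Bob", 2, "2014-02-18"), ("Bob", 1, "2014-02-19"), ("Bob", 8, "2014-02-20"),
    ("Bob", 9, "2014-02-21"), ("Bob", 6, "2014-02-22"), ("Sue", 8, "2014-02-23"),
    ("John", 2, "2014-02-24"),
]
WEEKS = ["2014-02-09", "2014-02-16", "2014-02-23"]  # subset of the calendar

WEEKLY_SQL = """
SELECT d.name, m.week_start, SUM(p.hours) AS hours  -- NULL when no entries
FROM department_members d
CROSS JOIN calendar_weeks m                         -- every member x every week
LEFT JOIN time_tracking p
       ON p.resource = d.name
      AND p.entry_date BETWEEN m.week_start AND date(m.week_start, '+6 days')
GROUP BY d.name, m.week_start
ORDER BY d.name, m.week_start
"""


def weekly_hours(members, entries, weeks):
    """Return {(name, week_start): total hours or None} from an in-memory SQLite DB."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE department_members (name TEXT, department TEXT)")
        con.execute("CREATE TABLE time_tracking (resource TEXT, hours INTEGER, entry_date TEXT)")
        con.execute("CREATE TABLE calendar_weeks (week_start TEXT)")
        con.executemany("INSERT INTO department_members VALUES (?, ?)", members)
        con.executemany("INSERT INTO time_tracking VALUES (?, ?, ?)", entries)
        con.executemany("INSERT INTO calendar_weeks VALUES (?)", [(w,) for w in weeks])
        return {(name, week): total for name, week, total in con.execute(WEEKLY_SQL)}
    finally:
        con.close()


if __name__ == "__main__":
    for key, total in weekly_hours(MEMBERS, TIME_TRACKING, WEEKS).items():
        print(*key, total)
```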
