SUM with Window Function in BigQuery - google-bigquery

I want to know the number of goals scored away and at home for each team in each season
season |home_goal |away_goal |team_home |team_away|
-----------------------------------------------------
1 | 1 |0 |France |Spain |
1 | 1 |2 |Italie |Spain |
1 | 0 |1 |Spain |Italie |
1 | 1 |3 |France |Italie |
1 | 1 |4 |Spain |Portugal |
1 | 3 |4 |Portugal |Italie |
2 | 1 |2 |France |Portugal |
2 | 1 |0 |Spain |Italie |
2 | 0 |1 |Spain |Portugal |
2 | 3 |2 |Italie |Spain |
2 | 0 |1 |France |Portugal |
... | ... |... |... |... |
I want this output
season |hg |ag |team |
-------------------------------------------
1 | 2 |0 |France |
1 | 1 |8 |Italie |
1 | 1 |2 |Spain |
1 | 3 |4 |Portugal |
2 | 1 |0 |France |
... | ... |... |... |
I don't get the expected result; I only have the goals scored at home:
WITH all_match AS (
    SELECT season, match.team_home AS ht, match.home_goal AS hg
    FROM match
    UNION ALL
    SELECT season, match.team_away AS ta, match.away_goal AS ag
    FROM match
)
SELECT season, ht, SUM(hg)
FROM all_match
GROUP BY 1, 2

Use the approach below:
select * from (
    select season, 'home' location, team_home as team, home_goal as goal
    from your_table
    union all
    select season, 'away' location, team_away as team, away_goal as goal
    from your_table
)
pivot (sum(goal) for location in ('home', 'away'))
If applied to the sample data in your question, the output matches the expected result.
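For engines without a PIVOT operator, the same union-then-aggregate idea can be expressed with plain conditional aggregation. Below is a minimal sketch run against SQLite via Python, purely for illustration: the table is named `matches` here (MATCH is a reserved word in SQLite), and only the season-1 sample rows are loaded.

```python
import sqlite3

# Load the season-1 sample rows from the question into an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE matches (
    season INTEGER, home_goal INTEGER, away_goal INTEGER,
    team_home TEXT, team_away TEXT)""")
rows = [
    (1, 1, 0, "France",   "Spain"),
    (1, 1, 2, "Italie",   "Spain"),
    (1, 0, 1, "Spain",    "Italie"),
    (1, 1, 3, "France",   "Italie"),
    (1, 1, 4, "Spain",    "Portugal"),
    (1, 3, 4, "Portugal", "Italie"),
]
conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?, ?)", rows)

# Tag each half of the UNION ALL with a location flag, then aggregate
# conditionally -- the portable equivalent of the PIVOT answer.
query = """
SELECT season,
       team,
       SUM(CASE WHEN location = 'home' THEN goal ELSE 0 END) AS hg,
       SUM(CASE WHEN location = 'away' THEN goal ELSE 0 END) AS ag
FROM (
    SELECT season, 'home' AS location, team_home AS team, home_goal AS goal
    FROM matches
    UNION ALL
    SELECT season, 'away' AS location, team_away AS team, away_goal AS goal
    FROM matches
)
GROUP BY season, team
"""
result = {(s, t): (hg, ag) for s, t, hg, ag in conn.execute(query)}
print(result[(1, "France")])  # → (2, 0)
print(result[(1, "Italie")])  # → (1, 8)
```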

How to add columns to a pandas pivot table (multi-column)

How can I add the extra columns?
This is the DataFrame to pivot:
|date |country|type|qty|
|----------|-------|----|---|
|2021/03/01|jp |A |10 |
|2021/03/01|en |C |20 |
|2021/03/01|jp |C |15 |
|2021/03/02|jp |A |10 |
|2021/03/02|en |A |20 |
|2021/03/02|en |C |15 |
(after pivoting)
| |2021/03/01|2021/03/02|
|-----|----------|----------|
| |jp |en |jp |en |
|-----|----------|----------|
| A |10 | 0 |50 |30 |
| C |15 | 15 |0 |75 |
I would like to add a "rate" column:
| |2021/03/01 |2021/03/02 |
|-----|---------------------|----------------------|
| | jp | en | jp | en |
|-----|---------------------|----------------------|
| |cnt | rate|cnt |rate|cnt | rate|cnt |rate |
|-----|----------|----------|----------|-----------|
| A |10 | 0.4 | 0 | 0 |50 | 1 | 30 | 0.26|
| C |15 | 0.6 | 15 | 1 |0 | 0 | 85 | 0.74|
You can use concat with the keys parameter, dividing the values by the column sums, then add DataFrame.reorder_levels and sort the MultiIndex:
#change to your function if necessary
df1 = df.pivot_table(index='type', columns=['date','country'], values='qty', fill_value=0)
print (df1)
date 2021/03/01 2021/03/02
country en jp en jp
type
A 0 10 20 10
C 20 15 15 0
df = (pd.concat([df1, df1.div(df1.sum())], axis=1, keys=('cnt','rate'))
.reorder_levels([1,2,0], axis=1)
.sort_index(axis=1))
print (df)
date 2021/03/01 2021/03/02
country en jp en jp
cnt rate cnt rate cnt rate cnt rate
type
A 0 0.0 10 0.4 20 0.571429 10 1.0
C 20 1.0 15 0.6 15 0.428571 0 0.0
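Putting the answer together, here is a self-contained sketch built from the question's sample data. It reproduces the two steps above: pivot to get counts, then concat a rate frame (each cell divided by its column sum) under a second column level.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "date":    ["2021/03/01", "2021/03/01", "2021/03/01",
                "2021/03/02", "2021/03/02", "2021/03/02"],
    "country": ["jp", "en", "jp", "jp", "en", "en"],
    "type":    ["A", "C", "C", "A", "A", "C"],
    "qty":     [10, 20, 15, 10, 20, 15],
})

# Counts per (date, country), one row per type.
df1 = df.pivot_table(index="type", columns=["date", "country"],
                     values="qty", fill_value=0)

# Rates = each cell divided by its column sum; concat glues counts and
# rates side by side under the keys 'cnt' and 'rate', then the column
# levels are reordered to (date, country, cnt/rate) and sorted.
out = (pd.concat([df1, df1.div(df1.sum())], axis=1, keys=("cnt", "rate"))
         .reorder_levels([1, 2, 0], axis=1)
         .sort_index(axis=1))
print(out.loc["A", ("2021/03/01", "jp", "rate")])  # → 0.4
```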

Update un-null columns from 2 source tables into null columns of target table

I have TABLE A
ID|POS|Location|ITEM |COLOR
------------------------------
1 | 1 |ABC |A | RED
1 | 2 |ABC |B | BLUE
1 | 3 |ABC |NULL | YELLOW
1 | 4 |ABC |D | NULL
2 | 1 |ABC |A | BLACK
2 | 2 |ABC |B | BLUE
2 | 3 |ABC |C | RED
3 | 1 |ABC |NULL | BROWN
4 | 1 |ABC |A | WHITE
4 | 2 |ABC |B | RED
4 | 3 |ABC |NULL | BLUE
4 | 4 |ABC |NULL | YELLOW
5 | 1 |ABC |A | NULL
5 | 2 |ABC |C | NULL
5 | 3 |ABC |D | BLUE
6 | 1 |ABC |A | RED
6 | 2 |ABC |B | BROWN
6 | 3 |ABC |C | WHITE
7 | 1 |ABC |NULL | RED
7 | 2 |ABC |B | NULL
7 | 3 |ABC |C | YELLOW
8 | 1 |ABC |A | NULL
8 | 2 |ABC |B | BLACK
8 | 3 |ABC |C | BLUE
8 | 4 |ABC |D | RED
8 | 5 |ABC |E | BROWN
9 | 1 |ABC |NULL | WHITE
9 | 2 |ABC |C | BLUE
9 | 3 |ABC |D | YELLOW
9 | 4 |ABC |E | NULL
10 | 1 |ABC |A | NULL
10 | 2 |ABC |B | WHITE
10 | 3 |ABC |C | BLACK
11 | 1 |ABC |A | BLUE
11 | 2 |ABC |B | NULL
TABLE B
ID|POS|Location|ITEM
1 | 1 |ABC |A
1 | 2 |ABC |B
1 | 3 |ABC |B
1 | 4 |ABC |D
2 | 1 |ABC |A
2 | 2 |ABC |B
2 | 3 |ABC |C
3 | 1 |ABC |E
4 | 1 |ABC |A
4 | 2 |ABC |B
4 | 3 |ABC |F
4 | 4 |ABC |NULL
5 | 1 |ABC |A
5 | 2 |ABC |C
5 | 3 |ABC |NULL
6 | 1 |ABC |A
6 | 2 |ABC |B
and TABLE C
ID|POS|Location |COLOR
--------------------------
1 | 1 |ABC | RED
1 | 2 |ABC | BLUE
1 | 3 |ABC | YELLOW
1 | 4 |ABC | RED
2 | 1 |ABC | BLACK
2 | 2 |ABC | BLUE
2 | 3 |ABC | VIOLET
3 | 1 |ABC | BROWN
4 | 1 |ABC | WHITE
4 | 2 |ABC | RED
4 | 3 |ABC | BLUE
4 | 4 |ABC | YELLOW
5 | 1 |ABC | WHITE
5 | 2 |ABC | BLACK
5 | 3 |ABC | BLUE
6 | 1 |ABC | RED
6 | 2 |ABC | BROWN
6 | 3 |ABC | WHITE
7 | 1 |ABC | RED
7 | 2 |ABC | BLUE
7 | 3 |ABC | YELLOW
8 | 1 |ABC | PURPLE
8 | 2 |ABC | BLACK
8 | 3 |ABC | PINK
8 | 4 |ABC | RED
8 | 5 |ABC | BROWN
9 | 1 |ABC | WHITE
9 | 2 |ABC | BLUE
9 | 3 |ABC | YELLOW
9 | 4 |ABC | NULL
10 | 1 |ABC | CYAN
10 | 2 |ABC | WHITE
10 | 3 |ABC | BLACK
11 | 1 |ABC | INDIGO
11 | 2 |ABC | NULL
I want to copy the ITEM column from Table B (non-null items) and the COLOR column from Table C (non-null colors) into Table A, but only where ITEM or COLOR in Table A is NULL.
Thanks,
It looks like your keys are ID and POS. If you are using SQL Server, you can do this with a single joined UPDATE. COALESCE keeps Table A's existing value and only fills in from B or C where A is NULL:
UPDATE a
SET
    item  = COALESCE(a.item, b.item),
    color = COALESCE(a.color, c.color)
FROM TableA a
INNER JOIN TableB b
    ON a.ID = b.ID AND a.POS = b.POS
INNER JOIN TableC c
    ON a.ID = c.ID AND a.POS = c.POS
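A minimal sketch of the fill-only-NULLs requirement, run in SQLite via Python for illustration. Correlated subqueries are used instead of a joined UPDATE so it runs on any SQLite version; only two sample rows are loaded.

```python
import sqlite3

# Two sample rows from Table A: one with a NULL item, one with a NULL color.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ID INT, POS INT, ITEM TEXT, COLOR TEXT);
CREATE TABLE TableB (ID INT, POS INT, ITEM TEXT);
CREATE TABLE TableC (ID INT, POS INT, COLOR TEXT);
INSERT INTO TableA VALUES (1, 3, NULL, 'YELLOW'), (1, 4, 'D', NULL);
INSERT INTO TableB VALUES (1, 3, 'B'), (1, 4, 'D');
INSERT INTO TableC VALUES (1, 3, 'YELLOW'), (1, 4, 'RED');

-- COALESCE keeps A's value when present and falls back to B/C only
-- when A's value is NULL, keyed on (ID, POS).
UPDATE TableA SET
    ITEM  = COALESCE(ITEM,  (SELECT b.ITEM  FROM TableB b
                             WHERE b.ID = TableA.ID AND b.POS = TableA.POS)),
    COLOR = COALESCE(COLOR, (SELECT c.COLOR FROM TableC c
                             WHERE c.ID = TableA.ID AND c.POS = TableA.POS));
""")
filled = list(conn.execute("SELECT * FROM TableA ORDER BY POS"))
print(filled)  # → [(1, 3, 'B', 'YELLOW'), (1, 4, 'D', 'RED')]
```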

Derive and Update Column Value based on Row Value SQL Server

So I have a Request History table whose rows I would like to flag with a version number (a version ends at the end of a cycle). I was able to mark the end of each cycle, but I couldn't update the values associated with each cycle. Here is an example:
|history_id | Req_id | StatID | Time |EndCycleDate |
|-------------|---------|-------|---------- |-------------|
|1 | 1 |18 | 3/26/2017 | NULL |
|2 | 1 | 19 | 3/26/2017 | NULL |
|3 | 1 |20 | 3/30/2017 | NULL |
|4 |1 | 23 |3/30/2017 | NULL |
|5 | 1 |35 |3/30/2017 | 3/30/2017 |
|6 | 1 |33 |4/4/2017 | NULL |
|7 | 1 |34 |4/4/2017 | NULL |
|8 | 1 |39 |4/4/2017 | NULL |
|9 | 1 |35 |4/4/2017 | 4/4/2017 |
|10 | 1 |33 |4/5/2017 | NULL |
|11 | 1 |34 |4/6/2017 | NULL |
|12 | 1 |39 |4/6/2017 | NULL |
|13 | 1 |35 |4/7/2017 | 4/7/2017 |
|14 | 1 |33 |4/8/2017 | NULL |
|15 | 1 | 34 |4/8/2017 | NULL |
|16 | 2 |18 |3/28/2017 | NULL |
|17 | 2 |26 |3/28/2017 | NULL |
|18 | 2 |20 |3/30/2017 | NULL |
|19 | 2 |23 |3/30/2017 | NULL |
|20 | 2 |35 |3/30/2017 | 3/30/2017 |
|21 | 2 |33 |4/12/2017 | NULL |
|22 | 2 |34 |4/12/2017 | NULL |
|23 | 2 |38 |4/13/2017 | NULL |
Now what I would like to achieve is to derive a new column, namely VER, and update its value like the following:
|history_id | Req_id | StatID | Time |EndCycleDate | VER |
|-------------|---------|-------|---------- |-------------|------|
|1 | 1 |18 | 3/26/2017 | NULL | 1 |
|2 | 1 | 19 | 3/26/2017 | NULL | 1 |
|3 | 1 |20 | 3/30/2017 | NULL | 1 |
|4 |1 | 23 |3/30/2017 | NULL | 1 |
|5 | 1 |35 |3/30/2017 | 3/30/2017 | 1 |
|6 | 1 |33 |4/4/2017 | NULL | 2 |
|7 | 1 |34 |4/4/2017 | NULL | 2 |
|8 | 1 |39 |4/4/2017 | NULL | 2 |
|9 | 1 |35 |4/4/2017 | 4/4/2017 | 2 |
|10 | 1 |33 |4/5/2017 | NULL | 3 |
|11 | 1 |34 |4/6/2017 | NULL | 3 |
|12 | 1 |39 |4/6/2017 | NULL | 3 |
|13 | 1 |35 |4/7/2017 | 4/7/2017 | 3 |
|14 | 1 |33 |4/8/2017 | NULL | 4 |
|15 | 1 | 34 |4/8/2017 | NULL | 4 |
|16 | 2 |18 |3/28/2017 | NULL | 1 |
|17 | 2 |26 |3/28/2017 | NULL | 1 |
|18 | 2 |20 |3/30/2017 | NULL | 1 |
|19 | 2 |23 |3/30/2017 | NULL | 1 |
|20 | 2 |35 |3/30/2017 | 3/30/2017 | 1 |
|21 | 2 |33 |4/12/2017 | NULL | 2 |
|22 | 2 |34 |4/12/2017 | NULL | 2 |
|23 | 2 |38 |4/13/2017 | NULL | 2 |
One method that comes really close is a cumulative count:
select t.*,
count(endCycleDate) over (partition by req_id order by history_id) as ver
from t;
However, this doesn't get the values exactly right around the rows where the EndCycleDate is defined, and the count starts at 0. Most of these problems are fixed with a windowing clause:
select t.*,
(count(endCycleDate) over (partition by req_id
order by history_id
rows between unbounded preceding and 1 preceding) + 1
) as ver
from t;
But that misses the value on the first row. So, here is a method that actually works: it counts the cycle ends backward and subtracts that from the total number of cycle ends to get the versions in ascending order:
select t.*,
       (1 + count(endCycleDate) over (partition by req_id) -
        count(endCycleDate) over (partition by req_id
                                  order by history_id desc)
       ) as ver
from t;
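The backward-count arithmetic can be checked in SQLite (3.25+ for window functions) via Python. This sketch loads a subset of the sample rows and verifies the version numbers against the expected output.

```python
import sqlite3

# A subset of the sample rows: two plain rows, three cycle-ending rows.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (
    history_id INT, req_id INT, statid INT, time TEXT, endCycleDate TEXT)""")
rows = [
    (1,  1, 18, "2017-03-26", None),
    (5,  1, 35, "2017-03-30", "2017-03-30"),
    (6,  1, 33, "2017-04-04", None),
    (9,  1, 35, "2017-04-04", "2017-04-04"),
    (13, 1, 35, "2017-04-07", "2017-04-07"),
    (14, 1, 33, "2017-04-08", None),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)

# Backward cumulative count of cycle ends, subtracted from the total,
# yields an ascending version number that includes the cycle-closing row.
query = """
SELECT history_id,
       1 + COUNT(endCycleDate) OVER (PARTITION BY req_id)
         - COUNT(endCycleDate) OVER (PARTITION BY req_id
                                     ORDER BY history_id DESC) AS ver
FROM t
"""
ver = {h: v for h, v in conn.execute(query)}
print(ver)  # → {1: 1, 5: 1, 6: 2, 9: 2, 13: 3, 14: 4}
```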

Get top 10 using SQL in Access, joining tables

These are my tables:
table BUSINESS
BUSINESSUSERNAME|BUSINESSPASSWORD|BUSINESSNAME|
Res1 |123 |Cafe |
Res2 |456 |Foodtruck |
table USER
USERNAME|USERPASSWORD|NAME|
user1 |123 |mr.1|
user2 |234 |mr.2|
table FOOD
FOODID|FOODNAME|FOODPRICE|BUSINESSUSERNAME|
1 |CAKE | 5 |Res1 |
2 |SHAKE | 2 |Res2 |
3 |COLA | 3 |Res1 |
table FOOD_RATING
FOODREVIEWID|FOODID|FOODRATING|BUSINESSUSERNAME|USERNAME|
1 |2 |3 |Res2 |user1 |
2 |2 |5 |Res2 |user2 |
3 |1 |4 |Res1 |user1 |
4 |3 |1 |Res1 |user1 |
I would like to get the top 10 foods based on average rating:
RANK|FOODNAME|FOODPRICE|AVGRATING|BUSINESSUSERNAME
1 |CAKE |5 |4 |Res1
2 |SHAKE |3 |4 |Res2
3 |COLA |3 |1 |Res1
.
.
.
10
EDIT: added SELECT TOP 10.
Note that the ORDER BY comes after the GROUP BY:
SELECT TOP 10 FOOD.FOODNAME, FOOD.FOODPRICE
    , IIF(Round(Avg(FOODRATING), 1) IS NULL, 0, Round(Avg(FOODRATING), 1)) AS FOODAVGRATING
FROM FOOD
LEFT JOIN FOOD_RATING ON FOOD.FOODID = FOOD_RATING.FOODID
WHERE FOOD.BUSINESSUSERNAME = "someusername"
GROUP BY FOOD.FOODNAME, FOOD.FOODPRICE
ORDER BY IIF(Round(Avg(FOODRATING), 1) IS NULL, 0, Round(Avg(FOODRATING), 1)) DESC;
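The same top-N aggregation can be sketched in SQLite via Python, with LIMIT playing the role of Access's SELECT TOP 10 and IFNULL standing in for the IIF(... IS NULL ...) expression. The sample tables from the question are loaded in full.

```python
import sqlite3

# Sample FOOD and FOOD_RATING tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FOOD (FOODID INT, FOODNAME TEXT, FOODPRICE INT,
                   BUSINESSUSERNAME TEXT);
CREATE TABLE FOOD_RATING (FOODREVIEWID INT, FOODID INT, FOODRATING INT,
                          BUSINESSUSERNAME TEXT, USERNAME TEXT);
INSERT INTO FOOD VALUES (1,'CAKE',5,'Res1'), (2,'SHAKE',2,'Res2'),
                        (3,'COLA',3,'Res1');
INSERT INTO FOOD_RATING VALUES (1,2,3,'Res2','user1'), (2,2,5,'Res2','user2'),
                               (3,1,4,'Res1','user1'), (4,3,1,'Res1','user1');
""")

# LEFT JOIN keeps unrated foods; IFNULL maps their missing average to 0.
query = """
SELECT f.FOODNAME, f.FOODPRICE, f.BUSINESSUSERNAME,
       IFNULL(ROUND(AVG(r.FOODRATING), 1), 0) AS AVGRATING
FROM FOOD f
LEFT JOIN FOOD_RATING r ON f.FOODID = r.FOODID
GROUP BY f.FOODID, f.FOODNAME, f.FOODPRICE, f.BUSINESSUSERNAME
ORDER BY AVGRATING DESC
LIMIT 10
"""
top = list(conn.execute(query))
avg = {name: rating for name, price, biz, rating in top}
print(avg)
```

Note that CAKE and SHAKE tie at 4.0, so their relative order in the output is unspecified unless you add a tiebreaker to the ORDER BY.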

Group by records by date

I am using SQL Server 2008 R2. I have a database table like the one below:
+--+-----+---+---------+--------+----------+-----------------------+
|Id|Total|New|Completed|Assigned|Unassigned|CreatedDtUTC |
+--+-----+---+---------+--------+----------+-----------------------+
|1 |29 |1 |5 |6 |5 |2014-01-07 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|2 |29 |1 |5 |6 |5 |2014-01-07 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|3 |29 |1 |5 |6 |5 |2014-01-07 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|4 |30 |1 |3 |2 |3 |2014-01-08 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|5 |30 |0 |3 |4 |3 |2014-01-09 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|6 |30 |0 |0 |0 |0 |2014-01-10 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
|7 |30 |0 |0 |0 |0 |2014-01-11 06:00:00.000|
+--+-----+---+---------+--------+----------+-----------------------+
Now, I am facing a strange problem while grouping the records by the CreatedDtUTC column.
I want the distinct records from this table. As you can see, the first three records are duplicates created at the same date and time. To get distinct records, I ran the query below:
SELECT Id, Total, New, Completed, Assigned, Unassigned, MAX(CreatedDtUTC)
FROM TblUsage
GROUP BY CreatedDtUTC
But it gives me error :
Column 'TblUsage.Id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I also tried DISTINCT on the CreatedDtUTC column, but got the same error. Can anyone tell me how to fix this?
P.S. I want the CreatedDtUTC column in CONVERT(VARCHAR(10), CreatedDtUTC, 101) format.
Try this:
SELECT min(Id) Id, Total, New, Completed, Assigned, Unassigned, CreatedDtUTC
FROM TblUsage
GROUP BY Total, New, Completed, Assigned, Unassigned, CreatedDtUTC
The error message itself is very explicit: you can't put a column into the SELECT list without applying an aggregate function to it unless it is part of the GROUP BY. The reason is simple: SQL Server doesn't know which value of that column within a group you want, so the result would be non-deterministic and is therefore prohibited.
You can either put all the columns besides Id in the GROUP BY and use MIN() or MAX() on Id, or you can use the window function ROW_NUMBER() in the following way:
SELECT Id, Total, New, Completed, Assigned, Unassigned, CONVERT(VARCHAR(10), CreatedDtUTC,101) CreatedDtUTC
FROM
(
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY Total, New, Completed, Assigned, Unassigned, CreatedDtUTC
ORDER BY id DESC) rnum
FROM TblUsage t
) q
WHERE rnum = 1
Output:
| ID | TOTAL | NEW | COMPLETED | ASSIGNED | UNASSIGNED | CREATEDDTUTC |
|----|-------|-----|-----------|----------|------------|--------------|
| 3 | 29 | 1 | 5 | 6 | 5 | 01/07/2014 |
| 6 | 30 | 0 | 0 | 0 | 0 | 01/10/2014 |
| 7 | 30 | 0 | 0 | 0 | 0 | 01/11/2014 |
| 5 | 30 | 0 | 3 | 4 | 3 | 01/09/2014 |
| 4 | 30 | 1 | 3 | 2 | 3 | 01/08/2014 |
Here is SQLFiddle demo
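The ROW_NUMBER() deduplication can also be checked in SQLite (3.25+ for window functions) via Python. This sketch loads only the first four sample rows; the three duplicates collapse to the one with the highest Id, matching the output above.

```python
import sqlite3

# First four sample rows: three duplicates plus one distinct row.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE TblUsage (
    Id INT, Total INT, New INT, Completed INT, Assigned INT,
    Unassigned INT, CreatedDtUTC TEXT)""")
rows = [
    (1, 29, 1, 5, 6, 5, "2014-01-07 06:00:00"),
    (2, 29, 1, 5, 6, 5, "2014-01-07 06:00:00"),
    (3, 29, 1, 5, 6, 5, "2014-01-07 06:00:00"),
    (4, 30, 1, 3, 2, 3, "2014-01-08 06:00:00"),
]
conn.executemany("INSERT INTO TblUsage VALUES (?,?,?,?,?,?,?)", rows)

# Number the rows within each duplicate group (highest Id first) and
# keep only the first row of each group.
query = """
SELECT Id, Total, CreatedDtUTC FROM (
    SELECT t.*, ROW_NUMBER() OVER (
        PARTITION BY Total, New, Completed, Assigned, Unassigned, CreatedDtUTC
        ORDER BY Id DESC) AS rnum
    FROM TblUsage t
) WHERE rnum = 1 ORDER BY Id
"""
dedup = list(conn.execute(query))
print(dedup)  # → [(3, 29, '2014-01-07 06:00:00'), (4, 30, '2014-01-08 06:00:00')]
```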
Try this:
SELECT MIN(Id) AS Id, Total, New, Completed, Assigned, Unassigned,
CONVERT(VARCHAR(10), CreatedDtUTC, 101) AS CreatedDtUTC
FROM TblUsage
GROUP BY Total, New, Completed, Assigned, Unassigned, CreatedDtUTC
Check the SQL FIDDLE DEMO
OUTPUT
| ID | TOTAL | NEW | COMPLETED | ASSIGNED | UNASSIGNED | CREATEDDTUTC |
|----|-------|-----|-----------|----------|------------|--------------|
| 1 | 29 | 1 | 5 | 6 | 5 | 01/07/2014 |
| 4 | 30 | 1 | 3 | 2 | 3 | 01/08/2014 |
| 5 | 30 | 0 | 3 | 4 | 3 | 01/09/2014 |
| 6 | 30 | 0 | 0 | 0 | 0 | 01/10/2014 |
| 7 | 30 | 0 | 0 | 0 | 0 | 01/11/2014 |