Let's say I have a table like this:
| name | math | english | science |
|------|------|---------|---------|
| Amy | 69 | 70 | 70 |
| Mike | 65 | 71 | 63 |
| Jay | 66 | 66 | 66 |
I want to create a new column that counts the number of unique values in each row across the columns math, english, and science.
So this is my expected output:
| name | math | english | science | n_unique |
|------|------|---------|---------|----------|
| Amy | 69 | 70 | 70 | 2 |
| Mike | 65 | 71 | 63 | 3 |
| Jay | 66 | 66 | 66 | 1 |
For the first row, there are only two distinct scores (69 and 70), so n_unique is 2;
for the second row, there are three (65, 71, and 63), so n_unique is 3;
for the third row, there is only one score (66), so n_unique is 1.
How can I write a query that creates such a column in BigQuery using SQL?
You can "unpivot" your table, count the distinct grades per student, and then join back to your original table:
with mytable as (
select 'Amy' as name, 69 as math, 70 as english, 70 as science union all
select 'Mike', 65, 71, 63 union all
select 'Jay', 66, 66, 66
),
tmp_unpivot as (
select * from mytable
unpivot(grade for class in(math, english, science))
),
agg as (
select name, count(distinct grade) as n_unique
from tmp_unpivot
group by 1
)
select
mytable.*,
agg.n_unique
from mytable
inner join agg on mytable.name = agg.name
Consider the approach below:
select *, (
select count(distinct val)
from unnest(regexp_extract_all(format('%t', t), r'\d+')) val
) as n_unique
from your_table t
If applied to the sample data in your question, the output matches the expected table above (n_unique of 2, 3, and 1 for Amy, Mike, and Jay).
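The regex approach extracts every run of digits from the row's string form, so digits in a name, decimal points, or negative signs would skew the count. A minimal alternative sketch, assuming the three score columns share one type (here INT64), is to build an array from the columns directly and count its distinct elements. The alias `score` is only illustrative.

```sql
-- Sample data copied from the question.
with mytable as (
  select 'Amy' as name, 69 as math, 70 as english, 70 as science union all
  select 'Mike', 65, 71, 63 union all
  select 'Jay', 66, 66, 66
)
select
  t.*,
  -- Build an array from the score columns and count its distinct elements.
  -- The array literal requires the columns to have a common type.
  (select count(distinct score)
   from unnest([t.math, t.english, t.science]) as score) as n_unique
from mytable as t
```

The column list is explicit here, so adding a subject means adding it to the array literal. COUNT(DISTINCT ...) ignores NULL scores.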
I have three worksheets: players, teams, and weights (how highly a particular attribute is weighted when determining player-team match).
Players
| Name | Age | Height | Free_Throw_Perc | ... |
|------|-----|--------|-----------------|-----|
| Bod | 23 | 74 | 62 | ... |
Teams
| Team_Name | Age | Height | Free_Throw_Perc | ... |
|-----------|-----|--------|-----------------|-----|
|Team1|23|78|62|...|
Weights
| Team_Name | Age | Height | Free_Throw_Perc | ... |
|:---------:|:---:|:------:|:---------------:|:---:|
| Team1 | 5 | 10 | 10 | ... |
CREATE TABLE players (name, age, height, free_throw_perc) AS
SELECT 'Alice', 20, 160, 90 FROM DUAL UNION ALL
SELECT 'Betty', 21, 165, 80 FROM DUAL UNION ALL
SELECT 'Carol', 22, 170, 70 FROM DUAL UNION ALL
SELECT 'Debra', 23, 175, 60 FROM DUAL UNION ALL
SELECT 'Emily', 24, 180, 50 FROM DUAL UNION ALL
SELECT 'Fiona', 25, 185, 40 FROM DUAL UNION ALL
SELECT 'Gerri', 26, 190, 30 FROM DUAL UNION ALL
SELECT 'Heidi', 27, 195, 20 FROM DUAL UNION ALL
SELECT 'Irene', 28, 200, 10 FROM DUAL;
CREATE TABLE teams (team_name, age, height, free_throw_perc) AS
SELECT 'ALPHA', 20,175,90 FROM DUAL;
CREATE TABLE weights (team_name, age, height, free_throw_perc) AS
SELECT 'ALPHA', 5,10,10 FROM DUAL;
The teams table corresponds to the players table but contains a record for each team detailing their ideal player based on the current composition of the team. The weights table contains a record for each team with an integer weight stating how much they care about each player attribute. I am trying to compute a total match score for each player-team combination. I was able to do this quite easily with Python but am struggling to accomplish the same in SQL.
In Python this would be a simple for loop with logical operators comparing each cell of one dataframe to each cell of another, but the lack of positional referencing in SQL makes this a lot trickier to do and generalize (be able to use the same queries for other pairs of tables with different attributes).
So far I have
BEGIN
FOR c in (SELECT column_name FROM all_tab_columns WHERE table_name = 'teams')
LOOP
INSERT INTO match_table (players.Name, candidates.c)
SELECT players.Name, players.c WHERE players.c = teams.c
END LOOP;
BEGIN
FOR c IN (SELECT column_name FROM all_tab_columns WHERE table_name = 'weights')
LOOP
UPDATE match_table
SET match_table.c = (SELECT weights.c FROM weights WHERE match_table.c = weights.c)
END LOOP;
From what I can tell, that will generate a table of player names with a single column corresponding to a match to a team attribute, populated by the corresponding weight, and all other columns full of null values. If that is the case, I can group by name to create a single record with all matches and corresponding weights.
The script should loop through each player and team and compare the attributes of the player with those desired by the team. Where there is a match, a new row should be added to the match_table with the player name and null values except for the column that matched. That should be done for each player-team attribute match. Then those matches should be replaced by the corresponding weight from the weights table. I would then like to sum those to get a total match score. I can't use the '+' operator because the column names will vary. They will always match between the three tables, but there will be varied attributes of interest.
The expected output would look something like:
| players.name | Age | Height | Free_Throw_Perc | ... |
|--------------|-----|--------|-----------------|-----|
| 'Alice' | 5 | NULL | NULL | ... |
| 'Alice' | NULL | 10 | NULL | ... |
How would I then sum across each record to find the total match score of each candidate for a team?
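One way to read the requirement above is that a team's weight counts toward a player's score only when the player's attribute equals the team's ideal value. Under that reading, the comparison and the sum need neither a loop nor the + operator, provided each table is first unpivoted to one row per attribute. The following is a rough sketch in Oracle syntax, assuming the three wide tables players, teams, and weights from the question exist with those column names. The names key, value, weight, and match_score are illustrative.

```sql
SELECT p.name,
       t.team_name,
       -- Add the weight only where the player's attribute equals the team's ideal.
       SUM(CASE WHEN p.value = t.value THEN w.weight ELSE 0 END) AS match_score
FROM (
       SELECT name, key, value
       FROM   players
       UNPIVOT ( value FOR key IN (age, height, free_throw_perc) )
     ) p
INNER JOIN (
       SELECT team_name, key, value
       FROM   teams
       UNPIVOT ( value FOR key IN (age, height, free_throw_perc) )
     ) t
  ON ( t.key = p.key )
INNER JOIN (
       SELECT team_name, key, weight
       FROM   weights
       UNPIVOT ( weight FOR key IN (age, height, free_throw_perc) )
     ) w
  ON ( w.team_name = t.team_name AND w.key = t.key )
GROUP BY p.name, t.team_name
ORDER BY p.name, t.team_name;
```

With the question's sample rows (team ALPHA wants age 20, height 175, free_throw_perc 90, weighted 5, 10, 10), Alice matches on age and free_throw_perc and scores 15, and Debra matches on height and scores 10. The attribute list still has to be spelled out in each UNPIVOT clause, so a different set of attributes means editing those three lists or generating the statement with dynamic SQL.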
If you have the sample data:
CREATE TABLE teams ( id, name ) AS
SELECT 1, 'Alpha' FROM DUAL UNION ALL
SELECT 2, 'Beta' FROM DUAL UNION ALL
SELECT 3, 'Gamma' FROM DUAL;
CREATE TABLE players (name, team, age, height, free_throw_perc) AS
SELECT 'Alice', 1, 20, 160, 90 FROM DUAL UNION ALL
SELECT 'Betty', 1, 21, 165, 80 FROM DUAL UNION ALL
SELECT 'Carol', 1, 22, 170, 70 FROM DUAL UNION ALL
SELECT 'Debra', 2, 23, 175, 60 FROM DUAL UNION ALL
SELECT 'Emily', 2, 24, 180, 50 FROM DUAL UNION ALL
SELECT 'Fiona', 2, 25, 185, 40 FROM DUAL UNION ALL
SELECT 'Gerri', 3, 26, 190, 30 FROM DUAL UNION ALL
SELECT 'Heidi', 3, 27, 195, 20 FROM DUAL UNION ALL
SELECT 'Irene', 3, 28, 200, 10 FROM DUAL;
CREATE TABLE weights(team, key, weight) AS
SELECT 1, 'AGE', 1.0 FROM DUAL UNION ALL
SELECT 1, 'HEIGHT', 0.5 FROM DUAL UNION ALL
SELECT 1, 'FREE_THROW_PERC', 0.2 FROM DUAL UNION ALL
SELECT 2, 'AGE', 0.0 FROM DUAL UNION ALL
SELECT 2, 'HEIGHT', 1.0 FROM DUAL UNION ALL
SELECT 2, 'FREE_THROW_PERC', 0.8 FROM DUAL UNION ALL
SELECT 3, 'AGE', 0.5 FROM DUAL UNION ALL
SELECT 3, 'HEIGHT', 0.5 FROM DUAL UNION ALL
SELECT 3, 'FREE_THROW_PERC', 1.0 FROM DUAL;
And you want to insert the sum of the weight column from the weights table multiplied by the respective value in the players table into the following table:
CREATE TABLE match_table(
team INT,
value NUMBER
);
Then you can use the following INSERT query:
INSERT INTO match_table (team, value)
SELECT p.team,
SUM(p.value * w.weight)
FROM ( SELECT name, team, key, value
FROM players
UNPIVOT ( value FOR key IN (age, height, free_throw_perc) )
) p
INNER JOIN weights w
ON ( p.team = w.team AND p.key = w.key )
GROUP BY p.team
Then the table will contain the weighted totals:
| TEAM | VALUE |
|------|-------|
| 2 | 660 |
| 3 | 393 |
| 1 | 358.5 |
And if your match_table is:
CREATE TABLE match_table(
player VARCHAR2(20),
team INT,
age NUMBER,
height NUMBER,
free_throw_perc NUMBER,
total NUMBER
);
Then you can use the query (and calculate the total with the + operator):
INSERT INTO match_table (player, team, age, height, free_throw_perc, total)
SELECT p.name,
p.team,
p.age * w.age_weight,
p.height * w.height_weight,
p.free_throw_perc * w.free_throw_perc_weight,
p.age * w.age_weight
+ p.height * w.height_weight
+ p.free_throw_perc * w.free_throw_perc_weight
FROM players p
INNER JOIN (
SELECT *
FROM weights
PIVOT (
MAX(weight)
FOR key IN (
'AGE' AS age_weight,
'HEIGHT' AS height_weight,
'FREE_THROW_PERC' AS free_throw_perc_weight
)
)
) w
ON (p.team = w.team)
Which gives the values:
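| PLAYER | TEAM | AGE | HEIGHT | FREE_THROW_PERC | TOTAL |
|--------|------|-----|--------|-----------------|-------|
| Alice | 1 | 20 | 80 | 18 | 118 |
| Betty | 1 | 21 | 82.5 | 16 | 119.5 |
| Carol | 1 | 22 | 85 | 14 | 121 |
| Debra | 2 | 0 | 175 | 48 | 223 |
| Emily | 2 | 0 | 180 | 40 | 220 |
| Fiona | 2 | 0 | 185 | 32 | 217 |
| Gerri | 3 | 13 | 95 | 30 | 138 |
| Heidi | 3 | 13.5 | 97.5 | 20 | 131 |
| Irene | 3 | 14 | 100 | 10 | 124 |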
Or, if the players are not associated with a team, then:
INSERT INTO match_table (player, team, age, height, free_throw_perc, total)
SELECT p.name,
w.team,
p.age * w.age_weight,
p.height * w.height_weight,
p.free_throw_perc * w.free_throw_perc_weight,
p.age * w.age_weight
+ p.height * w.height_weight
+ p.free_throw_perc * w.free_throw_perc_weight
FROM players p
CROSS JOIN (
SELECT *
FROM weights
PIVOT (
MAX(weight)
FOR key IN (
'AGE' AS age_weight,
'HEIGHT' AS height_weight,
'FREE_THROW_PERC' AS free_throw_perc_weight
)
)
) w
Which, for the sample data, outputs:
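| PLAYER | TEAM | AGE | HEIGHT | FREE_THROW_PERC | TOTAL |
|--------|------|-----|--------|-----------------|-------|
| Alice | 1 | 20 | 80 | 18 | 118 |
| Alice | 2 | 0 | 160 | 72 | 232 |
| Alice | 3 | 10 | 80 | 90 | 180 |
| Betty | 1 | 21 | 82.5 | 16 | 119.5 |
| Betty | 2 | 0 | 165 | 64 | 229 |
| Betty | 3 | 10.5 | 82.5 | 80 | 173 |
| Carol | 1 | 22 | 85 | 14 | 121 |
| Carol | 2 | 0 | 170 | 56 | 226 |
| Carol | 3 | 11 | 85 | 70 | 166 |
| Debra | 1 | 23 | 87.5 | 12 | 122.5 |
| Debra | 2 | 0 | 175 | 48 | 223 |
| Debra | 3 | 11.5 | 87.5 | 60 | 159 |
| Emily | 1 | 24 | 90 | 10 | 124 |
| Emily | 2 | 0 | 180 | 40 | 220 |
| Emily | 3 | 12 | 90 | 50 | 152 |
| Fiona | 1 | 25 | 92.5 | 8 | 125.5 |
| Fiona | 2 | 0 | 185 | 32 | 217 |
| Fiona | 3 | 12.5 | 92.5 | 40 | 145 |
| Gerri | 1 | 26 | 95 | 6 | 127 |
| Gerri | 2 | 0 | 190 | 24 | 214 |
| Gerri | 3 | 13 | 95 | 30 | 138 |
| Heidi | 1 | 27 | 97.5 | 4 | 128.5 |
| Heidi | 2 | 0 | 195 | 16 | 211 |
| Heidi | 3 | 13.5 | 97.5 | 20 | 131 |
| Irene | 1 | 28 | 100 | 2 | 130 |
| Irene | 2 | 0 | 200 | 8 | 208 |
| Irene | 3 | 14 | 100 | 10 | 124 |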
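If only the total is needed, the + operator can be avoided by keeping the weights in their long form (one row per team and key) and unpivoting the players, as in the first query of this answer. The following is a sketch under the same sample data. The aliases player and total are illustrative.

```sql
SELECT p.name AS player,
       w.team,
       -- One product per attribute, summed per player-team pair.
       SUM(p.value * w.weight) AS total
FROM (
       SELECT name, key, value
       FROM   players
       UNPIVOT ( value FOR key IN (age, height, free_throw_perc) )
     ) p
INNER JOIN weights w
  ON ( p.key = w.key )
GROUP BY p.name, w.team
ORDER BY p.name, w.team;
```

Joining on key alone pairs every player with every team, so the totals should agree with the TOTAL column of the cross-join output above. Only the UNPIVOT column list changes when the attributes change.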
select
case
when code = number_1
then number_1=code
when count(code)>=count(number_1)
then number_1 = sum(code)
else 'Null'
end
from table_1, table_2
ORDER BY code, number_1 ;
Table 1
| code | value |
|------|-------|
| 0 | None |
| 1 | R |
| 2 | W |
| 4 | C |
| 8 | D |
| 16 | U |
| 32 | Uown |
Table 2
number
0
1
2
3
4
5
8
12
13
16
20
25
26
27
32
43
44
45
60
61
62
63
64
68
70
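The expected output is:
| number | output |
|--------|--------|
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 2,1 |
| 4 | 4 |
| 7 | 4,2,1 |
| 8 | 8 |
| 16 | 16 |
| 32 | 32 |
| 43 | 32,8,2,1 |
| 63 | 32,16,8,4,2,1 |
| 64 | null |
| 70 | null |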
You can use the BITAND function in the JOIN condition:
SELECT t2."NUMBER",
CASE SUM(t1.code)
WHEN t2."NUMBER"
THEN LISTAGG(t1.code, ',') WITHIN GROUP (ORDER BY t1.code DESC)
END AS output,
CASE SUM(t1.code)
WHEN t2."NUMBER"
THEN LISTAGG(t1.value, ',') WITHIN GROUP (ORDER BY t1.code DESC)
END AS value_output
FROM table_2 t2
INNER JOIN table_1 t1
ON ( t2."NUMBER" = t1.code
OR (t1.code > 0 AND BITAND(t2."NUMBER", t1.code) = t1.code))
GROUP BY t2."NUMBER"
Which, for the sample data:
CREATE TABLE table_1 (code, value) AS
SELECT 0, 'None' FROM DUAL UNION ALL
SELECT 1, 'R' FROM DUAL UNION ALL
SELECT 2, 'W' FROM DUAL UNION ALL
SELECT 4, 'C' FROM DUAL UNION ALL
SELECT 8, 'D' FROM DUAL UNION ALL
SELECT 16, 'U' FROM DUAL UNION ALL
SELECT 32, 'Uown' FROM DUAL;
CREATE TABLE Table_2 ("NUMBER") AS
SELECT COLUMN_VALUE
FROM SYS.ODCINUMBERLIST(
0,1,2,3,4,5,8,12,13,16,20,25,26,27,32,43,44,45,60,61,62,63,64,68,70
);
Outputs:
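| NUMBER | OUTPUT | VALUE_OUTPUT |
|--------|--------|--------------|
| 0 | 0 | None |
| 1 | 1 | R |
| 2 | 2 | W |
| 3 | 2,1 | W,R |
| 4 | 4 | C |
| 5 | 4,1 | C,R |
| 8 | 8 | D |
| 12 | 8,4 | D,C |
| 13 | 8,4,1 | D,C,R |
| 16 | 16 | U |
| 20 | 16,4 | U,C |
| 25 | 16,8,1 | U,D,R |
| 26 | 16,8,2 | U,D,W |
| 27 | 16,8,2,1 | U,D,W,R |
| 32 | 32 | Uown |
| 43 | 32,8,2,1 | Uown,D,W,R |
| 44 | 32,8,4 | Uown,D,C |
| 45 | 32,8,4,1 | Uown,D,C,R |
| 60 | 32,16,8,4 | Uown,U,D,C |
| 61 | 32,16,8,4,1 | Uown,U,D,C,R |
| 62 | 32,16,8,4,2 | Uown,U,D,C,W |
| 63 | 32,16,8,4,2,1 | Uown,U,D,C,W,R |
| 68 | null | null |
| 70 | null | null |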
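The join condition keeps a code when every bit of that code is also set in the number. For example, 43 is 32 + 8 + 2 + 1, so BITAND(43, 8) = 8 and code 8 is kept, while BITAND(43, 4) = 0 and code 4 is not. The CASE then checks that the kept codes add up to the number. For 68 only code 4 matches, the sum is 4, and the result is null. An INNER JOIN drops any number with no matching code at all, which is why 64 is absent from the output above although the expected output lists it with null. A possible variant, sketched on the same tables, uses an outer join so that such numbers are kept:

```sql
SELECT t2."NUMBER",
       -- SUM is NULL when no code matched, so the CASE yields NULL for that row.
       CASE SUM(t1.code)
         WHEN t2."NUMBER"
         THEN LISTAGG(t1.code, ',') WITHIN GROUP (ORDER BY t1.code DESC)
       END AS output
FROM   table_2 t2
LEFT OUTER JOIN table_1 t1
  ON ( t2."NUMBER" = t1.code
       OR (t1.code > 0 AND BITAND(t2."NUMBER", t1.code) = t1.code))
GROUP BY t2."NUMBER"
ORDER BY t2."NUMBER";
```

The rest of the logic is unchanged. The ORDER BY is added only to make the result easier to compare with the expected output.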
I don't know if the title describes my requirements. I have a table C_Bpartner (with C_BPartner_ID as a Primary Key) for employees like this:
name Hiringorderno Orderdate C_Bpartner_ID
A 30 25/02/2002 100
B 47 13/10/2005 101
D 110 22/09/2010 105
and other tables like emp_training:
C_Bpartner_ID TrainingOrderno Orderdate
100 46 14/05/2012
100 58 10/07/2013
101 76 22/10/2015
and emp_penalty:
C_Bpartner_ID PenaltyOrderno Orderdate
105 133 14/05/2012
101 153 25/03/2018
I want the resulting table to be like:
name orderno Orderdate C_Bpartner_ID
A 30 25/02/2002 100
A 46 14/05/2012 100
A 58 10/07/2013 100
B 47 13/10/2005 101
B 76 22/10/2015 101
B 153 25/03/2018 101
D 110 22/09/2010 105
D 133 14/05/2012 105
So, I joined C_BPartner with itself and coalesced the results in order to get a second record for the same C_BPartner_ID. Then I tried to get the Hiringorderno from C_BPartner bp, join C_BPartner pp with emp_penalty pt (as an example) to get PenaltyOrderno,
and combine them with coalesce(bp.Hiringorderno,pt.PenaltyOrderno), doing the same for all the other tables and for Orderdate as well. But it doesn't duplicate records: it picks the first coalesce parameter and discards the other, like this:
coalesce(bp.name,pp.name) coalesce(bp.Hiringorderno,pt.PenaltyOrderno) Hiringorderno PenaltyOrderno
A 30 30 null
B 47 47 153
The emp_penalty record for B is not there.
There are other ways to do this, but I think the clearest and most intuitive way is to UNION the 3 queries that you're trying to do.
select name, hiringorderno as orderno, orderdate, C_Bpartner_ID, 'HIRING' as ordertype, null as emp_penalty_ID
from C_Bpartner
union all
select bp.name, trainingorderno, t.orderdate, bp.C_Bpartner_ID, 'TRAINING', null
from emp_training t
join C_Bpartner bp
on bp.C_Bpartner_ID = t.C_Bpartner_ID
union all
select bp.name, PenaltyOrderno, p.orderdate, bp.C_Bpartner_ID, 'PENALTY', p.emp_penalty_ID
from emp_penalty p
join C_Bpartner bp
on bp.C_Bpartner_ID = p.C_Bpartner_ID
;
Edit: I added 2 columns to show 2 common ways to differentiate the union'ed records.
One way is to add a constant string or number to each select statement - that way you can use CASE WHEN ordertype = 'PENALTY' ... or WHERE ordertype = 'TRAINING' to filter your records.
Another way, like you mentioned, is to fill in a column for one of the selects, like emp_penalty_id, but set it to null for the other select statements.
All the select statements being unioned together need to have the same number of columns, with compatible types. The first select statement defines the column names and types for the rest, which is why I didn't need to add column aliases to the second and third selects.
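As a sketch of how the constant ordertype column can be used, the union can be wrapped in a subquery and filtered from the outside. The alias all_orders is only illustrative, and the emp_penalty_ID column is left out here because the question's tables do not show it.

```sql
SELECT name, orderno, orderdate, ordertype
FROM (
  -- The first branch names the columns for the whole union.
  SELECT name, hiringorderno AS orderno, orderdate, C_Bpartner_ID, 'HIRING' AS ordertype
  FROM C_Bpartner
  UNION ALL
  SELECT bp.name, t.trainingorderno, t.orderdate, bp.C_Bpartner_ID, 'TRAINING'
  FROM emp_training t
  JOIN C_Bpartner bp
    ON bp.C_Bpartner_ID = t.C_Bpartner_ID
  UNION ALL
  SELECT bp.name, p.penaltyorderno, p.orderdate, bp.C_Bpartner_ID, 'PENALTY'
  FROM emp_penalty p
  JOIN C_Bpartner bp
    ON bp.C_Bpartner_ID = p.C_Bpartner_ID
) all_orders
-- Keep only the training and penalty rows.
WHERE ordertype <> 'HIRING'
ORDER BY name, orderdate;
```

UNION ALL is used rather than UNION because the three sources cannot produce rows that should be merged, and it avoids the duplicate-elimination step.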
One option is to union 3 queries:
SQL> with
2 c_bpartner (name, hiringorderno, orderdate, c_bpartner_id) as
3 (select 'A', 30, date '2002-02-25', 100 from dual union all
4 select 'B', 47, date '2005-10-13', 101 from dual union all
5 select 'D', 110,date '2010-09-22', 105 from dual
6 ),
7 emp_training(c_bpartner_id, trainingorderno, orderdate) as
8 (select 100, 46, date '2012-05-14' from dual union all
9 select 100, 58, date '2013-07-10' from dual union all
10 select 101, 76, date '2015-10-22' from dual
11 ),
12 emp_penalty (c_bpartner_id, penaltyorderno, orderdate) as
13 (select 105, 133, date '2012-05-14' from dual union all
14 select 101, 153, date '2018-03-25' from dual
15 )
16 select c.name, c.hiringorderno as orderno, c.orderdate, c.c_bpartner_id
17 from c_bpartner c
18 union all
19 select c.name, t.trainingorderno, t.orderdate, t.c_bpartner_id
20 from c_bpartner c join emp_training t on t.c_bpartner_id = c.c_bpartner_id
21 union all
22 select c.name, p.penaltyorderno, p.orderdate, p.c_bpartner_id
23 from c_bpartner c join emp_penalty p on p.c_bpartner_id = c.c_bpartner_id
24 order by 1, 2;
N ORDERNO ORDERDATE C_BPARTNER_ID
- ---------- ---------- -------------
A 30 25/02/2002 100
A 46 14/05/2012 100
A 58 10/07/2013 100
B 47 13/10/2005 101
B 76 22/10/2015 101
B 153 25/03/2018 101
D 110 22/09/2010 105
D 133 14/05/2012 105
8 rows selected.
SQL>
I have the table below:
student marks subject
------- ----- -------
AAA     67    ENG
AAA     78    MAT
CCC     88    SCI
I want to convert it as below:
student eng mat sci
------- --- --- ---
AAA     67  78
CCC             88
with dat as (
select 'AAA' stud, 67 mk, 'ENG' subj from dual union all
select 'AAA' stud, 78 mk, 'MAT' subj from dual union all
select 'CCC' stud, 88 mk, 'SCI' subj from dual )
SELECT *
FROM (SELECT stud, mk, subj from dat)
PIVOT (max(mk) for (subj) in ('ENG' as eng, 'MAT' as mat, 'SCI' as sci))
STUD ENG MAT SCI
AAA   67  78
CCC           88
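Where PIVOT is not available, or a more portable form is preferred, the same result can be sketched with conditional aggregation: one MAX(CASE ...) per subject, grouped by student. This uses the same sample data as above.

```sql
with dat as (
  select 'AAA' stud, 67 mk, 'ENG' subj from dual union all
  select 'AAA' stud, 78 mk, 'MAT' subj from dual union all
  select 'CCC' stud, 88 mk, 'SCI' subj from dual
)
select stud,
       -- Each CASE yields the mark only for its own subject, NULL otherwise;
       -- MAX then picks the single non-NULL mark per student, if any.
       max(case when subj = 'ENG' then mk end) as eng,
       max(case when subj = 'MAT' then mk end) as mat,
       max(case when subj = 'SCI' then mk end) as sci
from dat
group by stud
order by stud;
```

As with PIVOT, the list of subjects has to be known when the query is written.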