SQL - BigQuery - Using Group & MAX in several columns - Similar to a pivot table

How would you approach this via SQL? Let's take this example:
| id | type | score_a | score_b | score_c | label_a | label_b | label_c |
|----|------|---------|---------|---------|---------|---------|---------|
| 1 | A | 0.9 | | | L1 | | |
| 1 | B | | 0.7 | | | L2 | |
| 1 | B | | 0.2 | | | L3 | |
| 1 | C | | | 0.2 | | | L4 |
| 1 | C | | | 0.18 | | | L5 |
| 1 | C | | | 0.12 | | | L6 |
| 2 | A | 0.6 | | | L1 | | |
| 2 | A | 0.3 | | | L2 | | |
I want to return the max score per type together with its matching label_X, almost like a pivot table but with these custom column names. The outcome for the above would be:
| id | type | score_a | label_a | score_b | label_b | score_c | label_c |
|----|------|---------|---------|---------|---------|---------|---------|
| 1 | A | 0.9 | L1 | 0.7 | L2 | 0.2 | L4 |
| 2 | A | 0.6 | L1 | NULL | NULL | NULL | NULL |
Something like this is wrong, as it returns one row per label combination rather than one row per id:
SELECT id,
MAX(score_a) as score_a,
label_a,
MAX(score_b) as score_b,
label_b as label_b,
MAX(score_c) as score_c,
label_c
FROM sample_table
GROUP BY id, label_a, label_b, label_c
Is there an easy way to do this via SQL? I'm running it in BigQuery right now and also tried a pivot table as described here, but still had no luck flattening these into one big row with several columns.
Any other ideas?
UPDATE
Expanding on what BGM mentioned about design; the source of this data is a table with the following form:
| id | type | label | score |
|----|------|-------|-------|
| 1 | A | L1 | 0.9 |
| 1 | B | L2 | 0.7 |
| 1 | B | L3 | 0.2 |
| 1 | C | L4 | 0.6 |
| 1 | C | L5 | 0.2 |
That gets converted to the flattened state depicted at the top of this question using a query like:
SELECT id,
type,
MAX(CASE WHEN type = 'A' THEN score ELSE 0 END) as score_a,
MAX(CASE WHEN type = 'B' THEN score ELSE 0 END) as score_b,
MAX(CASE WHEN type = 'C' THEN score ELSE 0 END) as score_c,
MAX(CASE WHEN model_type = 'theme' THEN label_score ELSE 0 END) as
-- labels
(CASE WHEN type = 'A' THEN label ELSE '' END) as label_a,
(CASE WHEN type = 'B' THEN label ELSE '' END) as label_b,
(CASE WHEN type = 'C' THEN label ELSE '' END) as label_c,
FROM table
GROUP BY id, type, label_a, label_b, label_c
Do you think the intermediate step is unnecessary to get to the final solution?

You can do conditional aggregation. In BigQuery, arrays come in handy for this:
select
id,
max(score_a) score_a,
array_agg(label_a order by score_a desc limit 1)[offset(0)] label_a,
max(score_b) score_b,
array_agg(label_b order by score_b desc limit 1)[offset(0)] label_b,
max(score_c) score_c,
array_agg(label_c order by score_c desc limit 1)[offset(0)] label_c
from mytable
group by id
Note: in terms of design, you should not have multiple columns to store the scores and labels per type; you already have a column that represents the type, so you should have just two columns, one for the score and one for the label.
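To illustrate the design note, here is a minimal sketch that goes straight from the narrow (id, type, label, score) form to one row per id, without the intermediate wide table. It runs against an in-memory SQLite database as a stand-in for BigQuery, so the table name `scores` and the use of ROW_NUMBER() in place of ARRAY_AGG are assumptions made for the example, not part of the original answer. It needs a SQLite build with window functions (3.25 or later).

```python
import sqlite3

# In-memory SQLite stands in for BigQuery; the table name `scores` is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, type TEXT, label TEXT, score REAL)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?)",
    [
        (1, "A", "L1", 0.9),
        (1, "B", "L2", 0.7),
        (1, "B", "L3", 0.2),
        (1, "C", "L4", 0.2),
        (1, "C", "L5", 0.18),
        (1, "C", "L6", 0.12),
        (2, "A", "L1", 0.6),
        (2, "A", "L2", 0.3),
    ],
)

PIVOT_SQL = """
WITH ranked AS (
    -- rn = 1 marks the highest-scoring row of each (id, type) pair;
    -- ties on score are broken arbitrarily.
    SELECT id, type, label, score,
           ROW_NUMBER() OVER (PARTITION BY id, type ORDER BY score DESC) AS rn
    FROM scores
)
SELECT id,
       MAX(CASE WHEN type = 'A' THEN score END) AS score_a,
       MAX(CASE WHEN type = 'A' THEN label END) AS label_a,
       MAX(CASE WHEN type = 'B' THEN score END) AS score_b,
       MAX(CASE WHEN type = 'B' THEN label END) AS label_b,
       MAX(CASE WHEN type = 'C' THEN score END) AS score_c,
       MAX(CASE WHEN type = 'C' THEN label END) AS label_c
FROM ranked
WHERE rn = 1
GROUP BY id
ORDER BY id
"""

pivot_rows = conn.execute(PIVOT_SQL).fetchall()
for row in pivot_rows:
    print(row)
```

Because only one row per (id, type) survives the rn = 1 filter, each MAX(CASE ...) simply picks that row's value, and types with no rows for an id come back as NULL.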

Related

SQL Select random rows partitioned by a column

I have a dataset that looks like this:
| Country | id |
-------------------
| a | 5 |
| a | 1 |
| a | 2 |
| b | 1 |
| b | 5 |
| b | 4 |
| b | 7 |
| c | 5 |
| c | 1 |
| c | 2 |
and I need a query that returns 2 random rows for each country in ('a', 'c'):
| Country | id |
------------------
| a | 2 | -- Two random rows from Country = 'a'
| a | 1 |
| c | 1 |
| c | 5 | --Two random rows from Country = 'c'
This should work:
select Country, id from
(select Country,
id,
row_number() over(partition by Country order by rand()) as rn
from table_name
) t
where Country in ('a', 'c') and rn <= 2
Replace rand() with random() if you're using Postgres or newid() in SQL Server.
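As a quick check of the approach, the sketch below runs the same idea against an in-memory SQLite database, which uses random() like Postgres. One deliberate difference, which is my assumption rather than part of the answer: the country filter is moved into the inner query so that only the requested partitions are numbered. The result is the same either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (Country TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [
        ("a", 5), ("a", 1), ("a", 2),
        ("b", 1), ("b", 5), ("b", 4), ("b", 7),
        ("c", 5), ("c", 1), ("c", 2),
    ],
)

SAMPLE_SQL = """
SELECT Country, id
FROM (
    -- Number the rows of each country in a random order.
    SELECT Country, id,
           ROW_NUMBER() OVER (PARTITION BY Country ORDER BY random()) AS rn
    FROM table_name
    WHERE Country IN ('a', 'c')
) AS t
WHERE rn <= 2
ORDER BY Country
"""

sample_rows = conn.execute(SAMPLE_SQL).fetchall()
for row in sample_rows:
    print(row)  # two rows for 'a' and two for 'c'; the ids vary between runs
```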

Boolean Condition Group By, Window Functions

I am trying to add a new column based on a condition in a group.
Could we do something like
BOOL() OVER (PARTITION BY id 'D' in val)
That is, something like GROUP BY id followed by a check for whether the value 'D' appears in the val column.
Input:
-------------
| id | val |
------------|
| 1 | A |
| 1 | B |
| 1 | D |
| 2 | B |
| 2 | C |
| 2 | A |
Output
-------------------
| id | val | res |
------------|-----|
| 1 | A | 1 |
| 1 | B | 1 |
| 1 | D | 1 |
| 2 | B | 0 |
| 2 | C | 0 |
| 2 | A | 0 |
You didn't specify your DBMS, but in standard ANSI SQL, you can use a filter() clause:
count(*) filter (where val = 'D') over (partition by id) > 0
In Postgres, you can use bool_or() for this:
select t.*, bool_or(val = 'D') over(partition by id) res
from mytable t
Demo on DB Fiddle:
id | val | res
-: | :-- | :--
1 | A | t
1 | B | t
1 | D | t
2 | B | f
2 | C | f
2 | A | f
This gives you a boolean result. If you want it as an integer value instead, then:
(bool_or(val = 'D') over(partition by id))::int
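For databases without bool_or(), a windowed MAX over a CASE expression gives the same 0/1 flag. The sketch below is a portable variant, not the answer's own query, run against in-memory SQLite (3.25 or later for window functions) with the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "D"), (2, "B"), (2, "C"), (2, "A")],
)

FLAG_SQL = """
SELECT id, val,
       -- 1 if any row in the same id partition has val = 'D', else 0
       MAX(CASE WHEN val = 'D' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS res
FROM mytable
ORDER BY id, val
"""

flag_rows = conn.execute(FLAG_SQL).fetchall()
for row in flag_rows:
    print(row)
```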

Exclude select duplication

There is a table: prov_dl
| ID | Code | Value |
+----+----------+-------+
| 2 | PRC | 0,1701|
| 2 | Stad | 3 |
The data is stored in this form, that is, there are several entries per ID, one for each code.
I need to pull the data in this form:
| ID | Stadya | Percent |
+----+----------+-----------+
| 2 | 3 | 0,1701 |
I tried this:
select id,
case when code='Stad' then Value end Stadya,
case when code='PRC' then Value end Percent
from prov_dl
| ID | Stadya | Percent|
+----+----------+--------+
| 2 | | 0,1701 |
| 2 | 3 | |
Use max() and group by id:
select id,
max(case when code='Stad' then Value end) as Stadya,
max(case when code='PRC' then Value end) as Percent
from prov_dl group by id
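The reason max() works is that aggregates skip NULLs, so the single non-NULL value per code survives the grouping. A small sketch against in-memory SQLite, using only the two rows from the question (the column types are assumed to be text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prov_dl (id INTEGER, code TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO prov_dl VALUES (?, ?, ?)",
    [(2, "PRC", "0,1701"), (2, "Stad", "3")],
)

# Without aggregation: one output row per input row, padded with NULLs.
raw_rows = conn.execute("""
    SELECT id,
           CASE WHEN code = 'Stad' THEN Value END AS Stadya,
           CASE WHEN code = 'PRC' THEN Value END AS Percent
    FROM prov_dl
    ORDER BY code
""").fetchall()

# With max() and GROUP BY: the NULLs are ignored and the rows collapse into one.
grouped_rows = conn.execute("""
    SELECT id,
           MAX(CASE WHEN code = 'Stad' THEN Value END) AS Stadya,
           MAX(CASE WHEN code = 'PRC' THEN Value END) AS Percent
    FROM prov_dl
    GROUP BY id
""").fetchall()

print(raw_rows)
print(grouped_rows)
```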

SQL Aggregation depending on value of attribute in unselected column

I've got a table TABLE1 like this:
|--------------|--------------|--------------|
| POS | UNIT | VOLUME |
|--------------|--------------|--------------|
| 1 | M2 | 20 |
| 1 | M2 | 30 |
| 1 | M3 | 40 |
| 2 | M2 | 100 |
| 2 | M3 | 20 |
| 3 | ST | 30 |
| 3 | M2 | 10 |
|--------------|--------------|--------------|
Depending on the value of the column UNIT I want to aggregate as follows (each UNIT becomes a new column with the sum of the according value):
|--------------|--------------|--------------|--------------|
| POS | VOLUME_M2 | VOLUME_M3 | VOLUME_ST |
|--------------|--------------|--------------|--------------|
| 1 | 50 | 40 | 0 |
| 2 | 100 | 20 | 0 |
| 3 | 10 | 0 | 30 |
|--------------|--------------|--------------|--------------|
My Solution is
SELECT POS,
CASE
WHEN UNIT = 'M2'
THEN SUM(VOLUME)
ELSE 0
END AS VOLUME_M2,
CASE
WHEN UNIT = 'M3'
THEN SUM(VOLUME)
ELSE 0
END AS VOLUME_M3,
CASE
WHEN UNIT = 'ST'
THEN SUM(VOLUME)
ELSE 0
END AS VOLUME_ST
FROM TABLE1
GROUP BY POS, UNIT
My problem is that my code does not work if I leave UNIT out of the GROUP BY clause (I have to use it either in an aggregate or in the GROUP BY clause).
Therefore I get something like this:
|--------------|--------------|--------------|--------------|
| POS | VOLUME_M2 | VOLUME_M3 | VOLUME_ST |
|--------------|--------------|--------------|--------------|
| 1 | 50 | 0 | 0 |
| 1 | 0 | 40 | 0 |
| 2 | 0 | 20 | 0 |
| 2 | 100 | 0 | 0 |
| 3 | 10 | 0 | 0 |
| 3 | 0 | 0 | 30 |
|--------------|--------------|--------------|--------------|
Besides, could anyone give me a hint on how to generate this type of result automatically (especially when there are many values for UNIT)?
Close. For conditional aggregation, the case expression is an argument to the aggregation function:
SELECT POS,
SUM(CASE WHEN UNIT = 'M2' THEN VOLUME ELSE 0 END) AS VOLUME_M2,
SUM(CASE WHEN UNIT = 'M3' THEN VOLUME ELSE 0 END) AS VOLUME_M3,
SUM(CASE WHEN UNIT = 'ST' THEN VOLUME ELSE 0 END) AS VOLUME_ST
FROM TABLE1
GROUP BY POS;
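On the second part of the question (many UNIT values): plain SQL needs the output columns spelled out, so one common workaround is to generate the statement from the distinct values. The following is only a sketch of that idea in Python against in-memory SQLite. The variable names and the alphanumeric check are my own assumptions, and many databases offer their own PIVOT or dynamic SQL facilities instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (POS INTEGER, UNIT TEXT, VOLUME INTEGER)")
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?, ?)",
    [
        (1, "M2", 20), (1, "M2", 30), (1, "M3", 40),
        (2, "M2", 100), (2, "M3", 20),
        (3, "ST", 30), (3, "M2", 10),
    ],
)

# Step 1: discover which units exist.
units = [u for (u,) in conn.execute("SELECT DISTINCT UNIT FROM TABLE1 ORDER BY UNIT")]

# Unit values end up inside the SQL text (literal and column name),
# so accept only plain alphanumeric values.
for u in units:
    if not u.isalnum():
        raise ValueError(f"unsupported UNIT value: {u!r}")

# Step 2: build one SUM(CASE ...) column per unit.
columns = ",\n       ".join(
    f"SUM(CASE WHEN UNIT = '{u}' THEN VOLUME ELSE 0 END) AS VOLUME_{u}"
    for u in units
)
dynamic_sql = f"SELECT POS,\n       {columns}\nFROM TABLE1\nGROUP BY POS\nORDER BY POS"

# Step 3: run the generated statement.
cursor = conn.execute(dynamic_sql)
column_names = [d[0] for d in cursor.description]
volume_rows = cursor.fetchall()

print(dynamic_sql)
print(column_names)
for row in volume_rows:
    print(row)
```

The generated statement has the same shape as the hand-written query above, with one column per distinct UNIT found in the table.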

Encapsulated Group by or conditional aggregation in Vertica DB

I have the following table in a Vertica DB:
+---------+-------+
| Readout | Event |
+---------+-------+
| 1 | A |
| 1 | B |
| 1 | A |
| 2 | B |
| 2 | A |
+---------+-------+
I would like to group each readout and count the frequency of the events, resulting in a table like this:
+---------+----------------+----------------+-----------------+
| Readout | Count(Readout) | Count(Event A) | Count (Event B) |
+---------+----------------+----------------+-----------------+
| 1 | 3 | 2 | 1 |
| 2 | 2 | 1 | 1 |
+---------+----------------+----------------+-----------------+
I am sure there is an easy GROUP BY command, but I can't wrap my head around it.
You want conditional aggregation:
select readout, count(*),
sum(case when event = 'A' then 1 else 0 end) as num_a,
sum(case when event = 'B' then 1 else 0 end) as num_b
from t
group by readout;