Use a CASE expression without typing matched conditions manually in PostgreSQL

I have a long and wide table; the following is just an example. The structure might look a bit horrible in SQL, but I was wondering whether there's a way to extract each ID's price with a CASE expression without manually typing the column names to match against.
IDs | A_Price | B_Price | C_Price | ...
A   | 23      |         |         | ...
B   |         | 65      |         | ...
C   |         |         | 82      | ...
A   | 10      |         |         | ...
..  | ...     | ...     | ...     | ...
The table I want to achieve:
IDs | price
A   | 23;10
B   | 65
C   | 82
..  | ...
I tried:
SELECT IDs, string_agg(CASE IDs WHEN 'A' THEN A_Price
                                WHEN 'B' THEN B_Price
                                WHEN 'C' THEN C_Price
                       END::text, ';') AS price
FROM table
GROUP BY IDs
ORDER BY IDs
To avoid typing A, B, A_Price, B_Price, etc., I tried to build the column names as strings and use them from a subquery, but it seems SQL treats them as plain text values rather than column references, so it cannot fetch the corresponding values.
WITH CTE AS (
    SELECT IDs, IDs || '_Price' AS t FROM ID_list
)
SELECT IDs, string_agg(CASE IDs WHEN CTE.IDs THEN CTE.t
                       END::text, ';') AS price
FROM table
LEFT JOIN CTE ON CTE.IDs = table.IDs
GROUP BY IDs
ORDER BY IDs

You can use a document type like json or hstore as a stepping stone:
Basic query:
SELECT t.ids
     , to_json(t.*) ->> (t.ids || '_price') AS price
FROM   tbl t;
to_json() converts the whole row to a JSON object, which you can then pick a (dynamically concatenated) key from.
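Note that ->> matches the key case-sensitively, and Postgres folds unquoted column names to lowercase. So if the column is actually stored as a_price while the ids values are upper-case ('A'), you would need to lowercase the concatenated key, e.g.:
SELECT t.ids
     , to_json(t.*) ->> (lower(t.ids) || '_price') AS price
FROM   tbl t;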
Your aggregation:
SELECT t.ids
     , string_agg(to_json(t.*) ->> (t.ids || '_price'), ';') AS prices
FROM   tbl t
GROUP  BY 1
ORDER  BY 1;
Converting the whole (big?) row adds some overhead, but you have to read the whole table for your query anyway.
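The hstore route looks almost the same; a minimal sketch, assuming the additional module hstore is installed (CREATE EXTENSION hstore). hstore stores all values as text, so -> already returns text and no cast is needed:
SELECT t.ids
     , hstore(t) -> (t.ids || '_price') AS price
FROM   tbl t;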

A union would be one approach here:
SELECT IDs, A_Price FROM yourTable WHERE A_Price IS NOT NULL
UNION ALL
SELECT IDs, B_Price FROM yourTable WHERE B_Price IS NOT NULL
UNION ALL
SELECT IDs, C_Price FROM yourTable WHERE C_Price IS NOT NULL;
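If you then want the semicolon-separated list from the question, you can aggregate over the union; a sketch, reusing the same placeholder names:
SELECT IDs, string_agg(price::text, ';') AS price
FROM (
    SELECT IDs, A_Price AS price FROM yourTable WHERE A_Price IS NOT NULL
    UNION ALL
    SELECT IDs, B_Price FROM yourTable WHERE B_Price IS NOT NULL
    UNION ALL
    SELECT IDs, C_Price FROM yourTable WHERE C_Price IS NOT NULL
) u
GROUP BY IDs
ORDER BY IDs;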

Related

Add a column to BigQuery results that adds a description for an ID in a column from the results

I am using BQ to pull some data and I need to add a column to the results that includes a lookup.
SELECT
timestamp_trunc(a.timestamp,day) date,
a.custom_parameter1,
a.custom_parameter2,
a.score,
a.type,
b.ref
FROM
`data-views_batch_20221021` a
left outer join (
  select client_uuid, STRING_AGG(document_referrer, "," LIMIT 1) ref
  from `activities_batch_20221021`
  where app_id="12345" and document_referrer is not null
  group by client_uuid
) b using (client_uuid)
WHERE
a.app_id="12345"
How can I add a column that takes the array in a.type and looks up each value in the dict? I currently do this in Python against a dict, but I want to include it in the query.
The dict is:
{23:"Description1", 24:"Description2", 25:"Description3"}
I don't have these values in a table within BQ, can I include it within the query? There are about 14 total descriptions to map.
My end result would look like this:
date | custom_parameter1 | custom_parameter2 | score | types | ref | type_descriptions
Edited to add that types is an array.
I don't have these values in a table within BQ, can I include it within the query?
Yes, you can define them in a CTE, as in the example below:
with dict as (
select 23 type, "description1" type_description union all
select 24, "description2" union all
select 25, "description3"
)
select
timestamp_trunc(a.timestamp,day) date,
a.custom_parameter1,
a.custom_parameter2,
a.score,
a.type,
b.ref,
type_description
from `data-views_batch_20221021` a
left outer join (
select client_uuid, string_agg(document_referrer, "," limit 1) ref
from `activities_batch_20221021`
where app_id="12345" and document_referrer is not null
group by client_uuid
) b using (client_uuid)
left join dict using (type)
where a.app_id="12345"
There are about 14 total descriptions to map
You can add as many rows to the dict CTE as you need.
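Since the question's edit says types is an array, one hedged sketch for the array case - still using the dict CTE above - is a correlated ARRAY subquery that unnests the array and collects the matching descriptions:
select
  a.*,
  array(
    select d.type_description
    from unnest(a.type) t
    join dict d on d.type = t
  ) as type_descriptions
from `data-views_batch_20221021` a
where a.app_id = "12345"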

Efficiently outer join two array columns row-wise in BigQuery table

I'll first state the question as simply as possible, then elaborate with more detail and an example.
Concise question without context
I have a table with rows containing columns of arrays. I need to outer join the elements of some pairs of these, compute some variables, and then aggregate the results back into a new array. I'm currently using a pattern where I:
1. unnest each column in the pair to be joined (cross join to the PK of the row)
2. full outer join the two on the PK and compute the desired fields
3. group by the PK to get back to a single row with an array column that summarizes the results
Is there a way to do this without the multiple unnesting and grouping back down?
More context and an example
I have a table which represents edits to an entity that is made up of multiple sub-records. Each row represents a single entity. There is a column before that contains the records before the edit, and another after that contains the records afterwards.
My goal is to label each sub-record with exactly one of the four valid edit types:
DELETE - record exists in before but not after
ADD - record exists in after but not before
EDIT - record exists in both before and after but any field was changed
NONE - record exists in both before and after and no fields were changed
Each of the sub-record values is represented by its ID and a hash of all of its fields. I've created some fake data and provided my initial implementation below. This works, but it seems very roundabout.
WITH source_data AS (
  SELECT
    1 AS pkField,
    [
      STRUCT(1 AS id, 1 AS fieldHash),
      STRUCT(2 AS id, 2 AS fieldHash),
      STRUCT(3 AS id, 3 AS fieldHash)
    ] AS before,
    [
      STRUCT(1 AS id, 1 AS fieldHash),
      STRUCT(2 AS id, 0 AS fieldHash), -- record 2 edited
      -- record 3 deleted
      STRUCT(4 AS id, 4 AS fieldHash), -- record 4 added
      STRUCT(5 AS id, 5 AS fieldHash)  -- record 5 added
    ] AS after
)
SELECT
  pkField,
  ARRAY_AGG(STRUCT(
    id,
    CASE
      WHEN beforeHash IS NULL THEN "ADD"
      WHEN afterHash IS NULL THEN "DELETE"
      WHEN beforeHash <> afterHash THEN "EDIT"
      ELSE "NONE"
    END AS editType
  )) AS edits
FROM (
  SELECT pkField, id, fieldHash AS beforeHash
  FROM source_data
  CROSS JOIN UNNEST(source_data.before)
)
FULL OUTER JOIN (
  SELECT pkField, id, fieldHash AS afterHash
  FROM source_data
  CROSS JOIN UNNEST(source_data.after)
)
USING (pkField, id)
GROUP BY pkField
Is there a simpler and/or more efficient way to do this? Perhaps something that avoids the multiple unnesting and grouping back down?
I think what you have is already a simple and efficient way! Meanwhile, you can consider the optimized version below:
select pkField,
       array(select struct(
                      id,
                      case
                        when b.fieldHash is null then 'ADD'
                        when a.fieldHash is null then 'DELETE'
                        when b.fieldHash != a.fieldHash then 'EDIT'
                        else 'NONE'
                      end as editType
                    ) edits
             from (select id, fieldHash from t.before) b
             full outer join (select id, fieldHash from t.after) a
             using(id)
       ) edits
from source_data t
Applied to the sample data in your question, this produces the expected output: per the annotations in the sample, id 1 → NONE, 2 → EDIT, 3 → DELETE, 4 → ADD, 5 → ADD.

How to unnest BigQuery nested records into multiple columns

I am trying to unnest the table below.
I am using the unnest query below to flatten the table:
SELECT
  id,
  name, keyword
FROM `project_id.dataset_id.table_id`,
  unnest(`groups`) as `groups`
where id = 204358
The problem is, this duplicates the rows (except name), as is the case when flattening a table. How can I modify the query to put the names in two different columns rather than rows?
Expected output below -
That's because the comma is a cross join - in combination with an unnested array it is a lateral cross join. You repeat the parent row for every row in the array.
One problem with pivoting arrays is that an array can have a variable number of rows, but a table must have a fixed number of columns. So you need a way to decide which row becomes which column.
E.g. with
SELECT
  id,
  name,
  groups[ordinal(1)] as firstArrayEntry,
  groups[ordinal(2)] as secondArrayEntry,
  keyword
FROM `project_id.dataset_id.table_id`,
  unnest(groups)
where id = 204358
If your array had a key-value pair you could decide using the key. E.g.
SELECT
  id,
  name,
  (select value from unnest(groups) where key='key1') as key1,
  keyword
FROM `project_id.dataset_id.table_id`,
  unnest(groups)
where id = 204358
But that doesn't seem to be the case with your table ...
A third option could be PIVOT in combination with your cross-join solution, but this one has restrictions too, and I'm not sure how computation-heavy it is.
Consider the simple solution below:
select * from (
  select id, name, keyword, offset
  from `project_id.dataset_id.table_id`,
  unnest(`groups`) with offset
) pivot (max(name) name for offset + 1 in (1, 2))
Applied to the sample data in your question, this produces the expected pivoted output.
Note: when you apply this to your real case, you just need to know how many name_NNN columns to expect and extend the list accordingly - for example, for offset + 1 in (1, 2, 3, 4, 5) if you expect 5 such columns.
If for whatever reason you want to improve on this, use the version below, where everything is built dynamically, so you don't need to know in advance how many columns the output will have:
execute immediate (select '''
select * from (
  select id, name, keyword, offset
  from `project_id.dataset_id.table_id`,
  unnest(`groups`) with offset
) pivot (max(name) name for offset + 1 in (''' || string_agg('' || pos, ', ') || '''))
'''
from (select pos from (
    select max(array_length(`groups`)) cnt
    from `project_id.dataset_id.table_id`
  ), unnest(generate_array(1, cnt)) pos
))
Your question is a little unclear, because it does not specify what to do with other keywords or other columns. If you specifically want the first two values in the array for keyword "OVG", you can unnest the array and pull out the appropriate names:
SELECT id,
       (SELECT g.name
        FROM UNNEST(t.groups) g WITH OFFSET n
        WHERE key = 'OVG'
        ORDER BY n
        LIMIT 1
       ) as name_1,
       (SELECT g.name
        FROM UNNEST(t.groups) g WITH OFFSET n
        WHERE key = 'OVG'
        ORDER BY n
        LIMIT 1 OFFSET 1
       ) as name_2,
       'OVG' as keyword
FROM `project_id.dataset_id.table_id` t
WHERE id = 204358;

SQL unique combinations

I have a table with three columns with an ID, a therapeutic class, and then a generic name. A therapeutic class can be mapped to multiple generic names.
ID  therapeutic_class  generic_name
1   YG4                insulin
1   CJ6                maleate
1   MG9                glargine
2   C4C                diaoxy
2   KR3                supplies
3   YG4                insulin
3   CJ6                maleate
3   MG9                glargine
I need to first look at the individual combinations of therapeutic class and generic name, and then count how many patients have the same combination. I want my output to have three columns: the combo of generic names, the combo of therapeutic classes, and the count of patients with that combination, like this:
Count  Combination_generic         combination_therapeutic
2      insulin, maleate, glargine  YG4, CJ6, MG9
1      supplies, diaoxy            C4C, KR3
One way to match patients by the sets of pairs (therapeutic_class, generic_name) is to create the comma-separated strings in your desired output, and to group by them and count. To do this right, you need a way to identify the pairs. See my Comment under the original question and my Comments to Gordon's Answer to understand some of the issues.
I do this identification in some preliminary work in the solution below. As I mentioned in my Comment, it would be better if the pairs and unique ID's existed already in your data model; I create them on the fly.
Important note: This assumes the comma-separated lists don't become too long. If you exceed 4000 characters (or approx. 32000 characters in Oracle 12, with certain options turned on), you CAN aggregate the strings into CLOBs, but you CAN'T GROUP BY CLOBs (in general, not just in this case), so this approach will fail. A more robust approach is to match the sets of pairs, not some aggregation of them. The solution is more complicated, I will not cover it unless it is needed in your problem.
with
-- Begin simulated data (not part of the solution)
test_data ( id, therapeutic_class, generic_name ) as (
    select 1, 'GY6', 'insulin'  from dual union all
    select 1, 'MH4', 'maleate'  from dual union all
    select 1, 'KJ*', 'glargine' from dual union all
    select 2, 'GY6', 'supplies' from dual union all
    select 2, 'C4C', 'diaoxy'   from dual union all
    select 3, 'GY6', 'insulin'  from dual union all
    select 3, 'MH4', 'maleate'  from dual union all
    select 3, 'KJ*', 'glargine' from dual
),
-- End of simulated data (for testing purposes only).
-- SQL query solution continues BELOW THIS LINE
valid_pairs ( pair_id, therapeutic_class, generic_name ) as (
    select rownum, therapeutic_class, generic_name
    from (
        select distinct therapeutic_class, generic_name
        from test_data
    )
),
first_agg ( id, tc_list, gn_list ) as (
    select t.id,
           listagg(p.therapeutic_class, ',') within group (order by p.pair_id),
           listagg(p.generic_name, ',') within group (order by p.pair_id)
    from test_data t join valid_pairs p
        on  t.therapeutic_class = p.therapeutic_class
        and t.generic_name = p.generic_name
    group by t.id
)
select count(*) as cnt, tc_list, gn_list
from first_agg
group by tc_list, gn_list
;
Output:
CNT  TC_LIST      GN_LIST
---  -----------  ------------------------
  1  GY6,C4C      supplies,diaoxy
  2  GY6,KJ*,MH4  insulin,glargine,maleate
You are looking for listagg() and then another aggregation. I think:
select therapeutics, generics, count(*)
from (select id, listagg(therapeutic_class, ', ') within group (order by therapeutic_class) as therapeutics,
listagg(generic_name, ', ') within group (order by generic_name) as generics
from t
group by id
) t
group by therapeutics, generics;
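One caveat about this shorter version (likely among the issues referenced in the Comments mentioned above): the two listagg calls are ordered independently, so two different pairings - e.g. (YG4, insulin),(CJ6, maleate) versus (YG4, maleate),(CJ6, insulin) - collapse to the same pair of strings and get counted together. Ordering both lists by the same keys keeps class and name aligned; a sketch:
select therapeutics, generics, count(*)
from (select id,
             listagg(therapeutic_class, ', ') within group (order by therapeutic_class, generic_name) as therapeutics,
             listagg(generic_name, ', ') within group (order by therapeutic_class, generic_name) as generics
      from t
      group by id
     ) t
group by therapeutics, generics;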

Group rows with similar strings

I have searched a lot, but most solutions are about concatenation, which is not what I want.
I have a table called X (in a Postgres database):
anm_id  anm_category  anm_sales
1       a_dog         100
2       b_dog         50
3       c_dog         60
4       a_cat         70
5       b_cat         80
6       c_cat         40
I want to get total sales by grouping 'a_dog', 'b_dog', 'c_dog' as dogs and 'a_cat', 'b_cat', 'c_cat' as cats.
I cannot change the data in the table, as it is an external database from which I am only supposed to read. How can I do this with an SQL query? It does not need to be specific to Postgres.
Use a CASE expression to group the animals of the same category together:
SELECT CASE
         WHEN anm_category LIKE '%dog' THEN 'Dogs'
         WHEN anm_category LIKE '%cat' THEN 'Cats'
         ELSE 'Others'
       END AS animals_category,
       SUM(anm_sales) AS total_sales
FROM   yourtable
GROUP  BY CASE
            WHEN anm_category LIKE '%dog' THEN 'Dogs'
            WHEN anm_category LIKE '%cat' THEN 'Cats'
            ELSE 'Others'
          END
This query should also work with most databases.
By using PostgreSQL's split_part():
select animal || 's' AS animal_cat, count(*) AS sale_count, sum(anm_sales) AS total_sales
from (
    select split_part(anm_category, '_', 2) AS animal, anm_sales from x
) t
group by animal
By creating a split_str() function in MySQL (using concat(), since || is not string concatenation in MySQL's default mode):
select concat(animal, 's') AS animal_cat, count(*) AS sale_count, sum(anm_sales) AS total_sales
from (
    select split_str(anm_category, '_', 2) AS animal, anm_sales from x
) t
group by animal
You could group by a substr of anm_category:
SELECT SUBSTR(anm_category, 3) || 's', SUM(anm_sales)
FROM x
GROUP BY SUBSTR(anm_category, 3)
If the trailing part has a constant length, like in the example:
SELECT right(anm_category, 3) AS animal_type  -- last 3 chars
     , sum(anm_sales) AS total_sales
FROM   x
GROUP  BY 1;
You don't need a CASE statement at all, but if you use one, make it a "simple" CASE (see: Simplify nested case when statement), and use a positional reference instead of repeating a possibly lengthy expression.
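Combining both pieces of advice - a simple CASE plus a positional reference - might look like this sketch (same placeholder table x as above):
SELECT CASE right(anm_category, 3)
          WHEN 'dog' THEN 'Dogs'
          WHEN 'cat' THEN 'Cats'
          ELSE 'Others'
       END AS animals_category
     , sum(anm_sales) AS total_sales
FROM   x
GROUP  BY 1;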
If the length varies, but there is always a single underscore like in the example:
SELECT split_part(anm_category, '_', 2) AS animal_type  -- word after "_"
     , sum(anm_sales) AS total_sales
FROM   x
GROUP  BY 1;
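Applied to the sample data above, both the right() and split_part() variants return the same totals (just the sums of the example rows):
animal_type | total_sales
dog         | 210
cat         | 190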