Bigquery: Retrieve information given unique combinations

Currently I have the following R data.table object with product/cities combinations:
product_code place
product1_code city1
product2_code city1
product3_code city1
product4_code city1
product1_code city2
product6_code city2
product9_code city3
What I would like to do is to pass the previous product_code/city combinations to a query string and then pass it to bigrquery. Something like the following:
SELECT *
FROM my.table
WHERE city AND product_code in (list.with.unique.previous.combinations)
However, I don't have any idea how to pass the unique combinations as a list so that the query only retrieves the information for those specific combinations. I know that I can use the glue library to pass single elements to the query string, something like this:
SELECT *
FROM my.table
WHERE city = {city.selected} AND product_code = {product.code.selected}
but that would only work for 1 combination.
If anyone could give me an idea of how I could pass the entire list of combinations I would appreciate it.

Below example should give you an idea of how it is to be achieved in BigQuery Standard SQL
#standardSQL
WITH combinations AS (
SELECT 'product1_code' product_code, 'city1' place UNION ALL
SELECT 'product2_code', 'city1' UNION ALL
SELECT 'product3_code', 'city1' UNION ALL
SELECT 'product4_code', 'city1' UNION ALL
SELECT 'product1_code', 'city2' UNION ALL
SELECT 'product6_code', 'city2' UNION ALL
SELECT 'product9_code', 'city3'
)
SELECT *
FROM `project.dataset.table` t
WHERE (city, product_code) IN (
SELECT AS STRUCT place, product_code
FROM combinations
)
As you can see, you need to combine city and product_code into a STRUCT, (city, product_code), and look it up in the list of combinations, which is also presented as structs via SELECT AS STRUCT place, product_code FROM combinations.
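The question asks how to build this query from a list of combinations held in client code. Below is a minimal sketch of that string-building step, written in Python rather than R for illustration. The helper names sql_literal and build_query are hypothetical, and the table name is the placeholder from the answer. It generates the same UNION ALL CTE as above; for untrusted input, query parameters would likely be a safer choice than string interpolation.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL string literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_query(pairs, table="project.dataset.table"):
    """Build the CTE-based query for a list of (product_code, place) pairs."""
    unique_pairs = sorted(set(pairs))  # drop duplicate combinations
    if not unique_pairs:
        raise ValueError("at least one combination is required")
    rows = [
        f"  SELECT {sql_literal(code)} AS product_code, {sql_literal(place)} AS place"
        for code, place in unique_pairs
    ]
    cte_body = " UNION ALL\n".join(rows)
    return (
        "WITH combinations AS (\n"
        f"{cte_body}\n"
        ")\n"
        "SELECT *\n"
        f"FROM `{table}` t\n"
        "WHERE (city, product_code) IN (\n"
        "  SELECT AS STRUCT place, product_code\n"
        "  FROM combinations\n"
        ")"
    )


# Hypothetical input: the duplicate pair is collapsed before the query is built.
pairs = [
    ("product1_code", "city1"),
    ("product1_code", "city2"),
    ("product1_code", "city1"),
]
query = build_query(pairs)
print(query)
```

The resulting string can then be handed to whatever client runs the query; in R the same idea applies to the rows of the data.table.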

Related

Use a CASE expression without typing matched conditions manually using PostgreSQL

I have a long and wide list, the following table is just an example. Table structure might look a bit horrible using SQL, but I was wondering whether there's a way to extract IDs' price using CASE expression without typing column names in order to match in the expression
IDs
A_Price
B_Price
C_Price
...
A
23
...
B
65
82
...
C
...
A
10
...
..
...
...
...
...
Table I want to achieve:
IDs price
A 23;10
B 65
C 82
.. ...
I tried:
SELECT IDs, string_agg(CASE IDs WHEN 'A' THEN A_Price
WHEN 'B' THEN B_Price
WHEN 'C' THEN C_Price
end::text, ';') as price
FROM table
GROUP BY IDs
ORDER BY IDs
To avoid typing A, B, A_Price, B_Price etc, I tried to format their names and call them from a subquery, but it seems that SQL cannot recognise them as columns and cannot call the corresponding values.
WITH CTE AS (
SELECT IDs, IDs||'_Price' as t FROM ID_list
)
SELECT IDs, string_agg(CASE IDs WHEN CTE.IDs THEN CTE.t
end::text, ';') as price
FROM table
LEFT JOIN CTE cte.IDs=table.IDs
GROUP BY IDs
ORDER BY IDs

You can use a document type like json or hstore as a stepping stone:
Basic query:
SELECT t.ids
, to_json(t.*) ->> (t.ids || '_price') AS price
FROM tbl t;
to_json() converts the whole row to a JSON object, which you can then pick a (dynamically concatenated) key from.
Your aggregation:
SELECT t.ids
, string_agg(to_json(t.*) ->> (t.ids || '_price'), ';') AS prices
FROM tbl t
GROUP BY 1
ORDER BY 1;
Converting the whole (big?) row adds some overhead, but you have to read the whole table for your query anyway.
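To make the idea concrete outside the database, here is a rough Python sketch of the same lookup: each row becomes a key/value object, and the price is picked by a key built from the row's own ID. The sample rows are hypothetical (the question's table is only partly shown), and lowercase keys are an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical sample rows mirroring the wide table in the question.
rows = [
    {"ids": "A", "a_price": 23, "b_price": None, "c_price": None},
    {"ids": "B", "a_price": None, "b_price": 65, "c_price": None},
    {"ids": "C", "a_price": None, "b_price": None, "c_price": 82},
    {"ids": "A", "a_price": 10, "b_price": None, "c_price": None},
]


def aggregate_prices(rows, sep=";"):
    """Pick each row's own price column by a dynamically built key, then join per ID."""
    grouped = defaultdict(list)
    for row in rows:
        bucket = grouped[row["ids"]]            # make sure every ID gets an entry
        key = row["ids"].lower() + "_price"     # assumption: keys are lowercase
        value = row.get(key)
        if value is not None:                   # skip missing prices, as string_agg skips NULLs
            bucket.append(str(value))
    return {ids: (sep.join(vals) if vals else None) for ids, vals in sorted(grouped.items())}


prices = aggregate_prices(rows)
print(prices)
```

The dictionary lookup plays the role of ->> on the JSON object: no column name has to be typed out per ID.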

A union would be one approach here:
SELECT IDs, A_Price FROM yourTable WHERE A_Price IS NOT NULL
UNION ALL
SELECT IDs, B_Price FROM yourTable WHERE B_Price IS NOT NULL
UNION ALL
SELECT IDs, C_Price FROM yourTable WHERE C_Price IS NOT NULL;

Unnest array of integers SQL BigQuery

I cannot seem to find anything that helps with unnesting a list of integers in SQL BigQuery.
I've tried using select * from my_table, unnest(column) as column but I get this error:
Values referenced in UNNEST must be arrays. UNNEST contains expression of type STRUCT<os STRING, product_ids STRING...
|Product IDs|
|[123456,234567,345678,456789]|
|[987654,876543,765432,654321]|
I basically want to get each product number on a separate line. So...
|Product IDs|
|123456|
|234567|
|345678|
|456789|
|987654|
|876543|
|765432|
|654321|
EDIT:
sorry forgot to add in - there is a customer id, then the product_ids has the list of product numbers.
So I want the customer_id, and the product ids on separate lines.
so output for 1 customer to be like the below:

Giving you 4 solutions, see which one fits your case:
-- Solution1
select product_id from
(select "123,234,345,456,456,678,789" as product_ids),unnest(split(product_ids)) as product_id
--Solution2
select product_id from
(select struct("ios" as os, "123,234,345,456,456,678,789" as product_ids) as os_products), unnest(split(os_products.product_ids)) as product_id
--Solution3
select product_id from
(select array_agg(os_products) as os_products from
(select struct("ios" as os, "123,234,345,456,456,678,789" as product_ids) as os_products
union all
select struct("android" as os, "abc,cde,efg" as product_ids) as os_products
)), unnest(os_products) as op, unnest(split(op.product_ids)) as product_id
--Solution4
select product_id from
(select array_agg(os_products) as os_products from
(select struct("ios" as os, split("123,234,345,456,456,678,789") as product_ids) as os_products
union all
select struct("android" as os, split("abc,cde,efg") as product_ids) as os_products
)), unnest(os_products) as op, unnest((op.product_ids)) as product_id
========
Based on the latest edit to your question, and assuming two columns in total (customer_id and an array of product_ids):
select customer_id, product_id from
(select "customer1" as customer_id, split("123,234,345,456,456,678,789") as product_ids), unnest(product_ids) as product_id
This will give you customer_id and product_id, one pair per row.
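For readers who find the cross join with UNNEST hard to picture, the flattening it performs can be sketched in plain Python as a nested loop. The records and the unnest_products name below are hypothetical; the sketch assumes product IDs arrive as a comma-separated string, as in the solutions above.

```python
# Hypothetical input: one record per customer with a comma-separated ID string.
customers = [
    {"customer_id": "customer1", "product_ids": "123,234,345"},
    {"customer_id": "customer2", "product_ids": "987,876"},
]


def unnest_products(records):
    """Yield one (customer_id, product_id) pair per product, like SPLIT followed by UNNEST."""
    for record in records:
        for product_id in record["product_ids"].split(","):
            yield record["customer_id"], product_id


flat = list(unnest_products(customers))
for customer_id, product_id in flat:
    print(customer_id, product_id)
```

Each customer row is repeated once per element of its own array, which is what the comma join with unnest(...) does in the query.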

Based on the error message, it seems there are additional fields in your struct. To unnest, you will need to isolate the array. Try the following:
with sample_data as (
select [STRUCT('linux' as os, [123456,234567,345678,456789] as product_id),
STRUCT('macos' as os, [987654,876543,765432,654321] as product_id)] as values
)
select pid
from sample_data,
UNNEST(values) v,
UNNEST(v.product_id) pid

convert data from multiple columns into single row sorting descending

I am trying to transform the original source, which contains totals for each item in a category (in this case Vehicles), into the second table.
Motorcycle Bicycle Car
1 3 2
Desired Output:
Vehicle Quantity
Bicycle 3
Car 2
Motorcycle 1
Additionally, I need the Quantity to be sorted in descending order, as shown above.
So far I have tried to do an Unpivot, but there is a syntax error in the Unpivot function. Is there another way to get the same results?
My code so far:
SELECT Vehicle_Name
FROM
(
SELECT [Motorcycle], [Bycycle], [Car] from Data
) as Source
UNPIVOT
(
Vehicle FOR Vehicle_Name IN ([Motorcycle], [Bycycle], [Car])
) as Unpvt

Edit: Added sort requirement.
You can use CROSS APPLY here too
select vehicle, amnt
from test
cross apply(
VALUES('motorcycle', motorcycle)
,('bicycle', bicycle)
,('car', car)) x (vehicle, amnt)
order by amnt desc

Try this
with data1 as
(
Select * from data)
Select * From
(
Select 'motorcycle' as "Vehicle", motorcycle as quantity from data1
union all
Select 'bicycle' , bicycle from data1
union all
Select 'car', car from data1
) order by quantity desc;
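Since the DBMS is not stated, here is a small self-contained check of the UNION ALL approach using Python's built-in sqlite3 module. The table name data and the lowercase column names are assumptions taken from the answer above, and SQLite is used only as a convenient stand-in engine.

```python
import sqlite3

# In-memory database with the single-row source table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (motorcycle INTEGER, bicycle INTEGER, car INTEGER)")
conn.execute("INSERT INTO data VALUES (1, 3, 2)")

# One SELECT per column turns each column into its own row.
unpivot_sql = """
SELECT vehicle, quantity
FROM (
    SELECT 'Motorcycle' AS vehicle, motorcycle AS quantity FROM data
    UNION ALL
    SELECT 'Bicycle', bicycle FROM data
    UNION ALL
    SELECT 'Car', car FROM data
) AS unpivoted
ORDER BY quantity DESC
"""
result = conn.execute(unpivot_sql).fetchall()
conn.close()

for vehicle, quantity in result:
    print(vehicle, quantity)
```

The derived table is given an alias here because several engines require one.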

Since we don't know what DBMS, here's a way that'd work in the one I use the most.
SELECT *
FROM (SELECT map_from_entries(
ARRAY[('Motorcycle', Motorcycle),
('Bicycle', Bicycle),
('Car', Car)])
FROM Source) AS t1(type_quant)
CROSS JOIN UNNEST(type_quant) AS t2(Vehicle, Quantity)
ORDER BY Quantity DESC
-Trino

SQL unique combinations

I have a table with three columns with an ID, a therapeutic class, and then a generic name. A therapeutic class can be mapped to multiple generic names.
ID therapeutic_class generic_name
1 YG4 insulin
1 CJ6 maleate
1 MG9 glargine
2 C4C diaoxy
2 KR3 supplies
3 YG4 insulin
3 CJ6 maleate
3 MG9 glargine
I need to first look at the individual combinations of therapeutic class and generic name and then want to count how many patients have the same combination. I want my output to have three columns: one being the combo of generic names, the combo of therapeutic classes and the count of the number of patients with the combination like this:
Count Combination_generic combination_therapeutic
2 insulin, maleate, glargine YG4, CJ6, MG9
1 supplies, diaoxy C4C, KR3

One way to match patients by the sets of pairs (therapeutic_class, generic_name) is to create the comma-separated strings in your desired output, and to group by them and count. To do this right, you need a way to identify the pairs. See my Comment under the original question and my Comments to Gordon's Answer to understand some of the issues.
I do this identification in some preliminary work in the solution below. As I mentioned in my Comment, it would be better if the pairs and unique ID's existed already in your data model; I create them on the fly.
Important note: This assumes the comma-separated lists don't become too long. If you exceed 4000 characters (or approx. 32000 characters in Oracle 12, with certain options turned on), you CAN aggregate the strings into CLOBs, but you CAN'T GROUP BY CLOBs (in general, not just in this case), so this approach will fail. A more robust approach is to match the sets of pairs, not some aggregation of them. That solution is more complicated; I will not cover it unless your problem needs it.
with
-- Begin simulated data (not part of the solution)
test_data ( id, therapeutic_class, generic_name ) as (
select 1, 'GY6', 'insulin' from dual union all
select 1, 'MH4', 'maleate' from dual union all
select 1, 'KJ*', 'glargine' from dual union all
select 2, 'GY6', 'supplies' from dual union all
select 2, 'C4C', 'diaoxy' from dual union all
select 3, 'GY6', 'insulin' from dual union all
select 3, 'MH4', 'maleate' from dual union all
select 3, 'KJ*', 'glargine' from dual
),
-- End of simulated data (for testing purposes only).
-- SQL query solution continues BELOW THIS LINE
valid_pairs ( pair_id, therapeutic_class, generic_name ) as (
select rownum, therapeutic_class, generic_name
from (
select distinct therapeutic_class, generic_name
from test_data
)
),
first_agg ( id, tc_list, gn_list ) as (
select t.id,
listagg(p.therapeutic_class, ',') within group (order by p.pair_id),
listagg(p.generic_name , ',') within group (order by p.pair_id)
from test_data t join valid_pairs p
on t.therapeutic_class = p.therapeutic_class
and t.generic_name = p.generic_name
group by t.id
)
select count(*) as cnt, tc_list, gn_list
from first_agg
group by tc_list, gn_list
;
Output:
CNT TC_LIST GN_LIST
--- ------------------ ------------------------------
1 GY6,C4C supplies,diaoxy
2 GY6,KJ*,MH4 insulin,glargine,maleate
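The "match the sets of pairs" idea mentioned in the note can be illustrated outside the database. The following is only a sketch in Python, not the Oracle solution alluded to: it groups each patient's (therapeutic_class, generic_name) pairs into a set and counts identical sets, so no string length limit is involved. The rows reuse the question's sample data, and count_combinations is a hypothetical name.

```python
from collections import Counter, defaultdict

# Sample data from the question: (id, therapeutic_class, generic_name).
rows = [
    (1, "YG4", "insulin"), (1, "CJ6", "maleate"), (1, "MG9", "glargine"),
    (2, "C4C", "diaoxy"), (2, "KR3", "supplies"),
    (3, "YG4", "insulin"), (3, "CJ6", "maleate"), (3, "MG9", "glargine"),
]


def count_combinations(rows):
    """Count how many patients share exactly the same set of (class, name) pairs."""
    pairs_by_id = defaultdict(set)
    for patient_id, therapeutic_class, generic_name in rows:
        pairs_by_id[patient_id].add((therapeutic_class, generic_name))
    # frozenset is hashable and ignores order, so equal sets land in one bucket.
    return Counter(frozenset(pairs) for pairs in pairs_by_id.values())


counts = count_combinations(rows)
for pairs, count in counts.items():
    ordered = sorted(pairs)  # sort only for display
    generics = ", ".join(name for _, name in ordered)
    classes = ", ".join(cls for cls, _ in ordered)
    print(count, generics, classes)
```

Comparing sets rather than concatenated strings also avoids false matches caused by separators appearing inside the values.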

You are looking for listagg() and then another aggregation. I think:
select therapeutics, generics, count(*)
from (select id, listagg(therapeutic_class, ', ') within group (order by therapeutic_class) as therapeutics,
listagg(generic_name, ', ') within group (order by generic_name) as generics
from t
group by id
) t
group by therapeutics, generics;

Selecting distinct values for multiple columns

I have a table where many pieces of data match to one in another column, similar to a tree, and then data at the 'leaf' about each specific leaf
eg
Food Group Name Caloric Value
Vegetables Broccoli 100
Vegetables Carrots 80
Fruits Apples 120
Fruits Bananas 120
Fruits Oranges 90
I would like to design a query that will return only the distinct values of each column, and then nulls to cover the overflow
eg
Food group Name Caloric Value
Vegetables Broccoli 100
Fruit Carrots 80
Apples 120
Bananas 90
Oranges
I'm not sure if this is possible. Right now I've been trying to do it with CASE expressions, but I was hoping there would be a simpler way.

Seems like you are simply trying to have all the distinct values at hand. Why? For displaying purposes? It's the application's job, not the server's. You could simply have three queries like this:
SELECT DISTINCT [Food Group] FROM atable;
SELECT DISTINCT Name FROM atable;
SELECT DISTINCT [Caloric Value] FROM atable;
and display their results accordingly.
But if you insist on having them all in one table, you might try this:
WITH atable ([Food Group], Name, [Caloric Value]) AS (
SELECT 'Vegetables', 'Broccoli', 100 UNION ALL
SELECT 'Vegetables', 'Carrots', 80 UNION ALL
SELECT 'Fruits', 'Apples', 120 UNION ALL
SELECT 'Fruits', 'Bananas', 120 UNION ALL
SELECT 'Fruits', 'Oranges', 90
),
atable_numbered AS (
SELECT
[Food Group], Name, [Caloric Value],
fg_rank = DENSE_RANK() OVER (ORDER BY [Food Group]),
n_rank = DENSE_RANK() OVER (ORDER BY Name),
cv_rank = DENSE_RANK() OVER (ORDER BY [Caloric Value])
FROM atable
)
SELECT
fg.[Food Group],
n.Name,
cv.[Caloric Value]
FROM (
SELECT fg_rank FROM atable_numbered UNION
SELECT n_rank FROM atable_numbered UNION
SELECT cv_rank FROM atable_numbered
) r (rank)
LEFT JOIN (
SELECT DISTINCT [Food Group], fg_rank
FROM atable_numbered) fg ON r.rank = fg.fg_rank
LEFT JOIN (
SELECT DISTINCT Name, n_rank
FROM atable_numbered) n ON r.rank = n.n_rank
LEFT JOIN (
SELECT DISTINCT [Caloric Value], cv_rank
FROM atable_numbered) cv ON r.rank = cv.cv_rank
ORDER BY r.rank

I guess what I would want to know is why you need this in one result set? What does the code look like that would consume this result? The attributes on each row have nothing to do with each other. If you want to, say, build the contents of a set of drop-down boxes, you're better off doing these one at a time. In your requested result set, you'd need to iterate through the dataset three times to do anything useful, and you would need to either check for NULL each time or needlessly iterate all the way to the end of the dataset.
If this is in a stored procedure, couldn't you run three separate SELECT DISTINCT queries and return the values as three result sets? Then you can consume them one at a time, which is what you would be doing anyway, I would guess.
If there REALLY IS a connection between the values, you could add each of the results to an array or list, then access all three lists in parallel using the index.
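As an illustration of the application-side approach, here is a minimal Python sketch that collects the distinct values of each column into its own list and then walks the lists in parallel, padding the shorter ones with None. The rows come from the question's example; distinct_in_order is a hypothetical helper, and keeping first-seen order is an assumption of this sketch.

```python
from itertools import zip_longest

# Example rows from the question: (food group, name, caloric value).
rows = [
    ("Vegetables", "Broccoli", 100),
    ("Vegetables", "Carrots", 80),
    ("Fruits", "Apples", 120),
    ("Fruits", "Bananas", 120),
    ("Fruits", "Oranges", 90),
]


def distinct_in_order(values):
    """Return the distinct values, keeping the order of first appearance."""
    return list(dict.fromkeys(values))


# zip(*rows) transposes the rows into columns; each column is deduplicated on its own.
columns = [distinct_in_order(column) for column in zip(*rows)]

# zip_longest walks the three lists by index and fills the gaps with None.
table = list(zip_longest(*columns))
for food_group, name, calories in table:
    print(food_group, name, calories)
```

The three lists stay independent, so the consuming code can also use them one at a time without checking for padding.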

Something like this maybe?
select *
from (
select case
when row_number() over (partition by fruit_group) = 1 then fruit_group
else null
end as fruit_group,
case
when row_number() over (partition by name) = 1 then name
else null
end as name,
case
when row_number() over (partition by caloric) = 1 then caloric
else null
end as caloric
from your_table
) t
where fruit_group is not null
or name is not null
or caloric is not null
But I fail to see any sense in this
