Transforming columns to row values - sql

I have a table in Google BigQuery like this, with one id column (customers) and three store-name columns:
id |PA|PB|Mall|
----|--|--|----|
3699|1 |1 | 1 |
1017| |1 | 1 |
9991|1 | | |
My objective is to be able to select customers (ids) who visited, for example:
ONLY PA
PA and PB
PA and Mall
PA, PB and Mall
One alternative output could be:
id |Store |
----|--------- |
3699|PA+PB+Mall|
1017|PB+Mall |
9991|PA |
However, this would not give me a count of everyone who stopped by PA regardless of which other stores they visited. In the example above, that count would be 2 (3699 and 9991).
A second alternative output could be:
id |Store|
----|-----|
3699|PA |
3699|PB |
3699|Mall |
1017|PB |
1017|Mall |
9991|PA |
However, this would not (I think) allow me to select/filter those who have visited, for example, BOTH PA and Mall (only 3699).
A third alternative output could be a combo:
id |Store| Multiple store|
----|-----|---------------|
3699|PA | PA+PB+Mall |
3699|PB | PA+PB+Mall |
3699|Mall | PA+PB+Mall |
1017|PB | PB+Mall |
1017|Mall | PB+Mall |
9991|PA | |
Which option is best, and are there any other alternatives for achieving my objective? I believe alternative 3 could be best, but I am not sure how to achieve it.

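It depends what you want. Assuming t is already in the long format of the second alternative (one row per id and store), the third would simply be:
select t.*,
       string_agg(store, '+') over (partition by id) as multiple_store
from t;
The first would be:
select id, string_agg(store, '+') as store
from t
group by id;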
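Regarding the worry in the question that the long format cannot filter for customers who visited both PA and Mall, here is a minimal sketch that runs against Python's built-in sqlite3 module as a local stand-in for BigQuery (an assumption: SQLite spells STRING_AGG as group_concat). It assumes a table t(id, store) already holding the sample data in long format. The GROUP BY/HAVING filter is an illustrative addition, not part of the answer above.

```python
import sqlite3

# In-memory stand-in for the BigQuery table, already in long format (id, store).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, store TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(3699, "PA"), (3699, "PB"), (3699, "Mall"),
     (1017, "PB"), (1017, "Mall"),
     (9991, "PA")],
)

# One row per id with all of its stores (the first alternative).
# SQLite's group_concat plays the role of BigQuery's STRING_AGG.
per_id = dict(conn.execute(
    "SELECT id, group_concat(store, '+') FROM t GROUP BY id"
).fetchall())

# Ids that visited BOTH PA and Mall: keep only those two stores,
# then require that both distinct stores survived for the id.
both = [row[0] for row in conn.execute(
    """
    SELECT id
    FROM t
    WHERE store IN ('PA', 'Mall')
    GROUP BY id
    HAVING COUNT(DISTINCT store) = 2
    """
)]

# Everyone who stopped by PA, regardless of the other stores visited.
pa_count = conn.execute(
    "SELECT COUNT(DISTINCT id) FROM t WHERE store = 'PA'"
).fetchone()[0]

print(per_id)
print(both, pa_count)
```

Changing the IN list and the number in HAVING covers the other combinations from the question.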

For the third option, you may try unpivoting your current table, then applying STRING_AGG to get the computed column containing all stores for each id:
WITH cte AS (
SELECT id, CASE WHEN PA = 1 THEN 'PA' END AS Store
FROM yourTable
UNION ALL
SELECT id, CASE WHEN PB = 1 THEN 'PB' END
FROM yourTable
UNION ALL
SELECT id, CASE WHEN Mall = 1 THEN 'Mall' END
FROM yourTable
)
SELECT id, Store,
STRING_AGG(store, '+') OVER (PARTITION BY id) All_Stores
FROM cte
WHERE Store IS NOT NULL
ORDER BY id, Store;
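The same unpivot-then-aggregate query can be tried locally with a minimal sketch that runs it through Python's sqlite3 module. This is a stand-in, not BigQuery: group_concat replaces STRING_AGG, and the empty cells are assumed to be NULL. Because WHERE is evaluated before window functions, the NULL placeholders produced by the CASE expressions never reach the aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Wide table as in the question; empty cells are stored as NULL (an assumption).
conn.execute("CREATE TABLE yourTable (id INTEGER, PA INTEGER, PB INTEGER, Mall INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [(3699, 1, 1, 1), (1017, None, 1, 1), (9991, 1, None, None)],
)

query = """
WITH cte AS (
    SELECT id, CASE WHEN PA = 1 THEN 'PA' END AS Store FROM yourTable
    UNION ALL
    SELECT id, CASE WHEN PB = 1 THEN 'PB' END FROM yourTable
    UNION ALL
    SELECT id, CASE WHEN Mall = 1 THEN 'Mall' END FROM yourTable
)
SELECT id, Store,
       group_concat(Store, '+') OVER (PARTITION BY id) AS All_Stores
FROM cte
WHERE Store IS NOT NULL   -- applied before the window function runs
ORDER BY id, Store
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Without an ORDER BY inside the aggregate, the order of stores within All_Stores is not guaranteed.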

Consider the approaches below for all three options.
They assume that the empty cells in the question's sample are NULLs:
with `project.dataset.table` as (
select 3699 id, 1 PA, 1 PB, 1 Mall union all
select 1017, null, 1, 1 union all
select 9991, 1, null, null
)
Option #1
select id, string_agg(key, '+') as Store
from `project.dataset.table` t,
unnest(split(translate(to_json_string(t), '{}"', ''))) kv,
unnest([struct(split(kv,':')[offset(0)] as key, split(kv,':')[offset(1)] as value)])
where key !='id'
and value != 'null'
group by id
Option #2
select id, key as Store
from `project.dataset.table` t,
unnest(split(translate(to_json_string(t), '{}"', ''))) kv,
unnest([struct(split(kv,':')[offset(0)] as key, split(kv,':')[offset(1)] as value)])
where key !='id'
and value != 'null'
Option #3
select id, key as Store,
string_agg(key, '+') over(partition by id) as Multiple_Store
from `project.dataset.table` t,
unnest(split(translate(to_json_string(t), '{}"', ''))) kv,
unnest([struct(split(kv,':')[offset(0)] as key, split(kv,':')[offset(1)] as value)])
where key !='id'
and value != 'null'
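To make the TO_JSON_STRING trick less opaque, here is a minimal Python sketch of the same string manipulation. Python's json module stands in for TO_JSON_STRING (with compact separators to mimic its output), None stands in for NULL, and, like the SQL, it assumes no column name or value contains a comma or a colon. In this sketch the key order of each dict decides the order of the joined stores.

```python
import json

# Sample rows; None stands in for the NULLs assumed in the answer above.
rows = [
    {"id": 3699, "PA": 1, "PB": 1, "Mall": 1},
    {"id": 1017, "PA": None, "PB": 1, "Mall": 1},
    {"id": 9991, "PA": 1, "PB": None, "Mall": None},
]

def stores_for(row):
    """Mimic SPLIT(TRANSLATE(TO_JSON_STRING(t), '{}"', '')) plus the WHERE filters."""
    text = json.dumps(row, separators=(",", ":"))        # {"id":3699,"PA":1,...}
    text = text.translate(str.maketrans("", "", '{}"'))  # id:3699,PA:1,...
    stores = []
    for kv in text.split(","):
        key, value = kv.split(":")[0], kv.split(":")[1]
        # Same filters as the SQL: skip the id column and NULL values.
        if key != "id" and value != "null":
            stores.append(key)
    return stores

# Option #2 (long format), Option #1 (one string per id), Option #3 (both).
long_rows = [(r["id"], s) for r in rows for s in stores_for(r)]
per_id = {r["id"]: "+".join(stores_for(r)) for r in rows}
option3 = [(i, s, per_id[i]) for i, s in long_rows]
print(per_id)
```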

Related

How can I combine 3 columns into one

I have 3 columns called product_one, product_two and product_three and I want to combine all of them into one.
From this:
|product_one | product_two | product_three |
|------------|-------------|---------------|
|spoon       | phone       | knife         |
|fork        | case        |               |
To this:
|products|
|--------|
|spoon   |
|fork    |
|phone   |
|case    |
|knife   |
How is this possible in sql?
Consider the approach below:
select products from your_table
unpivot (products for col in (product_one, product_two, product_three))
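SQLite has no UNPIVOT, so a minimal sketch of what the query above does can stack the columns with UNION ALL and drop NULLs, which matches BigQuery's UNPIVOT default of excluding NULL values. Python's sqlite3 module is used here purely as a local stand-in, with the question's two sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (product_one TEXT, product_two TEXT, product_three TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [("spoon", "phone", "knife"), ("fork", "case", None)],
)

# Stack the three columns into one, then drop the NULL that the
# missing product_three of the second row would otherwise produce.
products = [row[0] for row in conn.execute(
    """
    SELECT products FROM (
        SELECT product_one AS products FROM your_table
        UNION ALL
        SELECT product_two FROM your_table
        UNION ALL
        SELECT product_three FROM your_table
    )
    WHERE products IS NOT NULL
    """
)]
print(sorted(products))
```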
Another option that does not require specifying column names:
select split(kv, ':')[offset(1)] products
from your_table t,
unnest(split(translate(to_json_string(t), '{}"', ''))) kv
where split(kv, ':')[offset(1)] != 'null'
Yet another one:
select trim(value) products
from your_table t,
unnest(split(trim(format('%t', t), '()'))) value
where trim(value) != 'NULL'
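All three produce the same output.
Using UNION ALL:
select product_one as products
from your_table
where product_one is not null
union all
select product_two
from your_table
where product_two is not null
union all
select product_three
from your_table
where product_three is not null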
By union (note that UNION also removes duplicate products; use UNION ALL to keep them):
select product_one as pr from your_table where product_one is not null
union
select product_two as pr from your_table where product_two is not null
union
select product_three as pr from your_table where product_three is not null
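The difference between UNION and UNION ALL only shows once a product repeats. Here is a minimal sketch using Python's sqlite3 module with hypothetical data in which 'spoon' appears in two different columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (product_one TEXT, product_two TEXT, product_three TEXT)"
)
# Hypothetical data: 'spoon' appears in product_one and in product_two.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [("spoon", "phone", "knife"), ("fork", "spoon", None)],
)

branches = [
    "SELECT product_one AS pr FROM your_table WHERE product_one IS NOT NULL",
    "SELECT product_two FROM your_table WHERE product_two IS NOT NULL",
    "SELECT product_three FROM your_table WHERE product_three IS NOT NULL",
]

# UNION removes duplicate values across all branches; UNION ALL keeps them.
with_union = [r[0] for r in conn.execute(" UNION ".join(branches))]
with_union_all = [r[0] for r in conn.execute(" UNION ALL ".join(branches))]
print(sorted(with_union), sorted(with_union_all))
```

Here UNION returns four products and UNION ALL returns five.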

How to add a letter only when a value occurs more than once in PostgreSQL?

I want to append a letter to the end of no_surat, but only when the same value appears more than once in the account field; if an account has only one row, no letter should be added.
I already tried this query, but it also adds a letter to accounts that have only one row, such as account 335. Account 335 should have no letter added, i.e. its no_suratABC should be No.SKF.161.
SELECT ACCOUNT, NO_SURAT, no_suratABC
FROM
(SELECT *, concat((NO_SURAT) , CHR(64 + CAST ( row_number() OVER(PARTITION BY ACCOUNT_NUMBER order by ACCOUNT_NUMBER) AS integer ))) AS No_suratABC
FROM
(SELECT DISTINCT ON (ADDRESS) * FROM account_information) a) b;
My query currently returns these records:
|account | no_surat | no_suratABC|
|----- | ------ | ----- |
|337 | No.SKF.6 | No.SKF.6A |
|337 | No.SKF.5 | No.SKF.5B |
|337 | No.SKF.4 | No.SKF.4C |
|335 | No.SKF.161| No.SKF.161A|
|184 | No.SKF.105| No.SKF.105A|
|184 | No.SKF.71 | No.SKF.71B |
Any suggestions on what I should add to my query?
You could use ROW_NUMBER here to choose which letter appears:
WITH cte AS (
SELECT DISTINCT ON (ADDRESS) *,
ROW_NUMBER() OVER (PARTITION BY ACCOUNT ORDER BY NO_SURAT DESC) rn,
COUNT(*) OVER (PARTITION BY ACCOUNT) cnt
FROM account_information
)
SELECT ACCOUNT, NO_SURAT,
CASE WHEN cnt > 1
THEN CONCAT(NO_SURAT, SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ' FROM rn::int FOR 1))
ELSE NO_SURAT END AS No_suratABC
FROM cte;
The trick used above is to take a 1-length substring of the alphabet string ABCDEFGHIJKLMNOPQRSTUVWXYZ, using the row number to decide which letter to choose. Note that this assumes you never need a letter beyond Z, i.e. that no account has more than 26 records associated with it.
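The CTE approach can be checked with a minimal sketch in Python's sqlite3 module, used as a stand-in for PostgreSQL: DISTINCT ON is left out because SQLite does not support it, and substr replaces SUBSTRING ... FROM ... FOR. It uses the question's six rows. Note that ORDER BY NO_SURAT DESC compares text, so 'No.SKF.71' sorts ahead of 'No.SKF.105' and account 184 gets 71A and 105B, unlike the sample in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account_information (account INTEGER, no_surat TEXT)")
conn.executemany(
    "INSERT INTO account_information VALUES (?, ?)",
    [(337, "No.SKF.6"), (337, "No.SKF.5"), (337, "No.SKF.4"),
     (335, "No.SKF.161"), (184, "No.SKF.105"), (184, "No.SKF.71")],
)

query = """
WITH cte AS (
    SELECT account, no_surat,
           ROW_NUMBER() OVER (PARTITION BY account ORDER BY no_surat DESC) AS rn,
           COUNT(*)     OVER (PARTITION BY account)                        AS cnt
    FROM account_information
)
SELECT account, no_surat,
       CASE WHEN cnt > 1
            -- pick the rn-th letter of the alphabet (1 gives A, 2 gives B, ...)
            THEN no_surat || substr('ABCDEFGHIJKLMNOPQRSTUVWXYZ', rn, 1)
            ELSE no_surat
       END AS no_suratABC
FROM cte
"""
result = {row[1]: row[2] for row in conn.execute(query)}
print(result)
```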
Your basic approach is fine. Just fix the syntax and put in the logic to do what you want. What you describe is:
SELECT ACCOUNT, NO_SURAT,
(CASE WHEN COUNT(*) OVER (PARTITION BY ACCOUNT) > 1
THEN NO_SURAT || CHR(64 + (ROW_NUMBER() OVER (PARTITION BY ACCOUNT ORDER BY ACCOUNT))::int)
ELSE NO_SURAT
END) as No_suratABC
FROM account_information;
However, your sample data suggests that you want:
SELECT ACCOUNT, NO_SURAT,
(NO_SURAT || CHR(64 + ROW_NUMBER() OVER (PARTITION BY ACCOUNT ORDER BY ACCOUNT)::int)) as No_suratABC
FROM account_information;
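Here is a minimal sketch of the CASE version in Python's sqlite3 module, where SQLite's char() stands in for PostgreSQL's CHR(). Ordering by ACCOUNT inside a partition by ACCOUNT leaves the letter order unspecified, so the sketch orders by no_surat instead; that choice and the four sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account_information (account INTEGER, no_surat TEXT)")
conn.executemany(
    "INSERT INTO account_information VALUES (?, ?)",
    [(337, "No.SKF.6"), (337, "No.SKF.5"), (337, "No.SKF.4"), (335, "No.SKF.161")],
)

query = """
SELECT account, no_surat,
       CASE WHEN COUNT(*) OVER (PARTITION BY account) > 1
            -- char(65) is A, so 64 + row number maps 1, 2, 3 to A, B, C
            THEN no_surat || char(64 + ROW_NUMBER() OVER (PARTITION BY account ORDER BY no_surat))
            ELSE no_surat
       END AS no_suratABC
FROM account_information
"""
result = {row[1]: row[2] for row in conn.execute(query)}
print(result)
```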
Here is a db<>fiddle.

How to pivot an array built from REGEXP_EXTRACT_ALL

I'm collecting URLs with query parameters in a BigQuery table. I want to parse these URLs and then pivot the table. The input data and expected output are at the end.
I found two queries that I want to merge.
This one to pivot my parsed url:
select id,
max(case when t.name='a' then t.score end) as a,
max(case when t.name='b' then t.score end) as b,
max(case when t.name='c' then t.score end) as c
from
(
select a.id, t
from `table` as a,
unnest(test) as t
)A group by id
Then I have this query to parse the URL:
WITH examples AS (
SELECT 1 AS id,
'?foo=bar' AS query,
'simple' AS description
UNION ALL SELECT 2, '?foo=bar&bar=baz', 'multiple params'
UNION ALL SELECT 3, '?foo[]=bar&foo[]=baz', 'arrays'
UNION ALL SELECT 4, '', 'no query'
)
SELECT
id,
query,
REGEXP_EXTRACT_ALL(query,r'(?:\?|&)((?:[^=]+)=(?:[^&]*))') as params,
REGEXP_EXTRACT_ALL(query,r'(?:\?|&)(?:([^=]+)=(?:[^&]*))') as keys,
REGEXP_EXTRACT_ALL(query,r'(?:\?|&)(?:(?:[^=]+)=([^&]*))') as values,
description
FROM examples
I'm not sure how to explain my issue, but I think the problem is that when I split my query parameters into separate columns, the result doesn't match the format of the first query, where the key and value need to be together in the same column so that I can unnest them correctly.
Input data:
| id | url |
|---- |-------------------- |
| 1 | url/?foo=aaa&bar=ccc |
| 2 | url/?foo=bbb&bar=ccc |
expected output:
| id | foo | bar |
|---- |---- |---- |
| 1 | aaa | ccc |
| 2 | bbb | ccc |
All URLs have exactly the same number of parameters.
Use the query below:
select id,
max(if(split(kv, '=')[offset(0)] = 'foo', split(kv, '=')[offset(1)], null)) as foo,
max(if(split(kv, '=')[offset(0)] = 'bar', split(kv, '=')[offset(1)], null)) as bar
from `project.dataset.table` t,
unnest(regexp_extract_all(url, r'[?&](\w+=\w+)')) kv
group by id
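For comparison, here is a minimal Python sketch of the same extract, split and pivot steps, using re.findall with the answer's pattern in place of REGEXP_EXTRACT_ALL and a dict lookup in place of MAX(IF(...)). It assumes keys and values are plain word characters: \w+ will not match values containing characters such as '-' or '%', nor empty values.

```python
import re

# Sample input from the question: (id, url).
rows = [
    (1, "url/?foo=aaa&bar=ccc"),
    (2, "url/?foo=bbb&bar=ccc"),
]

pivoted = []
for row_id, url in rows:
    # Same pattern as the answer: each match is one "key=value" string.
    pairs = re.findall(r"[?&](\w+=\w+)", url)
    # Split each pair once on '=' (like SPLIT(kv, '=')[OFFSET(0)] / [OFFSET(1)]).
    params = dict(pair.split("=", 1) for pair in pairs)
    # One output column per expected key, like the MAX(IF(...)) columns.
    pivoted.append((row_id, params.get("foo"), params.get("bar")))

print(pivoted)
```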

How can you sort string value or array in SQL

Hello Stackoverflow SQL experts,
What I am looking for:
A way to sort the values inside a string of text in Snowflake SQL.
Example:
My table looks something like this:
---------------------
| ID | REFS |
---------------------
| ID1 | 'ANN,BOB' |
| ID2 | 'BOB,ANN' |
---------------------
As you can see, ID1 and ID2 are both referred by Ann and Bob.
But because they were inputted in different orders, they aren't recognized as a group.
Is there a way to sort the string/list values in REFS to clean them up,
so that when I do counts and GROUP BYs, the result would be:
--------------------------
| REFS | COUNT(ID) |
--------------------------
| 'ANN,BOB' | 2 |
--------------------------
Instead of....
--------------------------
| REFS | COUNT(ID) |
--------------------------
| 'ANN,BOB' | 1 |
| 'BOB,ANN' | 1 |
--------------------------
What I have tried:
TO_ARRAY(REFS) - But this just creates two lists, ['ANN','BOB'] and ['BOB','ANN']
SPLIT(REFS,',') - This also just creates two differently ordered arrays.
I have other REF lists containing all sorts of combinations.
'BOB,CHRIS,ANN'
'BOB,CHRIS'
'CHRIS'
'DAVE,ANN'
'ANN,ERIC'
'FRANK,BOB'
...
You should fix the data model! Storing multiple values in a string is a bad idea. That said, you can split, unnest, and reaggregate. I think this works in Snowflake:
select t.*,
(select listagg(s.value, ',') within group (order by s.value)
from table(split_to_table(t.refs, ',')) s
) normalized_refs
from t;
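The split, sort and re-join normalization can be sketched in plain Python to show why the two rows collapse into one group. The rows are a small hypothetical sample based on the question, and the sketch trims spaces around each name before sorting.

```python
from collections import Counter

# Hypothetical sample based on the question: (ID, REFS).
rows = [("ID1", "ANN,BOB"), ("ID2", "BOB,ANN"), ("ID3", "CHRIS,BOB,ANN")]

def normalize(refs: str) -> str:
    # Split on commas, trim each name, sort, and join back: the same
    # split / unnest / re-aggregate-in-order idea as LISTAGG ... WITHIN GROUP.
    return ",".join(sorted(part.strip() for part in refs.split(",")))

# Count ids per normalized REFS value, like GROUP BY normalized_refs.
counts = Counter(normalize(refs) for _, refs in rows)
print(counts)
```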
WITH data(id, refs) as (
SELECT * FROM VALUES
('ID1', 'ANN,BOB'),
('ID2', 'BOB,ANN'),
('ID3', 'CHRIS,BOB,ANN')
)
SELECT order_arry, count(distinct(id)) as count
FROM (
SELECT array_agg(val) WITHIN GROUP (ORDER BY val) over (partition by id) as order_arry, id
FROM (
SELECT d.id, trim(s.value) as val
FROM data d, lateral split_to_table(d.refs, ',') s
)
)
GROUP BY 1 ORDER BY 1;
gives:
ORDER_ARRY COUNT
[ "ANN", "BOB" ] 2
[ "ANN", "BOB", "CHRIS" ] 1
But as Gordon notes, the PARTITION BY is not needed, so the DISTINCT is not needed either:
SELECT ordered_arry, count(id) as count
FROM (
SELECT id, array_agg(val) WITHIN GROUP (ORDER BY val) as ordered_arry
FROM (
SELECT d.id, trim(s.value) as val
FROM data d, lateral split_to_table(d.refs, ',') s
)
GROUP BY 1
)
GROUP BY 1 ORDER BY 1;

Oracle complex string replacement

I've got the following table
mytable
type | id | name | formula
"simple" | 1 | "COUNT" | "<1>"
"simple" | 2 | "DISTINCT" | "<2>"
"simple" | 3 | "mycol" | "<3>"
"complex" | 4 | null | "<1>(<2> <3>)"
Now I would like to read this table and add an additional column in which each placeholder in the formula string is replaced by the corresponding name.
For id 4 I would need: "COUNT(DISTINCT mycol)"
Any idea how I can do that?
In Oracle 11 it might look like this:
select
type, id, name, formula, value
from
mytable
left join (
select
id_complex,
listagg(decode(pos, 2, name, part)) within group (order by occ, pos) as value
from
(
select
id_complex, occ, pos,
regexp_replace(pair, '^(.*?)(<.*?>)$', '\'||pos) as part
from
(
select
id as id_complex,
occ,
regexp_substr(formula||'<>', '.*?<.*?>', 1, occ) as pair
from
(
select level as occ from dual
connect by level <= (select max(length(formula)) from mytable)
),
mytable
where type = 'complex'
),
(select level as pos from dual connect by level <= 2)
)
left join mytable on part = formula and type = 'simple'
group by id_complex
) on id = id_complex
order by id
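As a cross-check of the expected result, here is a minimal Python sketch (not Oracle) of the same substitution: every <n> token in a complex formula is replaced by the name of the simple row whose formula is exactly that token. It assumes placeholders never nest and leaves unknown tokens unchanged.

```python
import re

# Rows of the question's table: (type, id, name, formula).
rows = [
    ("simple", 1, "COUNT", "<1>"),
    ("simple", 2, "DISTINCT", "<2>"),
    ("simple", 3, "mycol", "<3>"),
    ("complex", 4, None, "<1>(<2> <3>)"),
]

# Map each simple formula token to its name, e.g. "<1>" -> "COUNT".
names = {formula: name for typ, _, name, formula in rows if typ == "simple"}

def expand(formula: str) -> str:
    # Replace every <...> token with its name; unknown tokens stay as they are.
    return re.sub(r"<[^<>]*>", lambda m: names.get(m.group(0), m.group(0)), formula)

values = {row_id: expand(formula) for typ, row_id, _, formula in rows if typ == "complex"}
print(values)
```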
SQL Fiddle