SQL - Create an array based on the values of two other columns

I have the following data:
-----------------------------------------
| client_id | link_hash_a | link_hash_b |
-----------------------------------------
| 1         | abc         | xyz         |
| 2         | def         | xyz         |
| 3         | def         | uvw         |
-----------------------------------------
I would like to use SQL to build, for each set of clients linked through the link_hash_a and link_hash_b columns, an array of their client_id values.
In the data above, the result would be a single array {1,2,3}: clients 1 and 2 are linked through the value xyz in link_hash_b, and clients 2 and 3 are linked through the value def in link_hash_a.
Is there a way to do that with an SQL query? Thank you very much for your input.

As an alternative, it can be done this way:
SELECT groupUniqArrayArray(client_ids) AS client_ids /* merge per-hash arrays, dropping duplicates */
FROM
(
    SELECT link_hash, groupArray(client_id) AS client_ids
    FROM
    (
        SELECT DISTINCT client_id, arrayJoin([link_hash_a, link_hash_b]) AS link_hash
        FROM
        (
            /* test data */
            SELECT data.1 AS client_id, data.2 AS link_hash_a, data.3 AS link_hash_b
            FROM
            (
                SELECT arrayJoin([
                    (1, 'abc', 'xyz'),
                    (2, 'def', 'xyz'),
                    (3, 'def', 'uvw')]) AS data
            )
        )
    )
    GROUP BY link_hash
    HAVING count() = 2 /* keep only hashes shared by exactly two clients */
)
/* result
┌─client_ids─┐
│ [2,1,3]    │
└────────────┘
*/
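Note that HAVING count() = 2 keeps only hashes shared by exactly two clients. If one hash can link three or more clients, the filter would presumably need to be relaxed; a minimal sketch changing only that clause:

GROUP BY link_hash
HAVING count() >= 2 /* assumption: any hash shared by at least two clients should link them */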

I think I found a way through. I used another column, club_id, identifying the club the clients are part of; in this case, clients 1, 2 and 3 are all part of club_id 1, for example.
Here is my code using ClickHouse SQL, taking into account that input_table is the table of data shown in the question:
SELECT club_id,
       arrayConcat(clt_a, clt_b) AS tot_clt_arr,
       arrayUniq(arrayConcat(clt_a, clt_b)) AS tot_clt
FROM
(
    SELECT club_id, clt_a
    FROM
    (
        SELECT club_id, link_hash_a, groupUniqArray(client_id) AS clt_a
        FROM input_table
        GROUP BY club_id, link_hash_a
    )
    WHERE length(clt_a) >= 2
) JOIN
(
    SELECT club_id, clt_b
    FROM
    (
        SELECT club_id, link_hash_b, groupUniqArray(client_id) AS clt_b
        FROM input_table
        GROUP BY club_id, link_hash_b
    )
    WHERE length(clt_b) >= 2
)
USING club_id
GROUP BY club_id, tot_clt_arr;
It returns the array of client_id as well as the number of unique client_id in the tot_clt column.
Thank you @TomášZáluský for your help.
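One caveat: a client can appear in both clt_a and clt_b (client 2 does in the example data), so arrayConcat leaves duplicates inside tot_clt_arr even though arrayUniq counts each client once. If the array itself should also be deduplicated, ClickHouse's arrayDistinct can wrap the concatenation; a sketch of just the outer SELECT under that assumption:

SELECT club_id,
       arrayDistinct(arrayConcat(clt_a, clt_b)) AS tot_clt_arr, /* deduplicated array */
       length(arrayDistinct(arrayConcat(clt_a, clt_b))) AS tot_clt /* same count as arrayUniq */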

Related

How can you sort string values or arrays in SQL

Hello Stackoverflow SQL experts,
What I am looking for:
A way to sort comma-separated string values in Snowflake SQL.
Example:
My table looks something like this:
---------------------
| ID  | REFS       |
---------------------
| ID1 | 'ANN,BOB'  |
| ID2 | 'BOB,ANN'  |
---------------------
As you can see, my ID1 and ID2 are referred to by both Ann and Bob, but because the names were entered in different orders, the rows aren't recognized as a group.
Is there a way to sort the string/list values in REFS to clean them up, so that when I do counts and group-bys the result would be:
--------------------------
| REFS      | COUNT(ID) |
--------------------------
| 'ANN,BOB' | 2         |
--------------------------
instead of:
--------------------------
| REFS      | COUNT(ID) |
--------------------------
| 'ANN,BOB' | 1         |
| 'BOB,ANN' | 1         |
--------------------------
What I have tried:
TO_ARRAY(REFS) - But this just creates two different arrays, ['ANN','BOB'] and ['BOB','ANN']
SPLIT(REFS,',') - This also just creates arrays in their original, unsorted order
I have other REF lists containing all sorts of combinations.
'BOB,CHRIS,ANN'
'BOB,CHRIS'
'CHRIS'
'DAVE,ANN'
'ANN,ERIC'
'FRANK,BOB'
...
You should fix the data model! Storing multiple values in a string is a bad idea. That said, you can split, unnest, and reaggregate. I think this works in Snowflake:
select t.*,
       (select listagg(s.value, ',') within group (order by s.value)
        from table(split_to_table(t.refs, ',')) s
       ) as normalized_refs
from t;
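To get the grouped counts the question actually asks for, the normalized string can then be aggregated over; a sketch under the same assumption of a table t(id, refs):

select normalized_refs as refs, count(id) as count
from (
    select t.id,
           (select listagg(s.value, ',') within group (order by s.value)
            from table(split_to_table(t.refs, ',')) s
           ) as normalized_refs
    from t
)
group by normalized_refs
order by normalized_refs;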
WITH data(id, refs) as (
    SELECT * FROM VALUES
        ('ID1', 'ANN,BOB'),
        ('ID2', 'BOB,ANN'),
        ('ID3', 'CHRIS,BOB,ANN')
)
SELECT order_arry, count(distinct id) as count
FROM (
    SELECT array_agg(val) WITHIN GROUP (ORDER BY val) over (partition by id) as order_arry, id
    FROM (
        SELECT d.id, trim(s.value) as val
        FROM data d, lateral split_to_table(d.refs, ',') s
    )
)
GROUP BY 1 ORDER BY 1;
gives:
ORDER_ARRY                 COUNT
[ "ANN", "BOB" ]           2
[ "ANN", "BOB", "CHRIS" ]  1
but as Gordon notes, the partition by is not needed, thus the distinct is also not needed:
SELECT ordered_arry, count(id) as count
FROM (
    SELECT id, array_agg(val) WITHIN GROUP (ORDER BY val) as ordered_arry
    FROM (
        SELECT d.id, trim(s.value) as val
        FROM data d, lateral split_to_table(d.refs, ',') s
    )
    GROUP BY 1
)
GROUP BY 1 ORDER BY 1;
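If the result should be the comma-separated string shown in the question rather than an array, LISTAGG can produce the sorted string directly; a sketch reusing the same sample CTE:

WITH data(id, refs) as (
    SELECT * FROM VALUES
        ('ID1', 'ANN,BOB'),
        ('ID2', 'BOB,ANN')
)
SELECT refs, count(id) as count
FROM (
    SELECT d.id,
           listagg(trim(s.value), ',') WITHIN GROUP (ORDER BY trim(s.value)) as refs
    FROM data d, lateral split_to_table(d.refs, ',') s
    GROUP BY d.id
)
GROUP BY 1 ORDER BY 1;
-- expected: 'ANN,BOB' | 2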

How to convert JSONB array of pair values to rows and columns?

Given that I have a jsonb column with an array of pair values:
[1001, 1, 1002, 2, 1003, 3]
I want to turn each pair into a row, with each pair values as columns:
| a | b |
|------|---|
| 1001 | 1 |
| 1002 | 2 |
| 1003 | 3 |
Is something like that even possible in an efficient way?
I found a few inefficient (slow) ways, like using LEAD() or joining the table to itself on the value from the next row, but those queries take ~10 minutes.
DDL:
CREATE TABLE products (
    id int not null,
    data jsonb not null
);
INSERT INTO products VALUES (1, '[1001, 1, 1002, 2, 1003, 3]');
DB Fiddle: https://www.db-fiddle.com/f/2QnNKmBqxF2FB9XJdJ55SZ/0
Thanks!
This is not an elegant approach from a declarative standpoint, but can you please see whether this performs better for you?
with indexes as (
    select id, generate_series(1, jsonb_array_length(data) / 2) - 1 as idx
    from products
)
select p.id, p.data->>(2 * i.idx) as a, p.data->>(2 * i.idx + 1) as b
from indexes i
join products p on p.id = i.id;
This query
SELECT j.data
FROM products
CROSS JOIN jsonb_array_elements(data) j(data)
should run faster if you just need to unpivot all elements within the query.
or even remove the columns coming from the products table:
SELECT jsonb_array_elements(data)
FROM products
Or, if you need the result returned like this
| a | b |
|------|---|
| 1001 | 1 |
| 1002 | 2 |
| 1003 | 3 |
i.e. unpivoting the array into two columns, then use:
SELECT MAX(CASE WHEN mod(rn, 2) = 1 THEN data->>(rn-1)::int END) AS a,
       MAX(CASE WHEN mod(rn, 2) = 0 THEN data->>(rn-1)::int END) AS b
FROM
(
    SELECT p.data, row_number() over () as rn
    FROM products p
    CROSS JOIN jsonb_array_elements(data) j(data)
) q
GROUP BY ceil(rn/2::float)
ORDER BY ceil(rn/2::float)
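As a further alternative not in the original answers: on PostgreSQL 9.4+, WITH ORDINALITY numbers the unnested elements directly, avoiding the global row_number(); a sketch against the same products table:

SELECT p.id,
       max(e.elem) FILTER (WHERE e.ord % 2 = 1) AS a, /* odd positions: 1001, 1002, ... */
       max(e.elem) FILTER (WHERE e.ord % 2 = 0) AS b  /* even positions: 1, 2, ... */
FROM products p
CROSS JOIN LATERAL jsonb_array_elements_text(p.data)
     WITH ORDINALITY AS e(elem, ord)
GROUP BY p.id, (e.ord + 1) / 2 /* pair number: elements 1-2 form pair 1, 3-4 pair 2, ... */
ORDER BY p.id, min(e.ord);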

Get the min of one column but select multiple columns

I have a table like the following:
ID   NAME   AMOUNT
______________________
1    A      3
1    B      4
2    C      18
4    I      2
4    P      9
And I want the min(AMOUNT) for each ID, but I still want to display its Name. So I want this:
ID   NAME   min(AMOUNT)
______________________
1    A      3
2    C      18
4    I      2
IDs can occur multiple times, names too. I tried this:
SELECT ID, NAME, min(AMOUNT) FROM TABLE
GROUP BY ID
But of course it's an error because I have to
GROUP BY ID, NAME
But then I get
ID   NAME   AMOUNT
______________________
1    A      3
1    B      4
2    C      18
4    I      2
4    P      9
And I understand why: it looks for the min(AMOUNT) for each combination of ID + NAME. So my question is basically: how can I select multiple columns (ID, NAME, AMOUNT) and get the minimum for only one column, while still displaying the others?
I'm new to SQL but I can't seem to find an answer.
If you are using PostgreSQL, SQL Server, MySQL 8.0, or Oracle, then try the following with the window function row_number().
In case one id can have ties on amount and you want all tied rows, use dense_rank() instead of row_number().
select
    id,
    name,
    amount
from
(
    select
        *,
        row_number() over (partition by id order by amount) as rnk
    from yourTable
) val
where rnk = 1
Output:
| id | name | amount |
| --- | ---- | ------ |
| 1 | A | 3 |
| 2 | C | 18 |
| 4 | I | 2 |
A second option, without using a window function:
select
    val.id,
    t.name,
    val.amount
from myTable t
join
(
    select
        id,
        min(amount) as amount
    from myTable
    group by
        id
) val
on t.id = val.id
and t.amount = val.amount
You did not specify your db vendor. If you are lucky enough to be on Postgres, the problem can also be solved without a nested subquery, using the proprietary distinct on clause:
with t(id, name, amount) as (values
    (1, 'A', 3),
    (1, 'B', 4),
    (1, 'W', 3),
    (2, 'C', 18),
    (4, 'I', 2),
    (4, 'P', 9)
)
select distinct on (id, name_of_min) id
     , first_value(name) over (partition by id order by amount) as name_of_min
     , amount
from t
order by id, name_of_min
Just for broadening knowledge: I don't recommend using proprietary features. first_value is a standard function, but it alone is still not enough to solve the problem in a simple query. @zealous' answer is perfect.
In many databases, the most efficient method uses a correlated subquery:
select t.*
from t
where t.amount = (select min(t2.amount) from t t2 where t2.id = t.id);
In particular, this can take advantage of an index on (id, amount).
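For reference, a hypothetical index of that shape (the index name is illustrative):

-- Composite index letting the correlated subquery find min(amount)
-- for a given id with an index-only lookup:
CREATE INDEX t_id_amount_idx ON t (id, amount);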

SELECT First Group

Problem Definition
I have an SQL query that looks like:
SELECT *
FROM table
WHERE criteria = 1
ORDER BY group;
Result
I get:
group | value | criteria
------------------------
A     | 0     | 1
A     | 1     | 1
B     | 2     | 1
B     | 3     | 1
Expected Result
However, I would like to limit the results to only the first group (in this instance, A), i.e.,
group | value | criteria
------------------------
A     | 0     | 1
A     | 1     | 1
What I've tried
Group By
SELECT *
FROM table
WHERE criteria = 1
GROUP BY group;
I can aggregate the groups using a GROUP BY clause, but that would give me:
group | value
-------------
A     | 0
B     | 2
or some aggregate function of EACH group. However, I don't want to aggregate the rows!
Subquery
I can also specify the group with a subquery:
SELECT *
FROM table
WHERE criteria = 1 AND
      group = (
          SELECT group
          FROM table
          WHERE criteria = 1
          ORDER BY group ASC
          LIMIT 1
      );
This works, but as always, subqueries are messy. In particular, this one requires specifying my WHERE clause for criteria twice. Surely there must be a cleaner way to do this.
You can try the following query:
SELECT *
FROM table
WHERE criteria = 1
  AND group = (SELECT MIN(group) FROM table WHERE criteria = 1)
ORDER BY value;
If your database supports the WITH clause, try this. It's similar to using a subquery, but you only need to specify the criteria input once. It's also easier to understand what's going on.
with main_query as (
    select *
    from table
    where criteria = 1
),
min_group as (
    select min(group) as group from main_query
)
select *
from main_query
where group in (select group from min_group)
order by group, value;
-- this where clause should be fast since there will only be 1 record in min_group
Use DENSE_RANK()
DECLARE @yourTbl AS TABLE (
    [group] NVARCHAR(50),
    value INT,
    criteria INT
)
INSERT INTO @yourTbl VALUES ( 'A', 0, 1 )
INSERT INTO @yourTbl VALUES ( 'A', 1, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 2, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 3, 1 )

;WITH cte AS
(
    SELECT i.*,
           DENSE_RANK() OVER (ORDER BY i.[group]) AS gn
    FROM @yourTbl AS i
    WHERE i.criteria = 1
)
SELECT *
FROM cte
WHERE gn = 1
group | value | criteria
------------------------
A     | 0     | 1
A     | 1     | 1
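On SQL Server specifically, the same result can be written more compactly with TOP (1) WITH TIES, which returns every row tied with the top row on the ORDER BY expression; a sketch against the same table variable:

SELECT TOP (1) WITH TIES *
FROM @yourTbl
WHERE criteria = 1
ORDER BY [group];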

Dividing each value by the sum of the column

I have the following table in my database. I am currently using Oracle 11g.
The data is like this:
id   value
1    100
2    200
3    300
(sum of value = 600)
I want to derive a new column: divide each value from the column "value" by the total sum of that column, then load the result into another table. The data in the other table should look like:
id   value   derived_col
1    100     100/600
2    200     200/600
3    300     300/600
thanks
Oracle 11g R2 Schema Setup:
CREATE TABLE data ( id, value ) AS
SELECT 1, 100 FROM DUAL
UNION ALL SELECT 2, 200 FROM DUAL
UNION ALL SELECT 3, 300 FROM DUAL;
CREATE TABLE derived_data AS
SELECT id,
       value,
       value/SUM(value) OVER ( ORDER BY NULL ) AS derived_col
FROM data;
Or if the derived_data table already exists then you can do:
INSERT INTO derived_data
SELECT id,
       value,
       value/SUM(value) OVER ( ORDER BY NULL ) AS derived_col
FROM data;
Query 1:
SELECT * FROM derived_data
Results:
| ID | VALUE | DERIVED_COL |
|----|-------|----------------|
| 1 | 100 | 0.166666666667 |
| 2 | 200 | 0.333333333333 |
| 3 | 300 | 0.5 |
Or if you want the derived_col as a string:
Oracle 11g R2 Schema Setup:
CREATE TABLE data ( id, value ) AS
SELECT 1, 100 FROM DUAL
UNION ALL SELECT 2, 200 FROM DUAL
UNION ALL SELECT 3, 300 FROM DUAL;
CREATE TABLE derived_data AS
SELECT id,
       value,
       value||'/'||SUM(value) OVER ( ORDER BY NULL ) AS derived_col
FROM data;
Query 1:
SELECT * FROM derived_data
Results:
| ID | VALUE | DERIVED_COL |
|----|-------|-------------|
| 1 | 100 | 100/600 |
| 2 | 200 | 200/600 |
| 3 | 300 | 300/600 |
Assuming your table already exists, you want to use an INSERT INTO new_table SELECT to insert the data into the derived table based on a query. For the insertion query to perform the division, it needs two subqueries:
query the sum of the values
query the (id, value) pairs
Because the sum of the values is a single value, constant for all rows, you can combine these subqueries with a join that has no condition (a CROSS JOIN):
INSERT INTO derived_table
SELECT
    ot.id AS id,
    ot.value AS value,
    CAST(ot.value AS float) / summed.total AS derived_col
FROM
    orig_table ot
CROSS JOIN
    ( SELECT SUM(value) AS total FROM orig_table ) summed;
The CAST(ot.value AS FLOAT) is necessary if value is a column of integers. Otherwise, your division will be integer division and all of the derived values will be zero.
There is no join condition here because the sum is a single value that applies to all rows of orig_table. If you want to apply different divisors to different rows, you would need a more complicated subquery and an appropriate join condition.