count combination of columns in postgresql matrix - sql

I have a table in Postgres like the one below.
I want an SQL query that, for each combination of 2 columns, counts the rows where both columns are 'Y' (i.e. YY).
I'm expecting output like this:
Combination Count
AB 2
AC 1
AD 2
AZ 1
BC 1
BD 3
BZ 2
CD 2
CZ 0
DZ 1
Can anyone help me?
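For context, the table definition was not posted; judging from the queries in the answers below (which call the table test or just t), it is presumably shaped something like this:

-- Assumed layout; column names and types are inferred from the answers' queries.
CREATE TABLE test (
    id integer PRIMARY KEY,
    a  text,   -- each flag column holds 'Y' when set
    b  text,
    c  text,
    d  text,
    z  text
);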

WITH stacked AS (
SELECT id
, unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
, unnest(array[a, b, c, d, z]) AS col_value
FROM test t
)
SELECT combo, sum(cnt) AS count
FROM (
SELECT t1.id, t1.col_name || t2.col_name AS combo
, (CASE WHEN t1.col_value = 'Y' AND t2.col_value = 'Y' THEN 1 ELSE 0 END) AS cnt
FROM stacked t1
INNER JOIN stacked t2
ON t1.id = t2.id
AND t1.col_name < t2.col_name) t3
GROUP BY combo
ORDER BY combo
yields
| combo | count |
|-------+-------|
| AB | 2 |
| AC | 1 |
| AD | 2 |
| AZ | 2 |
| BC | 1 |
| BD | 3 |
| BZ | 2 |
| CD | 2 |
| CZ | 0 |
| DZ | 1 |
The unnesting recipe for unpivoting the table comes from Stew's post, here.
To count occurrences of YYY among 3 columns you could use:
WITH stacked AS (
SELECT id
, unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
, unnest(array[a, b, c, d, z]) AS col_value
FROM test t
)
SELECT combo, sum(cnt) AS count
FROM (
SELECT t1.id, t1.col_name || t2.col_name || t3.col_name AS combo
, (CASE WHEN t1.col_value = 'Y'
AND t2.col_value = 'Y'
AND t3.col_value = 'Y' THEN 1 ELSE 0 END) AS cnt
FROM stacked t1
INNER JOIN stacked t2
ON t1.id = t2.id
INNER JOIN stacked t3
ON t1.id = t3.id
AND t1.col_name < t2.col_name
AND t2.col_name < t3.col_name
) t3
GROUP BY combo
ORDER BY combo
;
which yields
| combo | count |
|-------+-------|
| ABC | 0 |
| ABD | 1 |
| ABZ | 2 |
| ACD | 1 |
| ACZ | 0 |
| ADZ | 1 |
| BCD | 1 |
| BCZ | 0 |
| BDZ | 1 |
| CDZ | 0 |
Or, to handle combinations of N columns, you could use WITH RECURSIVE:
For example, for N = 3,
WITH RECURSIVE result AS (
WITH stacked AS (
SELECT id
, unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
, unnest(array[a, b, c, d, z]) AS col_value
FROM test t)
SELECT id, array[col_name] AS path, array[col_value] AS path_val, col_name AS last_name
FROM stacked
UNION
SELECT r.id, path || s.col_name, path_val || s.col_value, s.col_name
FROM result r
INNER JOIN stacked s
ON r.id = s.id
AND s.col_name > r.last_name
WHERE array_length(r.path, 1) < 3) -- Change 3 to your value for N
SELECT combo, sum(cnt)
FROM (
SELECT id, array_to_string(path, '') AS combo, (CASE WHEN 'Y' = all(path_val) THEN 1 ELSE 0 END) AS cnt
FROM result
WHERE array_length(path, 1) = 3) t -- Change 3 to your value for N
GROUP BY combo
ORDER BY combo
Note that N = 3 is used in 2 places in the SQL above.
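If you prefer to state N only once, one option is to keep it in its own CTE and reference it from both places. A sketch along those lines (not from the original answer, and assuming the same test table):

WITH RECURSIVE params(n) AS (
    SELECT 3    -- set N here, once
), stacked AS (
    SELECT id
         , unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
         , unnest(array[a, b, c, d, z]) AS col_value
    FROM test t
), result AS (
    SELECT id, array[col_name] AS path, array[col_value] AS path_val, col_name AS last_name
    FROM stacked
    UNION
    SELECT r.id, r.path || s.col_name, r.path_val || s.col_value, s.col_name
    FROM result r
    INNER JOIN stacked s
       ON r.id = s.id
      AND s.col_name > r.last_name
    WHERE array_length(r.path, 1) < (SELECT n FROM params)    -- N used here...
)
SELECT array_to_string(path, '') AS combo
     , sum(CASE WHEN 'Y' = all(path_val) THEN 1 ELSE 0 END) AS count
FROM result
WHERE array_length(path, 1) = (SELECT n FROM params)          -- ...and here, but set only once above
GROUP BY 1
ORDER BY 1;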

I would do this using a lateral join:
with vals as (
      select t.id, v.which, v.val   -- carry the row id so pairs are formed within the same row
      from t cross join lateral
           (values ('A', A), ('B', B), ('C', C), ('D', D), ('Z', Z)
           ) v(which, val)
     )
select (v1.which || v2.which) as combo,
       sum( (v1.val = 'Y' and v2.val = 'Y')::int ) as count
from vals v1 join
     vals v2
     on v1.id = v2.id and v1.which < v2.which
group by combo
order by combo;
I consider lateral joins to be a more direct way to unpivot the values. There is no need to convert the values to an array and unnest them, much less unnest two arrays and align the values.
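Following the same idea, here is a sketch (not part of the original answer) of the lateral-join version for 3-column combinations, again assuming the table has an id column:

with vals as (
      select t.id, v.which, v.val
      from t cross join lateral
           (values ('A', A), ('B', B), ('C', C), ('D', D), ('Z', Z)
           ) v(which, val)
     )
select (v1.which || v2.which || v3.which) as combo,
       sum( (v1.val = 'Y' and v2.val = 'Y' and v3.val = 'Y')::int ) as count
from vals v1
join vals v2 on v2.id = v1.id and v1.which < v2.which
join vals v3 on v3.id = v1.id and v2.which < v3.which
group by combo
order by combo;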


Oracle - how to join multiple rows in vertical oriented table

Let's say I have two tables like these:
table1:
-----------------
| someId | value|
|--------|------|
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
-----------------
table2:
-----------------------------------
| someId | type | value1 | value2 |
|--------|------|--------|--------|
| 1 | 2 | hello | |
| 1 | 3 | | 2 |
| 1 | 4 | | |
| 2 | 4 | | |
-----------------------------------
table1.someId = table2.someId
table2 is vertical, so multiple rows of this table (based on someId) refer to table1.someId.
Now I need to obtain the count of rows from table1 for which table1.value=? AND (table2.type=2 and table2.value1=?) AND (table2.type=3 and table2.value2=?), joined on table1.someId = table2.someId.
This is the query I have right now (it is parametrized and parameters for value, value1 and value2 are passed from a client):
select count(case when t1.value = ? then 1 end) from table1 t1
inner join
(select value1.someId from
(select someId from table2 where type = 2 and value1 = ?) value1
inner join
(select someId from table2 where type = 3 and value2 = ?) value2
on value1.someId = value2.someId
) t2
on t1.someId = t2.someId;
Example query:
select count(case when t1.value = 2 then 1 end) from table1 t1
inner join
(select value1.someId from
(select someId from table2 where type = 2 and value1 ='hello') value1
inner join
(select someId from table2 where type = 3 and value2 = 2) value2
on value1.someId = value2.someId
) t2
on t1.someId = t2.someId;
Is there any other way to achieve this instead of multiple selects joined by inner joins? (In reality, I have to search by three types from table2.)
Running example with correct result (updated example from Michael Buen):
db-fiddle.com
Thank you.
What you need is to write a custom pivot for table2, grouping by someid, before joining with table1:
with s (someId, type, value1, value2) as (
select 1, 2, 'hello', to_number(null) from dual union all
select 1, 3, null , 2 from dual union all
select 1, 4, null , null from dual union all
select 2, 4, null , null from dual)
select someid,
max(case when type = 2 then value1 end) type2_value1,
max(case when type = 3 then value2 end) type3_value2/*,
max(case when type = 4 then value1 end) type4_value1
max(case when type = 4 then value2 end) type4_value2*/
from s
group by someid;
    SOMEID TYPE2_VALUE1 TYPE3_VALUE2
---------- ------------ ------------
         1 hello                   2
         2
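From there, the pivoted rows can be joined back to table1 to produce the count the question asks for. A sketch (not shown in the original answer), keeping the question's parameter placeholders:

select count(*)
from table1 t1
join (select someid,
             max(case when type = 2 then value1 end) as type2_value1,
             max(case when type = 3 then value2 end) as type3_value2
      from table2
      group by someid) t2
  on t1.someid = t2.someid
where t1.value = ?
  and t2.type2_value1 = ?
  and t2.type3_value2 = ?;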

How to join two tables when there's no coincidence?

I have two tables that I want to join. I've tried the usual left and right joins, but neither gives the result I want.
TABLE A
ID_A VALUE_A
-----------------
A 1
B 2
TABLE B
ID_B ID_A VALUE_B
-------------------------
90 A 1
90 C 1
90 E 1
91 A 1
91 B 1
92 B 1
92 E 1
92 F 1
I want to get this result:
ID_A VALUE_A ID_B ID_A VALUE_B
-------------------------------------------------
A 1 90 A 1
B 2 90 NULL NULL
A 1 91 A 1
B 2 91 B 1
A 1 92 NULL NULL
B 2 92 B 1
If I understand correctly, you want all combinations of id_a and value_a from the first table along with all distinct id_b from the second table. If so:
select iv.id_a, iv.value_a, ib.id_b, b.id_a, b.value_b
from (select distinct id_a, value_a from a) iv cross join
(select distinct id_b from b) ib left join
b
on b.id_b = ib.id_b and b.id_a = iv.id_a;
The cross join generates the rows. The left join brings in the additional columns.
I usually break things like this down into CTEs:
DDL
use tempdb
CREATE TABLE Table1
([ID_A] varchar(1), [VALUE_A] int)
;
INSERT INTO Table1
([ID_A], [VALUE_A])
VALUES
('A', 1),
('B', 2)
;
CREATE TABLE Table2
([ID_B] int, [ID_A] varchar(1), [VALUE_B] int)
;
INSERT INTO Table2
([ID_B], [ID_A], [VALUE_B])
VALUES
(90, 'A', 1),
(90, 'C', 1),
(90, 'E', 1),
(91, 'A', 1),
(91, 'B', 1),
(92, 'B', 1),
(92, 'E', 1),
(92, 'F', 1)
;
Answer
with a as (
select distinct id_b
from Table2
),
b as (
select id_a, value_a, id_b
from Table1 cross join a
)
select b.id_a, b.value_a, b.id_b, t2.id_a, t2.value_b
from b left join Table2 t2
on b.id_a = t2.id_a
and b.id_b = t2.id_b
Results
+------+---------+------+------+---------+
| id_a | value_a | id_b | id_a | value_b |
+------+---------+------+------+---------+
| A | 1 | 90 | A | 1 |
| B | 2 | 90 | NULL | NULL |
| A | 1 | 91 | A | 1 |
| B | 2 | 91 | B | 1 |
| A | 1 | 92 | NULL | NULL |
| B | 2 | 92 | B | 1 |
+------+---------+------+------+---------+
I couldn't resolve the exact logic and couldn't match the results exactly as desired, but I presume you'd like to get something like:
SELECT a.ID_A, COALESCE(a.VALUE_A,b.VALUE_B) VALUE_A, b.ID_B, a.ID_A,
(CASE WHEN a.ID_A IS NULL THEN a.ID_A ELSE CAST(b.VALUE_B as VARCHAR(1)) END)
as VALUE_B
FROM TABLE_A a FULL OUTER JOIN TABLE_B b
ON ( a.ID_A = b.ID_A )
GROUP BY a.ID_A, a.VALUE_A, b.ID_B, a.ID_A, b.VALUE_B
ORDER BY 3, 2, 1;
SQL Fiddle Demo
Try this:
SELECT A.ID_A , A.VALUE_A , B.ID_B , B.ID_A , B.VALUE_B
FROM TABLE_A A
LEFT OUTER JOIN TABLE_B B
ON A.ID_A = B.ID_A ;
EDIT: Typos corrected following sticky bit's note (thanks!!).

Eliminating duplicate rows except one column with condition

I am having trouble finding an appropriate query (SQL Server) for selecting records with a condition; however, the table I will be using has more than 100,000 rows and more than 20 columns.
So I need a code that satisfies the following condition:
1.) If [policy] and [plan] are unique across rows, then I select that record.
2.) If [policy] and [plan] return 2 or more rows, then I select the records whose [code] column isn't 999.
3.) In some cases the unwanted rows may not have '999' in the [code] column but some other specific value.
In other words, I would like to get row numbers 1, 2, 4, 5 and 7.
Here is an example of what the table looks like
row #|policy|plan|code
-----------------------
1 | a | aa |111
-----------------------
2 | b | bb |112
-----------------------
3 | b | bb |999
-----------------------
4 | c | cc |111
-----------------------
5 | c | cc |112
-----------------------
6 | c | cc |999
-----------------------
7 | d | dd |999
-----------------------
I'm expecting to see something like
row #|policy|plan|code
-----------------------
1 | a | aa |111
-----------------------
2 | b | bb |112
-----------------------
4 | c | cc |111
-----------------------
5 | c | cc |112
-----------------------
7 | d | dd |999
-----------------------
Thank you in advance
This sounds like a prioritization query. You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by policy, [plan]
order by code
) as seqnum
from t
) t
where seqnum = 1;
The expected output makes this a bit clearer: rank() with the CASE ordering keeps every non-999 row in a group, and falls back to the 999 row only when the group has nothing else:
select t.*
from (select t.*,
rank() over (partition by policy, [plan]
order by (case when code = 999 then 1 else 2 end) desc
) as seqnum
from t
) t
where seqnum = 1;
The OP wants all codes that are not 999 unless the only codes are 999. So, another approach is:
select t.*
from t
where t.code <> 999
union all
select t.*
from t
where t.code = 999 and
not exists (select 1
from t t2
where t2.policy = t.policy and t2.[plan] = t.[plan] and
t2.code <> 999
);
Maybe you want this (eliminate the highest-code row when a group has more than one row)?
select t.*
from (select t.*
, row_number() over (partition by policy, [plan]
order by code desc
) AS RN
, COUNT(*) over (partition by policy, [plan]) AS RC
from t
) t
where RN > 1 OR RN=RC;
Output:
row #|policy|plan|code|RN|RC
----------------------------
1    | a    | aa | 111| 1| 1
2    | b    | bb | 112| 2| 2
5    | c    | cc | 112| 2| 3
4    | c    | cc | 111| 3| 3
7    | d    | dd | 999| 1| 1
CREATE TABLE #Table2
([row] int, [policy] varchar(1), [plan] varchar(2), [code] int)
;
INSERT INTO #Table2
([row], [policy], [plan], [code])
VALUES
(1, 'a', 'aa', 111),
(2, 'b', 'bb', 112),
(3, 'b', 'bb', 999),
(4, 'c', 'cc', 111),
(5, 'c', 'cc', 112),
(6, 'c', 'cc', 999),
(7, 'd', 'dd', 999)
;
with cte
as
(
select *,
row_number() over (partition by policy, [plan]
order by code
) as seqnum
from #Table2
)
select [row], [policy], [plan], [code] from cte where seqnum=1

Count Top 5 Elements spread over rows and columns

Using T-SQL for this table:
+-----+------+------+------+-----+
| No. | Col1 | Col2 | Col3 | Age |
+-----+------+------+------+-----+
| 1 | e | a | o | 5 |
| 2 | f | b | a | 34 |
| 3 | a | NULL | b | 22 |
| 4 | b | c | a | 55 |
| 5 | b | a | b | 19 |
+-----+------+------+------+-----+
I need to count the TOP 3 names (ordered by TotalCount DESC) across all rows and columns, for 3 age groups: 0-17, 18-49, 50-100. Also, how do I ignore the NULLs in my results?
If it's possible, how can I also UNION the results for all 3 age groups into one output table to get 9 results (TOP 3 x 3 age groups)?
Output for only 1 Age Group: 18-49 would look like this:
+------+------------+
| Name | TotalCount |
+------+------------+
| b | 4 |
| a | 3 |
| f | 1 |
+------+------------+
You first need to unpivot your table and exclude the NULLs, then do a simple COUNT(*):
WITH CteUnpivot(Name, Age) AS(
SELECT x.*
FROM tbl t
CROSS APPLY ( VALUES
(col1, Age),
(col2, Age),
(col3, Age)
) x(Name, Age)
WHERE x.Name IS NOT NULL
)
SELECT TOP 3
Name, COUNT(*) AS TotalCount
FROM CteUnpivot
WHERE Age BETWEEN 18 AND 49
GROUP BY Name
ORDER BY COUNT(*) DESC
ONLINE DEMO
If you want to get the TOP 3 for each age group:
WITH CteUnpivot(Name, Age) AS(
SELECT x.*
FROM tbl t
CROSS APPLY ( VALUES
(col1, Age),
(col2, Age),
(col3, Age)
) x(Name, Age)
WHERE x.Name IS NOT NULL
),
CteRn AS (
SELECT
AgeGroup =
CASE
WHEN Age BETWEEN 0 AND 17 THEN '0-17'
WHEN Age BETWEEN 18 AND 49 THEN '18-49'
WHEN Age BETWEEN 50 AND 100 THEN '50-100'
END,
Name,
COUNT(*) AS TotalCount
FROM CteUnpivot
GROUP BY
CASE
WHEN Age BETWEEN 0 AND 17 THEN '0-17'
WHEN Age BETWEEN 18 AND 49 THEN '18-49'
WHEN Age BETWEEN 50 AND 100 THEN '50-100'
END,
Name
)
SELECT
AgeGroup, Name, TotalCount
FROM(
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY AgeGroup ORDER BY TotalCount DESC)
FROM CteRn
) t
WHERE rn <= 3;
ONLINE DEMO
The unpivot technique using CROSS APPLY and VALUES:
An Alternative (Better?) Method to UNPIVOT (SQL Spackle) by Dwain Camps
You can check the multiple-CTE SELECT statement below.
ROW_NUMBER() with a PARTITION BY clause is used to order records within each age group.
/*
CREATE TABLE tblAges(
[No] Int,
Col1 VarChar(10),
Col2 VarChar(10),
Col3 VarChar(10),
Age SmallInt
)
INSERT INTO tblAges VALUES
(1, 'e', 'a', 'o', 5),
(2, 'f', 'b', 'a', 34),
(3, 'a', NULL, 'b', 22),
(4, 'b', 'c', 'a', 55),
(5, 'b', 'a', 'b', 19);
*/
;with cte as (
select
col1 as col, Age
from tblAges
union all
select
col2, Age
from tblAges
union all
select
col3, Age
from tblAges
), cte2 as (
select
col,
case
when age < 18 then '0-17'
when age < 50 then '18-49'
else '50-100'
end as grup
from cte
where col is not null
), cte3 as (
select
grup,
col,
count(grup) cnt
from cte2
group by
grup,
col
)
select * from (
select
grup, col, cnt, ROW_NUMBER() over (partition by grup order by cnt desc) cnt_grp
from cte3
) t
where cnt_grp <= 3
order by grup, cnt desc

pgsql 1 to n relation into json

Long story short, how can I use the 1-to-n select below to build JSON as shown in the example?
SELECT table1.id AS id1,table2.id AS id2,t_id,label
FROM table1 LEFT JOIN table2 ON table2.t_id = table1.id
result
|id1|id2|t_id|label|
+---+---+----+-----+
|1 | 1 | 1 | a |
| | 2 | 1 | b |
| | 3 | 1 | c |
| | 4 | 1 | d |
|2 | 5 | 2 | x |
| | 6 | 2 | y |
I want to turn that into this (build_json stands in for whatever aggregation does the job):
SELECT table1.id, build_json(table2.id,table2.label) AS json_data
FROM table1 JOIN table2 ON table2.t_id = table1.id
GROUP BY table1.id
|id1|json_data
+--+-----------------
|1 |{"1":"a","2":"b","3":"c","4":"d"}
|2 |{"5":"x","6":"y"}
My guess is that the best start would be building an array from the columns.
hstore instead of JSON would be OK too.
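For reference, the underlying tables are not shown in the question; judging from the select above, they presumably look something like this (an assumption):

-- Assumed definitions inferred from the select in the question.
create table table1 (
    id integer primary key
);
create table table2 (
    id    integer primary key,
    t_id  integer references table1(id),   -- the 1-to-n link back to table1
    label text
);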
Well, your table structure is a bit strange (it looks more like a report than a table), so I see two tasks here:
Replace the nulls with the correct id1. You can do it like this:
with cte1 as (
select
sum(case when id1 is null then 0 else 1 end) over (order by t_id) as id1_partition,
id1, id2, label
from Table1
), cte2 as (
select
first_value(id1) over(partition by id1_partition) as id1,
id2, label
from cte1
)
select *
from cte2
Now you have to aggregate the data into JSON. As far as I remember, there's no such function in PostgreSQL, so you have to concatenate the data manually:
with cte1 as (
select
sum(case when id1 is null then 0 else 1 end) over (order by t_id) as id1_partition,
id1, id2, label
from Table1
), cte2 as (
select
first_value(id1) over(partition by id1_partition) as id1,
id2, label
from cte1
)
select
id1,
('{' || string_agg('"' || id2 || '":' || to_json(label), ',') || '}')::json as json_data
from cte2
group by id1
sql fiddle demo
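Side note (not part of the original answer): on PostgreSQL 9.4 or later, json_object_agg() does this aggregation directly, so the manual concatenation is not needed. A sketch reusing the same cte1/cte2:

with cte1 as (
    select
        sum(case when id1 is null then 0 else 1 end) over (order by t_id) as id1_partition,
        id1, id2, label
    from Table1
), cte2 as (
    select
        first_value(id1) over(partition by id1_partition) as id1,
        id2, label
    from cte1
)
select
    id1,
    json_object_agg(id2, label) as json_data   -- keys are cast to text automatically
from cte2
group by id1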
And if you want to convert into hstore:
with cte1 as (
select
sum(case when id1 is null then 0 else 1 end) over (order by t_id) as id1_partition,
id1, id2, label
from Table1
), cte2 as (
select
first_value(id1) over(partition by id1_partition) as id1,
id2, label
from cte1
)
select
c.id1, hstore(array_agg(c.id2)::text[], array_agg(c.label)::text[])
from cte2 as c
group by c.id1
sql fiddle demo