Getting common value count of text array column in Postgres - sql

I have a table which looks like this:
id num
--- ----
1 {'1','2','3','3'}
2 {'2','3'}
3 {'5','6','7'}
Here id is a unique column and num is a text array which can contain duplicate elements . I want to do something like an intersection between two consecutive rows so that I get the count of common elements between num of two rows. Consider something like a set where duplicates are considered only once. For example, for the above table I am expecting something like the following:
id1 id2 count
--- --- -----
1 2 2
1 3 0
2 1 2
2 3 0
3 1 0
3 2 0
It is not necessary to get the output like the above. The only part I am concerned about is count.
I have the following query which gives the output only for one id compared with one other id:
select unnest(num) from emp where id=1
intersect
select unnest(num) from emp where id=2
How can I generalize it to get the required output?

A straight forward approach puts the intersection of the unnested arrays in a subquery and gets their count.
SELECT t1.id id1,
t2.id id2,
(SELECT count(*)
FROM (SELECT num1.num
FROM unnest(t1.num) num1(num)
INTERSECT
SELECT num2.num
FROM unnest(t2.num) num2(num)) x) count
FROM emp t1
INNER JOIN emp t2
ON t2.id > t1.id
ORDER BY t1.id,
t2.id;
Should you be only interested in whether the arrays share elements or not but not in the exact count, you can also use the overlap operator &&.
SELECT t1.id id1,
t2.id id2,
t1.num && t2.num intersection_not_empty
FROM emp t1
INNER JOIN emp t2
ON t2.id > t1.id
ORDER BY t1.id,
t2.id;

For the example data, this works:
with t as (
select v.*
from (values (1000, array['acct', 'hr']), (1005, array['dev', hr'])) v(empid, depts)
)
select t1.empid, t2.empid,
(select count(distinct d1)
from unnest(t1.depts) d1 join
unnest(t2.depts) d2
on d1 = d2
) cnt
from t t1 join
t t2
on t1.empid < t2.empid;
I'm not 100% sure this is what you intend, though.

Related

Joining and grouping to equate on two tables

I've tried to minify this problem as much as possible. I've got two tables which share some Id's (among other columns)
id id
---- ----
1 1
1 1
2 1
2
2
Firstly, I can get each table to resolve to a simple count of how many of each Id there is:
select id, count(*) from tbl1 group by id
select id, count(*) from tbl2 group by id
id | tbl1-count id | tbl2-count
--------------- ---------------
1 2 1 3
2 1 2 2
but then I'm at a loss, I'm trying to get the following output which shows the count from tbl2 for each id, divided by the count from tbl1 for the same id:
id | count of id in tbl2 / count of id in tbl1
==========
1 | 1.5
2 | 2
So far I've got this:
select tbl1.Id, tbl2.Id, count(*)
from tbl1
join tbl2 on tbl1.Id = tbl2.Id
group by tbl1.Id, tbl2.Id
which just gives me... well... something nowhere near what I need, to be honest! I was trying count(tbl1.Id), count(tbl2.Id) but get the same multiplied amount (because I'm joining I guess?) - I can't get the individual representations into individual columns where I can do the division.
This gives consideration to your naming of tables -- the query from tbl2 needs to be first so the results will include all records from tbl2. The LEFT JOIN will include all results from the first query, but only join those results that exist in tbl1. (Alternatively, you could use a FULL OUTER JOIN or UNION both results together in the first query.) I also added an IIF to give you an option if there are no records in tbl1 (dividing by null would produce null anyway, but you can do what you want).
Counts are cast as decimal so that the ratio will be returned as a decimal. You can adjust precision as required.
SELECT tb2.id, tb2.table2Count, tb1.table1Count,
IIF(ISNULL(tb1.table1Count, 0) != 0, tb2.table2Count / tb1.table1Count, null) AS ratio
FROM (
SELECT id, CAST(COUNT(1) AS DECIMAL(18, 5)) AS table2Count
FROM tbl2
GROUP BY id
) AS tb2
LEFT JOIN (
SELECT id, CAST(COUNT(1) AS DECIMAL(18, 5)) AS table1Count
FROM tbl1
GROUP BY id
) AS tb1 ON tb1.id = tb2.id
(A subqquery with a LEFT JOIN will allow the query optimizer to determine how to generate the results and will generally outperform a CROSS APPLY, as that executes a calculation for every record.)
Assuming your expected results are wrong, then this is how I would do it:
CREATE TABLE T1 (ID int);
CREATE TABLE T2 (ID int);
GO
INSERT INTO T1 VALUES(1),(1),(2);
INSERT INTO T2 VALUES(1),(1),(1),(2),(2);
GO
SELECT T1.ID AS OutID,
(T2.T2Count * 1.) / COUNT(T1.ID) AS OutCount --Might want a CONVERT to a smaller scale and precision decimal here
FROM T1
CROSS APPLY (SELECT T2.ID, COUNT(T2.ID) AS T2Count
FROM T2
WHERE T2.ID = T1.ID
GROUP BY T2.ID) T2
GROUP BY T1.ID,
T2.T2Count;
GO
DROP TABLE T1;
DROP TABLE T2;
You can aggregate in subqueries and then join:
select t1.id, t2.cnt * 1.0 / t1.cnt
from (select id, count(*) as cnt
from tbl1
group by id
) t1 join
(select id, count(*) as cnt
from tbl2
group by id
) t2
on t1.id = t2.id

Querying two tables to filter data using select case

I have two tables
Table 1 looks like this
ID Repeats
-----------
A 1
A 1
A 0
B 2
B 2
C 2
D 1
Table 2 looks like this
ID values
-----------
A 100
B 200
C 100
D 300
Using a view I need a result like this
ID values Repeats
-------------------
A 100 NA
B 200 2
C 100 2
D 300 1
that means, I want unique ID, its values and Repeats. Repeats value should display NA when there are multiple values against single ID and it should display the Repeats value in case there is single value for repeats.
Initially I needed to display the max value of repeats so I tried the following view
ALTER VIEW [dbo].[BookingView1]
AS
SELECT bv.*, bd2.Repeats FROM Table1 bv
JOIN
(
SELECT distinct bd.id, bd.Repeats FROM table2 bd
JOIN
(
SELECT Id, MAX(Repeats) AS MaxRepeatCount
FROM table2
GROUP BY Id
) bd1
ON bd.Id = bd1.Id
AND bd.Repeats = bd1.MaxRepeatCount
) bd2
ON bv.Id = bd2.Id;
and this returns the correct result but when trying to implement the CASE it fails to return unique ID results. Please help!!
One method uses outer apply:
select t2.*, t1.repeats
from table2 t2 outer apply
(select (case when max(repeats) = min(repeats) then max(repeats)
else 'NA'
end) as repeats
from table1 t1
where t1.id = t2.id
) t1;
Two notes:
This assumes that repeats is a string. If it is a number, you need to cast it to a string.
repeats is not null.
For the sake of completeness, I'm including another approach that will work if repeats is NULL. However, Gordon's answer has a much simpler query plan and should be preferred.
Option 1 (Works with NULLs):
SELECT
t1.ID, t2.[Values],
CASE
WHEN COUNT(*) > 1 THEN 'NA'
ELSE CAST(MAX(Repeats) AS VARCHAR(2))
END Repeats
FROM (
SELECT DISTINCT t1.ID, t1.Repeats
FROM #table1 t1
) t1
LEFT OUTER JOIN #table2 t2
ON t1.ID = t2.ID
GROUP BY t1.ID, t2.[Values]
Option 2 (does not contain explicit subqueries, but does not work with NULLs):
SELECT DISTINCT
t1.ID,
t2.[Values],
CASE
WHEN COUNT(t1.Repeats) OVER (PARTITION BY COUNT(DISTINCT t1.Repeats), t1.ID) > 1 THEN 'NA'
ELSE CAST(t1.Repeats AS VARCHAR(2))
END Repeats
FROM #table1 t1
LEFT OUTER JOIN #table2 t2
ON t1.ID = t2.ID
GROUP BY t1.ID, t2.[Values], t1.Repeats
NOTE:
This may not give desired results if table2 has different values for the same ID.

SQL - get max result

Assume there is a table name "test" below:
name value
n1 1
n2 2
n3 3
Now, I want to get the name which has the max value, I have some solution below:
Solution 1:
SELECT TOP 1 name
FROM test
ORDER BY value DESC
solution 2:
SELECT name
FROM test
WHERE value = (SELECT MAX(value) FROM test);
Now, I hope use join operation to find the result, like
SELECT name
FROM test
INNER JOIN test ON...
Could someone please help and explain how it works?
If you are looking for JOIN then
SELECT T.name, T.value
FROM test T
INNER JOIN
( SELECT T1.name, T1.value ,
RANK() OVER (PARTITION BY T1.name ORDER BY T1.value) N
FROM test T1
WHERE T1.value IN (SELECT MAX(t2.value) FROM test T2)
)T3 ON T3.N = 1 AND T.name = T3.name
FIDDLE DEMO
or
select name, value
from
(
select name, value,
row_number() over(order by value desc) rn
from test
) src
where rn = 1
FIDDLE DEMO
First, note that solutions 1 and 2 could give different results when value is not unique. If in your test data there would be an additional record ('n4', 3), then solution 1 would return either 'n3' or 'n4', but solution 2 would return both.
A solution with JOIN will need aliases for the table, because as you started of, the engine would say Ambiguous column name 'name'.: it would not know whether to take name from the first or second occurrence of the test table.
Here is a way to complete the JOIN version:
SELECT t1.name
FROM test t1
LEFT JOIN test t2
ON t2.value > t1.value
WHERE t2.value IS NULL;
This query takes each of the records, and checks if any records exist that have a higher value. If not, the first record will be in the result. Note the use of LEFT: this denotes an outer join, so that records from t1 that have no match with t2 -- based on the ON condition -- are not immediately rejected (as would be the case with INNER): in fact, we want to reject all the other records, which is done with the WHERE clause.
A way to understand this mechanism, is to look at a variant of the query above, which lacks the WHERE clause and returns the values of both tables:
SELECT t1.value, t2.value
FROM test t1
LEFT JOIN test t2
ON t2.value > t1.value
On your test data this will return:
t1.value t2.value
1 2
1 3
2 3
3 (null)
Note that the last entry would not be there if the join where an INNER JOIN. But with the outer join, one can now look for the NULL values and actually get those records in the result that would be excluded from an INNER JOIN.
Note that this query will give the same result as solution 2 when there are duplicate values. If you want to have also only one result like with solution 1, it suffices to add TOP 1 after SELECT.
Here is a fiddle.
Alternative with pure INNER JOIN
If you really want an INNER join, then this will do it. Again the TOP 1 is only needed if you have non-unique values:
SELECT TOP 1 t1.name
FROM test t1
INNER JOIN (SELECT Max(value) AS value FROM test) t2
ON t2.value = t1.value;
But this one really is very similar to what you did in solution 2. Here is fiddle for it.

SQL select 1 to many within the same row

I have a table with 1 record, which then ties back to a secondary table which can contain either no match, 1 match, or 2 matches.
I need to fetch the corresponding records and display them within the same row which would be easy using left join if I just had 1 or no matches to tie back, however, because I can get 2 matches, it produces 2 records.
Example with 1 match:
Select T1.ID, T1.Person1, T2.Owner
From T1
Left Join T2
ON T1.ID = T2.MatchID
Output
ID Person1 Owner1
----------------------
1 John Frank
Example with 2 match:
Select T1.ID, T1.Person1, T2.Owner
From T1
Left Join T2
ON T1.ID = T2.MatchID
Output
ID Person1 Owner
----------------------
1 John Frank
1 John Peter
Is there a way I can formulate my select so that my output would reflect the following When I have 2 matches:
ID Person1 Owner1 Owner2
-------------------------------
1 John Frank Peter
I explored Oracle Pivots a bit, however couldn't find a way to make this work. Also explored the possibility of using left join on the same table twice using MIN() and MAX() when fetching the matches, however I can only see myself resorting this as a "no other option" scenario.
Any suggestions?
** EDIT **
#ughai - Using CTE does address the issue to some extent, however when attempting to retrieve all of the records, the details derived from this common table isn't showing any records on the LEFT JOIN unless I specify the "MatchID" (CASE_MBR_KEY) value, meaning by removing the "where" clause, my outer joins produce no records, even though the CASE_MBR_KEY values are there in the CTE data.
WITH CTE AS
(
SELECT TEMP.BEAS_KEY,
TEMP.CASE_MBR_KEY,
TEMP.FULLNAME,
TEMP.BIRTHDT,
TEMP.LINE1,
TEMP.LINE2,
TEMP.LINE3,
TEMP.CITY,
TEMP.STATE,
TEMP.POSTCD,
ROW_NUMBER()
OVER(ORDER BY TEMP.BEAS_KEY) R
FROM TMP_BEN_ASSIGNEES TEMP
--WHERE TEMP.CASE_MBR_KEY = 4117398
)
The reason for this is because the ROW_NUMBER value, given the amount of records won't necessarily be 1 or 2, so I attempted the following, but getting ORA-01799: a column may not be outer-joined to a subquery
--// BEN ASSIGNEE 1
LEFT JOIN CTE BASS1
ON BASS1.CASE_MBR_KEY = C.CASE_MBR_KEY
AND BASS1.R IN (SELECT min(R) FROM CTE A WHERE A.CASE_MBR_KEY = C.CASE_MBR_KEY)
--// END BA1
--// BEN ASSIGNEE 2
LEFT JOIN CTE BASS2
ON BASS2.CASE_MBR_KEY = C.CASE_MBR_KEY
AND BASS2.R IN (SELECT MAX(R) FROM CTE B WHERE B.CASE_MBR_KEY = C.CASE_MBR_KEY)
--// END BA2
** EDIT 2 **
Fixed the above issue by moving the Row number clause to the "Where" portion of the query instead of within the JOIN clause. Seems to work now.
You can use CTE with ROW_NUMBER() with 2 LEFT JOIN OR with PIVOT like this.
SQL Fiddle
Query with Multiple Left Joins
WITH CTE as
(
SELECT MatchID,Owner,ROW_NUMBER()OVER(ORDER BY Owner) r FROM t2
)
select T1.ID, T1.Person, t2.Owner as Owner1, t3.Owner as Owner2
FROM T1
LEFT JOIN CTE T2
ON T1.ID = T2.MatchID AND T2.r = 1
LEFT JOIN CTE T3
ON T1.id = T3.MatchID AND T3.r = 2;
Query with PIVOT
WITH CTE as
(
SELECT MatchID,Owner,ROW_NUMBER()OVER(ORDER BY Owner) R FROM t2
)
SELECT ID, Person,O1,O2
FROM T1
LEFT JOIN CTE T2
ON T1.ID = T2.MatchID
PIVOT(MAX(Owner) FOR R IN (1 as O1,2 as O2));
Output
ID PERSON OWNER1 OWNER2
1 John Maxwell Peter
If you know there are at most two matches, you can also use aggregation:
Select T1.ID, T1.Person1,
MIN(T2.Owner) as Owner1,
(CASE WHEN MIN(t2.Owner) <> MAX(t2.Owner) THEN MAX(t2.Owner) END) as Owner2
From T1 Left Join
T2
on T1.ID = T2.MatchID
Group By t1.ID, t1.Person1;

Select rows from table such that sum of computed values of their column is less than given limit

I have a table myTable of the following structure:
id: int PRIMARY KEY
number: int
I'd like to randomly select 3 rows at most from myTable under the conditions:
Three rows at most should be selected
We should select id and a value calculated as 0,3*RAND()*number converted to INT. The alias of the computed column is randomValue
Only rows with randomValue>0 should be included in the result
Sum of randomValues should be less than given treshold, say 60.
So far, I've written this:
SELECT TOP 3 id,randomValue
FROM(
SELECT id, CONVERT(INT,(0.3*RAND()*number)) AS randomValue
FROM myTable
WHERE number>0
) AS D
WHERE randomValue>0
ORDER BY NEWID()
The code above selects at most 3 random rows where randomValue is greater than zero. However, I don't know how to fulfill condition 4, i.e. how to achieve that sum of randomValues in selected rows is less than 60.
This is myTable where I'm testing the solutions:
WITH
random_values AS (
SELECT
id,
CONVERT(INT,(0.3*RAND()*number)) AS randomValue
FROM myTable
WHERE number>0
),
valid_sets AS (
SELECT
t1.id id1,
t2.id id2,
t3.id id3
FROM random_values t1
INNER JOIN random_values t2 ON (t2.id > t1.id)
INNER JOIN random_values t3 ON (t3.id > t2.id)
WHERE t1.randomValue + t2.randomValue + t3.randomValue < 60
)
SELECT c.id,c.number
FROM (SELECT TOP 1 * FROM valid_sets ORDER BY NEWID()) a
UNPIVOT(id FOR n IN (id1,id2,id3)) b
INNER JOIN myTable c ON (b.id = c.id)