array_agg contains another array_agg - sql

t1
id | entity_type
---+------------
 9 | 3
 9 | 4
 9 | 5
 2 | 3
 2 | 5

t2
id | entity_type
---+------------
 1 | 3
 1 | 4
 1 | 5
SELECT t1.id, array_agg(t1.entity_type)
FROM t1
GROUP BY
t1.id
HAVING ARRAY_AGG(t1.entity_type by t1.entity_type) =
(SELECT ARRAY_AGG(t2.entity_type by t2.entity_type)
FROM t2
WHERE t2.id = 1
GROUP BY t2.id);
Result:
t1.id = 9 | array_agg = {3,4,5}
I have two tables, t1 and t2. I want to get the values of t1.id where the t1.entity_type array equals the t2.entity_type array.
In this scenario everything works fine. For t2.id = 1 I receive t1.id = 9.
Both have the same array of entity_type: {3,4,5}
Now I'd like to get t1.id not only for equal sets, but also for smaller sets.
If I modify t2 this way:
t2
id | entity_type
---+------------
 1 | 3
 1 | 4
and modify query this way:
SELECT t1.id, array_agg(t1.entity_type)
FROM t1
GROUP BY
t1.id
HAVING ARRAY_AGG(t1.entity_type by t1.entity_type) >= /*MODIFICATION*/
(SELECT ARRAY_AGG(t2.entity_type by t2.entity_type)
FROM t2
WHERE t2.id = 1
GROUP BY t2.id);
I don't receive the expected result:
t1.id = 9 has {3, 4, 5}
t2.id = 1 has {3, 4}
Arrays in t1 that contain the t2 array should qualify, so I expect the same result as in the first case, but I get no rows.
Is there any method like: ARRAY_AGG contains another ARRAY_AGG?

Clean up
First of all, there is a syntax error. I assume you mean:
ARRAY_AGG(t1.entity_type ORDER BY t1.entity_type)
Details in the manual.
Next, it would be inefficient to use two different invocations of array_agg(). Use the same expression (with ORDER BY) in the SELECT list and in the HAVING clause:
SELECT id, array_agg(entity_type ORDER BY entity_type) AS arr
FROM t1
GROUP BY 1
HAVING array_agg(entity_type ORDER BY entity_type) = (
SELECT array_agg(entity_type ORDER BY entity_type)
FROM t2
WHERE id = 1
-- GROUP BY id -- not needed
);
"contains" operator #>
Like Nick commented, your 2nd query would work with the "contains" operator #>
SELECT id, array_agg(entity_type ORDER BY entity_type) AS arr
FROM t1
GROUP BY 1
HAVING array_agg(entity_type ORDER BY entity_type) @> (
SELECT array_agg(entity_type ORDER BY entity_type)
FROM t2
WHERE id = 1
);
But this is very inefficient for big tables.
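For reference, a minimal illustration of the operator's semantics on plain array literals (PostgreSQL; values taken from the sample data):
SELECT ARRAY[3,4,5] @> ARRAY[3,4];   -- true:  the left array contains every element of the right one
SELECT ARRAY[3,4]   @> ARRAY[3,5];   -- false: 5 is missing on the left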
Faster query
This is a case of relational division. Depending on your (missing) exact table definition, there are more efficient techniques. We have gathered a whole arsenal under this related question:
How to filter SQL results in a has-many-through relation
Assuming (id, entity_type) is unique in both tables, this should be substantially faster for big tables, especially because it can use an index on t1 (as opposed to your original query):
SELECT t1.id
FROM t2
JOIN t1 USING (entity_type)
WHERE t2.id = 1
GROUP BY 1
HAVING count(*) = (SELECT count(*) FROM t2 WHERE id = 1);
You need two indexes:
First on t2.id, typically covered by the primary key.
Second on t1.entity_type:
CREATE INDEX t1_foo_idx ON t1 (entity_type, id);
The added id column is optional to allow index-only scans. Sequence of columns is essential:
Is a composite index also good for queries on the first field?
SQL Fiddle.
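For a quick check, here is a self-contained sketch with the sample data from the question (temp tables; int columns are an assumption):
CREATE TEMP TABLE t1 (id int, entity_type int);
CREATE TEMP TABLE t2 (id int, entity_type int);
INSERT INTO t1 VALUES (9,3), (9,4), (9,5), (2,3), (2,5);
INSERT INTO t2 VALUES (1,3), (1,4);   -- the "smaller set" case

SELECT t1.id
FROM t2
JOIN t1 USING (entity_type)
WHERE t2.id = 1
GROUP BY 1
HAVING count(*) = (SELECT count(*) FROM t2 WHERE id = 1);
-- returns id = 9, whose set {3,4,5} contains t2's {3,4}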

Related

SQL: Joining data from table with multiple conditions where value matches

Table1 has columns POS, ID, NAME, TYPE
I have been running a standard query as so;
SELECT POS, NAME FROM Table1 WHERE ID = 100 AND TYPE LIKE '%someType%' ORDER BY POS ASC
Table2 has columns POS, ID, VALUE, ROLE
SELECT POS, VALUE FROM Table2 WHERE ID = 100 AND ROLE LIKE '%someType%' ORDER BY POS ASC
I would like to combine these two queries so the result is a recordset with 3 columns: POS, Table1.NAME, and Table2.VALUE. No matter what I try with inner joins, I keep getting far more rows than I should. Also, if the corresponding value in Table2 does not exist, I would like it to return a null or something, so that a recordset could essentially look like this:
POS NAME VALUE
1 A DF1
2 B DF1
3 C DF2
4 C null
5 null DF3
etc...
Is this possible at all?
You seem to be looking for a simple join. Something like:
select t1.pos, t1.name, t2.value
from table1 t1
inner join table2 t2
on t2.pos = t1.pos
and t2.id = t1.id
and t2.role like '%someType%'
where
t1.type like '%someType%'
and t1.id = 100
order by t1.pos
Please note that, if one or both of your original queries return more than one row per POS, you will get a Cartesian product of those rows in the result set (for instance, two matching rows in table1 and three in table2 for the same POS yield six combined rows).
Or if you want to allow rows coming from both tables, even if there is no match in the other table, then you can full outer join both queries, like:
select coalesce(t1.pos, t2.pos) pos, t1.name, t2.value
from (
select pos, name from table1 where id = 100 and type like '%someType%'
) t1
full outer join (
select pos, value from table2 where id = 100 and role like '%someType%'
) t2
on t1.pos = t2.pos
order by coalesce(t1.pos, t2.pos)
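A quick sketch of that behaviour with throwaway rows shaped like the expected recordset above (the ID/TYPE filters are left out because the toy data has no such columns; SELECT without FROM as used here works in SQL Server and PostgreSQL):
with t1 (pos, name) as (
select 1, 'A' union all select 2, 'B' union all
select 3, 'C' union all select 4, 'C'
),
t2 (pos, value) as (
select 1, 'DF1' union all select 2, 'DF1' union all
select 3, 'DF2' union all select 5, 'DF3'
)
select coalesce(t1.pos, t2.pos) pos, t1.name, t2.value
from t1
full outer join t2 on t1.pos = t2.pos
order by 1;
-- pos 4 comes back with a null VALUE, pos 5 with a null NAME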

Getting common value count of text array column in Postgres

I have a table which looks like this:
id num
--- ----
1 {'1','2','3','3'}
2 {'2','3'}
3 {'5','6','7'}
Here id is a unique column and num is a text array which can contain duplicate elements . I want to do something like an intersection between two consecutive rows so that I get the count of common elements between num of two rows. Consider something like a set where duplicates are considered only once. For example, for the above table I am expecting something like the following:
id1 id2 count
--- --- -----
1 2 2
1 3 0
2 1 2
2 3 0
3 1 0
3 2 0
It is not necessary to get the output like the above. The only part I am concerned about is count.
I have the following query which gives the output only for one id compared with one other id:
select unnest(num) from emp where id=1
intersect
select unnest(num) from emp where id=2
How can I generalize it to get the required output?
A straightforward approach puts the intersection of the unnested arrays in a subquery and gets their count.
SELECT t1.id id1,
t2.id id2,
(SELECT count(*)
FROM (SELECT num1.num
FROM unnest(t1.num) num1(num)
INTERSECT
SELECT num2.num
FROM unnest(t2.num) num2(num)) x) count
FROM emp t1
INNER JOIN emp t2
ON t2.id > t1.id
ORDER BY t1.id,
t2.id;
If you are only interested in whether the arrays share any elements at all, and not in the exact count, you can also use the overlap operator &&.
SELECT t1.id id1,
t2.id id2,
t1.num && t2.num intersection_not_empty
FROM emp t1
INNER JOIN emp t2
ON t2.id > t1.id
ORDER BY t1.id,
t2.id;
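For reference, the overlap operator only checks for at least one common element (PostgreSQL; text arrays as in the question):
SELECT ARRAY['1','2','3'] && ARRAY['2','5'];   -- true:  '2' is shared
SELECT ARRAY['1','2']     && ARRAY['5','6'];   -- false: no common element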
For example data like the following (different, illustrative columns), this works:
with t as (
select v.*
from (values (1000, array['acct', 'hr']), (1005, array['dev', 'hr'])) v(empid, depts)
)
select t1.empid, t2.empid,
(select count(distinct d1)
from unnest(t1.depts) d1 join
unnest(t2.depts) d2
on d1 = d2
) cnt
from t t1 join
t t2
on t1.empid < t2.empid;
I'm not 100% sure this is what you intend, though.

SQL - get max result

Assume there is a table named "test" below:
name value
n1 1
n2 2
n3 3
Now, I want to get the name which has the max value. I have some solutions below:
Solution 1:
SELECT TOP 1 name
FROM test
ORDER BY value DESC
solution 2:
SELECT name
FROM test
WHERE value = (SELECT MAX(value) FROM test);
Now, I would like to use a join operation to find the result, like:
SELECT name
FROM test
INNER JOIN test ON...
Could someone please help and explain how it works?
If you are looking for a JOIN, then:
SELECT T.name, T.value
FROM test T
INNER JOIN
( SELECT T1.name, T1.value ,
RANK() OVER (PARTITION BY T1.name ORDER BY T1.value) N
FROM test T1
WHERE T1.value IN (SELECT MAX(t2.value) FROM test T2)
)T3 ON T3.N = 1 AND T.name = T3.name
FIDDLE DEMO
or
select name, value
from
(
select name, value,
row_number() over(order by value desc) rn
from test
) src
where rn = 1
FIDDLE DEMO
First, note that solutions 1 and 2 could give different results when value is not unique. If your test data contained an additional record ('n4', 3), solution 1 would return either 'n3' or 'n4', but solution 2 would return both.
A solution with JOIN will need aliases for the table, because as you started it, the engine would complain with Ambiguous column name 'name': it would not know whether to take name from the first or the second occurrence of the test table.
Here is a way to complete the JOIN version:
SELECT t1.name
FROM test t1
LEFT JOIN test t2
ON t2.value > t1.value
WHERE t2.value IS NULL;
This query takes each of the records and checks whether any record exists with a higher value. If not, that record ends up in the result. Note the use of LEFT: it denotes an outer join, so records from t1 that have no match in t2 (based on the ON condition) are not immediately rejected, as they would be with INNER. In fact, we want to reject all the other records, which is what the WHERE clause does.
A way to understand this mechanism is to look at a variant of the query above, which lacks the WHERE clause and returns the values of both tables:
SELECT t1.value, t2.value
FROM test t1
LEFT JOIN test t2
ON t2.value > t1.value
On your test data this will return:
t1.value t2.value
1 2
1 3
2 3
3 (null)
Note that the last entry would not be there if the join were an INNER JOIN. But with the outer join, one can now look for the NULL values and get exactly those records in the result that an INNER JOIN would exclude.
Note that this query will give the same result as solution 2 when there are duplicate values. If you also want only one result, as with solution 1, it suffices to add TOP 1 after SELECT.
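Spelled out, that TOP 1 variant is simply:
SELECT TOP 1 t1.name
FROM test t1
LEFT JOIN test t2
ON t2.value > t1.value
WHERE t2.value IS NULL;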
Here is a fiddle.
Alternative with pure INNER JOIN
If you really want an INNER join, then this will do it. Again the TOP 1 is only needed if you have non-unique values:
SELECT TOP 1 t1.name
FROM test t1
INNER JOIN (SELECT Max(value) AS value FROM test) t2
ON t2.value = t1.value;
But this one really is very similar to what you did in solution 2. Here is a fiddle for it.

SQL - remove duplicates from left join

I'm creating a joined view of two tables, but am getting unwanted duplicates from table2.
For example: table1 has 9000 records and I need the resulting view to contain exactly that many; table2 may have multiple records with the same FKID, but I only want to return one of them (a randomly chosen one is OK with my customer). I have the following code that works correctly, but performance is slower than desired (over 14 seconds).
SELECT
OBJECTID
, PKID
,(SELECT TOP (1) SUBDIVISIO
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS ProjectName
,(SELECT TOP (1) ASBUILT1
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS Asbuilt
FROM dbo.table1 AS t1
Is there a way to do something similar with joins to speed up performance?
I'm using SQL Server 2008 R2.
I got close with the following code (~0.5 seconds), but DISTINCT only filters out records when all columns are duplicated (rather than just the FKID).
SELECT
t1.OBJECTID
,t1.PKID
,t2.ProjectName
,t2.Asbuilt
FROM dbo.table1 AS t1
LEFT JOIN (SELECT
DISTINCT FKID
,ProjectName
,Asbuilt
FROM dbo.table2) t2
ON t1.PKID = t2.FKID
table examples
table1         table2
OID, PKID      FKID, ProjectName, Asbuilt
1, id1         id1, P1, AB1
2, id2         id1, P5, AB5
3, id4         id2, P10, AB2
5, id5         id5, P4, AB4
In the above example, the returned records should be id5/P4/AB4, id2/P10/AB2, and (id1/P1/AB1 OR id1/P5/AB5).
My search came up with similar questions, but none that resolved my problem. link, link
Thanks in advance for your help. This is my first post so let me know if I've broken any rules.
This will give the results you requested and should have the best performance.
SELECT
OBJECTID
, PKID
, t2.SUBDIVISIO
, t2.ASBUILT1
FROM dbo.table1 AS t1
OUTER APPLY (
SELECT TOP 1 *
FROM dbo.table2 AS t2
WHERE t1.PKID = t2.FKID
) AS t2
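If the picked row needs to be deterministic rather than arbitrary, an ORDER BY can go inside the APPLY (the sort column below is just an assumption; use whichever column should decide which duplicate wins):
SELECT
t1.OBJECTID
, t1.PKID
, t2.SUBDIVISIO
, t2.ASBUILT1
FROM dbo.table1 AS t1
OUTER APPLY (
SELECT TOP 1 t2.SUBDIVISIO, t2.ASBUILT1
FROM dbo.table2 AS t2
WHERE t1.PKID = t2.FKID
ORDER BY t2.SUBDIVISIO   -- assumed tie-breaker
) AS t2;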
Your original query is producing arbitrary values for the two columns (because of TOP with no ORDER BY). You can get the same effect with this:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, min(ProjectName) as ProjectName, MIN(asBuilt) as AsBuilt
FROM dbo.table2
group by fkid
) t2
ON t1.PKID = t2.FKID
This version replaces the distinct with a group by.
To get a truly random row in SQL Server (which your syntax suggests you are using), try this:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, ProjectName, AsBuilt,
ROW_NUMBER() over (PARTITION by fkid order by newid()) as seqnum
FROM dbo.table2
) t2
ON t1.PKID = t2.FKID and t2.seqnum = 1
This assumes version 2005 or greater.
If you want the described result, you need to use an INNER JOIN, and the following query will satisfy your need:
SELECT
t1.OID,
t1.PKID,
MAX(t2.ProjectName) AS ProjectName,
MAX(t2.Asbuilt) AS Asbuilt
FROM table1 t1
JOIN table2 t2 ON t1.PKID = t2.FKID
GROUP BY
t1.OID,
t1.PKID
If you want to see all rows from the left table (table1), whether they have a match in the right table or not, then use a LEFT JOIN and the same query will give you the desired result.
EDITED
This construction has good performance, and you don't need to use subqueries.

SQL nested query

I have a table like below
id name dependency
-----------------------
1 xxxx 0
2 yyyy 1
3 zzzz 2
4 aaaaaa 0
5 bbbbbb 4
6 cccccc 5
The list goes on. I want to select a group of rows from this table by giving the name of the 0-dependency row in the WHERE clause, and follow the chain until there are no more dependent rows. (For example, rows 1, 2, 3 form one group, and rows 4, 5, 6 form another group.) Please help.
Since you did not specify a product, I'll go with features available in the SQL specification. In this case, I'm using a common-table expression, which is supported by many database products including SQL Server 2005+ and Oracle (but not MySQL):
With MyDependents As
(
Select id, name, 0 As level
From MyTable
Where dependency = 0
And name = 'some value'
Union All
Select T.id, T.name, D.level + 1
From MyDependents As D
Join MyTable As T
On T.dependency = D.id
)
Select id, name, level
From MyDependents
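A quick way to try this against the sample data from the question (the inline data and the name 'xxxx' are assumptions for the demo; SQL Server syntax):
With MyTable (id, name, dependency) As
(
Select 1, 'xxxx', 0 Union All
Select 2, 'yyyy', 1 Union All
Select 3, 'zzzz', 2 Union All
Select 4, 'aaaaaa', 0 Union All
Select 5, 'bbbbbb', 4 Union All
Select 6, 'cccccc', 5
),
MyDependents As
(
Select id, name, 0 As level
From MyTable
Where dependency = 0
And name = 'xxxx'
Union All
Select T.id, T.name, D.level + 1
From MyDependents As D
Join MyTable As T
On T.dependency = D.id
)
Select id, name, level
From MyDependents;
-- returns rows 1, 2, 3 with level 0, 1, 2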
Another solution, which does not rely on common-table expressions but does assume a maximum depth (in this case two levels below level 0), would be something like:
Select T1.id, T1.name, 0 As level
From MyTable As T1
Where T1.name = 'some value'
Union All
Select T2.id, T2.name, 1
From MyTable As T1
Join MyTable As T2
On T2.Dependency = T1.Id
Where T1.name = 'some value'
Union All
Select T3.id, T3.name, 2
From MyTable As T1
Join MyTable As T2
On T2.Dependency = T1.Id
Join MyTable As T3
On T3.Dependency = T2.Id
Where T1.name = 'some value'
Sounds like you want to recursively query your table, for which you will need a Common Table Expression (CTE)
This MSDN article explains CTEs very well. They are confusing at first but surprisingly easy to implement.
BTW this is obviously only for SQL Server, I'm not sure how you'd achieve that in MySQL.
This is the first thing that came to mind. It can probably be done more directly/succinctly; I'll try to dwell on it a little.
SELECT *
FROM table T1
WHERE T1.id >=
(SELECT T2.id FROM table T2 WHERE T2.name = '---NAME HERE---')
AND T1.id <
(SELECT MIN(T3.id)
FROM table T3
WHERE T3.dependency = 0
AND T3.id > (SELECT T2.id FROM table T2 WHERE T2.name = '---NAME HERE---'))
If you can estimate a max depth, this works out to something like:
SELECT
COALESCE(t4.field1, t3.field1, t2.field1, t1.field1, t.field1),
COALESCE(t4.field2, t3.field2, t2.field2, t1.field2, t.field2),
COALESCE(t4.field3, t3.field3, t2.field3, t1.field3, t.field3),
....
FROM table AS t
LEFT JOIN table AS t1 ON t.dependency = t1.id
LEFT JOIN table AS t2 ON t1.dependency = t2.id
LEFT JOIN table AS t3 ON t2.dependency = t3.id
LEFT JOIN table AS t4 ON t3.dependency = t4.id
....
This is a wild guess just to be different, but I think it's kind of pretty, anyway. And it's at least as portable as any of the others. But I don't want to look too closely; I'd want to use sensible data, start testing, and check for sensible results.
A hierarchical query (Oracle's CONNECT BY syntax) will do:
SELECT *
FROM your_table
START WITH id = :id_of_group_header_row
CONNECT BY dependency = PRIOR id
The query works like this:
1. Select all rows satisfying the START WITH condition (these rows are now the roots).
2. Select all rows satisfying the CONNECT BY condition; the keyword PRIOR means the column's value is taken from the root (parent) row.
3. Treat the rows selected in step 2 as the new roots.
4. Go back to step 2 until there are no more rows.
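With the sample table from the question and the root row id 1 as the bind value, the traversal looks like this (Oracle; LEVEL is the built-in pseudocolumn):
SELECT id, name, dependency, LEVEL
FROM your_table
START WITH id = 1
CONNECT BY dependency = PRIOR id;

-- id | name | dependency | LEVEL
-- ---+------+------------+------
--  1 | xxxx |          0 |     1
--  2 | yyyy |          1 |     2
--  3 | zzzz |          2 |     3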