I'd like to know better ways of checking if a set of values is a subset of another.
For some reason, I can't make IN work so I use something like this approach:
-- check if `table1.id` is in other tables
SELECT (
-- check if all `table1.id` is in table2's `table1_id`
ARRAY(SELECT id FROM table1) <@ ARRAY(SELECT table1_id FROM table2)
AND
-- check if all `table1.id` is in table3's `table1_id`
ARRAY(SELECT id FROM table1) <@ ARRAY(SELECT table1_id FROM table3)
-- ...and so on
)
So for example, if I have these two rows in table1:
+----+
| id |
+----+
| 1 |
| 2 |
+----+
And these two rows on table2:
+----+-----------+
| id | table1_id |
+----+-----------+
| 1 | 1 |
| 2 | 2 |
+----+-----------+
And this one row at table3:
+----+-----------+
| id | table1_id |
+----+-----------+
| 1 | 2 |
+----+-----------+
The result would be false because table3 does not contain both table1_id values 1 and 2.
But, if table3 is like below:
+----+-----------+
| id | table1_id |
+----+-----------+
| 1 | 2 |
| 2 | 1 |
+----+-----------+
It would return true.
Is my approach already good? If I use IN correctly, would it be faster? Are there some other ways that I am totally missing?
You can just use inner joins and count the results:
with table1_count as (
select count(*) as count
FROM table1
),
all_table_count as (
select count(*) as count
from (
select distinct table1.id from table1 -- distinct, so duplicate table1_id values do not inflate the count
join table2 on table1.id = table2.table1_id
join table3 on table1.id = table3.table1_id
) sub
)
select table1_count.count = all_table_count.count as ids_everywhere
from all_table_count,table1_count
;
ids_everywhere
----------------
f
(1 row)
Joining will usually be faster than building and comparing arrays, especially if the table1_id columns are indexed.
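If you want to try the subset check without a Postgres instance, here is a minimal sketch using Python's built-in sqlite3 as a stand-in. It uses EXCEPT instead of the join-and-count above (that is my substitution, not the query from this answer), and the helper name ids_everywhere is made up for illustration.

```python
import sqlite3

def ids_everywhere(table3_rows):
    """Return True if every table1.id appears in table2 and in table3."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (id INTEGER PRIMARY KEY);
        CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER);
        CREATE TABLE table3 (id INTEGER PRIMARY KEY, table1_id INTEGER);
        INSERT INTO table1 (id) VALUES (1), (2);
        INSERT INTO table2 (id, table1_id) VALUES (1, 1), (2, 2);
    """)
    con.executemany(
        "INSERT INTO table3 (id, table1_id) VALUES (?, ?)", table3_rows
    )
    # A set is a subset of another when "subset EXCEPT superset" is empty.
    (result,) = con.execute("""
        SELECT NOT EXISTS (SELECT id FROM table1
                           EXCEPT SELECT table1_id FROM table2)
           AND NOT EXISTS (SELECT id FROM table1
                           EXCEPT SELECT table1_id FROM table3)
    """).fetchone()
    con.close()
    return bool(result)

print(ids_everywhere([(1, 2)]))          # table3 from the first example: False
print(ids_everywhere([(1, 2), (2, 1)]))  # table3 from the second example: True
```

EXCEPT works on sets, so duplicate table1_id values in table2 or table3 do not change the answer.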
Use exists:
select t1.*
from Table1 t1
where exists (select 1 from table2 t2 where t2.table1_id = t1.id)
and exists (select 1 from table3 t3 where t3.table1_id = t1.id)
and exists (select 1 from table4 t4 where t4.table1_id = t1.id)
You can also use exists in a case expression:
select t1.id,
case
when exists (select 1 from table2 t2 where t2.table1_id = t1.id)
and exists (select 1 from table3 t3 where t3.table1_id = t1.id)
and exists (select 1 from table4 t4 where t4.table1_id = t1.id)
then 1
else 0
end
from Table1 t1
Or list each table separately:
select t1.id,
case
when exists (select 1 from table2 t2 where t2.table1_id = t1.id)
then 1 else 0
end as in_tab2,
case
when exists (select 1 from table3 t3 where t3.table1_id = t1.id)
then 1 else 0
end as in_tab3,
case
when exists (select 1 from table4 t4 where t4.table1_id = t1.id)
then 1 else 0
end as in_tab4
from table1 t1
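To see what the per-table flags look like on the question's sample data, here is a small sketch run through Python's sqlite3 (an assumption on my part; the question is about Postgres). It only covers table2 and table3, since the question gives no rows for table4.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER);
    CREATE TABLE table3 (id INTEGER PRIMARY KEY, table1_id INTEGER);
    INSERT INTO table1 (id) VALUES (1), (2);
    INSERT INTO table2 (id, table1_id) VALUES (1, 1), (2, 2);
    INSERT INTO table3 (id, table1_id) VALUES (1, 2);
""")

# One 0/1 flag per referencing table, one row per table1.id.
flags = con.execute("""
    SELECT t1.id,
           CASE WHEN EXISTS (SELECT 1 FROM table2 t2 WHERE t2.table1_id = t1.id)
                THEN 1 ELSE 0 END AS in_tab2,
           CASE WHEN EXISTS (SELECT 1 FROM table3 t3 WHERE t3.table1_id = t1.id)
                THEN 1 ELSE 0 END AS in_tab3
    FROM table1 t1
    ORDER BY t1.id
""").fetchall()
print(flags)  # [(1, 1, 0), (2, 1, 1)]

# The overall subset check is just "every flag is 1".
all_present = all(in2 and in3 for _, in2, in3 in flags)
print(all_present)  # False, because id 1 is missing from table3
```

The separate flags are handy when you need to know which table is missing which id, not just whether the check passes.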
I want to SELECT one record from table1 (WHERE t1.id = 1) and then JOIN table2 and table3 (t2.field2 and t3.field3) to table1 but ONLY if the values exist (IS NOT NULL).
So for example, if the value doesn't exist for t3.field3, the field3 column is not displayed for that table...
t1
id | field1
---------------
1 | f1val
2 | f1val
3 | f1val
t2
id(fk) | field2
-------------------
1 | f2val
2 | null
3 | null
t3
id(fk) | field3
-------------------
1 | null
2 | f3val
3 | f3val
the code I tried to do is this:
SELECT t1.id, t2.field2, t3.field3
FROM (
SELECT t1.id
FROM t1
WHERE t1.id = 1
)
LEFT JOIN t2 ON t2.id = t1.id AND t2.id is not null
LEFT JOIN t3 ON t2.id = t1.id AND t3.id is not null;
The joined table returned from the query above looks like this:
id | field2 | field3
----------------------------
1 | f2val | null
However, since field3 is null, I want it to return only the id and field2 like this:
id | field2
----------------
1 | f2val
Your help will be highly appreciated.
You could return one column, using coalesce():
SELECT t1.id, COALESCE(t2.field2, t3.field3) as field_2_3
FROM t1 LEFT JOIN
t2
ON t2.id = t1.id LEFT JOIN
t3
ON t3.id = t1.id
WHERE t1.id = 1;
However, a query cannot return two columns in some cases and three in others; the column list is fixed when the query is written.
Notes:
The subquery on t1 is utterly unnecessary. You can just apply the filter in a single WHERE clause.
The IS NOT NULL comparisons are unnecessary because a NULL id fails the JOIN condition anyway.
The last JOIN condition is presumably on t3.id = t1.id.
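Here is a quick way to check the COALESCE idea against the sample tables, sketched with Python's sqlite3 (my choice of harness; the question does not name a database). It assumes the column in t2 is field2, as in the sample data, and runs the query for all three ids rather than only id 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, field1 TEXT);
    CREATE TABLE t2 (id INTEGER, field2 TEXT);
    CREATE TABLE t3 (id INTEGER, field3 TEXT);
    INSERT INTO t1 VALUES (1, 'f1val'), (2, 'f1val'), (3, 'f1val');
    INSERT INTO t2 VALUES (1, 'f2val'), (2, NULL), (3, NULL);
    INSERT INTO t3 VALUES (1, NULL), (2, 'f3val'), (3, 'f3val');
""")

# COALESCE returns the first non-NULL argument, so each row gets
# whichever of field2 / field3 is actually populated.
rows = con.execute("""
    SELECT t1.id, COALESCE(t2.field2, t3.field3) AS field_2_3
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.id
    LEFT JOIN t3 ON t3.id = t1.id
    ORDER BY t1.id
""").fetchall()
print(rows)  # [(1, 'f2val'), (2, 'f3val'), (3, 'f3val')]
```

The result always has exactly two columns; only the source of the second one varies per row.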
I'm trying to find a simple solution for my SQL Server problem.
I have two tables that look like this:
table1
--id
-- data
table2
--id
--table1_id
--value
I have some records like this:
Table1
+-----------------------+
| id | data |
+-----------------------+
| 1 | ? |
+-----------------------+
| 2 | ? |
+-----------------------+
Table2
+-----------------------+
|id | table1_id | value |
+-----------------------+
| 1 | 1 | 'a' |
+-----------------------+
| 2 | 1 | 'b' |
+-----------------------+
| 3 | 2 | 'a' |
+-----------------------+
Now I want to get table1 with all its additional values where the relation to table2 has 'a' AND 'b' as values.
So I would get the id 1 of table1.
Currently I have a query like this:
SELECT t1.[id], t1.[data]
FROM [table1] t1,
(SELECT t1.[id]
FROM [table1] t1
JOIN [table2] t2 ON t1.[id] = t2.[table1_id] AND t2.[Value] IN('a', 'b')
GROUP BY t1.[id]
HAVING COUNT(t2.[Value]) = 2) x
WHERE t1.id = x.id
Does anyone have an idea how to achieve this in a simpler way?
One way uses exists:
select t1.*
from table1 t1
where exists (select 1
from table2 t2
where t2.table1_id = t1.id and t2.value = 'a'
) and
exists (select 1
from table2 t2
where t2.table1_id = t1.id and t2.value = 'b'
);
This can take advantage of an index on table2(table1_id, value).
You could also write:
select t1.*
from table1 t1
where (select count(distinct t2.value)
from table2 t2
where t2.table1_id = t1.id and t2.value in ('a', 'b')
) = 2 ;
This would probably also have very good performance with the index, if table2 doesn't have duplicates.
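Both variants can be tried out with a small sketch like the one below. It uses Python's sqlite3 rather than SQL Server (an assumption, to keep it self-contained), and it adds one extra duplicate row (id 4) that is not in the question, to show why counting distinct values matters.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, value TEXT);
    INSERT INTO table1 VALUES (1, '?'), (2, '?');
    INSERT INTO table2 VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'a');
    -- Hypothetical duplicate, not part of the question's data:
    INSERT INTO table2 VALUES (4, 2, 'a');
""")

def ids(sql):
    """Run a query that returns table1 ids and flatten the result."""
    return [row[0] for row in con.execute(sql)]

exists_ids = ids("""
    SELECT t1.id FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table2 t2
                  WHERE t2.table1_id = t1.id AND t2.value = 'a')
      AND EXISTS (SELECT 1 FROM table2 t2
                  WHERE t2.table1_id = t1.id AND t2.value = 'b')
    ORDER BY t1.id
""")

distinct_ids = ids("""
    SELECT t1.id FROM table1 t1
    WHERE (SELECT COUNT(DISTINCT t2.value) FROM table2 t2
           WHERE t2.table1_id = t1.id AND t2.value IN ('a', 'b')) = 2
    ORDER BY t1.id
""")

# Plain COUNT, as in the question: the duplicate 'a' makes id 2 look complete.
plain_count_ids = ids("""
    SELECT t1.id FROM table1 t1
    JOIN table2 t2 ON t1.id = t2.table1_id AND t2.value IN ('a', 'b')
    GROUP BY t1.id
    HAVING COUNT(t2.value) = 2
    ORDER BY t1.id
""")

print(exists_ids)       # [1]
print(distinct_ids)     # [1]
print(plain_count_ids)  # [1, 2]
```

So if table2 can contain the same value twice for one table1_id, prefer EXISTS or COUNT(DISTINCT ...) over a plain COUNT.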
SELECT T1.[id], T1.[data]
FROM table1 AS T1
JOIN table2 AS T2
ON T1.[id]=T2.[table1_id]
JOIN table2 AS T3
ON T1.[id]=T3.[table1_id]
WHERE
T2.[Value] ='a'
AND T3.[Value] = 'b'
As Gordon Linoff suggested, the exists clause works as well and may perform better, depending on your data.
You need several steps to solve the problem:
First, find the table2 rows that relate to table1 and have the value 'a' or 'b', and eliminate duplicates with GROUP BY (InfoRelationate).
Then, keep only the ids that have both 'a' and 'b', using a count over the previous result (ValidateAYB).
Finally, join the ids that pass the check back to table1.
This query meets those conditions:
with InfoRelationate as
(
select Table2.table1_id,value
from Table2 inner join
Table1 on Table2.table1_id=Table1.id and Table2.value IN('a', 'b')
group by Table2.table1_id,value
),
ValidateAYB as
(
select InfoRelationate.table1_id
from InfoRelationate
group by InfoRelationate.table1_id
having count (1)=2
)
select Table1.id,Table1.data
from Table1
inner join ValidateAYB on Table1.id=ValidateAYB.table1_id
So I want to update the action column to the value 'Insert' inside Table1, if the ids from Table1 and Table2 match but the UIDs don't.
Right now my query looks like
UPDATE Table1
SET Action = 'Insert'
FROM Table1
JOIN Table2 ON Table1.id = Table2.id
AND Table1.UID <> Table2.UID
This is setting the action to Insert even if the UIDs don't differ, can someone help me and explain why this is behaving this way?
My assumption is you have something like this:
Table1
id | UID | action
1 | 1 | bla
1 | 2 | bleck
1 | 3 | floop
Table2
id | UID | action
1 | 1 | bla
1 | 2 | bleck
1 | 4 | floop
And you hope to update the third row in Table1 because the UID isn't in Table2.
The problem is that the third row in Table2 matches all rows in Table1 on your condition: Table1.id = Table2.id AND Table1.UID <> Table2.UID
Which means that in this case, all rows in Table1 will be updated with Action = 'Insert'
I think you want to use NOT EXISTS():
UPDATE T1
SET Action = 'Insert'
FROM Table1 T1
WHERE NOT EXISTS (SELECT *
FROM Table2 T2
WHERE T1.id = T2.id
AND T1.UID = T2.UID)
Edit, more explanation on why the join fails:
This is a many to many join, meaning that the condition allows multiple rows from Table1 to match multiple rows from Table2
The easiest way to see this in action is to change your update to a select:
SELECT *
FROM Table1 T1
JOIN Table2 T2 on T1.id = T2.id
and T1.UID <> T2.UID
You may expect this to result in:
id | UID | action id | UID | action
1 | 3 | floop 1 | 4 | floop
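But really the result contains these rows (along with every other pairing whose UIDs differ, such as Table1 UID 1 with Table2 UID 2):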
id | UID | action id | UID | action
1 | 1 | bla 1 | 4 | floop
1 | 2 | bleck 1 | 4 | floop
1 | 3 | floop 1 | 4 | floop
This means that when you update you are hitting all the rows for id = 1 in Table1
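The difference between the two conditions is easy to reproduce. Below is a minimal sketch using Python's sqlite3 (an assumption; the question does not say which database it uses) with the rows assumed in this answer. It uses SELECTs instead of UPDATEs so it only shows which Table1 rows each condition would touch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (id INTEGER, UID INTEGER);
    CREATE TABLE Table2 (id INTEGER, UID INTEGER);
    INSERT INTO Table1 VALUES (1, 1), (1, 2), (1, 3);
    INSERT INTO Table2 VALUES (1, 1), (1, 2), (1, 4);
""")

# Join on "same id, different UID": every Table1 row finds at least
# one Table2 row with a different UID, so every row qualifies.
join_uids = [row[0] for row in con.execute("""
    SELECT DISTINCT T1.UID
    FROM Table1 T1
    JOIN Table2 T2 ON T1.id = T2.id AND T1.UID <> T2.UID
    ORDER BY T1.UID
""")]

# NOT EXISTS: only rows with no exact (id, UID) match in Table2 qualify.
not_exists_uids = [row[0] for row in con.execute("""
    SELECT T1.UID
    FROM Table1 T1
    WHERE NOT EXISTS (SELECT 1 FROM Table2 T2
                      WHERE T1.id = T2.id AND T1.UID = T2.UID)
    ORDER BY T1.UID
""")]

print(join_uids)        # [1, 2, 3]
print(not_exists_uids)  # [3]
```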
If you put condition Table1.UID <> Table2.UID into WHERE clause, doesn't it solve your problem?
UPDATE Table1
SET Action = 'Insert'
FROM Table1
JOIN Table2 ON Table1.id = Table2.id
WHERE Table1.UID <> Table2.UID
I need to modify the following code to search for groups where one surv is null and the other is not. Currently, the query returns groups where both survs are populated. I am looking for all groups where the surv of one record A does not match the id of the other record B, but only in cases where the surv in record B is null.
SELECT *
FROM MY_TABLE t3
WHERE t3.GROUP_id IN (
SELECT t1.GROUP_id
FROM MY_TABLE t1, MY_TABLE t2
WHERE t1.id <> t2.id
AND t1.GROUP_id = t2.GROUP_id
AND t1.id <> t2.surv
AND t2.id <> t1.surv
);
This is returning differences where both survs are populated. What am I missing?
edit:
----------------------
| group | id | surv |
----------------------
| 1     | 1  | null |
| 1     | 2  | 1    |
| 2     | 3  | 107  |
| 2     | 4  | null |
| 3     | 5  | 89   |
| 3     | 6  | 89   |
----------------------
return
----------------------
| group | id | surv |
----------------------
| 2     | 3  | 107  |
| 2     | 4  | null |
----------------------
reason:
group 1 has id 1 matches to surv of the second record; as such we do not want it returned.
group 2, id 3 has a surv that does not match the ID of the other record. Along with this, the second surv field is null. This is what we need returned.
group 3, both have a surv of not null. These are not needed.
edit 2: I eventually came up with this query:
SELECT cluster_id, oidmu, survoid
FROM MY_TABLE t3
WHERE t3.GROUP_id IN (
SELECT t1.GROUP_id
FROM MY_TABLE t1, MY_TABLE t2
WHERE t1.ID <> t2.ID
AND t1.GROUP_id = t2.GROUP_id
AND (t1.ID <> t2.SURV and t1.SURV is null)
);
Add and t2.surv is null to your query, and drop the t1.id <> t2.surv test: a comparison with a null surv is never true, so keeping both conditions would return no rows.
SELECT *
FROM MY_TABLE t3
WHERE t3.GROUP_id IN (
SELECT t1.GROUP_id
FROM MY_TABLE t1, MY_TABLE t2
WHERE t1.id <> t2.id
AND t1.GROUP_id = t2.GROUP_id
AND t2.id <> t1.surv
and t2.surv is null
);
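A NULL on either side of <> makes the comparison unknown rather than true, which matters for how the null test is combined with the other conditions. The sketch below checks this on the sample data using Python's sqlite3 (an assumption; the question does not name its database). The table and column names are lower-cased versions of the ones in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE my_table (group_id INTEGER, id INTEGER PRIMARY KEY, surv INTEGER);
    INSERT INTO my_table VALUES
        (1, 1, NULL), (1, 2, 1),
        (2, 3, 107),  (2, 4, NULL),
        (3, 5, 89),   (3, 6, 89);
""")

def matching_groups(extra_condition):
    """Self-join within a group and return the group ids that qualify."""
    sql = f"""
        SELECT DISTINCT t1.group_id
        FROM my_table t1
        JOIN my_table t2
          ON t1.group_id = t2.group_id AND t1.id <> t2.id
        WHERE t2.surv IS NULL
          AND t2.id <> t1.surv
          {extra_condition}
        ORDER BY t1.group_id
    """
    return [row[0] for row in con.execute(sql)]

# Keeping "t1.id <> t2.surv" next to "t2.surv IS NULL" can never match.
print(matching_groups("AND t1.id <> t2.surv"))  # []

# Without it, only the group from the expected output remains.
print(matching_groups(""))                      # [2]
```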
If you just want the groups, perhaps an aggregation will do:
SELECT t.GROUP_ID
FROM MY_TABLE t
GROUP BY GROUP_ID
HAVING COUNT(surv) > 0 AND -- at least one is not null
COUNT(surv) < COUNT(*); -- at least one is null
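Here is the aggregation run against the sample rows, as a sketch in Python's sqlite3 (an assumption; any database should treat COUNT(column) the same way, since it skips NULLs). Note that it only tests for a mix of null and non-null survs, so on this data it also reports group 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE my_table (group_id INTEGER, id INTEGER PRIMARY KEY, surv INTEGER);
    INSERT INTO my_table VALUES
        (1, 1, NULL), (1, 2, 1),
        (2, 3, 107),  (2, 4, NULL),
        (3, 5, 89),   (3, 6, 89);
""")

# COUNT(surv) ignores NULLs, COUNT(*) does not, so the two counts
# differ exactly when a group contains at least one NULL surv.
mixed_groups = [row[0] for row in con.execute("""
    SELECT group_id
    FROM my_table
    GROUP BY group_id
    HAVING COUNT(surv) > 0
       AND COUNT(surv) < COUNT(*)
    ORDER BY group_id
""")]
print(mixed_groups)  # [1, 2]
```

If group 1 must be excluded because its surv points at the other record's id, the aggregation alone is not enough and you still need a comparison between the two rows.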
Actually, even if you do need the original rows, you could do this with analytic functions:
SELECT t.*
FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY GROUP_ID) as cnt,
COUNT(surv) OVER (PARTITION BY GROUP_ID) as cnt_surv
FROM MY_TABLE t
) t
WHERE cnt_surv > 0 and cnt_surv < cnt
Let's say I have a table1:
id name
-------------
1 "one"
2 "two"
3 "three"
And a table2 with a foreign key to the first:
id tbl1_fk option value
-------------------------------
1 1 1 1
2 2 1 1
3 1 2 1
4 3 2 1
Now I want to have as a query result:
table1.id | table1.name | option | value
-------------------------------------
1 "one" 1 1
2 "two" 1 1
3 "three"
1 "one" 2 1
2 "two"
3 "three" 2 1
How do I achieve that?
I already tried:
SELECT
table1.id,
table1.name,
table2.option,
table2.value
FROM table1 AS table1
LEFT outer JOIN table2 AS table2 ON table1.id = table2.tbl1_fk
but the result seems to omit the null values:
1 "one" 1 1
2 "two" 1 1
1 "one" 2 1
3 "three" 2 1
SOLVED: thanks to Mahmoud Gamal: (plus the GROUP BY)
Solved with this query
SELECT
t1.id,
t1.name,
t2.option,
t2.value
FROM
(
SELECT t1.id, t1.name, t2.option
FROM table1 AS t1
CROSS JOIN table2 AS t2
) AS t1
LEFT JOIN table2 AS t2 ON t1.id = t2.tbl1_fk
AND t1.option = t2.option
group by t1.id, t1.name, t2.option, t2.value
ORDER BY t1.id, t1.name
You have to use CROSS JOIN to get every possible combination of name from the first table with option from the second table, then LEFT JOIN these combinations with the second table. Something like:
SELECT
t1.id,
t1.name,
t2.option,
t2.value
FROM
(
SELECT t1.id, t1.name, t2.option
FROM table1 AS t1
CROSS JOIN table2 AS t2
) AS t1
LEFT JOIN table2 AS t2 ON t1.id = t2.tbl1_fk
AND t1.option = t2.option
Simple version: option = group
It's not specified in the Q, but it seems like option is supposed to define a group somehow. In this case, the query can simply be:
SELECT t1.id, t1.name, t2.option, t2.value
FROM (SELECT generate_series(1, max(option)) AS option FROM table2) o
CROSS JOIN table1 t1
LEFT JOIN table2 t2 ON t2.option = o.option AND t2.tbl1_fk = t1.id
ORDER BY o.option, t1.id;
Or, if options are not numbered in sequence, starting with 1:
...
FROM (SELECT DISTINCT option FROM table2) o
...
Returns:
id | name | option | value
----+-------+--------+-------
1 | one | 1 | 1
2 | two | 1 | 1
3 | three | |
1 | one | 2 | 1
2 | two | |
3 | three | 2 | 1
Faster and cleaner, avoiding the big CROSS JOIN and the big GROUP BY.
You get distinct rows with a group number (grp) per set.
Requires Postgres 8.4+.
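The simple version can be tried without Postgres. The sketch below uses Python's sqlite3 (an assumption on my part) and therefore the SELECT DISTINCT variant, since generate_series() is not available in a default SQLite build. The column names option and value are quoted only as a precaution.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, tbl1_fk INTEGER,
                         "option" INTEGER, "value" INTEGER);
    INSERT INTO table1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO table2 VALUES (1, 1, 1, 1), (2, 2, 1, 1),
                              (3, 1, 2, 1), (4, 3, 2, 1);
""")

# One row per (option, table1 row); table2 columns stay NULL when
# that table1 row has no entry for the option.
rows = con.execute("""
    SELECT t1.id, t1.name, t2."option", t2."value"
    FROM (SELECT DISTINCT "option" FROM table2) o
    CROSS JOIN table1 t1
    LEFT JOIN table2 t2
           ON t2."option" = o."option" AND t2.tbl1_fk = t1.id
    ORDER BY o."option", t1.id
""").fetchall()

for row in rows:
    print(row)
# (1, 'one', 1, 1)
# (2, 'two', 1, 1)
# (3, 'three', None, None)
# (1, 'one', 2, 1)
# (2, 'two', None, None)
# (3, 'three', 2, 1)
```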
More complex: group indicated by sequence of rows
WITH t2 AS (
SELECT *, count(step OR NULL) OVER (ORDER BY id) AS grp
FROM (
SELECT *, lag(tbl1_fk, 1, 2147483647) OVER (ORDER BY id) >= tbl1_fk AS step
FROM table2
) x
)
SELECT g.grp, t1.id, t1.name, t2.option, t2.value
FROM (SELECT generate_series(1, max(grp)) AS grp FROM t2) g
CROSS JOIN table1 t1
LEFT JOIN t2 ON t2.grp = g.grp AND t2.tbl1_fk = t1.id
ORDER BY g.grp, t1.id;
Result:
grp | id | name | option | value
-----+----+-------+--------+-------
1 | 1 | one | 1 | 1
1 | 2 | two | 1 | 1
1 | 3 | three | |
2 | 1 | one | 2 | 1
2 | 2 | two | |
2 | 3 | three | 2 | 1
How?
Explaining the complex version ...
Every set starts with a tbl1_fk <= the previous one. I check for this with the window function lag(). To cover the corner case of the first row (no preceding row), I provide the biggest possible integer, 2147483647, as the default for lag().
With count() as aggregate window function I add the running count to each row, effectively forming the group number grp.
I could get a single instance for every group with:
(SELECT DISTINCT grp FROM t2) g
But it's faster to just get the maximum and employ the nifty generate_series() for the reduced CROSS JOIN.
This CROSS JOIN produces exactly the rows we need without any surplus. Avoids the need for a later GROUP BY.
LEFT JOIN t2 to that, using grp in addition to tbl1_fk to make it distinct.
Sort any way you like - which is possible now with a group number.
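To watch the group numbers being formed, here is a sketch of just the lag() and running count() steps, run through Python's sqlite3 instead of Postgres (an assumption; it needs SQLite 3.25 or later for window functions). The table is cut down to the two columns the step uses.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, tbl1_fk INTEGER);
    INSERT INTO table2 VALUES (1, 1), (2, 2), (3, 1), (4, 3);
""")

# step is true when tbl1_fk did not increase compared to the previous
# row, i.e. when a new set starts. "step OR NULL" turns false into
# NULL, so the running count() only counts the rows that start a set.
groups = con.execute("""
    SELECT id, tbl1_fk,
           count(step OR NULL) OVER (ORDER BY id) AS grp
    FROM (
        SELECT id, tbl1_fk,
               lag(tbl1_fk, 1, 2147483647) OVER (ORDER BY id) >= tbl1_fk AS step
        FROM table2
    ) x
    ORDER BY id
""").fetchall()
print(groups)  # [(1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 3, 2)]
```

Rows 1 and 2 land in group 1 and rows 3 and 4 in group 2, which matches the grp column in the result above.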
try this
SELECT
table1.id, table1.name, table2.option, table2.value FROM table1 AS table1
JOIN table2 AS table2 ON table1.id = table2.tbl1_fk
This is enough:
select * from table1 left join table2 on table1.id=table2.tbl1_fk ;