Oracle: Update with join not working as expected - sql

Edit: The answer given by mathguy works perfectly, but I really want to understand why the first update statement doesn't work while the second one does. I know the (+) operator is not recommended, but since the second table is in a subquery here, I had to use it.
Short Question. (Detailed explanation and Create/Insert statements below)
What is the difference between these 2 update statements, and why does the first one not work as expected while the second one does?
update d_dim d
set FLAG=
(select case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t
where d.id=t.id(+)
);
And
update d_dim d
set FLAG=
(select case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t,d_dim d1
where d1.id=t.id(+)
and d1.id=d.id
);
Detailed Explanation
I am trying to replicate a workplace scenario. Unfortunately SQLFiddle for Oracle is not working so couldn't create fiddle demo.
I have 2 tables, d_dim(ID,FLAG) and t_temp(ID) like below
select * from d_dim;
+----+------+
| ID | FLAG |
+----+------+
| 1 | |
| 2 | |
| 3 | |
| 4 | |
+----+------+
select * from t_temp;
+----+
| ID |
+----+
| 1 |
| 3 |
+----+
Now I need to set the FLAG in d_dim as Y or N.
If ID exists in t_temp then set it to Y. Else set it to N.
So expected output should be like.
+----+------+
| ID | FLAG |
+----+------+
| 1 | Y |
| 2 | N |
| 3 | Y |
| 4 | N |
+----+------+
This is the update statement I am using. (I use (+) because in this case I need a left join from d_dim to t_temp.)
update d_dim d
set FLAG=
(select case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t
where d.id=t.id(+)
)
--4 rows updated.
But ID 2 and 4 are updated as NULL.
select * from d_dim;
+----+------+
| ID | FLAG |
+----+------+
| 1 | Y |
| 2 | |
| 3 | Y |
| 4 | |
+----+------+
If I use just the select clause after plugging in d_dim table, I get correct output.
select d.id,
case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t,d_dim d
where d.id=t.id(+)
order by id
+----+------+
| ID | FLAG |
+----+------+
| 1 | Y |
| 2 | N |
| 3 | Y |
| 4 | N |
+----+------+
I did some trial and error and came up with this query, which seems to work:
update d_dim d
set FLAG=(select case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t,d_dim d1
where d1.id=t.id(+)
and d1.id=d.id);
select * from d_dim;
+----+------+
| ID | FLAG |
+----+------+
| 1 | Y |
| 2 | N |
| 3 | Y |
| 4 | N |
+----+------+
So my question is:
Why doesn't the initial update statement work properly, and why does it
set the flag to NULL for ids 2 and 4?
Please find the Create and Insert statements below
CREATE TABLE d_dim (id int, flag varchar2(4));
INSERT ALL INTO d_dim (id, flag) VALUES (1, NULL)
INTO d_dim (id, flag) VALUES (2, NULL)
INTO d_dim (id, flag) VALUES (3, NULL)
INTO d_dim (id, flag) VALUES (4, NULL)
SELECT * FROM dual;
CREATE TABLE t_temp (id int)
;
INSERT ALL
INTO t_temp (id) VALUES (1)
INTO t_temp (id) VALUES (3)
SELECT * FROM dual;

In the second query you have an outer join. In the first query you don't have any kind of join; you simply have a select from t_temp with a where clause that has a (+) after t.id. I don't know why that syntax doesn't raise an error, but when d.id doesn't exist in t_temp, the subquery returns no rows. That is how update works when the new value comes from a scalar subquery: if the subquery returns no rows, the column is set to NULL.
You didn't ask for a different way to make the update work, but if you want to see one, here it is. No doubt you know how to do this; offering it for the benefit of other forum members.
update d_dim
set FLAG = case when id in (select id from t_temp) then 'Y' else 'N' end;
EDIT: It seems the OP didn't fully understand my point so here are more details.
The Oracle documentation states explicitly:
• The (+) operator does not produce an outer join if you specify one table in the outer query and the other table in an inner query.
https://docs.oracle.com/cd/B28359_01/server.111/b28286/queries006.htm
(under the heading "Outer Joins", after the "See Also" box)
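The no-rows-means-NULL behaviour can be reproduced outside Oracle. What follows is a minimal sketch in Python with the standard-library sqlite3 module, not the original Oracle setup. SQLite has no (+) operator, so the first update is written as a plain correlated scalar subquery, which, going by the documentation quoted above, is how the first statement behaves. Table and column names follow the question; the variable names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table d_dim (id integer, flag text);
    create table t_temp (id integer);
    insert into d_dim (id) values (1), (2), (3), (4);
    insert into t_temp (id) values (1), (3);
""")

# Correlated scalar subquery (SQLite stand-in for the first update).
# For ids 2 and 4 the subquery returns no rows, so the column is set
# to NULL; the CASE expression is never evaluated for them.
conn.execute("""
    update d_dim
    set flag = (select case when t.id is null then 'N' else 'Y' end
                from t_temp t
                where t.id = d_dim.id)
""")
after_subquery = conn.execute(
    "select id, flag from d_dim order by id").fetchall()

# CASE ... IN is evaluated once per d_dim row, so every row gets Y or N.
conn.execute("""
    update d_dim
    set flag = case when id in (select id from t_temp) then 'Y' else 'N' end
""")
after_case = conn.execute(
    "select id, flag from d_dim order by id").fetchall()

print(after_subquery)
print(after_case)
```

The first result has NULL (None in Python) for ids 2 and 4, the second has 'N' for them.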

An update statement without a where clause touches every row; what differs per row is the value returned by the subselect.
When you mention
update d_dim d
set FLAG=
(select case when t.id is null then 'N' else 'Y' end as FLAG
from t_temp t
where d.id=t.id(+)
);
The subselect returns a row only when d.id is equal to t.id. In your case, for id=2, there is no row in t_temp with id 2, so the subselect returns no rows and FLAG is set to NULL for id 2 instead of 'N'.

Related

Comparing different columns in SQL for each row

After some transformation I have the result of a cross join (of tables a and b) that I want to analyse. The table looks like this:
+-----+------+------+------+------+-----+------+------+------+------+
| id | 10_1 | 10_2 | 11_1 | 11_2 | id | 10_1 | 10_2 | 11_1 | 11_2 |
+-----+------+------+------+------+-----+------+------+------+------+
| 111 | 1 | 0 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
| 111 | 1 | 0 | 1 | 0 | 333 | 0 | 0 | 0 | 0 |
| 111 | 1 | 0 | 1 | 0 | 444 | 1 | 0 | 1 | 1 |
| 112 | 0 | 1 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
+-----+------+------+------+------+-----+------+------+------+------+
The ids in the first column are different from the ids in the sixth column.
In a row are always two different IDs that are matched with each other. The other columns always have either 0 or 1 as a value.
I am now trying to find out how many values two IDs have in common on average (meaning both have "1" in 10_1, 10_2 etc.), but I don't really know how to do so.
I was trying something like this as a start:
SELECT SUM(CASE WHEN a.10_1 = 1 AND b.10_1 = 1 then 1 end)
But this would obviously only count how often two ids have 10_1 in common. I could make something like this for example for different columns:
SELECT SUM(CASE WHEN (a.10_1 = 1 AND b.10_1 = 1)
OR (a.10_2 = 1 AND b.10_2 = 1) OR [...] then 1 end)
This counts in general how often two IDs have one thing in common, but it would of course also count pairs that have two or more things in common. Plus, I would also like to know how often two IDs have two things, three things etc. in common.
One "problem" in my case is also that I have like ~30 columns I want to look at, so I can hardly write down for each case every possible combination.
Does anyone know how I can approach my problem in a better way?
Thanks in advance.
Edit:
A possible result could look like this:
+-----------+---------+
| in_common | count |
+-----------+---------+
| 0 | 100 |
| 1 | 500 |
| 2 | 1500 |
| 3 | 5000 |
| 4 | 3000 |
+-----------+---------+
With the codes as column names, you're going to have to write some code that explicitly references each column name. To keep that to a minimum, you could write those references in a single union statement that normalizes the data, such as:
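select id, '10_1' as code from wide_table where "10_1" = 1 -- wide_table is a placeholder name for your original table
union
select id, '10_2' as code from wide_table where "10_2" = 1
union
select id, '11_1' as code from wide_table where "11_1" = 1
union
select id, '11_2' as code from wide_table where "11_2" = 1;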
This needs to be modified to include whatever additional columns you need to link up different IDs. For the purpose of this illustration, I assume the following data model
create table p (
id integer not null primary key,
sex character(1) not null,
age integer not null
);
create table t1 (
id integer not null,
code character varying(4) not null,
constraint pk_t1 primary key (id, code)
);
Though your data evidently does not currently resemble this structure, normalizing your data into a form like this would allow you to apply the following solution to summarize your data in the desired form.
select
in_common,
count(*) as count
from (
select
count(*) as in_common
from (
select
a.id as a_id, a.code,
b.id as b_id, b.code
from
(select p.*, t1.code
from p left join t1 on p.id=t1.id
) as a
inner join (select p.*, t1.code
from p left join t1 on p.id=t1.id
) as b on b.sex <> a.sex and b.age between a.age-10 and a.age+10
where
a.id < b.id
and a.code = b.code
) as c
group by
a_id, b_id
) as summ
group by
in_common;
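The normalize-then-count idea can be tried on a small scale. The sketch below uses Python's sqlite3 module and a t1(id, code) table filled with the "1" cells from the sample rows in the question (id 333 has no "1" values, so it has no rows). It is a simplification under stated assumptions: the sex/age pairing condition is left out and every pair of ids with a.id < b.id is compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (
        id   integer not null,
        code text    not null,
        primary key (id, code)
    );
    -- one row per (id, code) where the wide table holds a 1
    insert into t1 (id, code) values
        (111, '10_1'), (111, '11_1'),
        (112, '10_2'), (112, '11_1'),
        (222, '10_1'), (222, '11_1'),
        (444, '10_1'), (444, '11_1'), (444, '11_2');
""")

# Inner query: number of shared codes per pair of ids.
# Outer query: how many pairs share 1 code, 2 codes, and so on.
histogram = conn.execute("""
    select in_common, count(*) as pair_count
    from (
        select a.id as a_id, b.id as b_id, count(*) as in_common
        from t1 a
        join t1 b on a.code = b.code and a.id < b.id
        group by a.id, b.id
    ) as per_pair
    group by in_common
    order by in_common
""").fetchall()

print(histogram)
```

Because the pairs are built with an inner join on code, pairs that share nothing produce no row at all, so a bucket for in_common = 0 would have to be computed separately (for example by subtracting from the total number of pairs).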
The proposed solution first takes one step back from the cross-join table, because its duplicated column names are awkward to work with. Instead, we take the ids from the two tables and put them in a temporary table. The following query gets the result wanted in the question. It assumes table_a and table_b from the question are the same table, called tbl, but this assumption is not required: tbl can be replaced by table_a and table_b in the two sub-SELECT queries. It looks complicated and uses a JSON trick to flatten the columns, but it works here:
WITH idtable AS (
SELECT a.id as id_1, b.id as id_2 FROM
-- put cross join of table a and table b here
)
SELECT in_common,
count(*)
FROM
(SELECT idtable.*,
sum(CASE
WHEN meltedR.value::text=meltedL.value::text THEN 1
ELSE 0
END) AS in_common
FROM idtable
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_a
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedL ON (idtable.id_1 = meltedL.id)
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_b
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedR ON (idtable.id_2 = meltedR.id
AND meltedL.key = meltedR.key)
GROUP BY idtable.id_1,
idtable.id_2) tt
GROUP BY in_common ORDER BY in_common;
The output here looks like this:
in_common | count
-----------+-------
2 | 2
3 | 1
4 | 1
(3 rows)

SQL Select a group when attributes match at least a list of values

Given a table with a (non-distinct) identifier and a value:
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |
How can you select the grouped identifiers, which have values for a given list? (e.g. ('B', 'C'))
This list might also be the result of another query (like SELECT Value from Table1 WHERE ID = '2' to find all IDs which have a superset of values, compared to ID=2 (only ID=1 in this example))
Result
| ID |
|----|
| 1 |
| 2 |
1 and 2 are part of the result, as they have both B and C in their Value column. 3 is not included, as it is missing C.
Thanks to the answer from this question: SQL Select only rows where exact multiple relationships exist, I created a query which works for a fixed list. However, I need to be able to use the results of another query without changing the query text. (It also requires the Access-specific IIF function):
SELECT ID FROM Table1
GROUP BY ID
HAVING SUM(Value NOT IN ('A', 'B')) = 0
AND SUM(IIF(Value='A', 1, 0)) = 1
AND SUM(IIF(Value='B', 1, 0)) = 1
In case it matters: The SQL is run on a Excel-table via VBA and ADODB.
In the where clause, filter on the list of values you would like to see, group by id, and in the having clause keep only those ids that have 3 matching rows (one per value in the list).
select id from table1
where value in ('A', 'B', 'C') --you can use a result of another query here
group by id
having count(*)=3
If you can have the same id - value pair more than once, then you need to slightly alter the having clause: having count(distinct value)=3
If you want to make it completely dynamic based on a subquery, then:
select id, min(valcount) as minvalcount from table1
cross join (select count(*) as valcount from table1 where id=2) as t1
where value in (select value from table1 where id=2) --you can use a result of another query here
group by id
having count(*)=minvalcount
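Both variants can be checked against the sample table from the question. The sketch below uses Python's sqlite3 module rather than Access/ADODB, so it only illustrates the filter, group and count pattern; the variable names are assumptions. It uses count(distinct value) so that duplicate id/value pairs would not inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (id integer, value text);
    insert into table1 (id, value) values
        (1, 'A'), (1, 'B'), (1, 'C'), (1, 'D'),
        (2, 'A'), (2, 'B'), (2, 'C'),
        (3, 'A'), (3, 'B');
""")

# Fixed list: keep ids that have every value in `wanted`.
wanted = ["B", "C"]
placeholders = ", ".join("?" for _ in wanted)
fixed_list_ids = conn.execute(
    f"""
    select id from table1
    where value in ({placeholders})
    group by id
    having count(distinct value) = ?
    order by id
    """,
    (*wanted, len(wanted)),
).fetchall()

# Dynamic list: keep ids whose values are a superset of the values of id 2.
# Note that id 2 itself is part of the result.
superset_ids = conn.execute("""
    select id from table1
    where value in (select value from table1 where id = 2)
    group by id
    having count(distinct value) =
           (select count(distinct value) from table1 where id = 2)
    order by id
""").fetchall()

print(fixed_list_ids)
print(superset_ids)
```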

Filling null values with last not null one => huge number of columns

I want to copy rows from SOURCE_TABLE to TARGET_TABLE while filling null values with last not null one.
What I have :
SOURCE_TABLE :
A | B | C | PARENT | SEQ_NUMBER
1 | 2 | NULL | 1 | 1
NULL | NULL | 1 | 1 | 2
NULL | 3 | 2 | 1 | 3
TARGET_TABLE is empty
What I want :
TARGET_TABLE :
A | B | C | PARENT | SEQ_NUMBER
1 | 2 | NULL | 1 | 1
1 | 2 | 1 | 1 | 2
1 | 3 | 2 | 1 | 3
To achieve that I'm dynamically generating the following SQL :
insert into TARGET_TABLE (A, B, C)
select coalesce(A, lag(A ignore nulls) over (partition by parent order by seq_number)) as A,
coalesce(B, lag(B ignore nulls) over (partition by parent order by seq_number)) as B,
coalesce(C, lag(C ignore nulls) over (partition by parent order by seq_number)) as C
from SOURCE_TABLE;
Everything works fine if SOURCE and TARGET tables have a small number of columns. In my case they have 400+ columns (yes this is bad but it is legacy and cannot be changed) and I got the following error :
ORA-01467: sort key too long
First, I don't really understand this error. Is it because I'm using too many lag functions, each of which has its own "order by"/"partition by"? If I replace coalesce(A, lag(A....)) with coalesce(A,A), the error disappears.
Then, is there a workaround or another way to achieve the same result ?
Thx
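Just do it using a PL/SQL anonymous block:
declare
x tgt_table%rowtype; --keep values from last row
empty_row tgt_table%rowtype; --never assigned, so all fields stay NULL
begin
for r in (select * from src_table order by parent, seq_number) loop
--assumes x.parent is filled by one of the assignments below
if x.parent <> r.parent then
x := empty_row; --new parent: do not carry values across groups
end if;
x.a := nvl(r.a, x.a);
...
x.z := nvl(r.z, x.z);
insert into tgt_table
values x;
end loop;
end;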
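The row-by-row idea (remember the last non-null value per column, reset when the parent changes) can be sketched independently of the database. The Python function below is only an illustration of that logic on the sample rows from the question; fill_forward and its parameters are hypothetical names, and rows are modelled as plain dictionaries.

```python
def fill_forward(rows, columns, group_key="PARENT", order_key="SEQ_NUMBER"):
    """Replace None in `columns` with the last non-None value seen
    earlier in the same group, walking each group in order."""
    last_seen = {}
    current_group = object()  # sentinel that equals no real group value
    filled = []
    for row in sorted(rows, key=lambda r: (r[group_key], r[order_key])):
        if row[group_key] != current_group:
            # New group: forget values carried from the previous one.
            current_group = row[group_key]
            last_seen = {}
        new_row = dict(row)
        for col in columns:
            if new_row[col] is None:
                new_row[col] = last_seen.get(col)
            else:
                last_seen[col] = new_row[col]
        filled.append(new_row)
    return filled


source = [
    {"A": 1, "B": 2, "C": None, "PARENT": 1, "SEQ_NUMBER": 1},
    {"A": None, "B": None, "C": 1, "PARENT": 1, "SEQ_NUMBER": 2},
    {"A": None, "B": 3, "C": 2, "PARENT": 1, "SEQ_NUMBER": 3},
]

result = fill_forward(source, columns=["A", "B", "C"])
for row in result:
    print(row)
```

A single pass keeps only one dictionary of carried values per group, so the cost does not depend on how many columns need filling beyond the loop over them.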

Is it possible that LEFT JOIN fails while subquery with NOT IN clause suceeds?

A while ago I posted an answer to this question: PostgreSQL multiple criteria statement.
Task was quite simple - select values from one table if there is no corresponding value in another table. Assuming we have tables like below:
CREATE TABLE first (foo numeric);
CREATE TABLE second (foo numeric);
we would like to get all the values from first.foo which don't occur in second.foo. I've proposed two solutions:
using LEFT JOIN
SELECT first.foo
FROM first
LEFT JOIN second
ON first.foo = second.foo
WHERE second.foo IS NULL;
combining subquery and IN operator:
SELECT first.foo
FROM first
WHERE first.foo NOT IN (
SELECT second.foo FROM second
);
For some reason the first wouldn't work (returned 0 rows) in the context of the OP and it has been bugging me since then. I've tried to reproduce that issue using different versions of PostgreSQL but no luck so far.
Is there any particular reason why the first solution would fail and the second worked as expected? Am I missing something obvious?
Here is sqlfiddle but it seems to work on any available platform.
Edit
As @bma and @MostyMostacho pointed out in the comments, it should rather be the second one that returned no results (sqlfiddle).
As per your sql fiddle, your NOT IN query fails to return results because of the NULL in the second table.
The problem is that NULL means "UNKNOWN" and therefore we cannot say that the following expression is true: 10 not in (5, null).
The reason is what happens when 10 = NULL is evaluated: we get NULL back, not true. So 10 NOT IN (5, NULL) is NULL rather than true, which means that as soon as the NOT IN list contains a NULL, no row will ever pass.
To get the second one to behave the way you expect, you need a relatively convoluted query:
SELECT first.foo
FROM first
WHERE (first.foo IN (
SELECT second.foo FROM second
) IS NOT TRUE);
This will properly handle the NULL comparisons, but the join syntax is probably cleaner.
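The effect of a NULL in the subquery can be seen side by side with the alternatives. The sketch below uses Python's sqlite3 module instead of PostgreSQL (SQLite applies the same three-valued logic to NOT IN); the tables are named first_t and second_t here, an assumption made only to steer clear of keyword clashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table first_t (foo integer);
    create table second_t (foo integer);
    insert into first_t (foo) values (1), (2), (3), (4), (5);
    insert into second_t (foo) values (1), (3), (NULL);
""")

# NOT IN: the NULL in second_t makes the predicate NULL (never true)
# for every non-matching row, so nothing is returned.
not_in_rows = conn.execute("""
    select foo from first_t
    where foo not in (select foo from second_t)
    order by foo
""").fetchall()

# NOT EXISTS only asks whether a matching row exists; NULL never matches.
not_exists_rows = conn.execute("""
    select f.foo from first_t f
    where not exists (select 1 from second_t s where s.foo = f.foo)
    order by f.foo
""").fetchall()

# LEFT JOIN ... IS NULL keeps the rows of first_t that found no partner.
left_join_rows = conn.execute("""
    select f.foo from first_t f
    left join second_t s on f.foo = s.foo
    where s.foo is null
    order by f.foo
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
print(left_join_rows)
```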
"select values from one table if there is no corresponding value in another table." You just answered your own question:
SELECT o.value
FROM table_one o
WHERE NOT EXISTS (
SELECT *
FROM table_two t
WHERE t.value = o.value
);
A short demonstration:
CREATE TABLE first (foo numeric);
CREATE TABLE second (foo numeric);
INSERT INTO first VALUES (1);
INSERT INTO first VALUES (2);
INSERT INTO first VALUES (3);
INSERT INTO first VALUES (4);
INSERT INTO first VALUES (5);
INSERT INTO first VALUES (NULL); -- added this for completeness
INSERT INTO second VALUES (1);
INSERT INTO second VALUES (3);
INSERT INTO second VALUES (NULL);
SELECT f.foo AS ffoo, s.foo AS sfoo
-- these expressions all yield boolean values
, (f.foo = s.foo) AS is_equal
, (f.foo IN (SELECT foo FROM second)) AS is_in
, (f.foo NOT IN (SELECT foo FROM second)) AS is_not_in
, (EXISTS (SELECT * FROM second x WHERE x.foo = f.foo)) AS does_exist
, (NOT EXISTS (SELECT * FROM second x WHERE x.foo = f.foo)) AS does_not_exist
, (EXISTS (SELECT * FROM first x LEFT JOIN second y ON x.foo = y.foo
WHERE x.foo = f.foo AND y.foo IS NULL))
AS left_join_is_null
FROM first f
FULL JOIN second s ON (f.foo = s.foo AND (f.foo IS NOT NULL OR s.foo IS NOT NULL) )
;
Result:
CREATE TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
ffoo | sfoo | is_equal | is_in | is_not_in | does_exist | does_not_exist | left_join_is_null
------+------+----------+-------+-----------+------------+----------------+-------------------
1 | 1 | t | t | f | t | f | f
2 | | | | | f | t | t
3 | 3 | t | t | f | t | f | f
4 | | | | | f | t | t
5 | | | | | f | t | t
| | | | | f | t | f
| | | | | f | t | f
(7 rows)
As you can see, the boolean can be NULL for the IN() and equals cases.
It cannot be NULL for the EXISTS() case. To be or not to be.
The LEFT JOIN ... WHERE s.foo IS NULL is (almost) equivalent to the NOT EXISTS case, except that it actually includes second.* into the query results (which is not needed, in most cases)

Distinct Values Ignoring Column Order

I have a table similar to:-
+----+---+---+
| Id | A | B |
+----+---+---+
| 1 | 1 | 2 |
+----+---+---+
| 2 | 2 | 1 |
+----+---+---+
| 3 | 3 | 4 |
+----+---+---+
| 4 | 0 | 5 |
+----+---+---+
| 5 | 5 | 0 |
+----+---+---+
I want to remove all duplicate pairs of values, regardless of which column contains which value, e.g. after whatever the query might be I want to see:-
+----+---+---+
| Id | A | B |
+----+---+---+
| 1 | 1 | 2 |
+----+---+---+
| 3 | 3 | 4 |
+----+---+---+
| 4 | 0 | 5 |
+----+---+---+
I'd like to find a solution in Microsoft SQL Server (has to work in <= 2005, though I'd be interested in any solutions which rely upon >= 2008 features regardless).
In addition, note that A and B are going to be in the range 1-100 (but that's not guaranteed forever. They are surrogate seeded integer foreign keys, however the foreign table might grow to a couple hundred rows max).
I'm wondering whether I'm missing some obvious solution here. The ones which have occurred all seem rather overwrought, though I do think they'd probably work, e.g.:-
Have a subquery return a bitfield with each bit corresponding to one of the ids and use this value to remove duplicates.
Somehow, pivot, remove duplicates, then unpivot. Likely to be tricky.
Thanks in advance!
Test data and sample below.
Basically, we do a self join with an OR criteria so either a=a and b=b OR a=b and b=a.
The WHERE in the subquery gives you the max for each pair to eliminate.
I think this should work for triplicates as well (note I added a 6th row).
DECLARE @t table(id int, a int, b int)
INSERT INTO @t
VALUES
(1,1,2),
(2,2,1),
(3,3,4),
(4,0,5),
(5,5,0),
(6,5,0)
SELECT *
FROM @t
WHERE id NOT IN (
SELECT a.id
FROM @t a
INNER JOIN @t b
ON (a.a=b.a
AND a.b=b.b)
OR
(a.b=b.a
AND a.a = b.b)
WHERE a.id > b.id)
Try:
select min(Id) Id, A, B
from (select Id, A, B from DuplicatesTable where A <= B
union all
select Id, B A, A B from DuplicatesTable where A > B) v
group by A, B
order by 1
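The idea behind this query is to put each pair into a canonical order (smaller value first) and then group on the ordered pair. A minimal sketch of the same technique, run through Python's sqlite3 module instead of SQL Server, is shown below; it uses CASE expressions rather than the UNION ALL, and the column aliases lo and hi are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table DuplicatesTable (Id integer, A integer, B integer);
    insert into DuplicatesTable (Id, A, B) values
        (1, 1, 2), (2, 2, 1), (3, 3, 4), (4, 0, 5), (5, 5, 0);
""")

# Order each pair so that (1,2) and (2,1) become the same (lo, hi) key,
# then keep the smallest Id per key.
deduplicated = conn.execute("""
    select min(Id) as Id, lo, hi
    from (
        select Id,
               case when A <= B then A else B end as lo,
               case when A <= B then B else A end as hi
        from DuplicatesTable
    ) as ordered_pairs
    group by lo, hi
    order by Id
""").fetchall()

print(deduplicated)
```

Note that the returned pair is the ordered one, which for the sample data happens to coincide with the A and B values of the surviving rows.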
Not 100% tested and I'm sure it can be tidied up but it produces your required result:
DECLARE @T TABLE (id INT IDENTITY(1,1), A INT, B INT)
INSERT INTO @T
VALUES (1,2), (2,1), (3,4), (0,5), (5,0);
SELECT *
FROM @T
WHERE id IN (SELECT DISTINCT MIN(id)
FROM (SELECT id, a, b
FROM @T
UNION ALL
SELECT id, b, a
FROM @T) z
GROUP BY a, b)