Postgresql update column based on set of values from another table - sql

Dummy data to illustrate my problem:
create table table1 (category_id int,unit varchar,is_valid bool);
insert into table1 (category_id, unit, is_valid)
VALUES (1, 'a', true), (2, 'z', true);
create table table2 (category_id int,unit varchar);
insert into table2 (category_id, unit)
values(1, 'a'),(1, 'b'),(1, 'c'),(2, 'd'),(2, 'e');
So the data looks like:
Table 1:
category_id
unit
is_valid
1
a
true
2
z
true
Table 2:
category_id
unit
1
a
1
b
1
c
2
d
2
e
I want to update the is_valid column in Table 1, if the category_id/unit combination from Table 1 doesn't match any of the rows in Table 2. For example, the first row in Table 1 is valid, since (1, a) is in Table 2. However, the second row in Table 1 is not valid, since (2, z) is not in Table 2.
How can I update the column using postgresql? I tried a few different where clauses of the form
UPDATE table1 SET is_valid = false WHERE...
but I cannot get a WHERE clause that works how I want.

You can just set the value of is_valid the the result of a ` where exists (select ...). See Demo.
update table1 t1
set is_valid = exists (select null
from table2 t2
where (t2.category_id, t2.unit) = (t1.category_id, t1.unit)
);
NOTES:
Advantage: Query correctly sets the is_valid column regardless of the current value and is a vary simple query.
Disadvantage: Query sets the value of is_valid for every row in the table; even thoes already correctly set.
You need to decide whether the disadvantage out ways the advantage. If so then the same basic technique in a much more complicated query:
with to_valid (category_id, unit, is_valid) as
(select category_id
, unit
, exists (select null
from table2 t2
where (t2.category_id, t2.unit) = (t1.category_id, t1.unit)
)
from table1 t1
)
update table1 tu
set is_valid = to_valid.is_valid
from to_valid
where (tu.category_id, tu.unit) = (to_valid.category_id, to_valid.unit)
and tu.is_valid is distinct from to_valid.is_valid;

Related

How to correct my Snowflake Unique Constraint SQL statement?

I have a table that looks like:
ID|CREATED |VALUE
1 |1649122158|200
1 |1649122158|200
1 |1649122158|200
That I'd like to look like:
ID|CREATED |VALUE
1 |1649122158|200
And I run the following query:
DELETE FROM MY_TABLE T USING (SELECT ID,CREATED,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CREATED DESC) AS RANK_IN_KEY FROM MY_TABLE T) X WHERE X.RANK_IN_KEY <> 1 AND T.ID = X.ID AND T.CREATED = X.CREATED
But it removes everything from MY_TABLE and not just other rows with the same value. This is more than just selecting distinct records, I'd like to enforce a unique constraint to get the latest value of ID and keep just one record for it, even if there were duplicates.
So
ID|CREATED |VALUE
1 |1649122158|200
1 |1649122159|300
2 |1649122158|200
2 |1649122158|200
3 |1649122170|500
3 |1649122160|200
Would become (using the same final unique constraint statement):
ID|CREATED |VALUE
1 |1649122159|300
2 |1649122158|200
3 |1649122170|500
How can I improve my logic to properly handle these unique constraint modifications?
Check out this post: https://community.snowflake.com/s/question/0D50Z00008EJgemSAD/how-to-delete-duplicate-records-
If all columns make up a unique records, the recommended solution is the insert all the records into a new table with SELECT DISTINCT * and do a swap. You could also do a INSERT OVERWRITE INTO the same table.
Something like INSERT OVERWRITE INTO tableA SELECT DISTINCT * FROM tableA;
The following setup should leave rows with id of 1 and 3. And not delete all rows as you say.
Schema
create table t (
id int,
created int ,
value int
);
insert into t values(1, 1649122158, 200);
insert into t values(1 ,1649122159, 300);
insert into t values(2 ,1649122158, 200);
insert into t values(2 ,1649122158, 200);
insert into t values(3 ,1649122170, 500);
insert into t values(3 ,1649122160, 200);
Delete statement
with x as (
SELECT
id, created,
row_number() over(partition by id) as r
FROM t
)
delete from t
using x
where x.id = t.id and x.r <> 1 and x.created = t.created
;
Output
select * from t;
1 1649122158 200
3 1649122170 500
The logic is such, that the table in the using clause is joined with the operated on table. Following the join logic, it just matches by some key. In your case, you have key as {id,created}. This key is duplicated for rows with id of 2. So the whole group is deleted.
I'm no savvy in database schemas. But as a thought, you may add a row with a rank to existing table. And after that you can proceed with deletion. This way you do not need to create other table and insert values to that. Be warned that data may become fragmented(physically, on disks). So you will need to run some kind of tune up later.
Update
You may find this almost one-liner interesting:
SO answer
I will duplicate code here, as it is so small and well written.
WITH
u AS (SELECT DISTINCT * FROM your_table),
x AS (DELETE FROM your_table)
INSERT INTO your_table SELECT * FROM u;

Update each row with a new entity

I have two tables:
CREATE TABLE public.test
(
id integer NOT NULL GENERATED ALWAYS AS IDENTITY
)
and
CREATE TABLE public.test2
(
id integer,
test_id integer
)
Table test2 has two rows (1, null) and (2, null). Table test has nothing. Now I want to fill test_id by creating new rows in test. I nead a new entity each time so that I will have (1, 1), (2, 2), etc. I try to prepare update query with an insert statement but I don't understand how to do it. This is what I try:
update t2 set t2.test_id = t.id
from test2 t2 full join (INSERT INTO test(id) VALUES (default) RETURNING id) t on t2.test_id = t.id
but I get the following:
ERROR: syntax error at or near "INTO"
LINE 2: from test2 t2 full join (INSERT INTO test(id) VALUES (defaul...
^
SQL state: 42601
Character: 65
Can I create the query I want somehow?
Having just one column in the target table makes things a little tricky. It might be simpler to generate and assign the new ids first, using next_val, and then insert the values in test (we need option overriding system value to insert into a generated always colum,)
with t2 as (
update test2
set test_id = nextval('test_id_seq')
where test_id is null
returning test_id
)
insert into test(id) overriding system value
select test_id from t2
Demo on DB Fiddlde

Oracle -- Update the exact column referenced in the ON clause

I think this requirement is rarely encountered so I couldn't search for similar questions.
I have a table that needs to update the ID. For example ID 123 in table1 is actually supposed to be 456. I have a separate reference table built that stores the mapping (e.g. old 123 maps to new id 456).
I used the below query but apparently it returned error 38104, columns referenced in the ON clause cannot be updated.
MERGE INTO table1
USING ref_table ON (table1.ID = ref_table.ID_Old)
WHEN MATCHED THEN UPDATE SET table.ID = ref_table.ID_New;
Is there other way to achieve my purpose?
Thanks and much appreciated for your answer!
Use the ROWID pseudocolumn:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TABLE1( ID ) AS
SELECT 1 FROM DUAL UNION ALL
SELECT 2 FROM DUAL UNION ALL
SELECT 3 FROM DUAL;
CREATE TABLE REF_TABLE( ID_OLD, ID_NEW ) AS
SELECT 1, 4 FROM DUAL UNION ALL
SELECT 2, 5 FROM DUAL;
MERGE INTO TABLE1 dst
USING ( SELECT t.ROWID AS rid,
r.id_new
FROM TABLE1 t
INNER JOIN REF_TABLE r
ON ( t.id = r.id_old ) ) src
ON ( dst.ROWID = src.RID )
WHEN MATCHED THEN
UPDATE SET id = src.id_new;
Query 1:
SELECT * FROM table1
Results:
| ID |
|----|
| 4 |
| 5 |
| 3 |
You can't update a column used in the ON clause in a MERGE. But if you don't need to make other changes that MERGE allows like WHEN NOT MATCHED or deleting, etc. you can just use a UPDATE to achieve this.
You mentioned this is an ID that needs an update. Here's an example using a scalar subquery. As it is an ID, this presumes UNIQUE ID_OLD values in REF_TABLE. I wasn't sure if Every row needs an update or only a sub-set, so set the update here to only update rows that have a value in REF_TABLE.
CREATE TABLE TABLE1(
ID NUMBER
);
CREATE TABLE REF_TABLE(
ID_OLD NUMBER,
ID_NEW NUMBER
);
INSERT INTO TABLE1 VALUES (1);
INSERT INTO TABLE1 VALUES (2);
INSERT INTO TABLE1 VALUES (100);
INSERT INTO REF_TABLE VALUES (1,10);
INSERT INTO REF_TABLE VALUES (2,20);
Initial State:
SELECT * FROM TABLE1;
ID
1
2
100
Then make the UPDATE
UPDATE TABLE1
SET TABLE1.ID = (SELECT REF_TABLE.ID_NEW
FROM REF_TABLE
WHERE REF_TABLE.ID_OLD = ID)
WHERE TABLE1.ID IN (SELECT REF_TABLE.ID_OLD
FROM REF_TABLE);
2 rows updated.
And check the change:
SELECT * FROM TABLE1;
ID
10
20
100

Quick comparison of two columns in other TABLE

How to quickly find values ​​in a column that does not contain another column in another table
The problem is the speed of the query that is dynamically built in "execute immediate "stmt and average size of test tables test_table = 40mln and test_table2 = 1mln
Unfortunately I've been able to find similar topics and i will be grateful for any help
My queries:
select pole2 from test_table tt
where exists( select 1 from test_table2 tt2
where tt2.pole1 = 'ABC'
and tt.pole2 != tt.pole2)
select pole2 from test_table tt
where pole2 not in ( select pole2 from test_table2 tt2
where tt2.pole1 = 'ABC')

In postgresql, how can I fill in missing values within a column?

I'm trying to figure out how to fill in values that are missing from one column with the non-missing values from other rows that have the same value on a given column. For instance, in the below example, I'd want all the "1" values to be equal to Bob and all of the "2" values to be equal to John
ID # | Name
-------|-----
1 | Bob
1 | (null)
1 | (null)
2 | John
2 | (null)
2 | (null)
`
EDIT: One caveat is that I'm using postgresql 8.4 with Greenplum and so correlated subqueries are not supported.
CREATE TABLE bobjohn
( ID INTEGER NOT NULL
, zname varchar
);
INSERT INTO bobjohn(id, zname) VALUES
(1,'Bob') ,(1, NULL) ,(1, NULL)
,(2,'John') ,(2, NULL) ,(2, NULL)
;
UPDATE bobjohn dst
SET zname = src.zname
FROM bobjohn src
WHERE dst.id = src.id
AND dst.zname IS NULL
AND src.zname IS NOT NULL
;
SELECT * FROM bobjohn;
NOTE: this query will fail if more than one name exists for a given Id. (and it won't touch records for which no non-null name exists)
If you are on a postgres version >-9, you could use a CTE to fetch the source tuples (this is equivalent to a subquery, but is easier to write and read (IMHO). The CTE also tackles the duplicate values-problem (in a rather crude way):
--
-- CTE's dont work in update queries for Postgres version below 9
--
WITH uniq AS (
SELECT DISTINCT id
-- if there are more than one names for a given Id: pick the lowest
, min(zname) as zname
FROM bobjohn
WHERE zname IS NOT NULL
GROUP BY id
)
UPDATE bobjohn dst
SET zname = src.zname
FROM uniq src
WHERE dst.id = src.id
AND dst.zname IS NULL
;
SELECT * FROM bobjohn;
UPDATE tbl
SET name = x.name
FROM (
SELECT DISTINCT ON (id) id, name
FROM tbl
WHERE name IS NOT NULL
ORDER BY id, name
) x
WHERE x.id = tbl.id
AND tbl.name IS NULL;
DISTINCT ON does the job alone. Not need for additional aggregation.
In case of multiple values for name, the alphabetically first one (according to the current locale) is picked - that's what the ORDER BY id, name is for. If name is unambiguous you can omit that line.
Also, if there is at least one non-null value per id, you can omit WHERE name IS NOT NULL.
If you know for a fact that there are no conflicting values (multiple rows with the same ID but different, non-null names) then something like this will update the table appropriately:
UPDATE some_table AS t1
SET name = (
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
)
WHERE name IS NULL;
If you only want to query the table and have this information filled in on the fly, you can use a similar query:
SELECT
t1.id,
(
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
) AS name
FROM some_table AS t1;