In postgresql, how can I fill in missing values within a column? - sql

I'm trying to figure out how to fill in values that are missing from one column with the non-missing values from other rows that have the same value on a given column. For instance, in the below example, I'd want all the "1" values to be equal to Bob and all of the "2" values to be equal to John
ID # | Name
-------|-----
1 | Bob
1 | (null)
1 | (null)
2 | John
2 | (null)
2 | (null)
`
EDIT: One caveat is that I'm using postgresql 8.4 with Greenplum and so correlated subqueries are not supported.

CREATE TABLE bobjohn
( ID INTEGER NOT NULL
, zname varchar
);
INSERT INTO bobjohn(id, zname) VALUES
(1,'Bob') ,(1, NULL) ,(1, NULL)
,(2,'John') ,(2, NULL) ,(2, NULL)
;
UPDATE bobjohn dst
SET zname = src.zname
FROM bobjohn src
WHERE dst.id = src.id
AND dst.zname IS NULL
AND src.zname IS NOT NULL
;
SELECT * FROM bobjohn;
NOTE: this query will fail if more than one name exists for a given Id. (and it won't touch records for which no non-null name exists)
If you are on a postgres version >-9, you could use a CTE to fetch the source tuples (this is equivalent to a subquery, but is easier to write and read (IMHO). The CTE also tackles the duplicate values-problem (in a rather crude way):
--
-- CTE's dont work in update queries for Postgres version below 9
--
WITH uniq AS (
SELECT DISTINCT id
-- if there are more than one names for a given Id: pick the lowest
, min(zname) as zname
FROM bobjohn
WHERE zname IS NOT NULL
GROUP BY id
)
UPDATE bobjohn dst
SET zname = src.zname
FROM uniq src
WHERE dst.id = src.id
AND dst.zname IS NULL
;
SELECT * FROM bobjohn;

UPDATE tbl
SET name = x.name
FROM (
SELECT DISTINCT ON (id) id, name
FROM tbl
WHERE name IS NOT NULL
ORDER BY id, name
) x
WHERE x.id = tbl.id
AND tbl.name IS NULL;
DISTINCT ON does the job alone. Not need for additional aggregation.
In case of multiple values for name, the alphabetically first one (according to the current locale) is picked - that's what the ORDER BY id, name is for. If name is unambiguous you can omit that line.
Also, if there is at least one non-null value per id, you can omit WHERE name IS NOT NULL.

If you know for a fact that there are no conflicting values (multiple rows with the same ID but different, non-null names) then something like this will update the table appropriately:
UPDATE some_table AS t1
SET name = (
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
)
WHERE name IS NULL;
If you only want to query the table and have this information filled in on the fly, you can use a similar query:
SELECT
t1.id,
(
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
) AS name
FROM some_table AS t1;

Related

Postgresql update column based on set of values from another table

Dummy data to illustrate my problem:
create table table1 (category_id int,unit varchar,is_valid bool);
insert into table1 (category_id, unit, is_valid)
VALUES (1, 'a', true), (2, 'z', true);
create table table2 (category_id int,unit varchar);
insert into table2 (category_id, unit)
values(1, 'a'),(1, 'b'),(1, 'c'),(2, 'd'),(2, 'e');
So the data looks like:
Table 1:
category_id
unit
is_valid
1
a
true
2
z
true
Table 2:
category_id
unit
1
a
1
b
1
c
2
d
2
e
I want to update the is_valid column in Table 1, if the category_id/unit combination from Table 1 doesn't match any of the rows in Table 2. For example, the first row in Table 1 is valid, since (1, a) is in Table 2. However, the second row in Table 1 is not valid, since (2, z) is not in Table 2.
How can I update the column using postgresql? I tried a few different where clauses of the form
UPDATE table1 SET is_valid = false WHERE...
but I cannot get a WHERE clause that works how I want.
You can just set the value of is_valid the the result of a ` where exists (select ...). See Demo.
update table1 t1
set is_valid = exists (select null
from table2 t2
where (t2.category_id, t2.unit) = (t1.category_id, t1.unit)
);
NOTES:
Advantage: Query correctly sets the is_valid column regardless of the current value and is a vary simple query.
Disadvantage: Query sets the value of is_valid for every row in the table; even thoes already correctly set.
You need to decide whether the disadvantage out ways the advantage. If so then the same basic technique in a much more complicated query:
with to_valid (category_id, unit, is_valid) as
(select category_id
, unit
, exists (select null
from table2 t2
where (t2.category_id, t2.unit) = (t1.category_id, t1.unit)
)
from table1 t1
)
update table1 tu
set is_valid = to_valid.is_valid
from to_valid
where (tu.category_id, tu.unit) = (to_valid.category_id, to_valid.unit)
and tu.is_valid is distinct from to_valid.is_valid;

SQL Select Where Opposite Match Does Not Exist

Trying to compare between two columns and check if there are no records that exist with the reversal between those two columns. Other Words looking for instances where 1-> 3 exists but 3->1 does not exist. If 1->2 and 2->1 exists we will still consider 1 to be part of the results.
Table = Betweens
start_id | end_id
1 | 2
2 | 1
1 | 3
1 would be added since it is a start to an end with no opposite present of 3,1. Though it did not get added until the 3rd entry since 1 and 2 had an opposite.
So, eventually it will just return names where the reversal does not exist.
I then want to join another table where the number from the previous problem has its name installed on it.
Table = Names
id | name
1 | Mars
2 | Earth
3 | Jupiter
So results will just be the names of those that don't have an opposite.
You can use a not exists condition:
select t1.start_id, t1.end_id
from the_table t1
where not exists (select *
from the_table t2
where t2.end_id = t1.start_id
and t2.start_id = t1.end_id);
I'm not sure about your data volume, so with your ask, below query will supply desired result for you in Sql Server.
create table TableBetweens
(start_id INT,
end_id INT
)
INSERT INTO TableBetweens VALUES(1,2)
INSERT INTO TableBetweens VALUES(2,1)
INSERT INTO TableBetweens VALUES(1,3)
create table TableNames
(id INT,
NAME VARCHAR(50)
)
INSERT INTO TableNames VALUES(1,'Mars')
INSERT INTO TableNames VALUES(2,'Earth')
INSERT INTO TableNames VALUES(3,'Jupiter')
SELECT *
FROM TableNames c
WHERE c.id IN (
SELECT nameid1.nameid
FROM (SELECT a.start_id, a.end_id
FROM TableBetweens a
LEFT JOIN TableBetweens b
ON CONCAT(a.start_id,a.end_id) = CONCAT(b.end_id,b.start_id)
WHERE b.end_id IS NULL
AND b.start_id IS NULL) filterData
UNPIVOT
(
nameid
FOR id IN (filterData.start_id,filterData.end_id)
) AS nameid1
)

Oracle -- Update the exact column referenced in the ON clause

I think this requirement is rarely encountered so I couldn't search for similar questions.
I have a table that needs to update the ID. For example ID 123 in table1 is actually supposed to be 456. I have a separate reference table built that stores the mapping (e.g. old 123 maps to new id 456).
I used the below query but apparently it returned error 38104, columns referenced in the ON clause cannot be updated.
MERGE INTO table1
USING ref_table ON (table1.ID = ref_table.ID_Old)
WHEN MATCHED THEN UPDATE SET table.ID = ref_table.ID_New;
Is there other way to achieve my purpose?
Thanks and much appreciated for your answer!
Use the ROWID pseudocolumn:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TABLE1( ID ) AS
SELECT 1 FROM DUAL UNION ALL
SELECT 2 FROM DUAL UNION ALL
SELECT 3 FROM DUAL;
CREATE TABLE REF_TABLE( ID_OLD, ID_NEW ) AS
SELECT 1, 4 FROM DUAL UNION ALL
SELECT 2, 5 FROM DUAL;
MERGE INTO TABLE1 dst
USING ( SELECT t.ROWID AS rid,
r.id_new
FROM TABLE1 t
INNER JOIN REF_TABLE r
ON ( t.id = r.id_old ) ) src
ON ( dst.ROWID = src.RID )
WHEN MATCHED THEN
UPDATE SET id = src.id_new;
Query 1:
SELECT * FROM table1
Results:
| ID |
|----|
| 4 |
| 5 |
| 3 |
You can't update a column used in the ON clause in a MERGE. But if you don't need to make other changes that MERGE allows like WHEN NOT MATCHED or deleting, etc. you can just use a UPDATE to achieve this.
You mentioned this is an ID that needs an update. Here's an example using a scalar subquery. As it is an ID, this presumes UNIQUE ID_OLD values in REF_TABLE. I wasn't sure if Every row needs an update or only a sub-set, so set the update here to only update rows that have a value in REF_TABLE.
CREATE TABLE TABLE1(
ID NUMBER
);
CREATE TABLE REF_TABLE(
ID_OLD NUMBER,
ID_NEW NUMBER
);
INSERT INTO TABLE1 VALUES (1);
INSERT INTO TABLE1 VALUES (2);
INSERT INTO TABLE1 VALUES (100);
INSERT INTO REF_TABLE VALUES (1,10);
INSERT INTO REF_TABLE VALUES (2,20);
Initial State:
SELECT * FROM TABLE1;
ID
1
2
100
Then make the UPDATE
UPDATE TABLE1
SET TABLE1.ID = (SELECT REF_TABLE.ID_NEW
FROM REF_TABLE
WHERE REF_TABLE.ID_OLD = ID)
WHERE TABLE1.ID IN (SELECT REF_TABLE.ID_OLD
FROM REF_TABLE);
2 rows updated.
And check the change:
SELECT * FROM TABLE1;
ID
10
20
100

How select values where all columns are null for particular ID, ID is not unique

I have a table with following format and I want to get the LotId if Value1 is null for all the rows.
Now If I am doing Select,
Select * from Table1 where Value1 IS null , I am getting back a row .
But I want nothing should be returned as there are two rows which have some value.
I thought of self join , but this can have n number of rows.
Id LotId Value1
-------------------------------------------------
1 LOt0065 NULL
2 LOt0065 SomeValue
3 LOt0065 SomeValue
I think you'll need to use an EXISTS subquery here:
SELECT a.lotid
FROM table1 a
WHERE NOT EXISTS (
SELECT 1
FROM table1 b
WHERE b.lotid = a.lotid
AND b.value1 IS NOT NULL
);
If my syntax is right, then this will show you all records that don't have any NULL values for that lotid:
It uses a SELECT 1 because the subquery doesn't need to show any value, it just needs to match on the outer query.
You compare the table in the inner query to the table in the outer query and match on the common field you're looking at (lotid in this case)
This could also be done with a NOT IN clause.
Does this give you the result you want?

Tricky MS Access SQL query to remove surplus duplicate records

I have an Access table of the form (I'm simplifying it a bit)
ID AutoNumber Primary Key
SchemeName Text (50)
SchemeNumber Text (15)
This contains some data eg...
ID SchemeName SchemeNumber
--------------------------------------------------------------------
714 Malcolm ABC123
80 Malcolm ABC123
96 Malcolms Scheme ABC123
101 Malcolms Scheme ABC123
98 Malcolms Scheme DEF888
654 Another Scheme BAR876
543 Whatever Scheme KJL111
etc...
Now. I want to remove duplicate names under the same SchemeNumber. But I want to leave the record which has the longest SchemeName for that scheme number. If there are duplicate records with the same longest length then I just want to leave only one, say, the lowest ID (but any one will do really). From the above example I would want to delete IDs 714, 80 and 101 (to leave only 96).
I thought this would be relatively easy to achieve but it's turning into a bit of a nightmare! Thanks for any suggestions. I know I could loop it programatically but I'd rather have a single DELETE query.
See if this query returns the rows you want to keep:
SELECT r.SchemeNumber, r.SchemeName, Min(r.ID) AS MinOfID
FROM
(SELECT
SchemeNumber,
SchemeName,
Len(SchemeName) AS name_length,
ID
FROM tblSchemes
) AS r
INNER JOIN
(SELECT
SchemeNumber,
Max(Len(SchemeName)) AS name_length
FROM tblSchemes
GROUP BY SchemeNumber
) AS w
ON
(r.SchemeNumber = w.SchemeNumber)
AND (r.name_length = w.name_length)
GROUP BY r.SchemeNumber, r.SchemeName
ORDER BY r.SchemeName;
If so, save it as qrySchemes2Keep. Then create a DELETE query to discard rows from tblSchemes whose ID value is not found in qrySchemes2Keep.
DELETE
FROM tblSchemes AS s
WHERE Not Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID);
Just beware, if you later use Access' query designer to make changes to that DELETE query, it may "helpfully" convert the SQL to something like this:
DELETE s.*, Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID)
FROM tblSchemes AS s
WHERE (((Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID))=False));
DELETE FROM Table t1
WHERE EXISTS (SELECT 1 from Table t2
WHERE t1.SchemeNumber = t2.SchemeNumber
AND Length(t2.SchemeName) > Length(t1.SchemeName)
)
Depend on your RDBMS you may use function different from Length (Oracle - length, mysql - length, sql server - LEN)
delete ShortScheme
from Scheme ShortScheme
join Scheme LongScheme
on ShortScheme.SchemeNumber = LongScheme.SchemeNumber
and (len(ShortScheme.SchemeName) < len(LongScheme.SchemeName) or (len(ShortScheme.SchemeName) = len(LongScheme.SchemeName) and ShortScheme.ID > LongScheme.ID))
(SQL Server flavored)
Now updated to include the specified tie resolution. Although, you may get better performance doing it in two queries: first deleting the schemes with shorter names as in my original query and then going back and deleting the higher ID where there was a tie in name length.
I'd do this in multiple steps. Large delete operations done in a single step make me too nervous -- what if you make a mistake? There's no sql 'undo' statement.
-- Setup the data
DROP Table foo;
DROP Table bar;
DROP Table bat;
DROP Table baz;
CREATE TABLE foo (
id int(11) NOT NULL,
SchemeName varchar(50),
SchemeNumber varchar(15),
PRIMARY KEY (id)
);
insert into foo values (714, 'Malcolm', 'ABC123' );
insert into foo values (80, 'Malcolm', 'ABC123' );
insert into foo values (96, 'Malcolms Scheme', 'ABC123' );
insert into foo values (101, 'Malcolms Scheme', 'ABC123' );
insert into foo values (98, 'Malcolms Scheme', 'DEF888' );
insert into foo values (654, 'Another Scheme ', 'BAR876' );
insert into foo values (543, 'Whatever Scheme ', 'KJL111' );
-- Find all the records that have dups, find the longest one
create table bar as
select max(length(SchemeName)) as max_length, SchemeNumber
from foo
group by SchemeNumber
having count(*) > 1;
-- Find the one we want to keep
create table bat as
select min(a.id) as id, a.SchemeNumber
from foo a join bar b on a.SchemeNumber = b.SchemeNumber
and length(a.SchemeName) = b.max_length
group by SchemeNumber;
-- Select into this table all the rows to delete
create table baz as
select a.id from foo a join bat b where a.SchemeNumber = b.SchemeNumber
and a.id != b.id;
This will give you a new table with only records for rows that you want to remove.
Now check these out and make sure that they contain only the rows you want deleted. This way you can make sure that when you do the delete, you know exactly what to expect. It should also be pretty fast.
Then when you're ready, use this command to delete the rows using this command.
delete from foo where id in (select id from baz);
This seems like more work because of the different tables, but it's safer probably just as fast as the other ways. Plus you can stop at any step and make sure the data is what you want before you do any actual deletes.
If your platform supports ranking functions and common table expressions:
with cte as (
select row_number()
over (partition by SchemeNumber order by len(SchemeName) desc) as rn
from Table)
delete from cte where rn > 1;
try this:
Select * From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or this:,...
Select * From Table t
Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))
if either of these selects the records that should be deleted, just change it to a delete
Delete
From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or using the second construction:
Delete From Table t Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))