I have a table like this:
ID | Flag
-----------
1 | True
1 | True
1 | NULL
1 | True
1 | NULL
2 | False
2 | False
2 | False
2 | NULL
2 | NULL
And I want an output like this:
ID | Flag
-----------
1 | True
1 | True
1 | True
1 | True
1 | True
2 | False
2 | False
2 | False
2 | False
2 | False
I want to replace nulls with the value assigned in different records. Is there a way to do it in a single update statement?
One option uses a correlated subquery:
update mytable t
set flag = (select bool_or(flag) from mytable t1 where t1.id = t.id)
Demo on DB Fiddle:
id | flag
-: | :---
1 | t
1 | t
1 | t
1 | t
1 | t
2 | f
2 | f
2 | f
2 | f
2 | f
You can also use exists:
update t
set flag = exists (select 1 from t t2 where t2.id = t.id and t2.flag);
The advantage of exists over a subquery with aggregation is performance: the query can stop at the first row where flag is true. This is a simple index lookup on an index on (id, flag).
Performance could be improved further by limiting the number of rows being updated. That actually suggests two separate statements:
update t
set flag = true
where (flag is null or not flag) and
exists (select 1 from t t2 where t2.id = t.id and t2.flag);
update t
set flag = false
where flag is null and
not exists (select 1 from t t2 where t2.id = t.id and t2.flag);
These could be combined into a single (more complicated) statement, but the sets being updated are disjoint. This limits the updates to the rows that need to be updated, as well as limiting the subquery to a simple lookup (assuming an index on (id, flag)).
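As a rough sketch of the combined form mentioned above, the statement below computes the target value once per id and only touches rows whose value would change. It assumes the same table t with columns id and flag. The index name t_id_flag_idx is hypothetical. Note that this variant trades the per-row index lookup for one aggregation pass, so it is worth timing both forms.

```sql
-- Hypothetical index name; supports the exists lookups by (id, flag).
create index t_id_flag_idx on t (id, flag);

update t
set flag = s.target
from (
    -- bool_or ignores nulls and yields null when every flag is null,
    -- so coalesce maps all-null ids to false, like the exists version.
    select id, coalesce(bool_or(flag), false) as target
    from t
    group by id
) s
where s.id = t.id
  and t.flag is distinct from s.target;  -- skip rows that already hold the target
```

The is distinct from comparison treats null as an ordinary value, so null rows are updated while rows that already match are left alone.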
The answers provided satisfy your sample data, but may still leave you short of a satisfactory answer. That is because your sample data is missing a couple of significant cases. What happens if you had the following, either instead of or in addition to your current sample data?
+----+-------+
| id | flag |
+----+-------+
| 3 | true |
| 3 | false |
| 3 | null |
| 4 | null |
| 4 | null |
+----+-------+
The answer could be significantly different.
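One way to check whether such cases exist before choosing an update is a diagnostic query along these lines. This is only a sketch: the table name tbl and the output column names are assumptions, not something given in the question.

```sql
-- List ids that have only nulls, or that mix true and false.
select id,
       bool_or(flag)  as any_true,      -- true if at least one row is true
       bool_and(flag) as all_true,      -- true only if every non-null row is true
       count(flag)    as non_null_rows  -- count(flag) skips nulls
from tbl
group by id
having count(flag) = 0                  -- like id 4 above
    or bool_or(flag) <> bool_and(flag); -- like id 3 above
```

If this returns no rows, the sample data is representative and the simpler answers apply as they are.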
Assuming (like your sample data suggests):
There can never be the same id with true and false in the set. Else, you'd have to define what to do.
null values remain unchanged if there is no non-null value for the same id.
This should give you best performance:
UPDATE tbl t
SET flag = t1.flag
FROM (
SELECT DISTINCT ON (id)
id, flag
FROM tbl
ORDER BY id, flag
) t1 -- avoid repeated computation for same id
WHERE t.id = t1.id
AND t.flag IS NULL -- avoid costly no-op updates
AND t1.flag IS NOT NULL; -- avoid costly no-op updates
The subquery t1 distills target values per id once.
SELECT DISTINCT ON (id)
id, flag
FROM tbl
ORDER BY id, flag;
Since null sorts last, it effectively grabs the first non-null value per id. false sorts before true, but that has no bearing on the case as there can never be both for the same id. See:
Sort NULL values to the end of a table
Select first row in each GROUP BY group?
If you have many rows per id, there are faster techniques:
Optimize GROUP BY query to retrieve latest row per user
The added conditions in the outer query prevent all no-op updates from happening, thus avoiding major cost. Only rows are updated where a null value actually changes. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
Related
I am stuck for some time in this. Imagine that I have this table:
diagId | astigmatic
1 | No
1 | Yes
2 | No
3 | No
4 | No
5 | No
5 | Yes
6 | No
And I want the output:
diagId | astigmatic
1 | Yes
2 | No
3 | No
4 | No
5 | Yes
6 | No
So if there is a diagId with Yes and No, I want the Yes tuple to prevail and the No tuple to disappear.
How can I achieve this?
Thanks!
One method is aggregation:
select diagid, max(astigmatic) as astigmatic
from t
group by diagid;
This works because 'Yes' > 'No' in string comparison.
Or, a conceptually similar method but one that is probably faster in Postgres:
select distinct on (diagid) t.*
from t
order by diagid, astigmatic desc;
Another approach combines or with not exists:
select t.*
from t
where t.astigmatic = 'Yes' or
(t.astigmatic = 'No' and
not exists (select 1
from t t2
where t2.diagid = t.diagid and
t2.astigmatic = 'Yes'
)
);
The first two methods return one row per diagid, guaranteed. This last method could return multiple rows if there are multiple 'Yes' or 'No' rows for a given diagid.
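The aggregation method depends on how the two strings happen to sort. A minimal sketch of a variant that does not rely on string ordering is shown here, using a few of the sample rows inline; the CTE merely stands in for the real table t.

```sql
with t(diagid, astigmatic) as (
    values (1, 'No'), (1, 'Yes'), (2, 'No'), (5, 'No'), (5, 'Yes')
)
select diagid,
       -- bool_or is true if any row for the diagid is 'Yes'
       case when bool_or(astigmatic = 'Yes') then 'Yes' else 'No' end as astigmatic
from t
group by diagid
order by diagid;
```

With these rows, diagid 1 and 5 come back as Yes and diagid 2 as No.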
I'm not certain how to describe my problem in words so I've created an illustration to help.
|-------------------------------------------|
| version_table (many-to-many) |
|-------------------------------------------|
| version_id | a_id | b_id | operation_type |
|------------|------|------|----------------|
| 1 | 1 | 1 | INSERT |
| 1 | 1 | 2 | INSERT |
| 2 | 1 | 1 | DELETE |
| 3 | 1 | 2 | DELETE |
|------------|------|------|----------------|
In this table querying for each version would yield these results:
Version 1 should return two rows (obvious because of the inserts).
Version 2 should return one row (less obvious but the row exists until a DELETE operation has been called).
Version 3 should return zero rows (all rows cleared by the previous DELETE operations).
It's obvious that we need to fetch all of the rows that were inserted before or on the supplied version.
WHERE table.version_id <= :VERSION
But what's not obvious is how we exclude rows that have been "DELETED".
AND table.version_id > alias.version_id AND alias.operation_type = "DELETE"
This is the query I ended up writing:
SELECT tag.id AS tag_id, tag.name AS tag_name
FROM tag
JOIN article_tag_version ON article_tag_version.tag_id = tag.id
LEFT OUTER JOIN article_tag_version AS article_tag_version_1 ON
article_tag_version_1.tag_id = tag.id AND
article_tag_version_1.operation_type = "DELETE"
WHERE article_tag_version.version_id <= ? AND article_tag_version.version_id > article_tag_version_1.version_id
...but it doesn't return the results in the way I expect (no results).
You can check with a NOT EXISTS if the "thing" (whatever it is) has been deleted in a version between the version it was inserted and the target version.
SELECT *
FROM version_table v1
WHERE v1.operation_type = 'INSERT'
AND NOT EXISTS (SELECT *
FROM version_table v2
WHERE v2.version_id >= v1.version_id
AND v2.version_id <= :VERSION
AND (v2.a_id,
v2.b_id) = (v1.a_id,
v1.b_id)
AND v2.operation_type = 'DELETE')
AND v1.version_id <= :VERSION;
I would just select the most recent record for each a_id/b_id pair, then filter out the ones that are deleted:
select atv.*
from (select distinct on (a_id, b_id) atv.*
from article_tag_version atv
where version_id <= ? -- the version you care about
order by a_id, b_id, version_id desc
) atv
where operation_type <> 'DELETE';
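To see the "most recent record per pair" idea on the sample data from the question, here is a self-contained sketch. It inlines the four illustrated rows as a CTE and hard-codes version 2 in place of the parameter. The alias v is arbitrary.

```sql
with version_table(version_id, a_id, b_id, operation_type) as (
    values (1, 1, 1, 'INSERT'),
           (1, 1, 2, 'INSERT'),
           (2, 1, 1, 'DELETE'),
           (3, 1, 2, 'DELETE')
)
select v.*
from (select distinct on (a_id, b_id) *
      from version_table
      where version_id <= 2                  -- the version being queried
      order by a_id, b_id, version_id desc   -- latest operation per pair first
     ) v
where v.operation_type <> 'DELETE';
```

For version 2 this leaves only the (a_id, b_id) = (1, 2) row. Changing the literal to 1 yields both inserted rows, and 3 yields none, which matches the expectations listed in the question.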
In the table below, I am supposed to retrieve all rows where deleted is false and disabled is true, with one row per distinct phrase. If a phrase appears more than once in the table (for example, the word "bad"), I must return the row that has a device_id. If it appears only once, I must return it even if the device_id is blank.
id | device_id | phrase | disabled | deleted |
----+-----------+---------+----------+---------+
2 | 1 | WTF | f | f |
3 | 1 | White | f | f |
4 | | WTF | f | f |
5 | | wTf | f | f |
6 | 2 | fck | f | f |
7 | 1 | damn | f | f |
8 | 1 | bitch | f | f |
9 | 1 | crap | f | f |
1 | 1 | Shit | t | t |
10 | 1 | ass | f | f |
11 | | bad | f | f |
12 | 1 | bad | t | f |
13 | 1 | badshit | f | f |
I wrote this query, and it returns what I expected (for example, only one "bad" row is returned, with device_id = 1):
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
But when I add a keyword search, for example "bad":
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
The result is "badshit" (OK) and "bad" (but with a null device_id). I expected the "bad" row to have device_id = 1.
I'm kind of new to PostgreSQL. Thanks!
I already fixed this error 9 months ago but was too busy to post it here.
Here's my answer:
order by phrase, device_id
either:
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
(device_id is not null)
order by phrase;
or:
select distinct on (phrase) id, device_id, phrase, disabled, deleted
from filter
where phrase = 'bad' and deleted = false and
(device_id is null or device_id = 1)
order by phrase;
Use the first if you only want records with a non-null device_id. Use the second if you want records whose phrase is exactly bad.
where phrase like '%bad%'
specifically asks Postgres to return both bad and bad****, because both of them match the pattern.
On another note, clean up your post before asking for help.
Never mind, I fixed it by adding device_id, changing:
order by phrase;
to
order by phrase, device_id;
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example:
SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;
retrieves the most recent weather report for each location. But if we had not used ORDER BY to force descending order of time values for each location, we'd have gotten a report from an unpredictable time for each location.
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
For your case, use the code below, since you want device_id = 1:
select distinct on (phrase) phrase, id, device_id, disabled, deleted
from filter
where phrase like '%bad%' and deleted = false and
device_id = 1
order by phrase,device_id;
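If rows with a null device_id should still be returned when no device-specific row exists, a sketch closer to the original query is to keep the null check and let the sort order decide. The CTE name sample_filter and its three rows are taken from the question's table only for illustration.

```sql
with sample_filter(id, device_id, phrase, deleted) as (
    values (11, null::int, 'bad',     false),
           (12, 1,         'bad',     false),
           (13, 1,         'badshit', false)
)
select distinct on (phrase) id, device_id, phrase, deleted
from sample_filter
where phrase like '%bad%'
  and deleted = false
  and (device_id is null or device_id = 1)
-- nulls last is already the default for ascending order in PostgreSQL;
-- spelling it out makes the intent explicit.
order by phrase, device_id nulls last;
```

Here the row kept for "bad" is id 12 with device_id = 1, and "badshit" is returned as id 13.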
I use window functions in my query to sum rows according to a value in a combination of rows. If a row contains null, I have to treat it as false. What should I do? I tried adding coalesce(atg.flag,false) in the partition clause, but it didn't work.
coalesce is the way. Here is an example:
t=# with dset(i,bool) as (values(1,true),(2,false),(3,null))
select i, bool::text, count(1) over (partition by coalesce(bool,false))
from dset;
i | bool | count
---+-------+-------
2 | false | 2
3 | | 2
1 | true | 1
(3 rows)
As you can see, count = 2 for null and false, and count = 1 for true.
My context is PostgreSQL 8.3
I need to speed up this query as both tables have millions of records.
For each row in table Calls, there are two rows in the Trunks table. For every call_id, I want to copy the value from trunks.trunk to calls.orig_trunk when trunk_id is the lowest trunk_id of the two rows, and copy the value from trunks.trunk to calls.dest_trunk when trunk_id is the highest trunk_id of the two rows.
initial content of Table Calls:
Call_ID | dialed_number | orig_trunk | dest_trunk
--------|---------------|------------|-----------
1 | 5145551212 | null | null
2 | 8883331212 | null | null
3 | 4164541212 | null | null
Table Trunks:
Call_ID | trunk_id | trunk
--------|----------|-------
1 | 1 | 116
1 | 2 | 9
2 | 3 | 168
2 | 4 | 3
3 | 5 | 124
3 | 6 | 9
final content of Table Calls:
Call_ID | dialed_number | orig_trunk| dest_trunk
--------|---------------|-----------|----------
1 | 5145551212 | 116 | 9
2 | 8883331212 | 168 | 3
3 | 4164541212 | 124 | 9
I have created an index on every column.
update calls set orig_trunk = t2.trunk
from ( select call_id,trunk_id from trunks
order by trunk_id ASC ) as t2
where (calls.call_id=t2.call_id );
update calls set dest_trunk = t2.trunk
from ( select call_id,trunk_id from trunks
order by trunk_id DESC ) as t2
where (calls.call_id=t2.call_id );
Any ideas?
This is the final code with test conditions as comments.
The subquery is very efficient and fast. However, testing revealed that partitioning the table has a greater impact on execution time than the efficiency of the subquery. On a table of 1 million rows, the update takes 80 seconds. On a table of 12 million rows, the update takes 580 seconds.
update calls1900 set orig_trunk = a.orig_trunk, dest_trunk = a.dest_trunk
from (select
x.call_id,
t1.trunk as orig_trunk, t2.trunk as dest_trunk
from (select calls1900.call_id
,min(t.trunk_id) as orig_trunk_id
,max(t.trunk_id) as dest_trunk_id
from calls1900
join trunks t on (t.call_id = calls1900.call_id)
-- where calls1900.call_id between 43798930 and 43798950
group by calls1900.call_id
) x
join trunks t1 on (t1.trunk_id = x.orig_trunk_id)
join trunks t2 on (t2.trunk_id = x.dest_trunk_id)
) a
where (calls1900.call_id = a.call_id); -- and (calls1900.call_id between 43798930 and 43798950)
From the example posted, it looks like many unnecessary updates are being performed. Here is an example of a query to get the results you are looking for:
select distinct c.call_id, c.dialed_number
,first_value(t.trunk) over w as orig_trunk
,last_value(t.trunk) over w as dest_trunk
from calls c
join trunks t on (t.call_id = c.call_id)
window w as (partition by c.call_id
order by trunk_id
range between unbounded preceding
and unbounded following
)
There are other ways to do it without the analytic function (window functions are only available from PostgreSQL 8.4 onward, so they are not an option on 8.3), for example:
select x.call_id
,x.dialed_number
,t1.trunk as orig_trunk
,t2.trunk as dest_trunk
from (select c.call_id, c.dialed_number
,min(t.trunk_id) as orig_trunk_id
,max(t.trunk_id) as dest_trunk_id
from calls c
join trunks t on (t.call_id = c.call_id)
group by c.call_id, c.dialed_number
) x
join trunks t1 on (t1.trunk_id = x.orig_trunk_id)
join trunks t2 on (t2.trunk_id = x.dest_trunk_id)
Experiment to see what works best in your situation. You will probably want indexes on the join columns.
What to do with the result set is dependent on the nature of the application. Is this a one off? Then why not just create a new table from the result set:
CREATE TABLE trunk_summary AS
SELECT ...
Is it constantly changing? Is it frequently accessed? Is it sufficient to just create a view? Or maybe an update is to be performed based on the result set. Maybe a range can be updated at a time. It really depends, but this might give a start.
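As one possible shape for updating a range at a time, the sketch below applies the min/max subquery to a slice of call_id values and skips rows that already hold the right values. The range bounds are placeholders, and it assumes trunk_id is unique in trunks, as the sample data suggests.

```sql
update calls c
set orig_trunk = a.orig_trunk,
    dest_trunk = a.dest_trunk
from (select x.call_id,
             t1.trunk as orig_trunk,
             t2.trunk as dest_trunk
      from (select call_id,
                   min(trunk_id) as orig_trunk_id,
                   max(trunk_id) as dest_trunk_id
            from trunks
            where call_id between 1 and 100000   -- placeholder batch range
            group by call_id
           ) x
      join trunks t1 on t1.trunk_id = x.orig_trunk_id
      join trunks t2 on t2.trunk_id = x.dest_trunk_id
     ) a
where c.call_id = a.call_id
  -- skip rows that would not change, to avoid no-op row versions
  and (c.orig_trunk is distinct from a.orig_trunk
       or c.dest_trunk is distinct from a.dest_trunk);
```

Running it once per range keeps each transaction short, and rerunning a range is harmless because unchanged rows are filtered out.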