I have a table that looks like this:
CREATE TABLE UTable (
m_id TEXT PRIMARY KEY,
u1 TEXT,
u2 TEXT,
u3 TEXT,
-- other stuff, as well as
gid INTEGER,
gt TEXT,
d TEXT,
timestamp TIMESTAMP
);
CREATE TABLE OTable (
gid INTEGER,
gt TEXT,
d TEXT,
-- other stuff, such as
n INTEGER
);
CREATE UNIQUE INDEX OTable_idx ON OTable (gid, gt, d);
For each record in OTable that matches a condition (fixed values of gid, gt), I want to join the corresponding record in UTable with the minimum timestamp.
What's catching me is that in my final result I don't care about the timestamp, I clearly need to group on d (since gid and gt are fixed), and yet I do need to extract u1, u2, u3 from the selected record.
SELECT o.d, u.u1, u.u2, u.u3, o.n
FROM UTable u
INNER JOIN OTable o
ON u.gid = o.gid AND u.gt = o.gt AND u.d = o.d
WHERE u.gid = 3 AND u.gt = 'dog night'
GROUP BY u.d
-- and u.timestamp is the minimum for each group
;
I think my first step should be just to do the select on UTable and then I can join against that. But even there I'm a bit confused.
SELECT u.d, u.u1, u.u2, u.u3
FROM UTable u
WHERE u.gid = 3 AND u.gt = 'dog night';
I want to add HAVING MIN(u.timestamp), but that's not valid.
Any pointers as to what I need to do?
I did see this question, but it isn't quite what I need, since I can't group on all the UTable values lest I select too many things.
GROUP BY u.d (without also listing u1, u2, u3) would only work if u.d were the PRIMARY KEY (which it is not, and which wouldn't make sense in your scenario anyway). See:
Is it possible to have an SQL query that uses AGG functions in this way?
I suggest DISTINCT ON in a subquery on UTable instead:
SELECT o.d, u.u1, u.u2, u.u3, o.n
FROM (
SELECT DISTINCT ON (u.d)
       u.gid, u.gt, u.d, u.u1, u.u2, u.u3  -- gid, gt must be in the output for USING below
FROM UTable u
WHERE u.gid = 3
AND u.gt = 'dog night'
ORDER BY u.d, u.timestamp
) u
JOIN OTable o USING (gid, gt, d);
See:
Select first row in each GROUP BY group?
If UTable is big, at least a multicolumn index on (gid, gt) is advisable. Same for OTable.
Maybe even on (gid, gt, d). Depends on data types, cardinalities, ...
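A sketch of the suggested DDL (the index name is made up; note that OTable's existing unique index OTable_idx on (gid, gt, d) already covers the (gid, gt) prefix, so only UTable needs a new index):

```sql
-- Hypothetical index to speed up the subquery on UTable:
-- it covers the WHERE clause (gid, gt) and helps the
-- DISTINCT ON (d) / ORDER BY d, timestamp step.
CREATE INDEX utable_gid_gt_d_ts_idx ON UTable (gid, gt, d, timestamp);
```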
Problem
I'm trying to refactor a low-performing MERGE statement to an UPDATE statement in Oracle 12.1.0.2.0. The MERGE statement looks like this:
MERGE INTO t
USING (
SELECT t.rowid rid, u.account_no_new
FROM t, u, v
WHERE t.account_no = u.account_no_old
AND t.contract_id = v.contract_id
AND v.tenant_id = u.tenant_id
) s
ON (t.rowid = s.rid)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new
It performs poorly mostly because there are two expensive accesses to the large (100M rows) table t.
Schema
These are the simplified tables involved:
t The target table whose account_no column is being migrated.
u The migration instruction table containing an account_no_old → account_no_new mapping
v An auxiliary table modelling a to-one relationship between contract_id and tenant_id
The schema is:
CREATE TABLE v (
contract_id NUMBER(18) NOT NULL PRIMARY KEY,
tenant_id NUMBER(18) NOT NULL
);
CREATE TABLE t (
t_id NUMBER(18) NOT NULL PRIMARY KEY,
-- tenant_id column is missing here
account_no NUMBER(18) NOT NULL,
contract_id NUMBER(18) NOT NULL REFERENCES v
);
CREATE TABLE u (
u_id NUMBER(18) NOT NULL PRIMARY KEY,
tenant_id NUMBER(18) NOT NULL,
account_no_old NUMBER(18) NOT NULL,
account_no_new NUMBER(18) NOT NULL,
UNIQUE (tenant_id, account_no_old)
);
I cannot modify the schema. I'm aware that adding t.tenant_id would solve the problem by avoiding the JOIN to v.
Alternative MERGE doesn't work:
ORA-38104: Columns referenced in the ON Clause cannot be updated
Note that the self-join cannot be avoided, because this alternative, equivalent query leads to ORA-38104:
MERGE INTO t
USING (
SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id
) s
ON (t.account_no = s.account_no_old AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new
UPDATE view doesn't work:
ORA-01779: cannot modify a column which maps to a non-key-preserved table
Intuitively, I would apply transitive closure here, which should guarantee that for each updated row in t, there can be only at most 1 row in u and in v. But apparently, Oracle doesn't recognise this, so the following UPDATE statement doesn't work:
UPDATE (
SELECT t.account_no, u.account_no_new
FROM t, u, v
WHERE t.account_no = u.account_no_old
AND t.contract_id = v.contract_id
AND v.tenant_id = u.tenant_id
)
SET account_no = account_no_new
The above raises ORA-01779. Adding the undocumented hint /*+BYPASS_UJVC*/ does not seem to work anymore on 12c.
How to tell Oracle that the view is key preserving?
In my opinion, the view is still key preserving, i.e. for each row in t, there is exactly one row in v, and thus at most one row in u. The view should thus be updatable. Is there any way to rewrite this query to make Oracle trust my judgement?
Or is there any other syntax I'm overlooking that prevents the MERGE statement's double access to t?
Is there any way to rewrite this query to make Oracle trust my judgement?
I've managed to "convince" Oracle to do MERGE by introducing helper column in target:
MERGE INTO (SELECT (SELECT t.account_no FROM dual) AS account_no_temp,
t.account_no, t.contract_id
FROM t) t
USING (
SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id
) s
ON (t.account_no_temp = s.account_no_old AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;
db<>fiddle demo
EDIT
A variation of idea above - subquery moved directly to ON part:
MERGE INTO (SELECT t.account_no, t.contract_id FROM t) t
USING (
SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id
) s
ON ((SELECT t.account_no FROM dual) = s.account_no_old
AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;
db<>fiddle demo2
Related article: Columns referenced in the ON Clause cannot be updated
EDIT 2:
MERGE INTO (SELECT t.account_no, t.contract_id FROM t) t
USING (SELECT u.account_no_old, u.account_no_new, v.contract_id
FROM u, v
WHERE v.tenant_id = u.tenant_id) s
ON ((t.account_no, t.contract_id, 'x') = ((s.account_no_old, s.contract_id, 'x')) OR 1=2)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;
db<>fiddle demo3
You may define a temporary table containing the pre-joined data from U and V.
Back it with a unique index on contract_id, account_no_old (which should be unique).
Then you may use this temporary table in an updateable join view.
create table tmp as
SELECT v.contract_id, u.account_no_old, u.account_no_new
FROM u, v
WHERE v.tenant_id = u.tenant_id;
create unique index tmp_ux1 on tmp ( contract_id, account_no_old);
UPDATE (
SELECT t.account_no, tmp.account_no_new
FROM t, tmp
WHERE t.account_no = tmp.account_no_old
AND t.contract_id = tmp.contract_id
)
SET account_no = account_no_new
;
Trying to do this with a simpler update. Still requires a subselect.
update t
set t.account_no = (SELECT u.account_no_new
                    FROM u, v
                    WHERE t.account_no = u.account_no_old
                    AND t.contract_id = v.contract_id
                    AND v.tenant_id = u.tenant_id)
-- the guard below keeps rows without a mapping from having account_no set to NULL
where exists (SELECT 1
              FROM u, v
              WHERE t.account_no = u.account_no_old
              AND t.contract_id = v.contract_id
              AND v.tenant_id = u.tenant_id);
Bobby
I have to join two different tables to get my result.
The table 'Resource' is simple, while the table 'Dimension' contains, among other things, a [Code] column with different values, i.e.:
Code
SILO
GRADE
OTHER 1
OTHER2
This is the reason why I join on that column twice, to get two different columns called GRADE and SILO.
Now, I have a query that selects the maximum value of a grade within the group as follows:
SELECT
    -- R.[ID]  -- if I insert this here it obviously doesn't work,
    --            but this is the additional column I need (see later)
    DD_SILO.[Value] DIR ,
    max(R.[GRADE]) GRADE_DIR
FROM [Resource] R
LEFT JOIN
    Dimension DD_SILO ON R.[ID] = DD_SILO.[ID] AND DD_SILO.[Code] = 'SILO'
group by DD_SILO.[Value]
What I need is basically to have, besides GRADE and SILO, also the ID name, which is contained in the [Resource] table.
Please notice that [Resource].ID = [Dimension].ID
I would have solved the problem with ROW_NUMBER() to select the highest within the group, avoiding the GROUP BY, but as the query has to be inserted into a bigger one, that would take too much time to run. I am using Microsoft SQL Server 2016.
Could you use a virtual table, something like:
select
    a.max_grade_silo,
    a.max_grade_value,
    (select max(r.id)
     from [resource] r,
          [dimension] d
     where r.[ID] = d.[ID] and
           d.[CODE] = 'SILO' and
           r.[GRADE] = a.[max_grade_value]
    ) as max_id
from
    (SELECT
         DD_SILO.[Value] DIR ,
         max(R.[GRADE]) GRADE_DIR
     FROM [Resource] R
     LEFT JOIN
         Dimension DD_SILO ON R.[ID] = DD_SILO.[ID] AND DD_SILO.[Code] = 'SILO'
     group by DD_SILO.[Value]
    ) a (max_grade_silo, max_grade_value)
Probably better to look at normalizing the tables?
SELECT
MAX(R.[ID]) as ID ,
DD_SILO.[Value] DIR ,
max(R.[GRADE]) GRADE_DIR
FROM [Resource] R
LEFT JOIN
Dimension DD_SILO ON R.[ID] = DD_SILO.[ID] AND DD_SILO.[Code] = 'SILO'
group by DD_SILO.[Value]
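For comparison, the ROW_NUMBER() variant the question mentions might look like this (a sketch under the same column-name assumptions as the query above; whether it is fast enough inside the bigger query would need testing):

```sql
SELECT ID, DIR, GRADE_DIR
FROM (
    SELECT R.[ID],
           DD_SILO.[Value] AS DIR,
           R.[GRADE] AS GRADE_DIR,
           -- one row per SILO group, highest grade first
           ROW_NUMBER() OVER (PARTITION BY DD_SILO.[Value]
                              ORDER BY R.[GRADE] DESC) AS rn
    FROM [Resource] R
    LEFT JOIN Dimension DD_SILO
        ON R.[ID] = DD_SILO.[ID] AND DD_SILO.[Code] = 'SILO'
) x
WHERE rn = 1;
```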
I need to find the key of the minimum value in a jsonb object. I have found the minimum value; now I need the key of that value in the same query.
The query I am using:
SELECT id,min((arr ->> 2)::numeric) AS custom_value
FROM (
SELECT id, jdoc
FROM table,
jsonb_each(column1) d (key, jdoc)
) sub,
jsonb_each(jdoc) doc (key, arr)
group by 1
This will do the job.
The left join ... on 1=1 is for keeping IDs with empty json
select t.id
,j.key
,j.value
from mytable t
left join lateral (select j.key,j.value
from jsonb_each(column1) as j
order by j.value
limit 1
) j
on 1=1
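If the minimum should be taken over the numeric third array element, as in the question's (arr ->> 2)::numeric, you would order by that expression instead (a sketch, with the same placeholder names mytable and column1 as above):

```sql
SELECT t.id, j.key, j.value
FROM mytable t
LEFT JOIN LATERAL (
    SELECT key, value
    FROM   jsonb_each(t.column1)
    ORDER  BY (value ->> 2)::numeric   -- numeric minimum of the third element
    LIMIT  1
) j ON true;   -- ON true plays the same role as on 1=1 above
```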
This is how I do it currently:
DECLARE tmp message%ROWTYPE;
BEGIN
    SELECT * INTO tmp FROM [...];
    DELETE FROM message m WHERE m.id = tmp.id;
END;
I'm afraid that the db will do two queries here: One for doing the SELECT and one for the DELETE. In case this is true - can I make this more efficient somehow? After all the row that should be deleted was already found in the SELECT query.
N.b. I'm eventually storing something from the SELECT query and return it from the function. The above is just simplified.
delete from message m
using (
select *
from ...
) s
where m.id = s.id
returning s.*
For simple cases you do not even need a subquery.
If your secret first query only involves the same table:
DELETE FROM message m
WHERE m.id = <something>
RETURNING m.*; -- or what you need from the deleted row.
If your secret first query involves one or more additional tables:
DELETE FROM message m
USING some_tbl s
WHERE s.some_column = <something>
AND m.id = s.id
RETURNING m.id, s.foo; -- you didn't define ...
Solution for actual query (after comments)
An educated guess, to delete the oldest row (smallest timestamp) from each set with identical id:
DELETE FROM message m
USING (
SELECT DISTINCT ON (id)
id, timestamp
FROM message
WHERE queue_id = _queue_id
AND source_client_id = _source_client_id
AND (target_client_id IN (-1, _client_id))
ORDER BY id, timestamp
) sub
WHERE m.id = sub.id
AND m.timestamp = sub.timestamp
RETURNING m.content
INTO rv;
Or, if (id, timestamp) is UNIQUE, NOT EXISTS is probably faster:
DELETE FROM message m
WHERE queue_id = _queue_id
AND source_client_id = _source_client_id
AND target_client_id IN (-1, _client_id)
AND NOT EXISTS (
SELECT 1
FROM message
WHERE queue_id = _queue_id
AND source_client_id = _source_client_id
AND target_client_id IN (-1, _client_id)
AND id = m.id
AND timestamp < m.timestamp
)
RETURNING m.content
INTO rv;
More about DISTINCT ON and selecting "the greatest" from each group:
Select first row in each GROUP BY group?
If performance is your paramount objective, look to the last chapter ...
Aside: timestamp is a basic type name in Postgres and a reserved word in standard SQL. Don't use it as identifier.
Solution in comment below, audited:
DELETE FROM message m
USING (
SELECT id
FROM message
WHERE queue_id = _queue_id
AND target_client_id IN (client_id, -1)
ORDER BY timestamp
LIMIT 1
) AS tmp
WHERE m.id = tmp.id
RETURNING m.content
INTO rv;
INTO ... only makes sense inside a plpgsql function, of course.
An index on (queue_id, timestamp) makes this fast - possibly even a partial index with the condition WHERE target_client_id IN (client_id, -1) (depends).
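The suggested index could be sketched like this (the index name is made up; the partial variant is only possible if the predicate can be written with constants, since index predicates cannot reference variables like client_id):

```sql
CREATE INDEX message_queue_ts_idx ON message (queue_id, timestamp);

-- Partial variant, only valid with a constant predicate:
-- CREATE INDEX message_queue_ts_part_idx ON message (queue_id, timestamp)
-- WHERE target_client_id = -1;
```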
Excuse the title, I couldn't come up with something short and to the point...
I've got a table 'updates' with three columns: text, typeid, created. text is a text field, typeid is a foreign key from a 'type' table, and created is a timestamp. A user enters an update and selects the 'type' it corresponds to.
There's a corresponding 'type' table with columns 'id' and 'name'.
I'm trying to end up with a result set with as many rows as there are in the 'type' table, each with the latest updates.text value for that type. So if I've got 3 types, 3 rows would be returned: one row per type with the most recent updates.text value for the type in question.
Any ideas?
thanks,
John.
select u.text, u.typeid, u.created, t.name
from (
select typeid, max(created) as MaxCreated
from updates
group by typeid
) mu
inner join updates u on mu.typeid = u.typeid and mu.MaxCreated = u.Created
left outer join type t on u.typeid = t.id
What are the actual columns you want returned?
SELECT t.*,
y.*
FROM TYPE t
JOIN (SELECT u.typeid,
MAX(u.created) 'max_created'
FROM UPDATES u
GROUP BY u.typeid) x ON x.typeid = t.id
JOIN UPDATES y ON y.typeid = x.typeid
AND y.created = x.max_created
SELECT
TYP.id,
TYP.name,
TXT.comment
FROM
dbo.Types TYP
INNER JOIN dbo.Type_Comments TXT ON
TXT.type_id = TYP.id
WHERE
NOT EXISTS
(
SELECT
*
FROM
dbo.Type_Comments TXT2
WHERE
TXT2.type_id = TYP.id AND
TXT2.created > TXT.created
)
Or:
SELECT
TYP.id,
TYP.name,
TXT.comment
FROM
dbo.Types TYP
INNER JOIN dbo.Type_Comments TXT ON
TXT.type_id = TYP.id
LEFT OUTER JOIN dbo.Type_Comments TXT2 ON
TXT2.type_id = TYP.id AND
TXT2.created > TXT.created
WHERE
TXT2.type_id IS NULL
In either case, if the created date can be identical between two rows with the same type_id then you would need to account for that.
I've also assumed at least one comment per type exists. If that's not the case then you would need to make a minor adjustment for that as well.
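For instance, the tie on created could be broken with any unique column. Assuming a hypothetical comment_id primary key on dbo.Type_Comments, the NOT EXISTS version becomes:

```sql
SELECT TYP.id, TYP.name, TXT.comment
FROM dbo.Types TYP
INNER JOIN dbo.Type_Comments TXT ON TXT.type_id = TYP.id
WHERE NOT EXISTS (
    SELECT *
    FROM dbo.Type_Comments TXT2
    WHERE TXT2.type_id = TYP.id
      AND (TXT2.created > TXT.created
           OR (TXT2.created = TXT.created
               AND TXT2.comment_id > TXT.comment_id))  -- comment_id is assumed
);
```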