Optimizing a query in PostgreSQL

CREATE TABLE master.estado_movimiento_inventario
(
id integer NOT NULL,
eliminado boolean NOT NULL DEFAULT false,
fundamentacion text NOT NULL,
fecha timestamp without time zone NOT NULL,
id_empresa integer NOT NULL,
id_usuario integer NOT NULL,
id_estado integer NOT NULL,
id_movimiento integer NOT NULL,
se_debio_tramitar_hace bigint DEFAULT 0,
CONSTRAINT "PK15estadomovtec" PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE master.estado_movimiento_inventario
OWNER TO postgres;
This table tracks the state of every inventory movement in my business logic. Every movement that has not ended yet (i.e. there is no row with id_estado = 3 or id_estado = 4 for that id_movimiento in master.estado_movimiento_inventario) must, each time a scheduled Windows task runs, store in the se_debio_tramitar_hace field of its latest state the difference between now() and the fecha field.
The query I built in order to do so is this:
with update_time as(
SELECT distinct on(id_movimiento) id
from master.estado_movimiento_inventario
where id_movimiento not in (
select id_movimiento
from master.estado_movimiento_inventario
where id_estado = 2 or id_estado=3
) order by id_movimiento, id desc
)
update master.estado_movimiento_inventario mi
set se_debio_tramitar_hace= EXTRACT(epoch FROM now()::timestamp - mi.fecha )/3600
where mi.id in (select id from update_time);
This works as expected, but I suspect it is not optimal, especially the update operation, and here is my biggest doubt: which of these is more efficient for this update:
Performing the update as a single set-based statement, as it currently does,
or
the PostgreSQL equivalent of a function doing this:
foreach(update_time.id as row_id){
update master.estado_movimiento_inventario mi set diferencia = now() - mi.fecha where mi.id=row_id;
}
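To make option 2 concrete, this is roughly what that loop would look like as a PL/pgSQL DO block (just a sketch reusing the selection logic of my CTE; I have not benchmarked it):
DO $$
DECLARE
    row_id integer;
BEGIN
    FOR row_id IN
        SELECT DISTINCT ON (id_movimiento) id
        FROM master.estado_movimiento_inventario
        WHERE id_movimiento NOT IN (
            SELECT id_movimiento
            FROM master.estado_movimiento_inventario
            WHERE id_estado IN (2, 3)
        )
        ORDER BY id_movimiento, id DESC
    LOOP
        -- one UPDATE statement per qualifying row
        UPDATE master.estado_movimiento_inventario mi
        SET se_debio_tramitar_hace = EXTRACT(epoch FROM now()::timestamp - mi.fecha) / 3600
        WHERE mi.id = row_id;
    END LOOP;
END $$;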
Sorry if I am not explicit enough; I do not have much experience working with databases. I understand the theory behind them, but I have not worked with them much in practice.
Edit
Please notice that id_estado is not unique per id_movimiento, as the picture shows:

I think this improves the CTE:
with update_time as (
select id_movimiento, max(id) as max_id
from master.estado_movimiento_inventario
group by id_movimiento
having sum( (id_estado in (2, 3))::int ) = 0
)
update master.estado_movimiento_inventario mi
set se_debio_tramitar_hace = extract(epoch from now()::timestamp - mi.fecha) / 3600
where mi.id in (select max_id from update_time);
If the last id were the one in the "2" or "3" state, I would simply do:
update master.estado_movimiento_inventario mi
set se_debio_tramitar_hace = extract(epoch from now()::timestamp - mi.fecha) / 3600
where mi.id = (select max(mi2.id)
from master.estado_movimiento_inventario mi2
where mi2.id_movimiento = mi.id_movimiento
) and
mi.id_estado not in (2, 3);
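For completeness, the same set-based update can also be written with UPDATE ... FROM, so the planner joins the CTE result directly instead of going through an IN list (just a sketch; I have not measured whether it is actually faster here):
with update_time as (
    select id_movimiento, max(id) as max_id
    from master.estado_movimiento_inventario
    group by id_movimiento
    having sum( (id_estado in (2, 3))::int ) = 0
)
update master.estado_movimiento_inventario mi
set se_debio_tramitar_hace = extract(epoch from now()::timestamp - mi.fecha) / 3600
from update_time ut
where mi.id = ut.max_id;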

Related

SQLite: Workaround for SQLite-TRIGGER with WITH

I'm working on a project to monitor downtimes of production lines with an embedded device. I want to automate acknowledging of these downtimes by generic rules the user can configure.
I want to use a TRIGGER but get a syntax error near UPDATE even though the documentation says it should be fine to use the WITH statement.
CREATE TRIGGER autoAcknowledge
AFTER UPDATE OF dtEnd ON ackGroups
FOR EACH ROW
WHEN old.dtEnd IS NULL AND new.dtEnd IS NOT NULL
BEGIN
WITH sub1(id, stationId, groupDur) AS (
SELECT MIN(d.id), d.station,
strftime('%s', ag.dtEnd) - strftime('%s', ag.dtStart)
FROM ackGroups AS ag
LEFT JOIN downtimes AS d on d.acknowledge = ag.id
WHERE ag.id = old.id
GROUP BY ag.id ),
sub2( originId, groupDur, reasonId, above, ruleDur) AS (
SELECT sub1.stationId, sub1.groupDur, aar.reasonId, aar.above, aar.duration
FROM sub1
LEFT JOIN autoAckStations AS aas ON aas.stationId = sub1.stationId
LEFT JOIN autoAckRules AS aar ON aas.autoAckRuleId = aar.id
ORDER BY duration DESC )
UPDATE ackGroups SET (reason, dtAck, origin)=(
SELECT reasonId, datetime('now'), originId
FROM sub2 as s
WHERE ( s.ruleDur < s.groupDur AND above = 1 ) OR (s.ruleDur > s.groupDur AND above = 0)
LIMIT 1
)
WHERE id = old.id;
END
Background: First we have the downtimes table. Each production line consists of multiple parts called stations. Each station can start a line downtime, and these downtimes can overlap with other stations' downtimes.
CREATE TABLE "downtimes" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"station" integer NOT NULL,
"acknowledge" integer,
"dtStart" datetime NOT NULL,
"dtEnd" datetime,
"dtLastModified" datetime)
Overlapping downtimes are grouped into acknowledge groups using a TRIGGER AFTER INSERT on downtimes that sets the acknowledge id correctly or creates a new group.
CREATE TABLE "ackGroups" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"reason" integer,
"dtAck" datetime,
"dtStart" datetime NOT NULL,
"dtEnd" datetime,
"line" integer NOT NULL,
"origin" integer)
The autoAckRules table represents the configuration. The user decides whether the rule should apply to durations above or below a certain value, and which reasonId should be used to acknowledge.
CREATE TABLE "autoAckRules" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"description" text NOT NULL,
"reasonId" integer NOT NULL,
"above" bool NOT NULL,
"duration" real NOT NULL)
The autoAckStations table manages the M:N relationship: each rule allows multiple stations that can have started the ackGroup.
CREATE TABLE autoAckStations (
autoAckRuleId INTEGER NOT NULL,
stationId INTEGER NOT NULL,
PRIMARY KEY ( autoAckRuleId, stationId )
)
When the last downtime ends, dtEnd of ackGroups is set to datetime('now') and the trigger fires to check whether there is an autoAckRule that fits.
If I substitute the sub-selects with a SELECT .. FROM (SELECT .. FROM (SELECT .. FROM ...)) cascade, is there a nice way to avoid having to write and evaluate it twice?
Or am I missing something stupid?
Common table expressions are not supported for statements inside triggers. You need to convert the CTE into sub-queries, such as:
CREATE TRIGGER autoAcknowledge
AFTER UPDATE OF dtEnd ON ackGroups
FOR EACH ROW
WHEN old.dtEnd IS NULL AND new.dtEnd IS NOT NULL
BEGIN
UPDATE ackGroups
SET (reason, dtAck, origin)= (
SELECT reasonId, datetime('now'), originId
FROM (SELECT sub1.stationId AS originId,
sub1.groupDur AS groupDur,
aar.reasonId AS reasonId,
aar.above AS above,
aar.duration AS ruleDur
FROM (SELECT MIN(d.id) AS id,
d.station AS stationId,
strftime('%s', ag.dtEnd) - strftime('%s', ag.dtStart) AS groupDur
FROM ackGroups AS ag
LEFT
JOIN downtimes AS d
ON d.acknowledge = ag.id
WHERE ag.id = old.id
GROUP BY ag.id ) AS sub1
LEFT
JOIN autoAckStations AS aas
ON aas.stationId = sub1.stationId
LEFT
JOIN autoAckRules AS aar
ON aas.autoAckRuleId = aar.id
ORDER BY duration DESC) as s
WHERE ( s.ruleDur < s.groupDur AND above = 1 ) OR (s.ruleDur > s.groupDur AND above = 0)
LIMIT 1
);
END;
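To try it out, an update that closes an open group should fire the trigger, for example (the id value here is made up):
UPDATE ackGroups
SET dtEnd = datetime('now')
WHERE id = 42 AND dtEnd IS NULL;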

Ambiguous column name SQL

I get the following error when I want to execute a SQL query:
"Msg 209, Level 16, State 1, Line 9
Ambiguous column name 'i_id'."
This is the SQL query I want to execute:
SELECT DISTINCT x.*
FROM items x LEFT JOIN items y
ON y.i_id = x.i_id
AND x.last_seen < y.last_seen
WHERE x.last_seen > '4-4-2017 10:54:11'
AND x.spot = 'spot773'
AND (x.technology = 'Bluetooth LE' OR x.technology = 'EPC Gen2')
AND y.id IS NULL
GROUP BY i_id
This is what my table looks like:
CREATE TABLE [dbo].[items] (
[id] INT IDENTITY (1, 1) NOT NULL,
[i_id] VARCHAR (100) NOT NULL,
[last_seen] DATETIME2 (0) NOT NULL,
[location] VARCHAR (200) NOT NULL,
[code_hex] VARCHAR (100) NOT NULL,
[technology] VARCHAR (100) NOT NULL,
[url] VARCHAR (100) NOT NULL,
[spot] VARCHAR (200) NOT NULL,
PRIMARY KEY CLUSTERED ([id] ASC));
I've tried a couple of things but I'm not an SQL expert:)
Any help would be appreciated
EDIT:
I do get duplicate rows when I remove the GROUP BY line as you can see:
I'm adding another answer in order to show how you'd typically select the latest record per group without getting duplicates. You'd use ROW_NUMBER for this, marking the last record per i_id with row number 1.
SELECT *
FROM
(
SELECT
i.*,
ROW_NUMBER() over (PARTITION BY i_id ORDER BY last_seen DESC) as rn
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
) ranked
WHERE rn = 1;
(You'd use RANK or DENSE_RANK instead of ROW_NUMBER if you wanted duplicates.)
You forgot the table alias in GROUP BY i_id.
Anyway, why are you writing an anti-join query and then trying to get rid of duplicates with both DISTINCT and GROUP BY? Did you have issues with a straightforward NOT EXISTS query? You are making things way more complicated than they actually are.
SELECT *
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
AND NOT EXISTS
(
SELECT *
FROM items other
WHERE i.i_id = other.i_id
AND i.last_seen < other.last_seen
);
(There are other techniques of course to get the last seen record per i_id. This is one; another is to compare with MAX(last_seen); another is to use ROW_NUMBER.)
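For illustration, the MAX(last_seen) variant mentioned above would look roughly like this (a sketch; like the anti-join, it still returns several rows per i_id if there are ties on last_seen):
SELECT *
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
AND last_seen = (SELECT MAX(other.last_seen)
                 FROM items other
                 WHERE other.i_id = i.i_id);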

Querying across multiple tables avoiding a union all

I have the following DB tables that I am trying to query:
t_shared_users
user_id
user_category
folder_id
expiry
t_documents
id
folder_id
user_id
user_category
description
created
updated
t_folder
id
type
user_id
user_category
created
updated
I would like to find all the documents you own or have shared access to, i.e. search for all documents in t_documents where user_id = 1 AND user_category = 100, but also include the documents in folders you have access to through t_shared_users. Here is my attempt at the query:
SELECT
id,
folder_id,
user_id,
user_category,
description,
created,
updated
FROM
t_documents
WHERE
user_category = 100
AND user_id = 1
UNION ALL
SELECT
d.id,
d.folder_id,
d.user_id,
d.user_category,
d.description,
d.created,
d.updated
FROM
t_documents d
JOIN
t_shared_users s
ON
d.folder_id = s.folder_id
WHERE
d.user_category = 100
d.AND user_id = 1
ORDER BY
created ASC
LIMIT
10
Is there any better/more performant/concise way to write this query? The above seems a little verbose and slow.
edit:
CREATE TABLE t_folder (
id SERIAL NOT NULL,
user_category SMALLINT NOT NULL,
user_id INTEGER NOT NULL,
type INTEGER NOT NULL,
description TEXT,
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
PRIMARY KEY (id)
);
CREATE TABLE t_documents (
id BIGSERIAL NOT NULL,
folder_id INTEGER,
user_category SMALLINT NOT NULL,
user_id INTEGER NOT NULL,
description TEXT NOT NULL,
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
PRIMARY KEY (id)
);
CREATE TABLE t_shared_users (
id SERIAL,
folder_id INTEGER NOT NULL,
user_category INTEGER NOT NULL,
user_id INTEGER NOT NULL,
expiry TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
PRIMARY KEY (id)
);
This is your query:
SELECT
id,
folder_id,
user_id,
user_category,
description,
created,
updated
FROM
t_documents
WHERE
user_category = 100
AND user_id = 1
UNION ALL
SELECT
d.id,
d.folder_id,
d.user_id,
d.user_category,
d.description,
d.created,
d.updated
FROM
t_documents d
JOIN
t_shared_users s
ON
d.folder_id = s.folder_id
WHERE
d.user_category = 100
AND d.user_id = 1 -- your query actually has a typo here
What I don't understand about the above query is why you are filtering on d.user_category and d.user_id (t_documents table) in the bottom part of the query. Are you sure you didn't mean s.user_category and s.user_id (t_shared_users)? If not, what is the point of joining with t_shared_users?
Assuming that I am correct that your query is in error, this is how I would rewrite it:
select d.*
from t_documents d
where d.user_category = 100
and d.user_id = 1
union
select d.*
from t_shared_users s
join t_documents d
on d.folder_id = s.folder_id
where s.user_category = 100
and s.user_id = 1
Notice that I use union instead of union all, as it's technically possible to get unwanted duplicate documents otherwise.
Also, just as a rough approximation, these are the indexes I would define for good performance:
t_documents (user_id, user_category)
t_documents (folder_id)
t_shared_users (user_id, user_category, folder_id)
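In DDL form these would be something like the following (the index names are just placeholders):
CREATE INDEX idx_documents_user ON t_documents (user_id, user_category);
CREATE INDEX idx_documents_folder ON t_documents (folder_id);
CREATE INDEX idx_shared_users_lookup ON t_shared_users (user_id, user_category, folder_id);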
Starting from the query you have given, I would replace the join with a left join:
select
d.id,
d.folder_id,
d.user_id,
d.user_category,
d.description,
d.created,
d.updated
from t_documents d
left join t_shared_users s on d.folder_id = s.folder_id
where (d.user_category = 100 and d.user_id = 1)
or (s.user_category = 100 and s.user_id = 1)
This gives you all entries from t_documents with user_id = 1 and user_category = 100, and also the documents in folders that are shared with that user through t_shared_users.
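One caveat with this left join version: if a document's folder_id matches several rows in t_shared_users, the same document comes back several times, so you may still want a DISTINCT (or the union approach above), roughly:
select distinct
d.id,
d.folder_id,
d.user_id,
d.user_category,
d.description,
d.created,
d.updated
from t_documents d
left join t_shared_users s on d.folder_id = s.folder_id
where (d.user_category = 100 and d.user_id = 1)
or (s.user_category = 100 and s.user_id = 1);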

Using ORDER BY and LIMIT in an SQL view

I'm trying to create a view that is limited to the last entry per id
My table structure is as follows
CREATE TABLE IF NOT EXISTS `u_tbleeditlog` (
`editID` bigint(20) NOT NULL AUTO_INCREMENT,
`editType` int(1) NOT NULL,
`editTypeID` bigint(20) NOT NULL,
`editedID` bigint(20) NOT NULL,
`editedDtm` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`editID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
And I'm trying to create a view that will only display the last entry assigned to the Type and TypeID
My view so far
CREATE OR REPLACE VIEW vwu_editlog AS
SELECT u_tbleeditlog.*, CONCAT_WS(' ',u_users.user_firstname,u_users.user_lastname) AS editedEditor
FROM u_tbleeditlog
JOIN u_users ON u_users.user_id = u_tbleeditlog.editedID
ORDER BY u_tbleeditlog.editedDtm DESC LIMIT 1
But my problem is that this limits the entire view to just one result overall, and I get the message "Current selection does not contain a unique column. Grid edit, checkbox, Edit, Copy and Delete features are not available".
So say there are multiple rows such as 1, 1, 2017-08-16 and 1, 1, 2016-05-14, it will only return 1, 1, 2017-08-16.
Can anyone please tell me if what I'm trying to do is possible, and if so how? :)
Do this with the not exists approach to getting the last row in a series:
CREATE OR REPLACE VIEW vwu_editlog AS
SELECT el.*, CONCAT_WS(' ', u.user_firstname, u.user_lastname) AS editedEditor
FROM u_tbleeditlog el JOIN
u_users u
ON u.user_id = el.editedID
WHERE not exists (select 1
from u_tbleeditlog el2
where el2.editType = el.editType and
el2.editTypeID = el.editTypeID and
el2.editedDtm > el.editedDtm
);
You can also use GROUP BY and HAVING for that. What database are you using?
It should look something like this:
SELECT editType, editTypeID, editedDtm
FROM u_tbleeditlog AS u
GROUP BY editType, editTypeID, editedDtm
HAVING editedDtm = (SELECT MAX(editedDtm)
FROM u_tbleeditlog
WHERE editType = u.editType
AND editTypeID = u.editTypeID)
ORDER BY editedDtm DESC

MySQL Subquery returning incorrect result?

I've got the following MySQL query / subquery:
SELECT id, user_id, another_id, myvalue, created, modified,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY id DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
Given the criteria listed, the result for the subquery (old_id) should be null (no matches would be found in my table). Instead of MySQL returning null, it just seems to drop the "WHERE ParentUsersValue.user_id = UsersValue.user_id" clause and pick the first value that matches the other two fields. Is this a MySQL bug, or is this for some reason the expected behavior?
Update:
CREATE TABLE users_values (
id int(11) NOT NULL AUTO_INCREMENT,
user_id int(11) DEFAULT NULL,
another_id int(11) DEFAULT NULL,
myvalue double DEFAULT NULL,
created datetime DEFAULT NULL,
modified datetime DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=2801 DEFAULT CHARSET=latin1
EXPLAIN EXTENDED:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY UsersValue index_merge user_id,another_id user_id,another_id 5,5 NULL 1 100.00 Using intersect(user_id,another_id); Using where
2 DEPENDENT SUBQUERY ParentUsersValue index PRIMARY,user_id,another_id PRIMARY 4 NULL 1 100.00 Using where
EXPLAIN EXTENDED Warning 1003:
select `mydb`.`UsersValue`.`id` AS `id`,`mydb`.`UsersValue`.`user_id` AS `user_id`,`mydb`.`UsersValue`.`another_id` AS `another_id`,`mydb`.`UsersValue`.`myvalue` AS `myvalue`,`mydb`.`UsersValue`.`created` AS `created`,`mydb`.`UsersValue`.`modified` AS `modified`,(select `mydb`.`ParentUsersValue`.`id` AS `id` from `mydb`.`users_values` `ParentUsersValue` where ((`mydb`.`ParentUsersValue`.`user_id` = `mydb`.`UsersValue`.`user_id`) and (`mydb`.`ParentUsersValue`.`another_id` = `mydb`.`UsersValue`.`another_id`) and (`mydb`.`ParentUsersValue`.`id` < `mydb`.`UsersValue`.`id`)) order by `mydb`.`ParentUsersValue`.`id` desc limit 1) AS `old_id` from `mydb`.`users_values` `UsersValue` where ((`mydb`.`UsersValue`.`another_id` = 23) and (`mydb`.`UsersValue`.`user_id` = 9917) and (`mydb`.`UsersValue`.`created` >= '2009-12-20') and (`mydb`.`UsersValue`.`created` <= '2010-01-21'))
This returns correct results (NULL) for me:
CREATE TABLE users_values (id INT NOT NULL PRIMARY KEY, user_id INT NOT NULL, another_id INT NOT NULL, created DATETIME NOT NULL);
INSERT
INTO users_values VALUES (1, 9917, 23, '2010-01-01');
SELECT *,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY id
DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
Could you please run this query:
SELECT COUNT(*)
FROM users_values AS UsersValue
WHERE user_id = 9917
AND another_id = 23
and make sure it returns 1?
Note that your subquery does not filter on created, so the subquery can return values out of the range the main query defines.
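If you do want old_id limited to the same window, the created filter would have to be repeated inside the subquery, along these lines (a sketch; leave it out if old_id is meant to reach back before the range):
SELECT id, user_id, another_id, myvalue, created, modified,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
AND ParentUsersValue.created >= '2009-12-20'
AND ParentUsersValue.created <= '2010-01-21'
ORDER BY id DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23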
Update:
This is definitely a bug in MySQL.
Most probably the reason is that the access path chosen for UsersValue is an index merge intersection.
This selects appropriate ranges from both indexes and builds their intersection.
Due to the bug, the dependent subquery is evaluated before the intersection completes, which is why you get results with the correct another_id but the wrong user_id.
Could you please check whether the problem persists when you force a PRIMARY key scan on UsersValue:
SELECT *,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY id
DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue FORCE INDEX (PRIMARY)
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
Also, for this query you should create a composite index on (user_id, another_id, id) rather than two distinct indexes on user_id and another_id.
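For reference, that composite index could be created like this (the index name is just a placeholder):
CREATE INDEX ix_user_another_id ON users_values (user_id, another_id, id);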
Create the index and rewrite the query a little:
SELECT *,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY
user_id DESC, another_id DESC, id DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
The user_id DESC, another_id DESC clauses are logically redundant, but they allow the index to be used for the ordering.
Did you try running the subquery only to see if you are getting the right results? Could you show us the schema for your users_values table?
Also, try replacing the SELECT id in your subquery with SELECT ParentUsersValue.id.