Querying across multiple tables avoiding a union all

I have the following DB tables that I am trying to query:
t_shared_users
user_id
user_category
folder_id
expiry
t_documents
id
folder_id
user_id
user_category
description
created
updated
t_folder
id
type
user_id
user_category
created
updated
I would like to find all the documents you own and have shared access to, i.e. search for all documents in t_documents where user_id = 1 AND user_category = 100, but also include the documents in folders you have access to through t_shared_users. Here is my attempt at the query:
SELECT
id,
folder_id,
user_id,
user_category,
description,
created,
updated
FROM
t_documents
WHERE
user_category = 100
AND user_id = 1
UNION ALL
SELECT
d.id,
d.folder_id,
d.user_id,
d.user_category,
d.description,
d.created,
d.updated
FROM
t_documents d
JOIN
t_shared_users s
ON
d.folder_id = s.folder_id
WHERE
d.user_category = 100
d.AND user_id = 1
ORDER BY
created ASC
LIMIT
10
Is there any better/more performant/concise way to write this query? The above seems a little verbose and slow.
edit:
CREATE TABLE t_folder (
id SERIAL NOT NULL,
user_category SMALLINT NOT NULL,
user_id INTEGER NOT NULL,
type INTEGER NOT NULL,
description TEXT,
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
PRIMARY KEY (id)
);
CREATE TABLE t_documents (
id BIGSERIAL NOT NULL,
folder_id INTEGER,
user_category SMALLINT NOT NULL,
user_id INTEGER NOT NULL,
description TEXT NOT NULL,
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
PRIMARY KEY (id)
);
CREATE TABLE t_shared_users (
id SERIAL,
folder_id INTEGER NOT NULL,
user_category INTEGER NOT NULL,
user_id INTEGER NOT NULL,
expiry TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
PRIMARY KEY (id)
);

This is your query:
SELECT
id,
folder_id,
user_id,
user_category,
description,
created,
updated
FROM
t_documents
WHERE
user_category = 100
AND user_id = 1
UNION ALL
SELECT
d.id,
d.folder_id,
d.user_id,
d.user_category,
d.description,
d.created,
d.updated
FROM
t_documents d
JOIN
t_shared_users s
ON
d.folder_id = s.folder_id
WHERE
d.user_category = 100
AND d.user_id = 1 -- your query actually has a typo here
What I don't understand about the above query is why you are filtering on d.user_category and d.user_id (t_documents table) in the bottom part of the query. Are you sure you didn't mean s.user_category and s.user_id (t_shared_users)? If not, what is the point of joining with t_shared_users?
Assuming that I am correct that your query is in error, this is how I would rewrite it:
select d.*
from t_documents d
where d.user_category = 100
and d.user_id = 1
union
select d.*
from t_shared_users s
join t_documents d
on d.folder_id = s.folder_id
where s.user_category = 100
and s.user_id = 1
Notice that I use union instead of union all: a document you own that also sits in a folder shared with you would otherwise be returned twice.
Also, just as a rough approximation, these are the indexes I would define for good performance:
t_documents (user_id, user_category)
t_documents (folder_id)
t_shared_users (user_id, user_category, folder_id)
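To see the difference between union and union all on this schema, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The sample rows are invented for illustration and the tables are trimmed to the columns the query touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_documents (id INTEGER PRIMARY KEY, folder_id INTEGER,
                          user_category INTEGER, user_id INTEGER, description TEXT);
CREATE TABLE t_shared_users (id INTEGER PRIMARY KEY, folder_id INTEGER,
                             user_category INTEGER, user_id INTEGER);
-- Document 1 is owned by user 1 and sits in folder 10, which is also shared with user 1.
INSERT INTO t_documents VALUES (1, 10, 100, 1, 'own doc in shared folder');
-- Document 2 is owned by user 2 but lives in the same shared folder.
INSERT INTO t_documents VALUES (2, 10, 100, 2, 'doc owned by user 2');
-- Document 3 is neither owned by nor shared with user 1.
INSERT INTO t_documents VALUES (3, 20, 100, 2, 'private doc');
INSERT INTO t_shared_users VALUES (1, 10, 100, 1);
""")

# Same two branches as the rewritten query; only the set operator varies.
QUERY = """
SELECT d.id AS id FROM t_documents d
WHERE d.user_category = 100 AND d.user_id = 1
{op}
SELECT d.id AS id FROM t_shared_users s
JOIN t_documents d ON d.folder_id = s.folder_id
WHERE s.user_category = 100 AND s.user_id = 1
ORDER BY 1
"""

union_all_ids = [row[0] for row in conn.execute(QUERY.format(op="UNION ALL"))]
union_ids = [row[0] for row in conn.execute(QUERY.format(op="UNION"))]
print(union_all_ids)  # document 1 appears once per branch
print(union_ids)      # duplicates removed
```

The price of union is the de-duplication step, so if you can guarantee that owners never appear in t_shared_users for their own folders, union all remains an option.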

Starting from the query you have given, I would replace join with left join:
select
d.id,
d.folder_id,
d.user_id,
d.user_category,
d.description,
d.created,
d.updated
from t_documents d
left join t_shared_users s on d.folder_id = s.folder_id
where (d.user_category = 100 and d.user_id = 1)
or (s.user_category = 100 and s.user_id = 1)
This gives you all rows from t_documents with user_id = 1 and user_category = 100, plus all documents in folders that t_shared_users shares with that same user and category.
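One thing to watch with the left join form: if a folder is shared with more than one user, a document you own in that folder matches once per share row. A rough sketch with Python's sqlite3 module (standing in for PostgreSQL, with made-up rows and trimmed tables) shows the effect and how DISTINCT removes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_documents (id INTEGER PRIMARY KEY, folder_id INTEGER,
                          user_category INTEGER, user_id INTEGER);
CREATE TABLE t_shared_users (id INTEGER PRIMARY KEY, folder_id INTEGER,
                             user_category INTEGER, user_id INTEGER);
INSERT INTO t_documents VALUES (1, 10, 100, 1);   -- owned by user 1
INSERT INTO t_documents VALUES (2, 10, 100, 2);   -- owned by user 2, same folder
INSERT INTO t_shared_users VALUES (1, 10, 100, 1); -- folder 10 shared with user 1
INSERT INTO t_shared_users VALUES (2, 10, 100, 5); -- ... and with user 5
""")

QUERY = """
SELECT {distinct} d.id
FROM t_documents d
LEFT JOIN t_shared_users s ON d.folder_id = s.folder_id
WHERE (d.user_category = 100 AND d.user_id = 1)
   OR (s.user_category = 100 AND s.user_id = 1)
ORDER BY d.id
"""

plain_ids = [row[0] for row in conn.execute(QUERY.format(distinct=""))]
distinct_ids = [row[0] for row in conn.execute(QUERY.format(distinct="DISTINCT"))]
print(plain_ids)     # document 1 is joined to both share rows
print(distinct_ids)  # one row per document
```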


SQLite: Workaround for SQLite-TRIGGER with WITH

I'm working on a project to monitor downtimes of production lines with an embedded device. I want to automate acknowledging of these downtimes by generic rules the user can configure.
I want to use a TRIGGER but get a syntax error near UPDATE even though the documentation says it should be fine to use the WITH statement.
CREATE TRIGGER autoAcknowledge
AFTER UPDATE OF dtEnd ON ackGroups
FOR EACH ROW
WHEN old.dtEnd IS NULL AND new.dtEnd IS NOT NULL
BEGIN
WITH sub1(id, stationId, groupDur) AS (
SELECT MIN(d.id), d.station,
strftime('%s', ag.dtEnd) - strftime('%s', ag.dtStart)
FROM ackGroups AS ag
LEFT JOIN downtimes AS d on d.acknowledge = ag.id
WHERE ag.id = old.id
GROUP BY ag.id ),
sub2( originId, groupDur, reasonId, above, ruleDur) AS (
SELECT sub1.stationId, sub1.groupDur, aar.reasonId, aar.above, aar.duration
FROM sub1
LEFT JOIN autoAckStations AS aas ON aas.stationId = sub1.stationId
LEFT JOIN autoAckRules AS aar ON aas.autoAckRuleId = aar.id
ORDER BY duration DESC )
UPDATE ackGroups SET (reason, dtAck, origin)=(
SELECT reasonId, datetime('now'), originId
FROM sub2 as s
WHERE ( s.ruleDur < s.groupDur AND above = 1 ) OR (s.ruleDur > s.groupDur AND above = 0)
LIMIT 1
)
WHERE id = old.id;
END
Background: First we have the downtimes table. Each production line consists of multiple parts called stations. Each station can start a line downtime, and its downtimes can overlap with other stations' downtimes.
CREATE TABLE "downtimes" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"station" integer NOT NULL,
"acknowledge" integer,
"dtStart" datetime NOT NULL,
"dtEnd" datetime,
"dtLastModified" datetime)
Overlapping downtimes are grouped into acknowledge groups by a TRIGGER AFTER INSERT on downtimes, which sets the right acknowledge id or creates a new group.
CREATE TABLE "ackGroups" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"reason" integer,
"dtAck" datetime,
"dtStart" datetime NOT NULL,
"dtEnd" datetime,
"line" integer NOT NULL,
"origin" integer)
The autoAckRules table represents the configuration. The user decides whether the rule should apply to durations above or below a certain value and which reasonId should be used to acknowledge.
CREATE TABLE "autoAckRules" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"description" text NOT NULL,
"reasonId" integer NOT NULL,
"above" bool NOT NULL,
"duration" real NOT NULL)
The autoAckStations table manages the M:N relationship. Each rule allows multiple stations that started the ackGroup.
CREATE TABLE autoAckStations (
autoAckRuleId INTEGER NOT NULL,
stationId INTEGER NOT NULL,
PRIMARY KEY ( autoAckRuleId, stationId )
)
When the last downtime ends, dtEnd of ackGroups is set to datetime('now') and the trigger fires to check whether there is an autoAckRule that fits.
If I substitute the sub selects with a SELECT .. FROM( SELECT .. FROM(SELECT .. FROM ))) cascade
is there a nice way to avoid the need to write and evaluate it twice?
Or am I missing something stupid?
Common table expressions are not supported for statements inside of triggers. You need to convert the CTEs to subqueries, such as:
CREATE TRIGGER autoAcknowledge
AFTER UPDATE OF dtEnd ON ackGroups
FOR EACH ROW
WHEN old.dtEnd IS NULL AND new.dtEnd IS NOT NULL
BEGIN
UPDATE ackGroups
SET (reason, dtAck, origin)= (
SELECT reasonId, datetime('now'), originId
FROM (SELECT sub1.stationId AS originId,
sub1.groupDur AS groupDur,
aar.reasonId AS reasonId,
aar.above AS above,
aar.duration AS ruleDur
FROM (SELECT MIN(d.id) AS id,
d.station AS stationId,
strftime('%s', ag.dtEnd) - strftime('%s', ag.dtStart) AS groupDur
FROM ackGroups AS ag
LEFT
JOIN downtimes AS d
ON d.acknowledge = ag.id
WHERE ag.id = old.id
GROUP BY ag.id ) AS sub1
LEFT
JOIN autoAckStations AS aas
ON aas.stationId = sub1.stationId
LEFT
JOIN autoAckRules AS aar
ON aas.autoAckRuleId = aar.id
ORDER BY duration DESC) as s
WHERE ( s.ruleDur < s.groupDur AND above = 1 ) OR (s.ruleDur > s.groupDur AND above = 0)
LIMIT 1
);
END;
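To check that the subquery version behaves as intended, here is a small sketch that creates the trigger in an in-memory database through Python's sqlite3 module and fires it once. The sample rows (station 7, a rule with reasonId 42 for downtimes under 600 seconds) are invented for the test, the DDL is trimmed, and I qualified duration and above with their aliases. It assumes the bundled SQLite is 3.15 or later, since the row-value SET syntax needs that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE downtimes (id INTEGER PRIMARY KEY, station INTEGER NOT NULL,
                        acknowledge INTEGER, dtStart DATETIME NOT NULL, dtEnd DATETIME);
CREATE TABLE ackGroups (id INTEGER PRIMARY KEY, reason INTEGER, dtAck DATETIME,
                        dtStart DATETIME NOT NULL, dtEnd DATETIME,
                        line INTEGER NOT NULL, origin INTEGER);
CREATE TABLE autoAckRules (id INTEGER PRIMARY KEY, description TEXT NOT NULL,
                           reasonId INTEGER NOT NULL, above BOOL NOT NULL,
                           duration REAL NOT NULL);
CREATE TABLE autoAckStations (autoAckRuleId INTEGER NOT NULL, stationId INTEGER NOT NULL,
                              PRIMARY KEY (autoAckRuleId, stationId));

CREATE TRIGGER autoAcknowledge
AFTER UPDATE OF dtEnd ON ackGroups
FOR EACH ROW
WHEN old.dtEnd IS NULL AND new.dtEnd IS NOT NULL
BEGIN
  UPDATE ackGroups
  SET (reason, dtAck, origin) = (
    SELECT reasonId, datetime('now'), originId
    FROM (SELECT sub1.stationId AS originId, sub1.groupDur AS groupDur,
                 aar.reasonId AS reasonId, aar.above AS above, aar.duration AS ruleDur
          FROM (SELECT MIN(d.id) AS id, d.station AS stationId,
                       strftime('%s', ag.dtEnd) - strftime('%s', ag.dtStart) AS groupDur
                FROM ackGroups AS ag
                LEFT JOIN downtimes AS d ON d.acknowledge = ag.id
                WHERE ag.id = old.id
                GROUP BY ag.id) AS sub1
          LEFT JOIN autoAckStations AS aas ON aas.stationId = sub1.stationId
          LEFT JOIN autoAckRules AS aar ON aas.autoAckRuleId = aar.id
          ORDER BY aar.duration DESC) AS s
    WHERE (s.ruleDur < s.groupDur AND s.above = 1)
       OR (s.ruleDur > s.groupDur AND s.above = 0)
    LIMIT 1)
  WHERE id = old.id;
END;

-- Hypothetical sample data: one open group started by station 7.
INSERT INTO ackGroups (id, dtStart, line) VALUES (1, '2024-01-01 10:00:00', 1);
INSERT INTO downtimes VALUES (1, 7, 1, '2024-01-01 10:00:00', '2024-01-01 10:05:00');
-- Rule: acknowledge with reason 42 when the group lasted less than 600 seconds.
INSERT INTO autoAckRules VALUES (1, 'short stop', 42, 0, 600);
INSERT INTO autoAckStations VALUES (1, 7);
""")

# Closing the group (300 seconds long) fires the trigger.
conn.execute("UPDATE ackGroups SET dtEnd = '2024-01-01 10:05:00' WHERE id = 1")
reason, origin, dt_ack = conn.execute(
    "SELECT reason, origin, dtAck FROM ackGroups WHERE id = 1").fetchone()
print(reason, origin, dt_ack)
```

If no rule matches, the scalar subquery returns no row and all three columns are set to NULL, so you may want to guard for that case.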

How to show which students are still in school using sql

This table records students entering and leaving the school. IN means a student entered the school and OUT means a student left. I am wondering how to show which students are still in school.
I have tried a lot but still cannot figure it out. Can anyone help me? Thank you so much.
DROP TABLE IF EXISTS `student`;
CREATE TABLE `student` (
`id` int(11) NOT NULL auto_increment,
`time` varchar(128) default NULL,
`status` varchar(128) default NULL,
`stu_id` varchar(128) default NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `student` (`id`, `time`, `status`, `stu_id`) VALUES
(1,'11AM','IN','1'),
(2,'11AM','IN','2'),
(3,'12AM','OUT','1'),
(4,'12AM','IN','3'),
(5,'1PM','OUT','3'),
(6,'2PM','IN','3'),
(11,'2PM','IN','4');
I expect the answer is 2, 3, 4
The number of students in the school is the sum of the ins minus the sum of the outs:
select sum(case when status = 'in' then 1
when status = 'out' then -1
else 0
end)
from student;
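To see which students are in the school, you want the students whose last status is IN. One way uses a correlated subquery (this relies on time sorting chronologically, which a varchar value such as '11AM' does not guarantee in general):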
select s.stu_id
from student s
where s.time = (select max(s2.time)
from student s2
where s2.stu_id = s.stu_id
) and
s.status = 'in';
If status is only ever IN or OUT, can't you do this?
SELECT * from student WHERE status="IN"
Here's a query that relies on the auto-increment id instead (ROW_NUMBER() needs MySQL 8.0 or later):
select t2.* from
student t2
left join (select ROW_NUMBER() OVER(PARTITION by stu_id ORDER BY id desc) as row_num, id from student) t1 on t1.id = t2.id
where t1.row_num = 1 and t2.status = 'IN'
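The "last status per student" idea can be checked against the sample data. Below is a minimal sketch using Python's sqlite3 module in place of MySQL; it orders by the auto-increment id rather than the varchar time column, which is my assumption about what "last" should mean here. Note that SQLite compares strings case-sensitively, so the literal is written as 'IN'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, time TEXT, status TEXT, stu_id TEXT);
INSERT INTO student (id, time, status, stu_id) VALUES
  (1,'11AM','IN','1'),
  (2,'11AM','IN','2'),
  (3,'12AM','OUT','1'),
  (4,'12AM','IN','3'),
  (5,'1PM','OUT','3'),
  (6,'2PM','IN','3'),
  (11,'2PM','IN','4');
""")

# Students whose most recent row (highest id) has status IN.
still_in = [row[0] for row in conn.execute("""
SELECT s.stu_id
FROM student s
WHERE s.id = (SELECT MAX(s2.id) FROM student s2 WHERE s2.stu_id = s.stu_id)
  AND s.status = 'IN'
ORDER BY s.stu_id
""")]

# Head count: every IN adds one, every OUT subtracts one.
head_count = conn.execute("""
SELECT SUM(CASE WHEN status = 'IN' THEN 1 WHEN status = 'OUT' THEN -1 ELSE 0 END)
FROM student
""").fetchone()[0]

print(still_in, head_count)
```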

Optimizing a query in PostgreSQL

CREATE TABLE master.estado_movimiento_inventario
(
id integer NOT NULL,
eliminado boolean NOT NULL DEFAULT false,
fundamentacion text NOT NULL,
fecha timestamp without time zone NOT NULL,
id_empresa integer NOT NULL,
id_usuario integer NOT NULL,
id_estado integer NOT NULL,
id_movimiento integer NOT NULL,
se_debio_tramitar_hace bigint DEFAULT 0,
CONSTRAINT "PK15estadomovtec" PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE master.estado_movimiento_inventario
OWNER TO postgres;
This table tracks the state of every inventory movement in my business logic. For every movement that has not ended yet (there is no row with id_estado = 3 or id_estado = 4 for that id_movimiento in master.estado_movimiento_inventario), the se_debio_tramitar_hace field of its last state must store the difference between now() and the fecha field, refreshed every time a scheduled task (on Windows) runs.
The query I built in order to do so is this:
with update_time as(
SELECT distinct on(id_movimiento) id
from master.estado_movimiento_inventario
where id_movimiento not in (
select id_movimiento
from master.estado_movimiento_inventario
where id_estado = 2 or id_estado=3
) order by id_movimiento, id desc
)
update master.estado_movimiento_inventario mi
set se_debio_tramitar_hace= EXTRACT(epoch FROM now()::timestamp - mi.fecha )/3600
where mi.id in (select id from update_time);
This works as expected, but I suspect it is not optimal, especially the update operation. My biggest doubt is which of these is better for performing the update:
Perform the updates as it currently does
or
A PostgreSQL function equivalent to this:
foreach(update_time.id as row_id){
update master.estado_movimiento_inventario mi set diferencia = now() - mi.fecha where mi.id=row_id;
}
Sorry if I am not explicit enough. I don't have much experience working with databases; I understand the theory but have not worked with them much.
Edit
Please notice the id_estado is not unique per id_movimiento, just like the picture shows:
I think this improves the CTE:
with update_time as (
select id_movimiento, max(id) as max_id
from master.estado_movimiento_inventario
group by id_movimiento
having sum( (id_estado in (2, 3))::int ) = 0
)
update master.estado_movimiento_inventario mi
set se_debio_tramitar_hace = EXTRACT(epoch FROM now()::timestamp - mi.fecha)/3600
where mi.id in (select max_id from update_time);
If a "2" or "3" state, when present, is always the last row for its id_movimiento, I would simply do:
update master.estado_movimiento_inventario mi
set se_debio_tramitar_hace = EXTRACT(epoch FROM now()::timestamp - mi.fecha)/3600
where mi.id = (select max(mi2.id)
from master.estado_movimiento_inventario mi2
where mi2.id_movimiento = mi.id_movimiento
) and
mi.id_estado not in (2, 3);
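The selection logic of the aggregated CTE (latest row per movement, skipping movements that ever reached state 2 or 3) can be tried in isolation. This is only a sketch: it uses Python's sqlite3 module instead of PostgreSQL, drops the schema prefix and unrelated columns, and the rows are invented. In SQLite the boolean expression already evaluates to 0 or 1, so the ::int cast is not needed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE estado_movimiento_inventario (
  id INTEGER PRIMARY KEY,
  id_estado INTEGER NOT NULL,
  id_movimiento INTEGER NOT NULL
);
INSERT INTO estado_movimiento_inventario (id, id_estado, id_movimiento) VALUES
  (1, 1, 1), (2, 2, 1),   -- movement 1 reached state 2, so it is skipped
  (3, 1, 2), (4, 1, 2),   -- movement 2 is still open; its latest row is id 4
  (5, 5, 3);              -- movement 3 is still open; its latest row is id 5
""")

# One row per open movement, carrying the id of its latest state.
rows_to_update = conn.execute("""
SELECT id_movimiento, MAX(id) AS max_id
FROM estado_movimiento_inventario
GROUP BY id_movimiento
HAVING SUM(id_estado IN (2, 3)) = 0
ORDER BY id_movimiento
""").fetchall()
print(rows_to_update)
```

A single set-based UPDATE driven by this list is generally preferable to looping over the ids one by one in a function, because the table is scanned once instead of once per row.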

Reuse SUM OVER PARTITION return value in query

I'm looking for a way to save some calculation from being done twice in a query like:
SELECT DISTINCT
coins.price_btc,
coins.price_eur,
coins.price_usd,
coins.currency_name,
coins.currency_symbol,
SUM ( market_transactions.quantity ) OVER ( PARTITION BY market_transactions.market_coin_id ) * coins.price_eur AS holdings_eur,
SUM ( market_transactions.quantity ) OVER ( PARTITION BY market_transactions.market_coin_id ) * coins.price_usd AS holdings_usd,
SUM ( market_transactions.quantity ) OVER ( PARTITION BY market_transactions.market_coin_id ) * coins.price_btc AS holdings_btc,
SUM ( market_transactions.quantity ) OVER ( PARTITION BY market_transactions.market_coin_id ) AS holdings
FROM
market_transactions
INNER JOIN coins ON coins.id = market_transactions.market_coin_id
WHERE
market_transactions.user_id = 1
ORDER BY
coins.currency_symbol
I'm not sure if that sum over partition is running all these times.
Thanks for any pointers, I'm sure the query can also be optimized but I'm unsure where to start.
CREATE TABLE "public"."coins" (
"id" int8 NOT NULL DEFAULT nextval('coins_id_seq'::regclass),
"currency_symbol" text COLLATE "pg_catalog"."default" NOT NULL DEFAULT NULL,
"currency_name" text COLLATE "pg_catalog"."default" NOT NULL DEFAULT NULL,
"price_usd" numeric(16,7) NOT NULL DEFAULT NULL,
"price_eur" numeric(16,7) NOT NULL DEFAULT NULL,
"price_btc" numeric(16,7) NOT NULL DEFAULT NULL,
CONSTRAINT "coins_pkey" PRIMARY KEY ("id")
)
CREATE TABLE "public"."market_transactions" (
"id" int8 NOT NULL DEFAULT nextval('market_transactions_id_seq'::regclass),
"user_id" int4 NOT NULL DEFAULT NULL,
"quantity" numeric(18,8) NOT NULL DEFAULT NULL,
"market_coin_id" int4 DEFAULT NULL,
CONSTRAINT "market_transactions_pkey" PRIMARY KEY ("id")
)
A user has many transactions involving a coin (market_transactions.market_coin_id is coins.id). I'm trying to SUM the quantity owned (market_transactions.quantity) for each coin and then multiply this value by the price of the coin expressed in different currencies (btc, eur, usd).
I would suggest aggregating before joining and doing:
SELECT c.*,
mt.quantity * c.price_eur AS holdings_eur,
mt.quantity * c.price_usd AS holdings_usd,
mt.quantity * c.price_btc AS holdings_btc,
mt.quantity AS holdings
FROM coins c JOIN
(SELECT t.market_coin_id, SUM(t.quantity) as quantity
FROM market_transactions t
WHERE t.user_id = 1
GROUP BY t.market_coin_id
) mt
ON c.id = mt.market_coin_id
ORDER BY c.currency_symbol
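As a quick check of the aggregate-then-join shape, here is a sketch run through Python's sqlite3 module rather than PostgreSQL. The coins, prices, and transactions are made up, and only the USD price column is kept to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coins (id INTEGER PRIMARY KEY, currency_symbol TEXT, price_usd REAL);
CREATE TABLE market_transactions (id INTEGER PRIMARY KEY, user_id INTEGER,
                                  quantity REAL, market_coin_id INTEGER);
INSERT INTO coins VALUES (1, 'BTC', 100.0), (2, 'ETH', 10.0);
INSERT INTO market_transactions VALUES
  (1, 1, 2.0, 1), (2, 1, 3.0, 1),  -- user 1 holds 5 BTC in total
  (3, 1, 4.0, 2),                  -- user 1 holds 4 ETH
  (4, 2, 10.0, 1);                 -- another user's rows are filtered out
""")

# The SUM runs once per coin in the derived table; the outer query only multiplies.
rows = conn.execute("""
SELECT c.currency_symbol,
       mt.quantity AS holdings,
       mt.quantity * c.price_usd AS holdings_usd
FROM coins c
JOIN (SELECT t.market_coin_id, SUM(t.quantity) AS quantity
      FROM market_transactions t
      WHERE t.user_id = 1
      GROUP BY t.market_coin_id) mt
  ON c.id = mt.market_coin_id
ORDER BY c.currency_symbol
""").fetchall()
print(rows)
```

Because the derived table already has one row per coin, the DISTINCT from the original query is no longer needed.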
Run an EXPLAIN (i.e. EXPLAIN SELECT DISTINCT ...) on the query and see what the query plan is. Most likely, it's only running the window function once. If it is running it multiple times, try adding an outer SELECT:
SELECT DISTINCT
price_btc,
price_eur,
price_usd,
currency_name,
currency_symbol,
holdings * price_eur AS holdings_eur,
holdings * price_usd AS holdings_usd,
holdings * price_btc AS holdings_btc,
holdings
FROM (
SELECT
coins.price_btc,
coins.price_eur,
coins.price_usd,
coins.currency_name,
coins.currency_symbol,
SUM ( market_transactions.quantity ) OVER ( PARTITION BY market_transactions.market_coin_id ) AS holdings
FROM
market_transactions
INNER JOIN coins ON coins.id = market_transactions.market_coin_id
WHERE
market_transactions.user_id = 1
) src
ORDER BY
currency_symbol

MySQL Subquery returning incorrect result?

I've got the following MySQL query / subquery:
SELECT id, user_id, another_id, myvalue, created, modified,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY id DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
Given the criteria listed, the result for the subquery (old_id) should be null (no matches would be found in my table). Instead of MySQL returning null, it just seems to drop the "WHERE ParentUsersValue.user_id = UsersValue.user_id" clause and pick the first value that matches the other two fields. Is this a MySQL bug, or is this for some reason the expected behavior?
Update:
CREATE TABLE users_values (
id int(11) NOT NULL AUTO_INCREMENT,
user_id int(11) DEFAULT NULL,
another_id int(11) DEFAULT NULL,
myvalue double DEFAULT NULL,
created datetime DEFAULT NULL,
modified datetime DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=2801 DEFAULT CHARSET=latin1
EXPLAIN EXTENDED:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY UsersValue index_merge user_id,another_id user_id,another_id 5,5 NULL 1 100.00 Using intersect(user_id,another_id); Using where
2 DEPENDENT SUBQUERY ParentUsersValue index PRIMARY,user_id,another_id PRIMARY 4 NULL 1 100.00 Using where
EXPLAIN EXTENDED Warning 1003:
select `mydb`.`UsersValue`.`id` AS `id`,`mydb`.`UsersValue`.`user_id` AS `user_id`,`mydb`.`UsersValue`.`another_id` AS `another_id`,`mydb`.`UsersValue`.`myvalue` AS `myvalue`,`mydb`.`UsersValue`.`created` AS `created`,`mydb`.`UsersValue`.`modified` AS `modified`,(select `mydb`.`ParentUsersValue`.`id` AS `id` from `mydb`.`users_values` `ParentUsersValue` where ((`mydb`.`ParentUsersValue`.`user_id` = `mydb`.`UsersValue`.`user_id`) and (`mydb`.`ParentUsersValue`.`another_id` = `mydb`.`UsersValue`.`another_id`) and (`mydb`.`ParentUsersValue`.`id` < `mydb`.`UsersValue`.`id`)) order by `mydb`.`ParentUsersValue`.`id` desc limit 1) AS `old_id` from `mydb`.`users_values` `UsersValue` where ((`mydb`.`UsersValue`.`another_id` = 23) and (`mydb`.`UsersValue`.`user_id` = 9917) and (`mydb`.`UsersValue`.`created` >= '2009-12-20') and (`mydb`.`UsersValue`.`created` <= '2010-01-21'))
This returns correct results (NULL) for me:
CREATE TABLE users_values (id INT NOT NULL PRIMARY KEY, user_id INT NOT NULL, another_id INT NOT NULL, created DATETIME NOT NULL);
INSERT
INTO users_values VALUES (1, 9917, 23, '2010-01-01');
SELECT *,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY id
DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
Could you please run this query:
SELECT COUNT(*)
FROM users_values AS UsersValue
WHERE user_id = 9917
AND another_id = 23
and make sure it returns 1?
Note that your subquery does not filter on created, so the subquery can return values out of the range the main query defines.
Update:
This is definitely a bug in MySQL.
Most probably the reason is that the access path chosen for UsersValue is index_merge with an intersection.
This selects appropriate ranges from both indexes and builds their intersection.
Due to the bug, the dependent subquery is evaluated before the intersection completes, which is why you get results with the correct another_id but the wrong user_id.
Could you please check whether the problem persists when you force a PRIMARY scan on UsersValue:
SELECT *,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY id
DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue FORCE INDEX (PRIMARY)
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
Also, for this query you should create a composite index on (user_id, another_id, id) rather than two distinct indexes on user_id and another_id.
Create the index and rewrite the query a little:
SELECT *,
(
SELECT id
FROM users_values AS ParentUsersValue
WHERE ParentUsersValue.user_id = UsersValue.user_id
AND ParentUsersValue.another_id = UsersValue.another_id
AND ParentUsersValue.id < UsersValue.id
ORDER BY
user_id DESC, another_id DESC, id DESC
LIMIT 1
) AS old_id
FROM users_values AS UsersValue
WHERE created >= '2009-12-20'
AND created <= '2010-01-21'
AND user_id = 9917
AND another_id = 23
The user_id DESC, another_id DESC clauses are logically redundant, but they allow the index to be used for ordering.
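For reference, this is what the subquery should return on a correct engine. The sketch below uses Python's sqlite3 module, not MySQL, so it cannot reproduce the index_merge bug; it only illustrates the expected old_id values. The rows and the index name ix_user_another_id are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_values (id INTEGER PRIMARY KEY, user_id INTEGER, another_id INTEGER);
-- Composite index as suggested above (the name is hypothetical).
CREATE INDEX ix_user_another_id ON users_values (user_id, another_id, id);
INSERT INTO users_values VALUES
  (1, 9917, 23),
  (2, 1111, 23),   -- same another_id, different user: must never become old_id
  (3, 9917, 23);
""")

# For each row of user 9917, find the previous id with the same user_id and another_id.
rows = conn.execute("""
SELECT uv.id,
       (SELECT p.id
        FROM users_values AS p
        WHERE p.user_id = uv.user_id
          AND p.another_id = uv.another_id
          AND p.id < uv.id
        ORDER BY p.id DESC
        LIMIT 1) AS old_id
FROM users_values AS uv
WHERE uv.user_id = 9917 AND uv.another_id = 23
ORDER BY uv.id
""").fetchall()
print(rows)
```

Row 1 has no predecessor, so old_id is NULL; row 3 points back to row 1 and skips row 2, which belongs to a different user.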
Did you try running the subquery only to see if you are getting the right results? Could you show us the schema for your users_values table?
Also, try replacing your SELECT id in your subquery by SELECT ParentUsersValue.id