I have the following table which logs chat messages
CREATE TABLE message_log
(
id serial NOT NULL,
message text,
from_id character varying(500),
to_id character varying(500),
match_id character varying(500),
unix_timestamp bigint,
own_account boolean,
reply_batch boolean DEFAULT false,
CONSTRAINT message_log_pkey PRIMARY KEY (id)
)
A chat conversation will have the same match_id.
I want a query that returns the match_ids whose last message (the most recent message of the chat conversation) is from the non-account-holder (own_account = false).
I came up with the following query, but it gives inconsistent results, which I don't understand:
select * from message_log
where from_id <> ?
and to_id = ?
and unix_timestamp in ( select distinct max(unix_timestamp)
from message_log group by match_id )
The question mark in the SQL query represents the account holder's user id
It would seem you need to bind the match_id back to the base query as well, otherwise you could be getting a unix_timestamp from a different conversation:
select m.*
from message_log m
where m.from_id <> ?
and m.to_id = ?
and m.unix_timestamp = ( select max(unix_timestamp)
from message_log
where match_id = m.match_id
group by match_id )
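As a sanity check, the correlated-subquery version can be exercised against an in-memory SQLite database with hypothetical sample data (the correlation works the same way in Postgres):

```python
import sqlite3

# Hypothetical sample data: two conversations; only m1's last message
# is from the other party.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE message_log (match_id TEXT, from_id TEXT, to_id TEXT, unix_timestamp INTEGER);
INSERT INTO message_log VALUES
  ('m1', 'me',    'alice', 100),
  ('m1', 'alice', 'me',    200),   -- last message of m1: from alice
  ('m2', 'bob',   'me',    100),
  ('m2', 'me',    'bob',   300);   -- last message of m2: from the account holder
""")
rows = cur.execute("""
    SELECT m.match_id
    FROM message_log m
    WHERE m.from_id <> ? AND m.to_id = ?
      AND m.unix_timestamp = (SELECT MAX(unix_timestamp)
                              FROM message_log
                              WHERE match_id = m.match_id)
""", ("me", "me")).fetchall()
print(rows)  # [('m1',)]
```

Only m1 is returned, because the correlated MAX is evaluated per match_id rather than over the whole table.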
I would suggest using distinct on. This is specific to Postgres, but designed for this situation:
select distinct on (match_id) ml.*
from message_log ml
where from_id <> ? and to_id = ?
order by match_id, unix_timestamp desc;
I'm working on a project to monitor downtimes of production lines with an embedded device. I want to automate acknowledging of these downtimes by generic rules the user can configure.
I want to use a TRIGGER but get a syntax error near UPDATE, even though the documentation says it should be fine to use a WITH statement.
CREATE TRIGGER autoAcknowledge
AFTER UPDATE OF dtEnd ON ackGroups
FOR EACH ROW
WHEN old.dtEnd IS NULL AND new.dtEnd IS NOT NULL
BEGIN
WITH sub1(id, stationId, groupDur) AS (
SELECT MIN(d.id), d.station,
strftime('%s', ag.dtEnd) - strftime('%s', ag.dtStart)
FROM ackGroups AS ag
LEFT JOIN downtimes AS d on d.acknowledge = ag.id
WHERE ag.id = old.id
GROUP BY ag.id ),
sub2( originId, groupDur, reasonId, above, ruleDur) AS (
SELECT sub1.stationId, sub1.groupDur, aar.reasonId, aar.above, aar.duration
FROM sub1
LEFT JOIN autoAckStations AS aas ON aas.stationId = sub1.stationId
LEFT JOIN autoAckRules AS aar ON aas.autoAckRuleId = aar.id
ORDER BY duration DESC )
UPDATE ackGroups SET (reason, dtAck, origin)=(
SELECT reasonId, datetime('now'), originId
FROM sub2 as s
WHERE ( s.ruleDur < s.groupDur AND above = 1 ) OR (s.ruleDur > s.groupDur AND above = 0)
LIMIT 1
)
WHERE id = old.id;
END
Background: First we have the downtimes table. Each production line consists of multiple parts called stations. Each station can start a line downtime, and these can overlap with other stations' downtimes.
CREATE TABLE "downtimes" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"station" integer NOT NULL,
"acknowledge" integer,
"dtStart" datetime NOT NULL,
"dtEnd" datetime,
"dtLastModified" datetime)
Overlapping downtimes are grouped into acknowledge groups using a TRIGGER AFTER INSERT on downtimes, which sets the right acknowledge id or creates a new group.
CREATE TABLE "ackGroups" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"reason" integer,
"dtAck" datetime,
"dtStart" datetime NOT NULL,
"dtEnd" datetime,
"line" integer NOT NULL,
"origin" integer)
The autoAckRules table represents the configuration. The user decides whether the rule should apply to durations above or below a certain value and which reasonId should be used to acknowledge.
CREATE TABLE "autoAckRules" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"description" text NOT NULL,
"reasonId" integer NOT NULL,
"above" bool NOT NULL,
"duration" real NOT NULL)
The autoAckStations table is used to manage the M:N relationship. Each rule allows multiple stations which may have started the ackGroup.
CREATE TABLE autoAckStations (
autoAckRuleId INTEGER NOT NULL,
stationId INTEGER NOT NULL,
PRIMARY KEY ( autoAckRuleId, stationId )
)
When the last downtime ends, dtEnd of ackGroups is set to datetime('now') and the trigger is fired to check if there is an autoAckRule that fits.
If I substitute the sub-selects with a SELECT .. FROM (SELECT .. FROM (SELECT .. FROM)) cascade, is there a nice way to avoid the need to write and evaluate it twice?
Or am I missing something stupid?
Common table expressions are not supported for statements inside of triggers. You need to convert the CTE into sub-queries, such as:
CREATE TRIGGER autoAcknowledge
AFTER UPDATE OF dtEnd ON ackGroups
FOR EACH ROW
WHEN old.dtEnd IS NULL AND new.dtEnd IS NOT NULL
BEGIN
UPDATE ackGroups
SET (reason, dtAck, origin)= (
SELECT reasonId, datetime('now'), originId
FROM (SELECT sub1.stationId AS originId,
sub1.groupDur AS groupDur,
aar.reasonId AS reasonId,
aar.above AS above,
aar.duration AS ruleDur
FROM (SELECT MIN(d.id) AS id,
d.station AS stationId,
strftime('%s', ag.dtEnd) - strftime('%s', ag.dtStart) AS groupDur
FROM ackGroups AS ag
LEFT
JOIN downtimes AS d
ON d.acknowledge = ag.id
WHERE ag.id = old.id
GROUP BY ag.id ) AS sub1
LEFT
JOIN autoAckStations AS aas
ON aas.stationId = sub1.stationId
LEFT
JOIN autoAckRules AS aar
ON aas.autoAckRuleId = aar.id
ORDER BY duration DESC) as s
WHERE ( s.ruleDur < s.groupDur AND above = 1 ) OR (s.ruleDur > s.groupDur AND above = 0)
LIMIT 1
);
END;
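A minimal runnable sketch of this pattern, with hypothetical simplified tables (grp and rules standing in for ackGroups and autoAckRules), confirms that a trigger body built from nested sub-selects works in SQLite:

```python
import sqlite3

# Hypothetical miniature schema: the trigger fires when dtEnd transitions
# from NULL to non-NULL, and fills in reason via a nested sub-select
# (a CTE in the same place would be rejected by SQLite).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE grp (id INTEGER PRIMARY KEY, dtEnd TEXT, reason INTEGER);
CREATE TABLE rules (reasonId INTEGER, duration REAL);
INSERT INTO rules VALUES (7, 10.0);

CREATE TRIGGER autoAcknowledge
AFTER UPDATE OF dtEnd ON grp
FOR EACH ROW
WHEN old.dtEnd IS NULL AND new.dtEnd IS NOT NULL
BEGIN
    UPDATE grp
    SET reason = (SELECT reasonId
                  FROM (SELECT reasonId, duration
                        FROM rules
                        ORDER BY duration DESC) AS s
                  LIMIT 1)
    WHERE id = old.id;
END;
""")
cur.execute("INSERT INTO grp (id, dtEnd, reason) VALUES (1, NULL, NULL)")
cur.execute("UPDATE grp SET dtEnd = datetime('now') WHERE id = 1")
reason = cur.execute("SELECT reason FROM grp WHERE id = 1").fetchone()[0]
print(reason)  # 7
```

The trigger only touches reason, not dtEnd, so it cannot re-fire itself.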
My query:
select a.id, a.affiliation
FROM public.affiliation AS a
WHERE NOT EXISTS (
SELECT *
FROM ncbi.affi_known1 AS b
WHERE a.id = b.id
)
limit 5000
It returns:

id          affiliation
4683763     Psychopharmacology Unit, Dorothy Hodgkin Building, University of Bristol, Whitson Street, Bristol, BS1 3NY, UK.

as the first row.
But
select * from ncbi.affi_known1 where id = 4683763
does return the data with id = 4683763.
Both id columns are of type int8.
table a
CREATE TABLE "public"."affiliation" (
"id" int8 NOT NULL,
"affiliation" text COLLATE "pg_catalog"."default",
"tsv_affiliation" tsvector,
CONSTRAINT "affiliation_pkey" PRIMARY KEY ("id")
)
;
CREATE INDEX "affi_idx_tsv" ON "public"."affiliation" USING gin (
to_tsvector('english'::regconfig, affiliation) "pg_catalog"."tsvector_ops"
);
CREATE INDEX "tsv_affiliation_idx" ON "public"."affiliation" USING gin (
"tsv_affiliation" "pg_catalog"."tsvector_ops"
);
table b
CREATE TABLE "ncbi"."affi_known1" (
"id" int8 NOT NULL,
"affi_raw" text COLLATE "pg_catalog"."default",
"affi_main" text COLLATE "pg_catalog"."default",
"affi_known" bool,
"divide" text COLLATE "pg_catalog"."default",
"divide_known" bool,
"sub_divides" text[] COLLATE "pg_catalog"."default",
"country" text COLLATE "pg_catalog"."default",
CONSTRAINT "affi_known_pkey" PRIMARY KEY ("id")
)
;
Update:
After creating an index on id, everything works well.
After deleting the index, it goes wrong again.
So why does the primary key on id fail here?
Update 2:
Table b is generated from table a, using:
query = '''
select a.id, a.affiliation
FROM public.affiliation AS a
WHERE NOT EXISTS (
SELECT 1
FROM ncbi.affi_known AS b
WHERE a.id = b.id
)
limit 2000000
'''
data = pd.read_sql(query, conn)
while len(data):
    for i, row in tqdm(data.iterrows()):
        ...
        cursor_insert.execute(
            'insert into ncbi.affi_known(id, affi_raw, affi_main, affi_known, divide, country) values (%s, %s, %s, %s, %s, %s)',
            [affi_id, affi_raw, affi_main, affi_known, divide, country]
        )
        conn2.commit()
    conn2.commit()
    conn.commit()
    data = pd.read_sql(query, conn)
and the code exits improperly.
Your understanding of how EXISTS works might be off. Your current exists query is saying that id 4683763 exists in the affiliation table, not the affi_known1 table. So, the following query should return the single record:
SELECT a.id, a.affiliation
FROM public.affiliation a
WHERE a.id = 4683763;
I am assuming the requirement is to fetch rows only when the id is not present in the second table, so you can try this:
select a.id, a.affiliation
FROM public.affiliation AS a
WHERE a.id NOT IN (
SELECT id
FROM ncbi.affi_known1
)
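One caveat with NOT IN, illustrated with hypothetical miniature tables in SQLite (the semantics are the same in Postgres): a NULL in the subquery silently empties a NOT IN result, while NOT EXISTS behaves as expected. The original schemas declare id NOT NULL, so NOT IN is safe here, but the difference is worth knowing before swapping one for the other.

```python
import sqlite3

# Hypothetical sample data: id 1 is known, ids 2 and 3 are not,
# and the known-ids table contains a stray NULL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE affiliation (id INTEGER PRIMARY KEY);
CREATE TABLE affi_known1 (id INTEGER);
INSERT INTO affiliation VALUES (1), (2), (3);
INSERT INTO affi_known1 VALUES (1), (NULL);
""")
not_exists = cur.execute("""
    SELECT a.id FROM affiliation a
    WHERE NOT EXISTS (SELECT 1 FROM affi_known1 b WHERE a.id = b.id)
    ORDER BY a.id
""").fetchall()
not_in = cur.execute("""
    SELECT a.id FROM affiliation a
    WHERE a.id NOT IN (SELECT id FROM affi_known1)
""").fetchall()
print(not_exists)  # [(2,), (3,)]
print(not_in)      # [] -- the NULL makes every NOT IN comparison unknown
```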
If id were an integer, your query would do what you want.
If id is a string, you could have issues with "look-alikes". It is very hard to say what the problem is -- there could be spaces in the id, hidden characters, or something else. And this could be in either table.
Assuming the ids look like numbers, you could filter "bad" ids out using regular expressions:
select id
from ncbi.affi_known1
where not id ~ '^[0-9]*$';
I get the following error when I want to execute a SQL query:
"Msg 209, Level 16, State 1, Line 9
Ambiguous column name 'i_id'."
This is the SQL query I want to execute:
SELECT DISTINCT x.*
FROM items x LEFT JOIN items y
ON y.i_id = x.i_id
AND x.last_seen < y.last_seen
WHERE x.last_seen > '4-4-2017 10:54:11'
AND x.spot = 'spot773'
AND (x.technology = 'Bluetooth LE' OR x.technology = 'EPC Gen2')
AND y.id IS NULL
GROUP BY i_id
This is what my table looks like:
CREATE TABLE [dbo].[items] (
[id] INT IDENTITY (1, 1) NOT NULL,
[i_id] VARCHAR (100) NOT NULL,
[last_seen] DATETIME2 (0) NOT NULL,
[location] VARCHAR (200) NOT NULL,
[code_hex] VARCHAR (100) NOT NULL,
[technology] VARCHAR (100) NOT NULL,
[url] VARCHAR (100) NOT NULL,
[spot] VARCHAR (200) NOT NULL,
PRIMARY KEY CLUSTERED ([id] ASC));
I've tried a couple of things, but I'm not an SQL expert :)
Any help would be appreciated.
EDIT:
I do get duplicate rows when I remove the GROUP BY line.
I'm adding another answer in order to show how you'd typically select the latest record per group without getting duplicates. You'd use ROW_NUMBER for this, marking the last record per i_id with row number 1.
SELECT *
FROM
(
SELECT
i.*,
ROW_NUMBER() over (PARTITION BY i_id ORDER BY last_seen DESC) as rn
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
) ranked
WHERE rn = 1;
(You'd use RANK or DENSE_RANK instead of ROW_NUMBER if you wanted duplicates.)
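A small runnable sketch of this ROW_NUMBER pattern, using hypothetical sample rows in SQLite (3.25+ for window functions); the same query runs unchanged on SQL Server:

```python
import sqlite3

# Hypothetical sample data: tag-a was seen twice, tag-b once.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, i_id TEXT, last_seen TEXT);
INSERT INTO items (i_id, last_seen) VALUES
  ('tag-a', '2017-04-04 11:00:00'),
  ('tag-a', '2017-04-04 12:00:00'),
  ('tag-b', '2017-04-04 11:30:00');
""")
rows = cur.execute("""
    SELECT i_id, last_seen FROM (
      SELECT i.*,
             ROW_NUMBER() OVER (PARTITION BY i_id ORDER BY last_seen DESC) AS rn
      FROM items i
    ) ranked
    WHERE rn = 1
    ORDER BY i_id
""").fetchall()
print(rows)  # [('tag-a', '2017-04-04 12:00:00'), ('tag-b', '2017-04-04 11:30:00')]
```

Each i_id appears exactly once, with its most recent last_seen.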
You forgot the table alias in GROUP BY i_id.
Anyway, why are you writing an anti-join query where you are trying to get rid of duplicates with both DISTINCT and GROUP BY? Did you have issues with a straightforward NOT EXISTS query? You are making things way more complicated than they actually are.
SELECT *
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
AND NOT EXISTS
(
SELECT *
FROM items other
WHERE i.i_id = other.i_id
AND i.last_seen < other.last_seen
);
(There are other techniques of course to get the last seen record per i_id. This is one; another is to compare with MAX(last_seen); another is to use ROW_NUMBER.)
I have the following table which logs chat messages
CREATE TABLE public.message_log
(
id integer NOT NULL DEFAULT nextval('message_log_id_seq'::regclass),
message text,
from_id character varying(500),
to_id character varying(500),
match_id character varying(500),
unix_timestamp bigint,
own_account boolean,
reply_batch boolean DEFAULT false,
insert_time timestamp with time zone DEFAULT now(),
CONSTRAINT message_log_pkey PRIMARY KEY (id),
CONSTRAINT message_log_message_from_id_to_id_match_id_unix_timestamp_key UNIQUE (message, from_id, to_id, match_id, unix_timestamp)
)
A chat conversation has the same match_id.
I have the following query, which returns the match_ids whose last message (the last message of the chat conversation) is from the non-account-holder (own_account = false). The query is working fine:
select m.* from message_log m where m.from_id <> ? and m.to_id = ? and m.unix_timestamp =
(
select max(unix_timestamp) from message_log where match_id = m.match_id group by match_id
)
I want to modify the query above so that it counts the chat conversations that have been replied to twice or more (I think we would need to use the reply_batch column). I cannot get my mind around it. Any help would be appreciated.
SELECT match_id, replies_count
FROM (SELECT match_id, COUNT(*) AS replies_count
      FROM message_log
      WHERE from_id <> ? AND to_id = ?
      GROUP BY match_id) AS replies_counter
WHERE replies_count > 1
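To sanity-check the counting sub-query, here is a small SQLite run over hypothetical sample rows (the parameter values stand in for the account holder's id):

```python
import sqlite3

# Hypothetical sample data: conversation m1 got two replies, m2 only one.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE message_log (match_id TEXT, from_id TEXT, to_id TEXT);
INSERT INTO message_log VALUES
  ('m1', 'alice', 'me'),
  ('m1', 'alice', 'me'),
  ('m2', 'bob',   'me');
""")
rows = cur.execute("""
    SELECT match_id, replies_count
    FROM (SELECT match_id, COUNT(*) AS replies_count
          FROM message_log
          WHERE from_id <> ? AND to_id = ?
          GROUP BY match_id) AS replies_counter
    WHERE replies_count > 1
""", ("me", "me")).fetchall()
print(rows)  # [('m1', 2)]
```

Only conversations with two or more incoming messages survive the outer filter.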
I have a table
CREATE TABLE `messages` (
  `uid` BIGINT NOT NULL,
  `mid` BIGINT,
  `date` BIGINT NOT NULL,
  PRIMARY KEY (`mid`));
I want to select max(date) grouped by uid, i.e. for every uid (read: user) I want to find the latest message (with the maximum date).
I tried this:
select messages.mid, max(messages.date), messages.uid, messages.body
from messages
where messages.chat_id is NULL
group by messages.uid
but the query gives wrong results.
A subquery can give you the date you need in order to retrieve the newest message for each user:
SELECT messages.uid, messages.mid, messages.date, messages.body
FROM messages
WHERE messages.chat_id IS NULL
AND messages.date IN
( SELECT MAX(m2.date) FROM messages m2
WHERE m2.uid = messages.uid AND m2.chat_id IS NULL
)
;
You need to group by all the non-aggregated fields when using aggregate functions :) Using a subquery sorts out the problem.
SELECT messages.date, messages.uid, messages.mid, messages.body
FROM messages
WHERE messages.chat_id IS NULL
  AND messages.date IN (SELECT MAX(msg.date)
                        FROM messages msg
                        WHERE msg.chat_id IS NULL
                          AND msg.uid = messages.uid)
Alternatively, it can also be done using the HAVING clause.
Done :)