Querying for who worked on an item first and second - sql

I have a table that looks like this:
Id (PK, int, not null)
ReviewedBy (nvarchar(255), not null)
ReviewDateTime(datetime, not null)
Decision_id (int, not null)
Item_id (FK, int, not null)
The business process with this table is that each Item (shown by Item_id foreign key) is to be worked on by 2 people.
How can I query this table to determine who (ReviewedBy) reviewed the item first and who reviewed it second?
I'm really struggling to figure this out because I neglected to add a Type column to my table that would record which role the user was acting in. :(
Edit
Given the following data
Id,ReviewedBy,ReviewedWhen,SomeOtherId,
16,111111,2011-12-14 22:06:54,1,
17,187935,2011-12-14 22:07:03,1,
18,187935,2011-12-14 22:07:18,2,
19,187935,2011-12-14 22:07:20,3,
20,111111,2011-12-14 22:07:23,2,
21,187935,2011-12-14 22:07:26,3,
22,123456,2011-12-14 22:27:50,4,
with schema
CREATE TABLE [Reviews] (
[Id] INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
[ReviewedBy] NVARCHAR(6) NOT NULL,
[ReviewedWhen] TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
[SomeOtherId] INTEGER NOT NULL
);
Executing the following to get a list of people who did second reviews will return rows where there is only one review for SomeOtherId.
select t1.*
from Reviews as t1
left outer join Reviews as t2
on (t1.SomeOtherId = t2.SomeOtherId and t1.ReviewedWhen < t2.ReviewedWhen)
where t2.SomeOtherId is null;
Solution
-- First checks
select t1.ReviewedBy, count(t1.Id)
from Reviews as t1
left outer join Reviews as t2
on (t2.SomeOtherId = t1.SomeOtherId and t1.ReviewedWhen > t2.ReviewedWhen)
where t2.SomeOtherID is null
group by t1.ReviewedBy;
-- Second checks
select t1.ReviewedBy, count(t1.Id)
from Reviews as t1
left outer join Reviews as t2
on (t2.SomeOtherId = t1.SomeOtherId and t1.ReviewedWhen < t2.ReviewedWhen)
where t2.SomeOtherID is null
and t1.SomeOtherId not in (select SomeOtherId from Reviews group by SomeOtherId having count(*) = 1)
group by t1.ReviewedBy;
Essentially, it was counting items where there was only one review as both a first and second check. All I had to do was ensure that when I'm counting second checks that I'm not including rows with only one review.
I thought I could achieve this in one query but guess not.
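For what it's worth, both counts can probably be had in one query. The sketch below is only an illustration, not the asker's solution: it loads the sample rows into an in-memory SQLite database through Python's sqlite3 module, counts how many earlier reviews each row has for the same SomeOtherId, and then aggregates per reviewer. The names `earlier`, `first_checks` and `second_checks` are made up for this example.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reviews ("
    " Id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,"
    " ReviewedBy NVARCHAR(6) NOT NULL,"
    " ReviewedWhen TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,"
    " SomeOtherId INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO Reviews VALUES (?, ?, ?, ?)",
    [
        (16, "111111", "2011-12-14 22:06:54", 1),
        (17, "187935", "2011-12-14 22:07:03", 1),
        (18, "187935", "2011-12-14 22:07:18", 2),
        (19, "187935", "2011-12-14 22:07:20", 3),
        (20, "111111", "2011-12-14 22:07:23", 2),
        (21, "187935", "2011-12-14 22:07:26", 3),
        (22, "123456", "2011-12-14 22:27:50", 4),
    ],
)

# "earlier" = number of reviews of the same item that happened before this one:
# 0 means this row is the first check, 1 means it is the second check.
COUNTS_SQL = """
SELECT ReviewedBy,
       SUM(CASE WHEN earlier = 0 THEN 1 ELSE 0 END) AS first_checks,
       SUM(CASE WHEN earlier = 1 THEN 1 ELSE 0 END) AS second_checks
FROM (
    SELECT r.ReviewedBy,
           (SELECT COUNT(*)
            FROM Reviews AS e
            WHERE e.SomeOtherId = r.SomeOtherId
              AND e.ReviewedWhen < r.ReviewedWhen) AS earlier
    FROM Reviews AS r
) AS ranked
GROUP BY ReviewedBy
ORDER BY ReviewedBy
"""
counts = conn.execute(COUNTS_SQL).fetchall()
print(counts)
```

Items with a single review only ever contribute to `first_checks`, so no extra exclusion is needed. Two reviews with identical timestamps on the same item would both count as first checks; if that can happen, Id could serve as a tie-breaker.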

Try this:
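select
t1.ReviewedBy FirstReviewer,
t2.ReviewedBy SecondReviewer
from
Table t1
left outer join Table t2 on t1.Item_Id = t2.Item_Id and t2.ReviewDateTime > t1.ReviewDateTime
-- keep only rows where t1 is the earliest review of the item; without this,
-- the second reviewer also comes back as a FirstReviewer with a NULL SecondReviewer
where not exists (select 1 from Table t0 where t0.Item_Id = t1.Item_Id and t0.ReviewDateTime < t1.ReviewDateTime)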
If you want to only return rows that have been reviewed by two people, change the left outer join to an inner join.
If ReviewDateTime is never updated and Id is an identity column, you can change the join to compare on Id rather than ReviewDateTime, which will be faster.
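To check the self-join idea against the sample data from the question, here is a minimal sketch using Python's sqlite3 module and the Reviews schema (so SomeOtherId plays the role of Item_Id and ReviewedWhen the role of ReviewDateTime). It is an illustration under those assumptions rather than a tested SQL Server query; the NOT EXISTS guard is there so that only the earliest review of each item is treated as the first one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reviews ("
    " Id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,"
    " ReviewedBy NVARCHAR(6) NOT NULL,"
    " ReviewedWhen TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,"
    " SomeOtherId INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO Reviews VALUES (?, ?, ?, ?)",
    [
        (16, "111111", "2011-12-14 22:06:54", 1),
        (17, "187935", "2011-12-14 22:07:03", 1),
        (18, "187935", "2011-12-14 22:07:18", 2),
        (19, "187935", "2011-12-14 22:07:20", 3),
        (20, "111111", "2011-12-14 22:07:23", 2),
        (21, "187935", "2011-12-14 22:07:26", 3),
        (22, "123456", "2011-12-14 22:27:50", 4),
    ],
)

# t1 is the earliest review per item (nothing earlier exists for that item);
# t2 is any later review of the same item, or NULL when there is none yet.
PAIRS_SQL = """
SELECT t1.SomeOtherId,
       t1.ReviewedBy AS FirstReviewer,
       t2.ReviewedBy AS SecondReviewer
FROM Reviews AS t1
LEFT OUTER JOIN Reviews AS t2
       ON t2.SomeOtherId = t1.SomeOtherId
      AND t2.ReviewedWhen > t1.ReviewedWhen
WHERE NOT EXISTS (
    SELECT 1
    FROM Reviews AS t0
    WHERE t0.SomeOtherId = t1.SomeOtherId
      AND t0.ReviewedWhen < t1.ReviewedWhen
)
ORDER BY t1.SomeOtherId
"""
pairs = conn.execute(PAIRS_SQL).fetchall()
for row in pairs:
    print(row)
```

Item 4 has only one review, so its SecondReviewer comes back as NULL (None in Python). If an item could ever receive three or more reviews, this query would return one row per later review, so it relies on the two-person business rule.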

Related

Inner join removes some rows unnecessarily

I have 3 tables defined like so
CREATE TABLE participants(
id SERIAL PRIMARY KEY,
Name TEXT NOT NULL,
Title TEXT NOT NULL
);
CREATE TABLE meetings (
id SERIAL PRIMARY KEY,
Subject TEXT NOT NULL,
Organizer TEXT NOT NULL,
StartTime TIMESTAMP NOT NULL,
EndTime TIMESTAMP NOT NULL
);
CREATE TABLE meetings_participants(
meeting_id int not null,
participant_id int not null,
primary key (meeting_id, participant_id),
foreign key(meeting_id) references meetings(id),
foreign key(participant_id) references participants(id)
);
I want to find meetings happening today with participants in them.
When I run this query I basically get them
SELECT * from meetings
INNER JOIN meetings_participants ON meetings.id = meetings_participants.meeting_id
INNER JOIN participants ON meetings_participants.participant_id = participants.id
WHERE starttime::date = NOW()::date;
The problem is that this query discards meetings that have no participants yet, and I still want to include them in the result. How can I modify my query to do that?
You need a LEFT JOIN instead of an INNER JOIN. By casting with ::date you are saying that you only care whether a meeting takes place today, whether or not it has already ended. You should still include EndTime in your query, because some meetings may span several days:
SELECT * from meetings
left join meetings_participants on meetings.id = meetings_participants.meeting_id
left join participants on meetings_participants.participant_id = participants.id
WHERE starttime::date <= NOW()::date and endtime::date >= NOW()::date ;
DBFiddle demo here.
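A small self-contained check of the difference, sketched with Python's sqlite3 module rather than PostgreSQL. The sample rows and the fixed `today` value are invented for the illustration, and SQLite's `date()` function stands in for the `::date` cast:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT NOT NULL, title TEXT NOT NULL);
CREATE TABLE meetings (
    id INTEGER PRIMARY KEY, subject TEXT NOT NULL, organizer TEXT NOT NULL,
    starttime TEXT NOT NULL, endtime TEXT NOT NULL);
CREATE TABLE meetings_participants (
    meeting_id INTEGER NOT NULL REFERENCES meetings(id),
    participant_id INTEGER NOT NULL REFERENCES participants(id),
    PRIMARY KEY (meeting_id, participant_id));
-- Hypothetical data: one meeting with a participant, one overnight meeting
-- without participants, and one meeting that only starts tomorrow.
INSERT INTO participants VALUES (1, 'Ann', 'Engineer');
INSERT INTO meetings VALUES
    (1, 'Standup',  'Bob', '2024-05-10 09:00:00', '2024-05-10 09:30:00'),
    (2, 'Cutover',  'Bob', '2024-05-09 22:00:00', '2024-05-10 02:00:00'),
    (3, 'Planning', 'Bob', '2024-05-11 10:00:00', '2024-05-11 11:00:00');
INSERT INTO meetings_participants VALUES (1, 1);
""")

today = "2024-05-10"  # fixed so the result does not depend on the clock

QUERY = """
SELECT m.subject, p.name
FROM meetings m
{join} JOIN meetings_participants mp ON m.id = mp.meeting_id
{join} JOIN participants p ON mp.participant_id = p.id
WHERE date(m.starttime) <= ? AND date(m.endtime) >= ?
ORDER BY m.id
"""
inner_rows = conn.execute(QUERY.format(join="INNER"), (today, today)).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT"), (today, today)).fetchall()
print(inner_rows)
print(left_rows)
```

With INNER JOIN only the meeting that has a participant survives; with LEFT JOIN the overnight meeting is kept as well, with NULL in the participant columns.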
EDIT: Participants' name and title as JSON array:
SELECT id, subject, organizer, starttime, endtime, jsonb_pretty(tmp.participants)
from meetings m
left join lateral (
select jsonb_agg(row_to_json(tp)) as participants
from (select p.name, p.title
from meetings_participants mp
inner join participants p on mp.participant_id = p.id
where mp.meeting_id = m.id
) tp
) tmp on true
WHERE starttime::date <= NOW()::date
and endtime::date >= NOW()::date;
DBFiddle demo for participants added as JSON
You did not mention whether you want each participant on a separate row or as an aggregate (e.g. a comma-separated list). For the former, change the inner joins to left joins. For the latter, you could use a correlated subquery:
SELECT meetings.*, (
SELECT string_agg(participants.name, ', ')
FROM meetings_participants
JOIN participants ON meetings_participants.participant_id = participants.id
WHERE meetings_participants.meeting_id = meetings.id
) AS participants_list
FROM meetings
WHERE starttime::date = current_date

What is the best way to join tables

this is more like a general question.
I am looking for the best way to join 4, maybe 5, different tables. I am trying to create a Power BI report that pulls live information from an IBM AS400, where customer service can type in one of our part numbers,
see how many parts we have in inventory and, if there are none, see the lead time and whether any orders have already been entered for that part number.
SERI is our inventory table with 37180 records.
(active inventory that is available)
METHDM is our kit table with 37459 records.
(this table contains the bill of materials for custom kits; KIT A123 contains different part numbers, which are in SERI as well.)
STKA is our part lead time table with 76796 records.
(lead time means how long will it take for parts to come in)
OCRI is our sales order table with 6497 records.
(This table contains all customer orders)
I have some knowledge of writing queries, but this one is more challenging than what I have created in the past. Should I start with the table that has the most records and left join the rest?
From STKA 76796 records
Left join METHDM 37459 records on STKA
left join SERI 37180 records on STKA
left join OCRI 6497 records on STKA
Select
STKA.v6part as part,
STKA.v6plnt as plant,
STKA.v6tdys as pur_leadtime,
STKA.v6prpt as Pur_PrepLeadtime,
STKA.v6lead as Mfg_leadtime,
STKA.v6prpt as Mfg_PrepLeadTime,
METHDM.AQMTLP AS COMPONENT,
METHDM.AQQPPC AS QTYNEEDED,
SERI.HTLOTN AS BATCH,
SERI.HTUNIT AS UOM,
(HTQTY - HTQTYC) as ONHAND,
OCRI.DDORD# AS SALESORDER,
OCRI.DDRDAT AS PROMISED
from stka
left join METHDM on STKA.V6PART = METHDM.AQPART
left join SERI on STKA.V6PART = SERI.HTPART
left join OCRI on STKA.V6PART = OCRI.DDPART
Is this the best way to join the tables?
I think you already have your answer, but conceptually, there are a few issues here to deal with, and I figured I would give you a few examples, using data a little bit like yours, but massively simplified.
CREATE TABLE #STKA (V6PART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE #METHDM (AQPART INT, KIT_ID INT, SOME_DATE DATETIME, OTHER_DATA VARCHAR(50));
CREATE TABLE #SERI (HTPART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE #OCRI (DDPART INT, OTHER_DATA VARCHAR(50));
INSERT INTO #STKA SELECT 1, NULL UNION ALL SELECT 2, NULL UNION ALL SELECT 3, NULL; --1, 2, 3 Ids
INSERT INTO #METHDM SELECT 1, 1, '20200108 10:00', NULL UNION ALL SELECT 1, 2, '20200108 11:00', NULL UNION ALL SELECT 2, 1, '20200108 13:00', NULL; --1 Id appears twice, 2 Id once, no 3 Id
INSERT INTO #SERI SELECT 1, NULL UNION ALL SELECT 3, NULL; --1 and 3 Ids
INSERT INTO #OCRI SELECT 1, NULL UNION ALL SELECT 4, NULL; --1 and 4 Ids
So fundamentally we have a few issues here:
o the first problem is that the IDs in the tables differ, one table has an ID #4 but this isn't in any of the others;
o the second issue is that we have multiple rows for the same ID in one table;
o the third issue is that some tables are "missing" IDs that are in other tables, which you already covered by using LEFT JOINs, so I will ignore this.
--This will select ID 1 twice, 2 once, 3 once, and miss 4 completely
SELECT
*
FROM
#STKA
LEFT JOIN #METHDM ON #METHDM.AQPART = #STKA.V6PART
LEFT JOIN #SERI ON #SERI.HTPART = #STKA.V6PART
LEFT JOIN #OCRI ON #OCRI.DDPART = #STKA.V6PART;
So the problem here is that we don't have every ID in our "anchor" table STKA, and in fact there's no single table that has every ID in it. Now your data might be fine here, but if it isn't then you can simply add a step to find every ID, and use this as the anchor.
--This will select each ID, but still doubles up on ID 1
WITH Ids AS (
SELECT V6PART AS ID FROM #STKA
UNION
SELECT AQPART AS ID FROM #METHDM
UNION
SELECT HTPART AS ID FROM #SERI
UNION
SELECT DDPART AS ID FROM #OCRI)
SELECT
*
FROM
Ids I
LEFT JOIN #STKA ON #STKA.V6PART = I.Id
LEFT JOIN #METHDM ON #METHDM.AQPART = I.Id
LEFT JOIN #SERI ON #SERI.HTPART = I.Id
LEFT JOIN #OCRI ON #OCRI.DDPART = I.Id;
That's using a common-table expression, but a subquery would also do the job. However, this still leaves us with an issue where ID 1 appears twice in the list, because it has multiple rows in one of the sub-tables.
One way to fix this is to pick the row with the latest date, or any other ORDER you can apply to the data:
--Pick the best row for the table where it has multiple rows, now we get one row per ID
WITH Ids AS (
SELECT V6PART AS ID FROM #STKA
UNION
SELECT AQPART AS ID FROM #METHDM
UNION
SELECT HTPART AS ID FROM #SERI
UNION
SELECT DDPART AS ID FROM #OCRI),
BestMETHDM AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY AQPART ORDER BY SOME_DATE DESC) AS ORDER_ID
FROM
#METHDM)
SELECT
*
FROM
Ids I
LEFT JOIN #STKA ON #STKA.V6PART = I.Id
LEFT JOIN BestMETHDM ON BestMETHDM.AQPART = I.Id AND BestMETHDM.ORDER_ID = 1
LEFT JOIN #SERI ON #SERI.HTPART = I.Id
LEFT JOIN #OCRI ON #OCRI.DDPART = I.Id;
Of course you could also add some aggregation (SUM, MAX, MIN, AVG, etc.) to fix this problem (if it is indeed an issue). Also, I used a common-table expression, but this would work just as well with a subquery.
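For anyone who wants to run the idea without a SQL Server instance, here is a rough sketch of the same steps using Python's sqlite3 module. Ordinary tables replace the # temp tables, and a correlated MAX(some_date) subquery is swapped in for ROW_NUMBER() so it does not depend on window-function support; the CTE name `best_methdm` is just a label chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stka   (v6part INTEGER, other_data TEXT);
CREATE TABLE methdm (aqpart INTEGER, kit_id INTEGER, some_date TEXT, other_data TEXT);
CREATE TABLE seri   (htpart INTEGER, other_data TEXT);
CREATE TABLE ocri   (ddpart INTEGER, other_data TEXT);
INSERT INTO stka   VALUES (1, NULL), (2, NULL), (3, NULL);
INSERT INTO methdm VALUES (1, 1, '2020-01-08 10:00', NULL),
                          (1, 2, '2020-01-08 11:00', NULL),
                          (2, 1, '2020-01-08 13:00', NULL);
INSERT INTO seri   VALUES (1, NULL), (3, NULL);
INSERT INTO ocri   VALUES (1, NULL), (4, NULL);
""")

# Anchoring on stka: ID 1 is doubled (two methdm rows) and ID 4 is missing.
NAIVE_SQL = """
SELECT s.v6part
FROM stka s
LEFT JOIN methdm m ON m.aqpart = s.v6part
LEFT JOIN seri e   ON e.htpart = s.v6part
LEFT JOIN ocri o   ON o.ddpart = s.v6part
ORDER BY s.v6part
"""
naive_ids = [row[0] for row in conn.execute(NAIVE_SQL)]

# Anchor on the union of all IDs and keep only the latest methdm row per part.
ANCHORED_SQL = """
WITH ids AS (
    SELECT v6part AS id FROM stka
    UNION SELECT aqpart FROM methdm
    UNION SELECT htpart FROM seri
    UNION SELECT ddpart FROM ocri
),
best_methdm AS (
    SELECT m.aqpart, m.kit_id
    FROM methdm m
    WHERE m.some_date = (SELECT MAX(x.some_date) FROM methdm x WHERE x.aqpart = m.aqpart)
)
SELECT i.id, s.v6part, b.kit_id, e.htpart, o.ddpart
FROM ids i
LEFT JOIN stka s        ON s.v6part = i.id
LEFT JOIN best_methdm b ON b.aqpart = i.id
LEFT JOIN seri e        ON e.htpart = i.id
LEFT JOIN ocri o        ON o.ddpart = i.id
ORDER BY i.id
"""
anchored = conn.execute(ANCHORED_SQL).fetchall()
print(naive_ids)
print(anchored)
```

Each ID now appears exactly once, with NULLs where a table has no matching row. If two methdm rows for the same part shared the same latest date, both would come back, so a tie-breaker would be needed in that case.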
Expanding on a comment made on the question..
I would say I will start with SERI as that table contains the entire inventory for our facility and should cover the other tables
However the question said
SERI is our inventory table with 37180 records. (active inventory that is available)
In my experience, active inventory isn't the same as all parts.
Normally, in a query like this, I'd expect the first table to be a Parts Master table of some sort that contains every possible part ID.

Converting PostgreSQL Subqueries into Joins

Below is an example schema with 3 tables. I'm trying to run a query that returns all Jobs where all child Shifts are of status 6. If a Job has a child Shift with a status of 5, the Job should not be returned. The proper response for a query from the sample data inserted below is no rows returned.
There is a working query below with the comment "Works". I am trying to refactor the "works" query to use joins instead of subqueries. The query with the comment "Does not work" is my attempt.
-- begin setup and table creation: only run this section once.
CREATE EXTENSION "uuid-ossp";
CREATE TABLE jobs
(
id uuid NOT NULL DEFAULT uuid_generate_v4(),
CONSTRAINT jobs_pkey PRIMARY KEY (id)
);
CREATE TABLE bookings
(
id uuid NOT NULL DEFAULT uuid_generate_v4(),
job_id uuid,
CONSTRAINT bookings_pkey PRIMARY KEY (id)
);
CREATE TABLE shifts
(
id uuid NOT NULL DEFAULT uuid_generate_v4(),
booking_id uuid,
status integer,
CONSTRAINT shifts_pkey PRIMARY KEY (id)
);
insert into jobs (id) values ('e857c86c-bc31-11e6-9aae-57793f585d49');
insert into bookings (id, job_id) values ('736da82c-bc32-11e6-b9b8-f36753d321ac', 'e857c86c-bc31-11e6-9aae-57793f585d49');
insert into bookings (id, job_id) values ('7d839e5c-bc32-11e6-8bb3-4fa95be86a74', 'e857c86c-bc31-11e6-9aae-57793f585d49');
insert into shifts (booking_id, status) values ('736da82c-bc32-11e6-b9b8-f36753d321ac', 6);
insert into shifts (booking_id, status) values ('7d839e5c-bc32-11e6-8bb3-4fa95be86a74', 5);
-- end setup and table creation
We want all jobs where all child shifts are of status 6. If a job has a child shift with a status of 5, the job should not be returned. The proper response for a query from the sample data inserted above is no rows returned.
Does not work :(
SELECT "jobs".*
FROM "jobs"
inner join bookings b1 on jobs.id = b1.job_id
inner join shifts s1 on b1.id = s1.booking_id
left outer join bookings b2 on jobs.id = b2.job_id
left outer join shifts s2 on b2.id = s2.booking_id and s2.status IN (2,3,4,5)
WHERE s1.status = 6
AND s2.id IS NULL
GROUP BY "jobs"."id";
Works
SELECT "jobs".*
FROM "jobs"
WHERE jobs.id IN (
SELECT job_id
FROM bookings
WHERE bookings.id IN (
SELECT booking_id FROM shifts WHERE status = 6
)
) AND jobs.id NOT IN (
SELECT job_id FROM bookings WHERE bookings.id IN (
SELECT booking_id FROM shifts WHERE status IN (2,3,4,5)
)
)
GROUP BY "jobs"."id";
How can I refactor the "works" query to use joins instead of subqueries? The "does not work" query is my attempt.
Try this (haven't tested so there may be typos):
with prohibited_jobs as (
select distinct jobs.id
from jobs
join bookings on jobs.id = bookings.job_id
join shifts on shifts.booking_id = bookings.id
where shifts.status != 6
)
select jobs.*
from jobs
left outer join prohibited_jobs p on p.id = jobs.id
where
p.id IS NULL
It's not completely free from subqueries (doing everything with joins would most certainly be less efficient), but it removes some unnecessary checks, so may be a little bit faster (which I suspect is your goal).
There is a small difference to your working query, in that it returns all jobs where all shifts have status 6 (as you said you want), whereas your query also ensures that the job has at least one shift (of status 6).
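If the goal really is joins only, one possible alternative (not taken from the answer above) is to join all three tables and compare counts in a HAVING clause: a job qualifies when its number of shifts equals its number of status-6 shifts. The sketch below tries this in SQLite through Python's sqlite3 module, with short text ids standing in for the UUIDs; `job-2` and its rows are invented to show a positive match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id TEXT PRIMARY KEY);
CREATE TABLE bookings (id TEXT PRIMARY KEY, job_id TEXT);
CREATE TABLE shifts (id INTEGER PRIMARY KEY, booking_id TEXT, status INTEGER);
-- Same shape as the question's sample data: one job, two bookings,
-- one shift with status 6 and one with status 5.
INSERT INTO jobs VALUES ('job-1');
INSERT INTO bookings VALUES ('booking-1', 'job-1'), ('booking-2', 'job-1');
INSERT INTO shifts (booking_id, status) VALUES ('booking-1', 6), ('booking-2', 5);
""")

# A job is returned only when every joined shift row has status 6.
ALL_SIX_SQL = """
SELECT j.id
FROM jobs j
JOIN bookings b ON b.job_id = j.id
JOIN shifts s   ON s.booking_id = b.id
GROUP BY j.id
HAVING COUNT(*) = SUM(CASE WHEN s.status = 6 THEN 1 ELSE 0 END)
ORDER BY j.id
"""
before = conn.execute(ALL_SIX_SQL).fetchall()

# Hypothetical second job whose shifts are all status 6.
conn.executescript("""
INSERT INTO jobs VALUES ('job-2');
INSERT INTO bookings VALUES ('booking-3', 'job-2');
INSERT INTO shifts (booking_id, status) VALUES ('booking-3', 6), ('booking-3', 6);
""")
after = conn.execute(ALL_SIX_SQL).fetchall()
print(before, after)
```

Because it uses inner joins, this variant behaves like the original working query in that a job with no shifts at all is not returned.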

SELECT Statement in CASE

Please don't downvote this, as it is a bit complex for me to explain. I'm working on a data migration, so some of the structures look odd because that is how they were originally designed.
For ex, I have a table Person with PersonID and PersonName as columns. I have duplicates in the table.
I have Details table where I have PersonName stored in a column. This PersonName may or may not exist in the Person table. I need to retrieve PersonID from the matching records otherwise put some hardcode value in PersonID.
I can't write below query because PersonName is duplicated in Person Table, this join doubles the rows if there is a matching record due to join.
SELECT d.Fields, PersonID
FROM Details d
JOIN Person p ON d.PersonName = p.PersonName
The below query works but I don't know how to replace "NULL" with some value I want in place of NULL
SELECT d.Fields, (SELECT TOP 1 PersonID FROM Person where PersonName = d.PersonName )
FROM Details d
So, there are some PersonNames in the Details table which are not existent in Person table. How do I write CASE WHEN in this case?
I tried below but it didn't work
SELECT d.Fields,
CASE WHEN (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) = null
THEN 123
ELSE (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) END Name
FROM Details d
This query is still showing the same output as 2nd query. Please advise me on this. Let me know, if I'm unclear anywhere. Thanks
well.. I figured I can put ISNULL on top of SELECT to make it work.
SELECT d.Fields,
ISNULL((SELECT TOP 1 p.PersonID
FROM Person p where p.PersonName = d.PersonName), 124) id
FROM Details d
A simple left outer join to pull back all persons with an optional match on the details table should work with a case statement to get your desired result.
SELECT
*
FROM
(
SELECT
Instance=ROW_NUMBER() OVER (PARTITION BY p.PersonName ORDER BY p.PersonID),
PersonID=CASE WHEN d.PersonName IS NULL THEN 'XXXX' ELSE p.PersonID END,
d.Fields
FROM
Person p
LEFT OUTER JOIN Details d on d.PersonName=p.PersonName
)AS X
WHERE
Instance=1
Ooh goody, a chance to use two LEFT JOINs. The first will list the IDs where they exist, and insert a default otherwise; the second will eliminate the duplicates.
SELECT d.Fields, ISNULL(p1.PersonID, 123)
FROM Details d
LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
LEFT JOIN Person p2 ON p2.PersonName = p1.PersonName
AND p2.PersonID < p1.PersonID
WHERE p2.PersonID IS NULL
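Here is a quick way to sanity-check the two-LEFT-JOIN approach, sketched with Python's sqlite3 module and a few invented rows ('Ann' is duplicated in Person, 'Zed' is missing from it). SQLite has no two-argument ISNULL, so COALESCE is used in its place; a scalar-subquery variant with MIN is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (PersonID INTEGER, PersonName TEXT);
CREATE TABLE Details (Fields TEXT, PersonName TEXT);
-- Hypothetical rows: Ann is duplicated, Zed has no Person row.
INSERT INTO Person VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
INSERT INTO Details VALUES ('d1', 'Ann'), ('d2', 'Bob'), ('d3', 'Zed');
""")

# p1 supplies the ID (or NULL); p2 looks for a duplicate with a smaller ID,
# so only the p1 row with the lowest PersonID per name survives the filter.
join_rows = conn.execute("""
SELECT d.Fields, COALESCE(p1.PersonID, 123) AS PersonID
FROM Details d
LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
LEFT JOIN Person p2 ON p2.PersonName = p1.PersonName
                   AND p2.PersonID < p1.PersonID
WHERE p2.PersonID IS NULL
ORDER BY d.Fields
""").fetchall()

# Same result with a correlated scalar subquery wrapped in COALESCE.
subquery_rows = conn.execute("""
SELECT d.Fields,
       COALESCE((SELECT MIN(p.PersonID)
                 FROM Person p
                 WHERE p.PersonName = d.PersonName), 123) AS PersonID
FROM Details d
ORDER BY d.Fields
""").fetchall()
print(join_rows)
print(subquery_rows)
```

Each Details row appears exactly once: matched names get the lowest PersonID, and the unmatched name gets the default 123.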
You could use common table expressions to build up the missing datasets, i.e. your complete Person table, then join that to your Details table as follows:
declare @n int;
-- set your default PersonID here;
set @n = 123;
-- Make sure the previous SQL statement is terminated with a semicolon so the with clause parses successfully.
-- First build our unique list of names from table Details.
with cteUniqueDetailPerson
(
[PersonName]
)
as
(
select distinct [PersonName]
from [Details]
)
-- Second get unique Person entries and record the most recent PersonID value as the active Person.
, cteUniquePersonPerson
(
[PersonID]
, [PersonName]
)
as
(
select
max([PersonID]) -- if you wanted the original Person record instead of the last, change this to min.
, [PersonName]
from [Person]
group by [PersonName]
)
-- Third join unique datasets to get the PersonID when there is a match, otherwise use our default id @n.
-- NB, this would also include records when a Person exists with no Detail rows (they are filtered out with the final inner join)
, cteSudoPerson
(
[PersonID]
, [PersonName]
)
as
(
select
coalesce(upp.[PersonID],@n) as [PersonID]
, coalesce(upp.[PersonName],udp.[PersonName]) as [PersonName]
from cteUniquePersonPerson upp
full outer join cteUniqueDetailPerson udp
on udp.[PersonName] = upp.[PersonName]
)
)
-- Fourth, join detail to the sudo person table that includes either the original ID or our default ID.
select
d.[Fields]
, sp.[PersonID]
from [Details] d
inner join cteSudoPerson sp
on sp.[PersonName] = d.[PersonName];

Select newest entry from a joined MySQL table

I have stock quantity information in my database.
1 table, "stock", holds the productid (sku) along with the quantity and the filename from where it came.
The other table, "stockfile", contains all the processed filenames along with dates.
Now I need to get all the products with their latest stock quantity values.
This gives me ALL the products multiple times with all their stock quantity (resulting in 300.000 records)
SELECT stock.stockid, stock.sku, stock.quantity, stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
ORDER BY stock.sku ASC
I already tried this:
SELECT * FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
GROUP BY sku
HAVING stockfile.date = MAX( stockfile.date )
ORDER BY stock.sku ASC
But it did not work
SHOW CREATE TABLE stock:
CREATE TABLE stock (
stockid bigint(20) NOT NULL AUTO_INCREMENT,
sku char(25) NOT NULL,
quantity int(5) NOT NULL,
creationdate datetime NOT NULL,
stockfileid smallint(5) unsigned NOT NULL,
touchdate datetime NOT NULL,
PRIMARY KEY (stockid)
) ENGINE=MyISAM AUTO_INCREMENT=315169 DEFAULT CHARSET=latin1
SHOW CREATE TABLE stockfile:
CREATE TABLE stockfile (
stockfileid smallint(5) unsigned NOT NULL AUTO_INCREMENT,
filename varchar(25) NOT NULL,
creationdate datetime DEFAULT NULL,
touchdate datetime DEFAULT NULL,
date datetime DEFAULT NULL,
begindate datetime DEFAULT NULL,
enddate datetime DEFAULT NULL,
PRIMARY KEY (stockfileid)
) ENGINE=MyISAM AUTO_INCREMENT=265 DEFAULT CHARSET=latin1
This is an example of the frequently-asked "greatest-n-per-group" question that we see every week on StackOverflow. Follow that tag to see other similar solutions.
SELECT s.*, f1.*
FROM stock s
INNER JOIN stockfile f1
ON (s.stockfileid = f1.stockfileid)
LEFT OUTER JOIN stockfile f2
ON (s.stockfileid = f2.stockfileid AND f1.date < f2.date)
WHERE f2.stockfileid IS NULL;
If there are multiple rows in stockfile that have the max date, you'll get them both in the result set. To resolve this, you'd have to add some tie-breaker conditions into the join on f2.
Thanks for adding the CREATE TABLE info. That's very helpful when you're asking SQL questions.
I see from the AUTO_INCREMENT table options that you have 315k rows in stock and only 265 rows in stockfile. Your stockfile table is the parent in the relationship, and the stock table is the child, with a column stockfileid that references the primary key of stockfile.
So your original question was misleading. You want the latest row from stock, not the latest row from stockfile.
SELECT f.*, s1.*
FROM stockfile f
INNER JOIN stock s1
ON (f.stockfileid = s1.stockfileid)
LEFT OUTER JOIN stock s2
ON (f.stockfileid = s2.stockfileid AND (s1.touchdate < s2.touchdate
OR s1.touchdate = s2.touchdate AND s1.stockid < s2.stockid))
WHERE s2.stockid IS NULL;
I'm assuming you want "latest" to be relative to touchdate, so if you want to use creationdate instead, you can do the edit.
I've added a term to the join so that it resolves ties. I know you said the dates are "practically unique" but as the saying goes, "one in a million is next Tuesday."
Okay, I think I understand what you're trying to do now. You want the most recent row per sku, but the date by which to compare them is in the referenced table stockfile.
SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 ON (s1.stockfileid = f1.stockfileid)
LEFT OUTER JOIN (stock s2 JOIN stockfile f2 ON (s2.stockfileid = f2.stockfileid))
ON (s1.sku = s2.sku AND (f1.date < f2.date OR f1.date = f2.date AND f1.stockfileid < f2.stockfileid))
WHERE s2.sku IS NULL;
This does a self-join of stock to itself, looking for a row with the same sku and a more recent date. When none is found, then s1 contains the most recent row for its sku. And each instance of stock has to join to its stockfile to get the date.
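As a rough check of that last query, here is a sketch that runs the same idea in SQLite through Python's sqlite3 module, with a handful of invented rows and trimmed-down tables. The parenthesized join is written as a derived table (aliased `newer`, a name chosen for this example), which should behave the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stockfile (stockfileid INTEGER PRIMARY KEY, filename TEXT NOT NULL, date TEXT);
CREATE TABLE stock (
    stockid INTEGER PRIMARY KEY, sku TEXT NOT NULL,
    quantity INTEGER NOT NULL, stockfileid INTEGER NOT NULL);
-- Hypothetical data: sku X appears in both files, sku Y only in the older one.
INSERT INTO stockfile VALUES (1, 'a.csv', '2024-01-01 00:00:00'),
                             (2, 'b.csv', '2024-01-02 00:00:00');
INSERT INTO stock VALUES (1, 'X', 5, 1), (2, 'X', 7, 2), (3, 'Y', 3, 1);
""")

# For each stock row, look for another row with the same sku whose file is
# newer (ties broken by stockfileid). Rows with no such match are the latest.
LATEST_SQL = """
SELECT s1.sku, s1.quantity, f1.filename
FROM stock s1
JOIN stockfile f1 ON s1.stockfileid = f1.stockfileid
LEFT OUTER JOIN (
    SELECT s2.sku, f2.date, f2.stockfileid
    FROM stock s2
    JOIN stockfile f2 ON s2.stockfileid = f2.stockfileid
) newer
  ON newer.sku = s1.sku
 AND (f1.date < newer.date
      OR (f1.date = newer.date AND f1.stockfileid < newer.stockfileid))
WHERE newer.sku IS NULL
ORDER BY s1.sku
"""
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)
```

Note that if the same sku were listed twice within one file, both rows would be returned, since nothing in the join distinguishes them.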
Re comment about optimization: It's hard for me to test because I don't have tables populated with data matching yours, but I'd guess you should have the following indexes:
CREATE INDEX stock_sku ON stock(sku);
CREATE INDEX stock_stockfileid ON stock(stockfileid);
CREATE INDEX stockfile_date ON stockfile(date);
I'd suggest using EXPLAIN to analyze the query without the indexes, and then create one index at a time and re-analyze with EXPLAIN to see which one gives the most direct benefit.
Use:
SELECT DISTINCT s.stockid,
s.sku,
s.quantity,
sf.filename,
sf.date
FROM STOCK s
JOIN STOCKFILE sf ON sf.stockfileid = s.stockfileid
JOIN (SELECT t.stockfileid,
MAX(t.date) 'max_date'
FROM STOCKFILE t
GROUP BY t.stockfileid) x ON x.stockfileid = sf.stockfileid
AND x.max_date = sf.date
select *
from stock
where stockfileid in (
select top 1 stockfileid
from stockfile
order by date desc
)
There are two common ways to accomplish this: a sub query or a self-join.
See this example of selecting the group-wise maximum at the MySQL site.
Edit, an example using a subquery:
SELECT stock.stockid, stock.sku, stock.quantity,
stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
WHERE stockfile.date = (SELECT MAX(date) FROM stockfile);