Limiting subqueries in SQL

I have a situation which is a little hard to describe. I'll try to explain with an example and the result which I want.
I have three tables:
Employee
| id | Name |
|----+-------|
| 1 | Alice |
| 2 | Bob |
| 3 | Jane |
| 4 | Jack |
Task
| id | employee_id | description |
|----+-------------+---------------------|
| 1 | 1 | Fix bug |
| 2 | 1 | Implement feature |
| 3 | 1 | Deploy master |
| 4 | 2 | Integrate feature |
| 5 | 2 | Fix cosmetic issues |
Status
| id | task_id | time | details | Terminal |
|----+---------+-------+-----------+----------|
| 1 | 1 | 12:00 | Assigned | false |
| 2 | 1 | 12:30 | Started | false |
| 3 | 1 | 13:00 | Completed | true |
| 4 | 2 | 12:10 | Assigned | false |
| 5 | 2 | 14:00 | Started | false |
| 6 | 3 | 12:15 | Assigned | false |
| 7 | 4 | 12:20 | Assigned | false |
| 8 | 5 | 12:25 | Assigned | false |
| 9 | 4 | 12:30 | Started | false |
(I have also put these into a sqlfiddle page at http://sqlfiddle.com/#!9/728c85/1)
The basic idea is that I have some employees and tasks. The tasks can be assigned to employees and as they work on them, they keep adding "status" rows.
Each status entry has a field "terminal" which can either be true or false.
If the last status entry for a task has terminal set to true, then that task is over and there's nothing more to be done on it.
If all tasks assigned to an employee are over, then the employee is considered free.
I need to get a list of free employees. In other words, given an employee, I need a list of all of his or her tasks with their statuses, something like this for Alice:
| Task | Completed |
|-------------------+-----------|
| Fix bug | true |
| Implement feature | false |
| Deploy master | false |
From this I know that she's not free right now, since there are 'false' entries in Completed.
How would I do this? If my tables are not constructed properly for this kind of query, I'd very much like some advice on that too.
(I titled the question like this because I want to order the statuses of each task per user and then limit them to the last row.)
Update
It was suggested to me that the status field should really go inside the tasks table and that the Status table should simply be a log table.

I would go with the idea of having the status in the tasks table. (Please see my comment on your request on this.) However, here are two queries to select free employees:
If tasks cannot be re-opened, it is simple: get all incomplete tasks by checking whether a status record with terminal = true exists for them. Free employees are all those who have no incomplete task.
select *
from employee
where id not in
(
select employee_id
from task
where id not in (select task_id from status where terminal = true)
);
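A minimal sketch to check this NOT IN logic against the question's sample data. It runs in SQLite through Python's sqlite3 module (an assumption; the original fiddles use other engines) and stores terminal as 1/0, since SQLite has no separate boolean type.

```python
import sqlite3

# Sample data from the question; terminal is 1 (true) or 0 (false).
SCHEMA = """
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, employee_id INTEGER, description TEXT);
CREATE TABLE status (id INTEGER PRIMARY KEY, task_id INTEGER, time TEXT,
                     details TEXT, terminal INTEGER);
INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Jane'), (4, 'Jack');
INSERT INTO task VALUES
  (1, 1, 'Fix bug'), (2, 1, 'Implement feature'), (3, 1, 'Deploy master'),
  (4, 2, 'Integrate feature'), (5, 2, 'Fix cosmetic issues');
INSERT INTO status VALUES
  (1, 1, '12:00', 'Assigned', 0), (2, 1, '12:30', 'Started', 0),
  (3, 1, '13:00', 'Completed', 1), (4, 2, '12:10', 'Assigned', 0),
  (5, 2, '14:00', 'Started', 0), (6, 3, '12:15', 'Assigned', 0),
  (7, 4, '12:20', 'Assigned', 0), (8, 5, '12:25', 'Assigned', 0),
  (9, 4, '12:30', 'Started', 0);
"""

FREE_EMPLOYEES = """
SELECT id, name
FROM employee
WHERE id NOT IN (
  -- employees that still own a task with no terminal status row
  SELECT employee_id
  FROM task
  WHERE id NOT IN (SELECT task_id FROM status WHERE terminal = 1)
)
ORDER BY id;
"""

def free_employees():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn.execute(FREE_EMPLOYEES).fetchall()

print(free_employees())  # [(3, 'Jane'), (4, 'Jack')]
```

Alice and Bob still have unfinished tasks, so only the two employees without tasks come back.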
If tasks can be re-opened, however, you do the same but must find the last status of each task. This can be done with PostgreSQL's DISTINCT ON, for instance.
select *
from employee
where id not in
(
select employee_id
from task
where not
(
select distinct on (task_id) terminal
from status
where task_id = task.id
order by task_id, id desc
)
);
(I am using the ID to find the last entry per task, because a time without a date seems inappropriate for ordering. You could use the time column instead only if a task always runs within a single day.)
SQL fiddles:
http://sqlfiddle.com/#!15/f0ea8/2
http://sqlfiddle.com/#!15/f0ea8/1
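SQLite has no DISTINCT ON, so this sketch swaps in a correlated `ORDER BY id DESC LIMIT 1` subquery to fetch the last status of each task, which is the same "last row wins" idea. The COALESCE that treats a task with no status rows as unfinished is an assumption, not part of the answer, and the extra status rows used below are hypothetical.

```python
import sqlite3

# Sample data from the question; terminal is 1 (true) or 0 (false).
SCHEMA = """
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, employee_id INTEGER, description TEXT);
CREATE TABLE status (id INTEGER PRIMARY KEY, task_id INTEGER, time TEXT,
                     details TEXT, terminal INTEGER);
INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Jane'), (4, 'Jack');
INSERT INTO task VALUES
  (1, 1, 'Fix bug'), (2, 1, 'Implement feature'), (3, 1, 'Deploy master'),
  (4, 2, 'Integrate feature'), (5, 2, 'Fix cosmetic issues');
INSERT INTO status VALUES
  (1, 1, '12:00', 'Assigned', 0), (2, 1, '12:30', 'Started', 0),
  (3, 1, '13:00', 'Completed', 1), (4, 2, '12:10', 'Assigned', 0),
  (5, 2, '14:00', 'Started', 0), (6, 3, '12:15', 'Assigned', 0),
  (7, 4, '12:20', 'Assigned', 0), (8, 5, '12:25', 'Assigned', 0),
  (9, 4, '12:30', 'Started', 0);
"""

# A task counts as open when its highest-id status row is not terminal.
FREE_EMPLOYEES = """
SELECT id, name
FROM employee
WHERE id NOT IN (
  SELECT t.employee_id
  FROM task AS t
  WHERE COALESCE((SELECT s.terminal
                  FROM status AS s
                  WHERE s.task_id = t.id
                  ORDER BY s.id DESC
                  LIMIT 1), 0) = 0
)
ORDER BY id;
"""

def free_employees(extra_status_rows=()):
    """Return (id, name) of free employees after appending optional status rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO status VALUES (?, ?, ?, ?, ?)", extra_status_rows)
    return conn.execute(FREE_EMPLOYEES).fetchall()

# Hypothetical rows: Alice completes tasks 2 and 3, then task 2 is reopened.
completed = [(10, 2, "15:00", "Completed", 1), (11, 3, "15:00", "Completed", 1)]
reopened = completed + [(12, 2, "16:00", "Reopened", 0)]

print(free_employees())           # [(3, 'Jane'), (4, 'Jack')]
print(free_employees(completed))  # [(1, 'Alice'), (3, 'Jane'), (4, 'Jack')]
print(free_employees(reopened))   # [(3, 'Jane'), (4, 'Jack')]
```

The reopened case is where "any terminal row exists" and "the last row is terminal" disagree: task 2 has a Completed row, but its latest row is not terminal, so Alice is busy again.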

You have to group all the statuses of each task together; you can then use MAX() to find out whether any of them is true, like this:
SELECT t.description, MAX(s.terminal)
FROM Employee e
INNER JOIN task t ON t.employee_id = e.id
INNER JOIN status s ON s.task_id = t.id
GROUP BY t.id;
If you want this for just one user, add a condition such as WHERE e.id = 1 before the GROUP BY.
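A small sketch of this grouping approach in SQLite (an assumption; the question's fiddle is MySQL), with terminal stored as 1/0 so MAX() yields 1 when any status of the task is terminal. Like the answer, it assumes tasks are never reopened.

```python
import sqlite3

# Sample data from the question; terminal is 1 (true) or 0 (false).
SCHEMA = """
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, employee_id INTEGER, description TEXT);
CREATE TABLE status (id INTEGER PRIMARY KEY, task_id INTEGER, time TEXT,
                     details TEXT, terminal INTEGER);
INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Jane'), (4, 'Jack');
INSERT INTO task VALUES
  (1, 1, 'Fix bug'), (2, 1, 'Implement feature'), (3, 1, 'Deploy master'),
  (4, 2, 'Integrate feature'), (5, 2, 'Fix cosmetic issues');
INSERT INTO status VALUES
  (1, 1, '12:00', 'Assigned', 0), (2, 1, '12:30', 'Started', 0),
  (3, 1, '13:00', 'Completed', 1), (4, 2, '12:10', 'Assigned', 0),
  (5, 2, '14:00', 'Started', 0), (6, 3, '12:15', 'Assigned', 0),
  (7, 4, '12:20', 'Assigned', 0), (8, 5, '12:25', 'Assigned', 0),
  (9, 4, '12:30', 'Started', 0);
"""

# One row per task of the given employee: 1 if any status row is terminal.
PER_TASK = """
SELECT t.description, MAX(s.terminal) AS completed
FROM employee AS e
JOIN task AS t ON t.employee_id = e.id
JOIN status AS s ON s.task_id = t.id
WHERE e.id = ?
GROUP BY t.id, t.description
ORDER BY t.id;
"""

def task_completion(employee_id):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn.execute(PER_TASK, (employee_id,)).fetchall()

print(task_completion(1))
# [('Fix bug', 1), ('Implement feature', 0), ('Deploy master', 0)]
```

For Alice this reproduces the Task/Completed table from the question.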

Hope this helps
select T.employee_id, T.description, S.Terminal
from Employee E
INNER JOIN Task T ON E.id=T.employee_id
INNER JOIN (Select task_id, max(id) as status_id FROM Status GROUP BY task_id) as ST on T.id=ST.task_id
INNER JOIN Status S on S.id=ST.status_id
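A sketch of this "latest status per task" join run in SQLite (an assumption) on the question's data; the ORDER BY is added only to make the output deterministic. The MAX(id) derived table picks each task's newest status row, which is then joined back to read its terminal flag.

```python
import sqlite3

# Sample data from the question; terminal is 1 (true) or 0 (false).
SCHEMA = """
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, employee_id INTEGER, description TEXT);
CREATE TABLE status (id INTEGER PRIMARY KEY, task_id INTEGER, time TEXT,
                     details TEXT, terminal INTEGER);
INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Jane'), (4, 'Jack');
INSERT INTO task VALUES
  (1, 1, 'Fix bug'), (2, 1, 'Implement feature'), (3, 1, 'Deploy master'),
  (4, 2, 'Integrate feature'), (5, 2, 'Fix cosmetic issues');
INSERT INTO status VALUES
  (1, 1, '12:00', 'Assigned', 0), (2, 1, '12:30', 'Started', 0),
  (3, 1, '13:00', 'Completed', 1), (4, 2, '12:10', 'Assigned', 0),
  (5, 2, '14:00', 'Started', 0), (6, 3, '12:15', 'Assigned', 0),
  (7, 4, '12:20', 'Assigned', 0), (8, 5, '12:25', 'Assigned', 0),
  (9, 4, '12:30', 'Started', 0);
"""

LATEST_STATUS = """
SELECT T.employee_id, T.description, S.terminal
FROM employee AS E
JOIN task AS T ON E.id = T.employee_id
-- newest status row id for every task
JOIN (SELECT task_id, MAX(id) AS status_id
      FROM status
      GROUP BY task_id) AS ST ON T.id = ST.task_id
JOIN status AS S ON S.id = ST.status_id
ORDER BY T.id;
"""

def latest_statuses():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn.execute(LATEST_STATUS).fetchall()

for row in latest_statuses():
    print(row)
```

Because only the newest row per task is read, this version also handles reopened tasks.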

I hope this helps:
select E.Name, T.id as [Task Id], T.description, S.Terminal from Employee E
inner join Task T on E.id=T.employee_id
inner join Status S on S.task_id=T.id
where e.id not in (select employee_id from Task where id in (select task_id from Status where Terminal='true' and details='Completed') )

Related

Returning singular row/value from joined table date based on closest date

I have a Production table and a Standing Data table. The relationship of Production to Standing Data is actually many-to-many, which is different from how this relationship is usually represented (many-to-one).
The Standing Data table holds a list of tasks and the score each task is worth. A task can appear multiple times with different "ValidFrom" dates (the DateActiveFrom column), so that its score can change at different points in time. What I am trying to do is query the Production table so that each TaskID is looked up in the Standing Data table, using the date the record was logged to decide which score to return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I tried the code below, but because multiple standing-data records meet the criteria, each production record is duplicated with two different scores:
SELECT p.[RecordID],
p.[Date],
p.[EmpID],
p.[Reference],
p.[TaskID],
s.[Score]
FROM ProductionTable as p
LEFT JOIN StandingDataTable as s
ON s.[TaskID] = p.[TaskID]
AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return a single (scalar) Score value for each record, based on its date?
You can use APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable as p OUTER APPLY
( SELECT TOP (1) s.[Score]
FROM StandingDataTable AS s
WHERE s.[TaskID] = p.[TaskID] AND
s.[DateActiveFrom] <= p.[Date]
ORDER BY s.[DateActiveFrom] DESC
) s;
If you want the score matched at record level instead, change the WHERE clause inside the APPLY.
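SQLite has no OUTER APPLY, so this sketch swaps in an equivalent correlated scalar subquery (`ORDER BY ... DESC LIMIT 1`) to show the same "latest standing-data row on or before the logged date" lookup. Table and column names follow the question; the dates are rewritten as ISO-8601 text (an assumption), so plain string comparison orders them, even the question's 30 and 31 February, which SQLite does not validate.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ProductionTable (RecordID INTEGER PRIMARY KEY, Date TEXT,
                              EmpID INTEGER, Reference INTEGER, TaskID INTEGER);
CREATE TABLE StandingDataTable (RecordID INTEGER PRIMARY KEY, TaskID INTEGER,
                                DateActiveFrom TEXT, Score REAL);
INSERT INTO ProductionTable VALUES
  (1, '2020-02-27', 1, 123, 1), (2, '2020-02-27', 1, 123, 1),
  (3, '2020-02-30', 1, 123, 1), (4, '2020-02-31', 1, 123, 1);
INSERT INTO StandingDataTable VALUES
  (1, 1, '2020-02-01', 1.5), (2, 1, '2020-02-28', 2.0);
"""

# For each production row, take the Score of the most recent standing-data
# row for the same task whose DateActiveFrom is on or before the logged date.
SCORED = """
SELECT p.RecordID,
       (SELECT s.Score
        FROM StandingDataTable AS s
        WHERE s.TaskID = p.TaskID
          AND s.DateActiveFrom <= p.Date
        ORDER BY s.DateActiveFrom DESC
        LIMIT 1) AS Score
FROM ProductionTable AS p
ORDER BY p.RecordID;
"""

def scored_rows():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn.execute(SCORED).fetchall()

print(scored_rows())  # [(1, 1.5), (2, 1.5), (3, 2.0), (4, 2.0)]
```

Unlike the plain LEFT JOIN, the subquery returns at most one value per production row, and NULL when no standing-data row applies, which matches OUTER APPLY's behaviour.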

Remove newest redundant row and update timestamp

I'm working with a SQLite database that receives large data dumps on a regular basis from several sources. Unfortunately, those sources aren't intelligent about what they dump, and I end up with a lot of repeated records from one time to the next. I'm looking for a way to remove these repeated records without affecting the records that have legitimately changed from the past dump to this one.
Here's the general structure of the data (_id is the primary key):
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | NULL | Fred | Online | USA |
| 2 | 2016-05-01 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-08 | 2016-05-08 | NULL | Fred | Offline| USA |
| 4 | 2016-05-08 | 2016-05-08 | NULL | Jim | Online | USA |
| 5 | 2016-05-15 | 2016-05-15 | NULL | Fred | Offline| USA |
| 6 | 2016-05-15 | 2016-05-15 | NULL | Jim | Online | USA |
I'd like to be able to reduce this data to something like this:
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | 2016-05-07 | Fred | Online | USA |
| 2 | 2016-05-15 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-15 | 2016-05-08 | NULL | Fred | Offline| USA |
The idea here is that rows 4, 5, and 6 exactly duplicate rows 2 and 3 except for the timestamps (I'd need to compare by all three fields - name, status, location). However, row 3 does not duplicate row 1 (status changed from Online to Offline), so the _dateExpired field is set in row 1, and row 3 becomes the most recent record.
I'm querying this table with something like this:
SELECT * FROM Data WHERE
date(_dateEffective) <= date('now')
AND (_dateExpired IS NULL OR date(_dateExpired) > date('now'))
Is this sort of reduction possible in SQLite?
I am still a beginner to SQL and database design in general, so it's possible that I haven't structured the database in the best way. I'm open to suggestions there as well. I'm going for the ability to query data at a given point in time, for example "what was Jim's status around 2016-05-06?"
Thanks in advance!
Consider using a staging table: the dump file goes into a DumpTable (cleaned out before each dump), and then an INSERT...SELECT query migrates the records to your final table.
The SELECT portion uses a correlated subquery (to calculate the new [_dateExpired] for the rows that need it) and a derived-table subquery (to filter out duplicates according to your criteria). Finally, the LEFT JOIN...NULL check against FinalTable ensures that no duplicate records are appended, assuming [_id] is a unique identifier. Below is the routine:
Clean out DumpTable:
DELETE FROM DumpTable;
Run the dump routine, appending its records into DumpTable.
Append records to FinalTable:
INSERT INTO FinalTable ([_id], [_dateUpdated], [_dateEffective], [_dateExpired],
                        [name], status, location)
SELECT d.[_id], d.[_dateUpdated], d.[_dateEffective],
       -- correlated subquery: the day before the earliest later row for the same name with a different status
       (SELECT Min(date(sub.[_dateEffective], '-1 day'))
          FROM DumpTable sub
         WHERE sub.[name] = d.[name]
           AND sub.[_dateEffective] > d.[_dateEffective]
           AND sub.status <> d.status) AS calcExpired,
       d.name, d.status, d.location
FROM DumpTable d
INNER JOIN
  -- derived table: keep only the first row of each (name, status) combination
  (SELECT Min(DumpTable.[_id]) AS min_id,
          DumpTable.name, DumpTable.status
     FROM DumpTable
    GROUP BY DumpTable.name, DumpTable.status) AS c
  ON (c.name = d.name)
  AND (c.min_id = d.[_id])
  AND (c.status = d.status)
LEFT JOIN FinalTable f
  ON d.[_id] = f.[_id]
WHERE f.[_id] IS NULL;
-- INSERTED RECORDS:
-- _id _dateUpdated _dateEffective _dateExpired name status location
-- 1 2016-05-01 2016-05-01 2016-05-07 Fred Online USA
-- 2 2016-05-01 2016-05-01 Jim Online USA
-- 3 2016-05-08 2016-05-08 Fred Offline USA
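A runnable check of the staging routine in SQLite via Python's sqlite3, loading the question's six rows into DumpTable. In this sketch the SELECT list has a comma after calcExpired, and the correlated subquery refers to the outer alias d (SQLite rejects DumpTable.name once the table is aliased). Running the append twice shows that the LEFT JOIN...NULL guard keeps it idempotent.

```python
import sqlite3

SCHEMA = """
CREATE TABLE DumpTable (_id INTEGER, _dateUpdated TEXT, _dateEffective TEXT,
                        _dateExpired TEXT, name TEXT, status TEXT, location TEXT);
CREATE TABLE FinalTable (_id INTEGER PRIMARY KEY, _dateUpdated TEXT, _dateEffective TEXT,
                         _dateExpired TEXT, name TEXT, status TEXT, location TEXT);
-- The six dump rows from the question.
INSERT INTO DumpTable VALUES
  (1, '2016-05-01', '2016-05-01', NULL, 'Fred', 'Online',  'USA'),
  (2, '2016-05-01', '2016-05-01', NULL, 'Jim',  'Online',  'USA'),
  (3, '2016-05-08', '2016-05-08', NULL, 'Fred', 'Offline', 'USA'),
  (4, '2016-05-08', '2016-05-08', NULL, 'Jim',  'Online',  'USA'),
  (5, '2016-05-15', '2016-05-15', NULL, 'Fred', 'Offline', 'USA'),
  (6, '2016-05-15', '2016-05-15', NULL, 'Jim',  'Online',  'USA');
"""

APPEND = """
INSERT INTO FinalTable (_id, _dateUpdated, _dateEffective, _dateExpired, name, status, location)
SELECT d._id, d._dateUpdated, d._dateEffective,
       (SELECT MIN(date(sub._dateEffective, '-1 day'))
        FROM DumpTable AS sub
        WHERE sub.name = d.name
          AND sub._dateEffective > d._dateEffective
          AND sub.status <> d.status) AS calcExpired,
       d.name, d.status, d.location
FROM DumpTable AS d
INNER JOIN (SELECT MIN(_id) AS min_id, name, status
            FROM DumpTable
            GROUP BY name, status) AS c
        ON c.name = d.name AND c.min_id = d._id AND c.status = d.status
LEFT JOIN FinalTable AS f ON d._id = f._id
WHERE f._id IS NULL;
"""

def run_append(times=1):
    """Load the dump, run the append `times` times, return FinalTable ordered by _id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for _ in range(times):
        conn.execute(APPEND)
    return conn.execute("SELECT * FROM FinalTable ORDER BY _id").fetchall()

for row in run_append():
    print(row)
```

The three rows match the INSERTED RECORDS listing above. Note that the derived table keeps only the first row per (name, status), so a later return to an earlier status (Online, Offline, Online) would not be appended.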
Is this sort of reduction possible in SQLite?
The answer to any "reduction" question in SQL is always Yes. The trick is to find what axes you're reducing along.
Here's a partial solution to illustrate; it gives the first Online date for each name & location.
select min(_dateEffective) as start_date
, name
, location
from Data
where status = 'Online'
group by
name
, location
With an outer join back to the table (on name & location) where the status is 'Offline' and the _dateEffective is greater than start_date, you get your _dateExpired.
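One way to finish that partial solution, sketched in SQLite with the question's sample rows (table name Data from the question): outer-join the first Online date back to later Offline rows for the same name and location, then take the day before the earliest one. Using "the day before the next Offline row" as the expiry is an assumption chosen to match the question's desired output.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Data (_id INTEGER PRIMARY KEY, _dateUpdated TEXT, _dateEffective TEXT,
                   _dateExpired TEXT, name TEXT, status TEXT, location TEXT);
INSERT INTO Data VALUES
  (1, '2016-05-01', '2016-05-01', NULL, 'Fred', 'Online',  'USA'),
  (2, '2016-05-01', '2016-05-01', NULL, 'Jim',  'Online',  'USA'),
  (3, '2016-05-08', '2016-05-08', NULL, 'Fred', 'Offline', 'USA'),
  (4, '2016-05-08', '2016-05-08', NULL, 'Jim',  'Online',  'USA'),
  (5, '2016-05-15', '2016-05-15', NULL, 'Fred', 'Offline', 'USA'),
  (6, '2016-05-15', '2016-05-15', NULL, 'Jim',  'Online',  'USA');
"""

EXPIRY = """
SELECT o.name, o.location, o.start_date,
       -- day before the first later Offline row; NULL when there is none
       date(MIN(x._dateEffective), '-1 day') AS expired
FROM (SELECT MIN(_dateEffective) AS start_date, name, location
      FROM Data
      WHERE status = 'Online'
      GROUP BY name, location) AS o
LEFT JOIN Data AS x
       ON x.name = o.name
      AND x.location = o.location
      AND x.status = 'Offline'
      AND x._dateEffective > o.start_date
GROUP BY o.name, o.location, o.start_date
ORDER BY o.name;
"""

def online_periods():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn.execute(EXPIRY).fetchall()

print(online_periods())
# [('Fred', 'USA', '2016-05-01', '2016-05-07'), ('Jim', 'USA', '2016-05-01', None)]
```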
_id is the primary key
There is a commonly held misunderstanding that every table needs some kind of sequential "ID" number as a primary key. The key you really care about is known as a natural key: one or more columns in the data that uniquely identify each row. In your case, it looks to me like that's _dateEffective, name, status, and location. At the very least, declare that combination unique to prevent accidental duplication.
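A minimal sketch of declaring that natural key unique in SQLite; the trimmed-down Data table is hypothetical. A second insert of the same (_dateEffective, name, status, location) combination is rejected with an IntegrityError instead of silently creating a duplicate.

```python
import sqlite3

def make_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Data (
          _dateEffective TEXT NOT NULL,
          name           TEXT NOT NULL,
          status         TEXT NOT NULL,
          location       TEXT NOT NULL,
          _dateExpired   TEXT,
          UNIQUE (_dateEffective, name, status, location)  -- the natural key
        )""")
    return conn

def try_insert(conn, row):
    """Insert (date, name, status, location); return False if the key already exists."""
    try:
        conn.execute(
            "INSERT INTO Data (_dateEffective, name, status, location) VALUES (?, ?, ?, ?)",
            row)
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_table()
print(try_insert(conn, ("2016-05-01", "Fred", "Online", "USA")))  # True
print(try_insert(conn, ("2016-05-01", "Fred", "Online", "USA")))  # False
```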

Access SQL Select rows that have value in common with results of condition

Let's say I have a simple table, with the following format:
==================================
| ID | Invoice | Box | Delivered |
==================================
| 1 | AA11 | 1 | True |
----------------------------------
| 2 | AA11 | 2 | False |
----------------------------------
| 3 | AA22 | 1 | False |
----------------------------------
| 4 | AA33 | 1 | False |
----------------------------------
| 5 | AA44 | 1 | True |
----------------------------------
ID is a unique integer, Invoice is a TEXT field, Box is an Integer, and Delivered is a boolean (or BIT, as it's known in Access).
A query like this gets a list of everything that has been delivered:
SELECT * FROM Deliveries WHERE Delivered = True
However, each invoice can have multiple boxes (as is the case with invoice AA11), and sometimes not all the boxes are delivered at the same time. If a box has been delivered, I would like to be able to get the status of the other boxes with the same invoice number.
I know I can do this with multiple queries: the one mentioned above, and then another that loops through all the returned results and runs another select with Invoice = ####.
Is there a way to do this all in a single query? I think it might be WHERE EXISTS, but I can't work out how to structure the query.
Ideally, a single query against the table above would return the rows with IDs 1, 2, and 5. This is the output I am looking for:
==================================
| ID | Invoice | Box | Delivered |
==================================
| 1 | AA11 | 1 | True |
----------------------------------
| 2 | AA11 | 2 | False |
----------------------------------
| 5 | AA44 | 1 | True |
----------------------------------
So even though Delivered = False for ID 2, it is still returned, because another box with the same invoice number has Delivered = True.
When trying out some queries I got the error
Cannot join on memo, ole or hyperlink Object
Assuming you want something like this
select * from invoice where invoice in (SELECT invoice FROM invoice WHERE Delivered = True)
The nested query selects the invoice numbers that have at least one delivered box, and the parent query uses that output to filter its results.
You already got it to work, but here is another way, without changing the table.
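A sketch that runs this IN subquery, plus the WHERE EXISTS form the question guessed at, in SQLite on the question's sample rows. It is a translation, not Access itself: SQLite stores the boolean as 1/0, whereas Access uses -1 for True.

```python
import sqlite3

SCHEMA = """
CREATE TABLE invoice (ID INTEGER PRIMARY KEY, Invoice TEXT, Box INTEGER, Delivered INTEGER);
INSERT INTO invoice VALUES
  (1, 'AA11', 1, 1), (2, 'AA11', 2, 0), (3, 'AA22', 1, 0),
  (4, 'AA33', 1, 0), (5, 'AA44', 1, 1);
"""

# Nested query: invoices with at least one delivered box; outer query: all their boxes.
IN_QUERY = """
SELECT ID FROM invoice
WHERE Invoice IN (SELECT Invoice FROM invoice WHERE Delivered = 1)
ORDER BY ID;
"""

# Same result with a correlated WHERE EXISTS.
EXISTS_QUERY = """
SELECT i.ID FROM invoice AS i
WHERE EXISTS (SELECT 1 FROM invoice AS d
              WHERE d.Invoice = i.Invoice AND d.Delivered = 1)
ORDER BY i.ID;
"""

def ids(query):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return [row[0] for row in conn.execute(query)]

print(ids(IN_QUERY))      # [1, 2, 5]
print(ids(EXISTS_QUERY))  # [1, 2, 5]
```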
SELECT invoice.ID, invoice.Invoice, invoice.Box, invoice.Delivered, invoice_1.Delivered AS Expr1
FROM invoice, invoice AS invoice_1
WHERE (((invoice.Invoice)=[invoice_1].[Invoice]) AND (([invoice_1].[Delivered])=Yes));
For those getting into the same problem there is an explanation and a solution here
Apologies if I messed up the Access syntax:
select Invoice
from T
group by Invoice
having sum(iif(Delivered, 1, 0)) < count(*)
Call the above query IC (incomplete invoices). You may want to see all the undelivered rows of those invoices along with their ids:
select * from T
where Invoice in (<IC>) and Delivered = false
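A sketch of the IC idea in SQLite on the question's sample rows, grouping by Invoice so that an "incomplete" invoice is one with fewer delivered boxes than boxes in total. Access's iif() is written as CASE here (a translation), and Delivered is stored as 1/0.

```python
import sqlite3

SCHEMA = """
CREATE TABLE T (ID INTEGER PRIMARY KEY, Invoice TEXT, Box INTEGER, Delivered INTEGER);
INSERT INTO T VALUES
  (1, 'AA11', 1, 1), (2, 'AA11', 2, 0), (3, 'AA22', 1, 0),
  (4, 'AA33', 1, 0), (5, 'AA44', 1, 1);
"""

UNDELIVERED_IN_INCOMPLETE = """
SELECT ID FROM T
WHERE Invoice IN (
  -- IC: invoices where fewer boxes are delivered than exist
  SELECT Invoice FROM T
  GROUP BY Invoice
  HAVING SUM(CASE WHEN Delivered = 1 THEN 1 ELSE 0 END) < COUNT(*)
)
AND Delivered = 0
ORDER BY ID;
"""

def undelivered_ids():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return [row[0] for row in conn.execute(UNDELIVERED_IN_INCOMPLETE)]

print(undelivered_ids())  # [2, 3, 4]
```

AA44 is fully delivered, so it drops out; the remaining undelivered boxes belong to AA11, AA22, and AA33.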

Displaying a pair that have same value in another table

I'm trying to make a query that pairs workers who work at the same place. The relational model I'm asking about looks like this:
Employee(EmNum, name)
Work(FiNum*, EmNum*)
Field(FiNum, Title)
(bold indicates primary key)
Right now my code looks like this:
SELECT work.finum, e1.name,e1.emnum,e2.name,e2.emnum
FROM employee e1
INNER JOIN employee e2
on e1.EmNum = e2.EmNum
INNER JOIN work
on e1.emnum = work.emnum
This gives me a result like:
| finum | name | emnum | name_1 | emnum_1 |
| 1 | a | 1 | a | 1 |
| 1 | b | 2 | b | 2 |
| 2 | c | 3 | c | 3 |
| 3 | d | 4 | d | 4 |
| 3 | e | 5 | e | 5 |
while I want the result to look like this:
| finum | name | emnum | name_1 | emnum_1 |
| 1 | a | 1 | b | 2 |
| 1 | b | 2 | a | 1 |
| 3 | d | 4 | e | 5 |
| 3 | e | 5 | d | 4 |
I'm quite new to SQL, so I can't really think of a way to do this. Any help or input would be appreciated.
Thanks
Your question is slightly unclear, but my guess is that you're trying to find employees who worked at the same place (the same finum in work) but in a different row. You can do that this way:
SELECT w1.finum, e1.name,e1.emnum, e2.name,e2.emnum
from work w1
join work w2 on w1.finum = w2.finum and w1.emnum != w2.emnum
join employee e1 on e1.emnum = w1.emnum
join employee e2 on e2.emnum = w2.emnum
If you don't want each pair repeated (1 <-> 2 and 2 <-> 1), change the != in the join to > or <.
I'm trying to make a query that pairs workers who work at the same place.
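A sketch of this self-join in SQLite. The work rows are reconstructed from the finum/emnum pairs in the asker's first result table (an assumption about the underlying data); the op argument switches between != (both orderings) and < (each pair once), and only those two operators are accepted so the f-string cannot inject arbitrary SQL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employee (emnum INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE work (finum INTEGER, emnum INTEGER);
INSERT INTO employee VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
INSERT INTO work VALUES (1, 1), (1, 2), (2, 3), (3, 4), (3, 5);
"""

def pairs(op="!="):
    """Pair employees sharing a finum; '!=' keeps both orderings, '<' keeps one."""
    if op not in ("!=", "<"):
        raise ValueError("op must be '!=' or '<'")
    query = f"""
    SELECT w1.finum, e1.name, e1.emnum, e2.name, e2.emnum
    FROM work AS w1
    JOIN work AS w2 ON w1.finum = w2.finum AND w1.emnum {op} w2.emnum
    JOIN employee AS e1 ON e1.emnum = w1.emnum
    JOIN employee AS e2 ON e2.emnum = w2.emnum
    ORDER BY w1.finum, e1.emnum, e2.emnum;
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn.execute(query).fetchall()

print(pairs())     # both directions, as in the desired output
print(pairs("<"))  # [(1, 'a', 1, 'b', 2), (3, 'd', 4, 'e', 5)]
```

Employee c is alone on field 2, so no pair is produced for it.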
Presumably the "places" are represented by the Field table. If you want to pair up employees on that basis then you should be performing a join conditioned on field numbers being the same, as opposed to one conditioned on employee numbers being the same.
It looks like your main join wants to be a self-join of Work to Work on records with matching FiNum. To get the employee names in the result, you will also need to join Employee twice. To avoid employees being paired with themselves, you will want to filter those cases out via a WHERE clause.

inner join shows same table for comments

I need to grab the comments that a user started and which were answered by him.
I'm trying the inner join below, but it repeats the results.
I need to show the comment the user started together with its answers, including the ones he replied with himself.
select *
from comments as comment
join comments as parent
on comment.user_parent_id = parent.user_id
where comment.member = 123
user_id | user_fname | user_lname | user_parent_id | member
1 | test 1 | xx | 1 | 123
2 | test 2 | xx | |
3 | test 3 | xx | |
4 | test 4 | xx | 1 | 123
You can make a recursive SQL function, use a hierarchical index (for example SQL Server's hierarchyid type), or write a loop "while it has a parent".
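A sketch of the recursive approach using a recursive CTE (SQLite's WITH RECURSIVE), which is one way to do it in plain SQL rather than a stored function. It assumes user_parent_id holds the user_id of the comment being replied to and that the blank cells in the question are NULL; both are interpretations of the question's table. UNION (not UNION ALL) discards rows already seen, so row 1's self-reference cannot loop forever.

```python
import sqlite3

SCHEMA = """
CREATE TABLE comments (user_id INTEGER PRIMARY KEY, user_fname TEXT, user_lname TEXT,
                       user_parent_id INTEGER, member INTEGER);
INSERT INTO comments VALUES
  (1, 'test 1', 'xx', 1, 123), (2, 'test 2', 'xx', NULL, NULL),
  (3, 'test 3', 'xx', NULL, NULL), (4, 'test 4', 'xx', 1, 123);
"""

THREAD = """
WITH RECURSIVE thread(user_id) AS (
  SELECT user_id FROM comments WHERE user_id = ?
  UNION
  -- add every comment whose parent is already in the thread
  SELECT c.user_id
  FROM comments AS c
  JOIN thread AS t ON c.user_parent_id = t.user_id
)
SELECT user_id FROM thread ORDER BY user_id;
"""

def thread_ids(root_id):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return [row[0] for row in conn.execute(THREAD, (root_id,))]

print(thread_ids(1))  # [1, 4]
```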