SQLite query WHERE with OUTER JOIN - sql

I am a bit rusty with my SQL and am running into a little issue with a query. In our application we have two tables relevant to this problem: entries, and for each entry, N process steps.
We are trying to optimize our querying, so instead of asking for all entries all the time, we just ask for entries that were updated after we last checked. There can be a lot of steps, so this query is just supposed to return the entries and some step summary data, and we can separately query for steps if needed.
The entry start time and updated time are calculated from the first and most recent process step time respectively. We also have to group together entry statuses.
Here's the query as we build it in python, since it seems easier to read:
statement = 'SELECT e.serial_number, ' + \
    'e.description, ' + \
    'min(p.start_time) begin_time, ' + \
    'group_concat(p.status) status, ' + \
    'max(p.last_updated) last_updated ' + \
    'FROM entries e ' + \
    'LEFT OUTER JOIN process_steps p ON e.serial_number = p.serial_number'
# if the user provides a "since" date, only return entries updated after
# that date
if since is not None:
    statement += ' WHERE last_updated > "{0}"'.format(since)
statement += ' GROUP BY e.serial_number'
The issue we are having is that if we apply that WHERE clause, it filters the process steps too. So for example if we have this situation with two entries:
Entry: 123 foo
Steps:
1. start time 10:00, updated 10:30, status completed
2. start time 11:00, updated 11:30, status completed
3. start time 12:00, updated 12:30, status failed
4. start time 13:00, updated 13:30, status in_progress
Entry: 321 bar
Steps:
1. start time 01:00, updated 01:30, status completed
2. start time 02:00, updated 02:30, status completed
If we query without the where, we would get all entries. So for this case it would return:
321, bar, 01:00, "completed,completed", 02:30
123, foo, 10:00, "completed,completed,failed,in_progress", 13:30
If I pass a "since" time of 12:15, it returns only this:
123, foo, 12:00, "failed,in_progress", 13:30
In that result, the start time comes from step 3, and the statuses are only from steps 3 and 4. What I'm looking for is the whole entry:
123, foo, 10:00, "completed,completed,failed,in_progress", 13:30
So basically, I want to filter the final results based on that last_updated value, but it is currently filtering the join results as well, which throws off the begin_time, last_updated and status values since they are calculated with a partial set of steps. Any ideas how to modify the query to get what I want here?
Edit:
It seems like there might be some naming issues here too. The names I used in the example code are equal to or similar to what we actually have in our code. If we change max(p.last_updated) last_updated to max(p.last_updated) max_last_updated, and change the WHERE clause to use max_last_updated as well, we get OperationalError: misuse of aggregate: max(). We have also tried adding AS keywords in there, with no difference.

Create a subquery that selects updated processes first:
SELECT whatever you need FROM entries e
LEFT OUTER JOIN process_steps p ON e.serial_number = p.serial_number
WHERE e.serial_number in (SELECT distinct serial_number from process_steps
WHERE last_updated > "date here")
GROUP BY e.serial_number

You can do this with a having clause:
SELECT . . .
FROM entries e LEFT JOIN
process_steps ps
ON e.serial_number = ps.serial_number
GROUP BY e.serial_number
HAVING MAX(ps.last_updated) > <your value here>;
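To make the HAVING suggestion concrete, here is a minimal sketch using Python's sqlite3 module and the sample data from the question. The column types and the fetch_entries helper are assumptions made for illustration, not the asker's real schema or code. Because HAVING is evaluated after grouping, the aggregates still see every step of an entry, which is also why referring to max() from WHERE raised "misuse of aggregate". The "since" value is bound as a parameter rather than formatted into the string.

```python
import sqlite3

# In-memory database seeded with the two entries from the question.
# Column types are assumed; times are kept as zero-padded HH:MM strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (serial_number TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE process_steps (
        serial_number TEXT, start_time TEXT, last_updated TEXT, status TEXT
    );
    INSERT INTO entries VALUES ('123', 'foo'), ('321', 'bar');
    INSERT INTO process_steps VALUES
        ('123', '10:00', '10:30', 'completed'),
        ('123', '11:00', '11:30', 'completed'),
        ('123', '12:00', '12:30', 'failed'),
        ('123', '13:00', '13:30', 'in_progress'),
        ('321', '01:00', '01:30', 'completed'),
        ('321', '02:00', '02:30', 'completed');
""")


def fetch_entries(conn, since=None):
    """Hypothetical helper: one summary row per entry, optionally filtered."""
    statement = (
        "SELECT e.serial_number, e.description, "
        "min(p.start_time) AS begin_time, "
        "group_concat(p.status) AS status, "
        "max(p.last_updated) AS last_updated "
        "FROM entries e "
        "LEFT OUTER JOIN process_steps p ON e.serial_number = p.serial_number "
        "GROUP BY e.serial_number"
    )
    params = ()
    if since is not None:
        # Filter whole groups after aggregation instead of individual steps.
        statement += " HAVING max(p.last_updated) > ?"
        params = (since,)
    return conn.execute(statement, params).fetchall()


rows = fetch_entries(conn, "12:15")
for row in rows:
    print(row)
```

With a "since" of 12:15 only entry 123 comes back, and its begin time and statuses are computed from all four steps. Note that SQLite does not guarantee the order of values inside group_concat.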

Related

Calculate difference between first login and last logout with merging with other table - [postgresql]

I'm trying to get difference in hours between login and logout of the user.
SELECT t_logins.log_id,
t_logins.log_date,
t_logins.log_time, t_logouts.log_time,
(DATE_PART('hour', t_logouts.log_time - t_logins.log_time) * 60 +
DATE_PART('minute', t_logouts.log_time - t_logins.log_time)) / 60 as log_delta
FROM t_logins, t_logouts
ORDER BY t_logins.log_date DESC, t_logouts.log_date DESC
It works fine, but when I try to join the users table (t_users), it returns junk data. How do I join correctly? Different combinations of OUTER JOIN and INNER JOIN do not help.
SELECT t_logins.log_id,
t_users.first_name || ' ' || t_users.last_name,
t_logins.log_date,
t_logins.log_time, t_logouts.log_time,
(DATE_PART('hour', t_logouts.log_time - t_logins.log_time) * 60 +
DATE_PART('minute', t_logouts.log_time - t_logins.log_time)) / 60 as log_delta
FROM t_logins
LEFT JOIN t_logouts
ON t_logouts.log_date = t_logins.log_date
LEFT JOIN t_users
ON t_users.user_id = t_logins.user_id
ORDER BY t_logins.log_date DESC, t_logouts.log_date DESC
Returns:
17 "Alex Smith" "2020-07-17" "13:55:00" "10:30:00" -3.4166666666666665
17 "Alex Smith" "2020-07-17" "13:55:00" "23:02:00" 9.116666666666667
17 "Alex Smith" "2020-07-17" "13:55:00" "14:00:00" 0.08333333333333333
But the table contains only one login and one logout point per user per day (first login and last logout), so the difference should always be zero or positive. These rows must therefore be the result of an incorrect join.
I've tried to give a stricter join rule, but it throws an error (yes, obviously, but I don't know how to join correctly anyway).
How do I get only one row per user per day?
For a complete answer, provide sample data for each table as text, not images. However, your query (the join version) does have an obvious major issue. The condition:
select ...
FROM t_logins
LEFT JOIN t_logouts
ON t_logouts.log_date = t_logins.log_date
...
This joins every t_logouts row with every t_logins row for a given date, regardless of whether they belong to the same user. Because it is a LEFT JOIN, it also returns the t_logins rows for any date that has no t_logouts row. What happens then? Leaving the question of missing t_logouts aside, you need (at a minimum) to ensure that the login user corresponds to the logout user. So:
select ...
FROM t_logins
LEFT JOIN t_logouts
ON ( t_logouts.log_date = t_logins.log_date
and t_logouts.user_id = t_logins.user_id
)
...
In fact, your first query, with "FROM t_logins, t_logouts" and no WHERE clause, does a cross join of every row in t_logins with every row in t_logouts.
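The effect of the extra join condition can be checked on a small data set. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL, with a trimmed-down, assumed version of the two tables (three hypothetical users on one day). It only compares the rows each join produces; it does not reproduce the DATE_PART arithmetic.

```python
import sqlite3

# SQLite stand-in for the PostgreSQL tables; columns are an assumed subset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_logins  (user_id INTEGER, log_date TEXT, log_time TEXT);
    CREATE TABLE t_logouts (user_id INTEGER, log_date TEXT, log_time TEXT);
    INSERT INTO t_logins VALUES
        (1, '2020-07-17', '13:55:00'),
        (2, '2020-07-17', '08:00:00'),
        (3, '2020-07-17', '09:15:00');
    INSERT INTO t_logouts VALUES
        (1, '2020-07-17', '14:00:00'),
        (2, '2020-07-17', '10:30:00'),
        (3, '2020-07-17', '23:02:00');
""")

# Join on the date only: every login pairs with every logout of that day.
DATE_ONLY = """
    SELECT i.user_id, i.log_time, o.log_time
    FROM t_logins i
    LEFT JOIN t_logouts o ON o.log_date = i.log_date
"""
# Join on date and user: each login pairs with that user's own logout.
DATE_AND_USER = DATE_ONLY + " AND o.user_id = i.user_id"

date_only_rows = conn.execute(DATE_ONLY).fetchall()
matched_rows = conn.execute(DATE_AND_USER + " ORDER BY i.user_id").fetchall()

print(len(date_only_rows), "rows when joining on date only")
print(len(matched_rows), "rows when joining on date and user_id")
```

With three users on the same day, the date-only join yields nine rows (three logouts per login, as in the question's output), while the date-and-user join yields one row per user.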

SQL / Access Query - How do I append two records for every one master record? Time Cards

I have a list of records in TBL_WheelHours with the following Schema:
**GUID - Operator 1 - Operator 2 - Data1 - Data2 - Data3 - Data4 Etc.**
I have a set of queries that append all new entries from this table to another table called TBL_CostLog.
What I want to do is create two entries in the cost log that looks as such:
**TableID - GUID - Operator 1 - Data1 etc.**
**TableID - GUID - Operator 2 - Data1 etc.**
And then I want to be able to run an update query using TBL_WheelHours as the master, so that if any information in that table changes, it propagates to the cost log.
I have many other tables and queries doing this exact same thing and it's working beautifully. The difference here, though, is that there are two operators on this machine, and only one record with both names on it.
Any advice or direction I should pursue to do this?
EDIT:
Here is what I have for the other tables where this is not an issue:
APPEND QUERY
INSERT INTO TBL_TimeLog ( Customer, RefNumber, StartTime, StopTime, Multiplier, FromTable, WorkType, [TableID], ProductID, QtySprayed, CoatDesc, Operator_1, Operator_2 )
SELECT TBL_BlastHours.Customer, TBL_BlastHours.[WO #], TBL_BlastHours.[Start Time], TBL_BlastHours.[Stop Time], "1" AS Expr1, "Blast" AS Expr2, "Blast" AS Expr3, TBL_BlastHours.IDLoc, "NA" AS Expr4, 0 AS Expr5, TBL_BlastHours.Booth, TBL_BlastHours.Blaster, "NA" AS Expr6
FROM TBL_BlastHours
LEFT JOIN TBL_TimeLog ON TBL_BlastHours.IDLoc = TBL_TimeLog.TableID
WHERE (((TBL_TimeLog.TableID) Is Null));
UPDATE QUERY
UPDATE TBL_BlastHours
INNER JOIN TBL_TimeLog
ON TBL_BlastHours.IDLoc = TBL_TimeLog.TableID
SET TBL_TimeLog.Customer = [TBL_BlastHours].[Customer], TBL_TimeLog.RefNumber = [TBL_BlastHours].[WO #], TBL_TimeLog.StartTime = [TBL_BlastHours].[Start Time], TBL_TimeLog.StopTime = [TBL_BlastHours].[Stop Time], TBL_TimeLog.CoatDesc = [TBL_BlastHours].[Booth], TBL_TimeLog.Operator_1 = [TBL_BlastHours].[Blaster], TBL_TimeLog.Operator_2 = "NA"
WHERE (((TBL_TimeLog.FromTable)="Blast"));
I think you want union all:
select guid, operator1, data1
from tbl_wheelhours
union all
select guid, operator2, data1
from tbl_wheelhours;
From your description, you may also need a trigger. However, you say you have similar code working for a single record, so union all might be the missing piece.
I found a way to get the results I want. I added two columns, OperatorKey and OperatorFlag.
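As a quick illustration of what union all does to a single master record, here is a sketch using Python's sqlite3 module rather than Access. The table is reduced to four assumed columns and the sample row is made up.

```python
import sqlite3

# One master record carrying both operator names (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_wheelhours (
        guid TEXT, operator1 TEXT, operator2 TEXT, data1 TEXT
    );
    INSERT INTO tbl_wheelhours VALUES ('g-1', 'Ann', 'Bob', 'shift A');
""")

# Each branch of the UNION ALL emits the record once, with a different operator.
rows = conn.execute("""
    SELECT guid, operator1 AS operator, data1 FROM tbl_wheelhours
    UNION ALL
    SELECT guid, operator2, data1 FROM tbl_wheelhours
""").fetchall()

for row in rows:
    print(row)
```

The one master row comes back as two rows that share the GUID and data but carry different operators, which is the shape the cost log needs.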
Append new entries as I had been doing, with all new entries having "1" in the operator flag. I then append a second set of entries with the second operator for all entries that have "1" with the operator flag.
I then run an update query that changes all operator flags to "0".
I create a unique Operator Key for each entry, and then I can run two update queries with the operator key and the GUID key and update the entries from the master list.
Seems to be working great for now.

Where clause with dates in hive

The WHERE clause in the Hive query below is not working:
select
e.num as badge
from dbo.events as e
where TO_DATE(e.event_time_utc) > TO_DATE(select event_date from DL_EDGE_LRF_facilities.card_swipes_lastpulldate)
Both the event_time_utc and event_date fields are defined as strings. event_time_utc has timestamp values like '2017-09-18 20:10:19.000000', and event_date has only one date value, like '2018-01-25'.
I am getting an error like "cannot recognize input near 'select' 'event_date' 'from' in function specification" when I run the query. Please help.
@user86683: Hive does not recognize the syntax because it does not allow a subquery in an inequality condition (>). You may try this query and let me know the result.
select e.num as badge
from dbo.events as e, DL_EDGE_LRF_facilities.card_swipes_lastpulldate c
where TO_DATE(e.event_time_utc) > TO_DATE(c.event_date)
You will get a warning but you may ignore it since the table for event_date has only one record.
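The reason the cross product is harmless here can be seen on a toy data set. The sketch below is not Hive: it uses Python's sqlite3 module, drops the database prefixes from the table names, uses made-up badge numbers, and takes substr(..., 1, 10) as an assumed stand-in for TO_DATE on these ISO-formatted strings.

```python
import sqlite3

# Toy versions of the two tables; the second one holds exactly one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (num INTEGER, event_time_utc TEXT);
    CREATE TABLE card_swipes_lastpulldate (event_date TEXT);
    INSERT INTO events VALUES
        (101, '2017-09-18 20:10:19.000000'),
        (102, '2018-01-25 23:59:59.000000'),
        (103, '2018-01-26 07:30:00.000000');
    INSERT INTO card_swipes_lastpulldate VALUES ('2018-01-25');
""")

# Cross join against the single-row table, then compare the date parts.
# ISO dates (YYYY-MM-DD) sort correctly as plain strings.
badges = [row[0] for row in conn.execute("""
    SELECT e.num AS badge
    FROM events AS e, card_swipes_lastpulldate AS c
    WHERE substr(e.event_time_utc, 1, 10) > c.event_date
""")]

print(badges)
```

Because the second table has a single row, the cross join does not multiply the event rows; it only makes the pull date available to the WHERE clause. If that table ever held more than one row, each event could match several times.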
Warning: Map Join MAPJOIN[10][bigTable=e] in task 'Map 1' is a cross product
Query ID = xxx_20180201102128_aaabb2235-ee69275cbec1
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_09fdf345)
Hope this helps. Thanks.

Unable to get correct EndTime for the next Stage

I am trying to find out how long an activity has been "InProgress" based on the history data I have. Each history record contains the StartTime and the "Stage" of an activity.
Stages flow like this:
Ready
InProgress
Completed
There is also a stage named "OnHold", which puts an activity on hold. While calculating how long an activity has been "InProgress", I need to subtract the amount of time it was "OnHold".
In the given example you will see that the activity named "MA50665" went "InProgress" at "2014-07-17 13:08:04.013" and was then put on hold at "2014-07-17 13:12:14.473", which is about 4 minutes. Then it went "InProgress" again at "2014-07-17 13:22:45.503" and was completed at "2014-07-17 13:33:38.513", which is about 11 minutes. This means MA50665 was InProgress for about 11+4=15 minutes.
I have a query which gets me close to what I am looking for. It gives me two records for "MA50665", which is what I expect, but the EndTime for both records comes out as "2014-07-17 13:33:38.513", which is incorrect.
For start time "2014-07-17 13:08:04.013", EndTime should have been "2014-07-17 13:12:14.473", because that is when the "InProgress" stage ends. For the second row, StartTime and EndTime are correct.
How do I tell the query to get the end time for the stage from the next history row of that activity? I cannot hard code "+1" in the join.
Here is the SQLFiddle for the table schema and query: http://sqlfiddle.com/#!3/37ef3/4
I think I'm seeing a duplicate row in your example that you say works but has the "+1" in it. Records 5 and 6 seem to be the same but have different end times. Assuming that you are correct, here is a fix for the query.
SELECT ROW_NUMBER()OVER(ORDER BY T1.Seqid, T1.Historyid)Rnumber,
T1.Srid,
T1.Activityid,
T1.Seqid,
T1.Currentactstatus,
T1.Previousactstatus,
T1.Timechanged Statusstarttime,
Endtimehist.Timechanged Statusendtime
FROM T1
LEFT JOIN T1 Endtimehist ON T1.Srid = Endtimehist.Srid
AND T1.Activityid = Endtimehist.Activityid
AND T1.Currentactstatus = Endtimehist.Previousactstatus
WHERE T1.SRId = 'SR50660'
AND T1.Currentactstatus = 'InProgress'
AND T1.Previousactstatus = 'Ready'
AND T1.Historyid < Endtimehist.Historyid --This works but i cannot hard code +1 here as history ids may appear in the random incrementing order
ORDER BY T1.SRId DESC, T1.SeqId, T1.HistoryId
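A different way to express "the next history row" without relying on consecutive ids is a correlated subquery that picks the earliest Timechanged among later rows of the same activity. This is a sketch of that alternative, not the query above: it uses Python's sqlite3 module instead of SQL Server, keeps only four assumed columns, uses the timestamps from the question with made-up, non-consecutive Historyid values, and assumes Timechanged increases with Historyid.

```python
import sqlite3

# Trimmed-down T1 with gaps in Historyid (the id values are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (
        Historyid INTEGER, Activityid TEXT,
        Currentactstatus TEXT, Timechanged TEXT
    );
    INSERT INTO T1 VALUES
        (10, 'MA50665', 'InProgress', '2014-07-17 13:08:04.013'),
        (13, 'MA50665', 'OnHold',     '2014-07-17 13:12:14.473'),
        (17, 'MA50665', 'InProgress', '2014-07-17 13:22:45.503'),
        (22, 'MA50665', 'Completed',  '2014-07-17 13:33:38.513');
""")

# For each InProgress row, the end time is the earliest change recorded
# by any later history row of the same activity.
rows = conn.execute("""
    SELECT h.Timechanged AS Statusstarttime,
           (SELECT MIN(n.Timechanged)
              FROM T1 n
             WHERE n.Activityid = h.Activityid
               AND n.Historyid > h.Historyid) AS Statusendtime
    FROM T1 h
    WHERE h.Currentactstatus = 'InProgress'
    ORDER BY h.Historyid
""").fetchall()

for start, end in rows:
    print(start, "->", end)
```

On this data the first InProgress period ends at 13:12:14.473 (when the activity goes on hold) and the second at 13:33:38.513, which matches the end times the question asks for.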

Calculating relay-race times via a recurrence (in dynamic-SQL)

I have a database with a table of records storing timestamps between a series of interim-transactions and a completed transaction.
It's stored in a very odd way in the database, which is causing me problems.
Let's exemplify this as a relay-race. This is how the data's recorded.
RACE TIME RUNNER FINISHTIME
1 2011-09-28 11:27:01.437 1 2011-09-28 17:19:00.843
1 2011-09-28 12:35:33.427 2 2011-09-28 17:19:00.843
1 2011-09-28 12:36:15.270 3 2011-09-28 17:19:00.843
The "Time" indicates when the baton was passed and the last runner had finished.
So the math behind an individual runner's time is:
Time(Runner_n) = Time(Runner_n+1) - Time (Runner_n)
Except for the finishing runner, where there is no n+1 recorded. They get:
Time(Runner_final) = FinishTime - Time(final)
I was going to attempt making a new table and iterating through each race with a cursor-- to try and store tuples of: Race, RunnerID, TimeCompleted.
This doesn't require dynamic SQL at all, just a join. Join to the next record. If it exists, then you use the time from that record. Otherwise, use the finish time for the race:
select t.race, t.runner, t.time as starttime,
coalesce(tnext.time, t.finishtime) as endtime,
DATEDIFF(second, t.time, coalesce(tnext.time, t.finishtime)) as Seconds
from t left outer join
t tnext
on t.race = tnext.race and
t.runner = tnext.runner - 1
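The self-join can be tried on the three rows from the question. The sketch below runs it with Python's sqlite3 module, so the DATEDIFF step is replaced by a subtraction done in Python; the column types are assumptions, and it relies on runner numbers being consecutive within a race, as the join does.

```python
import sqlite3
from datetime import datetime

# The three sample rows from the question, stored as text timestamps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (race INTEGER, time TEXT, runner INTEGER, finishtime TEXT);
    INSERT INTO t VALUES
        (1, '2011-09-28 11:27:01.437', 1, '2011-09-28 17:19:00.843'),
        (1, '2011-09-28 12:35:33.427', 2, '2011-09-28 17:19:00.843'),
        (1, '2011-09-28 12:36:15.270', 3, '2011-09-28 17:19:00.843');
""")

# Join each runner to the next one; the last runner has no match, so
# COALESCE falls back to the race finish time.
rows = conn.execute("""
    SELECT t.race, t.runner, t.time AS starttime,
           COALESCE(tnext.time, t.finishtime) AS endtime
    FROM t
    LEFT OUTER JOIN t AS tnext
      ON t.race = tnext.race AND t.runner = tnext.runner - 1
    ORDER BY t.race, t.runner
""").fetchall()

# Duration of each leg in seconds, computed client-side.
FMT = "%Y-%m-%d %H:%M:%S.%f"
leg_seconds = {
    runner: (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()
    for _race, runner, start, end in rows
}

for runner, seconds in leg_seconds.items():
    print(runner, seconds)
```

Runners 1 and 2 end when the next runner starts, and runner 3 ends at the finish time, so no cursor or dynamic SQL is needed.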