Query to get the duration and details from a table - sql

I have a scenario and I'm not quite sure how to query it. As a sample, I have the following table structure and want to get the history of actions for a bus:
ID-----TIME---------BUSID----OPID----MOVING----STOPPED----PARKED----COUNT
1------10:10:10-----101------1101-----1---------0----------0---------15
2------10:10:11-----102------1102-----0---------1----------0---------5
3------10:11:10-----101------1101-----1---------0----------0---------15
4------10:12:10-----101------1101-----0---------1----------0---------15
5------10:13:10-----101------1101-----1---------0----------0---------19
6------10:14:10-----101------1101-----1---------0----------0---------19
7------10:15:10-----101------1101-----0---------1----------0---------19
8------10:16:10-----101------1101-----0---------0----------1---------0
9------10:17:10-----101------1101-----0---------0----------1---------0
I want to write a query that returns the status history of a bus, with the start time and duration (in minutes) of each period, like:
BUSID----OPID----STATUS-----TIME---------DURATION---COUNT
101------1101----MOVING-----10:10:10-----2-----------15
101------1101----STOPPED----10:12:10-----1-----------15
101------1101----MOVING-----10:13:10-----2-----------19
101------1101----STOPPED----10:15:10-----1-----------19
101------1101----PARKED-----10:16:10-----2-----------0
I am using SQL Server 2008.
Thanks for your help.

You can use Common Table Expressions to calculate the duration between the different rows.
WITH cte_log AS
(
    SELECT
        ROW_NUMBER() OVER (ORDER BY [time] DESC) AS row_id,
        [time], busid, opid, moving, stopped, parked, [count]
    FROM log_table
    WHERE busid = 101
)
SELECT
    current_rows.busid,
    current_rows.opid,
    current_rows.[time],
    -- rows are numbered newest first, so row_id + 1 is the previous (earlier) record
    DATEDIFF(second, previous_rows.[time], current_rows.[time]) AS duration,
    current_rows.[count]
FROM cte_log AS current_rows
LEFT OUTER JOIN cte_log AS previous_rows
    ON current_rows.row_id + 1 = previous_rows.row_id
ORDER BY current_rows.[time] DESC;
The WITH statement creates a temporary result set that exists only within the execution scope of this query. We use it to fetch the previous record for each row and to calculate the time difference between the current record and the previous one. This gives the gap between consecutive rows. It does not yet merge consecutive rows with the same status into a single period.
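You can try the self-join idea without a SQL Server instance. Here is a minimal sketch using Python's built-in sqlite3 module. It assumes an SQLite build of 3.25 or later, which ROW_NUMBER requires. SQLite has no DATEDIFF, so the sketch uses strftime('%s', ...) to get seconds instead. The table and column names follow the question; make_db and seconds_since_previous are hypothetical helpers.

```python
import sqlite3

# Sample data from the question; "time" and "count" are quoted because
# they double as SQL type/function names.
SAMPLE = [
    (1, "10:10:10", 101, 1101, 1, 0, 0, 15),
    (2, "10:10:11", 102, 1102, 0, 1, 0, 5),
    (3, "10:11:10", 101, 1101, 1, 0, 0, 15),
    (4, "10:12:10", 101, 1101, 0, 1, 0, 15),
    (5, "10:13:10", 101, 1101, 1, 0, 0, 19),
    (6, "10:14:10", 101, 1101, 1, 0, 0, 19),
    (7, "10:15:10", 101, 1101, 0, 1, 0, 19),
    (8, "10:16:10", 101, 1101, 0, 0, 1, 0),
    (9, "10:17:10", 101, 1101, 0, 0, 1, 0),
]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE log_table (id INTEGER, "time" TEXT, busid INTEGER, '
                 'opid INTEGER, moving INTEGER, stopped INTEGER, parked INTEGER, "count" INTEGER)')
    conn.executemany("INSERT INTO log_table VALUES (?, ?, ?, ?, ?, ?, ?, ?)", SAMPLE)
    return conn

QUERY = """
WITH cte_log AS (
    SELECT ROW_NUMBER() OVER (ORDER BY "time" DESC) AS row_id, "time"
    FROM log_table
    WHERE busid = ?
)
SELECT cur."time",
       -- seconds between this record and the one before it (NULL for the oldest)
       CAST(strftime('%s', cur."time") AS INTEGER)
         - CAST(strftime('%s', prev."time") AS INTEGER) AS duration
FROM cte_log AS cur
LEFT JOIN cte_log AS prev ON cur.row_id + 1 = prev.row_id
ORDER BY cur."time" DESC
"""

def seconds_since_previous(conn, busid):
    return conn.execute(QUERY, (busid,)).fetchall()

rows = seconds_since_previous(make_db(), 101)
print(rows)
```

Every gap here is 60 seconds because the sample rows are one minute apart. The oldest row has no predecessor, so its duration comes back as NULL (None).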
This example was not tested, and it may not work perfectly, but I hope it gets you going in the correct direction. Feel free to leave feedback.
You may also want to check the following external links on how to use Common Table Expressions:
SQL Select Next Row and SQL Select Previous Row with Current Row using T-SQL CTE
Calculate Difference between current and previous rows... CTE and Row_Number() rocks!
4 Guys From Rolla: Common Table Expressions (CTE) in SQL Server 2005
MSDN: Using Common Table Expressions

Personally, I would denormalize the data so that each row has a start_time and an end_time. The duration then becomes a simple difference within one row instead of a comparison between rows, which makes the query much more efficient.

I don't have access to SQL Server at the moment, so there may be syntax errors in the following:
SELECT
    BUSID,
    OPID,
    CASE
        WHEN MOVING = 1 THEN 'MOVING'
        WHEN STOPPED = 1 THEN 'STOPPED'
        ELSE 'PARKED'
    END AS STATUS,
    [TIME],
    [COUNT]
FROM BUS_DATA_TABLE
ORDER BY BUSID, [TIME]
You'll note that this does not include duration. Until you order your data, you don't know which entry is the previous one. Once the data is ordered, you can calculate the duration as the difference between the times of consecutive records. You could do this by SELECTing into a new table and then running a second query.
Ordering by BUSID first, as above, gives you the report for all buses.
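Turning consecutive rows into status periods is a "gaps and islands" problem. One common set-based technique, not taken from the answers here, subtracts a per-status ROW_NUMBER from a per-bus ROW_NUMBER. Rows in the same unbroken run of a status then share the same difference.

The sketch below runs this in Python's sqlite3 (SQLite 3.25+ assumed) on the question's sample data. It makes two assumptions:
- DURATION means the number of one-minute samples in a period.
- Taking MIN of the count column is acceptable. That only holds when the count stays constant within a period, as it does in the sample.

ROW_NUMBER also exists in SQL Server 2008, so the same query shape should carry over with [time] and [count] bracketed.

```python
import sqlite3

# Sample data from the question.
SAMPLE = [
    (1, "10:10:10", 101, 1101, 1, 0, 0, 15),
    (2, "10:10:11", 102, 1102, 0, 1, 0, 5),
    (3, "10:11:10", 101, 1101, 1, 0, 0, 15),
    (4, "10:12:10", 101, 1101, 0, 1, 0, 15),
    (5, "10:13:10", 101, 1101, 1, 0, 0, 19),
    (6, "10:14:10", 101, 1101, 1, 0, 0, 19),
    (7, "10:15:10", 101, 1101, 0, 1, 0, 19),
    (8, "10:16:10", 101, 1101, 0, 0, 1, 0),
    (9, "10:17:10", 101, 1101, 0, 0, 1, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE log_table (id INTEGER, "time" TEXT, busid INTEGER, opid INTEGER, '
             'moving INTEGER, stopped INTEGER, parked INTEGER, "count" INTEGER)')
conn.executemany("INSERT INTO log_table VALUES (?, ?, ?, ?, ?, ?, ?, ?)", SAMPLE)

PERIODS_QUERY = """
WITH labeled AS (
    -- turn the three flag columns into one status value
    SELECT busid, opid, "time", "count",
           CASE WHEN moving = 1 THEN 'MOVING'
                WHEN stopped = 1 THEN 'STOPPED'
                ELSE 'PARKED' END AS status
    FROM log_table
),
islands AS (
    -- rows in the same unbroken run of a status share the same grp value
    SELECT labeled.*,
           ROW_NUMBER() OVER (PARTITION BY busid ORDER BY "time")
         - ROW_NUMBER() OVER (PARTITION BY busid, status ORDER BY "time") AS grp
    FROM labeled
)
SELECT busid, opid, status,
       MIN("time")  AS start_time,
       COUNT(*)     AS duration,  -- one sample per minute in the sample data
       MIN("count") AS cnt        -- assumes count is constant within a period
FROM islands
GROUP BY busid, opid, status, grp
ORDER BY busid, start_time
"""

periods = conn.execute(PERIODS_QUERY).fetchall()
for row in periods:
    print(row)
```

The first five rows (bus 101) match the expected output in the question. Bus 102 shows up as a single one-sample STOPPED period.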

Making certain assumptions about the columns (for example, that STATUS and DURATION are already stored in the table):
SELECT
    BUSID,
    OPID,
    STATUS,
    [TIME],
    DURATION,
    [COUNT]
FROM
    TABLENAME
WHERE
    BUSID = 101
ORDER BY
    [TIME];

Related

What else do I need to add to my SQL query to bring related information in other columns if using MIN() GROUP BY

There is a table with the following column headers: indi_cod, ries_cod, date, time and level. Each ries_cod contains more than one indi_cod, and these indi_cod are random consecutive numbers.
Which SQL query should I write to find the smallest indi_cod of each ries_cod and, at the same time, bring back the date, time and level values from that same row?
I tried the following query:
SELECT MIN (indi_cod) AS min_indi_cod
FROM my-project-01-354113.indi_cod.second_step
GROUP BY ries_cod
ORDER BY ries_cod
And, indeed, it gave me the minimum value of indi_cod for each ries_cod group, but I couldn't work out how to also bring back the date, time and level columns corresponding to each indi_cod.
I usually use some kind of ranking for this type of thing. You can use ROW_NUMBER, RANK or DENSE_RANK, depending on your RDBMS. Here is an example:
with t as (
    select a.*,
           row_number() over (partition by ries_cod order by indi_cod) as rn
    from mytable a
)
select * from t where rn = 1
In addition, if you are using Oracle, you can do this in a single GROUP BY query, without a subquery, by using the KEEP clause:
https://renenyffenegger.ch/notes/development/databases/SQL/select/group-by/keep-dense_rank/index
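Here is a runnable sketch of the ROW_NUMBER approach using Python's sqlite3 module (SQLite 3.25+ assumed for window functions). The table name mytable comes from the answer. The rows are made up purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (indi_cod INTEGER, ries_cod TEXT, '
             '"date" TEXT, "time" TEXT, level INTEGER)')
# Made-up rows: two ries_cod groups with two indi_cod values each.
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", [
    (7, "A", "2022-06-01", "08:00", 3),
    (5, "A", "2022-06-02", "09:30", 1),
    (9, "B", "2022-06-01", "10:15", 2),
    (8, "B", "2022-06-03", "11:45", 4),
])

smallest_per_group = conn.execute("""
WITH t AS (
    SELECT a.*,
           -- rn = 1 marks the smallest indi_cod within each ries_cod
           ROW_NUMBER() OVER (PARTITION BY ries_cod ORDER BY indi_cod) AS rn
    FROM mytable a
)
SELECT indi_cod, ries_cod, "date", "time", level
FROM t
WHERE rn = 1
ORDER BY ries_cod
""").fetchall()
print(smallest_per_group)
```

Each returned row carries the date, time and level from the same row as the minimum indi_cod, which a plain MIN(...) GROUP BY cannot guarantee.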
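I think you just need to group by the other columns as well:
SELECT MIN(indi_cod) AS min_indi_cod, ries_cod, date, time, level
FROM mytable p
GROUP BY ries_cod, date, time, level
ORDER BY ries_cod
Note that this returns one row per distinct (ries_cod, date, time, level) combination. It gives one row per ries_cod only if those columns have a single value within each ries_cod.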

SQL Statement to Return Records with More than Two Instances and Omit the Last Instance

I am looking to write a SQL statement that pulls EventIDs with two or more instances but omits the last instance of each. This seems crazy, but the purpose is to look at events that have multiple updates and see which ones expired in the middle. Each time a crew/person updates an event, the time of the update is stored in the TIME_OF_EVENT column. I check whether an event expired in the middle by comparing TIME_OF_EVENT to PREV_ERT.
select *
from ert_change_log
where time_of_event > '30-SEP-21 23:59:59'
and source <> 'I'
The SQL above generates the picture below. It is simply a reference view of the table, showing example EventIDs that meet these criteria.
In a perfect world, the query I need would return only EventIDs 210043901 and 210044021, and would omit the latest TIME_OF_EVENT for each of those EventIDs.
If this is confusing I would be glad to offer more explanation or clarification!
Thanks for any input.
You can use the ROW_NUMBER analytic function:
SELECT *
FROM (
SELECT e.*,
ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY time_of_event DESC) AS rn
FROM ert_change_log e
WHERE time_of_event >= DATE '2021-10-01'
AND source <> 'I'
)
WHERE rn > 1;
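Because rn = 1 marks the newest update of each event, keeping rn > 1 does two things at once. It drops that newest row, and it excludes events with only one update.

Here is a sketch with Python's sqlite3 (SQLite 3.25+ assumed). The timestamps and the extra event IDs are made up, and ISO date strings stand in for Oracle's DATE literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ert_change_log (event_id INTEGER, time_of_event TEXT, source TEXT)")
conn.executemany("INSERT INTO ert_change_log VALUES (?, ?, ?)", [
    (210043901, "2021-10-02 08:00:00", "C"),
    (210043901, "2021-10-02 09:00:00", "C"),
    (210043901, "2021-10-02 10:00:00", "C"),
    (210044021, "2021-10-03 07:00:00", "C"),
    (210044021, "2021-10-03 12:00:00", "C"),
    (210050000, "2021-10-04 06:00:00", "C"),  # single update: excluded
    (210060000, "2021-10-05 06:00:00", "I"),  # source 'I': filtered out
    (210060000, "2021-10-05 07:00:00", "I"),
])

earlier_updates = conn.execute("""
SELECT event_id, time_of_event
FROM (
    SELECT e.*,
           ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY time_of_event DESC) AS rn
    FROM ert_change_log e
    WHERE time_of_event >= '2021-10-01' AND source <> 'I'
) ranked
WHERE rn > 1  -- rn = 1 is the latest update of each event
ORDER BY event_id, time_of_event
""").fetchall()
print(earlier_updates)
```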

Retrieving last record in each group from database with additional max() condition in MSSQL

This is a follow-up question to Retrieving last record in each group from database - SQL Server 2005/2008
In the answers, this example was provided to retrieve the last record for a group of parameters (the example below retrieves the last update for each value of computername):
select t.*
from t
where t.lastupdate = (select max(t2.lastupdate)
from t t2
where t2.computername = t.computername
);
In my case, however, "lastupdate" is not unique. Some updates come in batches with the same lastupdate value. If two updates for the same "computername" arrive in one batch, "computername + lastupdate" does not identify a single row.
Suppose I also have an auto-incremental field "rowId". The mitigation would be to add another criterion to the query: max(rowId).
NB: while the example employs time-specific name "lastupdate", the actual selection criteria may not be related to the time at all.
I would therefore like to ask: what is the most performant query that selects the last record in each group based both on the "group-defining parameter" (in the case above, "computername") and on the maximal rowId?
If you don't have uniqueness, then row_number() is simpler:
select t.*
from (select t.*,
             row_number() over (partition by computername
                                order by lastupdate desc, rowid desc) as seqnum
      from t
     ) t
where seqnum = 1;
With the right indexes, the correlated subquery is usually faster. However, the performance difference is not that great.
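Here is a sketch of the tie-breaking ROW_NUMBER using Python's sqlite3 (SQLite 3.25+ assumed). The rows are made up. The question's rowId column is called row_id here because rowid has a special meaning in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (row_id INTEGER PRIMARY KEY, computername TEXT, lastupdate TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "PC1", "2023-01-01 10:00"),
    (2, "PC1", "2023-01-02 10:00"),
    (3, "PC1", "2023-01-02 10:00"),  # same batch as row 2: tie on lastupdate
    (4, "PC2", "2023-01-01 09:00"),
    (5, "PC2", "2023-01-01 08:00"),  # higher row_id but an older update
])

last_per_computer = conn.execute("""
SELECT row_id, computername, lastupdate
FROM (
    SELECT t.*,
           -- newest lastupdate first; row_id only breaks ties within a batch
           ROW_NUMBER() OVER (PARTITION BY computername
                              ORDER BY lastupdate DESC, row_id DESC) AS seqnum
    FROM t
) ranked
WHERE seqnum = 1
ORDER BY computername
""").fetchall()
print(last_per_computer)
```

Both sort keys are descending. With lastupdate ascending, the query would return the oldest update per computer instead of the newest.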

Set-based alternative to loop in SQL Server

I know there are several posts about how BAD it is to loop in a SQL Server stored procedure, but I haven't quite found what I am trying to do. We use a data connection that can be linked directly into Excel.
I have seen some posts where a few people have said they could convert most loops to a standard query. But for the life of me I am having trouble with this one.
I need all custIDs that have an event of type 38 or 40 right before an order, but only if there is no other order between the event and the order found in the first query.
So there are 3 parts. I first query for all orders (orders table) based on a time frame into a temporary table.
SELECT odate, custId INTO temp1 FROM orders WHERE odate > '5/1/12'
Then I could use the temp table to inner join on the secondary table to get a customer event (LogEvent table) that may have occurred some time in the past prior to the current order.
SELECT eventdate, temp1.custID INTO temp2
FROM LogEvent INNER JOIN temp1 ON temp1.custID = LogEvent.custID
WHERE EventType IN (38,40) AND temp1.odate > eventdate
ORDER BY eventdate DESC
The problem is that this query returns every matching row for each customer from the first query, whereas I only want the latest one per customer. This is where I would loop on the client side to keep one event instead of all the old ones. But because the whole query has to run inside Excel, I can't really loop on the client side.
The third step could then use the results of the second query to check whether the event occurred between the most recent order and any previous order. I only want rows where the event precedes the order and no other orders fall in between.
Select ordernum, shopcart.custID from shopcart right outer join temp2 on
shopcart.custID=temp2.custID where shopcart.odate >= temp2.eventdate and
ordernum is null
Is there a way to simplify this and make it set-based, so that it runs in SQL Server instead of as a loop that I perform on the client?
This is a great example of switching to set-based notation.
First, I combined all three of your queries into a single query. In general, a single query lets the query optimizer do what it does best: determine execution paths. It also prevents accidental serialization of queries on a multithreaded/multiprocessor machine.
The key is row_number() for ordering the events so the most recent has a value of 1. You'll see this in the final WHERE clause.
select ordernum, temp2.custID
from (select eventdate, temp1.custID,
             row_number() over (partition by temp1.custID order by eventdate desc) as seqnum
      from LogEvent inner join
           (select odate, custId
            from orders
            where odate > '5/1/12'
           ) temp1
           on temp1.custID = LogEvent.custID
      where EventType in (38,40) and temp1.odate > eventdate
      -- no ORDER BY here: SQL Server rejects ORDER BY in a derived table without TOP
     ) temp2 left outer join
     ShopCart
     on shopcart.custID = temp2.custID
    -- the date test stays in ON so unmatched customers survive the LEFT JOIN
    and shopcart.odate >= temp2.eventdate
where seqnum = 1 and ordernum is null
I kept your naming conventions. Avoid naming tables and columns with reserved SQL words such as ORDER; even where it works, it is bad practice.
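In a LEFT OUTER JOIN that looks for customers with no qualifying order (the ordernum is null test), it matters whether the date test on the outer-joined table sits in ON or in WHERE. Here is a sketch with Python's sqlite3. The tables events and shopcart are simplified, hypothetical stand-ins for the question's tables, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (custID INTEGER, eventdate TEXT)")
conn.execute("CREATE TABLE shopcart (ordernum INTEGER, custID INTEGER, odate TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "2012-05-03"), (2, "2012-05-04")])
conn.executemany("INSERT INTO shopcart VALUES (?, ?, ?)",
                 [(100, 1, "2012-05-05"),   # customer 1 ordered after the event
                  (101, 2, "2012-05-01")])  # customer 2 only ordered before it

# Date test in ON: customers with no later order survive with NULL shopcart columns.
in_on = conn.execute("""
SELECT e.custID
FROM events e
LEFT JOIN shopcart s ON s.custID = e.custID AND s.odate >= e.eventdate
WHERE s.ordernum IS NULL
ORDER BY e.custID
""").fetchall()

# Date test in WHERE: a NULL-extended row fails the date test and a matched
# row fails IS NULL, so nothing can come back.
in_where = conn.execute("""
SELECT e.custID
FROM events e
LEFT JOIN shopcart s ON s.custID = e.custID
WHERE s.odate >= e.eventdate AND s.ordernum IS NULL
ORDER BY e.custID
""").fetchall()

print(in_on, in_where)
```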
If you are using SQL Server 2005 or later, you can use the ROW_NUMBER function:
;WITH myCTE AS
(
SELECT
eventdate, temp1.custID,
ROW_NUMBER() OVER (PARTITION BY temp1.custID ORDER BY eventdate desc) AS CustomerRanking
FROM LogEvent
JOIN temp1
ON temp1.custID=LogEvent.custID
WHERE EventType IN (38,40) AND temp1.odate>eventdate
)
SELECT * into temp2 from myCTE WHERE CustomerRanking = 1;
This gets you the most recent event for each customer without a loop.
You could also use RANK. However, RANK gives tied rows the same number and can therefore return duplicates, whereas ROW_NUMBER guarantees unique numbers within each partition.
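Here is a small sketch of that difference using Python's sqlite3 (SQLite 3.25+ assumed). The LogEvent rows are made up so that customer 1 has two events on the same latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LogEvent (custID INTEGER, eventdate TEXT)")
conn.executemany("INSERT INTO LogEvent VALUES (?, ?)", [
    (1, "2012-04-10"),
    (1, "2012-04-20"),
    (1, "2012-04-20"),  # tie on the latest date
    (2, "2012-04-15"),
])

rows = conn.execute("""
SELECT custID, eventdate,
       ROW_NUMBER() OVER (PARTITION BY custID ORDER BY eventdate DESC) AS rn,
       RANK()       OVER (PARTITION BY custID ORDER BY eventdate DESC) AS rnk
FROM LogEvent
""").fetchall()

# ROW_NUMBER: exactly one row numbered 1 per customer (which tied row is arbitrary).
latest_by_row_number = sorted((c, d) for c, d, rn, _ in rows if rn == 1)
# RANK: both tied rows get rank 1, so customer 1 appears twice.
latest_by_rank = sorted((c, d) for c, d, _, rnk in rows if rnk == 1)
print(latest_by_row_number)
print(latest_by_rank)
```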

sql get max based on field

I need to get the ID of the row with the maximum Amount. The query below gives me an error:
select ID from Prog
where Amount = MAX(Amount)
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
In the end I need just the ID, because I have to pass it to something else that expects it.
You need to order by Amount and select 1 record instead...
SELECT ID
FROM Prog
ORDER BY Amount DESC
LIMIT 1;
This takes all the rows in Prog, orders them in descending order by Amount (in other words, the first sorted row has the highest Amount), then limits the query to select only one row (the one with the highest Amount).
Also, subqueries can hurt performance: on a table with 200k records, this code ran in half the time of the subquery versions.
Just pass a subquery with the max value to the WHERE clause:
select ID from Prog
where Amount = (SELECT MAX(Amount) from Prog)
If you're using SQL Server, this should do it:
SELECT TOP 1 ID
FROM Prog
ORDER BY Amount DESC
This should be something like:
select P.ID from Prog P
where P.Amount = (select max(Amount) from Prog)
EDIT:
If you really want only 1 row, you should do:
select max(P.ID) from Prog P
where P.Amount = (select max(Amount) from Prog);
However, if multiple rows can match the maximum amount and you want only one row, you should have explicit logic for choosing that row, rather than relying on this MAX trick or LIMIT 1-type logic.
Also, I don't use LIMIT 1 because it is not ANSI SQL. It works in MySQL, but the OP doesn't say which database they are using, and every database is different. See here: Is there an ANSI SQL alternative to the MYSQL LIMIT keyword? Don't get used to one database's extensions unless you only want to work with one database for the rest of your life.
select min(ID) from Prog
where Amount in
(
select max(amount)
from prog
)
MIN ensures that you get only one result, even when several rows share the maximum Amount.
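Here is a quick sketch comparing these approaches when two rows tie for the maximum. It uses Python's sqlite3 with made-up Prog rows. SQLite accepts LIMIT, like MySQL; on SQL Server the last query would use TOP 1 instead. The explicit ID tie-breaker in the ORDER BY is an added assumption about which row to prefer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Prog (ID INTEGER, Amount INTEGER)")
conn.executemany("INSERT INTO Prog VALUES (?, ?)",
                 [(1, 50), (2, 80), (3, 80), (4, 20)])  # IDs 2 and 3 tie on Amount

# Subquery in WHERE: returns every row with the maximum Amount.
all_max = conn.execute(
    "SELECT ID FROM Prog WHERE Amount = (SELECT MAX(Amount) FROM Prog) ORDER BY ID"
).fetchall()

# MIN(ID) over the tied rows: always a single value.
min_of_max = conn.execute(
    "SELECT MIN(ID) FROM Prog WHERE Amount IN (SELECT MAX(Amount) FROM Prog)"
).fetchone()[0]

# ORDER BY ... LIMIT 1 with an explicit tie-breaker so the choice is deterministic.
first_by_order = conn.execute(
    "SELECT ID FROM Prog ORDER BY Amount DESC, ID ASC LIMIT 1"
).fetchone()[0]

print(all_max, min_of_max, first_by_order)
```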