Selecting minimum date in data set - sql

I have a data set from which I am attempting to select the first record with a station ID of 2.
InspectionNbr Station DateTimeStamp
825065 1 2010-11-16 04:38:49.000
825065 2 2010-11-16 12:38:31.000
825065 2 2010-12-06 01:35:14.000
825065 2 2011-01-24 08:11:04.000
In this case I want to select the second line of the results. How can I use SQL to get the minimum date where the station ID = 2?
That being said, this is what I have.
I created a temporary table in SQL and set it up to be populated with the latest date. Then I attempt to update the temporary table with the following code:
UPDATE #report_out
set
DateTimeStamp = Min(si.CreatedDate)
from
#report_out as r
INNER JOIN
StationInspection as si
on si.ModifiedDate = r.DateTimeStamp
where
r.Station = 2
For some reason it doesn't like the DateTimeStamp = Min(si.CreatedDate) line.
I get the following error:
An aggregate may not appear in the set list of an UPDATE statement.
Any pointers?

As far as I can figure out, an aggregate can't be used in an update statement because the aggregate and the update affect two different row sets. Think about a normal SELECT with an aggregate:
SELECT MIN(CreatedDate)
FROM StationInspection
WHERE Station = 2
The aggregate works on all rows in the row set. The row set is determined by the WHERE clause, which determines which rows will be in the row set.
In an update statement, the WHERE clause determines which rows will be changed:
UPDATE StationInspection
SET CreatedDate = @newDate
WHERE Station = 2
The update affects all rows in the row set (all rows that pass the filter specified by the WHERE clause).
So, in the case where you try to do both (I realize this is somewhat simplified from your code, but it makes the point):
UPDATE StationInspection
SET CreatedDate = MIN(CreatedDate)
WHERE Station = 2
You have two operations that require unique row sets, but only one row set selector (WHERE clause).
SQL doesn't support two WHERE clauses in a single statement. So you'll need two statements:
DECLARE @newDate datetime
SELECT @newDate = MIN(CreatedDate)
FROM StationInspection
WHERE Station = 2
UPDATE StationInspection
SET CreatedDate = @newDate
WHERE Station = 2
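As a rough illustration of the two-statement idea, here is a minimal sketch that runs against SQLite through Python's sqlite3 module rather than SQL Server. The table layout is trimmed to two columns and the CreatedDate values are made up for the example; the Python variable new_date plays the role of the T-SQL variable.

```python
import sqlite3

# In-memory stand-in for StationInspection (hypothetical sample values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StationInspection (Station INTEGER, CreatedDate TEXT)")
conn.executemany(
    "INSERT INTO StationInspection VALUES (?, ?)",
    [
        (1, "2010-11-16 04:38:49"),
        (2, "2010-11-16 12:38:31"),
        (2, "2010-12-06 01:35:14"),
        (2, "2011-01-24 08:11:04"),
    ],
)

# Statement 1: the aggregate works on its own row set (Station = 2).
(new_date,) = conn.execute(
    "SELECT MIN(CreatedDate) FROM StationInspection WHERE Station = 2"
).fetchone()

# Statement 2: the UPDATE only uses the stored value; its WHERE clause
# decides which rows are changed.
conn.execute(
    "UPDATE StationInspection SET CreatedDate = ? WHERE Station = 2",
    (new_date,),
)

rows = conn.execute(
    "SELECT Station, CreatedDate FROM StationInspection ORDER BY rowid"
).fetchall()
print(rows)
```

Each statement has its own WHERE clause, so the aggregate and the update never compete for a single row set.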

If DateTimeStamp is a candidate key (at least when combined with Station), then there is no need to create a temp table; just do:
Select a.* from a join
(select Station, Min(DateTimeStamp) as m from a group by Station) as b
on a.Station = b.Station and a.DateTimeStamp = b.m
Then you have the row with the minimum DateTimeStamp for every Station. This is typically a fast query.
If DateTimeStamp is not a candidate key, the query becomes slow.

If you just want the record with Station = 2 that has the minimum date, try:
SELECT InspectionNbr, Station, DateTimeStamp
FROM StationInspection
WHERE Station = 2
AND DateTimeStamp = (
SELECT MIN(DateTimeStamp)
FROM StationInspection
WHERE Station = 2
)
This approach avoids grouping.

select T.InspectionNbr,
T.Station,
T.DateTimeStamp
from (
select *,
row_number() over(order by DateTimeStamp) as rn
from StationInspection
where Station = 2
) as T
where T.rn = 1

A shorter statement for some DBs (notably MySQL) might be:
SELECT InspectionNbr, Station, DateTimeStamp
FROM StationInspection
WHERE Station = 2
ORDER BY DateTimeStamp ASC
LIMIT 1
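To check that the approaches agree, here is a small sketch that loads the four sample rows from the question into SQLite (via Python's sqlite3, not SQL Server or MySQL) and runs the MIN subquery and the ORDER BY ... LIMIT 1 variants. The table definition and column types are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StationInspection "
    "(InspectionNbr INTEGER, Station INTEGER, DateTimeStamp TEXT)"
)
# The four rows shown in the question.
conn.executemany(
    "INSERT INTO StationInspection VALUES (?, ?, ?)",
    [
        (825065, 1, "2010-11-16 04:38:49.000"),
        (825065, 2, "2010-11-16 12:38:31.000"),
        (825065, 2, "2010-12-06 01:35:14.000"),
        (825065, 2, "2011-01-24 08:11:04.000"),
    ],
)

# Variant 1: filter on the minimum date computed by a scalar subquery.
subquery_sql = """
SELECT InspectionNbr, Station, DateTimeStamp
FROM StationInspection
WHERE Station = 2
  AND DateTimeStamp = (
      SELECT MIN(DateTimeStamp) FROM StationInspection WHERE Station = 2
  )
"""

# Variant 2: sort ascending and keep only the first row.
limit_sql = """
SELECT InspectionNbr, Station, DateTimeStamp
FROM StationInspection
WHERE Station = 2
ORDER BY DateTimeStamp ASC
LIMIT 1
"""

first_by_subquery = conn.execute(subquery_sql).fetchall()
first_by_limit = conn.execute(limit_sql).fetchall()
print(first_by_subquery, first_by_limit)
```

One difference worth keeping in mind: if two Station 2 rows shared the same minimum timestamp, the subquery variant would return both, while LIMIT 1 would return only one of them.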

Related

MariaDB - GROUP BY with an order

I have a dataset that I would like to order by a list of strings using ORDER BY FIELD(field_name, ...). After ordering, I want to group the dataset by another column.
I have tried a subquery, but it seems to ignore my ORDER BY once it is wrapped in the outer query.
This is the query I would like to group with GROUP BY setting_id:
SELECT *
FROM `setting_values`
WHERE ((`owned_by_type` = 'App\\Models\\Utecca\\User' AND `owned_by_id` = 1 OR ((`owned_by_type` = 'App\\Models\\Utecca\\Agreement' AND `owned_by_id` = 1006))) OR (`owned_by_type` = 'App\\Models\\Utecca\\Employee' AND `owned_by_id` = 1)) AND `setting_values`.`deleted_at` IS NULL
ORDER BY FIELD(owned_by_type, 'App\\Models\\Utecca\\Employee', 'App\\Models\\Utecca\\Agreement', 'App\\Models\\Utecca\\User')
The order by works just fine, but I cannot get it to group it based on my order, it always selects the one with lowest primary key (id).
Here is my attempt which did not work.
SELECT * FROM (
SELECT *
FROM `setting_values`
WHERE ((`owned_by_type` = 'App\\Models\\Utecca\\User' AND `owned_by_id` = 1 OR ((`owned_by_type` = 'App\\Models\\Utecca\\Agreement' AND `owned_by_id` = 1006))) OR (`owned_by_type` = 'App\\Models\\Utecca\\Employee' AND `owned_by_id` = 1)) AND `setting_values`.`deleted_at` IS NULL
ORDER BY FIELD(owned_by_type, 'App\\Models\\Utecca\\Employee', 'App\\Models\\Utecca\\Agreement', 'App\\Models\\Utecca\\User')
) AS t
GROUP BY setting_id;
Here is some sample data
With this sample data, the result I am after is a single row: the one with id 3.
The desired result set should obey these rules:
1 row for each setting_id.
owned_by_type together with owned_by_id is filtered as follows: agreement = 1006, user = 1, employee = 1.
When limiting to 1 row for each setting_id, the row should be chosen with the following priority in the owned_by_type column: Employee, Agreement, User.
Here is a SQLFiddle with it.
Running MariaDB version 10.2.6-MariaDB
First of all, the Optimizer is free to ignore the inner ORDER BY. So, please describe further what your intent is.
Getting past that, you can use a subquery:
SELECT ...
FROM ( SELECT
...
GROUP BY ...
ORDER BY ... -- This is lost unless followed by :
LIMIT 9999999999 -- something valid; or very high (for all)
) AS x
GROUP BY ...
Perhaps you are looking for a groupwise max?
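If the goal is indeed "one row per setting_id, chosen by a priority order", one way to sketch it is with ROW_NUMBER() instead of relying on an inner ORDER BY. MariaDB 10.2 provides window functions, but the example below runs on SQLite (3.25 or later) through Python's sqlite3, so treat it as an illustration rather than tested MariaDB code. The rows are invented, the short type names 'Employee', 'Agreement' and 'User' stand in for the full class names, and a CASE expression replaces FIELD(), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE setting_values ("
    "id INTEGER PRIMARY KEY, setting_id INTEGER, "
    "owned_by_type TEXT, owned_by_id INTEGER, deleted_at TEXT)"
)
# Hypothetical sample rows; deleted_at is NULL for all of them.
conn.executemany(
    "INSERT INTO setting_values VALUES (?, ?, ?, ?, NULL)",
    [
        (1, 10, "User", 1),
        (2, 10, "Agreement", 1006),
        (3, 10, "Employee", 1),
        (4, 20, "User", 1),
        (5, 20, "Agreement", 1006),
        (6, 30, "User", 1),
        (7, 10, "Employee", 2),  # excluded by the owner filter
    ],
)

sql = """
SELECT id, setting_id, owned_by_type
FROM (
    SELECT sv.*,
           ROW_NUMBER() OVER (
               PARTITION BY setting_id
               ORDER BY CASE owned_by_type
                            WHEN 'Employee'  THEN 1
                            WHEN 'Agreement' THEN 2
                            WHEN 'User'      THEN 3
                        END
           ) AS rn
    FROM setting_values AS sv
    WHERE ((owned_by_type = 'User' AND owned_by_id = 1)
        OR (owned_by_type = 'Agreement' AND owned_by_id = 1006)
        OR (owned_by_type = 'Employee' AND owned_by_id = 1))
      AND deleted_at IS NULL
) AS ranked
WHERE rn = 1
ORDER BY setting_id
"""

picked = conn.execute(sql).fetchall()
print(picked)
```

The ordering lives inside the window definition, so it cannot be discarded by the optimizer the way an ORDER BY in a derived table can.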

SQL Server 2012 - updating a column based on row to row comparison

I have a table that contains dates and times. For example, the columns are Date, ExTime, NewTime, and Status. I order the rows by an ExpKey column so that they appear in the right order.
I want to do a row-by-row comparison: compare ExTime of the second row to NewTime of the first row. If ExTime < NewTime, I want to set Status to 1. Then I want to move through the table row by row, so that the second row in the example above becomes the first and a new second row is pulled in and compared. Here is a sample of what I have now, but for some reason it is not hitting all of the rows.
UPDATE t
SET t.Status = 1
FROM MyTable t
CROSS APPLY (SELECT TOP 1 NewTime
FROM MyTable
WHERE ID = t.ID AND [Date] = t.[Date]
ORDER BY ExpKey) t1
WHERE t.Extime < t1.NewTime
This is not hitting all the rows like I want it to. The WHERE clause compares the ID and Date fields to ensure that the rows belong to the same person. If the IDs or dates differ, the rows do not belong to the same person, so I would not want to update the status. So basically, if the ID of row 2 = the ID of row 1 and the Date of row 2 = the Date of row 1, I want to compare ExTime of row 2 with NewTime of row 1 and, if it is less, update the Status field of row 2.
Any help in figuring out why this works on some rows but not all would be appreciated.
On SQL Server 2012 you can easily update status with window function lag():
with cte as (
select
extime,
lag(newtime) over(partition by id, date order by expKey) as newtime,
status
from table1
)
update cte set status = 1 where extime < newtime;
sql fiddle demo
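For readers who want to see the lag() comparison on concrete rows, here is a hedged sketch using SQLite (3.25 or later) through Python's sqlite3. SQLite cannot update through a CTE the way SQL Server can, so the sketch collects the matching ExpKey values in a subquery instead; it assumes ExpKey is unique, and all sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable ("
    "ExpKey INTEGER PRIMARY KEY, ID INTEGER, [Date] TEXT, "
    "ExTime TEXT, NewTime TEXT, Status INTEGER)"
)
# Hypothetical rows: two people, times stored as sortable 'HH:MM' text.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?, 0)",
    [
        (1, 1, "2013-01-01", "08:00", "09:00"),
        (2, 1, "2013-01-01", "08:30", "10:00"),  # 08:30 < previous 09:00
        (3, 1, "2013-01-01", "10:30", "11:00"),  # 10:30 >= previous 10:00
        (4, 2, "2013-01-01", "07:00", "12:00"),  # first row of its group
        (5, 2, "2013-01-02", "08:00", "09:00"),  # first row of its group
        (6, 2, "2013-01-02", "08:45", "09:30"),  # 08:45 < previous 09:00
    ],
)

# LAG(NewTime) exposes the previous row's NewTime within each (ID, Date)
# group; the first row of a group gets NULL and never matches.
conn.execute(
    """
    UPDATE MyTable
    SET Status = 1
    WHERE ExpKey IN (
        SELECT ExpKey
        FROM (
            SELECT ExpKey, ExTime,
                   LAG(NewTime) OVER (
                       PARTITION BY ID, [Date] ORDER BY ExpKey
                   ) AS PrevNewTime
            FROM MyTable
        ) AS paired
        WHERE ExTime < PrevNewTime
    )
    """
)

statuses = [
    row[0] for row in conn.execute("SELECT Status FROM MyTable ORDER BY ExpKey")
]
print(statuses)
```

Partitioning by ID and Date is what keeps the comparison from crossing over between people or days.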
I haven't tested this, but I've dealt with similar issues of comparing adjacent rows. I put this together off-the-cuff, so it may need tweaking, but give it a try.
;WITH CTE AS
( SELECT ID,[Date],ExpKey,ExTime,NewTime,[Status],
ROW_NUMBER()OVER(PARTITION BY ID,[Date] ORDER BY ExpKey) AS Sort
FROM MyTable
)
UPDATE row2
SET row2.[Status] = 1
FROM CTE row2
INNER JOIN CTE row1
ON row1.ID = row2.ID
AND row1.[Date] = row2.[Date]
AND row1.Sort = row2.Sort-1 --Join to prior row
WHERE row2.ExTime < row1.NewTime

SQL: Updating a column based on subquery results

I have a T-SQL table that contains the following columns: Date, StationCode, HDepth, and MaxDepth. Every row's MaxDepth is set to 0 by default. What I am trying to do is find the maximum HDepth for each Date and StationCode, and flag the rows where it occurs by updating MaxDepth. I have written a SELECT statement that finds the maximums:
SELECT StationCode, [Date], MAX(HDepth) AS Maximum FROM dbo.[DepthTable] GROUP BY [Date], StationCode
How could I put this query into an Update statement to set the MaxDepth to 1 on the rows that are returned by this query?
You might try something like this:
UPDATE a
SET MaxDepth = 1
FROM dbo.[DepthTable] AS a
JOIN (
-- Your original query
SELECT StationCode, [Date], MAX(HDepth) AS Maximum
FROM dbo.[DepthTable]
GROUP BY [Date], StationCode
) AS b ON a.StationCode = b.StationCode
AND a.[DATE] = b.[DATE]
AND a.HDepth = b.Maximum -- Here we get only the max rows
However, if a column is simply based upon other columns, then you might think about putting this logic into a view (to avoid update anomalies). The select for such a view might look like:
SELECT a.[Date], a.StationCode, a.HDepth,
CASE WHEN b.Maximum IS NULL THEN 0 ELSE 1 END AS MaxDepth
FROM dbo.[DepthTable] AS a
LEFT JOIN (
-- Your original query
SELECT StationCode, [Date], MAX(HDepth) AS Maximum
FROM dbo.[DepthTable]
GROUP BY [Date], StationCode
) AS b ON a.StationCode = b.StationCode
AND a.[DATE] = b.[DATE]
AND a.HDepth = b.Maximum -- Here we get only the max rows
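The flagging logic can be tried out on a few rows with the sketch below. It uses SQLite through Python's sqlite3 and a correlated subquery rather than the T-SQL UPDATE ... FROM join, since that syntax is not available in every SQLite version; the depth values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DepthTable "
    "([Date] TEXT, StationCode TEXT, HDepth REAL, MaxDepth INTEGER DEFAULT 0)"
)
# Hypothetical readings; the last two rows tie for the maximum.
conn.executemany(
    "INSERT INTO DepthTable ([Date], StationCode, HDepth) VALUES (?, ?, ?)",
    [
        ("2020-01-01", "A", 1.0),
        ("2020-01-01", "A", 3.5),
        ("2020-01-01", "B", 2.0),
        ("2020-01-02", "A", 0.5),
        ("2020-01-02", "A", 0.5),
    ],
)

# Flag every row whose HDepth equals the maximum for its Date and StationCode.
conn.execute(
    """
    UPDATE DepthTable
    SET MaxDepth = 1
    WHERE HDepth = (
        SELECT MAX(b.HDepth)
        FROM DepthTable AS b
        WHERE b.StationCode = DepthTable.StationCode
          AND b.[Date] = DepthTable.[Date]
    )
    """
)

flags = [
    row[0] for row in conn.execute("SELECT MaxDepth FROM DepthTable ORDER BY rowid")
]
print(flags)
```

As with the join version, rows that tie for the maximum within a group are all flagged.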

Update based on subquery fails

I am trying to do the following update in Oracle 10gR2:
update
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
set t.port_seq = t.new_seq
Voyage_port_id is the primary key, voyage_id is a foreign key. I'm trying to assign a sequence number based on the dates within each voyage.
However, the above fails with ORA-01732: data manipulation operation not legal on this view
What is the problem and how can I avoid it ?
Since you can't update subqueries with row_number, you'll have to calculate the row number in the set part of the update. At first I tried this:
update voyage_port a
set a.port_seq = (
select
row_number() over (partition by voyage_id order by arrival_date)
from voyage_port b
where b.voyage_port_id = a.voyage_port_id
)
But that doesn't work, because the subquery only selects one row, and then the row_number() is always 1. Using another subquery allows a meaningful result:
update voyage_port a
set a.port_seq = (
select c.rn
from (
select
voyage_port_id
, row_number() over (partition by voyage_id
order by arrival_date) as rn
from voyage_port b
) c
where c.voyage_port_id = a.voyage_port_id
)
It works, but it is more complex than I'd expect for this task.
You can update some views, but there are restrictions, and one is that the view must not contain analytic functions. See the SQL Language Reference on UPDATE and search for the first occurrence of "analytic".
This will work, provided no voyage visits more than one port on the same day (or the dates include a time component that makes them unique):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and vp2.arrival_date <= vp.arrival_date
)
I think this handles the case where a voyage visits more than 1 port per day and there is no time component (though the sequence of ports visited on the same day is then arbitrary):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and ( vp2.arrival_date < vp.arrival_date
or ( vp2.arrival_date = vp.arrival_date
and vp2.voyage_port_id <= vp.voyage_port_id
)
)
)
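Here is a small sketch of the counting approach with a tie-break, run on SQLite through Python's sqlite3 rather than Oracle. It counts rows of the same voyage that arrive strictly earlier, plus rows on the same date with a voyage_port_id that is not larger, so that same-day ports get distinct numbers. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE voyage_port ("
    "voyage_port_id INTEGER PRIMARY KEY, voyage_id INTEGER, "
    "arrival_date TEXT, port_seq INTEGER)"
)
# Hypothetical rows; ports 1 and 3 of voyage 100 share an arrival date.
conn.executemany(
    "INSERT INTO voyage_port VALUES (?, ?, ?, NULL)",
    [
        (1, 100, "2009-01-03"),
        (2, 100, "2009-01-01"),
        (3, 100, "2009-01-03"),
        (4, 200, "2009-01-02"),
        (5, 200, "2009-01-01"),
    ],
)

# For each row, count the rows of the same voyage that sort at or before it,
# using voyage_port_id to break ties on arrival_date.
conn.execute(
    """
    UPDATE voyage_port
    SET port_seq = (
        SELECT COUNT(*)
        FROM voyage_port AS vp2
        WHERE vp2.voyage_id = voyage_port.voyage_id
          AND ( vp2.arrival_date < voyage_port.arrival_date
             OR ( vp2.arrival_date = voyage_port.arrival_date
              AND vp2.voyage_port_id <= voyage_port.voyage_port_id ) )
    )
    """
)

seqs = conn.execute(
    "SELECT voyage_port_id, port_seq FROM voyage_port ORDER BY voyage_port_id"
).fetchall()
print(seqs)
```

The whole date condition is wrapped in one pair of parentheses so that the voyage_id filter applies to both branches of the OR.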
Don't think you can update a derived table, I'd rewrite as:
update voyage_port
set port_seq = t.new_seq
from
voyage_port p
inner join
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
on p.voyage_port_id = t.voyage_port_id
The first token after UPDATE should be the name of the table to update, followed by the columns to update. I'm not sure what you are trying to achieve with the select statement where it is, but you can't legally update the result set of that select.
A version of the sql, guessing what you have in mind, might look like...
update voyage_port t
set t.port_seq = (<select statement that generates new value of port_seq>)
NOTE: to use a select statement to set a value like this, you must make sure it returns only 1 row!
EDIT : modified statement above to reflect what I was trying to explain. The question has been answered very nicely by Andomar above

Get last item in a table - SQL

I have a History Table in SQL Server that basically tracks an item through a process. The item has some fixed fields that don't change throughout the process, but has a few other fields including status and Id which increment as the steps of the process increase.
Basically I want to retrieve the last step for each item given a Batch Reference. So if I do a
Select * from HistoryTable where BatchRef = @BatchRef
It will return all the steps for all the items in the batch - eg
Id Status BatchRef ItemCount
1 1 Batch001 100
1 2 Batch001 110
2 1 Batch001 60
2 2 Batch001 100
But what I really want is:
Id Status BatchRef ItemCount
1 2 Batch001 110
2 2 Batch001 100
Edit: Apologies, I can't seem to get the TABLE tags to work with Markdown. I followed the help to the letter, and it looks fine in the preview.
Assuming you have an identity column in the table...
select
top 1 <fields>
from
HistoryTable
where
BatchRef = @BatchRef
order by
<IdentityColumn> DESC
It's kind of hard to make sense of your table design - I think SO ate your delimiters.
The basic way of handling this is to GROUP BY your fixed fields and select a MAX (or MIN) of some unique value (a datetime usually works well). In your case, I think that the GROUP BY would be on BatchRef and ItemCount, and Id would be your unique column.
Then, join back to the table to get all columns. Something like:
SELECT *
FROM HistoryTable
JOIN (
SELECT
MAX(Id) as Id,
BatchRef,
ItemCount
FROM HistoryTable
WHERE
BatchRef = @batchRef
GROUP BY
BatchRef,
ItemCount
) as Latest ON
HistoryTable.Id = Latest.Id
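Reading the question's sample data as one row per (Id, Status) step, the same join-back pattern can be sketched by grouping on Id and taking MAX(Status). That reading of the columns is an assumption, as is the extra Batch002 row added to show the batch filter. The example runs on SQLite through Python's sqlite3, with a bound parameter in place of the T-SQL variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HistoryTable "
    "(Id INTEGER, Status INTEGER, BatchRef TEXT, ItemCount INTEGER)"
)
conn.executemany(
    "INSERT INTO HistoryTable VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Batch001", 100),
        (1, 2, "Batch001", 110),
        (2, 1, "Batch001", 60),
        (2, 2, "Batch001", 100),
        (1, 3, "Batch002", 5),  # hypothetical row from another batch
    ],
)

# Find the highest Status per Id within the batch, then join back
# to fetch the full row for that step.
sql = """
SELECT h.Id, h.Status, h.BatchRef, h.ItemCount
FROM HistoryTable AS h
JOIN (
    SELECT Id, MAX(Status) AS Status
    FROM HistoryTable
    WHERE BatchRef = :batch
    GROUP BY Id
) AS latest
  ON h.Id = latest.Id AND h.Status = latest.Status
WHERE h.BatchRef = :batch
ORDER BY h.Id
"""

last_steps = conn.execute(sql, {"batch": "Batch001"}).fetchall()
print(last_steps)
```

The result matches the two rows the question lists as the desired output.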
Assuming the Item Ids are incrementally numbered:
--Declare a table variable to hold the last step for each item id
DECLARE @LastStepForEach TABLE (
Id int,
Status int,
BatchRef char(10),
ItemCount int)
--Loop counter
DECLARE @count INT;
SET @count = 0;
--Loop through all of the items
WHILE (@count < (SELECT MAX(Id) FROM HistoryTable WHERE BatchRef = @BatchRef))
BEGIN
SET @count = @count + 1;
INSERT INTO @LastStepForEach (Id, Status, BatchRef, ItemCount)
SELECT Id, Status, BatchRef, ItemCount
FROM HistoryTable
WHERE BatchRef = @BatchRef
AND Id = @count
AND Status =
(
SELECT MAX(Status)
FROM HistoryTable
WHERE BatchRef = @BatchRef
AND Id = @count
)
END
SELECT *
FROM @LastStepForEach
SELECT id, status, BatchRef, MAX(itemcount) AS maxItemcount
FROM HistoryTable GROUP BY id, status, BatchRef
HAVING status > 1
It's a bit hard to decipher your data the way WMD has formatted it, but you can pull off the sort of trick you need with common table expressions on SQL 2005:
with LastBatches as (
select BatchRef, max(Id) as Id
from HistoryTable
group by BatchRef
)
select *
from HistoryTable h
join LastBatches b on b.BatchRef = h.BatchRef and b.Id = h.Id
Or a subquery (assuming the group by in the subquery works - off the top of my head I don't recall):
select *
from HistoryTable h
join (
select BatchRef, max(Id) as Id
from HistoryTable
group by BatchRef
) b on b.BatchRef = h.BatchRef and b.Id = h.Id
Edit: I was assuming you wanted the last item for every batch. If you just need it for the one batch then the other answers (doing a top 1 and ordering descending) are the way to go.
As already suggested you probably want to reorder your query to sort it in the other direction so you actually fetch the first row. Then you'd probably want to use something like
SELECT TOP 1 ...
if you're using MSSQL 2k or earlier, or the SQL compliant variant
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY key ASC) AS rownumber,
columns
FROM tablename
) AS foo
WHERE rownumber = n
for any other version (or for other database systems that support the standard notation), or
SELECT ... LIMIT 1 OFFSET 0
for some other variants without the standard SQL support.
See also this question for some additional discussion around selecting rows. Using the aggregate function max() might or might not be faster depending on whether calculating the value requires a table scan.