Selecting the latest row among rows with the same values in several columns - sql

In a project I am working on, measurements are stored in a database. A measurement consists of a world coordinate (posX, posY, posZ), a station identification number (stationID) and a measurement time (time).
Sometimes a measurement is redone in the field for various reasons, so there are several measurements with the same coordinates and station ID but taken at different times.
Is there a way to write an SQL query that returns all VALID measurements, i.e. only the latest one whenever the coordinates and station ID are the same?
I am not very adept at SQL, so I don't really know what to google for; any pointers are very much appreciated, even if you only know what type of command I should use :)
EDIT:
My task was just changed: apparently the station ID does not matter, only the coordinates and times.
Also, I am using DISQLite3, which implements SQL-92.

Yes, you can do it in SQL.
It seems you want the latest entry for each combination of station and coordinates; look at GROUP BY or ROW_NUMBER().
Depending on your SQL variant (it's helpful if you specify it), something like this:
select *
from
(select t.*,
row_number() over (partition by posX, posY, posZ, stationID order by time desc) rn
from yourtable t
) v
where rn = 1
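To see the ranking approach end to end, here is a minimal sketch that runs it through Python's built-in sqlite3 module as a stand-in engine. This is an assumption for illustration only: it does not use DISQLite3, window functions need SQLite 3.25 or newer, and the three sample rows are invented. Times are stored as ISO text so that string order equals time order.

```python
import sqlite3

# Stand-in engine: an in-memory SQLite database (window functions need SQLite 3.25+).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (posX REAL, posY REAL, posZ REAL, stationID INTEGER, time TEXT)")
con.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)",
    [
        (1.0, 2.0, 3.0, 7, "2013-01-01 10:00"),  # first measurement of this point
        (1.0, 2.0, 3.0, 7, "2013-01-02 09:00"),  # redone later, so this one is valid
        (4.0, 5.0, 6.0, 7, "2013-01-01 11:00"),  # measured only once
    ],
)

# Number rows inside each (coordinates, station) group, newest first; keep row 1.
latest = con.execute(
    """
    SELECT posX, posY, posZ, stationID, time
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY posX, posY, posZ, stationID
                                    ORDER BY time DESC) AS rn
          FROM yourtable t) v
    WHERE rn = 1
    ORDER BY posX
    """
).fetchall()
print(latest)
```

ROW_NUMBER() always yields exactly one row per group, even if two measurements share the same latest time.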
Without ranking functions (plain SQL-92, grouping only by the coordinates as in your edit):
select yourtable.*
from yourtable
inner join
(
select posX, posY, posZ, MAX(time) maxtime from yourtable
group by posX, posY, posZ
) v
on yourtable.posX = v.posX
and yourtable.posY = v.posY
and yourtable.posZ = v.posZ
and yourtable.time = v.maxtime
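A minimal sketch of the GROUP BY/MAX join, again using Python's sqlite3 module as a stand-in engine (an assumption; sample rows invented). Per the edit, stationID is ignored, so a newer measurement of the same point from another station replaces the older one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (posX REAL, posY REAL, posZ REAL, stationID INTEGER, time TEXT)")
con.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)",
    [
        (1.0, 2.0, 3.0, 7, "2013-01-01 10:00"),
        (1.0, 2.0, 3.0, 8, "2013-01-03 08:00"),  # same point, other station, newer
        (4.0, 5.0, 6.0, 7, "2013-01-01 11:00"),
    ],
)

# SQL-92 style: find the newest time per coordinate, then join back for the full rows.
valid = con.execute(
    """
    SELECT yourtable.*
    FROM yourtable
    INNER JOIN (SELECT posX, posY, posZ, MAX(time) AS maxtime
                FROM yourtable
                GROUP BY posX, posY, posZ) v
      ON yourtable.posX = v.posX
     AND yourtable.posY = v.posY
     AND yourtable.posZ = v.posZ
     AND yourtable.time = v.maxtime
    ORDER BY yourtable.posX
    """
).fetchall()
print(valid)
```

Unlike ROW_NUMBER(), this join returns every row that ties on the maximum time for a coordinate.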

Related

How to join records by date range

I need to match scrap records in one table with records indicating the material that was running at the same time on a machine. I have a table with the scrap counts and a table with records showing whenever the material changed on a machine.
I have a working query, a simplified version of which is included below, but it is very slow when applied to a large data set. I would like to try one of Oracle's analytic functions to make it faster, but I can't figure out how. I tried FIRST_VALUE and ROW_NUMBER in a few different forms, but I couldn't get them right. Looking for any suggestions.
Please let me know if you would like more details.
Following are simplified versions of the tables:
Scrap readings table (~41m rows)
Machine
ScrapReasonCode
ScrapQuantity
ReportTime
Material numbers (~3m rows)
Machine
MaterialNumber
MEASUREMENT_TIMESTAMP
SELECT Scrap.Machine,
Materials.MaterialNumber,
Scrap.ScrapReasonCode,
Scrap.ScrapQuantity,
Scrap.ReportTime
FROM Scrap, Materials
WHERE Scrap.Machine = Materials.Machine
AND Materials.MEASUREMENT_TIMESTAMP =
(SELECT MAX (M2.MEASUREMENT_TIMESTAMP)
FROM Materials M2
WHERE M2.Machine = Scrap.Machine
AND M2.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime)
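The correlated MAX subquery is an "as-of" join: each scrap reading is matched with the latest material change at or before its ReportTime on the same machine. A minimal sketch using Python's sqlite3 module as a stand-in for Oracle (an assumption), with invented rows, and with MaterialNumber taken from Materials and the subquery referencing M2.Machine:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE Scrap (Machine TEXT, ScrapReasonCode TEXT, ScrapQuantity INTEGER, ReportTime TEXT);
    CREATE TABLE Materials (Machine TEXT, MaterialNumber TEXT, MEASUREMENT_TIMESTAMP TEXT);
    INSERT INTO Materials VALUES ('M1', 'MAT-A', '2020-01-01 06:00'),
                                 ('M1', 'MAT-B', '2020-01-01 12:00'),
                                 ('M2', 'MAT-C', '2020-01-01 00:00');
    INSERT INTO Scrap VALUES ('M1', 'R1', 5, '2020-01-01 08:00'),
                             ('M1', 'R2', 3, '2020-01-01 13:00'),
                             ('M2', 'R1', 1, '2020-01-01 09:00');
    """
)

# For each scrap row, keep the Materials row whose timestamp is the latest one <= ReportTime.
rows = con.execute(
    """
    SELECT Scrap.Machine, Materials.MaterialNumber, Scrap.ScrapReasonCode,
           Scrap.ScrapQuantity, Scrap.ReportTime
    FROM Scrap, Materials
    WHERE Scrap.Machine = Materials.Machine
      AND Materials.MEASUREMENT_TIMESTAMP =
          (SELECT MAX(M2.MEASUREMENT_TIMESTAMP)
           FROM Materials M2
           WHERE M2.Machine = Scrap.Machine
             AND M2.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime)
    ORDER BY Scrap.Machine, Scrap.ReportTime
    """
).fetchall()
print(rows)
```

The subquery is re-evaluated per scrap row, which is one reason this form can be slow on ~41m rows without an index on Materials(Machine, MEASUREMENT_TIMESTAMP).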
I think this is what you are trying to do. You can use the FIRST_VALUE window function.
SELECT DISTINCT
s.Machine,
FIRST_VALUE(m.MaterialNumber) OVER(PARTITION BY s.Machine, s.ReportTime, s.ScrapReasonCode ORDER BY m.MEASUREMENT_TIMESTAMP DESC) AS MaterialNumber,
s.ScrapReasonCode,
s.ScrapQuantity,
s.ReportTime
--assumes (Machine, ReportTime, ScrapReasonCode) identifies one scrap reading; partition by your real key.
--or use the `MAX` window function to get the timestamp itself:
--MAX(m.MEASUREMENT_TIMESTAMP) OVER(PARTITION BY s.Machine, s.ReportTime, s.ScrapReasonCode)
FROM Scrap s
JOIN Materials m
ON s.Machine = m.Machine AND m.MEASUREMENT_TIMESTAMP <= s.ReportTime
I may be misunderstanding your requirements, but I believe the following query implements this using ROW_NUMBER:
SELECT q.*
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Scrap.Machine, Scrap.ReportTime, Scrap.ScrapReasonCode ORDER BY Materials.MEASUREMENT_TIMESTAMP DESC) AS RNO,
-- one partition per scrap reading; assumes (Machine, ReportTime, ScrapReasonCode) is unique in Scrap
Scrap.Machine,
Materials.MaterialNumber,
Scrap.ScrapReasonCode,
Scrap.ScrapQuantity,
Scrap.ReportTime
FROM Scrap, Materials
WHERE Scrap.Machine = Materials.Machine
AND Materials.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime
) q
WHERE q.RNO = 1
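A minimal sketch of the ROW_NUMBER variant, using Python's sqlite3 module as a stand-in for Oracle (an assumption; SQLite 3.25+ needed for window functions). The rows are invented, and the partition key (Machine, ReportTime, ScrapReasonCode) is assumed to identify one scrap reading. Among the materials at or before each reading, the newest gets RNO = 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE Scrap (Machine TEXT, ScrapReasonCode TEXT, ScrapQuantity INTEGER, ReportTime TEXT);
    CREATE TABLE Materials (Machine TEXT, MaterialNumber TEXT, MEASUREMENT_TIMESTAMP TEXT);
    INSERT INTO Materials VALUES ('M1', 'MAT-A', '2020-01-01 06:00'),
                                 ('M1', 'MAT-B', '2020-01-01 12:00'),
                                 ('M2', 'MAT-C', '2020-01-01 00:00');
    INSERT INTO Scrap VALUES ('M1', 'R1', 5, '2020-01-01 08:00'),
                             ('M1', 'R2', 3, '2020-01-01 13:00'),
                             ('M2', 'R1', 1, '2020-01-01 09:00');
    """
)

rows = con.execute(
    """
    SELECT q.Machine, q.MaterialNumber, q.ScrapReasonCode, q.ScrapQuantity, q.ReportTime
    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY Scrap.Machine, Scrap.ReportTime,
                                                 Scrap.ScrapReasonCode
                                    ORDER BY Materials.MEASUREMENT_TIMESTAMP DESC) AS RNO,
                 Scrap.Machine AS Machine,
                 Materials.MaterialNumber AS MaterialNumber,
                 Scrap.ScrapReasonCode AS ScrapReasonCode,
                 Scrap.ScrapQuantity AS ScrapQuantity,
                 Scrap.ReportTime AS ReportTime
          FROM Scrap, Materials
          WHERE Scrap.Machine = Materials.Machine
            AND Materials.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime) q
    WHERE q.RNO = 1
    ORDER BY q.Machine, q.ReportTime
    """
).fetchall()
print(rows)
```

With this data the result matches the correlated-subquery version of the question; partitioning by Machine alone would instead keep only one row per machine.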
Edit: if you need the measurement timestamp before (rather than on-or-before) the Scrap ReportTime, you could just change the <= sign to a < sign in the query above.

Access 2013 - Query not returning correct Number of Results

I am trying to get the query below to return the TWO lowest PlayedTo results for each PlayerID.
select
x1.PlayerID, x1.RoundID, x1.PlayedTo
from P_7to8Calcs as x1
where
(
select count(*)
from P_7to8Calcs as x2
where x2.PlayerID = x1.PlayerID
and x2.PlayedTo <= x1.PlayedTo
) <3
order by PlayerID, PlayedTo, RoundID;
Unfortunately at the moment it doesn't return a result when there is a tie for one of the lowest scores. A copy of the dataset and code is here http://sqlfiddle.com/#!3/4a9fc/13.
PlayerID 47 has only one result returned because two different RoundIDs are tied for the second-lowest PlayedTo. For what I am trying to calculate it doesn't matter which of the two is returned, since I only need to know the number, but for reporting I ideally need the one with the newest date.
One other slight problem with the query is the time it takes to run: about 2 minutes in Access for the 83 records, and it will need to run on about 1000 records when the database is fully up and running.
Any help will be much appreciated.
Resolve the tie by adding DatePlayed to your internal sorting (you wanted the one with the newest date anyway):
select
x1.PlayerID, x1.RoundID
, x1.PlayedTo
from P_7to8Calcs as x1
where
(
select count(*)
from P_7to8Calcs as x2
where x2.PlayerID = x1.PlayerID
and (x2.PlayedTo < x1.PlayedTo
or (x2.PlayedTo = x1.PlayedTo
and x2.DatePlayed >= x1.DatePlayed)
)
) <3
order by PlayerID, PlayedTo, RoundID;
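A minimal sketch comparing the original count condition with the tie-breaking one, using Python's sqlite3 module instead of Access (an assumption; the four rows for player 47 are invented to reproduce the tie). The helper two_lowest is hypothetical and only switches the condition inside the correlated COUNT(*).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE P_7to8Calcs (PlayerID INTEGER, RoundID INTEGER, PlayedTo INTEGER, DatePlayed TEXT)")
con.executemany(
    "INSERT INTO P_7to8Calcs VALUES (?, ?, ?, ?)",
    [
        (47, 1, 3, "2013-01-01"),
        (47, 2, 5, "2013-02-01"),  # tied for second lowest...
        (47, 3, 5, "2013-03-01"),  # ...but newer, so this one should win the tie
        (47, 4, 8, "2013-04-01"),
    ],
)

def two_lowest(tie_break):
    # COUNT(*) counts this player's rows that sort at or before x1; keep x1 if fewer than 3 do.
    condition = (
        "(x2.PlayedTo < x1.PlayedTo OR (x2.PlayedTo = x1.PlayedTo AND x2.DatePlayed >= x1.DatePlayed))"
        if tie_break
        else "x2.PlayedTo <= x1.PlayedTo"
    )
    sql = f"""
        SELECT x1.PlayerID, x1.RoundID, x1.PlayedTo
        FROM P_7to8Calcs AS x1
        WHERE (SELECT COUNT(*) FROM P_7to8Calcs AS x2
               WHERE x2.PlayerID = x1.PlayerID AND {condition}) < 3
        ORDER BY PlayerID, PlayedTo, RoundID
    """
    return con.execute(sql).fetchall()

original = two_lowest(tie_break=False)
fixed = two_lowest(tie_break=True)
print(original)
print(fixed)
```

Without the tie-break both tied rows count 3, so both are dropped; with it, the newer round counts 2 and is kept.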
For performance, create an index supporting the subquery's conditions (equality on PlayerID plus the comparison on PlayedTo). Something like:
create index P_7to8Calcs__PlayerID_PlayedTo on P_7to8Calcs(PlayerID, PlayedTo);
Note: I used your SQLFiddle, as I do not have Access available here.
Edit: In case the index does not improve performance enough, you might want to try the following query using window functions, which avoids the correlated subquery. It works in your SQLFiddle, but I am not sure whether Access supports it.
select x1.PlayerID, x1.RoundID, x1.PlayedTo
from (
select PlayerID, RoundID, PlayedTo
, RANK() OVER (PARTITION BY PlayerID ORDER BY PlayedTo, DatePlayed DESC) AS rnk
from P_7to8Calcs
) as x1
where x1.rnk < 3
order by PlayerID, PlayedTo, RoundID;
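A minimal sketch of the RANK() version on the same invented rows, run through Python's sqlite3 module (an assumption; SQL Server and Access are not involved, and SQLite 3.25+ is needed). The rank alias is spelled rnk here to avoid clashing with the function name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE P_7to8Calcs (PlayerID INTEGER, RoundID INTEGER, PlayedTo INTEGER, DatePlayed TEXT)")
con.executemany(
    "INSERT INTO P_7to8Calcs VALUES (?, ?, ?, ?)",
    [
        (47, 1, 3, "2013-01-01"),
        (47, 2, 5, "2013-02-01"),
        (47, 3, 5, "2013-03-01"),
        (47, 4, 8, "2013-04-01"),
    ],
)

# Lowest PlayedTo first; among equal PlayedTo, the newest DatePlayed ranks first.
rows = con.execute(
    """
    SELECT x1.PlayerID, x1.RoundID, x1.PlayedTo
    FROM (SELECT PlayerID, RoundID, PlayedTo,
                 RANK() OVER (PARTITION BY PlayerID ORDER BY PlayedTo, DatePlayed DESC) AS rnk
          FROM P_7to8Calcs) AS x1
    WHERE x1.rnk < 3
    ORDER BY PlayerID, PlayedTo, RoundID
    """
).fetchall()
print(rows)
```

RANK() still returns extra rows if two rounds tie on both PlayedTo and DatePlayed; ROW_NUMBER() would cap it at exactly two.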
See OVER clause and Ranking Functions for documentation.

Select finishes where athlete didn't finish first for the past 3 events

Suppose I have a database of athletic meeting results with the following schema:
DATE,NAME,FINISH_POS
I wish to write a query that selects all rows where an athlete has competed in at least three events without winning. For example, with the following sample data:
2013-06-22,Johnson,2
2013-06-21,Johnson,1
2013-06-20,Johnson,4
2013-06-19,Johnson,2
2013-06-18,Johnson,3
2013-06-17,Johnson,4
2013-06-16,Johnson,3
2013-06-15,Johnson,1
the following rows:
2013-06-20,Johnson,4
2013-06-19,Johnson,2
would be matched. I have only managed to get as far as the following stub:
select date,name FROM table WHERE ...;
I've been trying to wrap my head around the WHERE clause, but I can't even get started.
I think this can be even simpler / faster:
SELECT day, place, athlete
FROM (
SELECT *, min(place) OVER (PARTITION BY athlete
ORDER BY day
ROWS 3 PRECEDING) AS best
FROM t
) sub
WHERE best > 1
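A minimal sketch of the min() window on the question's sample data, run through Python's sqlite3 module instead of PostgreSQL (an assumption; SQLite 3.25+ needed). The frame is written out explicitly as BETWEEN 3 PRECEDING AND CURRENT ROW, which is what ROWS 3 PRECEDING means when the frame end is omitted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (day TEXT, place INTEGER, athlete TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2013-06-22", 2, "Johnson"), ("2013-06-21", 1, "Johnson"),
        ("2013-06-20", 4, "Johnson"), ("2013-06-19", 2, "Johnson"),
        ("2013-06-18", 3, "Johnson"), ("2013-06-17", 4, "Johnson"),
        ("2013-06-16", 3, "Johnson"), ("2013-06-15", 1, "Johnson"),
    ],
)

# best = lowest place over the current row and the 3 rows before it (per athlete, by day).
rows = con.execute(
    """
    SELECT day, place, athlete
    FROM (SELECT day, place, athlete,
                 MIN(place) OVER (PARTITION BY athlete ORDER BY day
                                  ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS best
          FROM t) sub
    WHERE best > 1
    ORDER BY day
    """
).fetchall()
print(rows)
```

Only 2013-06-19 and 2013-06-20 have no 1 among themselves and their three predecessors, matching the rows the question expects.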
This uses the aggregate function min() as a window function to get the minimum place over the last three rows plus the current one.
The then-trivial check for "no win" (best > 1) has to be done on the next query level, since window functions are applied after the WHERE clause. So you need at least one CTE or subselect for a condition on the result of a window function.
Details about window function calls are in the manual. In particular:
If frame_end is omitted it defaults to CURRENT ROW.
If place (FINISH_POS) can be NULL, use this instead:
WHERE best IS DISTINCT FROM 1
min() ignores NULL values, but if all rows in the frame are NULL, the result is NULL.
Don't use type names and reserved words as identifiers; I substituted day for your date.
This assumes at most one competition per day; otherwise you have to define how to deal with peers in the timeline, or use timestamp instead of date.
@Craig already mentioned the index to make this fast.
Here's an alternative formulation that does the work in two scans without subqueries:
SELECT
"date", athlete, place
FROM (
SELECT
"date",
place,
athlete,
1 <> ALL (array_agg(place) OVER w) AS include_row
FROM Table1
WINDOW w AS (PARTITION BY athlete ORDER BY "date" ASC ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
) AS history
WHERE include_row;
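The array_agg/ALL test can be checked outside the database. Below is a plain-Python sketch, not PostgreSQL, with a hypothetical helper rows_without_recent_win that applies the same frame (current row plus n preceding per athlete) and shows n as the single parameter. One difference to note: SQL's 1 <> ALL(...) yields NULL (row excluded) if a place is NULL, whereas here a None place simply counts as "not a win".

```python
from itertools import groupby

def rows_without_recent_win(rows, n=3):
    """Return (date, athlete, place) rows where the athlete did not finish
    first in this event or in any of the previous n events."""
    result = []
    # Sort by athlete, then date, so each athlete's history is contiguous and chronological.
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for _, history in groupby(ordered, key=lambda r: r[1]):
        history = list(history)
        for i, row in enumerate(history):
            frame = history[max(0, i - n): i + 1]  # current row plus up to n preceding rows
            if all(place != 1 for _, _, place in frame):
                result.append(row)
    return result

results = [
    ("2013-06-22", "Johnson", 2), ("2013-06-21", "Johnson", 1),
    ("2013-06-20", "Johnson", 4), ("2013-06-19", "Johnson", 2),
    ("2013-06-18", "Johnson", 3), ("2013-06-17", "Johnson", 4),
    ("2013-06-16", "Johnson", 3), ("2013-06-15", "Johnson", 1),
]
print(rows_without_recent_win(results, n=3))
print(rows_without_recent_win(results, n=1))
```

Changing n only widens or narrows the frame, which mirrors changing "3 PRECEDING" in the WINDOW clause.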
See: http://sqlfiddle.com/#!1/fa3a4/34
The logic here is pretty much a literal translation of the question. Get the last four placements - current and the previous 3 - and return any rows in which the athlete didn't finish first in any of them.
Because the window frame is the only place where the number of rows of history to consider is defined, this variant can be parameterised, unlike my previous effort (obsolete, http://sqlfiddle.com/#!1/fa3a4/31), so it works for the last n for any n. It's also a lot more efficient than that last try.
I'd be really interested in the relative efficiency of this vs @Andomar's query when executed on a dataset of non-trivial size; they're pretty much exactly the same on this tiny dataset. An index on Table1(athlete, "date") would be required for this to perform optimally on a large data set.
; with CTE as
(
select row_number() over (partition by athlete order by date) rn
, *
from Table1
)
select *
from CTE cur
where not exists
(
select *
from CTE prev
where prev.place = 1
and prev.athlete = cur.athlete
and prev.rn between cur.rn - 3 and cur.rn
)
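A minimal sketch of the ROW_NUMBER plus NOT EXISTS approach on the question's sample data, using Python's sqlite3 module as a stand-in for SQL Server (an assumption; SQLite 3.25+ needed). The column list is spelled out instead of *, and "date" is quoted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE Table1 ("date" TEXT, athlete TEXT, place INTEGER)')
con.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        ("2013-06-22", "Johnson", 2), ("2013-06-21", "Johnson", 1),
        ("2013-06-20", "Johnson", 4), ("2013-06-19", "Johnson", 2),
        ("2013-06-18", "Johnson", 3), ("2013-06-17", "Johnson", 4),
        ("2013-06-16", "Johnson", 3), ("2013-06-15", "Johnson", 1),
    ],
)

# rn numbers each athlete's events chronologically; a row is kept when no win
# exists among rn-3 .. rn for the same athlete.
rows = con.execute(
    """
    WITH CTE AS (
        SELECT "date", athlete, place,
               ROW_NUMBER() OVER (PARTITION BY athlete ORDER BY "date") AS rn
        FROM Table1
    )
    SELECT cur."date", cur.athlete, cur.place
    FROM CTE cur
    WHERE NOT EXISTS (
        SELECT 1
        FROM CTE prev
        WHERE prev.place = 1
          AND prev.athlete = cur.athlete
          AND prev.rn BETWEEN cur.rn - 3 AND cur.rn
    )
    ORDER BY cur."date"
    """
).fetchall()
print(rows)
```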
Live example at SQL Fiddle.

SQL conundrum, how to select latest date for part, but only 1 row per part (unique)

I am trying to wrap my head around this one this morning.
I am trying to show inventory status for parts (for our products) and this query only becomes complex if I try to return all parts.
Let me lay it out:
There is a single table, inventoryReport.
I have a distinct list of X parts I wish to display, and the result must be X rows (1 row per part showing the latest inventory entry).
The table is made up of dated entries of inventory changes, so I only need the LATEST dated entry per part.
All data is contained in this single table, so no joins are necessary.
Currently, for a single part, it is fairly simple and I can accomplish it with the following SQL (to give you some idea):
SELECT TOP (1) ldDate, ptProdLine, inPart, inSite, inAbc, ptUm, inQtyOh + inQtyNonet AS in_qty_oh, inQtyAvail, inQtyNonet, ldCustConsignQty, inSuppConsignQty
FROM inventoryReport
WHERE (ldPart = 'ABC123')
ORDER BY ldDate DESC
That gets me the TOP 1 row, which is simple per part; however, I need to show all X parts (let's say 30), so I need 30 rows with that result. The simple solution would be to loop X SQL calls in my code, which would work but be costly, so I would love to rework this SQL to reduce the X calls to the DB down to just 1 query.
From what I can see here I need to keep track of the latest date per item somehow while looking for my result set.
I would ultimately add a
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
to limit the parts I need. Hopefully I made my question clear enough; let me know if you have an idea. I cannot use DISTINCT because the rows are not identical, the date needs to be the latest, and I need a maximum of X rows.
Thoughts? I'm stuck...
SELECT *
FROM (SELECT i.*,
ROW_NUMBER() OVER(PARTITION BY ldPart ORDER BY ldDate DESC) r
FROM inventoryReport i
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
) x
WHERE r = 1
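A minimal sketch of the ROW_NUMBER answer, using Python's sqlite3 module as a stand-in for SQL Server (an assumption; SQLite 3.25+ needed). The table is cut down to three columns and the rows are invented. The IN list is bound with ? placeholders, and the derived table gets an alias, which SQL Server also requires.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE inventoryReport (ldPart TEXT, ldDate TEXT, inQtyOh INTEGER)")
con.executemany(
    "INSERT INTO inventoryReport VALUES (?, ?, ?)",
    [
        ("ABC123", "2014-01-01", 10),
        ("ABC123", "2014-02-01", 7),   # latest entry for ABC123
        ("BFD21", "2014-01-15", 3),
        ("ZZZ99", "2014-03-01", 1),    # not in the requested list
    ],
)

parts = ["ABC123", "BFD21"]
placeholders = ", ".join("?" for _ in parts)  # one ? per requested part
rows = con.execute(
    f"""
    SELECT ldPart, ldDate, inQtyOh
    FROM (SELECT i.*,
                 ROW_NUMBER() OVER (PARTITION BY ldPart ORDER BY ldDate DESC) AS r
          FROM inventoryReport i
          WHERE ldPart IN ({placeholders})) x
    WHERE r = 1
    ORDER BY ldPart
    """,
    parts,
).fetchall()
print(rows)
```

One query returns exactly one row per requested part, replacing the X separate TOP (1) calls.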
EDIT: Be sure to test the performance of each solution. As pointed out in this question, the CTE method may outperform using ROW_NUMBER.
;with cteMaxDate as (
select ldPart, max(ldDate) as MaxDate
from inventoryReport
group by ldPart
)
SELECT md.MaxDate, ir.ptProdLine, ir.inPart, ir.inSite, ir.inAbc, ir.ptUm, ir.inQtyOh + ir.inQtyNonet AS in_qty_oh, ir.inQtyAvail, ir.inQtyNonet, ir.ldCustConsignQty, ir.inSuppConsignQty
FROM cteMaxDate md
INNER JOIN inventoryReport ir
on md.ldPart = ir.ldPart
and md.MaxDate = ir.ldDate
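A minimal sketch of the CTE/MAX join, again with Python's sqlite3 module as a stand-in (an assumption; rows invented, columns cut down). One behavioural difference from ROW_NUMBER worth testing along with performance: if a part has two entries on its latest date, the join returns both.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE inventoryReport (ldPart TEXT, ldDate TEXT, inQtyOh INTEGER)")
con.executemany(
    "INSERT INTO inventoryReport VALUES (?, ?, ?)",
    [
        ("ABC123", "2014-01-01", 10),
        ("ABC123", "2014-02-01", 7),
        ("BFD21", "2014-01-15", 3),
        ("BFD21", "2014-01-15", 4),   # second entry on the same latest date
    ],
)

# Latest date per part, then join back to pick up the rest of the row(s).
rows = con.execute(
    """
    WITH cteMaxDate AS (
        SELECT ldPart, MAX(ldDate) AS MaxDate
        FROM inventoryReport
        GROUP BY ldPart
    )
    SELECT md.ldPart, md.MaxDate, ir.inQtyOh
    FROM cteMaxDate md
    INNER JOIN inventoryReport ir
        ON md.ldPart = ir.ldPart AND md.MaxDate = ir.ldDate
    ORDER BY md.ldPart, ir.inQtyOh
    """
).fetchall()
print(rows)
```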
You need to join to a subquery:
SELECT i.ldPart, x.LastDate, i.inAbc
FROM inventoryReport i
INNER JOIN (Select ldPart, Max(ldDate) As LastDate FROM inventoryReport GROUP BY ldPart) x
on i.ldPart = x.ldPart and i.ldDate = x.LastDate

Getting max. screen resolution with Group By

Question:
I have a website where I gather browser statistics.
Thus, I have an SQL table (T_Visits), with the following columns:
uniqueidentifier Visit_UID,
uniqueidentifier User_UID,
datetime Visit_DateTime,
float Screen_w,
float Screen_h,
float Resolution = Screen_w * Screen_h
varchar resolutionstring = screen_w + ' x ' + screen_h
Since a user can visit the site from several computers, there can be different screen sizes recorded across visits by the same user.
Now I want to get the maximum/minimum resolution each user had:
Select User_UID, max(resolution) from T_Visits GROUP BY User_UID
How can I get the corresponding resolution string?
I mean, I can get max(screen_w) and max(screen_h), but there's no guarantee that the corresponding resolutionstring would be max(screen_w) + ' x ' + max(screen_h).
Try something like:
;WITH resCTE
AS
(
SELECT User_UID
,resolutionstring
,ROW_NUMBER() OVER (PARTITION BY User_UID
ORDER BY Resolution desc
,Screen_w desc
) AS rnMax
,ROW_NUMBER() OVER (PARTITION BY User_UID
ORDER BY Resolution
,Screen_w
) AS rnMin
FROM T_Visits
)
SELECT maxr.User_UID
,maxr.resolutionstring AS maxRes
,minr.resolutionstring AS minRes
FROM resCTE AS maxr
JOIN resCTE AS minr
ON minr.User_UID = maxr.User_UID
AND minr.rnMin = 1
WHERE maxr.rnMax = 1
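A minimal sketch of the CTE, with the FROM T_Visits inside the CTE, run through Python's sqlite3 module as a stand-in for SQL Server (an assumption; SQLite 3.25+ needed). The visits are invented, and Resolution and resolutionstring are stored as plain columns instead of computed ones. User u2 has the same phone in both orientations, so the Screen_w sub-sort decides the tie.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE T_Visits (User_UID TEXT, Screen_w INTEGER, Screen_h INTEGER, "
    "Resolution INTEGER, resolutionstring TEXT)"
)
visits = [("u1", 1920, 1080), ("u1", 1366, 768), ("u1", 1280, 1024),
          ("u2", 320, 480), ("u2", 480, 320)]  # u2: same phone, rotated
# Resolution and resolutionstring are precomputed here instead of being computed columns.
con.executemany(
    "INSERT INTO T_Visits VALUES (?, ?, ?, ?, ?)",
    [(u, w, h, w * h, f"{w} x {h}") for u, w, h in visits],
)

rows = con.execute(
    """
    WITH resCTE AS (
        SELECT User_UID, resolutionstring,
               ROW_NUMBER() OVER (PARTITION BY User_UID
                                  ORDER BY Resolution DESC, Screen_w DESC) AS rnMax,
               ROW_NUMBER() OVER (PARTITION BY User_UID
                                  ORDER BY Resolution, Screen_w) AS rnMin
        FROM T_Visits
    )
    SELECT maxr.User_UID, maxr.resolutionstring AS maxRes, minr.resolutionstring AS minRes
    FROM resCTE AS maxr
    JOIN resCTE AS minr ON minr.User_UID = maxr.User_UID AND minr.rnMin = 1
    WHERE maxr.rnMax = 1
    ORDER BY maxr.User_UID
    """
).fetchall()
print(rows)
```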
(untested)
Note that this assumes you only want to see 1 row per user id, regardless of whether more than one HxW gives the same resolution.
It would be possible to modify the query to use RANK() rather than ROW_NUMBER() if this isn't the behaviour you want.
EDIT
Amended to show the max/min resolution sub-sorted by screen width
SELECT DISTINCT T_Visits.User_UID, T_Visits.Resolution, T_Visits.resolutionstring
FROM T_Visits
INNER JOIN (SELECT User_UID, max(resolution) AS max
FROM T_Visits
GROUP BY User_UID) temp
ON T_Visits.User_UID = temp.User_UID
AND T_Visits.Resolution = temp.max
This query first builds a derived table (temp) of each user's ID and maximum resolution, then inner joins it with T_Visits on user ID and resolution, which gives you the corresponding resolutionstring(s).
There are some problems with this kind of query, however. First, while the DISTINCT takes care of multiple rows being returned for the same resolutionstring, it will still return multiple rows per user if they have used several screens with the same resolution. For example, what if someone visits your site with an iPhone and you record a hit with 320x480, but then they turn the phone sideways and hit your site again, which now registers 480x320 because the X and Y values are swapped due to orientation? This would produce multiple max-resolution hits with different resolutionstrings.
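The orientation problem is easy to reproduce. A minimal sketch with Python's sqlite3 module as a stand-in engine (an assumption; visits invented), running the derived-table join with the inner alias renamed to best and the MAX column to maxres. The rotated phone yields two rows for u2.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE T_Visits (User_UID TEXT, Screen_w INTEGER, Screen_h INTEGER, "
    "Resolution INTEGER, resolutionstring TEXT)"
)
visits = [("u1", 1920, 1080), ("u1", 1366, 768),
          ("u2", 320, 480), ("u2", 480, 320)]  # u2: same phone, rotated
con.executemany(
    "INSERT INTO T_Visits VALUES (?, ?, ?, ?, ?)",
    [(u, w, h, w * h, f"{w} x {h}") for u, w, h in visits],
)

# Max resolution per user, joined back to get the matching string(s).
rows = con.execute(
    """
    SELECT DISTINCT T_Visits.User_UID, T_Visits.Resolution, T_Visits.resolutionstring
    FROM T_Visits
    INNER JOIN (SELECT User_UID, MAX(Resolution) AS maxres
                FROM T_Visits
                GROUP BY User_UID) best
      ON T_Visits.User_UID = best.User_UID
     AND T_Visits.Resolution = best.maxres
    ORDER BY T_Visits.User_UID, T_Visits.resolutionstring
    """
).fetchall()
print(rows)
```

DISTINCT cannot merge the two u2 rows because their resolutionstrings differ; a tie-breaker such as the Screen_w sub-sort is needed to get one row per user.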
The same thing can happen for monitors. It is not uncommon for document editors to rotate their monitors for a more "legal paper" style view. However, when they visit your site from their homes, they might not have the same setup, but do have the same resolution.
What exactly would you want your query to return if that is the case?
This should work to get a list of User_UID with the string of the maximum resolution; adapt it to get the minimum resolutions. It may not be the most efficient way...
Select User_UID, resolutionstring from T_Visits as p
WHERE resolution = (Select max(resolution) from T_Visits where User_UID =p.User_UID)