Sql -after group by I need to take rows with newest date - sql

I need to write a query in sql and I can't do it correctly. I have a table with 7 columns 1st_num, 2nd_num, 3rd_num, opening_Date, Amount, code, cancel_Flag.
For every 1st_num, 2nd_num, 3rd_num I want to take only the record with the min (cancel_flag), and if there's more then 1 row so take the the newest opening Date.
But when I do group by and choose min and max for the relevant fields, I get a mix of the rows, for example:
1. 12,130,45678,2015-01-01,2005,333,0
2. 12,130,45678,2015-01-09,105,313,0
The result will be
:12,130,45678,2015-01-09,2005,333,0
and that mixes the rows into one
Microsoft sql server 2008 . using ssis by visual studio 2008
my code is :
SELECT
1st_num,
2nd_num,
3rd_num,
MAX(opening_date),
MAX (Amount),
code,
MIN(cancel_flag)
FROM do. tablename
GROUP BY
1st_num,
2nd_num,
3rd_num,
code
HAVING COUNT(*) > 1
How do I take the row with the max date or.min cancel flag as it is without mixing values?
I can't really post my code because of security reasons but I'm sure you can help.
thank you,
Oren

It is very difficult like this to answer, because every DBMS has different syntax.
Anyways, for most dbms this should work. Using row_number() function to rank the rows, and take only the first one by our definition (all your conditions):
SELECT * FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY t.1st_num,t.2nd_num,t.3rd_num order by t.cancel_flag asc,t.opening_date desc) as row_num
FROM YourTable t ) as tableTempName
WHERE row_num = 1

Use NOT EXISTS to return a row as long as no other row with same 1st_num, 2nd_num, 3rd_num has a lower cancel_flag value, or same cancel_flag but a higher opening_Date.
select *
from tablename t1
where not exists (select 1 from tablename t2
where t2.1st_num = t1.1st_num
and t2.2nd_num = t1.2nd_num
and t2.3rd_num = t1.3rd_num
and (t2.cancel_flag < t1.cancel_flag
or (t2.cancel_flag = t1.cancel_flag and
t2.opening_Date > t1.opening_Date)))
Core ANSI SQL-99, expected to work with (almost) any dbms.

Related

Sql get the latest row in a table by date - double select vs order by

Have a query that uses double select (with select max) to fetch the row with the latest 'calculation_time' column among multiple rows which can have the same 'patient_set_id'. If there are multiple rows with the same 'patient_set_id', only the row with the latest 'calculation_time' should be retrieved. Calculation time is a date.
So far I've tried this but I'm not really sure if there is any better way for this, maybe using ORDER BY. But I'm very new to sql and need to know which one would be the fastest and more appropriate?
SELECT median from diagnostic_risk_stats WHERE
calculation_time=(SELECT MAX(calculation_time) FROM diagnostic_risk_stats WHERE
patient_set_id = UNHEX(REPLACE('5a9dbfca-74d6-471a-af27-31beb4b53bb2', "-","")));
You can use not exists as follows:
SELECT median
from diagnostic_risk_stats t
WHERE not exists
(Select 1 from diagnostic_risk_stats tt
Where t.patient_set_id = tt.patient_set_id
And tt.calculation_time > t.calculation_time)
And t.patient_set_id = UNHEX(REPLACE('5a9dbfca-74d6-471a-af27-31beb4b53bb2', "-",""));

How to join records by date range

I need to match scrap records in one table with records indicating the material that was running at the same time on a machine. I have a table with the scrap counts and a table with records showing whenever the material changed on a machine.
I have a working query of which I will include a simplified version below, but it is very slow when applied to a large data set. I would like to try one of Oracle's analytical functions to make it faster, but I can't figure out how. I tried FIRST_VALUE, and ROW_NUMBER in a few different forms, but I couldn't get them right. Looking for any suggestions.
Please let me know if you would like more details.
Following are simplified versions of the tables:
Scrap readings table (~41m rows)
Machine
ScrapReasonCode
ScrapQuantity
ReportTime
Material numbers (~3m rows)
Machine
MaterialNumber
MEASUREMENT_TIMESTAMP
SELECT Scrap.Machine,
Scrap.MaterialNumber,
Scrap.ScrapReasonCode,
Scrap.ScrapQuantity,
Scrap.ReportTime
FROM Scrap, Materials
WHERE Scrap.Machine = Materials.Machine
AND Materials.MEASUREMENT_TIMESTAMP =
(SELECT MAX (M2.MEASUREMENT_TIMESTAMP)
FROM Materials M2
WHERE M2.Materials.Machine = Scrap.Machine
AND M2.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime)
I think this is what you are trying to do. You can use the FIRST_VALUE window function.
SELECT DISTINCT
s.Machine,
s.MaterialNumber,
s.ScrapReasonCode,
s.ScrapQuantity,
s.ReportTime,
FIRST_VALUE(m.MEASUREMENT_TIMESTAMP) OVER(PARTITION BY s.Machine ORDER BY m.MEASUREMENT_TIMESTAMP DESC)
--or you can use the `MAX` window function too.
--MAX(m.MEASUREMENT_TIMESTAMP) OVER(PARTITION BY s.Machine)
FROM Scrap s
JOIN Materials m
WHERE s.Machine = m.Machine AND m.MEASUREMENT_TIMESTAMP <= s.ReportTime
I may be misunderstanding your requirements but I believe the following query should work in terms of implementing using ROW_NUMBER:
SELECT q.*
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Scrap.Machine ORDER BY Materials.MEASUREMENT_TIMESTAMP DESC) AS RNO
Scrap.MaterialNumber,
Scrap.ScrapReasonCode,
Scrap.ScrapQuantity,
Scrap.ReportTime
FROM Scrap, Materials
WHERE Scrap.Machine = Materials.Machine
AND Materials.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime
) q
WHERE q.RNO = 1
Edit: if you need the measurement timestamp before (rather than on-or-before) the Scrap ReportTime, you could just change the <= sign to a < sign in the query above.

Access 2013 - Query not returning correct Number of Results

I am trying to get the query below to return the TWO lowest PlayedTo results for each PlayerID.
select
x1.PlayerID, x1.RoundID, x1.PlayedTo
from P_7to8Calcs as x1
where
(
select count(*)
from P_7to8Calcs as x2
where x2.PlayerID = x1.PlayerID
and x2.PlayedTo <= x1.PlayedTo
) <3
order by PlayerID, PlayedTo, RoundID;
Unfortunately at the moment it doesn't return a result when there is a tie for one of the lowest scores. A copy of the dataset and code is here http://sqlfiddle.com/#!3/4a9fc/13.
PlayerID 47 has only one result returned as there are two different RoundID's that are tied for the second lowest PlayedTo. For what I am trying to calculate it doesn't matter which of these two it returns as I just need to know what the number is but for reporting I ideally need to know the one with the newest date.
One other slight problem with the query is the time it takes to run. It takes about 2 minutes in Access to run through the 83 records but it will need to run on about 1000 records when the database is fully up and running.
Any help will be much appreciated.
Resolve the tie by adding DatePlayed to your internal sorting (you wanted the one with the newest date anyway):
select
x1.PlayerID, x1.RoundID
, x1.PlayedTo
from P_7to8Calcs as x1
where
(
select count(*)
from P_7to8Calcs as x2
where x2.PlayerID = x1.PlayerID
and (x2.PlayedTo < x1.PlayedTo
or x2.PlayedTo = x1.PlayedTo
and x2.DatePlayed >= x1.DatePlayed
)
) <3
order by PlayerID, PlayedTo, RoundID;
For performance create an index supporting the join condition. Something like:
create index P_7to8Calcs__PlayerID_RoundID on P_7to8Calcs(PlayerId, PlayedTo);
Note: I used your SQLFiddle as I do not have Acess available here.
Edit: In case the index does not improve performance enough, you might want to try the following query using window functions (which avoids nested sub-query). It works in your SQLFiddle but I am not sure if this is supported by Access.
select x1.PlayerID, x1.RoundID, x1.PlayedTo
from (
select PlayerID, RoundID, PlayedTo
, RANK() OVER (PARTITION BY PlayerId ORDER BY PlayedTo, DatePlayed DESC) AS Rank
from P_7to8Calcs
) as x1
where x1.RANK < 3
order by PlayerID, PlayedTo, RoundID;
See OVER clause and Ranking Functions for documentation.

Select finishes where athlete didn't finish first for the past 3 events

Suppose I have a database of athletic meeting results with a schema as follows
DATE,NAME,FINISH_POS
I wish to do a query to select all rows where an athlete has competed in at least three events without winning. For example with the following sample data
2013-06-22,Johnson,2
2013-06-21,Johnson,1
2013-06-20,Johnson,4
2013-06-19,Johnson,2
2013-06-18,Johnson,3
2013-06-17,Johnson,4
2013-06-16,Johnson,3
2013-06-15,Johnson,1
The following rows:
2013-06-20,Johnson,4
2013-06-19,Johnson,2
Would be matched. I have only managed to get started at the following stub:
select date,name FROM table WHERE ...;
I've been trying to wrap my head around the where clause but I can't even get a start
I think this can be even simpler / faster:
SELECT day, place, athlete
FROM (
SELECT *, min(place) OVER (PARTITION BY athlete
ORDER BY day
ROWS 3 PRECEDING) AS best
FROM t
) sub
WHERE best > 1
->SQLfiddle
Uses the aggregate function min() as window function to get the minimum place of the last three rows plus the current one.
The then trivial check for "no win" (best > 1) has to be done on the next query level since window functions are applied after the WHERE clause. So you need at least one CTE of sub-select for a condition on the result of a window function.
Details about window function calls in the manual here. In particular:
If frame_end is omitted it defaults to CURRENT ROW.
If place (finishing_pos) can be NULL, use this instead:
WHERE best IS DISTINCT FROM 1
min() ignores NULL values, but if all rows in the frame are NULL, the result is NULL.
Don't use type names and reserved words as identifiers, I substituted day for your date.
This assumes at most 1 competition per day, else you have to define how to deal with peers in the time line or use timestamp instead of date.
#Craig already mentioned the index to make this fast.
Here's an alternative formulation that does the work in two scans without subqueries:
SELECT
"date", athlete, place
FROM (
SELECT
"date",
place,
athlete,
1 <> ALL (array_agg(place) OVER w) AS include_row
FROM Table1
WINDOW w AS (PARTITION BY athlete ORDER BY "date" ASC ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
) AS history
WHERE include_row;
See: http://sqlfiddle.com/#!1/fa3a4/34
The logic here is pretty much a literal translation of the question. Get the last four placements - current and the previous 3 - and return any rows in which the athlete didn't finish first in any of them.
Because the window frame is the only place where the number of rows of history to consider is defined, you can parameterise this variant unlike my previous effort (obsolete, http://sqlfiddle.com/#!1/fa3a4/31), so it works for the last n for any n. It's also a lot more efficient than the last try.
I'd be really interested in the relative efficiency of this vs #Andomar's query when executed on a dataset of non-trivial size. They're pretty much exactly the same on this tiny dataset. An index on Table1(athlete, "date") would be required for this to perform optimally on a large data set.
; with CTE as
(
select row_number() over (partition by athlete order by date) rn
, *
from Table1
)
select *
from CTE cur
where not exists
(
select *
from CTE prev
where prev.place = 1
and prev.athlete = cur.athlete
and prev.rn between cur.rn - 3 and cur.rn
)
Live example at SQL Fiddle.

SQL conundrum, how to select latest date for part, but only 1 row per part (unique)

I am trying to wrap my head around this one this morning.
I am trying to show inventory status for parts (for our products) and this query only becomes complex if I try to return all parts.
Let me lay it out:
single table inventoryReport
I have a distinct list of X parts I wish to display, the result of which must be X # of rows (1 row per part showing latest inventory entry).
table is made up of dated entries of inventory changes (so I only need the LATEST date entry per part).
all data contained in this single table, so no joins necessary.
Currently for 1 single part, it is fairly simple and I can accomplish this by doing the following sql (to give you some idea):
SELECT TOP (1) ldDate, ptProdLine, inPart, inSite, inAbc, ptUm, inQtyOh + inQtyNonet AS in_qty_oh, inQtyAvail, inQtyNonet, ldCustConsignQty, inSuppConsignQty
FROM inventoryReport
WHERE (ldPart = 'ABC123')
ORDER BY ldDate DESC
that gets me my TOP 1 row, so simple per part, however I need to show all X (lets say 30 parts). So I need 30 rows, with that result. Of course the simple solution would be to loop X# of sql calls in my code (but it would be costly) and that would suffice, but for this purpose I would love to work this SQL some more to reduce the x# calls back to the db (if not needed) down to just 1 query.
From what I can see here I need to keep track of the latest date per item somehow while looking for my result set.
I would ultimately do a
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
to limit the parts I need. Hopefully I made my question clear enough. Let me know if you have an idea. I cannot do a DISTINCT as the rows are not the same, the date needs to be the latest, and I need a maximum of X rows.
Thoughts? I'm stuck...
SELECT *
FROM (SELECT i.*,
ROW_NUMBER() OVER(PARTITION BY ldPart ORDER BY ldDate DESC) r
FROM inventoryReport i
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
)
WHERE r = 1
EDIT: Be sure to test the performance of each solution. As pointed out in this question, the CTE method may outperform using ROW_NUMBER.
;with cteMaxDate as (
select ldPart, max(ldDate) as MaxDate
from inventoryReport
group by ldPart
)
SELECT md.MaxDate, ir.ptProdLine, ir.inPart, ir.inSite, ir.inAbc, ir.ptUm, ir.inQtyOh + ir.inQtyNonet AS in_qty_oh, ir.inQtyAvail, ir.inQtyNonet, ir.ldCustConsignQty, ir.inSuppConsignQty
FROM cteMaxDate md
INNER JOIN inventoryReport ir
on md.ldPart = ir.ldPart
and md.MaxDate = ir.ldDate
You need to join into a Sub-query:
SELECT i.ldPart, x.LastDate, i.inAbc
FROM inventoryReport i
INNER JOIN (Select ldPart, Max(ldDate) As LastDate FROM inventoryReport GROUP BY ldPart) x
on i.ldPart = x.ldPart and i.ldDate = x.LastDate