Get a single max date if dates are not unique - sql

For sql 2000,
Very similar to what I asked here
Get distinct max date using SQL
But this time the dates aren't unique so for this table pc_bsprdt_tbl
pc_bsprhd_key pc_bsprdt_shpiadt pc_bsprdt_prod
21ST 99-00 2001-04-30 23:59:59.000 72608-12895
21ST 99-00 2001-04-30 23:59:59.000 72608-12910
AFCC990915 1999-09-01 00:00:00.000 72608-12115
AFCC990915 1999-09-01 00:00:00.000 CHU99-01514
AFCC990915 1999-09-01 00:00:00.000 POP99-01514
I would like returned
21ST 99-00 2001-04-30 23:59:59.000
AFCC990915 1999-09-01 00:00:00.000
Now, the pc_bsprdt_prod is unique so what I have tried is using the max for the product like this to give me uniqueness.
Select T.pc_bsprhd_key, T.pc_bsprdt_shpiadt
From pc_bsprdt_tbl As T
Join (
Select pc_bsprhd_key, Max( T1.pc_bsprdt_shpiadt ) As MaxDateTime, Max(pc_bsprdt_prod) as Product
From pc_bsprdt_tbl As T1
Group By T1.pc_bsprhd_key
) As Z
On Z.pc_bsprhd_key = T.pc_bsprhd_key
And Z.MaxDateTime = T.pc_bsprdt_shpiadt
AND Z.Product = T.pc_bsprdt_prod
It seems like it works :)
Is there a way to do it though just using the date? Maybe a top 1 in there somewhere?

SELECT pc_bsprhd_key, MAX(pc_bsprdt_shpiadt)
FROM pc_bsprdt_tbl
GROUP BY pc_bsprhd_key;
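To check that grouping on the key alone returns one row per key even when the dates repeat, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for SQL Server 2000 (an assumption made only so the example is runnable) and stores the dates as ISO text, which sorts the same way as the datetime values:

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pc_bsprdt_tbl ("
    "pc_bsprhd_key TEXT, pc_bsprdt_shpiadt TEXT, pc_bsprdt_prod TEXT)"
)
conn.executemany(
    "INSERT INTO pc_bsprdt_tbl VALUES (?, ?, ?)",
    [
        ("21ST 99-00", "2001-04-30 23:59:59.000", "72608-12895"),
        ("21ST 99-00", "2001-04-30 23:59:59.000", "72608-12910"),
        ("AFCC990915", "1999-09-01 00:00:00.000", "72608-12115"),
        ("AFCC990915", "1999-09-01 00:00:00.000", "CHU99-01514"),
        ("AFCC990915", "1999-09-01 00:00:00.000", "POP99-01514"),
    ],
)

# GROUP BY collapses the duplicate dates: one row per key, no join needed.
max_dates = conn.execute(
    "SELECT pc_bsprhd_key, MAX(pc_bsprdt_shpiadt) "
    "FROM pc_bsprdt_tbl "
    "GROUP BY pc_bsprhd_key "
    "ORDER BY pc_bsprhd_key"
).fetchall()

for row in max_dates:
    print(row)
```

Because only the key and the aggregated date are selected, the duplicates never need a tie-breaker such as the product column.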

That might not be working as you think it is. That will give you the MAX(Date) and MAX(prod) which might not be on the same row. Here is an example:
CREATE TABLE #Test
(
a int,
b date,
c int,
)
INSERT INTO #Test(a, b, c)
SELECT 1, '01/01/2010', 3 UNION ALL
SELECT 1, '01/02/2010', 2 UNION ALL
SELECT 1, '01/03/2010', 1 UNION ALL
SELECT 2, '01/01/2010', 1
SELECT a, MAX(b), MAX(c) FROM #TEST
GROUP BY a
Which will return
----------- ---------- -----------
1 2010-01-03 3
2 2010-01-01 1
Notice that 1/03/2010 and 3 are not in the same row. In this situation I don't think it matters to you, but just a heads up.
As for the actual question: in SQL 2005 we would probably apply ROW_NUMBER over the groups to get the row with the latest date for each key, but you don't have access to that feature in SQL 2000. If the above is giving you correct results, I'd say use it.
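The MAX(b)/MAX(c) caveat above can be reproduced with a small sketch. It runs the same data through Python's sqlite3 module instead of SQL Server (an assumption for runnability; the table is named test rather than #Test, and dates are stored as ISO text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INTEGER, b TEXT, c INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (1, "2010-01-01", 3),
        (1, "2010-01-02", 2),
        (1, "2010-01-03", 1),
        (2, "2010-01-01", 1),
    ],
)

# Each aggregate is computed independently within the group.
grouped = conn.execute(
    "SELECT a, MAX(b), MAX(c) FROM test GROUP BY a ORDER BY a"
).fetchall()

# Count the source rows that actually contain the pair reported for a = 1.
same_row = conn.execute(
    "SELECT COUNT(*) FROM test WHERE a = 1 AND b = '2010-01-03' AND c = 3"
).fetchone()[0]

print(grouped)
print("source rows matching the a = 1 result:", same_row)
```

The pair reported for a = 1 does not exist as a row in the table, which is why joining back on both aggregates can silently drop a group.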

Related

Deleting record in SQL depending on next record

I have records with columns: ID, Time_End and Attribute.
I need to delete all records,
WHERE Time_End = '1990-01-01 00:00:00.000' AND Attribute <> '9'
but only:
if the next row does not have the same attribute number
or
the next row has the same attribute number and a Time_End value of 1990-01-01 00:00:00.000
For example:
ID Time_End Attribute
---------------------------------------------
235 1990-01-01 00:00:00.000 5 /delete
236 1990-01-01 00:00:00.000 5 /delete
237 1990-01-01 00:00:00.000 5
238 2016-10-10 23:45:40.000 5
ID Time_End Attribute
---------------------------------------------
312 1990-01-01 00:00:00.000 8 /delete
313 2016-01-09 18:00:00.000 6
314 1990-01-01 00:00:00.000 4 /delete
315 1990-01-01 00:00:00.000 7
316 2016-10-10 23:45:40.000 7
Our customer has 50 database tables with thousands of records in every table (and of course more columns; I mentioned only those that affect the solution). Records are sent to the database from a PLC, but sometimes (we don't know why) the PLC also sends wrong records.
So what I need is a query which finds those wrong records and deletes them. :)
Does anybody know what the SQL code should look like?
Please see my SQL below. First, we collect the ids to delete, using two window functions (LEAD) to fetch the data needed from the next row. Then, with all the needed data computed, we apply the evaluation rules proposed by the OP. Last, we use the obtained ids to delete the affected records from the table by id with an IN clause.
-- SQL Server does not allow WITH inside a subquery, so the CTE is declared first.
WITH dataSet
AS (SELECT toDeleteTable.id,
           toDeleteTable.time_end,
           toDeleteTable.attribute,
           LEAD(toDeleteTable.time_end,1,0) OVER (ORDER BY toDeleteTable.id) AS next_time_end,
           LEAD(toDeleteTable.attribute,1,0) OVER (ORDER BY toDeleteTable.id) AS next_attribute
    FROM toDeleteTable)
DELETE FROM toDeleteTable
WHERE toDeleteTable.id IN (SELECT dataSet.id
                           FROM dataSet
                           WHERE dataSet.time_end = '1990-01-01 00:00:00.000'
                           AND dataSet.attribute <> '9'
                           AND ( (dataSet.next_attribute = dataSet.attribute AND dataSet.next_time_end = '1990-01-01 00:00:00.000')
                                 OR dataSet.next_attribute <> dataSet.attribute)
                          );
You can accomplish this with a simple apply join. The below should give you enough to make this work for your needs without doing anything complex:
declare @t table(ID int
,Time_End datetime
,Attribute int
);
insert into @t values(235,'1990-01-01 00:00:00.000',5),(236,'1990-01-01 00:00:00.000',5),(237,'1990-01-01 00:00:00.000',5),(238,'2016-10-10 23:45:40.000',5),(312,'1990-01-01 00:00:00.000',8),(313,'2016-01-09 18:00:00.000',6),(314,'1990-01-01 00:00:00.000',4),(315,'1990-01-01 00:00:00.000',7),(316,'2016-10-10 23:45:40.000',7);
-- For each row, fetch the Time_End and Attribute of the next row by ID.
select t.*
,tm.*
from @t t
outer apply (select top 1 tt.Time_End
,tt.Attribute
from @t tt
where t.ID < tt.ID
order by tt.ID
) tm
where t.Attribute <> tm.Attribute
or (t.Attribute = tm.Attribute
and tm.Time_End = '1990-01-01 00:00:00.000'
);
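To sanity-check the "look at the next row" rule against the sample data, here is a minimal sketch. It uses Python's sqlite3 module rather than SQL Server (an assumption for runnability), replaces APPLY/LEAD with a correlated subquery that finds the next id, and puts both sample blocks into one hypothetical table named t:

```python
import sqlite3

BAD = "1990-01-01 00:00:00.000"  # placeholder Time_End that marks suspect rows

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, time_end TEXT, attribute INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (235, BAD, 5), (236, BAD, 5), (237, BAD, 5),
        (238, "2016-10-10 23:45:40.000", 5),
        (312, BAD, 8), (313, "2016-01-09 18:00:00.000", 6),
        (314, BAD, 4), (315, BAD, 7),
        (316, "2016-10-10 23:45:40.000", 7),
    ],
)

# A row qualifies when it carries the placeholder date, its attribute is not 9,
# and the next row (smallest larger id) either has a different attribute or has
# the same attribute together with the placeholder date.
to_delete = [
    r[0]
    for r in conn.execute(
        """
        SELECT cur.id
        FROM t AS cur
        WHERE cur.time_end = :bad
          AND cur.attribute <> 9
          AND EXISTS (
              SELECT 1
              FROM t AS nxt
              WHERE nxt.id = (SELECT MIN(x.id) FROM t AS x WHERE x.id > cur.id)
                AND (nxt.attribute <> cur.attribute
                     OR (nxt.attribute = cur.attribute AND nxt.time_end = :bad))
          )
        ORDER BY cur.id
        """,
        {"bad": BAD},
    )
]

conn.executemany("DELETE FROM t WHERE id = ?", [(i,) for i in to_delete])
remaining = [r[0] for r in conn.execute("SELECT id FROM t ORDER BY id")]

print("deleted:", to_delete)
print("remaining:", remaining)
```

In this sketch the last row of the table has no next row and is therefore never selected; whether that matches the business rule is something the OP would need to decide.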
I think you can use ROW_NUMBER() like this:
;WITH t AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Time_End ORDER BY ID DESC) AS seq
FROM yourTable
WHERE Attribute <> '9'
AND Time_End = CAST('1990-01-01 00:00:00.000' as datetime)
)
DELETE FROM t
WHERE seq > 1;
Not Tested - HTH ;).

show recent records only

I have a requirement to show most recent records when user selects the option to view most recent records. I have 3 different tables from which I take data and display on the screen.
Below are the sample tables created.
Create table one(sealID integer,product_ser_num varchar2(20),create_time timestamp,status varchar2(10));
create table two(transID integer,formatID integer, formatStatus varchar,ctimeStamp timestamp,sealID integer);
create table three(transID integer,fieldStatus varchar,fieldValue varchar,exctype varchar);
I'm joining above 3 tables and showing the results in a single screen. I want to display the most recent records based on the timestamp.
Please find the sample data on the screen taken from 3 different tables.
ProductSerialNumber formatID formatStatus fieldStatus TimeStamp
ASD100 100 P P 2015-09-03 10:30:22
ASD100 200 p P 2015-09-03 10:30:22
ASD100 100 p P 2015-09-03 10:22:11
ASD100 200 p P 2015-09-03 10:22:11
I want to display only the most recent records from the table above; that should return the first 2 rows, since they have the latest value in the timestamp column.
Please suggest what changes are needed in the query below to show the most recent records.
SELECT transId,product_ser_num,status, to_char(timestamp, 'yyyy-mm-dd hh24:mi:ss') timestamp,
cnt
FROM (SELECT one.*,
row_number() over(ORDER BY
CASE
WHEN :orderDirection like '%asc%' THEN
CASE
WHEN :orderBy='product_ser_num' THEN product_ser_num,
WHEN :orderBy='status' THEN status
WHEN :orderBy='timestamp' THEN to_char(timestamp, 'yyyy-mm-dd hh24:mi:ss')
ELSE to_char(timestamp, 'yyyy-mm-dd hh24:mi:ss')
END
END ASC,
CASE
WHEN :orderDirection like '%desc%' THEN
CASE
WHEN :orderBy='product_ser_num' THEN product_ser_num,
WHEN :orderBy='status' THEN status
WHEN :orderBy='timestamp' THEN to_char(timestamp, 'yyyy-mm-dd hh24:mi:ss')
ELSE to_char(timestamp, 'yyyy-mm-dd hh24:mi:ss')
END
END DESC , transId ASC) line_number
FROM (select one_inner.*, COUNT(1) OVER() cnt
from (select two_tran.transaction_id,
one_res.product_serial_number productSerialNumber,
one_res.status status,from one one_res
left outer join two two_trans on two_trans.sealID = one_res.sealID
left outer join three three_flds on two_tran.transID = three_flds.transID and (three_flds.fieldStatus = 'P')
I don't think you are looking for a Top-n query as your topic title suggests.
It seems like you want to display the data in a customized order, as you have shown in the first image. You want the set of three rows to be grouped together on the basis of timestamp.
I have prepared a small test case to demonstrate the custom order of the rows:
WITH DATA(ID, num, datetime) AS(
SELECT 10, 1001, SYSDATE FROM dual UNION ALL
SELECT 10, 6009, SYSDATE FROM dual UNION ALL
SELECT 10, 3951, SYSDATE FROM dual UNION ALL
SELECT 10, 1001, SYSDATE -1 FROM dual UNION ALL
SELECT 10, 6009, SYSDATE -1 FROM dual UNION ALL
SELECT 10, 3951, SYSDATE -1 FROM dual
)
SELECT ID,
num,
TO_CHAR(DATETIME, 'yyyy-mm-dd hh24:mi:ss') TIMESTAMP
FROM
(SELECT t.*,
-- newest timestamp first, then the custom order of num within each timestamp
row_number() OVER(ORDER BY DATETIME DESC,
CASE num
WHEN 1001
THEN 1
WHEN 6009
THEN 2
WHEN 3951
THEN 3
END, num) rn
FROM DATA t
)
ORDER BY rn;
ID NUM TIMESTAMP
---------- ---------- -------------------
10 1001 2015-09-04 11:04:48
10 6009 2015-09-04 11:04:48
10 3951 2015-09-04 11:04:48
10 1001 2015-09-03 11:04:48
10 6009 2015-09-03 11:04:48
10 3951 2015-09-03 11:04:48
6 rows selected.
Now, you can see that for the same ID 10, the NUM values are grouped and also in a custom order.
This query seems very large and complex, so this may be oversimplifying things:
Add a limit 3 clause to the end?
What I think you need to do is:
select
max(timestamp), engine_serial_number, formatID
from
<joins here>
group by engine_serial_number, formatID
This will basically give you the lines you want, but not all metadata.
Hence, you will just have to re-join all this with the main join to get the rest of the info (join on all three columns, engine serial number, formatID AND timestamp).
That should work.
Hope this helps!
It's hard to give you a precise answer, because your query is incomplete. But I'll give you the general idea, and you can tweak it into your query.
One way to accomplish what you want is by using the dense_rank() analytical function to number your rows by timestamp in descending order (You could use rank() too in this case, it doesn't actually matter). All rows with the same timestamp will be assigned the same "rank", so you can then filter by rank to only get the most recent records.
Try to adjust your query to something like this:
select ...
from (select ...,
dense_rank() over (order by timestamp desc) as timestamp_rank
from ...)
where timestamp_rank = 1
...
I suspect that with a better understanding of your data model and query, there would probably be a better solution. But based on the information provided, I think that the above should yield the results you are looking for.
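The dense_rank() idea can be tried on the four sample rows with a minimal sketch. It uses Python's sqlite3 module instead of Oracle (an assumption for runnability; window functions need SQLite 3.25 or later), and the table name screen_rows and its column names are hypothetical stand-ins for the joined result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE screen_rows (product_ser_num TEXT, format_id INTEGER, ts TEXT)"
)
conn.executemany(
    "INSERT INTO screen_rows VALUES (?, ?, ?)",
    [
        ("ASD100", 100, "2015-09-03 10:30:22"),
        ("ASD100", 200, "2015-09-03 10:30:22"),
        ("ASD100", 100, "2015-09-03 10:22:11"),
        ("ASD100", 200, "2015-09-03 10:22:11"),
    ],
)

# Rows sharing the newest timestamp all get rank 1; older ones get 2, 3, ...
recent = conn.execute(
    """
    SELECT product_ser_num, format_id, ts
    FROM (
        SELECT s.*, DENSE_RANK() OVER (ORDER BY ts DESC) AS ts_rank
        FROM screen_rows AS s
    )
    WHERE ts_rank = 1
    ORDER BY format_id
    """
).fetchall()

for row in recent:
    print(row)
```

If the most recent records are wanted per product rather than overall, adding PARTITION BY product_ser_num to the OVER clause would be the natural adjustment.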

Select most recent InstanceID base on max end date

I am trying to pull the memberinstance from a table based on the max DateEnd. If it is Null I want to pull that as it would be still ongoing. I am using sql server.
select memberinstanceid
from table
group by memberid
having MAX(ISNULL(date_end, '2099-12-31'))
The query above doesn't work for me. I have tried different ones and have gotten it to return the separate instances, but not just the one with the max date.
Below is what my table looks like.
MemberID MemberInstanceID DateStart DateEnd
2 abc12 2013-01-01 2013-12-31
4 abc21 2010-01-01 2013-12-31
2 abc10 2015-01-01 NULL
4 abc19 2014-01-01 2014-10-31
I would expect my results to look like this
MemberInstanceID
abc10
abc19
I have been trying to figure out how to do this but have not had much luck. Any help would be much appreciated. Thanks
I think you need something like the following:
select MemberID, MemberInstanceID
from table t
where (
-- DateEnd is null...
DateEnd is null
or (
-- ...or pick the latest DateEnd for this member...
DateEnd = (
select max(DateEnd)
from table
where MemberID = t.MemberID
)
-- ... and check there's not a NULL entry for DateEnd for this member
and not exists (
select 1
from table
where MemberID = t.MemberID
and DateEnd is null
)
)
)
The problem with this approach would be if there are multiple rows that match for each member, i.e. multiple NULL rows with the same MemberID, or multiple rows with the same DateEnd for the same MemberID.
SELECT TOP 1 memberinstanceid
from table
ORDER BY (CASE WHEN [DateEnd] IS NULL THEN 1 ELSE 0 END) DESC,
[DateEnd] DESC
The ORDER BY is essentially creating a "column" to sort the NULL values to the top, then doing a secondary sort on the dates that are not null.
You have a good start but you don't need to perform any explicit grouping. What you want is the row where DateEnd is null or is the largest value (latest date) of all the records with the same MemberID. You also realized that Max alone couldn't be used to return the latest non-null date, because a null, if one exists, must count as the latest date.
select m.*
from Members m
where m.DateEnd is null
or m.DateEnd =(
select Max( IsNull( DateEnd, '9999-12-31' ))
from Members
where MemberID = m.MemberID );
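Here is a minimal sketch of the "treat NULL as a far-future date" idea, run against the sample rows. It uses Python's sqlite3 module instead of SQL Server (an assumption for runnability), so COALESCE stands in for IsNull, and it compares the substituted value on both sides, which is a slight variant of the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Members ("
    "MemberID INTEGER, MemberInstanceID TEXT, DateStart TEXT, DateEnd TEXT)"
)
conn.executemany(
    "INSERT INTO Members VALUES (?, ?, ?, ?)",
    [
        (2, "abc12", "2013-01-01", "2013-12-31"),
        (4, "abc21", "2010-01-01", "2013-12-31"),
        (2, "abc10", "2015-01-01", None),  # NULL DateEnd means still ongoing
        (4, "abc19", "2014-01-01", "2014-10-31"),
    ],
)

# A NULL end date is replaced by a sentinel so it sorts after every real date;
# each member keeps the row whose (substituted) end date is the latest.
latest = [
    r[0]
    for r in conn.execute(
        """
        SELECT m.MemberInstanceID
        FROM Members AS m
        WHERE COALESCE(m.DateEnd, '9999-12-31') = (
            SELECT MAX(COALESCE(x.DateEnd, '9999-12-31'))
            FROM Members AS x
            WHERE x.MemberID = m.MemberID
        )
        ORDER BY m.MemberID
        """
    )
]

print(latest)
```

As noted earlier in the thread, ties (two NULL rows, or two rows with the same latest DateEnd for one member) would still return more than one row per member.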

What's the most efficient way to match values between 2 tables based on most recent prior date?

I've got two tables in MS SQL Server:
dailyt - which contains daily data:
date val
---------------------
2014-05-22 10
2014-05-21 9.5
2014-05-20 9
2014-05-19 8
2014-05-18 7.5
etc...
And periodt - which contains data coming in at irregular periods:
date val
---------------------
2014-05-21 2
2014-05-18 1
Given a row in dailyt, I want to adjust its value by adding the corresponding value in periodt with the closest date prior or equal to the date of the dailyt row. So, the output would look like:
addt
date val
---------------------
2014-05-22 12 <- add 2 from 2014-05-21
2014-05-21 11.5 <- add 2 from 2014-05-21
2014-05-20 10 <- add 1 from 2014-05-18
2014-05-19 9 <- add 1 from 2014-05-18
2014-05-18 8.5 <- add 1 from 2014-05-18
I know that one way to do this is to join the dailyt and periodt tables on periodt.date <= dailyt.date, compute ROW_NUMBER() OVER (PARTITION BY dailyt.date ORDER BY periodt.date DESC), and then keep only the rows where the row number = 1.
Is there another way to do this that would be more efficient? Or is this pretty much optimal?
I think using APPLY would be the most efficient way:
SELECT d.Val,
p.Val,
NewVal = d.Val + ISNULL(p.Val, 0)
FROM Dailyt AS d
OUTER APPLY
( SELECT TOP 1 Val
FROM Periodt p
WHERE p.Date <= d.Date
ORDER BY p.Date DESC
) AS p;
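The "latest period row on or before each daily date" lookup can be checked against the sample data with a minimal sketch. It uses Python's sqlite3 module instead of SQL Server (an assumption for runnability), so a correlated scalar subquery with ORDER BY ... LIMIT 1 plays the role of OUTER APPLY with TOP 1, and the date column is named day here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dailyt (day TEXT, val REAL)")
conn.execute("CREATE TABLE periodt (day TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO dailyt VALUES (?, ?)",
    [
        ("2014-05-22", 10), ("2014-05-21", 9.5), ("2014-05-20", 9),
        ("2014-05-19", 8), ("2014-05-18", 7.5),
    ],
)
conn.executemany(
    "INSERT INTO periodt VALUES (?, ?)",
    [("2014-05-21", 2), ("2014-05-18", 1)],
)

# For every daily row, pick the newest periodt row dated on or before it.
# COALESCE covers daily rows that have no earlier periodt row at all.
adjusted = conn.execute(
    """
    SELECT d.day,
           d.val + COALESCE(
               (SELECT p.val
                FROM periodt AS p
                WHERE p.day <= d.day
                ORDER BY p.day DESC
                LIMIT 1), 0) AS new_val
    FROM dailyt AS d
    ORDER BY d.day DESC
    """
).fetchall()

for row in adjusted:
    print(row)
```

Whether this beats the ROW_NUMBER approach in SQL Server depends largely on having an index on the periodt date column, so it is worth comparing the actual execution plans.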
If there are relatively few periodt rows, then there is an option that may prove quite efficient.
Convert periodt into a From/To ranges table using subqueries or CTEs. (Obviously performance depends on how efficiently this initial step can be done, which is why a small number of periodt rows is preferable.) Then the join to dailyt will be extremely efficient. E.g.
;WITH PIds AS (
-- number the periodt rows by date so each row can find its successor
SELECT ROW_NUMBER() OVER(ORDER BY PDate) RN, *
FROM @periodt
),
PRange AS (
SELECT f.PDate AS FromDate, t.PDate as ToDate, f.PVal
FROM PIds f
LEFT OUTER JOIN PIds t ON
t.RN = f.RN + 1
)
SELECT d.*, p.PVal
FROM @dailyt d
LEFT OUTER JOIN PRange p ON
d.DDate >= p.FromDate
AND (d.DDate < p.ToDate OR p.ToDate IS NULL)
ORDER BY 1 DESC
If you want to try the query, the following produces the sample data using table variables. Run the declarations and inserts in the same batch, before the query. Note I added an extra row to dailyt to demonstrate the case where no periodt entry has an earlier date.
DECLARE @dailyt table (
DDate date NOT NULL,
DVal float NOT NULL
)
INSERT INTO @dailyt(DDate, DVal)
SELECT '20140522', 10
UNION ALL SELECT '20140521', 9.5
UNION ALL SELECT '20140520', 9
UNION ALL SELECT '20140519', 8
UNION ALL SELECT '20140518', 7.5
UNION ALL SELECT '20140517', 6.5
DECLARE @periodt table (
PDate date NOT NULL,
PVal int NOT NULL
)
INSERT INTO @periodt
SELECT '20140521', 2
UNION ALL SELECT '20140518', 1
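The From/To range idea can also be tried with a minimal sketch. It uses Python's sqlite3 module instead of SQL Server (an assumption for runnability) and builds each range's end with a correlated MIN subquery rather than the ROW_NUMBER self-join, so it is a variant of the query above rather than a copy of it; the date column is named day here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dailyt (day TEXT, val REAL)")
conn.execute("CREATE TABLE periodt (day TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO dailyt VALUES (?, ?)",
    [
        ("2014-05-22", 10), ("2014-05-21", 9.5), ("2014-05-20", 9),
        ("2014-05-19", 8), ("2014-05-18", 7.5),
        ("2014-05-17", 6.5),  # extra row with no earlier periodt entry
    ],
)
conn.executemany(
    "INSERT INTO periodt VALUES (?, ?)",
    [("2014-05-21", 2), ("2014-05-18", 1)],
)

# Each periodt row becomes a half-open range [from_day, to_day); the newest
# row has no successor, so its to_day is NULL and the range stays open-ended.
matched = conn.execute(
    """
    WITH ranges AS (
        SELECT p.day AS from_day,
               (SELECT MIN(n.day) FROM periodt AS n WHERE n.day > p.day) AS to_day,
               p.val AS pval
        FROM periodt AS p
    )
    SELECT d.day, r.pval
    FROM dailyt AS d
    LEFT JOIN ranges AS r
           ON d.day >= r.from_day
          AND (d.day < r.to_day OR r.to_day IS NULL)
    ORDER BY d.day DESC
    """
).fetchall()

for row in matched:
    print(row)
```

The extra 2014-05-17 row comes back with no period value, which is the case the LEFT JOIN is there to preserve.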

Recursive CTE - consolidate start and end dates

I have the following table:
row_num customer_status effective_from_datetime
------- ------------------ -----------------------
1 Active 2011-01-01
2 Active 2011-01-02
3 Active 2011-01-03
4 Suspended 2011-01-04
5 Suspended 2011-01-05
6 Active 2011-01-06
And am trying to achieve the following result whereby consecutive rows with the same status are merged into one row with an effective from and to date range:
customer_status effective_from_datetime effective_to_datetime
--------------- ----------------------- ---------------------
Active 2011-01-01 2011-01-04
Suspended 2011-01-04 2011-01-06
Active 2011-01-06 NULL
I can get a recursive CTE to output the correct effective_to_datetime based on the next row, but am having trouble merging the ranges.
Code to generate sample data:
CREATE TABLE #temp
(
row_num INT IDENTITY(1,1),
customer_status VARCHAR(10),
effective_from_datetime DATE
)
INSERT INTO #temp
VALUES
('Active','2011-01-01')
,('Active','2011-01-02')
,('Active','2011-01-03')
,('Suspended','2011-01-04')
,('Suspended','2011-01-05')
,('Active','2011-01-06')
EDIT SQL updated as per comment.
WITH
group_assigned_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY customer_status ORDER BY effective_from_date) AS status_sequence_id,
ROW_NUMBER() OVER ( ORDER BY effective_from_date) AS sequence_id,
customer_status,
effective_from_date
FROM
your_table
)
,
grouped_data AS
(
SELECT
customer_status,
MIN(effective_from_date) AS min_effective_from_date,
MAX(effective_from_date) AS max_effective_from_date
FROM
group_assigned_data
GROUP BY
customer_status,
sequence_id - status_sequence_id
)
SELECT
[current].customer_status,
[current].min_effective_from_date AS effective_from,
[next].min_effective_from_date AS effective_to
FROM
grouped_data AS [current]
LEFT JOIN
grouped_data AS [next]
-- the next group starts the day after the current group's last row
ON [next].min_effective_from_date = DATEADD(DAY, 1, [current].max_effective_from_date)
ORDER BY
[current].min_effective_from_date
This isn't recursive, but that's possibly a good thing.
It doesn't deal with gaps in your data. To deal with that you could create a calendar table, with every relevant date, and join on that to fill missing dates with 'unknown' status, and then run the query against that. (In fact, you could do it in a CTE that is used by the CTE above.)
At present...
- If row 2 was missing, it would not change the result
- If row 3 was missing, the end_date of the first row would change
Different behaviour can be determined by preparing your data, or other methods. We'd need to know the business logic you need though.
If any one date can have multiple status entries, you need to define what logic you want it to follow. At present the behaviour is undefined, but you could correct that as simply as adding customer_status to the ORDER BY portions of ROW_NUMBER().
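To see the two-ROW_NUMBER grouping trick working end to end on the sample rows, here is a minimal sketch. It uses Python's sqlite3 module instead of SQL Server (an assumption for runnability; window functions need SQLite 3.25 or later), with SQLite's date(x, '+1 day') in place of T-SQL date arithmetic and a hypothetical table name status_history:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_history (customer_status TEXT, effective_from TEXT)"
)
conn.executemany(
    "INSERT INTO status_history VALUES (?, ?)",
    [
        ("Active", "2011-01-01"), ("Active", "2011-01-02"),
        ("Active", "2011-01-03"), ("Suspended", "2011-01-04"),
        ("Suspended", "2011-01-05"), ("Active", "2011-01-06"),
    ],
)

# seq - status_seq stays constant within a run of consecutive rows that share
# a status, so it works as a grouping key for each run.
merged = conn.execute(
    """
    WITH numbered AS (
        SELECT customer_status,
               effective_from,
               ROW_NUMBER() OVER (ORDER BY effective_from) AS seq,
               ROW_NUMBER() OVER (PARTITION BY customer_status
                                  ORDER BY effective_from) AS status_seq
        FROM status_history
    ),
    runs AS (
        SELECT customer_status,
               MIN(effective_from) AS first_day,
               MAX(effective_from) AS last_day
        FROM numbered
        GROUP BY customer_status, seq - status_seq
    )
    SELECT cur.customer_status, cur.first_day, nxt.first_day AS effective_to
    FROM runs AS cur
    LEFT JOIN runs AS nxt
           ON nxt.first_day = date(cur.last_day, '+1 day')
    ORDER BY cur.first_day
    """
).fetchall()

for row in merged:
    print(row)
```

As with the original query, a missing day between two runs would leave effective_to as NULL for the earlier run, so the gap handling discussed above still applies.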