Get last item in a table - SQL - sql

I have a History Table in SQL Server that basically tracks an item through a process. The item has some fixed fields that don't change throughout the process, but has a few other fields including status and Id which increment as the steps of the process increase.
Basically I want to retrieve the last step for each item given a Batch Reference. So if I do a
Select * from HistoryTable where BatchRef = #BatchRef
It will return all the steps for all the items in the batch - eg
Id Status BatchRef ItemCount
1 1 Batch001 100
1 2 Batch001 110
2 1 Batch001 60
2 2 Batch001 100
But what I really want is:
Id Status BatchRef ItemCount
1 2 Batch001 110
2 2 Batch001 100
Edit: Appologies - can't seem to get the TABLE tags to work with Markdown - followed the help to the letter, and looks fine in the preview

Assuming you have an identity column in the table...
select
top 1 <fields>
from
HistoryTable
where
BatchRef = #BatchRef
order by
<IdentityColumn> DESC

It's kind of hard to make sense of your table design - I think SO ate your delimiters.
The basic way of handling this is to GROUP BY your fixed fields, and select a MAX (or MIN) for some unqiue value (a datetime usually works well). In your case, I think that the GROUP BY would be BatchRef and ItemCount, and Id will be your unique column.
Then, join back to the table to get all columns. Something like:
SELECT *
FROM HistoryTable
JOIN (
SELECT
MAX(Id) as Id.
BatchRef,
ItemCount
FROM HsitoryTable
WHERE
BacthRef = #batchRef
GROUP BY
BatchRef,
ItemCount
) as Latest ON
HistoryTable.Id = Latest.Id

Assuming the Item Ids are incrementally numbered:
--Declare a temp table to hold the last step for each item id
DECLARE #LastStepForEach TABLE (
Id int,
Status int,
BatchRef char(10),
ItemCount int)
--Loop counter
DECLARE #count INT;
SET #count = 0;
--Loop through all of the items
WHILE (#count < (SELECT MAX(Id) FROM HistoryTable WHERE BatchRef = #BatchRef))
BEGIN
SET #count = #count + 1;
INSERT INTO #LastStepForEach (Id, Status, BatchRef, ItemCount)
SELECT Id, Status, BatchRef, ItemCount
FROM HistoryTable
WHERE BatchRef = #BatchRef
AND Id = #count
AND Status =
(
SELECT MAX(Status)
FROM HistoryTable
WHERE BatchRef = #BatchRef
AND Id = #count
)
END
SELECT *
FROM #LastStepForEach

SELECT id, status, BatchRef, MAX(itemcount) AS maxItemcount
FROM HistoryTable GROUP BY id, status, BatchRef
HAVING status > 1

It's a bit hard to decypher your data the way WMD has formatted it, but you can pull of the sort of trick you need with common table expressions on SQL 2005:
with LastBatches as (
select Batch, max(Id)
from HistoryTable
group by Batch
)
select *
from HistoryTable h
join LastBatches b on b.Batch = h.Batch and b.Id = h.Id
Or a subquery (assuming the group by in the subquery works - off the top of my head I don't recall):
select *
from HistoryTable h
join (
select Batch, max(Id)
from HistoryTable
group by Batch
) b on b.Batch = h.Batch and b.Id = h.Id
Edit: I was assuming you wanted the last item for every batch. If you just need it for the one batch then the other answers (doing a top 1 and ordering descending) are the way to go.

As already suggested you probably want to reorder your query to sort it in the other direction so you actually fetch the first row. Then you'd probably want to use something like
SELECT TOP 1 ...
if you're using MSSQL 2k or earlier, or the SQL compliant variant
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY key ASC) AS rownumber,
columns
FROM tablename
) AS foo
WHERE rownumber = n
for any other version (or for other database systems that support the standard notation), or
SELECT ... LIMIT 1 OFFSET 0
for some other variants without the standard SQL support.
See also this question for some additional discussion around selecting rows. Using the aggregate function max() might or might not be faster depending on whether calculating the value requires a table scan.

Related

How can I improve the native query for a table with 7 millions rows?

I have the below view(table) in my database(SQL SERVER).
I want to retrieve 2 things from this table.
The object which has the latest booking date for each Product number.
It will return the objects = {0001, 2, 2019-06-06 10:39:58} and {0003, 2, 2019-06-07 12:39:58}.
If all the step number has no booking date for a Product number, it wil return the object with Step number = 1. It will return the object = {0002, 1, NULL}.
The view has 7.000.000 rows. I must do it by using native query.
The first query that retrieves the product with the latest booking date:
SELECT DISTINCT *
FROM TABLE t
WHERE t.BOOKING_DATE = (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER)
The second query that retrieves the product with booking date NULL and Step number = 1;
SELECT DISTINCT *
FROM TABLE t
WHERE (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER) IS NULL AND t.STEP_NUMBER = 1
I tried using a single query, but it takes too long.
For now I use 2 query for getting this information but for the future I need to improve this. Do you have an alternative? I also can not use stored procedure, function inside SQL SERVER. I must do it with native query from Java.
Try this,
Declare #p table(pumber int,step int,bookdate datetime)
insert into #p values
(1,1,'2019-01-01'),(1,2,'2019-01-02'),(1,3,'2019-01-03')
,(2,1,null),(2,2,null),(2,3,null)
,(3,1,null),(3,2,null),(3,3,'2019-01-03')
;With CTE as
(
select pumber,max(bookdate)bookdate
from #p p1
where bookdate is not null
group by pumber
)
select p.* from #p p
where exists(select 1 from CTE c
where p.pumber=c.pumber and p.bookdate=c.bookdate)
union all
select p1.* from #p p1
where p1.bookdate is null and step=1
and not exists(select 1 from CTE c
where p1.pumber=c.pumber)
If performance is main concern then 1 or 2 query do not matter,finally performance matter.
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
Go
If more than 90% of data are where BookingDate is not null or where BookingDate is null then you can create Filtered Index on it.
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
where BookingDate is not null
Go
Try row_number() with a proper ordering. Null values are treated as the lowest possible values by sql-server ORDER BY.
SELECT TOP(1) WITH TIES *
FROM myTable t
ORDER BY row_number() over(partition by PRODUCT_NUMBER order by BOOKING_DATE DESC, STEP_NUMBER);
Pay attention to sql-server adviced indexes to get good performance.
Possibly the most efficient method is a correlated subquery:
select t.*
from t
where t.step_number = (select top (1) t2.step_number
from t t2
where t2.product_number = t.product_number and
order by t2.booking_date desc, t2.step_number
);
In particular, this can take advantage of an index on (product_number, booking_date desc, step_number).

Finding the Difference of Two Results

I have two results with two different dates (a recent one and the previous one) the numbers below are the result 250 being the most recent and 300 being the previous result:
250
300
The code I use is here:
SELECT TOP 2
MY FIELD as bmi
FROM
MY TABLE
ORDER BY
THE DATE FIELD DESC
Within this same code I want to be able to find the difference between those two numbers and for that to appear not the two numbers?
I have tried a few things of skipping N rows etc but now I don't know what I can do?
I think you want something like this:
declare #firstBmiRes int
declare #secondBmiRes int
SET #firstBmiRes = 250 /* insert your query */
SET #secondBmiRes = 300 /* insert your query */
(SELECT SUM(#secondBmiRes - #firstBmiRes))
If you want to continue to use the calculated result. You can obviously store the value into another variable like this:
declare #bmi int
SET #bmi = (SELECT SUM(#secondBmiRes - #firstBmiRes))
SELECT #bmi
2nd Approach:
Since we don't have very much information to work with. you could try something like this... But i'm assuming a lot of your datastructure here.
declare #BmiScore int
declare #firstBmiRes int
declare #secondBmiRes int
SET #firstBmiRes = (SELECT TOP 1 MY_FIELD
FROM MY_TABLE
ORDER BY DATE_FIELD DESC)
SET #secondBmiRes = (SELECT MY_FIELD
FROM MY_TABLE
ORDER BY DATE_FIELD DESC
OFFSET 1 ROW
FETCH NEXT 1 ROW ONLY)
SET #bmiScore = (SELECT SUM(#secondBmiRes - #firstBmiRes))
SELECT #bmiScore
SELECT
MYFIELD - LAG (MYFIELD,1) OVER (ORDER BY MYDATE) AS BMI
FROM
MYTABLE;
ORDER BY MYDATE DESC
Using a LEAD function if you want your code to be a part of new code for some reason:
select TOP 1 (bmi - lead(bmi) over (order by date_field)) as result
from( SELECT TOP 2 my_field as bmi
, date_field
FROM my_table
ORDER BY date_field DESC) A
Here is a DEMO
Or by LAG :
select TOP 1 (lag(my_field) over (order by date_field) - my_field ) as result
FROM my_table
ORDER BY date_field DESC;
You can use LEAD/ LAG if your version of SQL Server supports these functions. If you are on an older version then you can use a windowed function to apply an order to the rows.
Here's your data going into a temporary table variable:
DECLARE #MY_TABLE TABLE (THE_DATE_FIELD DATE, MY_FIELD INT);
INSERT INTO #MY_TABLE SELECT '20200114', 300 UNION ALL SELECT '20200113', 250;
...and here's a query to perform the calculation you needed:
WITH x AS (
SELECT TOP 2
THE_DATE_FIELD,
MY_FIELD AS bmi,
ROW_NUMBER() OVER (ORDER BY THE_DATE_FIELD DESC) AS order_id
FROM
#MY_TABLE)
SELECT
MAX(CASE WHEN order_id = 1 THEN bmi END) - MAX(CASE WHEN order_id = 2 THEN bmi END) AS difference_bmi
FROM
x;
If I peek at the data from the CTE then I see this (and this is why I included the date field, which is redundant, and could otherwise be removed):
THE_DATE_FIELD bmi order_id
2020-01-14 300 1
2020-01-13 250 2
Now it's simply a case of picking the two values, as one has an order_id = 1 and one has an order_id = 2.

Querying database for records that don't have a 2nd record canceling it out

I have a table where I'm trying to find a set of particular records. Here's what my table looks like...
tblA
ID VouchID Action Amount
1 177-17 Add 700
2 177-17 Update 1
3 198-01 Add 600
4 198-01 Update 620
So what happens here, is if a record was canceled/deleted, the action would be 'Update' and Amount would be updated to 1. In other words, the VouchID = 177-17, would not be counted/be selected in this query...
What I'm hoping to do here is only select records, that don't have a corresponding Update record with Amount = 1
Select distinct vouchID where Action='add'
However, this query does not take under consideration VoucherID's that have an 'update' action. Update action can be applied in two instances, in VouchID 177-17 the amount = 1 on action='update' that means, that the ADD action does not count, it's almost as if we removed the record all together (it's just there for record keeping). Another Update in case of VoucherID = 198-01, the update line and amount = 620, means that the Amount was updated by 20 to 620, that record i hope to be able to see in my end reuslt
Desired end result from above table:
ID VouchID Action Amount
3 198-01 Add 600
You could use LEAD (SQL Server 2012 and above):
WITH cte AS (
SELECT *, LEAD(Amount) OVER(PARTITION BY VouchID ORDER BY ID) AS next_amount
FROM table
)
SELECT *
FROM cte
WHERE (next_amount <> 1 OR next_amount IS NULL) AND Action='add';
EDIT
non-recursive CTE can be always replaced with simple subquery:
SELECT *
FROM (SELECT *,
LEAD(Amount) OVER(PARTITION BY VouchID ORDER BY ID) AS next_amount
FROM table) sub
WHERE (next_amount <> 1 OR next_amount IS NULL) AND Action='add';
EDIT:
Using EXISTS:
SELECT *
FROM table t1
WHERE Action='add'
AND NOT EXISTS (SELECT TOP 1
FROM table t2
WHERE t1.VouchId = t2.VouchId
AND Action='Update'
AND Amount = 1
ORDER BY ID ASC);
What I'm hoping to do here is only select records, that don't have a
corresponding Update record with Amount = 1
Seems easy enough with NOT EXISTS():
Select distinct vouchID FROM MyTable t1 where Action='add'
AND NOT EXISTS(SELECT * FROM MyTable t2
WHERE Action='Update'
AND Amount=1
AND t2.VouchId=t1.VouchId
Are you using SQL Server 2008 or better? If you are, I would try something like :
SELECT
ID, vouchID, Action, Amount
FROM tblA s
WHERE
Action='add'
AND NOT EXISTS(Select 1 from tblA l where l.vouchID = s.vouchID and l.Action = 'Update' and l.Amount = 1);

Remove duplicate row based on select statement

I have two select statements which is returning duplicated data. What I'm trying to accomplish is to remove a duplicated leg. But I'm having hard times to get to the second row programmatically.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes,i.ABID from inv_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Minutes = i2.Minutes
and i.Uid <> i2.Uid
and i.abid = i2.abid
order by i.EndDate
This select statement returns the following data.
As you can see it returns duplicate rows where minutes are the same ABID is the same but InvID are different. What I need to do is to remove one of the InvID where the criteria matches. Doesn't matter which one.
The second select statement is returning different data.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes from InvoiceLines_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Uid = i2.Uid
and i.Abid <> i2.Abid
and i.Language <> i2.Language
order by i.startdate desc
In this select statement I want to remove an InvID where UID is the same then select the lowest Mintues. In This case, I would remove the following InvIDs: 2537676 , 2537210
My goal is to remove those rows...
I could accomplish this using cursor grab the InvID and remove it by simple delete statement, but I'm trying to stay away from cursors.
Any suggestions on how I can accomplish this?
You can use exists to delete all duplicates except the one with the highest InvID by deleting those rows where another row exists with the same values but with a higher InvID
delete from inv_v
where exists (
select 1 from inv_v i2
where i2.InvID > inv_v.InvID
and i2.minutes = inv_v.minutes
and i2.EndDate = inv_v.EndDate
and i2.abid = inv_v.abid
and i2.uid <> inv_v.uid -- not sure why <> is used here, copied from question
)
I have faced similar problems regarding duplicate data and some one told me to use partition by and other methods but those were causing performance issues
However , I had a primary key in my table through which I was able to select one row from the duplicate data and then delete it.
For example in the first select statement "minutes" and "ABID" are the criteria to consider duplicacy in data.But "Invid" can be used to distinguish between the duplicate rows.
So you can use below query to remove duplicacy.
delete from inv_i where inv_id in (select max(inv_id) from inv_i group by minutes,abid having count(*) > 1 );
This simple concept was helpful to me. It can be helpful in your case if "Inv_id" is unique.
;WITH CTE AS
(
SELECT InvID
,[UID]
,StartDate
,EndDate
,[Minutes]
,ROW_NUMBER() OVER (PARTITION BY InvID, [UID] ORDER BY [Minutes] ASC) rn
FROM InvoiceLines_v
)
SELECT *
FROM CTE
WHERE rn = 1
Replace the ORIGINAL_TABLE with your table name.
QUERY 1:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY minutes, ABID ORDER BY minutes, ABID) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
QUERY 2:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY UID ORDER BY minutes) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;

Calculating current consecutive days from a table

I have what seems to be a common business request but I can't find no clear solution. I have a daily report (amongst many) that gets generated based on failed criteria and gets saved to a table. Each report has a type id tied to it to signify which report it is, and there is an import event id that signifies the day the imports came in (a date column is added for extra clarification). I've added a sqlfiddle to see the basic schema of the table (renamed for privacy issues).
http://www.sqlfiddle.com/#!3/81945/8
All reports currently generated are working fine, so nothing needs to be modified on the table. However, for one report (type 11), not only I need pull the invoices that showed up today, I also need to add one column that totals the amount of consecutive days from date of run for that invoice (including current day). The result should look like the following, based on the schema provided:
INVOICE MESSAGE EVENT_DATE CONSECUTIVE_DAYS_ON_REPORT
12345 Yes July, 30 2013 6
54355 Yes July, 30 2013 2
644644 Yes July, 30 2013 4
I only need the latest consecutive days, not any other set that may show up. I've tried to run self joins to no avail, and my last attempt is also listed as part of the sqlfiddle file, to no avail. Any suggestions or ideas? I'm quite stuck at the moment.
FYI: I am working in SQL Server 2000! I have seen a lot of neat tricks that have come out in 2005 and 2008, but I can't access them.
Your help is greatly appreciated!
Something like this? http://www.sqlfiddle.com/#!3/81945/14
SELECT
[final].*,
[last].total_rows
FROM
tblEventInfo AS [final]
INNER JOIN
(
SELECT
[first_of_last].type_id,
[first_of_last].invoice,
MAX([all_of_last].event_date) AS event_date,
COUNT(*) AS total_rows
FROM
(
SELECT
[current].type_id,
[current].invoice,
MAX([current].event_date) AS event_date
FROM
tblEventInfo AS [current]
LEFT JOIN
tblEventInfo AS [previous]
ON [previous].type_id = [current].type_id
AND [previous].invoice = [current].invoice
AND [previous].event_date = [current].event_date-1
WHERE
[current].type_id = 11
AND [previous].type_id IS NULL
GROUP BY
[current].type_id,
[current].invoice
)
AS [first_of_last]
INNER JOIN
tblEventInfo AS [all_of_last]
ON [all_of_last].type_id = [first_of_last].type_id
AND [all_of_last].invoice = [first_of_last].invoice
AND [all_of_last].event_date >= [first_of_last].event_date
GROUP BY
[first_of_last].type_id,
[first_of_last].invoice
)
AS [last]
ON [last].type_id = [final].type_id
AND [last].invoice = [final].invoice
AND [last].event_date = [final].event_date
The inner most query looks up the starting record of the last block of consecutive records.
Then that joins on to all the records in that block of consecutive records, giving the final date and the count of rows (consecutive days).
Then that joins on to the row for the last day to get the message, etc.
Make sure that in reality you have an index on (type_id, invoice, event_date).
You have multiple problems. Tackle them separately and build up.
Problems:
1) Identifying consecutive ranges: subtract the row_number from the range column and group by the result
2) No ROW_NUMBER() functions in SQL 2000: Fake it with a correlated subquery.
3) You actually want DENSE_RANK() instead of ROW_NUMBER: Make a list of unique dates first.
Solutions:
3)
SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date
2)
SELECT t2.invoice,t2.event_date,t2.id,
DATEDIFF(day,(SELECT COUNT(DISTINCT event_date) FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t1 WHERE t2.invoice = t1.invoice AND t2.event_date > t1.event_date),t2.event_date) grp
FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t2
ORDER BY invoice,grp,event_date
1)
SELECT
t3.invoice AS INVOICE,
MAX(t3.event_date) AS EVENT_DATE,
COUNT(t3.event_date) AS CONSECUTIVE_DAYS_ON_REPORT
FROM (
SELECT t2.invoice,t2.event_date,t2.id,
DATEDIFF(day,(SELECT COUNT(DISTINCT event_date) FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t1 WHERE t2.invoice = t1.invoice AND t2.id > t1.id),t2.event_date) grp
FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t2
) t3
GROUP BY t3.invoice,t3.grp
The rest of your question is a little ambiguous. If two ranges are of equal length, do you want both or just the most recent? Should the output MESSAGE be 'Yes' if any message = 'Yes' or only if the most recent message = 'Yes'?
This should give you enough of a breadcrumb though
I had a similar requirement not long ago getting a "Top 5" ranking with a consecutive number of periods in Top 5. The only solution I found was to do it in a cursor. The cursor has a date = #daybefore and inside the cursor if your data does not match quit the loop, otherwise set #daybefore = datediff(dd, -1, #daybefore).
Let me know if you want an example. There just seem to be a large number of enthusiasts, who hit downvote when they see the word "cursor" even if they don't have a better solution...
Here, try a scalar function like this:
CREATE FUNCTION ConsequtiveDays
(
#invoice bigint, #date datetime
)
RETURNS int
AS
BEGIN
DECLARE #ct int = 0, #Count_Date datetime, #Last_Date datetime
SELECT #Last_Date = #date
DECLARE counter CURSOR LOCAL FAST_FORWARD
FOR
SELECT event_date FROM tblEventInfo
WHERE invoice = #invoice
ORDER BY event_date DESC
FETCH NEXT FROM counter
INTO #Count_Date
WHILE ##FETCH_STATUS = 0 AND DATEDIFF(dd,#Last_Date,#Count_Date) < 2
BEGIN
#ct = #ct + 1
END
CLOSE counter
DEALLOCATE counter
RETURN #ct
END
GO