Finding the Difference of Two Results - sql

I have two results with two different dates (a recent one and the previous one). The numbers below are the results, with 250 being the most recent and 300 the previous one:
250
300
The code I use is here:
SELECT TOP 2
MY FIELD as bmi
FROM
MY TABLE
ORDER BY
THE DATE FIELD DESC
Within this same query I want to find the difference between those two numbers and return that difference instead of the two numbers.
I have tried a few things, such as skipping N rows, but now I don't know what else to try.

I think you want something like this:
declare @firstBmiRes int
declare @secondBmiRes int
SET @firstBmiRes = 250 /* insert your query */
SET @secondBmiRes = 300 /* insert your query */
SELECT @secondBmiRes - @firstBmiRes
If you want to keep using the calculated result, you can store the value in another variable like this:
declare @bmi int
SET @bmi = @secondBmiRes - @firstBmiRes
SELECT @bmi
2nd Approach:
Since we don't have very much information to work with, you could try something like this, but I'm assuming a lot about your data structure here.
declare @bmiScore int
declare @firstBmiRes int
declare @secondBmiRes int
SET @firstBmiRes = (SELECT TOP 1 MY_FIELD
FROM MY_TABLE
ORDER BY DATE_FIELD DESC)
SET @secondBmiRes = (SELECT MY_FIELD
FROM MY_TABLE
ORDER BY DATE_FIELD DESC
OFFSET 1 ROW
FETCH NEXT 1 ROW ONLY)
SET @bmiScore = @secondBmiRes - @firstBmiRes
SELECT @bmiScore

SELECT
MYFIELD - LAG (MYFIELD,1) OVER (ORDER BY MYDATE) AS BMI
FROM
MYTABLE
ORDER BY MYDATE DESC;

Use the LEAD function if you want to keep your original query as part of the new one:
select TOP 1 (bmi - lead(bmi) over (order by date_field)) as result
from( SELECT TOP 2 my_field as bmi
, date_field
FROM my_table
ORDER BY date_field DESC) A
ORDER BY date_field -- makes TOP 1 deterministic: the older row, whose LEAD is the newer value
Or by LAG :
select TOP 1 (lag(my_field) over (order by date_field) - my_field ) as result
FROM my_table
ORDER BY date_field DESC;
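The LAG idea can be tried out without a SQL Server instance. The following is only a minimal sketch in Python using the standard-library sqlite3 module: SQLite is not SQL Server, so TOP 1 becomes LIMIT 1, and it assumes an SQLite build with window functions (3.25 or later). The function name latest_difference and the sample rows are made up for illustration. Unlike the query above, it subtracts the previous value from the most recent one.

```python
import sqlite3


def latest_difference(rows):
    """Return (most recent value - previous value) for (date, value) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (date_field TEXT, my_field INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    # LAG looks one row back in date order. Sorting newest-first and keeping
    # a single row leaves only the latest-minus-previous difference.
    query = """
        SELECT my_field - LAG(my_field) OVER (ORDER BY date_field) AS diff
        FROM my_table
        ORDER BY date_field DESC
        LIMIT 1
    """
    (diff,) = conn.execute(query).fetchone()
    conn.close()
    return diff


# Hypothetical sample: 250 is the most recent result, 300 the previous one.
sample = [("2020-01-10", 275), ("2020-01-13", 300), ("2020-01-14", 250)]
print(latest_difference(sample))  # 250 - 300 = -50
```

With a single row there is no previous value, so the difference comes back as NULL (None in Python).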

You can use LEAD/LAG if your version of SQL Server supports these functions (SQL Server 2012 and later). If you are on an older version, you can use ROW_NUMBER(), a windowed function, to apply an order to the rows.
Here's your data going into a temporary table variable:
DECLARE @MY_TABLE TABLE (THE_DATE_FIELD DATE, MY_FIELD INT);
INSERT INTO @MY_TABLE SELECT '20200114', 300 UNION ALL SELECT '20200113', 250;
...and here's a query to perform the calculation you needed:
WITH x AS (
SELECT TOP 2
THE_DATE_FIELD,
MY_FIELD AS bmi,
ROW_NUMBER() OVER (ORDER BY THE_DATE_FIELD DESC) AS order_id
FROM
@MY_TABLE
ORDER BY
THE_DATE_FIELD DESC)
SELECT
MAX(CASE WHEN order_id = 1 THEN bmi END) - MAX(CASE WHEN order_id = 2 THEN bmi END) AS difference_bmi
FROM
x;
If I peek at the data from the CTE then I see this (and this is why I included the date field, which is redundant, and could otherwise be removed):
THE_DATE_FIELD  bmi  order_id
2020-01-14      300  1
2020-01-13      250  2
Now it's simply a case of picking the two values, as one has an order_id = 1 and one has an order_id = 2.

Related

How can I improve the native query for a table with 7 millions rows?

I have the below view(table) in my database(SQL SERVER).
I want to retrieve 2 things from this table.
The object which has the latest booking date for each Product number.
It will return the objects = {0001, 2, 2019-06-06 10:39:58} and {0003, 2, 2019-06-07 12:39:58}.
If none of the step numbers has a booking date for a Product number, it will return the object with Step number = 1. It will return the object = {0002, 1, NULL}.
The view has 7.000.000 rows. I must do it by using native query.
The first query that retrieves the product with the latest booking date:
SELECT DISTINCT *
FROM TABLE t
WHERE t.BOOKING_DATE = (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER)
The second query that retrieves the product with booking date NULL and Step number = 1;
SELECT DISTINCT *
FROM TABLE t
WHERE (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER) IS NULL AND t.STEP_NUMBER = 1
I tried using a single query, but it takes too long.
For now I use two queries to get this information, but I need to improve this for the future. Do you have an alternative? I also cannot use stored procedures or functions inside SQL Server. I must do it with a native query from Java.
Try this,
Declare @p table(pumber int,step int,bookdate datetime)
insert into @p values
(1,1,'2019-01-01'),(1,2,'2019-01-02'),(1,3,'2019-01-03')
,(2,1,null),(2,2,null),(2,3,null)
,(3,1,null),(3,2,null),(3,3,'2019-01-03')
;With CTE as
(
select pumber,max(bookdate)bookdate
from @p p1
where bookdate is not null
group by pumber
)
select p.* from @p p
where exists(select 1 from CTE c
where p.pumber=c.pumber and p.bookdate=c.bookdate)
union all
select p1.* from @p p1
where p1.bookdate is null and step=1
and not exists(select 1 from CTE c
where p1.pumber=c.pumber)
If performance is the main concern, it does not matter whether you use one query or two; what matters in the end is how they perform.
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
Go
If more than 90% of the rows fall on one side (BookingDate is not null, or BookingDate is null), you can create a filtered index for that case instead:
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
where BookingDate is not null
Go
Try row_number() with a proper ordering. NULL values are treated as the lowest possible values by ORDER BY in SQL Server, so with DESC they sort last.
SELECT TOP(1) WITH TIES *
FROM myTable t
ORDER BY row_number() over(partition by PRODUCT_NUMBER order by BOOKING_DATE DESC, STEP_NUMBER);
Pay attention to the indexes SQL Server advises to get good performance.
Possibly the most efficient method is a correlated subquery:
select t.*
from t
where t.step_number = (select top (1) t2.step_number
from t t2
where t2.product_number = t.product_number
order by t2.booking_date desc, t2.step_number
);
In particular, this can take advantage of an index on (product_number, booking_date desc, step_number).
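To see the "latest booking date, else step 1" rule in action, here is a small sketch using Python's built-in sqlite3 module rather than SQL Server (an assumption made so the example runs anywhere; it needs SQLite 3.25 or later for window functions). The table name product_steps and the sample rows are invented to reproduce the three results quoted in the question. SQLite, like SQL Server, sorts NULL as the smallest value, so with DESC the rows without a booking date come last and step_number breaks the tie.

```python
import sqlite3

# Invented sample rows: (product_number, step_number, booking_date).
SAMPLE_ROWS = [
    ("0001", 1, "2019-06-05 09:00:00"),
    ("0001", 2, "2019-06-06 10:39:58"),
    ("0002", 1, None),
    ("0002", 2, None),
    ("0003", 1, None),
    ("0003", 2, "2019-06-07 12:39:58"),
]


def latest_step_per_product(rows):
    """Return one row per product: latest booking, or step 1 if none is booked."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_steps ("
        "product_number TEXT, step_number INTEGER, booking_date TEXT)"
    )
    conn.executemany("INSERT INTO product_steps VALUES (?, ?, ?)", rows)
    # Rank rows inside each product: newest booking first, NULL dates last,
    # and the lowest step number wins when every date is NULL.
    query = """
        SELECT product_number, step_number, booking_date
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY product_number
                       ORDER BY booking_date DESC, step_number
                   ) AS rn
            FROM product_steps
        ) ranked
        WHERE rn = 1
        ORDER BY product_number
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in latest_step_per_product(SAMPLE_ROWS):
    print(row)
```

This covers both requirements with a single pass over the table, which is the same idea as the row_number() and correlated-subquery answers above.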

Sorting twice on same column

I'm having a bit of a weird question, given to me by a client.
He has a list of data, with a date between parentheses like so:
Foo (14/08/2012)
Bar (15/08/2012)
Bar (16/09/2012)
Xyz (20/10/2012)
However, he wants the list to be displayed as follows:
Foo (14/08/2012)
Bar (16/09/2012)
Bar (15/08/2012)
Xyz (20/10/2012)
(notice that the second Bar has moved up one position)
So, the logic behind it is, that the list has to be sorted by date ascending, EXCEPT when two rows have the same name ('Bar'). If they have the same name, it must be sorted with the LATEST date at the top, while staying in the other sorting order.
Is this even remotely possible? I've experimented with a lot of ORDER BY clauses, but couldn't find the right one. Does anyone have an idea?
I should have specified that this data comes from a table in a sql server database (the Name and the date are in two different columns). So I'm looking for a SQL-query that can do the sorting I want.
(I've dumbed this example down quite a bit, so if you need more context, don't hesitate to ask)
This works, I think
declare @t table (data varchar(50), date datetime)
insert @t
values
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
select t.*
from @t t
inner join (select data, COUNT(*) cg, MAX(date) as mg from @t group by data) tc
on t.data = tc.data
order by case when cg>1 then mg else date end, date desc
produces
data date
---------- -----------------------
Foo 2012-08-14 00:00:00.000
Bar 2012-09-16 00:00:00.000
Bar 2012-08-15 00:00:00.000
Xyz 2012-10-20 00:00:00.000
A way with better performance than any of the other posted answers is to just do it entirely with an ORDER BY and not a JOIN or using CTE:
DECLARE @t TABLE (myData varchar(50), myDate datetime)
INSERT INTO @t VALUES
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
SELECT *
FROM @t t1
ORDER BY (SELECT MIN(t2.myDate) FROM @t t2 WHERE t2.myData = t1.myData), t1.myDate DESC
This does exactly what you request and will work with any indexes and much better with larger amounts of data than any of the other answers.
Additionally it's much more clear what you're actually trying to do here, rather than masking the real logic with the complexity of a join and checking the count of joined items.
This one uses analytic functions to perform the sort; it only requires one SELECT from your table.
The inner query finds gaps, where the name changes. These gaps are used to identify groups in the next query, and the outer query does the final sorting by these groups.
I have tried it here (SQL Fiddle) with extended test-data.
SELECT name, dat
FROM (
SELECT name, dat, SUM(gap) over(ORDER BY dat, name) AS grp
FROM (
SELECT name, dat,
CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name THEN 0 ELSE 1 END AS gap
FROM t
) x
) y
ORDER BY grp, dat DESC
Extended test-data
('Bar','2012-08-12'),
('Bar','2012-08-11'),
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-08-16'),
('Bar','2012-09-17'),
('Xyz','2012-10-20')
Result
Bar 2012-08-12
Bar 2012-08-11
Foo 2012-08-14
Bar 2012-09-17
Bar 2012-08-16
Bar 2012-08-15
Xyz 2012-10-20
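The gap-and-group query can be checked against the extended test data with a short script. This is only a sketch: it runs the same SQL through Python's sqlite3 module (an assumption, chosen so it runs without SQL Server; window functions need SQLite 3.25 or later), stores the dates as ISO text, and the function name sort_by_islands is made up.

```python
import sqlite3

# Extended test data quoted above, as (name, dat) rows.
ROWS = [
    ("Bar", "2012-08-12"),
    ("Bar", "2012-08-11"),
    ("Foo", "2012-08-14"),
    ("Bar", "2012-08-15"),
    ("Bar", "2012-08-16"),
    ("Bar", "2012-09-17"),
    ("Xyz", "2012-10-20"),
]


def sort_by_islands(rows):
    """Sort by date ascending, but newest-first inside each run of equal names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, dat TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # gap is 1 wherever the name differs from the previous row in date order;
    # the running sum of gap numbers each run of consecutive equal names.
    query = """
        SELECT name, dat
        FROM (
            SELECT name, dat, SUM(gap) OVER (ORDER BY dat, name) AS grp
            FROM (
                SELECT name, dat,
                       CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name
                            THEN 0 ELSE 1 END AS gap
                FROM t
            ) x
        ) y
        ORDER BY grp, dat DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for name, dat in sort_by_islands(ROWS):
    print(name, dat)
```

Note that the two early Bar rows and the three later Bar rows end up in different groups because Foo sits between them in date order, which is what distinguishes this approach from grouping by name alone.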
I think that this works, including the case I asked about in the comments:
declare @t table (data varchar(50), [date] datetime)
insert @t
values
('Foo','20120814'),
('Bar','20120815'),
('Bar','20120916'),
('Xyz','20121020')
; With OuterSort as (
select *,ROW_NUMBER() OVER (ORDER BY [date] asc) as rn from @t
)
)
--Now we need to find contiguous ranges of the same data value, and the min and max row number for such a range
, Islands as (
select data,rn as rnMin,rn as rnMax from OuterSort os where not exists (select * from OuterSort os2 where os2.data = os.data and os2.rn = os.rn - 1)
union all
select i.data,rnMin,os.rn
from
Islands i
inner join
OuterSort os
on
i.data = os.data and
i.rnMax = os.rn-1
), FullIslands as (
select
data,rnMin,MAX(rnMax) as rnMax
from Islands
group by data,rnMin
)
select
*
from
OuterSort os
inner join
FullIslands fi
on
os.rn between fi.rnMin and fi.rnMax
order by
fi.rnMin asc,os.rn desc
It works by first computing the initial ordering in the OuterSort CTE. Then, using two CTEs (Islands and FullIslands), we compute the parts of that ordering in which the same data value appears in adjacent rows. Having done that, we can compute the final ordering by any value that all adjacent values will have (such as the lowest row number of the "island" that they belong to), and then within an "island", we use the reverse of the originally computed sort order.
Note that this may, though, not be too efficient for large data sets. On the sample data it shows up as requiring 4 table scans of the base table, as well as a spool.
Try something like...
ORDER BY CASE date
WHEN '14/08/2012' THEN 1
WHEN '16/09/2012' THEN 2
WHEN '15/08/2012' THEN 3
WHEN '20/10/2012' THEN 4
END
In MySQL, you can do:
ORDER BY FIELD(date, '14/08/2012', '16/09/2012', '15/08/2012', '20/10/2012')
In Postgres, you can create a function FIELD and do:
CREATE OR REPLACE FUNCTION field(anyelement, anyarray) RETURNS numeric AS $$
SELECT
COALESCE((SELECT i
FROM generate_series(1, array_upper($2, 1)) gs(i)
WHERE $2[i] = $1),
0);
$$ LANGUAGE SQL STABLE
If you do not want to use the CASE expression, you can try to find an implementation of the FIELD function for SQL Server.

How to write this sql query

I have a SQL Server table with the following structure
cod_turn (PrimaryKey)
taken (bit)
time (datetime)
and several other fields which are irrelevant to the problem. I can't alter the table structure because the app was made by someone else.
Given a numeric parameter, which we will assume to be 3 for this example, and a time, I need a query that, looking from that time on, finds the first 3 consecutive records which are not marked as "taken". I can't figure out how to write the query in pure SQL, if that is possible.
PS: I accepted the answer because it was correct, but I made a bad description of the problem. I will open another question later. Feeling stupid after seeing the size of the answers =)
SELECT TOP 3 * FROM table WHERE taken = 0 AND time>=@Time ORDER BY time
Where @Time is whatever time you pass in.
Assuming a current version of SQL Server, and assuming you've named your "numeric variable parameter" @top int. Note: the parentheses around @top are required when using a parameterized TOP.
SELECT TOP (@top)
cod_turn,
taken ,
time
FROM yourtable
WHERE Taken = 0 AND time>=@Time
ORDER BY time -- ascending, so the earliest matching rows come first
You can also do
with cte as
(
SELECT
ROW_NUMBER() over (order by time) rn,
cod_turn,
taken ,
time
FROM yourtable
WHERE Taken = 0 AND time>=@Time
)
SELECT
cod_turn,
taken ,
time
FROM CTE
WHERE rn <= @top
ORDER BY time
SELECT TOP 3
*
FROM
table
WHERE
time >= @inserted_time
AND taken = 0
ORDER BY
cod_turn ASC
select MT.*
from
(
select cod_turn, ROW_NUMBER() OVER (ORDER BY cod_turn) [RowNumber] -- or by time
from myTable
where taken = 0
and time >= @myTime
) T
inner join myTable MT on MT.cod_turn = T.cod_turn
where T.RowNumber <= @myNumber
select top 3 * from theTable where taken = 0 and time > theTime order by time
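The common shape of these answers (filter on taken = 0 and the start time, sort ascending, keep the first N) can be sketched with bound parameters. This is an illustration only, written against Python's sqlite3 module instead of SQL Server, so TOP (@top) becomes LIMIT ?. The table name turns, the function name first_free_turns and the sample rows are hypothetical.

```python
import sqlite3


def first_free_turns(conn, from_time, limit):
    """Return the first `limit` untaken turns at or after `from_time`."""
    query = """
        SELECT cod_turn, taken, time
        FROM turns
        WHERE taken = 0 AND time >= ?
        ORDER BY time
        LIMIT ?
    """
    return conn.execute(query, (from_time, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE turns (cod_turn INTEGER PRIMARY KEY, taken INTEGER, time TEXT)"
)
# Made-up rows; ISO-formatted text timestamps compare correctly as strings.
conn.executemany(
    "INSERT INTO turns VALUES (?, ?, ?)",
    [
        (1, 0, "2012-01-01 09:00"),
        (2, 1, "2012-01-01 10:00"),
        (3, 0, "2012-01-01 11:00"),
        (4, 0, "2012-01-01 12:00"),
        (5, 0, "2012-01-01 13:00"),
        (6, 0, "2012-01-01 14:00"),
    ],
)

free = first_free_turns(conn, "2012-01-01 10:00", 3)
print([row[0] for row in free])  # [3, 4, 5]
```

Like the answers above, this returns the first N untaken rows and does not check that they are adjacent to each other.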

Remove duplicates (1 to many) or write a subquery that solves my problem

Referring to the diagram below the records table has unique Records. Each record is updated, via comments through an Update Table. When I join the two I get lots of duplicates.
How to remove duplicates? Group By does not work for me as I have more than 10 fields in select query and some of them are functions.
Write a sub query which pulls the last updates in the Update table for each record that is updated in a particular month. Joining with this sub query will solve my problem.
Thanks!
Edit
Table structure that is of interest is
create table Records(
recordID int,
90more_fields various
)
create table Updates(
update_id int,
record_id int,
comment text,
byUser varchar(25),
datecreate datetime
)
Here's one way.
SELECT * /*But list columns explicitly*/
FROM Orange o
CROSS APPLY (SELECT TOP 1 *
FROM Blue b
WHERE b.datecreate >= '20110901'
AND b.datecreate < '20111001'
AND o.RecordID = b.Record_ID2
ORDER BY b.datecreate DESC) b
Based on the limited information available...
WITH cteLastUpdate AS (
SELECT Record_ID2, UpdateDateTime,
ROW_NUMBER() OVER(PARTITION BY Record_ID2 ORDER BY UpdateDateTime DESC) AS RowNUM
FROM BlueTable
/* Add WHERE clause if needed to restrict date range */
)
SELECT *
FROM cteLastUpdate lu
INNER JOIN OrangeTable o
ON lu.Record_ID2 = o.RecordID
WHERE lu.RowNum = 1
Last updates per record and month:
SELECT *
FROM UPDATES outerUpd
WHERE exists
(
-- Magic part
SELECT 1
FROM UPDATES innerUpd
WHERE innerUpd.RecordId = outerUpd.RecordId
GROUP BY RecordId
, date_part('year', innerUpd.datecolumn)
, date_part('month', innerUpd.datecolumn)
HAVING max(innerUpd.datecolumn) = outerUpd.datecolumn
)
(Works on PostgreSQL, date_part is different in other RDBMS)
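A runnable sketch of the "last update per record within a month" idea follows. It uses the ROW_NUMBER approach from the answer above, but through Python's sqlite3 module rather than SQL Server (an assumption for portability; it needs SQLite 3.25 or later). The table and column names follow the structure given in the question; the sample rows and the function name last_updates_in_range are invented.

```python
import sqlite3


def last_updates_in_range(conn, start, end):
    """Return (recordID, update_id, comment) of each record's last update in [start, end)."""
    # Rank each record's updates newest-first inside the date range,
    # then keep only rank 1 so the join yields one row per record.
    query = """
        SELECT r.recordID, u.update_id, u.comment
        FROM Records r
        JOIN (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY record_id ORDER BY datecreate DESC
                   ) AS rn
            FROM Updates
            WHERE datecreate >= ? AND datecreate < ?
        ) u ON u.record_id = r.recordID
        WHERE u.rn = 1
        ORDER BY r.recordID
    """
    return conn.execute(query, (start, end)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Records (recordID INTEGER)")
conn.execute(
    "CREATE TABLE Updates (update_id INTEGER, record_id INTEGER, "
    "comment TEXT, byUser TEXT, datecreate TEXT)"
)
conn.executemany("INSERT INTO Records VALUES (?)", [(10,), (11,), (12,)])
# Made-up updates: record 10 has two in September, record 12 has none.
conn.executemany(
    "INSERT INTO Updates VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "first", "ann", "2011-09-02"),
        (2, 10, "second", "bob", "2011-09-20"),
        (3, 11, "only", "ann", "2011-09-05"),
        (4, 10, "october", "bob", "2011-10-01"),
        (5, 12, "august", "cat", "2011-08-30"),
    ],
)

september = last_updates_in_range(conn, "2011-09-01", "2011-10-01")
print(september)  # [(10, 2, 'second'), (11, 3, 'only')]
```

Because the join is an inner join, records with no update in the range are left out; a LEFT JOIN would keep them with empty update columns.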

Get last item in a table - SQL

I have a History Table in SQL Server that basically tracks an item through a process. The item has some fixed fields that don't change throughout the process, but has a few other fields including status and Id which increment as the steps of the process increase.
Basically I want to retrieve the last step for each item given a Batch Reference. So if I do a
Select * from HistoryTable where BatchRef = @BatchRef
It will return all the steps for all the items in the batch - eg
Id Status BatchRef ItemCount
1 1 Batch001 100
1 2 Batch001 110
2 1 Batch001 60
2 2 Batch001 100
But what I really want is:
Id Status BatchRef ItemCount
1 2 Batch001 110
2 2 Batch001 100
Edit: Apologies - I can't seem to get the TABLE tags to work with Markdown. I followed the help to the letter, and it looks fine in the preview.
Assuming you have an identity column in the table...
select
top 1 <fields>
from
HistoryTable
where
BatchRef = @BatchRef
order by
<IdentityColumn> DESC
It's kind of hard to make sense of your table design - I think SO ate your delimiters.
The basic way of handling this is to GROUP BY your fixed fields, and select a MAX (or MIN) for some unique value (a datetime usually works well). In your case, I think that the GROUP BY would be BatchRef and ItemCount, and Id will be your unique column.
Then, join back to the table to get all columns. Something like:
SELECT *
FROM HistoryTable
JOIN (
SELECT
MAX(Id) as Id,
BatchRef,
ItemCount
FROM HistoryTable
WHERE
BatchRef = @batchRef
GROUP BY
BatchRef,
ItemCount
) as Latest ON
HistoryTable.Id = Latest.Id
Assuming the Item Ids are incrementally numbered:
--Declare a table variable to hold the last step for each item id
DECLARE @LastStepForEach TABLE (
Id int,
Status int,
BatchRef char(10),
ItemCount int)
--Loop counter
DECLARE @count INT;
SET @count = 0;
--Loop through all of the items
WHILE (@count < (SELECT MAX(Id) FROM HistoryTable WHERE BatchRef = @BatchRef))
BEGIN
SET @count = @count + 1;
INSERT INTO @LastStepForEach (Id, Status, BatchRef, ItemCount)
SELECT Id, Status, BatchRef, ItemCount
FROM HistoryTable
WHERE BatchRef = @BatchRef
AND Id = @count
AND Status =
(
SELECT MAX(Status)
FROM HistoryTable
WHERE BatchRef = @BatchRef
AND Id = @count
)
END
SELECT *
FROM @LastStepForEach
SELECT id, status, BatchRef, MAX(itemcount) AS maxItemcount
FROM HistoryTable GROUP BY id, status, BatchRef
HAVING status > 1
It's a bit hard to decipher your data the way WMD has formatted it, but you can pull off the sort of trick you need with common table expressions on SQL 2005:
with LastBatches as (
select Batch, max(Id) as Id
from HistoryTable
group by Batch
)
select *
from HistoryTable h
join LastBatches b on b.Batch = h.Batch and b.Id = h.Id
Or a subquery (assuming the group by in the subquery works - off the top of my head I don't recall):
select *
from HistoryTable h
join (
select Batch, max(Id) as Id
from HistoryTable
group by Batch
) b on b.Batch = h.Batch and b.Id = h.Id
Edit: I was assuming you wanted the last item for every batch. If you just need it for the one batch then the other answers (doing a top 1 and ordering descending) are the way to go.
As already suggested you probably want to reorder your query to sort it in the other direction so you actually fetch the first row. Then you'd probably want to use something like
SELECT TOP 1 ...
if you're using MSSQL 2k or earlier, or the SQL compliant variant
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY key ASC) AS rownumber,
columns
FROM tablename
) AS foo
WHERE rownumber = n
for any other version (or for other database systems that support the standard notation), or
SELECT ... LIMIT 1 OFFSET 0
for some other variants without the standard SQL support.
See also this question for some additional discussion around selecting rows. Using the aggregate function max() might or might not be faster depending on whether calculating the value requires a table scan.
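For the sample data in the question, the "group, take the max, join back" pattern can be sketched as follows. This is a hedged illustration run through Python's sqlite3 module rather than SQL Server, and it assumes that Id identifies the item and that Status increases with each step, which is how the sample rows read. The Batch002 row and the function name last_step_per_item are added for illustration.

```python
import sqlite3


def last_step_per_item(conn, batch_ref):
    """Return the row with the highest Status for each Id in the given batch."""
    # The inner query finds the highest Status per Id in the batch;
    # joining back on (Id, Status) recovers the full rows.
    query = """
        SELECT h.Id, h.Status, h.BatchRef, h.ItemCount
        FROM HistoryTable h
        JOIN (
            SELECT Id, MAX(Status) AS Status
            FROM HistoryTable
            WHERE BatchRef = ?
            GROUP BY Id
        ) latest ON latest.Id = h.Id AND latest.Status = h.Status
        WHERE h.BatchRef = ?
        ORDER BY h.Id
    """
    return conn.execute(query, (batch_ref, batch_ref)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HistoryTable "
    "(Id INTEGER, Status INTEGER, BatchRef TEXT, ItemCount INTEGER)"
)
conn.executemany(
    "INSERT INTO HistoryTable VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Batch001", 100),
        (1, 2, "Batch001", 110),
        (2, 1, "Batch001", 60),
        (2, 2, "Batch001", 100),
        (1, 3, "Batch002", 50),  # invented row from another batch
    ],
)

for row in last_step_per_item(conn, "Batch001"):
    print(row)
```

For Batch001 this yields the two rows the question asks for, and the row from the other batch is ignored.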