Simple Database Query - Is there a faster way without a cursor?

I have 2 tables that I'm trying to query. The first has a list of meters. The second has the data for those meters. I want to get the newest reading for each meter.
Originally, this was in a GROUP BY statement, but it ended up processing all 7 million rows in our database and took a little over a second. A subquery and a number of other ways of writing it had the same result.
I have a clustered index that covers the EndTime and the MeterDataConfigurationId columns in the MeterRecordings table.
Ultimately, this is what I wrote, which performs in about 20 milliseconds. It seems like SQL Server should be smart enough to perform the GROUP BY query in the same time.
Declare @Meters Table
(
    MeterId Integer,
    LastValue float,
    LastTimestamp DateTime
)

Declare MeterCursor Cursor For
    Select Id
    From MeterDataConfiguration

Declare @MeterId Int
Open MeterCursor
Fetch Next From MeterCursor Into @MeterId

While @@FETCH_STATUS = 0
Begin
    Declare @LastValue float
    Declare @LastTimestamp DateTime

    Select @LastValue = mr.DataValue, @LastTimestamp = mr.EndTime
    From MeterRecording mr
    Where mr.MeterDataConfigurationId = @MeterId
    And mr.EndTime = (Select MAX(EndTime) From MeterRecording mr2 Where mr2.MeterDataConfigurationId = @MeterId)

    Insert Into @Meters
    Select @MeterId, @LastValue, @LastTimestamp

    Fetch Next From MeterCursor Into @MeterId
End

Close MeterCursor
Deallocate MeterCursor

Select *
From @Meters
Here is an example of the same query that performs horribly:
select mdc.id, mr.EndTime
from MeterDataConfiguration mdc
inner join MeterRecording mr
    on mr.MeterDataConfigurationId = mdc.Id
    and mr.EndTime = (select MAX(EndTime) from MeterRecording mr2 where mr2.MeterDataConfigurationId = mdc.Id)

You can try a CTE (Common Table Expression) using ROW_NUMBER:
;WITH Readings AS
(
    SELECT
        mdc.id, mr.EndTime,
        ROW_NUMBER() OVER(PARTITION BY mdc.id ORDER BY mr.EndTime DESC) AS 'RowID'
    FROM dbo.MeterDataConfiguration mdc
    INNER JOIN dbo.MeterRecording mr ON mr.MeterDataConfigurationId = mdc.Id
)
SELECT
    ID, EndTime, RowID
FROM Readings
WHERE RowID = 1
This creates "partitions" of data, one for each mdc.id, and numbers them sequentially, descending on mr.EndTime, so for each partition, you get the most recent reading as the RowID = 1 row.
Of course, to get decent performance, you need appropriate indices on:
mr.MeterDataConfigurationId since it's a foreign key into MeterDataConfiguration, right?
mr.EndTime since you do an ORDER BY on it
mdc.Id which I assume is a primary key, so it's indexed already
Update: sorry, I missed this tidbit:
"I have a clustered index that covers the EndTime and the MeterDataConfigurationId columns in the MeterRecordings table."
Quite honestly: I would toss that. Don't you have some other unique ID on the MeterRecordings table that would be suitable as a clustering index? An INT IDENTITY ID or something?
If you have a compound index on (EndTime, MeterDataConfigurationId), it can't be used for both purposes - ordering on EndTime and joining on MeterDataConfigurationId - one of them will not be doable - pity!
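As a minimal sketch of that arrangement (the surrogate key and index names here are made up, and the existing clustered index would need to be dropped first):
-- add a narrow surrogate key and cluster on it (hypothetical names)
ALTER TABLE dbo.MeterRecording ADD Id INT IDENTITY(1,1) NOT NULL
CREATE UNIQUE CLUSTERED INDEX CIX_MeterRecording_Id ON dbo.MeterRecording (Id)

-- one nonclustered index that supports both the join and the MAX/ORDER BY on EndTime
CREATE NONCLUSTERED INDEX IX_MeterRecording_Config_EndTime
    ON dbo.MeterRecording (MeterDataConfigurationId, EndTime DESC)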

How does this query perform? This one gets all the data in MeterRecording, ignoring the list in MeterDataConfiguration. If that is not safe, MeterDataConfiguration can be joined to this query to restrict the output.
SELECT Id, DataValue, EndTime
FROM (
    select mr.MeterDataConfigurationId as Id,
           mr.DataValue,
           mr.EndTime,
           RANK() OVER(PARTITION BY mr.MeterDataConfigurationId
                       ORDER BY mr.EndTime DESC) as r
    from MeterRecording mr) as M
WHERE M.r = 1

I would go with marc's answer, but if you ever need to use cursors again (you should try to avoid them) and you need to process a lot of records, I would suggest that you create a temp table (or table variable) that has all the columns from the table you are reading, plus an autogenerated identity field (IDENTITY(1,1)), and then just use a WHILE loop to read from the table. Basically, increment an int variable (call it @id) inside the loop and do
select
    @col1Value = column1,
    @col2Value = column2, ...
from #temp_table
where id = @id
This behaves just like a cursor, but I find it to be much faster.
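A fuller sketch of that pattern (the table and column names here are made up for illustration):
DECLARE @work TABLE (id INT IDENTITY(1,1), column1 INT, column2 VARCHAR(50))

-- load everything to process, letting the identity column number the rows
INSERT INTO @work (column1, column2)
SELECT column1, column2 FROM dbo.SourceTable -- hypothetical source table

DECLARE @id INT = 1, @maxId INT, @col1Value INT, @col2Value VARCHAR(50)
SELECT @maxId = MAX(id) FROM @work

WHILE @id <= @maxId
BEGIN
    SELECT @col1Value = column1, @col2Value = column2
    FROM @work
    WHERE id = @id

    -- process @col1Value / @col2Value here

    SET @id = @id + 1
END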


How can I improve the native query for a table with 7 million rows?

I have the below view (table) in my database (SQL Server).
I want to retrieve 2 things from this table.
The object which has the latest booking date for each Product number.
It will return the objects = {0001, 2, 2019-06-06 10:39:58} and {0003, 2, 2019-06-07 12:39:58}.
If no step number has a booking date for a Product number, it will return the object with Step number = 1. In that case it will return the object = {0002, 1, NULL}.
The view has 7,000,000 rows. I must do it by using a native query.
The first query that retrieves the product with the latest booking date:
SELECT DISTINCT *
FROM TABLE t
WHERE t.BOOKING_DATE = (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER)
The second query that retrieves the product with booking date NULL and Step number = 1:
SELECT DISTINCT *
FROM TABLE t
WHERE (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER) IS NULL AND t.STEP_NUMBER = 1
I tried using a single query, but it takes too long.
For now I use 2 queries to get this information, but for the future I need to improve this. Do you have an alternative? I also cannot use stored procedures or functions inside SQL Server. I must do it with a native query from Java.
Try this,
Declare @p table(pnumber int, step int, bookdate datetime)
insert into @p values
 (1,1,'2019-01-01'),(1,2,'2019-01-02'),(1,3,'2019-01-03')
,(2,1,null),(2,2,null),(2,3,null)
,(3,1,null),(3,2,null),(3,3,'2019-01-03')

;With CTE as
(
    select pnumber, max(bookdate) bookdate
    from @p p1
    where bookdate is not null
    group by pnumber
)
select p.* from @p p
where exists(select 1 from CTE c
             where p.pnumber = c.pnumber and p.bookdate = c.bookdate)
union all
select p1.* from @p p1
where p1.bookdate is null and step = 1
  and not exists(select 1 from CTE c
                 where p1.pnumber = c.pnumber)
If performance is the main concern, then whether you use 1 query or 2 does not matter; what matters is the final performance.
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
Go
If more than 90% of the data falls on one side of the BookingDate is null / is not null split, you can create a filtered index instead (given a different name here so it can coexist with the index above):
Create NonClustered index ix_Product_Booked on Product (ProductNumber,BookingDate,Stepnumber)
where BookingDate is not null
Go
Try row_number() with a proper ordering. Null values are treated as the lowest possible values by SQL Server's ORDER BY.
SELECT TOP(1) WITH TIES *
FROM myTable t
ORDER BY row_number() over(partition by PRODUCT_NUMBER order by BOOKING_DATE DESC, STEP_NUMBER);
Pay attention to the indexes SQL Server advises to get good performance.
Possibly the most efficient method is a correlated subquery:
select t.*
from t
where t.step_number = (select top (1) t2.step_number
                       from t t2
                       where t2.product_number = t.product_number
                       order by t2.booking_date desc, t2.step_number
                      );
In particular, this can take advantage of an index on (product_number, booking_date desc, step_number).
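For example (a sketch, using the table name t from the answer above; the index name is made up):
CREATE NONCLUSTERED INDEX ix_t_product_booking
    ON t (product_number, booking_date DESC, step_number);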

T-SQL Trigger to subtract previous from current to be inserted in new table

I have a T-SQL table SystemReading which holds meter readings for a water system. After the next reading is taken at a given site I need to subtract the previous reading from the current reading to get the usage between the two reads to be inserted to the SystemUsage table. The first table has ReadingID (identity), SysMeterID (meter site), Reading, ReadingDate. The second table, my usage table, has UsageID (identity), SysMeterID (meter site), Usage, ReadingDate. I need to make sure that only the previous reading of the same SysMeterID as the current is used in the trigger. I was using a CTE before to find this usage but now I need it automatically calculated and inserted to the new Usage table. Any help is appreciated.
Here is my old CTE for a reference:
;WITH tblDifference as (
SELECT Row_Number() OVER (ORDER BY ReadingID) as RowNumber, Reading, SysMeterID, ReadingDate
FROM Supplydb.app.SystemReading
WHERE SysMeterID = 18
)
SELECT Cur.Reading, Cur.Reading - Prv.Reading as TotalPumped, cur.ReadingDate as Date
FROM tblDifference as Cur
LEFT OUTER JOIN tblDifference Prv
ON Cur.RowNumber = Prv.RowNumber+1
where cur.rownumber = 3
ORDER BY cur.ReadingDate DESC
Hopefully you can calculate the usage with this insert trigger:
CREATE TRIGGER Tgr_AddUsage ON SystemReading
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Note: this assumes single-row inserts; Inserted can contain multiple rows.
    DECLARE @SysMeterID int
    SELECT @SysMeterID = SysMeterID FROM Inserted

    INSERT INTO SystemUsage (SysMeterID, Usage, ReadingDate)
    SELECT t1.SysMeterID, t1.Reading - t2.Reading as Usage, t1.ReadingDate
    FROM Inserted t1
    LEFT OUTER JOIN
        (SELECT TOP 2 * FROM SystemReading
         WHERE SysMeterID = @SysMeterID
         ORDER BY ReadingID DESC) t2
        ON t1.ReadingID <> t2.ReadingID
END
GO
I'd skip the CTE and employ LAG (partitioned by SysMeterID) and ordered as necessary to do your Current-Previous calculation.
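A sketch of that idea (untested; the trigger name here is made up to avoid clashing with the one above - it recomputes the previous reading per meter with LAG and keeps only the rows just inserted):
CREATE TRIGGER Tgr_AddUsageLag ON SystemReading
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    ;WITH Ordered AS (
        SELECT SysMeterID, ReadingID, Reading, ReadingDate,
               LAG(Reading) OVER (PARTITION BY SysMeterID ORDER BY ReadingID) AS PrevReading
        FROM Supplydb.app.SystemReading
    )
    INSERT INTO SystemUsage (SysMeterID, Usage, ReadingDate)
    SELECT o.SysMeterID, o.Reading - o.PrevReading, o.ReadingDate
    FROM Ordered o
    INNER JOIN Inserted i ON i.ReadingID = o.ReadingID
    WHERE o.PrevReading IS NOT NULL;
END
GO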

select top 1 record per group in ms sql without using Max or Min

I know googling this question brings back a lot of solutions, but none of them apply to my situation.
I have a table such that:
CREATE TABLE [Batch](
    [batch_id] [int] NOT NULL,
    ...(more columns)
    [date] [datetime] NULL,
    CONSTRAINT [pk_Positions] PRIMARY KEY CLUSTERED
    (
        [batch_id] ASC,
        ...(more columns)
    )
)
batch_id and date have a one-to-one relationship. I.e., for a given batch_id, all dates are the same, and for a given date, all batch_ids are the same. (I know it's poor design. If I were to design the table, I would probably create a separate table for batch_id and date.)
there can be multiple records that have the same batch_id
Now I want to get a list of all distinct dates.
Since the table is very huge and date is not an indexed column, I don't want to try anything like:
select distinct date from Batch
And for similar reasons, I have ruled out the option of creating a non-clustered index on date
Instead, I want to do something like:
select First(date) from Batch
Group by batch_id
or
select Top 1 date from Batch
Group by batch_id
but MS SQL doesn't provide a First() function, and the latter returns a "not in an aggregate function" error.
As far as I see based on my research, I should use Min() or Max() as an alternative to First(), such as:
select Max(date) from Batch
Group by batch_id
However, since there can sometimes be over 100k records with the same batch_id, using Min() or Max() is not as efficient as just returning the first record without any comparison. So how can I optimize the last query to achieve better performance?
If you create this function:-
CREATE FUNCTION [dbo].GetDateForBatch_id
(
    @batch_id int
)
RETURNS datetime
AS
BEGIN
    RETURN (SELECT TOP 1 [date]
            FROM dbo.Batch
            WHERE batch_id = @batch_id)
END
go
and then run this query:-
select
b.batch_id,
dbo.GetDateForBatch_id(b.batch_id) AS [date]
FROM (SELECT DISTINCT batch_id
FROM Batch) b
You should get optimal performance with the index strategy you have in place.
Much as it irks my SQL-karma to say so, I think this may be one situation where iterative processing is useful. In pseudo-code:
declare @WorkingTable table (batchID int, [date] datetime)
declare @CurrentBatchID int = NULL
declare @PrevBatchID int
declare @BatchDate datetime = NULL

select top 1
    @CurrentBatchID = batch_id,
    @BatchDate = [date]
from Batch
where batch_id > -1 -- -1 is less than the smallest batch_id in the table
order by batch_id asc;

while @CurrentBatchID is not NULL
begin
    insert @WorkingTable values (@CurrentBatchID, @BatchDate);

    set @PrevBatchID = @CurrentBatchID;
    set @CurrentBatchID = NULL; -- reset so the loop ends when no further batch is found

    select top 1
        @CurrentBatchID = batch_id,
        @BatchDate = [date]
    from Batch
    where batch_id > @PrevBatchID
    order by batch_id asc;
end

select * from @WorkingTable
Although there will be one table access per iteration, it will be on the clustered key, with all the advantages that brings. Still ugly, though.
If you intend to do this on a regular basis it would be better to create a look-up table with just batch_id and [Date] in it which is maintained by your ETL and purge processes.
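A minimal sketch of such a look-up table (the table name and maintenance query are made up for illustration):
CREATE TABLE dbo.BatchDate (
    batch_id INT NOT NULL PRIMARY KEY,
    [date] DATETIME NULL
);

-- run from the ETL after each load; only new batch_ids are added
INSERT INTO dbo.BatchDate (batch_id, [date])
SELECT b.batch_id, MIN(b.[date])
FROM Batch b
WHERE NOT EXISTS (SELECT 1 FROM dbo.BatchDate d WHERE d.batch_id = b.batch_id)
GROUP BY b.batch_id;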
Since you say there's a one-to-one relationship between batch_id and date this will do the job:
SELECT DISTINCT batch_id, date FROM Batch
If it's not true, you can associate a row number to each record and retrieve only the first:
WITH BatchWithRowNum AS
(
SELECT
*
, RowNum = ROW_NUMBER() OVER (PARTITION BY batch_id ORDER BY date)
FROM Batch
)
SELECT * FROM BatchWithRowNum WHERE RowNum = 1
The third way of doing this which I expect to be faster than the row number approach is:
SELECT B.batch_id, T.MinDate AS date
FROM Batch B
INNER JOIN
(
SELECT B2.batch_id, MIN(B2.date) AS MinDate
FROM Batch B2
GROUP BY B2.batch_id
) T
ON B.batch_id = T.batch_id
GROUP BY B.batch_id, T.MinDate
The following is not generally an efficient solution, but may have a better performance in your case because it only relies on the already existing index on batch_id:
SELECT
DISTINCT B.batch_id
, date = (SELECT TOP 1 date FROM Batch B2 WHERE B2.batch_id = B.batch_id)
FROM
Batch B
If you have serious performance issues and adding an index is not an option, none of the above will help you unless you narrow down the result set with a WHERE clause - for example, bring through a subset of batches with a certain set of batch-ids, or those in a specific date range, as sketched below.
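A sketch of the batch-id variant (assuming @FirstBatch and @LastBatch are parameters you supply):
SELECT DISTINCT B.batch_id,
       [date] = (SELECT TOP 1 [date] FROM Batch B2 WHERE B2.batch_id = B.batch_id)
FROM Batch B
WHERE B.batch_id BETWEEN @FirstBatch AND @LastBatch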
--Just delete duplicate records -- best way yet
DECLARE @juvenileid int, @luCountyName varchar(40), @DispHearDate datetime, @vid int
SET @vid = 0

DECLARE db_cursor CURSOR FOR
    SELECT juvenileid, luCountyName, DispHearDate FROM #TEMP48 ORDER BY juvenileid

OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @juvenileid, @luCountyName, @DispHearDate
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN100:
    IF @vid = 0
    BEGIN
        SET @vid = @juvenileid
    END
    ELSE
    BEGIN
        IF @vid = @juvenileid
        BEGIN
            DELETE FROM #TEMP48
            WHERE juvenileid = @juvenileid
              AND luCountyName = @luCountyName
              AND DispHearDate = @DispHearDate
        END
        ELSE
        BEGIN
            SET @vid = 0
            GOTO BEGIN100
        END
    END
    FETCH NEXT FROM db_cursor INTO @juvenileid, @luCountyName, @DispHearDate
END
CLOSE db_cursor
DEALLOCATE db_cursor

How can I query a T-SQL table in limited rows at a time

I have a table with (eg) 1500 rows. I have determined that (due to the resources of the server) my query can quickly process up to 500 rows of my table in my desired query. Any more than 500 rows at once and it suddenly becomes very slow.
How can I structure a query to process the contents of a table in row groups of 500, through to the end of the table?
[EDIT] The query which takes a long time is this:
select p.parentid, max(c.childtime) ChildTime
from child c
inner join #parents p on p.parentid = c.parentid
    and c.ChildTypeID = 1
    and c.childtime < getdate()
group by p.parentid
The problem is that the child table has millions of rows and (for reasons I can't go into here) can't be reduced.
The main problem is: reducing the number of rows from the child table to make the query performant. Unfortunately, this query is being performed to populate a temporary table so that a subsequent query can execute quickly.
One possibility is to use the windowing functions. Only trick is that they cannot be used in the WHERE clause, so you will have to use subqueries.
Select a.*, b.*
From
(
    Select *, rownum = ROW_NUMBER() over (order by fieldx)
    From TableA
) a
Inner Join TableB b on a.fieldx = b.fieldx
Where a.rownum between @startnum and @endnum
If this is just for processing you might be able to do something like this:
DECLARE @RowsPerPage INT = 500, @PageNumber INT = 1
DECLARE @TotalRows INT
SET @TotalRows = (SELECT count(1) FROM Test)

-- loop until the last (possibly partial) page has been processed
WHILE ((@PageNumber - 1) * @RowsPerPage < @TotalRows)
BEGIN
    SELECT Id
    FROM
    (
        SELECT
            Id,
            ROW_NUMBER() OVER (ORDER BY Id) as RowNum
        FROM Test
    ) AS sub
    WHERE sub.RowNum BETWEEN ((@PageNumber - 1) * @RowsPerPage) + 1
                         AND @RowsPerPage * @PageNumber
    SET @PageNumber = @PageNumber + 1
END
Calculates the total rows and then loops and pages through the results. It's not too helpful if you need your results together, though, because this will run X number of separate queries. You might be able to put this in a stored procedure and union the results or something crazy like that to get the results in one "query".
I still think a better option would be to fix the slowness in your original query. Is it possible to load the join results into a CTE/Temp table to only perform calculations a single time? If you could give an example of your query that would help a lot...
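Based on the query shown in the edit above, a sketch of that temp-table idea (#childmax is a made-up name) might look like:
-- aggregate the big child table once...
SELECT c.parentid, MAX(c.childtime) AS ChildTime
INTO #childmax
FROM child c
WHERE c.ChildTypeID = 1
  AND c.childtime < GETDATE()
GROUP BY c.parentid;

-- ...then join the small parent list against the pre-aggregated rows
SELECT p.parentid, cm.ChildTime
FROM #parents p
INNER JOIN #childmax cm ON cm.parentid = p.parentid;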

Sorting twice on same column

I'm having a bit of a weird question, given to me by a client.
He has a list of data, with a date between parentheses like so:
Foo (14/08/2012)
Bar (15/08/2012)
Bar (16/09/2012)
Xyz (20/10/2012)
However, he wants the list to be displayed as follows:
Foo (14/08/2012)
Bar (16/09/2012)
Bar (15/08/2012)
Xyz (20/10/2012)
(notice that the second Bar has moved up one position)
So, the logic behind it is, that the list has to be sorted by date ascending, EXCEPT when two rows have the same name ('Bar'). If they have the same name, it must be sorted with the LATEST date at the top, while staying in the other sorting order.
Is this even remotely possible? I've experimented with a lot of ORDER BY clauses, but couldn't find the right one. Does anyone have an idea?
I should have specified that this data comes from a table in a sql server database (the Name and the date are in two different columns). So I'm looking for a SQL-query that can do the sorting I want.
(I've dumbed this example down quite a bit, so if you need more context, don't hesitate to ask)
This works, I think
declare @t table (data varchar(50), date datetime)
insert @t
values
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')

select t.*
from @t t
inner join (select data, COUNT(*) cg, MAX(date) as mg from @t group by data) tc
    on t.data = tc.data
order by case when cg > 1 then mg else date end, date desc
produces
data date
---------- -----------------------
Foo 2012-08-14 00:00:00.000
Bar 2012-09-16 00:00:00.000
Bar 2012-08-15 00:00:00.000
Xyz 2012-10-20 00:00:00.000
A way with better performance than any of the other posted answers is to just do it entirely with an ORDER BY, and not a JOIN or a CTE:
DECLARE @t TABLE (myData varchar(50), myDate datetime)
INSERT INTO @t VALUES
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')

SELECT *
FROM @t t1
ORDER BY (SELECT MIN(t2.myDate) FROM @t t2 WHERE t2.myData = t1.myData), t1.myDate DESC
This does exactly what you request, works with any indexes, and scales much better to larger amounts of data than any of the other answers.
Additionally it's much more clear what you're actually trying to do here, rather than masking the real logic with the complexity of a join and checking the count of joined items.
This one uses analytic functions to perform the sort, it only requires one SELECT from your table.
The inner query finds gaps, where the name changes. These gaps are used to identify groups in the next query, and the outer query does the final sorting by these groups.
I have tried it here (SQL Fiddle) with extended test-data.
SELECT name, dat
FROM (
SELECT name, dat, SUM(gap) over(ORDER BY dat, name) AS grp
FROM (
SELECT name, dat,
CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name THEN 0 ELSE 1 END AS gap
FROM t
) x
) y
ORDER BY grp, dat DESC
Extended test-data
('Bar','2012-08-12'),
('Bar','2012-08-11'),
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-08-16'),
('Bar','2012-09-17'),
('Xyz','2012-10-20')
Result
Bar 2012-08-12
Bar 2012-08-11
Foo 2012-08-14
Bar 2012-09-17
Bar 2012-08-16
Bar 2012-08-15
Xyz 2012-10-20
I think that this works, including the case I asked about in the comments:
declare @t table (data varchar(50), [date] datetime)
insert @t
values
('Foo','20120814'),
('Bar','20120815'),
('Bar','20120916'),
('Xyz','20121020')
;With OuterSort as (
    select *, ROW_NUMBER() OVER (ORDER BY [date] asc) as rn
    from @t
)
--Now we need to find contiguous ranges of the same data value, and the min and max row number for such a range
, Islands as (
    select data, rn as rnMin, rn as rnMax
    from OuterSort os
    where not exists (select * from OuterSort os2 where os2.data = os.data and os2.rn = os.rn - 1)
    union all
    select i.data, i.rnMin, os.rn
    from Islands i
    inner join OuterSort os
        on i.data = os.data
        and i.rnMax = os.rn - 1
)
, FullIslands as (
    select data, rnMin, MAX(rnMax) as rnMax
    from Islands
    group by data, rnMin
)
select *
from OuterSort os
inner join FullIslands fi
    on os.rn between fi.rnMin and fi.rnMax
order by fi.rnMin asc, os.rn desc
It works by first computing the initial ordering in the OuterSort CTE. Then, using two CTEs (Islands and FullIslands), we compute the parts of that ordering in which the same data value appears in adjacent rows. Having done that, we can compute the final ordering by any value that all adjacent values will have (such as the lowest row number of the "island" that they belong to), and then within an "island", we use the reverse of the originally computed sort order.
Note that this may, though, not be too efficient for large data sets. On the sample data it shows up as requiring 4 table scans of the base table, as well as a spool.
Try something like...
ORDER BY CASE date
WHEN '14/08/2012' THEN 1
WHEN '16/09/2012' THEN 2
WHEN '15/08/2012' THEN 3
WHEN '20/10/2012' THEN 4
END
In MySQL, you can do:
ORDER BY FIELD(date, '14/08/2012', '16/09/2012', '15/08/2012', '20/10/2012')
In Postgres, you can create a function FIELD and do:
CREATE OR REPLACE FUNCTION field(anyelement, anyarray) RETURNS numeric AS $$
SELECT
COALESCE((SELECT i
FROM generate_series(1, array_upper($2, 1)) gs(i)
WHERE $2[i] = $1),
0);
$$ LANGUAGE SQL STABLE
If you do not want to use the CASE, you can try to find an implementation of the FIELD function for SQL Server.
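For instance, a sketch in T-SQL that emulates FIELD with a join to a VALUES list (the positions are made up to match the desired order; 'yyyymmdd' literals convert safely regardless of language settings):
SELECT t.*
FROM myTable t
LEFT JOIN (VALUES
    ('20120814', 1),
    ('20120916', 2),
    ('20120815', 3),
    ('20121020', 4)
) AS f(d, position) ON f.d = t.[date]
ORDER BY f.position;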